Sample records for speaker identification

  1. Unsupervised real-time speaker identification for daily movies

    NASA Astrophysics Data System (ADS)

    Li, Ying; Kuo, C.-C. Jay

    2002-07-01

    The problem of identifying speakers for movie content analysis is addressed in this paper. While most previous work on speaker identification was carried out in a supervised mode using pure audio data, more robust results can be obtained in real time by integrating knowledge from multiple media sources in an unsupervised mode. In this work, both audio and visual cues are employed and subsequently combined in a probabilistic framework to identify speakers. In particular, audio information is used to identify speakers with a maximum likelihood (ML)-based approach, while visual information is adopted to distinguish speakers by detecting and recognizing their talking faces based on face detection/recognition and mouth tracking techniques. Moreover, to accommodate speakers' acoustic variations over time, we update their models on the fly by adapting to their newly contributed speech data. Encouraging results have been achieved through extensive experiments, which show a promising future for the proposed audiovisual-based unsupervised speaker identification system.
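
    As an illustration of the maximum-likelihood audio scoring step described above, the following minimal sketch trains one Gaussian mixture model per speaker on MFCCs and attributes a test clip to the best-scoring model. librosa and scikit-learn are assumed tooling here; the paper's own implementation (including the visual cues and on-the-fly adaptation) is not shown.

    ```python
    # Minimal ML-based speaker identification sketch: one GMM per speaker,
    # attribute the test clip to the model with the highest likelihood.
    import numpy as np
    import librosa
    from sklearn.mixture import GaussianMixture

    def mfcc_features(path, sr=16000, n_mfcc=13):
        y, _ = librosa.load(path, sr=sr)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, coeffs)

    def train_speaker_models(train_files):  # train_files: {speaker: [wav paths]}
        return {spk: GaussianMixture(n_components=16, covariance_type="diag")
                     .fit(np.vstack([mfcc_features(f) for f in files]))
                for spk, files in train_files.items()}

    def identify(models, test_file):
        feats = mfcc_features(test_file)
        # score() returns the average per-frame log-likelihood under each model
        return max(models, key=lambda spk: models[spk].score(feats))
    ```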

  2. Evaluation of speaker de-identification based on voice gender and age conversion

    NASA Astrophysics Data System (ADS)

    Přibil, Jiří; Přibilová, Anna; Matoušek, Jindřich

    2018-03-01

    Two basic tasks are covered in this paper. The first consists of the design and practical testing of a new method for voice de-identification that changes the apparent age and/or gender of a speaker by multi-segmental frequency scale transformation combined with prosody modification. The second task is aimed at verifying the applicability of a classifier based on Gaussian mixture models (GMM) for detecting the original Czech and Slovak speakers after voice de-identification has been applied. The experiments performed confirm the functionality of the developed gender and age conversion for all selected types of de-identification, which can be objectively evaluated by the GMM-based open-set classifier. Original-speaker detection accuracy was also compared for sentences uttered by German and English speakers, demonstrating the language independence of the proposed method.

  3. Open-set speaker identification with diverse-duration speech data

    NASA Astrophysics Data System (ADS)

    Karadaghi, Rawande; Hertlein, Heinz; Ariyaeeinia, Aladdin

    2015-05-01

    This paper concerns an important category of applications of open-set speaker identification in criminal investigation, which involves operating with speech of short and varied duration. The study presents investigations into the adverse effects of such an operating condition on the accuracy of open-set speaker identification, based on both GMM-UBM and i-vector approaches. The experiments are conducted using a protocol developed for the identification task, based on the NIST speaker recognition evaluation corpus of 2008. In order to closely cover the real-world operating conditions in the considered application area, the study includes experiments with various combinations of training and testing data duration. The paper details the characteristics of the experimental investigations conducted and provides a thorough analysis of the results obtained.
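
    Open-set identification differs from the closed-set case in that the system must also reject speakers who are not enrolled. A minimal sketch of a GMM-UBM open-set decision rule follows (scikit-learn mixtures assumed; the threshold value is illustrative, not taken from the paper):

    ```python
    # Open-set GMM-UBM decision sketch: attribute a trial to the best-scoring
    # enrolled speaker only if its log-likelihood ratio against the universal
    # background model (UBM) clears a threshold; otherwise reject as unknown.

    def open_set_identify(speaker_gmms, ubm, feats, threshold=0.5):
        """speaker_gmms: {name: fitted GaussianMixture}; ubm: fitted GaussianMixture;
        feats: (frames, coeffs) feature array for the test utterance."""
        ubm_ll = ubm.score(feats)                      # avg log-likelihood per frame
        llr = {spk: gmm.score(feats) - ubm_ll          # log-likelihood ratio
               for spk, gmm in speaker_gmms.items()}
        best = max(llr, key=llr.get)
        return best if llr[best] > threshold else None  # None = out-of-set speaker
    ```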

  4. Advancements in robust algorithm formulation for speaker identification of whispered speech

    NASA Astrophysics Data System (ADS)

    Fan, Xing

    Whispered speech is an alternative speech production mode to neutral speech, used intentionally by talkers in natural conversational scenarios to protect privacy and to avoid certain content being overheard or made public. Because of the profound differences between whispered and neutral speech in the production mechanism, and the absence of whispered adaptation data, the performance of speaker identification systems trained on neutral speech degrades significantly. This dissertation therefore focuses on developing a robust closed-set speaker recognition system for whispered speech using no, or limited, whispered adaptation data from non-target speakers. The dissertation proposes the concept of "High"/"Low" performance whispered data for the purpose of speaker identification. A variety of acoustic properties are identified that contribute to the quality of whispered data. An acoustic analysis is also conducted to compare the phoneme/speaker dependency of the differences between whispered and neutral data in the feature domain. The observations from these acoustic analyses are new in this area and serve as guidance for developing robust speaker identification systems for whispered speech. The dissertation further proposes two systems for speaker identification of whispered speech. One system focuses on front-end processing: a two-dimensional feature space is proposed to identify "Low"-quality whispered utterances, and separate feature mapping functions are applied to vowels and consonants, respectively, in order to retain the speaker information shared between whispered and neutral speech. The other system focuses on speech-mode-independent model training: the proposed method generates pseudo-whispered features from neutral features using the statistical information contained in a whispered universal background model (UBM) trained on additional whispered data collected from non-target speakers. Four modeling methods are proposed for the transformation estimation used to generate the pseudo-whispered features. Both systems demonstrate a significant improvement over the baseline system on the evaluation data. This dissertation has therefore contributed to a scientific understanding of the differences between whispered and neutral speech, as well as improved front-end processing and modeling methods for speaker identification of whispered speech. Such advancements will ultimately contribute to improving the robustness of speech processing systems.

  5. Analysis of human scream and its impact on text-independent speaker verification.

    PubMed

    Hansen, John H L; Nandwana, Mahesh Kumar; Shokouhi, Navid

    2017-04-01

    A scream is defined as a sustained, high-energy vocalization that lacks phonological structure; this lack of phonological structure is what distinguishes a scream from other forms of loud vocalization, such as a yell. This study investigates the acoustic aspects of screams and addresses those that are known to prevent standard speaker identification systems from recognizing the identity of screaming speakers. It is well established that speaker variability due to changes in vocal effort and the Lombard effect contributes to degraded performance in automatic speech systems (i.e., speech recognition, speaker identification, diarization, etc.). However, previous research in the general area of speaker variability has concentrated on human speech production, whereas less is known about non-speech vocalizations. The UT-NonSpeech corpus is developed here to investigate speaker verification from scream samples. This study presents a detailed analysis in terms of fundamental frequency, spectral peak shift, frame energy distribution, and spectral tilt. It is shown that traditional speaker recognition based on the Gaussian mixture model-universal background model framework is unreliable when evaluated with screams.

  6. Discriminative analysis of lip motion features for speaker identification and speech-reading.

    PubMed

    Cetingül, H Ertan; Yemez, Yücel; Erzin, Engin; Tekalp, A Murat

    2006-10-01

    There have been several studies that jointly use audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two important issues: (1) is using explicit lip motion information useful, and (2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are considered to be those that result in the highest discrimination of individual speakers in a population, whereas for speech-reading the best features are those providing the highest phoneme/word/phrase recognition rate. Several lip motion feature candidates are considered, including dense motion features within a bounding box about the lip, lip contour motion features, and combinations of these with lip shape features. Furthermore, a novel two-stage spatial and temporal discrimination analysis is introduced to select the best lip motion features for both applications. Experimental results using a hidden-Markov-model-based recognition system indicate that explicit lip motion information provides additional performance gains in both applications, and that lip motion features prove more valuable for the speech-reading application.

  7. Single-Word Intelligibility in Speakers with Repaired Cleft Palate

    ERIC Educational Resources Information Center

    Whitehill, Tara; Chau, Cynthia

    2004-01-01

    Many speakers with repaired cleft palate have reduced intelligibility, but there are limitations with current procedures for assessing intelligibility. The aim of this study was to construct a single-word intelligibility test for speakers with cleft palate. The test used a multiple-choice identification format, and was based on phonetic contrasts…

  8. Performance enhancement for audio-visual speaker identification using dynamic facial muscle model.

    PubMed

    Asadpour, Vahid; Towhidkhah, Farzad; Homayounpour, Mohammad Mehdi

    2006-10-01

    The science of human identification using physiological characteristics, or biometry, has been of great concern in security systems. However, robust multimodal identification systems based on audio-visual information have not yet been thoroughly investigated. The aim of this work is therefore to propose a model-based feature extraction method that employs the physiological characteristics of the facial muscles producing lip movements. This approach adopts intrinsic muscle properties such as viscosity, elasticity, and mass, which are extracted from a dynamic lip model. These parameters are exclusively dependent on the neuro-muscular properties of the speaker; consequently, imitation of valid speakers could be reduced to a large extent. The parameters are applied to a hidden Markov model (HMM) audio-visual identification system. In this work, a combination of audio and video features is employed via a multistream pseudo-synchronized HMM training method. Noise-robust audio features such as Mel-frequency cepstral coefficients (MFCC), spectral subtraction (SS), and relative spectra perceptual linear prediction (J-RASTA-PLP) were used to evaluate the performance of the multimodal system once efficient audio feature extraction methods had been utilized. The superior performance of the proposed system is demonstrated on a large multispeaker database of continuously spoken digits, along with a phonetically rich sentence. To evaluate the robustness of the algorithms, some experiments were performed on genetically identical twins; furthermore, changes in speaker voice were simulated with drug inhalation tests. At a 3 dB signal-to-noise ratio (SNR), the dynamic muscle model improved the identification rate of the audio-visual system from 91% to 98%. Results on identical twins revealed an apparent improvement in performance for the dynamic muscle model-based system, whose audio-visual identification rate was enhanced from 87% to 96%.

  9. Identification and tracking of particular speaker in noisy environment

    NASA Astrophysics Data System (ADS)

    Sawada, Hideyuki; Ohkado, Minoru

    2004-10-01

    Humans are able to exchange information smoothly by voice in difficult situations, such as a noisy, crowded environment or in the presence of multiple speakers. We can detect the position of a sound source in 3D space, extract a particular sound from a mixture of sounds, and recognize who is talking. Realizing this mechanism with a computer would enable new applications: recording sound with high quality by reducing noise, presenting a clarified sound, and microphone-free speech recognition by extracting a particular sound. This paper introduces real-time detection and identification of a particular speaker in a noisy environment using a microphone array, based on the speaker's location and individual voice characteristics. The study is applied to the development of an adaptive auditory system for a mobile robot that collaborates with a factory worker.

  10. Evaluation of a speaker identification system with and without fusion using three databases in the presence of noise and handset effects

    NASA Astrophysics Data System (ADS)

    Al-Kaltakchi, Musab T. S.; Woo, Wai L.; Dlay, Satnam; Chambers, Jonathon A.

    2017-12-01

    In this study, a speaker identification system is considered consisting of a feature extraction stage that utilizes both power normalized cepstral coefficients (PNCCs) and Mel frequency cepstral coefficients (MFCCs). Normalization is applied by employing cepstral mean and variance normalization (CMVN) and feature warping (FW), together with acoustic modeling using a Gaussian mixture model-universal background model (GMM-UBM). The main contributions are comprehensive evaluations of the effect of both additive white Gaussian noise (AWGN) and non-stationary noise (NSN) (with and without a G.712 type handset) upon identification performance. In particular, three NSN types with varying signal-to-noise ratios (SNRs) were tested, corresponding to street traffic, a bus interior, and a crowded talking environment. The performance evaluation also considered the effect of late fusion techniques based on score fusion, namely mean, maximum, and linear weighted sum fusion. The databases employed were TIMIT, SITW, and NIST 2008; 120 speakers were selected from each database to yield 3600 speech utterances. As recommendations from the study, mean fusion is found to yield the best overall performance in terms of speaker identification accuracy (SIA) with noisy speech, whereas linear weighted sum fusion is best overall for the original database recordings.
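
    The three late-fusion rules compared above are simple to state in code. The sketch below assumes the two subsystem score vectors have already been normalized to a common scale (required in practice); the weight w is illustrative, not the paper's value.

    ```python
    # Late score-fusion sketch: mean, maximum, and linear weighted sum of
    # per-speaker scores from two subsystems (e.g., PNCC- and MFCC-based).
    import numpy as np

    def fuse(scores_a, scores_b, rule="mean", w=0.6):
        s1, s2 = np.asarray(scores_a), np.asarray(scores_b)  # one score per speaker
        if rule == "mean":
            return (s1 + s2) / 2
        if rule == "max":
            return np.maximum(s1, s2)
        if rule == "weighted":
            return w * s1 + (1 - w) * s2  # w is a tunable fusion weight
        raise ValueError(f"unknown rule: {rule}")

    # Identified speaker = index of the highest fused score:
    # speaker_index = int(np.argmax(fuse(pncc_scores, mfcc_scores, rule="mean")))
    ```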

  11. Speaker Recognition by Combining MFCC and Phase Information in Noisy Conditions

    NASA Astrophysics Data System (ADS)

    Wang, Longbiao; Minami, Kazue; Yamamoto, Kazumasa; Nakagawa, Seiichi

    In this paper, we investigate the effectiveness of phase information for speaker recognition in noisy conditions and combine the phase information with mel-frequency cepstral coefficients (MFCCs). To date, almost all speaker recognition methods have been based on MFCCs, even in noisy conditions. MFCCs dominantly capture vocal tract information: only the magnitude of the Fourier transform of time-domain speech frames is used, and the phase information is ignored. The phase information is expected to complement MFCCs well because it includes rich voice-source information; furthermore, some studies have reported that phase-based features are robust to noise. In our previous study, we proposed a phase information extraction method that normalizes the variation in phase depending on the clipping position of the input speech, and the performance of the combination of the phase information and MFCCs was remarkably better than that of MFCCs alone. In this paper, we evaluate the robustness of the proposed phase information for speaker identification in noisy conditions. Spectral subtraction, a method that skips frames with low energy/signal-to-noise (SN) ratio, and noisy-speech training models are used to analyze the effect of the phase information and MFCCs in noisy conditions. The NTT database and the JNAS (Japanese Newspaper Article Sentences) database, with added stationary/non-stationary noise, were used to evaluate the proposed method. MFCCs outperformed the phase information for clean speech. On the other hand, the degradation of the phase information was significantly smaller than that of MFCCs for noisy speech; with clean-speech training models, the individual result of the phase information was even better than that of MFCCs in many cases. By deleting unreliable frames (frames having low energy/SN), the speaker identification performance was improved significantly. By integrating the phase information with MFCCs, the speaker identification error reduction rate was about 30%-60% compared with the standard MFCC-based method.
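
    The frame-dropping step mentioned above (deleting unreliable low-energy frames) can be sketched as follows; the energy threshold as a fraction of the median is an assumed heuristic, not the paper's exact criterion.

    ```python
    # Sketch of reliable-frame selection: discard frames whose short-time
    # energy falls below a fraction of the utterance's median frame energy,
    # so noise-dominated frames do not enter scoring.
    import numpy as np

    def drop_low_energy_frames(frames, ratio=0.1):
        """frames: (n_frames, frame_len) array of windowed speech samples."""
        energy = np.sum(frames ** 2, axis=1)          # per-frame energy
        keep = energy > ratio * np.median(energy)     # ratio is a tuning knob
        return frames[keep]
    ```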

  12. Speaker identification for the improvement of the security communication between law enforcement units

    NASA Astrophysics Data System (ADS)

    Tovarek, Jaromir; Partila, Pavol

    2017-05-01

    This article discusses speaker identification for the improvement of secure communication between law enforcement units. The main task of this research was to develop a text-independent speaker identification system that can be used for real-time recognition. The system is designed for identification in the open set, meaning the unknown speaker can be anyone. Communication itself is secured, but the authorization of the communicating parties must be checked: we have to decide whether the unknown speaker is authorized for the given action. The calls are recorded by an IP telephony server, and these recordings are then evaluated by the classifier. If the system determines that the speaker is not authorized, it sends a warning message to the administrator. This message can indicate, for example, a stolen phone or another unusual situation; the administrator then performs the appropriate actions. Our proposed system uses a multilayer neural network for classification, consisting of three layers (input layer, hidden layer, and output layer). The number of neurons in the input layer corresponds to the length of the speech feature vector, and the output layer represents the classified speakers. The artificial neural network classifies the speech signal frame by frame, but the final decision is made over the complete recording; this rule substantially increases the accuracy of the classification. The input data for the neural network are thirteen Mel-frequency cepstral coefficients (MFCCs), which describe the behavior of the vocal tract and are the most widely used parameters for speaker recognition. Parameters for training, testing, and validation were extracted from recordings of authorized users. Recording conditions for the training data correspond to the real traffic of the system (sampling frequency, bit rate). The main benefit of the research is a system for text-independent speaker identification applied to secure communication between law enforcement units.

  13. Optimization of multilayer neural network parameters for speaker recognition

    NASA Astrophysics Data System (ADS)

    Tovarek, Jaromir; Partila, Pavol; Rozhon, Jan; Voznak, Miroslav; Skapa, Jan; Uhrin, Dominik; Chmelikova, Zdenka

    2016-05-01

    This article discusses the impact of multilayer neural network parameters on speaker identification. The main task of speaker identification is to find a specific person in a known set of speakers, i.e., to match the voice of an unknown speaker (wanted person) against a group of reference speakers from the voice database. One requirement was to develop a text-independent system, meaning the wanted person is classified regardless of content and language. A multilayer neural network was used for speaker identification in this research. An artificial neural network (ANN) requires setting parameters such as the activation function of the neurons, the steepness of the activation functions, the learning rate, the maximum number of iterations, and the number of neurons in the hidden and output layers. ANN accuracy and validation time are directly influenced by these parameter settings, and different tasks require different settings. Identification accuracy and ANN validation time were evaluated with the same input data but different parameter settings; the goal was to find the parameters giving the neural network the highest precision and shortest validation time. The input data of the neural network are Mel-frequency cepstral coefficients (MFCCs), which describe the properties of the vocal tract. Audio samples were recorded for all speakers in a laboratory environment. The data were split into training, testing, and validation sets of 70%, 15%, and 15%, respectively. The result of the research described in this article is a parameter setting of the multilayer neural network for four speakers.
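
    A minimal sketch of the frame-level classification with a whole-recording decision described in these two entries, using scikit-learn's MLP as an assumed stand-in for the authors' network:

    ```python
    # Frame-level MLP speaker classification with a majority vote over the
    # whole recording, as described above.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)

    def train(frame_features, frame_labels):
        """frame_features: (n_frames, 13) MFCC vectors; frame_labels: speaker ids."""
        clf.fit(frame_features, frame_labels)

    def identify_utterance(utterance_frames):
        per_frame = clf.predict(utterance_frames)    # one label per frame
        labels, counts = np.unique(per_frame, return_counts=True)
        return labels[np.argmax(counts)]             # majority vote over the record
    ```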

  14. INTERPOL survey of the use of speaker identification by law enforcement agencies.

    PubMed

    Morrison, Geoffrey Stewart; Sahito, Farhan Hyder; Jardine, Gaëlle; Djokic, Djordje; Clavet, Sophie; Berghs, Sabine; Goemans Dorny, Caroline

    2016-06-01

    A survey was conducted of the use of speaker identification by law enforcement agencies around the world. A questionnaire was circulated to law enforcement agencies in the 190 member countries of INTERPOL; 91 responses were received from 69 countries. 44 respondents reported that they had speaker identification capabilities in-house or via external laboratories, half of them from Europe. 28 respondents reported that they had databases of audio recordings of speakers. The clearest pattern in the responses was that of diversity. A variety of different approaches to speaker identification were used: the human-supervised-automatic approach was the most popular in North America, the auditory-acoustic-phonetic approach was the most popular in Europe, and the spectrographic/auditory-spectrographic approach was the most popular in Africa, Asia, the Middle East, and South and Central America. Globally, and in Europe, the most popular framework for reporting conclusions was identification/exclusion/inconclusive; in Europe, the second most popular framework was the use of verbal likelihood ratio scales.

  15. Perceptual Detection of Subtle Dysphonic Traits in Individuals with Cervical Spinal Cord Injury Using an Audience Response Systems Approach.

    PubMed

    Johansson, Kerstin; Strömbergsson, Sofia; Robieux, Camille; McAllister, Anita

    2017-01-01

    Reduced respiratory function following lower cervical spinal cord injuries (CSCIs) may indirectly result in vocal dysfunction. Although self-reports indicate voice change and limitations following CSCI, earlier efforts using global perceptual ratings to distinguish speakers with CSCI from noninjured speakers have not been very successful. We investigate the use of an audience response system-based approach to distinguish speakers with CSCI from noninjured speakers, and explore whether specific vocal traits can be identified as characteristic of speakers with CSCI. Fourteen speech-language pathologists participated in a web-based perceptual task in which their overt reactions to vocal dysfunction were registered during continuous playback of recordings of 36 speakers (18 with CSCI and 18 matched controls). Dysphonic events were identified through manual perceptual analysis to allow the exploration of connections between dysphonic events and listener reactions. More dysphonic events, and more listener reactions, were registered for speakers with CSCI than for noninjured speakers. Strain (particularly in phrase-final position) and creak (particularly in non-phrase-final position) distinguish speakers with CSCI from noninjured speakers. For the identification of intermittent and subtle signs of vocal dysfunction, an approach in which the temporal distribution of symptoms is registered offers a viable means of distinguishing speakers affected by voice dysfunction from non-affected speakers. In speakers with CSCI, clinicians should listen for the presence of final strain and nonfinal creak, and pay attention to self-reported voice function and voice problems, to identify individuals in need of clinical assessment and intervention.

  16. Recognition of speaker-dependent continuous speech with KEAL

    NASA Astrophysics Data System (ADS)

    Mercier, G.; Bigorgne, D.; Miclet, L.; Le Guennec, L.; Querre, M.

    1989-04-01

    A description of the speaker-dependent continuous speech recognition system KEAL is given. An unknown utterance is recognized by means of the following procedures: acoustic analysis, phonetic segmentation and identification, and word and sentence analysis. The combination of feature-based, speaker-independent coarse phonetic segmentation with speaker-dependent statistical classification techniques is one of the main design features of the acoustic-phonetic decoder. The lexical access component is essentially based on a statistical dynamic programming technique, which aims at matching a phonemic lexical entry containing various phonological forms against a phonetic lattice. Sentence recognition is achieved by use of a context-free grammar and a parsing algorithm derived from Earley's parser. A speaker adaptation module allows some of the system parameters to be adjusted by matching known utterances with their acoustical representation. The task to be performed, described by its vocabulary and its grammar, is given as a parameter of the system. Continuously spoken sentences extracted from a 'pseudo-Logo' language are analyzed and results are presented.

  17. "Who" is saying "what"? Brain-based decoding of human voice and speech.

    PubMed

    Formisano, Elia; De Martino, Federico; Bonte, Milene; Goebel, Rainer

    2008-11-07

    Can we decipher speech content ("what" is being said) and speaker identity ("who" is saying it) from observations of brain activity of a listener? Here, we combine functional magnetic resonance imaging with a data-mining algorithm and retrieve what and whom a person is listening to from the neural fingerprints that speech and voice signals elicit in the listener's auditory cortex. These cortical fingerprints are spatially distributed and insensitive to acoustic variations of the input so as to permit the brain-based recognition of learned speech from unknown speakers and of learned voices from previously unheard utterances. Our findings unravel the detailed cortical layout and computational properties of the neural populations at the basis of human speech recognition and speaker identification.

  18. Cost-sensitive learning for emotion robust speaker recognition.

    PubMed

    Li, Dongdong; Yang, Yingchun; Dai, Weihui

    2014-01-01

    In the field of information security, voice is one of the most important biometric modalities. In particular, with the growth of voice communication over the Internet and telephone systems, huge voice data resources have become accessible. In speaker recognition, the voiceprint can be applied as a unique password for a user to prove his or her identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper deals with this problem by introducing a cost-sensitive learning technique to reweight the probability of affective test utterances at the pitch envelope level, which effectively enhances robustness in emotion-dependent speaker recognition. Based on this technique, a new recognition system architecture and its components are proposed. An experiment conducted on the Mandarin Affective Speech Corpus shows an 8% improvement in identification rate over traditional speaker recognition.

  19. Cost-Sensitive Learning for Emotion Robust Speaker Recognition

    PubMed Central

    Li, Dongdong; Yang, Yingchun

    2014-01-01

    In the field of information security, voice is one of the most important biometric modalities. In particular, with the growth of voice communication over the Internet and telephone systems, huge voice data resources have become accessible. In speaker recognition, the voiceprint can be applied as a unique password for a user to prove his or her identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper deals with this problem by introducing a cost-sensitive learning technique to reweight the probability of affective test utterances at the pitch envelope level, which effectively enhances robustness in emotion-dependent speaker recognition. Based on this technique, a new recognition system architecture and its components are proposed. An experiment conducted on the Mandarin Affective Speech Corpus shows an 8% improvement in identification rate over traditional speaker recognition. PMID:24999492

  20. Noise Reduction with Microphone Arrays for Speaker Identification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cohen, Z

    Reducing acoustic noise in audio recordings is an ongoing problem that plagues many applications. This noise is hard to reduce because of interfering sources and the non-stationary behavior of the overall background noise. Many single-channel noise reduction algorithms exist, but they are limited in that the more the noise is reduced, the more the signal of interest is distorted, because the signal and noise overlap in frequency. Acoustic background noise is a particular problem for speaker identification: recording a speaker in the presence of acoustic noise ultimately limits the performance and confidence of speaker identification algorithms. In situations where it is impossible to control the environment where the speech sample is taken, noise reduction filtering algorithms need to be developed to clean the recorded speech of background noise. Because single-channel noise reduction algorithms would distort the speech signal, the overall challenge of this project was to see whether spatial information provided by microphone arrays could be exploited to aid speaker identification. The goals are: (1) test the feasibility of using microphone arrays to reduce background noise in speech recordings; (2) characterize and compare different multichannel noise reduction algorithms; (3) provide recommendations for using these multichannel algorithms; and (4) ultimately answer the question: can the use of microphone arrays aid speaker identification?
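
    As a concrete example of the multichannel approach under study, here is the simplest beamformer, delay-and-sum; it is a representative baseline, not necessarily one of the algorithms the report compares.

    ```python
    # Delay-and-sum beamforming sketch: time-align each microphone toward
    # the speaker and average, reinforcing speech while averaging down
    # uncorrelated background noise.
    import numpy as np

    def delay_and_sum(mic_signals, delays_samples):
        """mic_signals: list of equal-length 1-D arrays; delays_samples: one
        integer delay per microphone, from the estimated speaker direction."""
        # np.roll wraps around at the edges; acceptable for a sketch, a real
        # implementation would zero-pad instead.
        aligned = [np.roll(x, -int(d)) for x, d in zip(mic_signals, delays_samples)]
        return np.mean(aligned, axis=0)
    ```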

  1. Using Avatars for Improving Speaker Identification in Captioning

    NASA Astrophysics Data System (ADS)

    Vy, Quoc V.; Fels, Deborah I.

    Captioning is the main method of accessing television and film content for people who are deaf or hard-of-hearing. One major difficulty consistently identified by the community is knowing who is speaking, particularly for an off-screen narrator. A captioning system was created using a participatory design method to improve speaker identification. The final prototype contained avatars and a coloured border for identifying specific speakers. Evaluation results were very positive; however, participants also wanted to customize various components, such as caption and avatar location.

  2. The impact of compression of speech signal, background noise and acoustic disturbances on the effectiveness of speaker identification

    NASA Astrophysics Data System (ADS)

    Kamiński, K.; Dobrowolski, A. P.

    2017-04-01

    The paper presents the architecture and the results of optimization of selected elements of an Automatic Speaker Recognition (ASR) system that uses Gaussian mixture models (GMM) in the classification process. Optimization covered the selection of individual features, using a genetic algorithm, and the parameters of the Gaussian distributions used to describe individual voices. The developed system was tested to evaluate the impact on speaker identification effectiveness of different compression methods used in, among others, landline, mobile, and VoIP telephony. Results were also presented for the effectiveness of speaker identification at specific levels of noise in the speech signal and in the presence of other disturbances that can occur during phone calls, making it possible to specify the range of applications of the presented ASR system.

  3. Intonation contrast in Cantonese speakers with hypokinetic dysarthria associated with Parkinson's disease.

    PubMed

    Ma, Joan K-Y; Whitehill, Tara L; So, Susanne Y-S

    2010-08-01

    Speech produced by individuals with hypokinetic dysarthria associated with Parkinson's disease (PD) is characterized by a number of features including impaired speech prosody. The purpose of this study was to investigate intonation contrasts produced by this group of speakers. Speech materials with a question-statement contrast were collected from 14 Cantonese speakers with PD. Twenty listeners then classified the productions as either questions or statements. Acoustic analyses of F0, duration, and intensity were conducted to determine which acoustic cues distinguished the production of questions from statements, and which cues appeared to be exploited by listeners in identifying intonational contrasts. The results show that listeners identified statements with a high degree of accuracy, but the accuracy of question identification ranged from 0.56% to 96% across the 14 speakers. The speakers with PD used similar acoustic cues as nondysarthric Cantonese speakers to mark the question-statement contrast, although the contrasts were not observed in all speakers. Listeners mainly used F0 cues at the final syllable for intonation identification. These data contribute to the researchers' understanding of intonation marking in speakers with PD, with specific application to the production and perception of intonation in a lexical tone language.

  4. A language-familiarity effect for speaker discrimination without comprehension.

    PubMed

    Fleming, David; Giordano, Bruno L; Caldara, Roberto; Belin, Pascal

    2014-09-23

    The influence of language familiarity upon speaker identification is well established, to such an extent that it has been argued that "Human voice recognition depends on language ability" [Perrachione TK, Del Tufo SN, Gabrieli JDE (2011) Science 333(6042):595]. However, 7-mo-old infants discriminate speakers of their mother tongue better than they do foreign speakers [Johnson EK, Westrek E, Nazzi T, Cutler A (2011) Dev Sci 14(5):1002-1011] despite their limited speech comprehension abilities, suggesting that speaker discrimination may rely on familiarity with the sound structure of one's native language rather than the ability to comprehend speech. To test this hypothesis, we asked Chinese and English adult participants to rate speaker dissimilarity in pairs of sentences in English or Mandarin that were first time-reversed to render them unintelligible. Even in these conditions a language-familiarity effect was observed: Both Chinese and English listeners rated pairs of native-language speakers as more dissimilar than foreign-language speakers, despite their inability to understand the material. Our data indicate that the language familiarity effect is not based on comprehension but rather on familiarity with the phonology of one's native language. This effect may stem from a mechanism analogous to the "other-race" effect in face recognition.

  5. The Role of Speaker Identification in Korean University Students' Attitudes towards Five Varieties of English

    ERIC Educational Resources Information Center

    Yook, Cheongmin; Lindemann, Stephanie

    2013-01-01

    This study investigates how the attitudes of 60 Korean university students towards five varieties of English are affected by the identification of the speaker's nationality and ethnicity. The study employed both a verbal guise technique and questions eliciting overt beliefs and preferences related to learning English. While the majority of the…

  6. Speaker gender identification based on majority vote classifiers

    NASA Astrophysics Data System (ADS)

    Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri

    2017-03-01

    Speaker gender identification is considered among the most important tools in several multimedia applications, namely automatic speech recognition, interactive voice response systems, and audio browsing systems. The performance of gender identification systems is closely linked to the selected feature set and the employed classification model. Typical techniques are based on selecting the best-performing classification method or on searching for the optimum tuning of one classifier's parameters through experimentation. In this paper, we consider a relevant and rich set of features involving pitch and MFCCs, as well as other temporal and frequency-domain descriptors. Five classification models were evaluated: decision tree, discriminant analysis, naïve Bayes, support vector machine, and k-nearest neighbor. The three best-performing of the five classifiers then contribute by majority voting between their scores. Experiments were performed on three datasets spoken in three languages (English, German, and Arabic) in order to validate the language independence of the proposed scheme. Results confirm that the presented system reaches a satisfying accuracy rate and promising classification performance, thanks to the discriminating abilities and diversity of the features used, combined with mid-level statistics.
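
    The majority-voting scheme can be sketched with scikit-learn's VotingClassifier; the trio shown here (tree, naïve Bayes, SVM) is illustrative, since the paper selects whichever three of the five models perform best on the data at hand.

    ```python
    # Majority-vote gender identification sketch over three of the five
    # classifier families evaluated in the paper.
    from sklearn.ensemble import VotingClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    gender_voter = VotingClassifier(
        estimators=[("tree", DecisionTreeClassifier()),
                    ("nb", GaussianNB()),
                    ("svm", SVC())],
        voting="hard")  # hard = majority vote on predicted labels

    # gender_voter.fit(features, labels)       # features: pitch + MFCC statistics
    # predicted = gender_voter.predict(test_features)
    ```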

  7. Sensing of Particular Speakers for the Construction of Voice Interface Utilized in Noisy Environment

    NASA Astrophysics Data System (ADS)

    Sawada, Hideyuki; Ohkado, Minoru

    Humans are able to exchange information smoothly by voice in difficult situations, such as a noisy, crowded environment or in the presence of multiple speakers. We can detect the position of a sound source in 3D space, extract a particular sound from a mixture of sounds, and recognize who is talking. Realizing this mechanism with a computer would enable new applications: recording sound with high quality by reducing noise, presenting a clarified sound, and microphone-free speech recognition by extracting a particular sound. This paper introduces real-time detection and identification of a particular speaker in a noisy environment using a microphone array, based on the speaker's location and individual voice characteristics. The study is applied to the development of an adaptive auditory system for a mobile robot that collaborates with a factory worker.

  8. Towards the identification of Idiopathic Parkinson’s Disease from the speech. New articulatory kinetic biomarkers

    PubMed Central

    Shattuck-Hufnagel, S.; Choi, J. Y.; Moro-Velázquez, L.; Gómez-García, J. A.

    2017-01-01

    Although a large number of acoustic indicators have already been proposed in the literature to evaluate the hypokinetic dysarthria of people with Parkinson's Disease, the goal of this work is to identify and interpret new, reliable, and complementary articulatory biomarkers that could be applied to predict and evaluate Parkinson's Disease from a diadochokinetic test, contributing to the possibility of a further multidimensional analysis of the speech of parkinsonian patients. The new biomarkers proposed are based on the kinetic behaviour of the envelope trace, which is directly linked to the articulatory dysfunctions introduced by the disease from its early stages. The interest of these new articulatory indicators lies in their ease of identification and interpretation, and in their potential to be translated into computer-based automatic methods for screening the disease from speech. Throughout this paper, the accuracy provided by these acoustic kinetic biomarkers is compared with that obtained by a baseline system based on speaker identification techniques. Results show accuracies around 85%, in line with those obtained with complex state-of-the-art speaker recognition techniques, but with an easier physical interpretation, which opens the possibility of transfer to a clinical setting. PMID:29240814

  9. Effects of various electrode configurations on music perception, intonation and speaker gender identification.

    PubMed

    Landwehr, Markus; Fürstenberg, Dirk; Walger, Martin; von Wedel, Hasso; Meister, Hartmut

    2014-01-01

    Advances in speech coding strategies and electrode array designs for cochlear implants (CIs) predominantly aim at improving speech perception. Current efforts are also directed at transmitting appropriate cues of the fundamental frequency (F0) to the auditory nerve with respect to speech quality, prosody, and music perception. The aim of this study was to examine the effects of various electrode configurations and coding strategies on speech intonation identification, speaker gender identification, and music quality rating. In six MED-EL CI users, electrodes were selectively deactivated in order to simulate different insertion depths and inter-electrode distances when using the high definition continuous interleaved sampling (HDCIS) and fine structure processing (FSP) speech coding strategies. Identification of intonation and speaker gender was determined, and music quality rating was assessed. For intonation identification, HDCIS was robust against the different electrode configurations, whereas fine structure processing showed significantly worse results when a shallow insertion depth was simulated. In contrast, speaker gender recognition was not affected by electrode configuration or speech coding strategy. Music quality rating was sensitive to electrode configuration. In conclusion, the three experiments revealed different outcomes, even though they all addressed the reception of F0 cues. Rapid changes in F0, as seen with intonation, were the most sensitive to electrode configurations and coding strategies. In contrast, electrode configurations and coding strategies did not show large effects when F0 information was available over a longer time period, as seen with speaker gender. Music quality relies on additional spectral cues beyond F0, and was poorest when a shallow insertion was simulated.

  10. Shibboleth: An Automated Foreign Accent Identification Program

    ERIC Educational Resources Information Center

    Frost, Wende

    2013-01-01

    The speech of non-native (L2) speakers of a language contains phonological rules that differentiate them from native speakers. These phonological rules characterize or distinguish accents in an L2. The Shibboleth program creates combinatorial rule-sets to describe the phonological pattern of these accents and classifies L2 speakers into their…

  11. An Acoustic and Social Dialect Analysis of Perceptual Variables in Listener Identification and Rating of Negro Speakers. Final Report.

    ERIC Educational Resources Information Center

    Bryden, James D.

    The purpose of this study was to specify variables which function significantly in the racial identification and speech quality rating of Negro and white speakers by Negro and white listeners. Ninety-one adults served as subjects for the speech task; 86 of these subjects, 43 Negro and 43 white, provided the listener responses. Subjects were chosen…

  12. Application of the wavelet transform for speech processing

    NASA Technical Reports Server (NTRS)

    Maes, Stephane

    1994-01-01

    Speaker identification and word spotting will shortly play a key role in space applications. An approach based on the wavelet transform is presented that, in the context of the 'modulation model,' enables extraction of speech features which are used as input for the classification process.

  13. Cross-language identification of long-term average speech spectra in Korean and English: toward a better understanding of the quantitative difference between two languages.

    PubMed

    Noh, Heil; Lee, Dong-Hee

    2012-01-01

    To identify the quantitative differences between Korean and English in long-term average speech spectra (LTASS), twenty Korean speakers, who lived in the capital of Korea and spoke standard Korean as their first language, were compared with 20 native English speakers. For the Korean speakers, a passage from a novel and a passage from a leading newspaper article were chosen; for the English speakers, the Rainbow Passage was used. The speech was digitally recorded using a GenRad 1982 precision sound level meter and GoldWave® software, and analyzed with a MATLAB program. There was no significant difference in the LTASS between the Korean subjects reading a news article and those reading a novel. For male subjects, the LTASS of Korean speakers was significantly lower than that of English speakers above 1.6 kHz, except at 4 kHz, and the difference was more than 5 dB, especially at higher frequencies. For women, the LTASS of Korean speakers showed significantly lower levels at 0.2, 0.5, 1, 1.25, 2, 2.5, 6.3, 8, and 10 kHz, but the differences were less than 5 dB. Compared with English speakers, the LTASS of Korean speakers showed significantly lower levels at frequencies above 2 kHz, except at 4 kHz; the difference was less than 5 dB between 2 and 5 kHz but more than 5 dB above 6 kHz. To adjust the formula for fitting hearing aids for Koreans, our results based on the LTASS analysis suggest that the gain needs to be raised in high-frequency regions.
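
    An LTASS is essentially a power spectral density averaged over a long stretch of connected speech. A minimal sketch using SciPy (an assumed tool; the study used its own MATLAB program):

    ```python
    # Long-term average speech spectrum (LTASS) sketch: average the power
    # spectral density over an entire read passage and express it in dB,
    # so levels can be compared per frequency band across speaker groups.
    import numpy as np
    from scipy.signal import welch

    def ltass_db(speech, sr, nperseg=4096):
        freqs, psd = welch(speech, fs=sr, nperseg=nperseg)  # Welch-averaged PSD
        return freqs, 10 * np.log10(psd + 1e-12)            # dB, guarded against log(0)
    ```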

  14. High stimulus variability in nonnative speech learning supports formation of abstract categories: evidence from Japanese geminates.

    PubMed

    Sadakata, Makiko; McQueen, James M

    2013-08-01

    This study reports effects of a high-variability training procedure on nonnative learning of a Japanese geminate-singleton fricative contrast. Thirty native speakers of Dutch took part in a 5-day training procedure in which they identified geminate and singleton variants of the Japanese fricative /s/. Participants were trained with either many repetitions of a limited set of words recorded by a single speaker (low-variability training) or with fewer repetitions of a more variable set of words recorded by multiple speakers (high-variability training). Both types of training enhanced identification of speech but not of nonspeech materials, indicating that learning was domain specific. High-variability training led to superior performance in identification but not in discrimination tests, and supported better generalization of learning as shown by transfer from the trained fricatives to the identification of untrained stops and affricates. Variability thus helps nonnative listeners to form abstract categories rather than to enhance early acoustic analysis.

  15. Gender identification from high-pass filtered vowel segments: the use of high-frequency energy.

    PubMed

    Donai, Jeremy J; Lass, Norman J

    2015-10-01

    The purpose of this study was to examine the use of high-frequency information for making gender identity judgments from high-pass filtered vowel segments produced by adult speakers. Specifically, the effect of removing lower-frequency spectral detail (i.e., F3 and below) from vowel segments via high-pass filtering was evaluated. Thirty listeners (ages 18-35) with normal hearing participated in the experiment. A within-subjects design was used to measure gender identification for six 250-ms vowel segments (/æ/, /ɪ /, /ɝ/, /ʌ/, /ɔ/, and /u/), produced by ten male and ten female speakers. The results of this experiment demonstrated that despite the removal of low-frequency spectral detail, the listeners were accurate in identifying speaker gender from the vowel segments, and did so with performance significantly above chance. The removal of low-frequency spectral detail reduced gender identification by approximately 16% relative to unfiltered vowel segments. Classification results using linear discriminant function analyses followed the perceptual data, using spectral and temporal representations derived from the high-pass filtered segments. Cumulatively, these findings indicate that normal-hearing listeners are able to make accurate perceptual judgments regarding speaker gender from vowel segments with low-frequency spectral detail removed via high-pass filtering. Therefore, it is reasonable to suggest the presence of perceptual cues related to gender identity in the high-frequency region of naturally produced vowel signals. Implications of these findings and possible mechanisms for performing the gender identification task from high-pass filtered stimuli are discussed.
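
    The stimulus manipulation (removing spectral detail at and below F3 via high-pass filtering) can be sketched as follows; the 3.5 kHz cutoff and filter order are assumptions for illustration, not the study's exact values.

    ```python
    # High-pass filtering sketch: keep only high-frequency energy in a vowel
    # segment by removing spectral detail at and below the F3 region.
    from scipy.signal import butter, sosfiltfilt

    def highpass_vowel(segment, sr, cutoff_hz=3500, order=8):
        sos = butter(order, cutoff_hz, btype="highpass", fs=sr, output="sos")
        return sosfiltfilt(sos, segment)  # zero-phase, so vowel timing is preserved
    ```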

  16. Greek perception and production of an English vowel contrast: A preliminary study

    NASA Astrophysics Data System (ADS)

    Podlipský, Václav J.

    2005-04-01

    This study focused on language-independent principles functioning in the acquisition of second language (L2) contrasts. Specifically, it tested Bohn's Desensitization Hypothesis [in Speech perception and linguistic experience: Issues in Cross Language Research, edited by W. Strange (York Press, Baltimore, 1995)], which predicted that Greek speakers of English as an L2 would base their perceptual identification of English /i/ and /I/ on durational differences. Synthetic vowels differing orthogonally in duration and spectrum between the /i/ and /I/ endpoints served as stimuli for a forced-choice identification test. To assess L2 proficiency and to evaluate the possibility of cross-language category assimilation, productions of English /i/, /I/, and /ɛ/ and of Greek /i/ and /e/ were elicited and analyzed acoustically. The L2 utterances were also rated for degree of foreign accent. Two native speakers of Modern Greek with low and two with intermediate experience in English participated. Six native English (NE) listeners and six NE speakers tested in an earlier study constituted the control groups. Heterogeneous perceptual behavior was observed for the L2 subjects. It is concluded that until acquisition in completely naturalistic settings is tested, possible interference of formally induced meta-linguistic differentiation between a "short" and a "long" vowel cannot be eliminated.

  17. Robust speaker's location detection in a vehicle environment using GMM models.

    PubMed

    Hu, Jwu-Sheng; Cheng, Chieh-Cheng; Liu, Wei-Han

    2006-04-01

    Human-computer interaction (HCI) using speech communication is becoming increasingly important, especially in driving, where safety is the primary concern. Knowing the speaker's location (i.e., speaker localization) not only improves the enhancement of a corrupted signal, but also assists speaker identification. Since conventional speech localization algorithms suffer from the uncertainties of environmental complexity and noise, as well as from the microphone mismatch problem, they are frequently not robust in practice, and without high reliability the acceptance of speech-based HCI will never be realized. This work presents a novel speaker-location detection method and demonstrates high accuracy within a vehicle cabin using a single linear microphone array. The proposed approach utilizes Gaussian mixture models (GMM) to model the distributions of the phase differences among the microphones caused by the complex characteristics of room acoustics and microphone mismatch. The model can be applied in both near-field and far-field situations in a noisy environment. Each Gaussian component of a GMM represents a general location-dependent, but content- and speaker-independent, phase difference distribution. Moreover, the scheme performs well not only in non-line-of-sight cases, but also when the speakers are aligned toward the microphone array at different distances from it. This strong performance is achieved by exploiting the fact that the phase difference distributions at different locations are distinguishable in the environment of a car. The experimental results also show that the proposed method outperforms the conventional multiple signal classification (MUSIC) technique at various SNRs.
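
    A minimal sketch of the core idea, modeling inter-microphone phase differences with one GMM per candidate location (scikit-learn assumed; the feature details here are simplified relative to the paper):

    ```python
    # Location detection sketch: model the distribution of inter-microphone
    # phase differences with one GMM per candidate location, then pick the
    # location whose model best explains a test utterance.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def phase_differences(stft_ref, stft_mic):
        """Per-frame phase differences between two microphones' STFTs
        (arrays of shape (freq_bins, frames)); returns (frames, freq_bins)."""
        return np.angle(stft_mic * np.conj(stft_ref)).T

    def train_location_models(features_by_location, n_components=8):
        return {loc: GaussianMixture(n_components=n_components).fit(feats)
                for loc, feats in features_by_location.items()}

    def detect_location(location_gmms, feats):
        return max(location_gmms, key=lambda loc: location_gmms[loc].score(feats))
    ```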

  18. Robust Recognition of Loud and Lombard speech in the Fighter Cockpit Environment

    DTIC Science & Technology

    1988-08-01

    …the latter as inter-speaker variability. According to Zue [Z85], inter-speaker variabilities can be attributed to sociolinguistic background, dialect… [OCR fragment; the remainder is reference-list residue, including: Journal of the Acoustical Society of America, Vol. 50, 1971; B. S. Atal, "Linear prediction for speaker identification," Journal of the Acoustical Society of America, Vol. 55, 1974; B. Beek, E. P. Neuberg, and D. C. Hodge, "An Assessment of the Technology of Automatic Speech Recognition…"]

  19. Robust Speaker Authentication Based on Combined Speech and Voiceprint Recognition

    NASA Astrophysics Data System (ADS)

    Malcangi, Mario

    2009-08-01

    Personal authentication is becoming increasingly important in many applications that have to protect proprietary data. Passwords and personal identification numbers (PINs) prove not to be robust enough to ensure that unauthorized people do not use them. Biometric authentication technology may offer a secure, convenient, accurate solution but sometimes fails due to its intrinsically fuzzy nature. This research aims to demonstrate that combining two basic speech processing methods, voiceprint identification and speech recognition, can provide a very high degree of robustness, especially if fuzzy decision logic is used.

  20. "Feminism Lite?" Feminist Identification, Speaker Appearance, and Perceptions of Feminist and Antifeminist Messengers

    ERIC Educational Resources Information Center

    Bullock, Heather E.; Fernald, Julian L.

    2003-01-01

    Drawing on a communications model of persuasion (Hovland, Janis, & Kelley, 1953), this study examined the effect of target appearance on feminists' and nonfeminists' perceptions of a speaker delivering a feminist or an antifeminist message. One hundred three college women watched one of four videotaped speeches that varied by content (profeminist…

  1. Accent Identification by Adults with Aphasia

    ERIC Educational Resources Information Center

    Newton, Caroline; Burns, Rebecca; Bruce, Carolyn

    2013-01-01

    The UK is a diverse society where individuals regularly interact with speakers with different accents. Whilst there is a growing body of research on the impact of speaker accent on comprehension in people with aphasia, there is none which explores their ability to identify accents. This study investigated the ability of this group to identify the…

  2. Performance of wavelet analysis and neural networks for pathological voices identification

    NASA Astrophysics Data System (ADS)

    Salhi, Lotfi; Talbi, Mourad; Abid, Sabeur; Cherif, Adnane

    2011-09-01

    Within the medical environment, diverse techniques exist to assess the state of a patient's voice. The inspection technique is inconvenient for a number of reasons, such as its high cost, the duration of the inspection, and, above all, the fact that it is an invasive technique. This study focuses on a robust, rapid, and accurate system for automatic identification of pathological voices. The system employs a non-invasive, inexpensive, and fully automated method based on a hybrid approach: wavelet transform analysis and a neural network classifier. First, we present the results obtained in our previous study using classic feature parameters; these results allow visual identification of pathological voices. Second, quantified parameters derived from the wavelet analysis are proposed to characterise the speech sample. In addition, a multilayer neural network (MNN) system has been developed that carries out automatic detection of pathological voices. The developed method was evaluated using a voice database composed of recorded voice samples (continuous speech) from normophonic and dysphonic speakers; the dysphonic speakers were patients at the RABTA National Hospital in Tunis, Tunisia, and a university hospital in Brussels, Belgium. Experimental results indicate a success rate ranging between 75% and 98.61% for discrimination of normal and pathological voices using the proposed parameters and neural network classifier. We also compared the average classification rate based on the MNN, a Gaussian mixture model, and support vector machines.
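
    A minimal sketch of the hybrid pipeline (wavelet-subband energies fed to a multilayer network); PyWavelets with 'db4' at 5 levels and scikit-learn's MLP are assumed stand-ins, as the paper's exact wavelet and network are not reproduced here.

    ```python
    # Hybrid pathological-voice detection sketch: wavelet-decomposition
    # subband energies as features, classified by a multilayer neural network.
    import numpy as np
    import pywt
    from sklearn.neural_network import MLPClassifier

    def wavelet_energies(signal, wavelet="db4", level=5):
        coeffs = pywt.wavedec(signal, wavelet, level=level)   # [cA5, cD5, ..., cD1]
        return np.array([np.mean(c ** 2) for c in coeffs])    # energy per subband

    clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000)
    # clf.fit(np.vstack([wavelet_energies(v) for v in voices]), labels)
    # labels: 0 = normophonic, 1 = dysphonic
    ```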

  3. Identifying the nonlinear mechanical behaviour of micro-speakers from their quasi-linear electrical response

    NASA Astrophysics Data System (ADS)

    Zilletti, Michele; Marker, Arthur; Elliott, Stephen John; Holland, Keith

    2017-05-01

    In this study, model identification of the nonlinear dynamics of a micro-speaker is carried out using purely electrical measurements, avoiding any explicit vibration measurement. It is shown that a dynamic model of the micro-speaker, which takes into account the nonlinear damping characteristic of the device, can be identified by measuring the response between the voltage input and the current flowing into the coil. An analytical formulation of the quasi-linear model of the micro-speaker is first derived, and an optimisation method is then used to identify a polynomial function that describes the mechanical damping behaviour of the device. The analytical results of the quasi-linear model are compared with numerical results. This study potentially opens up the possibility of efficiently implementing nonlinear echo cancellers.
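
    As a rough illustration of the identification step (not the authors' formulation), one can fit a polynomial damping law to effective-damping estimates obtained at several excitation levels from voltage and current measurements alone; the data below are synthetic:

        import numpy as np

        rng = np.random.default_rng(0)
        velocity = np.linspace(0.0, 0.5, 20)                   # coil velocity, m/s (synthetic)
        damping = 0.8 + 2.5 * velocity**2 + 0.01 * rng.standard_normal(20)  # N*s/m

        # Identify c(v) = c0 + c1*v + c2*v^2 by least squares
        # (np.polyfit returns coefficients highest order first).
        c2, c1, c0 = np.polyfit(velocity, damping, deg=2)
        print(f"c(v) ~ {c0:.3f} + {c1:.3f} v + {c2:.3f} v^2")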

  4. Perception of Melodic Contour and Intonation in Autism Spectrum Disorder: Evidence From Mandarin Speakers.

    PubMed

    Jiang, Jun; Liu, Fang; Wan, Xuan; Jiang, Cunmei

    2015-07-01

    Tone language experience benefits pitch processing in music and speech for typically developing individuals. No known studies have examined pitch processing in individuals with autism who speak a tone language. This study investigated discrimination and identification of melodic contour and speech intonation in a group of Mandarin-speaking individuals with high-functioning autism. Individuals with autism showed superior melodic contour identification but comparable contour discrimination relative to controls. In contrast, these individuals performed worse than controls on both discrimination and identification of speech intonation. These findings provide the first evidence for differential pitch processing in music and speech in tone language speakers with autism, suggesting that tone language experience may not compensate for speech intonation perception deficits in individuals with autism.

  5. Speaker recognition with temporal cues in acoustic and electric hearing

    NASA Astrophysics Data System (ADS)

    Vongphoe, Michael; Zeng, Fan-Gang

    2005-08-01

    Natural spoken language processing includes not only speech recognition but also identification of the speaker's gender, age, emotional, and social status. Our purpose in this study is to evaluate whether temporal cues are sufficient to support both speech and speaker recognition. Ten cochlear-implant and six normal-hearing subjects were presented with vowel tokens spoken by three men, three women, two boys, and two girls. In one condition, the subject was asked to recognize the vowel. In the other condition, the subject was asked to identify the speaker. Extensive training was provided for the speaker recognition task. Normal-hearing subjects achieved nearly perfect performance in both tasks. Cochlear-implant subjects achieved good performance in vowel recognition but poor performance in speaker recognition. The level of the cochlear implant performance was functionally equivalent to normal performance with eight spectral bands for vowel recognition but only to one band for speaker recognition. These results show a disassociation between speech and speaker recognition with primarily temporal cues, highlighting the limitation of current speech processing strategies in cochlear implants. Several methods, including explicit encoding of fundamental frequency and frequency modulation, are proposed to improve speaker recognition for current cochlear implant users.

  6. A study of voice production characteristics of astronaut speech during Apollo 11 for speaker modeling in space.

    PubMed

    Yu, Chengzhu; Hansen, John H L

    2017-03-01

    Human physiology has evolved to accommodate environmental conditions, including temperature, pressure, and air chemistry unique to Earth. However, the environment in space differs significantly from that on Earth and, therefore, variability is expected in astronauts' speech production mechanism. In this study, the variations of astronaut voice characteristics during the NASA Apollo 11 mission are analyzed. Specifically, acoustical features such as fundamental frequency and phoneme formant structure that are closely related to the speech production system are studied. For a further understanding of astronauts' vocal tract spectrum variation in space, a maximum likelihood frequency-warping-based analysis is proposed to detect vocal tract spectrum displacement under space conditions. The results from fundamental frequency, formant structure, and vocal spectrum displacement indicate that astronauts change their speech production mechanism when in space. Moreover, the experimental results for astronaut voice identification tasks indicate that current speaker recognition solutions are highly vulnerable to astronaut voice production variations in space conditions. Future recommendations from this study suggest that successful applications of speaker recognition during extended space missions require robust speaker modeling techniques that can effectively adapt to voice production variation caused by diverse space conditions.
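
    A schematic of a maximum-likelihood warping search in the spirit of this record: grid-search the frequency-warping factor that best aligns in-space spectra with a model trained on pre-flight speech. The simple linear warp and the external scoring function are assumptions; the paper's actual analysis may differ:

        import numpy as np

        def best_warp_factor(frame_spectra, loglik_fn, alphas=np.linspace(0.88, 1.12, 25)):
            """frame_spectra: (n_frames, n_bins) magnitude spectra.
            loglik_fn: log-likelihood of one spectrum under the pre-flight voice model."""
            def warp(spec, a):  # rescale the frequency axis by factor a
                bins = np.arange(len(spec))
                return np.interp(bins, bins * a, spec)
            scores = [sum(loglik_fn(warp(s, a)) for s in frame_spectra) for a in alphas]
            return alphas[int(np.argmax(scores))]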

  7. The non-trusty clown attack on model-based speaker recognition systems

    NASA Astrophysics Data System (ADS)

    Farrokh Baroughi, Alireza; Craver, Scott

    2015-03-01

    Biometric detectors for speaker identification commonly employ a statistical model of a subject's voice, such as a Gaussian mixture model, that combines multiple means to improve detector performance. This allows a malicious insider to amend or append a component of a subject's statistical model so that a detector behaves normally except under a carefully engineered circumstance, letting an attacker force a misclassification of his or her voice only when desired, by smuggling data into a database far in advance of an attack. Note that the attack is possible if the attacker has access to the database, even for a limited time, to modify the victim's model. We exhibit such an attack on a speaker identification system, in which an attacker can force a misclassification by speaking in an unusual voice and replacing the least-weighted component of the victim's model with the most heavily weighted component of the attacker's unusual-voice model. The attacker makes his or her voice unusual during the attack because the attacker's normal voice model may already be in the database; by attacking with an unusual voice, the attacker retains the option of being recognized as himself or herself when talking normally, or as the victim when talking in the unusual manner. By attaching an appropriately weighted vector to a victim's model, we can impersonate all users in our simulations, while avoiding unwanted false rejections.
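
    A toy sketch of the poisoning step described here, with scikit-learn's GaussianMixture standing in for the detector's speaker models (an assumption; the record does not specify an implementation):

        import numpy as np
        from sklearn.mixture import GaussianMixture

        def poison(victim: GaussianMixture, attacker: GaussianMixture):
            """Replace the victim model's least-weighted component with the
            attacker model's most-weighted ("unusual voice") component."""
            i = int(np.argmin(victim.weights_))
            j = int(np.argmax(attacker.weights_))
            victim.means_[i] = attacker.means_[j]
            victim.covariances_[i] = attacker.covariances_[j]
            victim.weights_[i] = attacker.weights_[j]
            victim.weights_ /= victim.weights_.sum()
            # Note: before scoring, sklearn's cached precisions
            # (precisions_cholesky_) would also need to be refreshed to match.
            return victim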

  8. English Language Schooling, Linguistic Realities, and the Native Speaker of English in Hong Kong

    ERIC Educational Resources Information Center

    Hansen Edwards, Jette G.

    2018-01-01

    The study employs a case study approach to examine the impact of educational backgrounds on nine Hong Kong tertiary students' English and Cantonese language practices and identifications as native speakers of English and Cantonese. The study employed both survey and interview data to probe the participants' English and Cantonese language use at…

  9. Priming of Non-Speech Vocalizations in Male Adults: The Influence of the Speaker's Gender

    ERIC Educational Resources Information Center

    Fecteau, Shirley; Armony, Jorge L.; Joanette, Yves; Belin, Pascal

    2004-01-01

    Previous research reported a priming effect for voices. However, the type of information primed is still largely unknown. In this study, we examined the influence of speaker's gender and emotional category of the stimulus on priming of non-speech vocalizations in 10 male participants, who performed a gender identification task. We found a…

  10. Effects of Phonetic Similarity in the Identification of Mandarin Tones

    ERIC Educational Resources Information Center

    Li, Bin; Shao, Jing; Bao, Mingzhen

    2017-01-01

    Tonal languages differ in how they use phonetic correlates, e.g. average pitch height and pitch direction, for tonal contrasts. Thus, native speakers of a tonal language may need to adjust their attention to familiar or unfamiliar phonetic cues when perceiving non-native tones. On the other hand, speakers of a non-tonal language may need to…

  11. Brain Plasticity in Speech Training in Native English Speakers Learning Mandarin Tones

    NASA Astrophysics Data System (ADS)

    Heinzen, Christina Carolyn

    The current study employed behavioral and event-related potential (ERP) measures to investigate brain plasticity associated with second-language (L2) phonetic learning based on an adaptive computer training program. The program utilized the acoustic characteristics of Infant-Directed Speech (IDS) to train monolingual American English-speaking listeners to perceive Mandarin lexical tones. Behavioral identification and discrimination tasks were conducted using naturally recorded speech, carefully controlled synthetic speech, and non-speech control stimuli. The ERP experiments were conducted with selected synthetic speech stimuli in a passive listening oddball paradigm. Identical pre- and post-tests were administered to nine adult listeners, who completed two to three hours of perceptual training. The perceptual training sessions used pair-wise lexical tone identification and progressed through seven levels of difficulty for each tone pair. The levels of difficulty included progression in speaker variability from one to four speakers and progression through four levels of acoustic exaggeration of duration, pitch range, and pitch contour. Behavioral results for the natural speech stimuli revealed significant training-induced improvement in identification of Tones 1, 3, and 4. Improvements in identification of Tone 4 generalized to novel stimuli as well. Additionally, comparison between discrimination of across-category and within-category stimulus pairs taken from a synthetic continuum revealed a training-induced shift toward more native-like categorical perception of the Mandarin lexical tones. Analysis of the Mismatch Negativity (MMN) responses in the ERP data revealed increased amplitude and decreased latency for pre-attentive processing of across-category discrimination as a result of training. There were also laterality changes in the MMN responses to the non-speech control stimuli, which could reflect reallocation of brain resources in processing pitch patterns for the across-category lexical tone contrast. Overall, the results support the use of IDS characteristics in training non-native speech contrasts and provide impetus for further research.

  12. Native Speakers of Arabic and ESL Texts: Evidence for the Transfer of Written Word Identification Processes

    ERIC Educational Resources Information Center

    Hayes-Harb, Rachel

    2006-01-01

    English as a second language (ESL) teachers have long noted that native speakers of Arabic exhibit exceptional difficulty with English reading comprehension (e.g., Thompson-Panos & Thomas-Ruzic, 1983). Most existing work in this area has looked to higher level aspects of reading such as familiarity with discourse structure and cultural knowledge…

  13. The Effect of Scene Variation on the Redundant Use of Color in Definite Reference

    ERIC Educational Resources Information Center

    Koolen, Ruud; Goudbeek, Martijn; Krahmer, Emiel

    2013-01-01

    This study investigates to what extent the amount of variation in a visual scene causes speakers to mention the attribute color in their definite target descriptions, focusing on scenes in which this attribute is not needed for identification of the target. The results of our three experiments show that speakers are more likely to redundantly…

  14. The perception of FM sweeps by Chinese and English listeners.

    PubMed

    Luo, Huan; Boemio, Anthony; Gordon, Michael; Poeppel, David

    2007-02-01

    Frequency-modulated (FM) signals are an integral acoustic component of ecologically natural sounds and are analyzed effectively in the auditory systems of humans and animals. Linearly frequency-modulated tone sweeps were used here to evaluate two questions. First, how rapid a sweep can listeners accurately perceive? Second, is there an effect of native language insofar as the language (phonology) is differentially associated with processing of FM signals? Speakers of English and Mandarin Chinese were tested to evaluate whether being a speaker of a tone language altered the perceptual identification of non-speech tone sweeps. In two psychophysical studies, we demonstrate that Chinese subjects perform better than English subjects in FM direction identification, but not in an FM discrimination task, in which English and Chinese speakers show similar detection thresholds of approximately 20 ms duration. We suggest that the better FM direction identification in Chinese subjects is related to their experience with FM direction analysis in the tone-language environment, even though supra-segmental tonal variation occurs over a longer time scale. Furthermore, the observed common discrimination temporal threshold across two language groups supports the conjecture that processing auditory signals at durations of approximately 20 ms constitutes a fundamental auditory perceptual threshold.

  15. Shhh… I Need Quiet! Children's Understanding of American, British, and Japanese-accented English Speakers.

    PubMed

    Bent, Tessa; Holt, Rachael Frush

    2018-02-01

    Children's ability to understand speakers with a wide range of dialects and accents is essential for efficient language development and communication in a global society. Here, the impact of regional dialect and foreign-accent variability on children's speech understanding was evaluated in both quiet and noisy conditions. Five- to seven-year-old children (n = 90) and adults (n = 96) repeated sentences produced by three speakers with different accents (American English, British English, and Japanese-accented English) in quiet or noisy conditions. Adults had no difficulty understanding any speaker in quiet conditions. Their performance declined for the nonnative speaker with a moderate amount of noise; their performance only substantially declined for the British English speaker (i.e., below 93% correct) when their understanding of the American English speaker was also impeded. In contrast, although children showed accurate word recognition for the American and British English speakers in quiet conditions, they had difficulty understanding the nonnative speaker even under ideal listening conditions. With a moderate amount of noise, their perception of British English speech declined substantially and their ability to understand the nonnative speaker was particularly poor. These results suggest that although school-aged children can understand unfamiliar native dialects under ideal listening conditions, their ability to recognize words in these dialects may be highly susceptible to the influence of environmental degradation. Fully adult-like word identification for speakers with unfamiliar accents and dialects may exhibit a protracted developmental trajectory.

  16. Acoustic and perceptual effects of overall F0 range in a lexical pitch accent distinction

    NASA Astrophysics Data System (ADS)

    Wade, Travis

    2002-05-01

    A speaker's overall fundamental frequency range is generally considered a variable, nonlinguistic element of intonation. This study examined the precision with which overall F0 is predictable based on previous intonational context and the extent to which it may be perceptually significant. Speakers of Tokyo Japanese produced pairs of sentences differing lexically only in the presence or absence of a single pitch accent as responses to visual and prerecorded speech cues presented in an interactive manner. F0 placement of high tones (previously observed to be relatively variable in pitch contours) was found to be consistent across speakers and uniformly dependent on the intonation of the different sentences used as cues. In a subsequent perception experiment, continuous manipulations of these same sentences between typical accented and typical non-accent-containing versions were presented to Japanese listeners for lexical identification. Results showed that listeners' perception was not significantly altered in compensation for artificial manipulation of preceding intonation. Implications are discussed within an autosegmental analysis of tone. The current results are consistent with the notion that pitch range (i.e., specific vertical locations of tonal peaks) does not simply vary gradiently across speakers and situations but constitutes a predictable part of the phonetic specification of tones.

  17. How Captain Amerika uses neural networks to fight crime

    NASA Technical Reports Server (NTRS)

    Rogers, Steven K.; Kabrisky, Matthew; Ruck, Dennis W.; Oxley, Mark E.

    1994-01-01

    Artificial neural network models can make amazing computations. These models are explained along with their application in problems associated with fighting crime. Specific problems addressed are identification of people using face recognition, speaker identification, and fingerprint and handwriting analysis (biometric authentication).

  18. The Effects of the Literal Meaning of Emotional Phrases on the Identification of Vocal Emotions.

    PubMed

    Shigeno, Sumi

    2018-02-01

    This study investigates the discrepancy between the literal emotional content of speech and emotional tone in the identification of speakers' vocal emotions in both the listeners' native language (Japanese), and in an unfamiliar language (random-spliced Japanese). Both experiments involve a "congruent condition," in which the emotion contained in the literal meaning of speech (words and phrases) was compatible with vocal emotion, and an "incongruent condition," in which these forms of emotional information were discordant. Results for Japanese indicated that performance in identifying emotions did not differ significantly between the congruent and incongruent conditions. However, the results for random-spliced Japanese indicated that vocal emotion was correctly identified more often in the congruent than in the incongruent condition. The different results for Japanese and random-spliced Japanese suggested that the literal meaning of emotional phrases influences the listener's perception of the speaker's emotion, and that Japanese participants could infer speakers' intended emotions in the incongruent condition.

  19. Mother and Father Speech: Distribution of Parental Speech Features in English and Spanish. Papers and Reports on Child Language Development, No. 12.

    ERIC Educational Resources Information Center

    Blount, Ben G.; Padgug, Elise J.

    Features of parental speech to young children were studied in four English-speaking and four Spanish-speaking families. Children ranged in age from 9 to 12 months for the English speakers and from 8 to 22 months for the Spanish speakers. Examination of the utterances led to the identification of 34 prosodic, paralinguistic, and interactional…

  20. Age as a Factor in Ethnic Accent Identification in Singapore

    ERIC Educational Resources Information Center

    Tan, Ying Ying

    2012-01-01

    This study seeks to answer two research questions. First, can listeners distinguish the ethnicity of the speakers on the basis of voice quality alone? Second, do demographic differences among the listeners affect discriminability? A simple but carefully designed and controlled ethnic identification test was carried out on 325 Singaporean…

  1. Partially supervised speaker clustering.

    PubMed

    Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S

    2012-05-01

    Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment to the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm—linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance.
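
    The distance measure this paper advocates is easy to state in code: utterances are represented by GMM mean supervectors and compared by cosine rather than Euclidean distance. A minimal sketch (names illustrative):

        import numpy as np

        def mean_supervector(gmm_means):
            """Stack a GMM's K component means of dimension D into one (K*D,) vector."""
            return np.asarray(gmm_means).ravel()

        def cosine_distance(u, v):
            return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))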

  2. Hearing history influences voice gender perceptual performance in cochlear implant users.

    PubMed

    Kovačić, Damir; Balaban, Evan

    2010-12-01

    The study was carried out to assess the role that five hearing history variables (chronological age, age at onset of deafness, age of first cochlear implant [CI] activation, duration of CI use, and duration of known deafness) play in the ability of CI users to identify speaker gender. Forty-one juvenile CI users participated in two voice gender identification tasks. In a fixed, single-interval task, subjects listened to a single speech item from one of 20 adult male or 20 adult female speakers and had to identify speaker gender. In an adaptive speech-based voice gender discrimination task with the fundamental frequency difference between the voices as the adaptive parameter, subjects listened to a pair of speech items presented in sequential order, one of which was always spoken by an adult female and the other by an adult male. Subjects had to identify the speech item spoken by the female voice. Correlation and regression analyses between perceptual scores in the two tasks and the hearing history variables were performed. Subjects fell into three performance groups: (1) those who could distinguish voice gender in both tasks, (2) those who could distinguish voice gender in the adaptive but not the fixed task, and (3) those who could not distinguish voice gender in either task. Gender identification performance for single voices in the fixed task was significantly and negatively related to the duration of deafness before cochlear implantation (shorter deafness yielded better performance), whereas performance in the adaptive task was weakly but significantly related to age at first activation of the CI device, with earlier activations yielding better scores. The existence of a group of subjects able to perform adaptive discrimination but unable to identify the gender of singly presented voices demonstrates the potential dissociability of the skills required for these two tasks, suggesting that duration of deafness and age of cochlear implantation could have dissociable effects on the development of different skills required by CI users to identify speaker gender.

  3. Event identification by acoustic signature recognition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dress, W.B.; Kercel, S.W.

    1995-07-01

    Many events of interest to the security community produce acoustic emissions that are, in principle, identifiable as to cause. Some obvious examples are gunshots, breaking glass, takeoffs and landings of small aircraft, vehicular engine noises, footsteps (high frequencies when on gravel, very low frequencies when on soil), and voices (whispers to shouts). We are investigating wavelet-based methods to extract unique features of such events for classification and identification. We also discuss methods of classification and pattern recognition specifically tailored for acoustic signatures obtained by wavelet analysis. The paper is divided into three parts: completed work, work in progress, and future applications. The completed phase has led to the successful recognition of aircraft types on landing and takeoff. Both small aircraft (twin-engine turboprop) and large (commercial airliners) were included in the study. The project considered the design of a small, field-deployable, inexpensive device. The techniques developed during the aircraft identification phase were then adapted to a multispectral electromagnetic interference monitoring device now deployed in a nuclear power plant. This is a general-purpose wavelet analysis engine, spanning 14 octaves, and can be adapted for other specific tasks. Work in progress is focused on applying the methods previously developed to speaker identification. Some of the problems to be overcome include recognition of sounds as voice patterns and as distinct from possible background noises (e.g., music), as well as identification of the speaker from a short-duration voice sample. A generalization of the completed work and the work in progress is a device capable of classifying any number of acoustic events, particularly quasi-stationary events such as engine noises and voices and singular events such as gunshots and breaking glass. We will show examples of both kinds of events and discuss their recognition likelihood.

  4. Congenital amusia in speakers of a tone language: association with lexical tone agnosia.

    PubMed

    Nan, Yun; Sun, Yanan; Peretz, Isabelle

    2010-09-01

    Congenital amusia is a neurogenetic disorder that affects the processing of musical pitch in speakers of non-tonal languages like English and French. We assessed whether this musical disorder exists among speakers of Mandarin Chinese who use pitch to alter the meaning of words. Using the Montreal Battery of Evaluation of Amusia, we tested 117 healthy young Mandarin speakers with no self-declared musical problems and 22 individuals who reported musical difficulties and scored two standard deviations below the mean obtained by the Mandarin speakers without amusia. These 22 amusic individuals showed a similar pattern of musical impairment as did amusic speakers of non-tonal languages, by exhibiting a more pronounced deficit in melody than in rhythm processing. Furthermore, nearly half the tested amusics had impairments in the discrimination and identification of Mandarin lexical tones. Six showed marked impairments, displaying what could be called lexical tone agnosia, but had normal tone production. Our results show that speakers of tone languages such as Mandarin may experience musical pitch disorder despite early exposure to speech-relevant pitch contrasts. The observed association between the musical disorder and lexical tone difficulty indicates that the pitch disorder as defining congenital amusia is not specific to music or culture but is rather general in nature.

  5. Understanding of emotions and false beliefs among hearing children versus deaf children.

    PubMed

    Ziv, Margalit; Most, Tova; Cohen, Shirit

    2013-04-01

    Emotion understanding and theory of mind (ToM) are two major aspects of social cognition in which deaf children demonstrate developmental delays. The current study investigated these social cognition aspects in two subgroups of deaf children, those with cochlear implants who communicate orally (speakers) and those who communicate primarily using sign language (signers), in comparison to hearing children. Participants were 53 Israeli kindergartners: 20 speakers, 10 signers, and 23 hearing children. Tests included four emotion identification and understanding tasks and one false belief task (ToM). Results revealed similarities among all children's emotion labeling and affective perspective taking abilities, similarities between speakers and hearing children in false beliefs and in understanding emotions in typical contexts, and lower performance of signers on the latter three tasks. Adapting educational experiences to the unique characteristics and needs of speakers and signers is recommended.

  6. Standardization and future directions in pattern identification research: International brainstorming session.

    PubMed

    Jung, Jeeyoun; Park, Bongki; Lee, Ju Ah; You, Sooseong; Alraek, Terje; Bian, Zhao-Xiang; Birch, Stephen; Kim, Tae-Hun; Xu, Hao; Zaslawski, Chris; Kang, Byoung-Kab; Lee, Myeong Soo

    2016-09-01

    An international brainstorming session on standardizing pattern identification (PI) was held at the Korea Institute of Oriental Medicine on October 1, 2013 in Daejeon, South Korea. This brainstorming session was convened to gather insights from international traditional East Asian medicine specialists regarding PI standardization. With eight presentations and discussion sessions, the meeting allowed participants to discuss research methods and diagnostic systems used in traditional medicine for PI. One speaker presented a talk titled "The diagnostic criteria for blood stasis syndrome: implications for standardization of PI". Four speakers presented on future strategies and objective measurement tools that could be used in PI research. Later, participants shared information and methodology for accurate diagnosis and PI. They also discussed the necessity for standardizing PI and methods for international collaborations in pattern research.

  7. Perception of musical and lexical tones by Taiwanese-speaking musicians.

    PubMed

    Lee, Chao-Yang; Lee, Yuh-Fang; Shr, Chia-Lin

    2011-07-01

    This study explored the relationship between music and speech by examining absolute pitch and lexical tone perception. Taiwanese-speaking musicians were asked to identify musical tones without a reference pitch and multispeaker Taiwanese level tones without acoustic cues typically present for speaker normalization. The results showed that a high percentage of the participants (65% with an exact match required and 81% with one-semitone errors allowed) possessed absolute pitch, as measured by the musical tone identification task. A negative correlation was found between occurrence of absolute pitch and age of onset of musical training, suggesting that the acquisition of absolute pitch resembles the acquisition of speech. The participants were able to identify multispeaker Taiwanese level tones with above-chance accuracy, even though the acoustic cues typically present for speaker normalization were not available in the stimuli. No correlations were found between the performance in musical tone identification and the performance in Taiwanese tone identification. Potential reasons for the lack of association between the two tasks are discussed. © 2011 Acoustical Society of America

  8. Experimental study on GMM-based speaker recognition

    NASA Astrophysics Data System (ADS)

    Ye, Wenxing; Wu, Dapeng; Nucci, Antonio

    2010-04-01

    Speaker recognition plays a very important role in the field of biometric security. In order to improve recognition performance, many pattern recognition techniques have been explored in the literature. Among these techniques, the Gaussian Mixture Model (GMM) has proved to be an effective statistical model for speaker recognition and is used in most state-of-the-art speaker recognition systems. The GMM represents the 'voice print' of a speaker by modeling the spectral characteristics of the speaker's speech signals. In this paper, we implement a speaker recognition system, which consists of preprocessing, Mel-Frequency Cepstral Coefficient (MFCC)-based feature extraction, and GMM-based classification. We test our system with the TIDIGITS data set (325 speakers) and our own recordings of more than 200 speakers; our system achieves a 100% correct recognition rate. Moreover, we also test our system under the scenario that training samples are from one language but test samples are from a different language; our system again achieves a 100% correct recognition rate, which indicates that our system is language independent.
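
    A compact sketch of the pipeline this record describes (MFCC features, one GMM "voice print" per speaker, maximum-likelihood identification), assuming librosa and scikit-learn rather than the authors' own implementation:

        import numpy as np
        import librosa
        from sklearn.mixture import GaussianMixture

        def mfcc_frames(path, n_mfcc=13):
            y, sr = librosa.load(path, sr=None)
            return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)

        def enroll(wav_paths, n_components=16):
            """Fit one GMM per speaker from that speaker's enrollment recordings."""
            feats = np.vstack([mfcc_frames(p) for p in wav_paths])
            return GaussianMixture(n_components, covariance_type="diag").fit(feats)

        def identify(path, models):
            """models: dict speaker name -> fitted GMM; returns the best-scoring name."""
            x = mfcc_frames(path)
            return max(models, key=lambda name: models[name].score(x))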

  9. Sound Localization and Speech Perception in Noise of Pediatric Cochlear Implant Recipients: Bimodal Fitting Versus Bilateral Cochlear Implants.

    PubMed

    Choi, Ji Eun; Moon, Il Joon; Kim, Eun Yeon; Park, Hee-Sung; Kim, Byung Kil; Chung, Won-Ho; Cho, Yang-Sun; Brown, Carolyn J; Hong, Sung Hwa

    The aim of this study was to compare binaural performance on an auditory localization task and a speech-perception-in-babble measure between children who use a cochlear implant (CI) in one ear and a hearing aid (HA) in the other (bimodal fitting) and those who use bilateral CIs. Thirteen children (mean age ± SD = 10 ± 2.9 years) with bilateral CIs and 19 children with bimodal fitting were recruited to participate. Sound localization was assessed using a 13-loudspeaker array in a quiet sound-treated booth. Speakers were placed in an arc from -90° azimuth to +90° azimuth (15° intervals) in the horizontal plane. To assess the accuracy of sound location identification, we calculated the absolute error in degrees between the target speaker and the response speaker during each trial. The mean absolute error was computed by dividing the sum of absolute errors by the total number of trials. We also calculated the hemifield identification score to reflect the accuracy of right/left discrimination. Speech-in-babble perception was also measured in the sound field using target speech presented from the front speaker. Eight-talker babble was presented in four different listening conditions: from the front speaker (0°), from one of the two side speakers (+90° or -90°), or from both side speakers (±90°). A speech, spatial, and quality questionnaire was administered. When the two groups of children were directly compared, there was no significant difference in localization accuracy or hemifield identification score under the binaural condition. Performance on the speech perception test was also similar under most babble conditions. However, when the babble came from the first device side (the CI side for children with bimodal stimulation or the first CI side for children with bilateral CIs), speech understanding in babble by bilateral CI users was significantly better than that by bimodal listeners. Speech, spatial, and quality scores were comparable between the two groups. Overall, binaural performance was similar between children fit with two CIs (CI + CI) and those who use bimodal stimulation (HA + CI) in most conditions. However, the bilateral CI group showed better speech perception than the bimodal group when babble came from the first device side. Therefore, if bimodal performance is significantly below the mean bilateral CI performance on speech perception in babble, these results suggest that transitioning the child from bimodal stimulation to bilateral CIs should be considered.
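
    The localization score used here is plain arithmetic; as a worked example (made-up trials, degrees azimuth):

        import numpy as np

        target = np.array([-90, -45, 0, 30, 75])
        response = np.array([-75, -45, 15, 30, 60])
        mae = np.abs(target - response).mean()  # sum of absolute errors / n trials
        print(mae)  # 9.0 degrees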

  10. Interface of Linguistic and Visual Information During Audience Design.

    PubMed

    Fukumura, Kumiko

    2015-08-01

    Evidence suggests that speakers can take account of the addressee's needs when referring. However, what representations drive the speaker's audience design has been less clear. This study aims to go beyond previous studies by investigating the interplay between the visual and linguistic context during audience design. Speakers repeated subordinate descriptions (e.g., firefighter) given in the prior linguistic context less and used basic-level descriptions (e.g., man) more when the addressee did not hear the linguistic context than when s/he did. But crucially, this effect happened only when the referent lacked the visual attributes associated with the expressions (e.g., the referent was in plain clothes rather than in a firefighter uniform), so there was no other contextual cue available for the identification of the referent. This suggests that speakers flexibly use different contextual cues to help their addressee map the referring expression onto the intended referent. In addition, speakers used fewer pronouns when the addressee did not hear the linguistic antecedent than when s/he did. This suggests that although speakers may be egocentric during anaphoric reference (Fukumura & Van Gompel, 2012), they can cooperatively avoid pronouns when the linguistic antecedents were not shared with their addressee during initial reference. © 2014 Cognitive Science Society, Inc.

  11. Voice input/output capabilities at Perception Technology Corporation

    NASA Technical Reports Server (NTRS)

    Ferber, Leon A.

    1977-01-01

    Condensed resumes of key company personnel at Perception Technology Corporation are presented. The staff possesses expertise in speech recognition, speech synthesis, speaker authentication, and language identification. The capabilities of the hardware and software engineers are also included.

  12. Perception of English palatal codas by Korean speakers of English

    NASA Astrophysics Data System (ADS)

    Yeon, Sang-Hee

    2003-04-01

    This study examined the perception of English palatal codas by Korean speakers of English to determine whether perception problems are the source of production problems. In particular, the study first looked at the possible first-language effect on the perception of English palatal codas. Second, a possible perceptual source of vowel epenthesis after English palatal codas was investigated. In addition, individual factors such as length of residence, TOEFL score, gender and academic status were compared to determine whether they affected the degree of perception accuracy. Eleven adult Korean speakers of English as well as three native speakers of English participated in the study. Three sets of a perception test, including identification of minimally different English pseudo-words or real words, were carried out. The results showed that, first, the Korean speakers perceived the English codas significantly worse than the Americans. Second, the study supported the idea that Koreans perceive an extra /i/ after the final affricates due to final release. Finally, none of the individual factors explained the varying degree of perceptual accuracy; in particular, TOEFL scores and perception test scores had no statistically significant association.

  13. Training the perception of Hindi dental and retroflex stops by native speakers of American English and Japanese.

    PubMed

    Pruitt, John S; Jenkins, James J; Strange, Winifred

    2006-03-01

    Perception of second language speech sounds is influenced by one's first language. For example, speakers of American English have difficulty perceiving dental versus retroflex stop consonants in Hindi, although English has both dental and retroflex allophones of alveolar stops. Japanese, unlike English, has a contrast similar to Hindi, specifically the Japanese /d/ versus the flapped /r/, which is sometimes produced as a retroflex. This study compared American and Japanese speakers' identification of the Hindi contrast in CV syllable contexts where C varied in voicing and aspiration. The study then evaluated the participants' improvement in identifying the distinction after training with a computer-interactive program. Training sessions progressively increased in difficulty by decreasing the extent of vowel truncation in stimuli and by adding new speakers. Although all participants improved significantly, Japanese participants were more accurate than Americans in distinguishing the contrast on pretest, during training, and on posttest. Transfer was observed to three new consonantal contexts, a new vowel context, and a new speaker's productions. Some abstract aspect of the contrast was apparently learned during training. It is suggested that allophonic experience with dental and retroflex stops may be detrimental to perception of the new contrast.

  14. The Sound of Voice: Voice-Based Categorization of Speakers' Sexual Orientation within and across Languages.

    PubMed

    Sulpizio, Simone; Fasoli, Fabio; Maass, Anne; Paladino, Maria Paola; Vespignani, Francesco; Eyssel, Friederike; Bentler, Dominik

    2015-01-01

    Empirical research had initially shown that English listeners are able to identify the speakers' sexual orientation based on voice cues alone. However, the accuracy of this voice-based categorization, as well as its generalizability to other languages (language-dependency) and to non-native speakers (language-specificity), has been questioned recently. Consequently, we address these open issues in 5 experiments: First, we tested whether Italian and German listeners are able to correctly identify sexual orientation of same-language male speakers. Then, participants of both nationalities listened to voice samples and rated the sexual orientation of both Italian and German male speakers. We found that listeners were unable to identify the speakers' sexual orientation correctly. However, speakers were consistently categorized as either heterosexual or gay on the basis of how they sounded. Moreover, a similar pattern of results emerged when listeners judged the sexual orientation of speakers of their own and of the foreign language. Overall, this research suggests that voice-based categorization of sexual orientation reflects the listeners' expectations of how gay voices sound rather than being an accurate detector of the speakers' actual sexual identity. Results are discussed with regard to accuracy, acoustic features of voices, language dependency and language specificity.

  15. 29 CFR 102.142 - Transcripts, recordings or minutes of closed meetings; public availability; retention.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ..., or transcriptions of electronic recordings including the identification of speakers, shall to the... cost of transcription. (c) The agency shall maintain a complete verbatim copy of the transcript, a...

  16. 29 CFR 102.142 - Transcripts, recordings or minutes of closed meetings; public availability; retention.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ..., or transcriptions of electronic recordings including the identification of speakers, shall to the... cost of transcription. (c) The agency shall maintain a complete verbatim copy of the transcript, a...

  17. Report of an international symposium on drugs and driving

    DOT National Transportation Integrated Search

    1975-06-30

    This report presents the proceedings of a Symposium on Drugs (other than alcohol) and Driving. Speakers' papers and work session summaries are included. Major topics include: Overview of Problem, Risk Identification, Drug Measurement in Biological Ma...

  18. Individual differences in selective attention predict speech identification at a cocktail party.

    PubMed

    Oberfeld, Daniel; Klöckner-Nowotny, Felicitas

    2016-08-31

    Listeners with normal hearing show considerable individual differences in speech understanding when competing speakers are present, as in a crowded restaurant. Here, we show that one source of this variance are individual differences in the ability to focus selective attention on a target stimulus in the presence of distractors. In 50 young normal-hearing listeners, the performance in tasks measuring auditory and visual selective attention was associated with sentence identification in the presence of spatially separated competing speakers. Together, the measures of selective attention explained a similar proportion of variance as the binaural sensitivity for the acoustic temporal fine structure. Working memory span, age, and audiometric thresholds showed no significant association with speech understanding. These results suggest that a reduced ability to focus attention on a target is one reason why some listeners with normal hearing sensitivity have difficulty communicating in situations with background noise.

  19. Speaker Linking and Applications using Non-Parametric Hashing Methods

    DTIC Science & Technology

    2016-09-08

    …clustering method based on hashing (canopy clustering). We apply this method to a large corpus of speaker recordings, demonstrate performance tradeoffs, and compare to other hashing methods. Index Terms: speaker recognition, clustering, hashing, locality sensitive hashing. … Second, given a QBE method, how can we perform speaker clustering: each cluster should be a single speaker, and a cluster should…

  20. Speech recognition: Acoustic-phonetic knowledge acquisition and representation

    NASA Astrophysics Data System (ADS)

    Zue, Victor W.

    1988-09-01

    The long-term research goal is to develop and implement speaker-independent continuous speech recognition systems. It is believed that the proper utilization of speech-specific knowledge is essential for such advanced systems. This research is thus directed toward the acquisition, quantification, and representation of acoustic-phonetic and lexical knowledge, and the application of this knowledge to speech recognition algorithms. In addition, we are exploring new speech recognition alternatives based on artificial intelligence and connectionist techniques. We developed a statistical model for predicting the acoustic realization of stop consonants in various positions in the syllable template. A unification-based grammatical formalism was developed for incorporating this model into the lexical access algorithm. We provided an information-theoretic justification for the hierarchical structure of the syllable template. We analyzed segmental durations for vowels and fricatives in continuous speech. Based on contextual information, we developed durational models for vowels and fricatives that account for over 70 percent of the variance, using data from multiple, unknown speakers. We rigorously evaluated the ability of human spectrogram readers to identify stop consonants spoken by many talkers and in a variety of phonetic contexts. Incorporating the declarative knowledge used by the readers, we developed a knowledge-based system for stop identification, achieving system performance comparable to that of the readers.

  1. A Model of Mandarin Tone Categories--A Study of Perception and Production

    ERIC Educational Resources Information Center

    Yang, Bei

    2010-01-01

    The current study lays the groundwork for a model of Mandarin tones based on both native speakers' and non-native speakers' perception and production. It demonstrates that there is variability in non-native speakers' tone productions and that there are differences in the perceptual boundaries in native speakers and non-native speakers. There…

  2. Special Observance Planning Guide

    DTIC Science & Technology

    2015-11-01

    Finding the right speaker for an event can be a challenge. Many speakers are recommended based on word-of-mouth or through a group connected to… An unprepared, rambling speaker or one who intentionally or unintentionally attacks a group or its members can be extremely damaging to a program… Don't assume that an organizational senior leader is an adequate speaker based on position, rank, and/or affiliation with a reference group…

  3. The Sound of Voice: Voice-Based Categorization of Speakers’ Sexual Orientation within and across Languages

    PubMed Central

    Maass, Anne; Paladino, Maria Paola; Vespignani, Francesco; Eyssel, Friederike; Bentler, Dominik

    2015-01-01

    Empirical research had initially shown that English listeners are able to identify the speakers' sexual orientation based on voice cues alone. However, the accuracy of this voice-based categorization, as well as its generalizability to other languages (language-dependency) and to non-native speakers (language-specificity), has been questioned recently. Consequently, we address these open issues in 5 experiments: First, we tested whether Italian and German listeners are able to correctly identify sexual orientation of same-language male speakers. Then, participants of both nationalities listened to voice samples and rated the sexual orientation of both Italian and German male speakers. We found that listeners were unable to identify the speakers' sexual orientation correctly. However, speakers were consistently categorized as either heterosexual or gay on the basis of how they sounded. Moreover, a similar pattern of results emerged when listeners judged the sexual orientation of speakers of their own and of the foreign language. Overall, this research suggests that voice-based categorization of sexual orientation reflects the listeners' expectations of how gay voices sound rather than being an accurate detector of the speakers' actual sexual identity. Results are discussed with regard to accuracy, acoustic features of voices, language dependency and language specificity. PMID:26132820

  4. An automatic speech recognition system with speaker-independent identification support

    NASA Astrophysics Data System (ADS)

    Caranica, Alexandru; Burileanu, Corneliu

    2015-02-01

    The novelty of this work lies in the application of an open-source research software toolkit (CMU Sphinx) to train, build and evaluate a speech recognition system, with speaker-independent support, for voice-controlled hardware applications. Moreover, we propose to use the trained acoustic model to successfully decode voice commands offline on embedded hardware, such as a low-cost ARMv6 SoC, the Raspberry Pi. This type of single-board computer, mainly used for educational and research activities, can serve as a proof-of-concept software and hardware stack for low-cost voice automation systems.
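
    A minimal offline decoding loop in the spirit of this record, assuming the pocketsphinx Python bindings and their LiveSpeech helper (an assumption; the authors worked with the CMU Sphinx toolchain directly, so treat this as a sketch, not their code):

        from pocketsphinx import LiveSpeech

        # Decodes microphone audio fully offline with the default models; on a
        # Raspberry Pi one would point this at the custom-trained acoustic model.
        for phrase in LiveSpeech():
            print(phrase)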

  5. Rhythmic patterning in Malaysian and Singapore English.

    PubMed

    Tan, Rachel Siew Kuang; Low, Ee-Ling

    2014-06-01

    Previous work on the rhythm of Malaysian English has been based on impressionistic observations. This paper uses acoustic analysis to measure the rhythmic patterns of Malaysian English. Recordings of the read speech and spontaneous speech of 10 Malaysian English speakers were analyzed and compared with recordings of an equivalent sample of Singaporean English speakers. Analysis was done using two rhythmic indexes, the PVI and VarcoV. It was found that although the rhythm of the Singaporean speakers' read speech was syllable-based, as described by previous studies, the rhythm of the Malaysian speakers was even more syllable-based. Analysis of syllables in specific utterances showed that Malaysian speakers did not reduce vowels as much as Singaporean speakers did. Results for the spontaneous speech confirmed the findings for the read speech; that is, the same rhythmic patterning that normally triggers vowel reductions was found.
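
    The two rhythm indexes named here reduce to short formulas over vowel-interval durations; a sketch using their standard definitions (durations in ms, made up):

        import numpy as np

        def npvi(d):
            """Normalized Pairwise Variability Index over successive intervals."""
            d = np.asarray(d, dtype=float)
            return 100.0 * np.mean(np.abs(np.diff(d)) / ((d[:-1] + d[1:]) / 2.0))

        def varco_v(d):
            """Standard deviation of vowel durations normalized by the mean."""
            d = np.asarray(d, dtype=float)
            return 100.0 * d.std() / d.mean()

        durations = [80, 120, 60, 140, 90]
        print(npvi(durations), varco_v(durations))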

  6. Computer-Mediated Assessment of Intelligibility in Aphasia and Apraxia of Speech

    PubMed Central

    Haley, Katarina L.; Roth, Heidi; Grindstaff, Enetta; Jacks, Adam

    2011-01-01

    Background Previous work indicates that single word intelligibility tests developed for dysarthria are sensitive to segmental production errors in aphasic individuals with and without apraxia of speech. However, potential listener learning effects and difficulties adapting elicitation procedures to coexisting language impairments limit their applicability to left hemisphere stroke survivors. Aims The main purpose of this study was to examine basic psychometric properties for a new monosyllabic intelligibility test developed for individuals with aphasia and/or AOS. A related purpose was to examine clinical feasibility and potential to standardize a computer-mediated administration approach. Methods & Procedures A 600-item monosyllabic single word intelligibility test was constructed by assembling sets of phonetically similar words. Custom software was used to select 50 target words from this test in a pseudo-random fashion and to elicit and record production of these words by 23 speakers with aphasia and 20 neurologically healthy participants. To evaluate test-retest reliability, two identical sets of 50-word lists were elicited by requesting repetition after a live speaker model. To examine the effect of a different word set and auditory model, an additional set of 50 different words was elicited with a pre-recorded model. The recorded words were presented to normal-hearing listeners for identification via orthographic and multiple-choice response formats. To examine construct validity, production accuracy for each speaker was estimated via phonetic transcription and rating of overall articulation. Outcomes & Results Recording and listening tasks were completed in less than six minutes for all speakers and listeners. Aphasic speakers were significantly less intelligible than neurologically healthy speakers and displayed a wide range of intelligibility scores. Test-retest and inter-listener reliability estimates were strong. No significant difference was found in scores based on recordings from a live model versus a pre-recorded model, but some individual speakers favored the live model. Intelligibility test scores correlated highly with segmental accuracy derived from broad phonetic transcription of the same speech sample and a motor speech evaluation. Scores correlated moderately with rated articulation difficulty. Conclusions We describe a computerized, single-word intelligibility test that yields clinically feasible, reliable, and valid measures of segmental speech production in adults with aphasia. This tool can be used in clinical research to facilitate appropriate participant selection and to establish matching across comparison groups. For a majority of speakers, elicitation procedures can be standardized by using a pre-recorded auditory model for repetition. This assessment tool has potential utility for both clinical assessment and outcomes research. PMID:22215933

  7. Multilevel Analysis in Analyzing Speech Data

    ERIC Educational Resources Information Center

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  8. Entropy Based Classifier Combination for Sentence Segmentation

    DTIC Science & Technology

    2007-01-01

    …speaker diarization system to divide the audio data into hypothetical speakers [17]… the prosodic feature set also includes turn-based features which describe the position of a word in relation to the diarization segmentation. … [17] "…robust speaker segmentation: the ICSI-SRI fall 2004 diarization system," in Proc. RT-04F Workshop, 2004. [18] "The rich transcription fall 2003," http://nist.gov/speech/tests/rt/rt2003/fall/docs/rt03-fall-eval-plan-v9.pdf

  9. Preverbal Infants Infer Third-Party Social Relationships Based on Language

    ERIC Educational Resources Information Center

    Liberman, Zoe; Woodward, Amanda L.; Kinzler, Katherine D.

    2017-01-01

    Language provides rich social information about its speakers. For instance, adults and children make inferences about a speaker's social identity, geographic origins, and group membership based on her language and accent. Although infants prefer speakers of familiar languages (Kinzler, Dupoux, & Spelke, 2007), little is known about the…

  10. A cross-language study of perception of lexical stress in English.

    PubMed

    Yu, Vickie Y; Andruski, Jean E

    2010-08-01

    This study investigates the question of whether language background affects the perception of lexical stress in English. Thirty native English speakers and 30 native Chinese learners of English participated in a stressed-syllable identification task and a discrimination task involving three types of stimuli (real words/pseudowords/hums). The results show that both language groups were able to identify and discriminate stress patterns. Lexical and segmental information affected the English and Chinese speakers to varying degrees. English and Chinese speakers showed different response patterns to trochaic vs. iambic stress across the three types of stimuli. An acoustic analysis revealed that the two language groups used different acoustic cues to process lexical stress. The findings suggest that the different degrees of lexical and segmental effects can be explained by language background, which in turn supports the hypothesis that language background affects the perception of lexical stress in English.

  11. The Semantic Basis of Do So.

    ERIC Educational Resources Information Center

    Binder, Richard

    The thesis of this paper is that the "do so" test described by Lakoff and Ross (1966) is a test of the speaker's belief system regarding the relationship of verbs to their surface subject, and that judgments of grammaticality concerning "do so" are based on the speaker's underlying semantic beliefs. ("Speaker" refers here to both speakers and…

  12. Speaker Reliability Guides Children's Inductive Inferences about Novel Properties

    ERIC Educational Resources Information Center

    Kim, Sunae; Kalish, Charles W.; Harris, Paul L.

    2012-01-01

    Prior work shows that children can make inductive inferences about objects based on their labels rather than their appearance (Gelman, 2003). A separate line of research shows that children's trust in a speaker's label is selective. Children accept labels from a reliable speaker over an unreliable speaker (e.g., Koenig & Harris, 2005). In the…

  13. Individual differences in selective attention predict speech identification at a cocktail party

    PubMed Central

    Oberfeld, Daniel; Klöckner-Nowotny, Felicitas

    2016-01-01

    Listeners with normal hearing show considerable individual differences in speech understanding when competing speakers are present, as in a crowded restaurant. Here, we show that one source of this variance is individual differences in the ability to focus selective attention on a target stimulus in the presence of distractors. In 50 young normal-hearing listeners, the performance in tasks measuring auditory and visual selective attention was associated with sentence identification in the presence of spatially separated competing speakers. Together, the measures of selective attention explained a similar proportion of variance as the binaural sensitivity for the acoustic temporal fine structure. Working memory span, age, and audiometric thresholds showed no significant association with speech understanding. These results suggest that a reduced ability to focus attention on a target is one reason why some listeners with normal hearing sensitivity have difficulty communicating in situations with background noise. DOI: http://dx.doi.org/10.7554/eLife.16747.001 PMID:27580272

  14. Differential use of temporal cues to the /s/-/z/ contrast by native and non-native speakers of English.

    PubMed

    Flege, J E; Hillenbrand, J

    1986-02-01

    This study examined the effect of linguistic experience on perception of the English /s/-/z/ contrast in word-final position. The durations of the periodic ("vowel") and aperiodic ("fricative") portions of stimuli, ranging from peas to peace, were varied in a 5 X 5 factorial design. Forced-choice identification judgments were elicited from two groups of native speakers of American English differing in dialect, and from two groups each of native speakers of French, Swedish, and Finnish differing in English-language experience. The results suggested that the non-native subjects used cues established for the perception of phonetic contrasts in their native language to identify fricatives as /s/ or /z/. Lengthening vowel duration increased /z/ judgments in all eight subject groups, although the effect was smaller for native speakers of French than for native speakers of the other languages. Shortening fricative duration, on the other hand, significantly decreased /z/ judgments only by the English and French subjects. It did not influence voicing judgments by the Swedish and Finnish subjects, even those who had lived for a year or more in an English-speaking environment. These findings raise the question of whether adults who learn a foreign language can acquire the ability to integrate multiple acoustic cues to a phonetic contrast which does not exist in their native language.

  15. Human Language Technology: Opportunities and Challenges

    DTIC Science & Technology

    2005-01-01

    because of the connections to and reliance on signal processing. Audio diarization critically includes indexing of speakers [12], since speaker … to reduce inter-speaker variability in training. Standard techniques include vocal-tract length normalization, adaptation of acoustic models using maximum likelihood linear regression (MLLR), and speaker-adaptive training based on MLLR. The acoustic models are mixtures of Gaussians, typically with

  16. Hybrid Speaker Recognition Using Universal Acoustic Model

    NASA Astrophysics Data System (ADS)

    Nishimura, Jun; Kuroda, Tadahiro

    We propose a novel speaker recognition approach using a speaker-independent universal acoustic model (UAM) for sensornet applications. In sensornet applications such as “Business Microscope”, interactions among knowledge workers in an organization can be visualized by sensing face-to-face communication using wearable sensor nodes. In conventional studies, speakers are detected by comparing the energy of input speech signals among the nodes. However, there are often synchronization errors among the nodes, which degrade the speaker recognition performance. By focusing on properties of the speaker's acoustic channel, UAM can provide robustness against these synchronization errors. The overall speaker recognition accuracy is improved by combining UAM with the energy-based approach. For 0.1-s speech inputs and 4 subjects, a speaker recognition accuracy of 94% is achieved with synchronization errors of less than 100 ms.

  17. Person authentication using brainwaves (EEG) and maximum a posteriori model adaptation.

    PubMed

    Marcel, Sébastien; Millán, José Del R

    2007-04-01

    In this paper, we investigate the use of brain activity for person authentication. It has been shown in previous studies that the brain-wave pattern of every individual is unique and that the electroencephalogram (EEG) can be used for biometric identification. EEG-based biometry is an emerging research topic and we believe that it may open new research directions and applications in the future. However, very little work has been done in this area, and it has focused mainly on person identification rather than person authentication. Person authentication aims to accept or to reject a person claiming an identity, i.e., comparing biometric data to one template, while the goal of person identification is to match the biometric data against all the records in a database. We propose the use of a statistical framework based on Gaussian Mixture Models and Maximum A Posteriori model adaptation, successfully applied to speaker and face authentication, which can deal with only one training session. We perform intensive experimental simulations using several strict train/test protocols to show the potential of our method. We also show that there are some mental tasks that are more appropriate for person authentication than others.
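
    As a rough illustration of the GMM-plus-MAP framework the abstract describes, the sketch below adapts only the Gaussian means of a background model toward a user's enrollment data and scores a test set with a log-likelihood ratio. The relevance factor, component count, and feature dimensionality are illustrative assumptions, and random vectors stand in for real EEG (or speech) features; this is a minimal sketch, not the authors' implementation.

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def map_adapt_means(ubm, X, relevance=16.0):
        """Mean-only relevance-MAP adaptation of a trained GMM toward data X."""
        resp = ubm.predict_proba(X)                  # (n_samples, n_components)
        n_k = resp.sum(axis=0)                       # soft counts per component
        data_mean = (resp.T @ X) / np.maximum(n_k[:, None], 1e-10)
        alpha = (n_k / (n_k + relevance))[:, None]   # adaptation coefficient
        return alpha * data_mean + (1.0 - alpha) * ubm.means_

    rng = np.random.default_rng(0)
    background = rng.normal(size=(2000, 12))          # stand-in for pooled features
    enrollment = rng.normal(loc=0.3, size=(200, 12))  # stand-in for one user

    ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
    ubm.fit(background)

    # Build the client model by copying the UBM and replacing only the means.
    client = GaussianMixture(n_components=8, covariance_type="diag")
    client.weights_, client.covariances_ = ubm.weights_, ubm.covariances_
    client.precisions_cholesky_ = ubm.precisions_cholesky_
    client.means_ = map_adapt_means(ubm, enrollment)

    # Authentication: accept when the log-likelihood ratio clears a threshold.
    test = rng.normal(loc=0.3, size=(50, 12))
    llr = client.score(test) - ubm.score(test)
    print(f"log-likelihood ratio: {llr:.3f}")
    ```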

  18. Audio-visual imposture

    NASA Astrophysics Data System (ADS)

    Karam, Walid; Mokbel, Chafic; Greige, Hanna; Chollet, Gerard

    2006-05-01

    A GMM-based audiovisual speaker verification system is described, and an Active Appearance Model with a linear speaker transformation system is used to evaluate the robustness of the verification. An Active Appearance Model (AAM) is used to automatically locate and track a speaker's face in a video recording. A Gaussian Mixture Model (GMM) based classifier (BECARS) is used for face verification. GMM training and testing are performed on DCT-based features extracted from the detected faces. On the audio side, speech features are extracted and used for speaker verification with the GMM-based classifier. Fusion of both audio and video modalities for audiovisual speaker verification is compared with face verification and speaker verification systems. To improve the robustness of the multimodal biometric identity verification system, an audiovisual imposture system is envisioned. It consists of an automatic voice transformation technique that an impostor may use to assume the identity of an authorized client. Features of the transformed voice are then combined with the corresponding appearance features and fed into the GMM-based system BECARS for training. An attempt is made to increase the acceptance rate of the impostor and to analyze the robustness of the verification system. Experiments are being conducted on the BANCA database, with a prospect of experimenting on the PDAtabase newly developed within the scope of the SecurePhone project.
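
    Fusion of the two modalities can be sketched at the score level as a weighted sum of the audio and face log-likelihood-ratio scores. The weight and threshold below are hypothetical tuning parameters chosen on development data; the paper's BECARS system may combine the modalities differently.

    ```python
    import numpy as np

    def fuse_scores(audio_llr, face_llr, w_audio=0.6):
        """Weighted-sum fusion of audio and face log-likelihood-ratio scores."""
        return w_audio * np.asarray(audio_llr) + (1.0 - w_audio) * np.asarray(face_llr)

    # Accept the claimed identity when the fused score clears a threshold.
    fused = fuse_scores(audio_llr=1.2, face_llr=-0.3)
    print("accept" if fused > 0.0 else "reject", f"(fused score: {fused:+.2f})")
    ```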

  19. Artificially intelligent recognition of Arabic speaker using voice print-based local features

    NASA Astrophysics Data System (ADS)

    Mahmood, Awais; Alsulaiman, Mansour; Muhammad, Ghulam; Akram, Sheeraz

    2016-11-01

    Local features for any pattern recognition system are based on the information extracted locally. In this paper, a local feature extraction technique was developed. This feature was extracted by taking the moving average along the diagonal directions of the time-frequency plane. This feature captured time-frequency events, producing a unique pattern for each speaker that can be viewed as a voice print of the speaker. Hence, we referred to this technique as a voice print-based local feature. The proposed feature was compared to other features, including the mel-frequency cepstral coefficient (MFCC), for speaker recognition using two different databases. One of the databases used in the comparison is a subset of an LDC database that consisted of two short sentences uttered by 182 speakers. The proposed feature attained a 98.35% recognition rate, compared to 96.7% for MFCC, on the LDC subset.
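
    The description suggests averaging spectrogram values along the diagonal directions of the time-frequency plane. The sketch below implements one plausible reading of that idea; the window size and the log-spectrogram front end are assumptions, not the authors' exact recipe.

    ```python
    import numpy as np
    from scipy.signal import spectrogram

    def diagonal_moving_average(tf_plane, window=5):
        """Average each time-frequency cell along the two diagonal directions."""
        n_freq, n_time = tf_plane.shape
        half = window // 2
        padded = np.pad(tf_plane, half, mode="edge")
        diag = np.zeros_like(tf_plane)
        anti = np.zeros_like(tf_plane)
        for k in range(-half, half + 1):
            # Shift rows and columns together (main diagonal) or oppositely (anti-diagonal).
            diag += padded[half + k : half + k + n_freq, half + k : half + k + n_time]
            anti += padded[half + k : half + k + n_freq, half - k : half - k + n_time]
        return diag / window, anti / window

    fs = 16000
    signal = np.random.randn(fs)                   # stand-in for one utterance
    _, _, sxx = spectrogram(signal, fs=fs, nperseg=256)
    d, a = diagonal_moving_average(np.log(sxx + 1e-10))
    print(d.shape, a.shape)                        # per-cell diagonal trajectory features
    ```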

  20. Consistency between verbal and non-verbal affective cues: a clue to speaker credibility.

    PubMed

    Gillis, Randall L; Nilsen, Elizabeth S

    2017-06-01

    Listeners are exposed to inconsistencies in communication; for example, when speakers' words (i.e. verbal) are discrepant with their demonstrated emotions (i.e. non-verbal). Such inconsistencies introduce ambiguity, which may render a speaker a less credible source of information. Two experiments examined whether children make credibility discriminations based on the consistency of speakers' affect cues. In Experiment 1, school-age children (7- to 8-year-olds) preferred to solicit information from consistent speakers (e.g. those who provided a negative statement with negative affect), over novel speakers, to a greater extent than they preferred to solicit information from inconsistent speakers (e.g. those who provided a negative statement with positive affect) over novel speakers. Preschoolers (4- to 5-year-olds) did not demonstrate this preference. Experiment 2 showed that school-age children's ratings of speakers were influenced by speakers' affect consistency when the attribute being judged was related to information acquisition (speakers' believability, "weird" speech), but not general characteristics (speakers' friendliness, likeability). Together, findings suggest that school-age children are sensitive to, and use, the congruency of affect cues to determine whether individuals are credible sources of information.

  1. Effects of Language Background on Gaze Behavior: A Crosslinguistic Comparison Between Korean and German Speakers

    PubMed Central

    Goller, Florian; Lee, Donghoon; Ansorge, Ulrich; Choi, Soonja

    2017-01-01

    Languages differ in how they categorize spatial relations: while German differentiates between containment (in) and support (auf) with distinct spatial words—(a) den Kuli IN die Kappe stecken (“put pen in cap”); (b) die Kappe AUF den Kuli stecken (“put cap on pen”)—Korean uses a single spatial word (kkita) collapsing (a) and (b) into one semantic category, particularly when the spatial enclosure is tight-fit. Korean uses a different word (i.e., netha) for loose fits (e.g., apple in bowl). We tested whether these differences influence the attention of the speaker. In a crosslinguistic study, we compared native German speakers with native Korean speakers. Participants rated the similarity of two successive video clips of several scenes where two objects were joined or nested (either in a tight or loose manner). The rating data show that Korean speakers base their rating of similarity more on tight versus loose fit, whereas German speakers base their rating more on containment versus support (in vs. auf). Throughout the experiment, we also measured the participants' eye movements. Korean speakers looked equally long at the moving Figure object and at the stationary Ground object, whereas German speakers were more biased to look at the Ground object. Additionally, Korean speakers also looked more at the region where the two objects touched than did German speakers. We discuss our data in the light of crosslinguistic semantics and the extent of their influence on spatial cognition and perception. PMID:29362644

  2. Korean Word Frequency and Commonality Study for Augmentative and Alternative Communication

    ERIC Educational Resources Information Center

    Shin, Sangeun; Hill, Katya

    2016-01-01

    Background: Vocabulary frequency results have been reported to design and support augmentative and alternative communication (AAC) interventions. A few studies exist for adult speakers and for other natural languages. With the increasing demand on AAC treatment for Korean adults, identification of high-frequency or core vocabulary (CV) becomes…

  3. A Report by the Air Pollution Committee

    ERIC Educational Resources Information Center

    Kirkpatrick, Lane

    1972-01-01

    Description of a symposium, "Air Resource Protection and the Environment," held at the 1972 Environmental Health Conference and Exposition. Reports included a mathematical model for predicting future levels of air pollution, evaluation and identification of transportation controls, and a panel discussion of points raised by the speakers. (LK)

  4. Myths and Political Rhetoric: Jimmy Carter Accepts the Nomination.

    ERIC Educational Resources Information Center

    Corso, Dianne M.

    Like other political speakers who have drawn on the personification, identification, and dramatic encounter images of mythology to pressure and persuade audiences, Jimmy Carter evoked the myths of the hero, the American Dream, and the ideal political process in his presidential nomination acceptance speech. By stressing his unknown status, his…

  5. Children's Understanding That Utterances Emanate from Minds: Using Speaker Belief To Aid Interpretation.

    ERIC Educational Resources Information Center

    Mitchell, Peter; Robinson, Elizabeth J.; Thompson, Doreen E.

    1999-01-01

    Three experiments examined 3- to 6-year-olds' ability to use a speaker's utterance based on false belief to identify which of several referents was intended. Found that many 4- to 5-year-olds performed correctly only when it was unnecessary to consider the speaker's belief. When the speaker gave an ambiguous utterance, many 3- to 6-year-olds…

  6. Speaker verification using committee neural networks.

    PubMed

    Reddy, Narender P; Buch, Ojas A

    2003-10-01

    Security is a major problem in web-based or remote access to databases. In the present study, the technique of committee neural networks was developed for speech-based speaker verification. Speech data from the designated speaker and several imposters were obtained. Several parameters were extracted in the time and frequency domains and fed to neural networks. Several neural networks were trained, and the five best-performing networks were recruited into the committee. The committee decision was based on majority voting of the member networks. The committee opinion was evaluated with further testing data. The committee correctly identified the designated speaker in (50 out of 50) 100% of the cases and rejected imposters in (150 out of 150) 100% of the cases. The committee decision was not unanimous in the majority of the cases tested.
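
    A minimal sketch of the committee idea, assuming scikit-learn MLPs as the member networks: train several candidates, recruit the five best performers on a validation split, and decide by majority vote. Synthetic data stands in for the time- and frequency-domain speech parameters the paper extracts.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Synthetic stand-in for time/frequency-domain speech parameters.
    X, y = make_classification(n_samples=600, n_features=12, random_state=0)
    X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
    X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    # Train several candidate networks, then recruit the five best into the committee.
    candidates = [
        MLPClassifier(hidden_layer_sizes=(h,), max_iter=2000, random_state=s).fit(X_tr, y_tr)
        for h, s in [(8, 0), (8, 1), (16, 2), (16, 3), (32, 4), (32, 5), (64, 6)]
    ]
    committee = sorted(candidates, key=lambda m: m.score(X_val, y_val), reverse=True)[:5]

    # Committee decision by majority vote of the five member networks.
    votes = np.stack([m.predict(X_te) for m in committee])
    decision = (votes.sum(axis=0) >= 3).astype(int)
    print("committee accuracy:", (decision == y_te).mean())
    ```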

  7. Development of panel loudspeaker system: design, evaluation and enhancement.

    PubMed

    Bai, M R; Huang, T

    2001-06-01

    Panel speakers are investigated in terms of structural vibration and acoustic radiation. A panel speaker primarily consists of a panel and an inertia exciter. In contrast to conventional speakers, flexural resonance is encouraged such that the panel vibrates as randomly as possible. Simulation tools are developed to facilitate system integration of panel speakers. In particular, electro-mechanical analogy, finite element analysis, and fast Fourier transform are employed to predict panel vibration and the acoustic radiation. Design procedures are also summarized. In order to compare the panel speakers with conventional speakers, experimental investigations were undertaken to evaluate frequency response, directional response, sensitivity, efficiency, and harmonic distortion of both speakers. The results revealed that the panel speakers suffered from low sensitivity and efficiency. To alleviate the problem, a woofer using electronic compensation based on the H2 model-matching principle is used to supplement the bass response. As the results indicate, significant improvement over the panel speaker alone was achieved by using the combined panel-woofer system.

  8. And then I saw her race: Race-based expectations affect infants' word processing.

    PubMed

    Weatherhead, Drew; White, Katherine S

    2018-08-01

    How do our expectations about speakers shape speech perception? Adults' speech perception is influenced by social properties of the speaker (e.g., race). When in development do these influences begin? In the current study, 16-month-olds heard familiar words produced in their native accent (e.g., "dog") and in an unfamiliar accent involving a vowel shift (e.g., "dag"), in the context of an image of either a same-race speaker or an other-race speaker. Infants' interpretation of the words depended on the speaker's race. For the same-race speaker, infants only recognized words produced in the familiar accent; for the other-race speaker, infants recognized both versions of the words. Two additional experiments showed that infants only recognized an other-race speaker's atypical pronunciations when they differed systematically from the native accent. These results provide the first evidence that expectations driven by unspoken properties of speakers, such as race, influence infants' speech processing. Copyright © 2018 Elsevier B.V. All rights reserved.

  9. Long short-term memory for speaker generalization in supervised speech separation

    PubMed Central

    Chen, Jitong; Wang, DeLiang

    2017-01-01

    Speech separation can be formulated as learning to estimate a time-frequency mask from acoustic features extracted from noisy speech. For supervised speech separation, generalization to unseen noises and unseen speakers is a critical issue. Although deep neural networks (DNNs) have been successful in noise-independent speech separation, DNNs are limited in modeling a large number of speakers. To improve speaker generalization, a separation model based on long short-term memory (LSTM) is proposed, which naturally accounts for temporal dynamics of speech. Systematic evaluation shows that the proposed model substantially outperforms a DNN-based model on unseen speakers and unseen noises in terms of objective speech intelligibility. Analyzing LSTM internal representations reveals that LSTM captures long-term speech contexts. It is also found that the LSTM model is more advantageous for low-latency speech separation and that, even without future frames, it performs better than the DNN model with future frames. The proposed model represents an effective approach for speaker- and noise-independent speech separation. PMID:28679261
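
    A minimal PyTorch sketch of LSTM-based time-frequency mask estimation, assuming log-spectral input features and an ideal-ratio-mask training target; the layer sizes and feature dimension are illustrative, not those of the paper.

    ```python
    import torch
    import torch.nn as nn

    class LSTMMaskEstimator(nn.Module):
        """Map noisy feature sequences to a time-frequency mask in [0, 1]."""
        def __init__(self, n_features=161, hidden=256, layers=2):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, num_layers=layers, batch_first=True)
            self.proj = nn.Sequential(nn.Linear(hidden, n_features), nn.Sigmoid())

        def forward(self, x):              # x: (batch, frames, n_features)
            h, _ = self.lstm(x)
            return self.proj(h)            # mask: (batch, frames, n_features)

    model = LSTMMaskEstimator()
    noisy = torch.randn(4, 100, 161)       # toy batch of noisy feature sequences
    ideal_mask = torch.rand(4, 100, 161)   # stand-in training target (e.g., an IRM)
    loss = nn.functional.mse_loss(model(noisy), ideal_mask)
    loss.backward()                        # one optimizer step would follow in training
    print(float(loss))
    ```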

  10. A study on (K, Na) NbO3 based multilayer piezoelectric ceramics micro speaker

    NASA Astrophysics Data System (ADS)

    Gao, Renlong; Chu, Xiangcheng; Huan, Yu; Sun, Yiming; Liu, Jiayi; Wang, Xiaohui; Li, Longtu

    2014-10-01

    A flat panel micro speaker was fabricated from (K, Na) NbO3 (KNN)-based multilayer piezoelectric ceramics by a tape casting and cofiring process using Ag-Pd alloys as an inner electrode. The interface between ceramic and electrode was investigated by scanning electron microscopy (SEM) and transmission electron microscopy (TEM). The acoustic response was characterized by a standard audio test system. We found that the micro speaker with dimensions of 23 × 27 × 0.6 mm3, using three 30-μm-thick layers of KNN-based ceramic, has a high average sound pressure level (SPL) of 87 dB between 100 Hz and 20 kHz under a 5 V drive. This result was even better than that of lead zirconate titanate (PZT)-based ceramics under the same conditions. The experimental results show that the KNN-based multilayer ceramics could be used as lead-free piezoelectric micro speakers.

  11. Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

    NASA Astrophysics Data System (ADS)

    Kayasith, Prakasith; Theeramunkong, Thanaruk

    It is a tedious and subjective task to measure the severity of dysarthria by manually evaluating a speaker's speech using available standard assessment methods based on human perception. This paper presents an automated approach to assess the speech quality of a dysarthric speaker with cerebral palsy. With consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce a consistent speech signal for a given word and distinguishable speech signals for different words. As an application, it can be used to assess speech quality and forecast the speech recognition rate of an individual dysarthric speaker before exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square difference. The evaluations were done by comparing its predicted recognition rates with those predicted by the standard methods, the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were done on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
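
    The exact formula for Ψ is defined in the paper; the sketch below merely illustrates the two ingredients it combines, contrasting between-word separation (distinction) with within-word spread (consistency) over repeated productions. The feature choice and the ratio form are assumptions made for illustration.

    ```python
    import numpy as np

    def clarity_index(word_features):
        """Toy consistency/distinction score over repeated word productions.

        word_features maps each word to an array of per-repetition feature
        vectors (e.g., averaged MFCCs); higher output means clearer speech.
        """
        centroids = {w: f.mean(axis=0) for w, f in word_features.items()}
        within = np.mean([np.linalg.norm(f - centroids[w], axis=1).mean()
                          for w, f in word_features.items()])
        words = list(centroids)
        between = np.mean([np.linalg.norm(centroids[a] - centroids[b])
                           for i, a in enumerate(words) for b in words[i + 1:]])
        return between / (within + 1e-10)

    rng = np.random.default_rng(1)
    feats = {w: rng.normal(loc=i, scale=0.5, size=(5, 13))
             for i, w in enumerate(["cat", "dog", "sun"])}
    print(f"clarity score: {clarity_index(feats):.2f}")
    ```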

  12. Gender Differences in the Recognition of Vocal Emotions

    PubMed Central

    Lausen, Adi; Schacht, Annekathrin

    2018-01-01

    The conflicting findings from the few studies conducted with regard to gender differences in the recognition of vocal expressions of emotion have left the exact nature of these differences unclear. Several investigators have argued that a comprehensive understanding of gender differences in vocal emotion recognition can only be achieved by replicating these studies while accounting for influential factors such as stimulus type, gender-balanced samples, number of encoders, decoders, and emotional categories. This study aimed to account for these factors by investigating whether emotion recognition from vocal expressions differs as a function of both listeners' and speakers' gender. A total of N = 290 participants were randomly and equally allocated to two groups. One group listened to words and pseudo-words, while the other group listened to sentences and affect bursts. Participants were asked to categorize the stimuli with respect to the expressed emotions in a fixed-choice response format. Overall, females were more accurate than males when decoding vocal emotions; however, when testing for specific emotions, these differences were small in magnitude. Speakers' gender had a significant impact on how listeners judged emotions from the voice. The group listening to words and pseudo-words had higher identification rates for emotions spoken by male than by female actors, whereas in the group listening to sentences and affect bursts the identification rates were higher when emotions were uttered by female than male actors. The mixed pattern for emotion-specific effects, however, indicates that, in the vocal channel, the reliability of emotion judgments is not systematically influenced by speakers' gender and the related stereotypes of emotional expressivity. Together, these results extend previous findings by showing effects of listeners' and speakers' gender on the recognition of vocal emotions. They stress the importance of distinguishing these factors to explain recognition ability in the processing of emotional prosody. PMID:29922202

  13. The Language of Persuasion, English, Vocabulary: 5114.68.

    ERIC Educational Resources Information Center

    Groff, Irvin

    Developed for a high school quinmester unit on the language of persuasion, this guide provides the teacher with teaching strategies for a study of the speaker or writer as a persuader, the identification of the logical and psychological tools of persuasion, an examination of the levels of abstraction, the techniques of propaganda, and the…

  14. Perception of Melodic Contour and Intonation in Autism Spectrum Disorder: Evidence from Mandarin Speakers

    ERIC Educational Resources Information Center

    Jiang, Jun; Liu, Fang; Wan, Xuan; Jiang, Cunmei

    2015-01-01

    Tone language experience benefits pitch processing in music and speech for typically developing individuals. No known studies have examined pitch processing in individuals with autism who speak a tone language. This study investigated discrimination and identification of melodic contour and speech intonation in a group of Mandarin-speaking…

  15. A Cross-Language Study of Perception of Lexical Stress in English

    ERIC Educational Resources Information Center

    Yu, Vickie Y.; Andruski, Jean E.

    2010-01-01

    This study investigates the question of whether language background affects the perception of lexical stress in English. Thirty native English speakers and 30 native Chinese learners of English participated in a stressed-syllable identification task and a discrimination task involving three types of stimuli (real words/pseudowords/hums). The…

  16. Processing of Acoustic Cues in Lexical-Tone Identification by Pediatric Cochlear-Implant Recipients

    ERIC Educational Resources Information Center

    Peng, Shu-Chen; Lu, Hui-Ping; Lu, Nelson; Lin, Yung-Song; Deroche, Mickael L. D.; Chatterjee, Monita

    2017-01-01

    Purpose: The objective was to investigate acoustic cue processing in lexical-tone recognition by pediatric cochlear-implant (CI) recipients who are native Mandarin speakers. Method: Lexical-tone recognition was assessed in pediatric CI recipients and listeners with normal hearing (NH) in 2 tasks. In Task 1, participants identified naturally…

  17. Acquisition of L2 Vowel Duration in Japanese by Native English Speakers

    ERIC Educational Resources Information Center

    Okuno, Tomoko

    2013-01-01

    Research has demonstrated that focused perceptual training facilitates L2 learners' segmental perception and spoken word identification. Hardison (2003) and Motohashi-Saigo and Hardison (2009) found benefits of visual cues in the training for acquisition of L2 contrasts. The present study examined factors affecting perception and production…

  18. Improving Speaker Recognition by Biometric Voice Deconstruction

    PubMed Central

    Mazaira-Fernandez, Luis Miguel; Álvarez-Marquina, Agustín; Gómez-Vilda, Pedro

    2015-01-01

    Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have been forcibly replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from the advances achieved in recent years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers, combined with the use of a set of features derived from the components resulting from the deconstruction of the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description of the main hypothesis and the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on a mobile phone network recorded under non-controlled acoustic conditions. PMID:26442245

  19. Improving Speaker Recognition by Biometric Voice Deconstruction.

    PubMed

    Mazaira-Fernandez, Luis Miguel; Álvarez-Marquina, Agustín; Gómez-Vilda, Pedro

    2015-01-01

    Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have been forcibly replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from the advances achieved in recent years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers, combined with the use of a set of features derived from the components resulting from the deconstruction of the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description of the main hypothesis and the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on a mobile phone network recorded under non-controlled acoustic conditions.

  20. Implementation of support vector machine for classification of speech marked hijaiyah letters based on Mel frequency cepstrum coefficient feature extraction

    NASA Astrophysics Data System (ADS)

    Adhi Pradana, Wisnu; Adiwijaya; Novia Wisesty, Untari

    2018-03-01

    Support Vector Machine (SVM) is one method that can be used to classify data. SVM separates data from two different classes with a hyperplane. In this study, the system was built using SVM to develop Arabic speech recognition. During development, two kinds of speakers were tested: dependent and independent speakers. The system achieves an accuracy of 85.32% for dependent speakers and 61.16% for independent speakers.
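
    A minimal sketch of the MFCC-plus-SVM pipeline the abstract describes, using librosa for feature extraction and scikit-learn's SVC. Noisy tones stand in for recordings of hijaiyah letters, and the RBF kernel is an assumption rather than the paper's configuration.

    ```python
    import numpy as np
    import librosa
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split

    def mfcc_vector(y, sr=16000, n_mfcc=13):
        """Average MFCCs over frames: one fixed-length vector per utterance."""
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        return mfcc.mean(axis=1)

    # Toy stand-in data: two 'letter classes' simulated as noisy tones.
    rng = np.random.default_rng(0)
    sr = 16000
    t = np.arange(sr) / sr
    vectors, labels = [], []
    for label, f0 in [(0, 220.0), (1, 440.0)]:
        for _ in range(20):
            y = np.sin(2 * np.pi * f0 * t) + 0.3 * rng.normal(size=sr)
            vectors.append(mfcc_vector(y.astype(np.float32), sr))
            labels.append(label)

    X_tr, X_te, y_tr, y_te = train_test_split(np.stack(vectors), np.array(labels),
                                              random_state=0)
    clf = SVC(kernel="rbf").fit(X_tr, y_tr)   # hyperplane in the kernel feature space
    print("toy accuracy:", clf.score(X_te, y_te))
    ```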

  1. Guest Speakers in School-Based Sexuality Education

    ERIC Educational Resources Information Center

    McRee, Annie-Laurie; Madsen, Nikki; Eisenberg, Marla E.

    2014-01-01

    This study, using data from a statewide survey (n = 332), examined teachers' practices regarding the inclusion of guest speakers to cover sexuality content. More than half of teachers (58%) included guest speakers. In multivariate analyses, teachers who taught high school, had professional preparation in health education, or who received…

  2. Promoting Communities of Practice among Non-Native Speakers of English in Online Discussions

    ERIC Educational Resources Information Center

    Kim, Hoe Kyeung

    2011-01-01

    An online discussion involving text-based computer-mediated communication has great potential for promoting equal participation among non-native speakers of English. Several studies claimed that online discussions could enhance the academic participation of non-native speakers of English. However, there is little research around participation…

  3. ``The perceptual bases of speaker identity'' revisited

    NASA Astrophysics Data System (ADS)

    Voiers, William D.

    2003-10-01

    A series of experiments begun 40 years ago [W. D. Voiers, J. Acoust. Soc. Am. 36, 1065-1073 (1964)] was concerned with identifying the perceived voice traits (PVTs) on which human recognition of voices depends. It culminated with the development of a voice taxonomy based on 20 PVTs and a set of highly reliable rating scales for classifying voices with respect to those PVTs. The development of a perceptual voice taxonomy was motivated by the need for a practical method of evaluating speaker recognizability in voice communication systems. The Diagnostic Speaker Recognition Test (DSRT) evaluates the effects of systems on speaker recognizability as reflected in changes in the inter-listener reliability of voice ratings on the 20 PVTs. The DSRT thus provides a qualitative, as well as quantitative, evaluation of the effects of a system on speaker recognizability. A fringe benefit of this project is PVT rating data for a sample of 680 voices. [Work partially supported by USAFRL.]

  4. Evaluating the lexico-grammatical differences in the writing of native and non-native speakers of English in peer-reviewed medical journals in the field of pediatric oncology: Creation of the genuine index scoring system.

    PubMed

    Gayle, Alberto Alexander; Shimaoka, Motomu

    2017-01-01

    The predominance of English in scientific research has created hurdles for "non-native speakers" of English. Here we present a novel application of native language identification (NLI) for the assessment of medical-scientific writing. For this purpose, we created a novel classification system whereby scoring would be based solely on text features found to be distinctive among native English speakers (NS) within a given context. We dubbed this the "Genuine Index" (GI). This methodology was validated using a small set of journals in the field of pediatric oncology. Our dataset consisted of 5,907 abstracts, representing work from 77 countries. A support vector machine (SVM) was used to generate our model and for scoring. Accuracy, precision, and recall of the classification model were 93.3%, 93.7%, and 99.4%, respectively. Class-specific F-scores were 96.5% for NS and 39.8% for our benchmark class, Japan. Overall kappa was calculated to be 37.2%. We found significant differences between countries with respect to the GI score. Significant correlation was found between GI scores and two validated objective measures of writing proficiency and readability. Two sets of key terms and phrases differentiating NS and non-native writing were identified. Our GI model was able to detect, with a high degree of reliability, subtle differences between the terms and phrasing used by native and non-native speakers in peer-reviewed journals in the field of pediatric oncology. In addition, L1 language transfer was found to be very likely to survive revision, especially in non-Western countries such as Japan. These findings show that even when the language used is technically correct, there may still be some phrasing or usage that impacts quality.
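
    A toy sketch of an SVM-based native-language-identification scorer in the spirit of the Genuine Index, assuming TF-IDF n-gram features as a stand-in for the paper's distinctive terms and phrases. The labeled example texts are invented; the real system's features and training corpus are those described in the paper.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Invented toy abstracts: label 1 = native-speaker (NS) writing, 0 = non-native.
    texts = [
        "We report outcomes in a cohort of pediatric patients treated with...",
        "It is considered that the treatment about the patients was effective...",
        "Survival improved significantly after introduction of the protocol...",
        "The obtained results show that there is the possibility of improvement...",
    ]
    labels = [1, 0, 1, 0]

    # Word and phrase n-grams stand in for the distinctive 'key terms and phrases'.
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 3), sublinear_tf=True),
        LinearSVC(),
    )
    model.fit(texts, labels)

    # The signed SVM decision value plays the role of a Genuine-Index-like score:
    # higher means the text looks more like NS writing.
    print(model.decision_function(["The results obtained demonstrate that..."]))
    ```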

  5. Classifications of Vocalic Segments from Articulatory Kinematics: Healthy Controls and Speakers with Dysarthria

    ERIC Educational Resources Information Center

    Yunusova, Yana; Weismer, Gary G.; Lindstrom, Mary J.

    2011-01-01

    Purpose: In this study, the authors classified vocalic segments produced by control speakers (C) and speakers with dysarthria due to amyotrophic lateral sclerosis (ALS) or Parkinson's disease (PD); classification was based on movement measures. The researchers asked the following questions: (a) Can vowels be classified on the basis of selected…

  6. The contribution of dynamic visual cues to audiovisual speech perception.

    PubMed

    Jaekl, Philip; Pesquita, Ana; Alsius, Agnes; Munhall, Kevin; Soto-Faraco, Salvador

    2015-08-01

    Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues, two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this aim, we measured word identification performance in noise using unimodal auditory stimuli and audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point-light displays achieved via motion capture of the original talker. Point-light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined effect of dynamic configural cues and local motion cues. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared to an auditory-only condition, demonstrating, for the first time, the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech. Copyright © 2015 Elsevier Ltd. All rights reserved.

  7. Perception of Non-Native Consonant Length Contrast: The Role of Attention in Phonetic Processing

    ERIC Educational Resources Information Center

    Porretta, Vincent J.; Tucker, Benjamin V.

    2015-01-01

    The present investigation examines English speakers' ability to identify and discriminate non-native consonant length contrast. Three groups (L1 English No-Instruction, L1 English Instruction, and L1 Finnish control) performed a speeded forced-choice identification task and a speeded AX discrimination task on Finnish non-words (e.g.…

  8. The Effect of Pitch Peak Alignment on Sentence Type Identification in Russian

    ERIC Educational Resources Information Center

    Makarova, Veronika

    2007-01-01

    This paper reports the results of an experimental phonetic study examining pitch peak alignment in production and perception of three-syllable one-word sentences with phonetic rising-falling pitch movement by speakers of Russian. The first part of the study (Experiment 1) utilizes 22 one-word three-syllable utterances read by five female speakers…

  9. Differential Recognition of Pitch Patterns in Discrete and Gliding Stimuli in Congenital Amusia: Evidence from Mandarin Speakers

    ERIC Educational Resources Information Center

    Liu, Fang; Xu, Yi; Patel, Aniruddh D.; Francart, Tom; Jiang, Cunmei

    2012-01-01

    This study examined whether "melodic contour deafness" (insensitivity to the direction of pitch movement) in congenital amusia is associated with specific types of pitch patterns (discrete versus gliding pitches) or stimulus types (speech syllables versus complex tones). Thresholds for identification of pitch direction were obtained using discrete…

  10. Automatic Method of Pause Measurement for Normal and Dysarthric Speech

    ERIC Educational Resources Information Center

    Rosen, Kristin; Murdoch, Bruce; Folker, Joanne; Vogel, Adam; Cahill, Louise; Delatycki, Martin; Corben, Louise

    2010-01-01

    This study proposes an automatic method for the detection of pauses and identification of pause types in conversational speech for the purpose of measuring the effects of Friedreich's Ataxia (FRDA) on speech. Speech samples of [approximately] 3 minutes were recorded from 13 speakers with FRDA and 18 healthy controls. Pauses were measured from the…

  11. Effects of Audio-Visual Integration on the Detection of Masked Speech and Non-Speech Sounds

    ERIC Educational Resources Information Center

    Eramudugolla, Ranmalee; Henderson, Rachel; Mattingley, Jason B.

    2011-01-01

    Integration of simultaneous auditory and visual information about an event can enhance our ability to detect that event. This is particularly evident in the perception of speech, where the articulatory gestures of the speaker's lips and face can significantly improve the listener's detection and identification of the message, especially when that…

  12. Response Identification in the Extremely Low Frequency Region of an Electret Condenser Microphone

    PubMed Central

    Jeng, Yih-Nen; Yang, Tzung-Ming; Lee, Shang-Yin

    2011-01-01

    This study shows that a small electret condenser microphone connected to a notebook or a personal computer (PC) has a prominent response in the extremely low frequency region in a specific environment. It confines most acoustic waves within a tiny air cell as follows. The air cell is constructed by drilling a small hole in a digital versatile disk (DVD) plate. A small speaker and an electret condenser microphone are attached to the two sides of the hole. Thus, the acoustic energy emitted by the speaker and reaching the microphone is strong enough to actuate the diaphragm of the latter. The experiments showed that, once small air leakages are allowed on the margin of the speaker, the microphone captured the signal in the range of 0.5 to 20 Hz. Moreover, by removing the plastic cover of the microphone and attaching the microphone head to the vibration surface, the low frequency signal can be effectively captured too. Two examples are included to show the convenience of applying the microphone to pick up the low frequency vibration information of practical systems. PMID:22346594

  13. Response identification in the extremely low frequency region of an electret condenser microphone.

    PubMed

    Jeng, Yih-Nen; Yang, Tzung-Ming; Lee, Shang-Yin

    2011-01-01

    This study shows that a small electret condenser microphone connected to a notebook or a personal computer (PC) has a prominent response in the extremely low frequency region in a specific environment. It confines most acoustic waves within a tiny air cell as follows. The air cell is constructed by drilling a small hole in a digital versatile disk (DVD) plate. A small speaker and an electret condenser microphone are attached to the two sides of the hole. Thus, the acoustic energy emitted by the speaker and reaching the microphone is strong enough to actuate the diaphragm of the latter. The experiments showed that, once small air leakages are allowed on the margin of the speaker, the microphone captured the signal in the range of 0.5 to 20 Hz. Moreover, by removing the plastic cover of the microphone and attaching the microphone head to the vibration surface, the low frequency signal can be effectively captured too. Two examples are included to show the convenience of applying the microphone to pick up the low frequency vibration information of practical systems.

  14. A Comparison of Coverbal Gesture Use in Oral Discourse Among Speakers With Fluent and Nonfluent Aphasia

    PubMed Central

    Law, Sam-Po; Chak, Gigi Wan-Chi

    2017-01-01

    Purpose Coverbal gesture use, which is affected by the presence and degree of aphasia, can be culturally specific. The purpose of this study was to compare gesture use among Cantonese-speaking individuals: 23 neurologically healthy speakers, 23 speakers with fluent aphasia, and 21 speakers with nonfluent aphasia. Method Multimedia data of discourse samples from these speakers were extracted from the Cantonese AphasiaBank. Gestures were independently annotated on their forms and functions to determine how gesturing rate and distribution of gestures differed across speaker groups. A multiple regression was conducted to determine the most predictive variable(s) for gesture-to-word ratio. Results Although speakers with nonfluent aphasia gestured most frequently, the rate of gesture use in counterparts with fluent aphasia did not differ significantly from controls. Different patterns of gesture functions in the 3 speaker groups revealed that gesture plays a minor role in lexical retrieval whereas its role in enhancing communication dominates among the speakers with aphasia. The percentages of complete sentences and dysfluency strongly predicted the gesturing rate in aphasia. Conclusions The current results supported the sketch model of language–gesture association. The relationship between gesture production and linguistic abilities and clinical implications for gesture-based language intervention for speakers with aphasia are also discussed. PMID:28609510

  15. A Comparison of Coverbal Gesture Use in Oral Discourse Among Speakers With Fluent and Nonfluent Aphasia.

    PubMed

    Kong, Anthony Pak-Hin; Law, Sam-Po; Chak, Gigi Wan-Chi

    2017-07-12

    Coverbal gesture use, which is affected by the presence and degree of aphasia, can be culturally specific. The purpose of this study was to compare gesture use among Cantonese-speaking individuals: 23 neurologically healthy speakers, 23 speakers with fluent aphasia, and 21 speakers with nonfluent aphasia. Multimedia data of discourse samples from these speakers were extracted from the Cantonese AphasiaBank. Gestures were independently annotated on their forms and functions to determine how gesturing rate and distribution of gestures differed across speaker groups. A multiple regression was conducted to determine the most predictive variable(s) for gesture-to-word ratio. Although speakers with nonfluent aphasia gestured most frequently, the rate of gesture use in counterparts with fluent aphasia did not differ significantly from controls. Different patterns of gesture functions in the 3 speaker groups revealed that gesture plays a minor role in lexical retrieval whereas its role in enhancing communication dominates among the speakers with aphasia. The percentages of complete sentences and dysfluency strongly predicted the gesturing rate in aphasia. The current results supported the sketch model of language-gesture association. The relationship between gesture production and linguistic abilities and clinical implications for gesture-based language intervention for speakers with aphasia are also discussed.

  16. The Speaker Gender Gap at Critical Care Conferences.

    PubMed

    Mehta, Sangeeta; Rose, Louise; Cook, Deborah; Herridge, Margaret; Owais, Sawayra; Metaxa, Victoria

    2018-06-01

    To review women's participation as faculty at five critical care conferences over 7 years. Retrospective analysis of five scientific programs to identify the proportion of females and each speaker's profession based on conference conveners, program documents, or internet research. Three international (European Society of Intensive Care Medicine, International Symposium on Intensive Care and Emergency Medicine, Society of Critical Care Medicine) and two national (Critical Care Canada Forum, U.K. Intensive Care Society State of the Art Meeting) annual critical care conferences held between 2010 and 2016. Female faculty speakers. None. Male speakers outnumbered female speakers at all five conferences, in all 7 years. Overall, women represented 5-31% of speakers, and female physicians represented 5-26% of speakers. Nursing and allied health professional faculty represented 0-25% of speakers; in general, more than 50% of allied health professionals were women. Over the 7 years, Society of Critical Care Medicine had the highest representation of female (27% overall) and nursing/allied health professional (16-25%) speakers; notably, male physicians substantially outnumbered female physicians in all years (62-70% vs 10-19%, respectively). Women's representation on conference program committees ranged from 0% to 40%, with Society of Critical Care Medicine having the highest representation of women (26-40%). The female proportions of speakers, physician speakers, and program committee members increased significantly over time at the Society of Critical Care Medicine and U.K. Intensive Care Society State of the Art Meeting conferences (p < 0.05), but there was no temporal change at the other three conferences. There is a speaker gender gap at critical care conferences, with male faculty outnumbering female faculty. This gap is more marked among physician speakers than those speakers representing nursing and allied health professionals. Several organizational strategies can address this gender gap.

  17. Effect of Intensive Voice Treatment on Tone-Language Speakers with Parkinson's Disease

    ERIC Educational Resources Information Center

    Whitehill, Tara L.; Wong, Lina L. -N.

    2007-01-01

    The aim of this study was to investigate the effect of intensive voice therapy on Cantonese speakers with Parkinson's disease. The effect of the treatment on lexical tone was of particular interest. Four Cantonese speakers with idiopathic Parkinson's disease received treatment based on the principles of Lee Silverman Voice Treatment (LSVT).…

  18. Development of a speaker discrimination test for cochlear implant users based on the Oldenburg Logatome corpus.

    PubMed

    Mühler, Roland; Ziese, Michael; Rostalski, Dorothea

    2009-01-01

    The purpose of the study was to develop a speaker discrimination test for cochlear implant (CI) users. The speech material was drawn from the Oldenburg Logatome (OLLO) corpus, which contains 150 different logatomes read by 40 German and 10 French native speakers. The prototype test battery included 120 logatome pairs spoken by 5 male and 5 female speakers with balanced representations of the conditions 'same speaker' and 'different speaker'. Ten adult normal-hearing listeners and 12 adult postlingually deafened CI users were included in a study to evaluate the suitability of the test. The mean speaker discrimination score for the CI users was 67.3% correct and for the normal-hearing listeners 92.2% correct. A significant influence of voice gender and fundamental frequency difference on the speaker discrimination score was found in CI users as well as in normal-hearing listeners. Since the test results of the CI users were significantly above chance level and no ceiling effect was observed, we conclude that subsets of the OLLO corpus are very well suited to speaker discrimination experiments in CI users. Copyright 2008 S. Karger AG, Basel.

  19. Speech transformations based on a sinusoidal representation

    NASA Astrophysics Data System (ADS)

    Quatieri, T. E.; McAulay, R. J.

    1986-05-01

    A new speech analysis/synthesis technique is presented which provides the basis for a general class of speech transformations, including time-scale modification, frequency scaling, and pitch modification. These modifications can be performed with a time-varying change, permitting continuous adjustment of a speaker's fundamental frequency and rate of articulation. The method is based on a sinusoidal representation of the speech production mechanism that has been shown to produce synthetic speech that preserves the waveform shape and is essentially perceptually indistinguishable from the original. Although the analysis/synthesis system was originally designed for single-speaker signals, it is equally capable of recovering and modifying nonspeech signals such as music, as well as multiple speakers, marine biological sounds, and speakers in the presence of interference such as noise and musical backgrounds.
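
    The paper's transformations are built on its sinusoidal analysis/synthesis model; as a rough modern stand-in, librosa's phase-vocoder-based effects perform analogous time-scale and pitch modifications. This is not the authors' method, only an illustration of the same class of transformations.

    ```python
    import numpy as np
    import librosa

    # Toy input: a 440 Hz tone stands in for a recorded utterance.
    sr = 16000
    t = np.arange(2 * sr) / sr
    y = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

    # Slow the articulation rate (rate < 1 lengthens) without changing pitch...
    slower = librosa.effects.time_stretch(y, rate=0.8)
    # ...and raise the fundamental frequency by 4 semitones without changing duration.
    higher = librosa.effects.pitch_shift(y, sr=sr, n_steps=4)

    print(len(y), len(slower), len(higher))
    ```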

  20. Gender differences in identifying emotions from auditory and visual stimuli.

    PubMed

    Waaramaa, Teija

    2017-12-01

    The present study focused on gender differences in emotion identification from auditory and visual stimuli produced by two male and two female actors. Differences in emotion identification from nonsense samples, language samples, and prolonged vowels were investigated. It was also studied whether auditory stimuli can convey the emotional content of speech without visual stimuli, and whether visual stimuli can convey the emotional content of speech without auditory stimuli. The aim was to gain better knowledge of vocal attributes and a more holistic understanding of the nonverbal communication of emotion. Females tended to be more accurate in emotion identification than males. Voice quality parameters played a role in emotion identification in both genders. The emotional content of the samples was best conveyed by nonsense sentences, better than by prolonged vowels or samples in the shared native language of the speakers and participants. Thus, vocal non-verbal communication tends to affect the interpretation of emotion even in the absence of language. The emotional stimuli were better recognized from visual stimuli than auditory stimuli by both genders. Visual information about speech may not be connected to the language; instead, it may be based on the human ability to understand the kinetic movements in speech production more readily than the characteristics of the acoustic cues.

  1. Center for Neural Engineering: applications of pulse-coupled neural networks

    NASA Astrophysics Data System (ADS)

    Malkani, Mohan; Bodruzzaman, Mohammad; Johnson, John L.; Davis, Joel

    1999-03-01

    Pulse-Coupled Neural Network (PCNN) is an oscillatory neural network model in which the grouping of cells, and grouping among those groups, forms the output time series (the number of cells that fire at each input presentation, also called an 'icon'), based on the synchronicity of oscillations. Recent work by Johnson and others demonstrated the functional capabilities of networks containing such elements for invariant feature extraction using intensity maps. The PCNN thus presents itself as a more biologically plausible model with solid functional potential. This paper presents a summary of several projects, and their results, in which we successfully applied the PCNN. In project one, the PCNN was applied to object recognition and classification through a robotic vision system. The features (icons) generated by the PCNN were then fed into a feedforward neural network for classification. In project two, we developed techniques for sensory data fusion. The PCNN algorithm was implemented and tested on a B14 mobile robot. The PCNN-based features were extracted from images taken from the robot vision system and used in conjunction with the map generated by data fusion of the sonar and wheel-encoder data for navigation of the mobile robot. In our third project, we applied the PCNN to speaker recognition. Spectrogram images of speech signals are fed into the PCNN to produce invariant feature icons, which are then fed into a feedforward neural network for speaker identification.
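
    A minimal numpy sketch of a PCNN producing the 'icon' time series, i.e., the count of firing neurons per iteration. The update rules follow the standard Eckhorn/Johnson-style formulation, and all constants are illustrative assumptions rather than the projects' settings.

    ```python
    import numpy as np
    from scipy.ndimage import uniform_filter

    def pcnn_icon(image, steps=50, beta=0.2, v_theta=20.0,
                  decay_f=0.7, decay_l=0.7, decay_t=0.8):
        """Minimal PCNN; returns the 'icon' (number of neurons firing per step)."""
        s = image / (image.max() + 1e-10)      # normalized external stimulus
        f = np.zeros_like(s)                   # feeding compartment
        lk = np.zeros_like(s)                  # linking compartment
        th = np.ones_like(s)                   # dynamic threshold
        y = np.zeros_like(s)                   # binary firing state
        icon = []
        for _ in range(steps):
            coupling = uniform_filter(y, size=3)   # local coupling of recent firings
            f = decay_f * f + s + coupling
            lk = decay_l * lk + coupling
            u = f * (1.0 + beta * lk)              # internal activity
            y = (u > th).astype(float)             # pulse generation
            th = decay_t * th + v_theta * y        # threshold jumps after a firing
            icon.append(int(y.sum()))
        return icon

    img = np.zeros((64, 64))
    img[16:48, 16:48] = 1.0                        # toy "object" in the image
    print(pcnn_icon(img)[:10])                     # periodic bursts encode the input
    ```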

  2. Kalman Filters for Time Delay of Arrival-Based Source Localization

    NASA Astrophysics Data System (ADS)

    Klee, Ulrich; Gehrig, Tobias; McDonough, John

    2006-12-01

    In this work, we propose an algorithm for acoustic source localization based on time delay of arrival (TDOA) estimation. In earlier work by other authors, an initial closed-form approximation was first used to estimate the true position of the speaker followed by a Kalman filtering stage to smooth the time series of estimates. In the proposed algorithm, this closed-form approximation is eliminated by employing a Kalman filter to directly update the speaker's position estimate based on the observed TDOAs. In particular, the TDOAs comprise the observation associated with an extended Kalman filter whose state corresponds to the speaker's position. We tested our algorithm on a data set consisting of seminars held by actual speakers. Our experiments revealed that the proposed algorithm provides source localization accuracy superior to the standard spherical and linear intersection techniques. Moreover, the proposed algorithm, although relying on an iterative optimization scheme, proved efficient enough for real-time operation.
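
    The following is a minimal sketch of such an extended Kalman filter update, assuming the state is simply the speaker's 3-D position and the observations are TDOAs for given microphone pairs; the noise variances and the random-walk prediction model are illustrative assumptions, not values from the paper.

        import numpy as np

        C = 343.0  # speed of sound, m/s

        def ekf_tdoa_update(x, P, tdoas, mic_pairs, mics, r_var=1e-9, q_var=1e-4):
            """One EKF step refining a speaker position x (3-vector, covariance
            P) from observed TDOAs; mic_pairs lists (i, j) index pairs into
            mics, an (n_mics, 3) array of microphone positions."""
            P = P + q_var * np.eye(3)                      # random-walk prediction
            h = np.empty(len(mic_pairs))
            H = np.empty((len(mic_pairs), 3))
            for k, (i, j) in enumerate(mic_pairs):
                di, dj = x - mics[i], x - mics[j]
                ri, rj = np.linalg.norm(di), np.linalg.norm(dj)
                h[k] = (ri - rj) / C                       # predicted TDOA
                H[k] = (di / ri - dj / rj) / C             # Jacobian row
            S = H @ P @ H.T + r_var * np.eye(len(mic_pairs))
            K = P @ H.T @ np.linalg.inv(S)                 # Kalman gain
            x = x + K @ (np.asarray(tdoas) - h)            # correct with TDOA residual
            P = (np.eye(3) - K @ H) @ P
            return x, P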

  3. Compound nouns in spoken language production by speakers with aphasia compared to neurologically healthy speakers: an exploratory study.

    PubMed

    Eiesland, Eli Anne; Lind, Marianne

    2012-03-01

    Compounds are words made up of at least two other words (lexemes); they exhibit both lexical and syntactic characteristics and are thus particularly interesting for the study of language processing. Most studies of compounds and language processing have been based on data from experimental single-word production and comprehension tasks. To enhance the ecological validity of morphological processing research, data from other contexts, such as discourse production, need to be considered. This study investigates the production of nominal compounds in semi-spontaneous spoken texts by a group of speakers with fluent types of aphasia compared to a group of neurologically healthy speakers. The speakers with aphasia produce significantly fewer nominal compound types in their texts than the non-aphasic speakers, and the compounds they produce exhibit fewer different types of semantic relations than the compounds produced by the non-aphasic speakers. The results are discussed in relation to theories of language processing.

  4. Analysis of Acoustic Features in Speakers with Cognitive Disorders and Speech Impairments

    NASA Astrophysics Data System (ADS)

    Saz, Oscar; Simón, Javier; Rodríguez, W. Ricardo; Lleida, Eduardo; Vaquero, Carlos

    2009-12-01

    This work presents an analysis of the acoustic features (formants and the three suprasegmental features: tone, intensity, and duration) of vowel production in a group of 14 young speakers suffering from different kinds of speech impairments due to physical and cognitive disorders. A corpus of unimpaired children's speech is used to determine reference values for these features in speakers without any kind of speech impairment within the same domain as the impaired speakers, namely 57 isolated words. The signal processing to extract formant and pitch values is based on a Linear Prediction Coefficient (LPC) analysis of the segments labeled as vowels by a Hidden Markov Model (HMM)-based Viterbi forced alignment. Intensity and duration are also based on the outcome of the automated segmentation. As the main conclusion of the work, it is shown that the intelligibility of vowel production is lowered in impaired speakers even when the vowel is perceived as correct by human labelers. The decrease in intelligibility is due to a 30% increase in confusability in the formant map, a 50% reduction in the discriminative power of energy between stressed and unstressed vowels, and a 50% increase in the standard deviation of vowel length. On the other hand, impaired speakers keep good control of tone in the production of stressed and unstressed vowels.
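
    A minimal sketch of the LPC-based formant extraction step, assuming a voiced frame has already been located by the forced alignment: LPC coefficients are fitted by the autocorrelation method and formant candidates are read off the angles of the complex pole pairs. Bandwidth-based pole filtering, which a production system would add, is omitted here.

        import numpy as np
        from scipy.linalg import solve_toeplitz

        def formants_lpc(frame, sr=16000, order=10):
            """Estimate formant frequencies of one voiced frame from the pole
            angles of an LPC model fitted by the autocorrelation method."""
            frame = frame * np.hamming(len(frame))
            r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            a = solve_toeplitz(r[:order], r[1:order + 1])  # LPC coefficients
            roots = np.roots(np.concatenate(([1.0], -a)))  # poles of 1/A(z)
            roots = roots[np.imag(roots) > 0]              # one of each conjugate pair
            freqs = np.angle(roots) * sr / (2 * np.pi)     # pole angle -> Hz
            return np.sort(freqs[freqs > 90])              # drop near-DC poles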

  5. Segmentation of the Speaker's Face Region with Audiovisual Correlation

    NASA Astrophysics Data System (ADS)

    Liu, Yuyu; Sato, Yoichi

    The ability to find the speaker's face region in a video is useful for various applications. In this work, we develop a novel technique to find this region within different time windows that is robust against changes of view, scale, and background. The main thrust of our technique is to integrate audiovisual correlation analysis into a video segmentation framework. We analyze the audiovisual correlation locally by computing the quadratic mutual information between our audiovisual features. The computation of quadratic mutual information is based on probability density functions estimated by kernel density estimation with adaptive kernel bandwidth. The results of this audiovisual correlation analysis are incorporated into graph-cut-based video segmentation to obtain a globally optimal extraction of the speaker's face region. The setting of any heuristic threshold in this segmentation is avoided by learning the correlation distributions of speaker and background via expectation maximization. Experimental results demonstrate that our method can detect the speaker's face region accurately and robustly for different views, scales, and backgrounds.
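
    For reference, a minimal estimator of the quadratic mutual information between two aligned 1-D feature sequences using Parzen windows, following the Euclidean-distance QMI decomposition; the fixed kernel bandwidth here stands in for the adaptive bandwidth used in the paper.

        import numpy as np

        def _gram(z, sigma):
            d = z[:, None] - z[None, :]
            return np.exp(-d ** 2 / (2 * sigma ** 2))      # Gaussian kernel matrix

        def quadratic_mi(x, y, sigma=1.0):
            """Euclidean-distance QMI between two aligned 1-D feature
            sequences, estimated with Parzen windows:
            I = V_joint + V_marginal - 2 * V_cross."""
            gx = _gram(np.asarray(x, dtype=float), sigma)
            gy = _gram(np.asarray(y, dtype=float), sigma)
            v_joint = np.mean(gx * gy)                     # joint information potential
            v_marg = np.mean(gx) * np.mean(gy)             # product of marginals
            v_cross = np.mean(gx.mean(axis=1) * gy.mean(axis=1))
            return v_joint + v_marg - 2 * v_cross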

  6. Phraseology and Frequency of Occurrence on the Web: Native Speakers' Perceptions of Google-Informed Second Language Writing

    ERIC Educational Resources Information Center

    Geluso, Joe

    2013-01-01

    Usage-based theories of language learning suggest that native speakers of a language are acutely aware of formulaic language due in large part to frequency effects. Corpora and data-driven learning can offer useful insights into frequent patterns of naturally occurring language to second/foreign language learners who, unlike native speakers, are…

  7. Automatic speech recognition and training for severely dysarthric users of assistive technology: the STARDUST project.

    PubMed

    Parker, Mark; Cunningham, Stuart; Enderby, Pam; Hawley, Mark; Green, Phil

    2006-01-01

    The STARDUST project developed robust computer speech recognizers for use by eight people with severe dysarthria and concomitant physical disability to access assistive technologies. Independent computer speech recognizers trained with normal speech are of limited functional use to those with severe dysarthria, due to the limited and inconsistent proximity of their speech to "normal" articulatory patterns. Severe dysarthric output may also be characterized by a small set of distinguishable phonetic tokens, making the acoustic differentiation of target words difficult. Speaker-dependent computer speech recognition using Hidden Markov Models was achieved by identifying robust phonetic elements within the individual speaker's output patterns. A new system of speech training using computer-generated visual and auditory feedback reduced the inconsistent production of key phonetic tokens over time.

  8. A fundamental residue pitch perception bias for tone language speakers

    NASA Astrophysics Data System (ADS)

    Petitti, Elizabeth

    A complex tone composed of only higher-order harmonics typically elicits a pitch percept equivalent to the tone's missing fundamental frequency (f0). When judging the direction of residue pitch change between two such tones, however, listeners may have completely opposite perceptual experiences depending on whether they are biased to perceive changes based on the overall spectrum or the missing f0 (harmonic spacing). Individual differences in residue pitch change judgments are reliable and have been associated with musical experience and functional neuroanatomy. Tone languages put greater pitch processing demands on their speakers than non-tone languages, and we investigated whether these lifelong differences in linguistic pitch processing affect listeners' bias for residue pitch. We asked native tone language speakers and native English speakers to perform a pitch judgment task for two tones with missing fundamental frequencies. Given tone pairs with ambiguous pitch changes, listeners were asked to judge the direction of pitch change, where the direction of their response indicated whether they attended to the overall spectrum (exhibiting a spectral bias) or the missing f0 (exhibiting a fundamental bias). We found that tone language speakers are significantly more likely to perceive pitch changes based on the missing f0 than English speakers. These results suggest that tone-language speakers' privileged experience with linguistic pitch fundamentally tunes their basic auditory processing.
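
    A small sketch of how such stimuli can be constructed (the frequencies below are illustrative, not the study's actual parameters): each tone contains only higher harmonics, and the pair is arranged so the missing f0 and the spectral envelope move in opposite directions.

        import numpy as np

        SR = 44100

        def complex_tone(f0, harmonics, dur=0.5):
            """A tone containing only the listed harmonics of f0, so the
            fundamental itself is absent from the spectrum."""
            t = np.arange(int(SR * dur)) / SR
            return sum(np.sin(2 * np.pi * f0 * h * t) for h in harmonics) / len(harmonics)

        # Tone A: harmonics 4-6 of 200 Hz -> components at 800, 1000, 1200 Hz.
        # Tone B: harmonics 3-4 of 250 Hz -> components at 750, 1000 Hz.
        # The missing f0 rises (200 -> 250 Hz) while the spectrum's lower edge
        # falls (800 -> 750 Hz); reporting "up" vs. "down" reveals whether a
        # listener tracks the fundamental or the spectrum.
        tone_a = complex_tone(200.0, [4, 5, 6])
        tone_b = complex_tone(250.0, [3, 4])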

  9. The Mechanism of Speech Processing in Congenital Amusia: Evidence from Mandarin Speakers

    PubMed Central

    Liu, Fang; Jiang, Cunmei; Thompson, William Forde; Xu, Yi; Yang, Yufang; Stewart, Lauren

    2012-01-01

    Congenital amusia is a neuro-developmental disorder of pitch perception that causes severe problems with music processing but only subtle difficulties in speech processing. This study investigated speech processing in a group of Mandarin speakers with congenital amusia. Thirteen Mandarin amusics and thirteen matched controls participated in a set of tone and intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance on word discrimination in natural speech and their gliding tone analogs. They also performed worse than controls on discriminating gliding tone sequences derived from statements and questions, and showed elevated thresholds for pitch change detection and pitch direction discrimination. However, they performed as well as controls on word identification, and on statement-question identification and discrimination in natural speech. Overall, tasks that involved multiple acoustic cues to communicative meaning were not impacted by amusia. Only when the tasks relied mainly on pitch sensitivity did amusics show impaired performance compared to controls. These findings help explain why amusia only affects speech processing in subtle ways. Further studies on a larger sample of Mandarin amusics and on amusics of other language backgrounds are needed to consolidate these results. PMID:22347374

  11. Voice recognition through phonetic features with Punjabi utterances

    NASA Astrophysics Data System (ADS)

    Kaur, Jasdeep; Juglan, K. C.; Sharma, Vishal; Upadhyay, R. K.

    2017-07-01

    This paper deals with the perception and disorders of speech in the Punjabi language. Given the importance of voice identification, various parameters of speaker identification have been studied. The speech material was recorded with a tape recorder in the subjects' normal and disguised modes of utterance. From the recorded speech materials, the utterances free from noise were selected for auditory and acoustic spectrographic analysis. The comparison of normal and disguised speech of seven subjects is reported. The fundamental frequency (F0) at similar places, plosive duration at certain phonemes, and the amplitude ratio (A1:A2) were compared in normal and disguised speech. It was found that the formant frequency of normal and disguised speech remains almost the same only if it is compared at positions of the same vowel quality and quantity. If the vowel is more closed or more open in the disguised utterance, the formant frequency changes in comparison to the normal utterance. The ratio of the amplitudes (A1:A2) is found to be speaker dependent and remains unchanged in the disguised utterance, although this value may shift if cross-sectioning is not done at the same location.

  12. Uphill and Downhill in a Flat World: The Conceptual Topography of the Yupno House

    ERIC Educational Resources Information Center

    Cooperrider, Kensy; Slotta, James; Núñez, Rafael

    2017-01-01

    Speakers of many languages around the world rely on body-based contrasts (e.g., "left/right") for spatial communication and cognition. Speakers of Yupno, a language of Papua New Guinea's mountainous interior, rely instead on an environment-based "uphill/downhill" contrast. Body-based contrasts are as easy to use indoors as…

  13. Talker and accent variability effects on spoken word recognition

    NASA Astrophysics Data System (ADS)

    Nyang, Edna E.; Rogers, Catherine L.; Nishi, Kanae

    2003-04-01

    A number of studies have shown that words in a list are recognized less accurately in noise and with longer response latencies when they are spoken by multiple talkers, rather than a single talker. These results have been interpreted as support for an exemplar-based model of speech perception, in which it is assumed that detailed information regarding the speaker's voice is preserved in memory and used in recognition, rather than being eliminated via normalization. In the present study, the effects of varying both accent and talker are investigated using lists of words spoken by (a) a single native English speaker, (b) six native English speakers, (c) three native English speakers and three Japanese-accented English speakers. Twelve /hVd/ words were mixed with multi-speaker babble at three signal-to-noise ratios (+10, +5, and 0 dB) to create the word lists. Native English-speaking listeners' percent-correct recognition for words produced by native English speakers across the three talker conditions (single talker native, multi-talker native, and multi-talker mixed native and non-native) and three signal-to-noise ratios will be compared to determine whether sources of speaker variability other than voice alone add to the processing demands imposed by simple (i.e., single accent) speaker variability in spoken word recognition.

  14. Physiological Indices of Bilingualism: Oral-Motor Coordination and Speech Rate in Bengali-English Speakers

    ERIC Educational Resources Information Center

    Chakraborty, Rahul; Goffman, Lisa; Smith, Anne

    2008-01-01

    Purpose: To examine how age of immersion and proficiency in a 2nd language influence speech movement variability and speaking rate in both a 1st language and a 2nd language. Method: A group of 21 Bengali-English bilingual speakers participated. Lip and jaw movements were recorded. For all 21 speakers, lip movement variability was assessed based on…

  15. An Analysis of Speech Disfluencies of Turkish Speakers Based on Age Variable

    ERIC Educational Resources Information Center

    Altiparmak, Ayse; Kuruoglu, Gülmira

    2018-01-01

    The focus of this research is to verify the influence of the age variable on fluent Turkish native speakers' production of the various types of speech disfluencies. To accomplish this, four groups of native speakers of Turkish were constructed: ages 4-8, 18-23, and 33-50 years, and over 50 years old. A total of 84 participants…

  16. Language and thought in Bilinguals: The Case of Grammatical Number and Nonverbal Classification Preferences

    ERIC Educational Resources Information Center

    Athanasopoulos, Panos; Kasai, Chise

    2008-01-01

    Recent research shows that speakers of languages with obligatory plural marking (English) preferentially categorize objects based on common shape, whereas speakers of nonplural-marking classifier languages (Yucatec and Japanese) preferentially categorize objects based on common material. The current study extends that investigation to the domain…

  17. Discrimination of speaker sex and size when glottal-pulse rate and vocal-tract length are controlled.

    PubMed

    Smith, David R R; Walters, Thomas C; Patterson, Roy D

    2007-12-01

    A recent study [Smith and Patterson, J. Acoust. Soc. Am. 118, 3177-3186 (2005)] demonstrated that both the glottal-pulse rate (GPR) and the vocal-tract length (VTL) of vowel sounds have a large effect on the perceived sex and age (or size) of a speaker. The vowels for all of the "different" speakers in that study were synthesized from recordings of the sustained vowels of one adult male speaker. This paper presents a follow-up study in which a range of vowels were synthesized from recordings of four different speakers--an adult man, an adult woman, a young boy, and a young girl--to determine whether the sex and age of the original speaker would have an effect upon listeners' judgments of whether a vowel was spoken by a man, woman, boy, or girl, after they were equated for GPR and VTL. The sustained vowels of the four speakers were scaled to produce the same combinations of GPR and VTL, which covered the entire range normally encountered in everyday life. The results show that listeners readily distinguish children from adults based on their sustained vowels but that they struggle to distinguish the sex of the speaker.

  18. Patterns of lung volume use during an extemporaneous speech task in persons with Parkinson disease.

    PubMed

    Bunton, Kate

    2005-01-01

    This study examined patterns of lung volume use in speakers with Parkinson disease (PD) during an extemporaneous speaking task. The performance of a control group was also examined. The behaviors described are based on acoustic, kinematic, and linguistic measures. Group differences were found in breath group duration, lung volume initiation, and lung volume termination measures. Speakers in the control group alternated between longer and shorter breath groups, with starting lung volumes being higher for the longer breath groups and lower for the shorter ones. Speech production was terminated before reaching the tidal end-expiratory level (EEL). This pattern was also seen in 4 of 7 speakers with PD. The remaining 3 PD speakers initiated speech at low starting lung volumes and continued speaking below EEL. This subgroup of PD speakers ended breath groups at agrammatical boundaries, whereas control speakers ended at appropriate grammatical boundaries. As a result of participating in this exercise, the reader will (1) be able to describe the patterns of lung volume use in speakers with Parkinson disease and compare them with those employed by control speakers; and (2) obtain information about the influence of speaking task on speech breathing.

  19. Degraded Vowel Acoustics and the Perceptual Consequences in Dysarthria

    NASA Astrophysics Data System (ADS)

    Lansford, Kaitlin L.

    Distorted vowel production is a hallmark characteristic of dysarthric speech, irrespective of the underlying neurological condition or dysarthria diagnosis. A variety of acoustic metrics have been used to study the nature of vowel production deficits in dysarthria; however, not all demonstrate sensitivity to the exhibited deficits. Less attention has been paid to quantifying the vowel production deficits associated with the specific dysarthrias. Attempts to characterize the relationship between naturally degraded vowel production in dysarthria with overall intelligibility have met with mixed results, leading some to question the nature of this relationship. It has been suggested that aberrant vowel acoustics may be an index of overall severity of the impairment and not an "integral component" of the intelligibility deficit. A limitation of previous work detailing perceptual consequences of disordered vowel acoustics is that overall intelligibility, not vowel identification accuracy, has been the perceptual measure of interest. A series of three experiments were conducted to address the problems outlined herein. The goals of the first experiment were to identify subsets of vowel metrics that reliably distinguish speakers with dysarthria from non-disordered speakers and differentiate the dysarthria subtypes. Vowel metrics that capture vowel centralization and reduced spectral distinctiveness among vowels differentiated dysarthric from non-disordered speakers. Vowel metrics generally failed to differentiate speakers according to their dysarthria diagnosis. The second and third experiments were conducted to evaluate the relationship between degraded vowel acoustics and the resulting percept. In the second experiment, correlation and regression analyses revealed vowel metrics that capture vowel centralization and distinctiveness and movement of the second formant frequency were most predictive of vowel identification accuracy and overall intelligibility. The third experiment was conducted to evaluate the extent to which the nature of the acoustic degradation predicts the resulting percept. Results suggest distinctive vowel tokens are better identified and, likewise, better-identified tokens are more distinctive. Further, an above-chance level agreement between nature of vowel misclassification and misidentification errors was demonstrated for all vowels, suggesting degraded vowel acoustics are not merely an index of severity in dysarthria, but rather are an integral component of the resultant intelligibility disorder.

  20. Non-Native Speaker Interaction Management Strategies in a Network-Based Virtual Environment

    ERIC Educational Resources Information Center

    Peterson, Mark

    2008-01-01

    This article investigates the dyad-based communication of two groups of non-native speakers (NNSs) of English involved in real time interaction in a type of text-based computer-mediated communication (CMC) tool known as a MOO. The object of this semester long study was to examine the ways in which the subjects managed their L2 interaction during…

  1. Social dominance orientation, nonnative accents, and hiring recommendations.

    PubMed

    Hansen, Karolina; Dovidio, John F

    2016-10-01

    Discrimination against nonnative speakers is widespread and largely socially acceptable. Nonnative speakers are evaluated negatively because accent is a sign that they belong to an outgroup and because understanding their speech requires unusual effort from listeners. The present research investigated intergroup bias, based on stronger support for hierarchical relations between groups (social dominance orientation [SDO]), as a predictor of hiring recommendations of nonnative speakers. In an online experiment using an adaptation of the thin-slices methodology, 65 U.S. adults (54% women; 80% White; mean age = 35.91, range = 18-67) heard a recording of a job applicant speaking with an Asian (Mandarin Chinese) or a Latino (Spanish) accent. Participants indicated how likely they would be to recommend hiring the speaker, answered questions about the text, and indicated how difficult it was to understand the applicant. Independent of objective comprehension, participants high in SDO reported that it was more difficult to understand a Latino speaker than an Asian speaker. SDO predicted hiring recommendations of the speakers, but this relationship was mediated by the perception that nonnative speakers were difficult to understand. This effect was stronger for speakers from lower status groups (Latinos relative to Asians) and was not related to objective comprehension. These findings suggest a cycle of prejudice toward nonnative speakers: Not only do perceptions of difficulty in understanding cause prejudice toward them, but also prejudice toward low-status groups can lead to perceived difficulty in understanding members of these groups.

  2. Speaker diarization system on the 2007 NIST rich transcription meeting recognition evaluation

    NASA Astrophysics Data System (ADS)

    Sun, Hanwu; Nwe, Tin Lay; Koh, Eugene Chin Wei; Bin, Ma; Li, Haizhou

    2007-09-01

    This paper presents a speaker diarization system developed at the Institute for Infocomm Research (I2R) for the NIST Rich Transcription 2007 (RT-07) evaluation task. We describe in detail our primary approaches for speaker diarization under the Multiple Distant Microphones (MDM) condition in the conference room scenario. Our proposed system consists of six modules: (1) a normalized least-mean-square (NLMS) adaptive filter for speaker direction estimation via the time difference of arrival (TDOA); (2) initial speaker clustering via a two-stage TDOA histogram distribution quantization approach; (3) multiple-microphone speaker data alignment via GCC-PHAT time delay estimation (TDE) among all the distant microphone channel signals; (4) a speaker clustering algorithm based on a GMM modeling approach; (5) non-speech removal via a speech/non-speech verification mechanism; and (6) silence removal via a "Double-Layer Windowing" (DLW) method. We achieve an error rate of 31.02% on the 2006 Spring (RT-06s) MDM evaluation task and a competitive overall error rate of 15.32% on the NIST Rich Transcription 2007 (RT-07) MDM evaluation task.
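
    Module (3) relies on GCC-PHAT time delay estimation, which can be sketched in a few lines of Python; the regularizing epsilon and the search-window handling below are common implementation choices, not details from the paper.

        import numpy as np

        def gcc_phat(sig, ref, sr, max_tau=None):
            """Time delay of `sig` relative to `ref` via GCC-PHAT; the phase
            transform whitens the spectrum so the correlation peak stays
            sharp under reverberation."""
            n = len(sig) + len(ref)                        # zero-pad against wrap-around
            X = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
            cc = np.fft.irfft(X / (np.abs(X) + 1e-12), n=n)  # keep phase only
            max_shift = n // 2 if max_tau is None else min(int(sr * max_tau), n // 2)
            cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
            return (np.argmax(np.abs(cc)) - max_shift) / sr  # delay in seconds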

  3. Vocal Identity Recognition in Autism Spectrum Disorder

    PubMed Central

    Lin, I-Fan; Yamada, Takashi; Komine, Yoko; Kato, Nobumasa; Kato, Masaharu; Kashino, Makio

    2015-01-01

    Voices can convey information about a speaker. When forming an abstract representation of a speaker, it is important to extract relevant features from acoustic signals that are invariant to the modulation of these signals. This study investigated the way in which individuals with autism spectrum disorder (ASD) recognize and memorize vocal identity. The ASD group and control group performed similarly in a task when asked to choose the name of the newly-learned speaker based on his or her voice, and the ASD group outperformed the control group in a subsequent familiarity test when asked to discriminate the previously trained voices and untrained voices. These findings suggest that individuals with ASD recognized and memorized voices as well as the neurotypical individuals did, but they categorized voices in a different way: individuals with ASD categorized voices quantitatively based on the exact acoustic features, while neurotypical individuals categorized voices qualitatively based on the acoustic patterns correlated to the speakers' physical and mental properties. PMID:26070199

  5. Parallel Processing of Large Scale Microphone Arrays for Sound Capture

    NASA Astrophysics Data System (ADS)

    Jan, Ea-Ee.

    1995-01-01

    Performance of microphone sound pickup is degraded by deleterious properties of the acoustic environment, such as multipath distortion (reverberation) and ambient noise. The degradation becomes more prominent in a teleconferencing environment in which the microphone is positioned far away from the speaker. Moreover, the ideal teleconference should feel as easy and natural as face-to-face communication with another person. This suggests hands-free sound capture with no tether or encumbrance from hand-held or body-worn sound equipment. Microphone arrays represent an appropriate approach for this application. This research develops new microphone array and signal processing techniques for high-quality hands-free sound capture in noisy, reverberant enclosures. The new techniques combine matched filtering of individual sensors with parallel processing to provide acute spatial volume selectivity, which is capable of mitigating the deleterious effects of noise interference and multipath distortion. The new method outperforms traditional delay-and-sum beamformers, which provide only directional spatial selectivity. The research additionally explores truncated matched filtering and random distribution of transducers to reduce complexity and improve sound capture quality. All designs are first established by computer simulation of array performance in reverberant enclosures. The simulation is achieved by a room model which can efficiently calculate the acoustic multipath in a rectangular enclosure up to a prescribed order of images; it also calculates the incident angle of the arriving signal. Experimental arrays were constructed and their performance was measured in real rooms. Real-room data were collected in a hard-walled laboratory and a controllable variable-acoustics enclosure of similar size, approximately 6 x 6 x 3 m. An extensive speech database was also collected in these two enclosures for future research on microphone arrays. The simulation results are shown to be consistent with the real-room data. Localization of sound sources has been explored using cross-power spectrum time delay estimation and has been evaluated using real-room data under slightly, moderately, and highly reverberant conditions. To improve the accuracy and reliability of the source localization, an outlier detector that removes incorrect time delay estimates has been developed. To provide speaker selectivity for microphone array systems, a hands-free speaker identification system has been studied. A recently invented feature using selected spectrum information outperforms traditional recognition methods. Measured results demonstrate the speaker selectivity capabilities of a matched-filtered array. In addition, simulation utilities, including matched-filter processing of the array and hands-free speaker identification, have been implemented on the massively parallel nCube supercomputer. This parallel computation highlights the requirements for real-time processing of array signals.
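
    As a baseline against which the matched-filter approach is compared, a minimal delay-and-sum beamformer can be sketched as follows, assuming known microphone and source positions; delays are applied as linear phase shifts in the frequency domain.

        import numpy as np

        C = 343.0  # speed of sound, m/s

        def delay_and_sum(channels, mics, source, sr):
            """Align each channel on the direct path from `source` and average.
            channels: (n_mics, n_samples); mics: (n_mics, 3) and source: (3,)
            positions in metres."""
            dists = np.linalg.norm(mics - source, axis=1)
            delays = (dists - dists.min()) / C             # relative delays, seconds
            n = channels.shape[1]
            freqs = np.fft.rfftfreq(n, d=1.0 / sr)
            out = np.zeros(n)
            for ch, d in zip(channels, delays):
                # advance the channel by its relative delay (linear phase shift)
                out += np.fft.irfft(np.fft.rfft(ch) * np.exp(2j * np.pi * freqs * d), n=n)
            return out / len(channels)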

  6. Speech and Swallowing in Parkinson’s Disease

    PubMed Central

    Tjaden, Kris

    2009-01-01

    Dysarthria and dysphagia occur frequently in Parkinson’s disease (PD). Reduced speech intelligibility is a significant functional limitation of dysarthria, and in the case of PD it is likely related to articulatory and phonatory impairment. Prosodically-based treatments show the most promise for addressing these deficits as well as for maximizing speech intelligibility. Communication-oriented strategies may also help to enhance mutual understanding between a speaker and listener. Dysphagia in PD can result in serious health issues, including aspiration pneumonia, malnutrition, and dehydration. Early identification of swallowing abnormalities is critical so as to minimize the impact of dysphagia on health status and quality of life. Feeding modifications, compensatory strategies, and therapeutic swallowing techniques all have a role in the management of dysphagia in PD. PMID:19946386

  7. Coupled Electro-Magneto-Mechanical-Acoustic Analysis Method Developed by Using 2D Finite Element Method for Flat Panel Speaker Driven by Magnetostrictive-Material-Based Actuator

    NASA Astrophysics Data System (ADS)

    Yoo, Byungjin; Hirata, Katsuhiro; Oonishi, Atsurou

    In this study, a coupled analysis method for flat panel speakers driven by a giant magnetostrictive material (GMM)-based actuator was developed. The sound field produced by such a speaker depends on the vibration of the flat panel, which results from the magnetostrictive property of the GMM. In this case, to predict the sound pressure level (SPL) in the audio-frequency range, it is necessary to take into account not only the magnetostrictive property of the GMM but also the effect of eddy currents and the vibration characteristics of the actuator and the flat panel. In this paper, a coupled electromagnetic-structural-acoustic analysis method is presented; this method was developed using the finite element method (FEM) and is used to predict the performance of a flat panel speaker in the audio-frequency range. The validity of the analysis method is verified by comparison with measurement results from a prototype speaker.

  8. You had me at "Hello": Rapid extraction of dialect information from spoken words.

    PubMed

    Scharinger, Mathias; Monahan, Philip J; Idsardi, William J

    2011-06-15

    Research on the neuronal underpinnings of speaker identity recognition has identified voice-selective areas in the human brain with evolutionary homologues in non-human primates who have comparable areas for processing species-specific calls. Most studies have focused on estimating the extent and location of these areas. In contrast, relatively few experiments have investigated the time-course of speaker identity, and in particular, dialect processing and identification by electro- or neuromagnetic means. We show here that dialect extraction occurs speaker-independently, pre-attentively and categorically. We used Standard American English and African-American English exemplars of 'Hello' in a magnetoencephalographic (MEG) Mismatch Negativity (MMN) experiment. The MMN as an automatic change detection response of the brain reflected dialect differences that were not entirely reducible to acoustic differences between the pronunciations of 'Hello'. Source analyses of the M100, an auditory evoked response to the vowels suggested additional processing in voice-selective areas whenever a dialect change was detected. These findings are not only relevant for the cognitive neuroscience of language, but also for the social sciences concerned with dialect and race perception.

  9. Development of equally intelligible Telugu sentence-lists to test speech recognition in noise.

    PubMed

    Tanniru, Kishore; Narne, Vijaya Kumar; Jain, Chandni; Konadath, Sreeraj; Singh, Niraj Kumar; Sreenivas, K J Ramadevi; K, Anusha

    2017-09-01

    To develop sentence lists in the Telugu language for the assessment of the speech recognition threshold (SRT) in the presence of background noise, through identification of the mean signal-to-noise ratio required to attain a 50% sentence recognition score (SRTn). This study was conducted in three phases. The first phase involved the selection and recording of Telugu sentences. In the second phase, 20 lists, each consisting of 10 sentences with equal intelligibility, were formulated using a numerical optimisation procedure. In the third phase, the SRTn of the developed lists was estimated using adaptive procedures on individuals with normal hearing. A total of 68 native Telugu speakers with normal hearing participated in the study. Of these, 18 (including the speakers) performed various subjective measures in the first phase, 20 performed sentence/word recognition in noise in the second phase, and 30 participated in the list-equivalency procedures in the third phase. In all, 15 lists of comparable difficulty were formulated as test material. The mean SRTn across these lists was -2.74 dB (SD = 0.21). The developed sentence lists provide a valid and reliable tool to measure SRTn in native Telugu speakers.
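
    The adaptive procedure in the third phase can be illustrated with a simple 1-up/1-down staircase, which converges on the 50% recognition point; the step size, trial count, and the present_sentence callback below are illustrative assumptions, not the study's exact protocol.

        def srt50_staircase(present_sentence, n_trials=30, start_snr=0.0, step=2.0):
            """Track the SNR giving 50% sentence recognition with a 1-up/1-down
            rule: a correct response lowers the SNR, an error raises it.

            present_sentence(snr_db) -> bool is assumed to play one sentence at
            the given SNR and report whether it was repeated correctly."""
            snr, reversals, last = start_snr, [], None
            for _ in range(n_trials):
                correct = present_sentence(snr)
                if last is not None and correct != last:
                    reversals.append(snr)                  # track direction changes
                last = correct
                snr += -step if correct else step
            tail = reversals[-6:] or [snr]                 # average late reversals
            return sum(tail) / len(tail)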

  10. Classifying Human Voices by Using Hybrid SFX Time-Series Preprocessing and Ensemble Feature Selection

    PubMed Central

    Wong, Raymond

    2013-01-01

    Voice is one kind of physiological biometric characteristic: it is different for each individual person. Due to this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), emotional state, identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX) for extracting important features for training a classification model, based on piecewise transformation treating an audio waveform as a time series. Using SFX we can faithfully remodel the statistical characteristics of the time series; together with spectral analysis, a substantial number of features are extracted in combination. An ensemble is used to select only the influential features for classification model induction. We focus on comparing the effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results supporting the fact that heuristically choosing significant features from both the time and frequency domains indeed produces better performance in voice classification than traditional signal processing techniques alone, like wavelets and LPC-to-CC. PMID:24288684
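
    The paper's exact SFX feature set is not reproduced here, but the idea of piecewise statistical summarization of a waveform treated as a time series can be sketched roughly as follows; the particular statistics chosen are placeholders.

        import numpy as np

        def piecewise_stats(wave, n_segments=20):
            """Summarize a waveform as per-segment statistics, treating it as a
            time series cut into equal pieces; the statistics here are
            placeholders for the paper's SFX feature set."""
            feats = []
            for seg in np.array_split(np.asarray(wave, dtype=float), n_segments):
                zcr = np.mean(np.abs(np.diff(np.signbit(seg).astype(int))))
                feats.extend([seg.mean(), seg.std(), seg.min(), seg.max(), zcr])
            return np.array(feats)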

  11. Fifty years of progress in speech and speaker recognition

    NASA Astrophysics Data System (ADS)

    Furui, Sadaoki

    2004-10-01

    Speech and speaker recognition technology has made very significant progress in the past 50 years. The progress can be summarized by the following changes: (1) from template matching to corpus-based statistical modeling, e.g., HMMs and n-grams; (2) from filter bank/spectral resonance to cepstral features (cepstrum + Δcepstrum + ΔΔcepstrum); (3) from heuristic time-normalization to DTW/DP matching; (4) from "distance"-based to likelihood-based methods; (5) from maximum likelihood to discriminative approaches, e.g., MCE/GPD and MMI; (6) from isolated word to continuous speech recognition; (7) from small vocabulary to large vocabulary recognition; (8) from context-independent units to context-dependent units for recognition; (9) from clean speech to noisy/telephone speech recognition; (10) from single speaker to speaker-independent/adaptive recognition; (11) from monologue to dialogue/conversation recognition; (12) from read speech to spontaneous speech recognition; (13) from recognition to understanding; (14) from single-modality (audio signal only) to multimodal (audio/visual) speech recognition; (15) from hardware recognizers to software recognizers; and (16) from no commercial application to many practical commercial applications. Most of these advances have taken place in both the fields of speech recognition and speaker recognition. The majority of technological changes have been directed toward the purpose of increasing the robustness of recognition, including many other additional important techniques not noted above.

  12. Simulation of talking faces in the human brain improves auditory speech recognition

    PubMed Central

    von Kriegstein, Katharina; Dogan, Özgür; Grüter, Martina; Giraud, Anne-Lise; Kell, Christian A.; Grüter, Thomas; Kleinschmidt, Andreas; Kiebel, Stefan J.

    2008-01-01

    Human face-to-face communication is essentially audiovisual. Typically, people talk to us face-to-face, providing concurrent auditory and visual input. Understanding someone is easier when there is visual input, because visual cues like mouth and tongue movements provide complementary information about speech content. Here, we hypothesized that, even in the absence of visual input, the brain optimizes both auditory-only speech and speaker recognition by harvesting speaker-specific predictions and constraints from distinct visual face-processing areas. To test this hypothesis, we performed behavioral and neuroimaging experiments in two groups: subjects with a face recognition deficit (prosopagnosia) and matched controls. The results show that observing a specific person talking for 2 min improves subsequent auditory-only speech and speaker recognition for this person. In both prosopagnosics and controls, behavioral improvement in auditory-only speech recognition was based on an area typically involved in face-movement processing. Improvement in speaker recognition was only present in controls and was based on an area involved in face-identity processing. These findings challenge current unisensory models of speech processing, because they show that, in auditory-only speech, the brain exploits previously encoded audiovisual correlations to optimize communication. We suggest that this optimization is based on speaker-specific audiovisual internal models, which are used to simulate a talking face. PMID:18436648

  13. Factor analysis of auto-associative neural networks with application in speaker verification.

    PubMed

    Garimella, Sri; Hermansky, Hynek

    2013-04-01

    An auto-associative neural network (AANN) is a fully connected feed-forward neural network, trained to reconstruct its input at its output through a hidden compression layer, which has fewer nodes than the dimensionality of the input. AANNs are used to model speakers in speaker verification, where a speaker-specific AANN model is obtained by adapting (or retraining) the universal background model (UBM) AANN, an AANN trained on multiple held-out speakers, using the corresponding speaker's data. When the amount of speaker data is limited, this adaptation procedure may lead to overfitting, as all the parameters of the UBM-AANN are adapted. In this paper, we introduce and develop a factor analysis theory of AANNs to alleviate this problem. We hypothesize that only the weight matrix connecting the last nonlinear hidden layer and the output layer is speaker-specific, and we further restrict it to a common low-dimensional subspace during adaptation. The subspace is learned using large amounts of development data and is held fixed during adaptation. Thus, only the coordinates in the subspace, also known as an i-vector, need to be estimated using speaker-specific data. The update equations are derived for learning both the common low-dimensional subspace and the i-vectors corresponding to speakers in that subspace. The resultant i-vector representation is used as a feature for a probabilistic linear discriminant analysis model. The proposed system shows promising results on the NIST-08 speaker recognition evaluation (SRE) and yields a 23% relative improvement in equal error rate over the previously proposed weighted least squares-based subspace AANN system. Experiments on NIST-10 SRE confirm that these improvements are consistent and generalize across datasets.
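
    Structurally, the subspace constraint can be sketched as below: the forward pass is an ordinary fully connected network, and only a low-dimensional i-vector is estimated per speaker, with the output weight matrix reconstructed from a fixed basis. The shapes are hypothetical, and the estimation of the basis T and the i-vectors (the paper's update equations) is omitted.

        import numpy as np

        def aann_forward(x, weights, acts):
            """Forward pass of a fully connected AANN; each layer is an affine
            map (bias folded in as an extra input) followed by an activation."""
            h = np.asarray(x, dtype=float)
            for W, act in zip(weights, acts):
                h = act(np.concatenate([h, [1.0]]) @ W)    # W: (fan_in + 1, fan_out)
            return h

        def speaker_output_weights(W0, T, ivec):
            """Speaker-specific output weight matrix constrained to a subspace:
            W0 is the UBM-AANN's output layer, T a fixed basis of shape
            (W0.size, len(ivec)), and only `ivec` is speaker-dependent."""
            return W0 + (T @ ivec).reshape(W0.shape)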

  14. Sociological effects on vocal aging: Age related F0 effects in two languages

    NASA Astrophysics Data System (ADS)

    Nagao, Kyoko

    2005-04-01

    Listeners can estimate the age of a speaker fairly accurately from their speech (Ptacek and Sander, 1966). It is generally considered that this perception is based on physiologically determined aspects of the speech; however, the degree to which it is due to conventional sociolinguistic aspects of speech is unknown. The current study examines the degree to which fundamental frequency (F0) changes with advancing age across two language groups of speakers. It also examines the degree to which speakers associate these changes with aging in a voice disguising task. Thirty native speakers each of English and Japanese, drawn from three age groups, read a target phrase embedded in a carrier sentence in their native language. Each speaker also read the sentence pretending to be 20 years younger or 20 years older than their own age. Preliminary analysis of eighteen Japanese speakers indicates that the mean and maximum F0 values were higher when the speakers pretended to be younger than when they pretended to be older. Some previous studies on age perception, however, have suggested that F0 has minor effects on listeners' age estimation. The acoustic results will also be discussed in conjunction with the results of listeners' age estimation of the speakers.

  15. How Does Similarity-Based Interference Affect the Choice of Referring Expression?

    ERIC Educational Resources Information Center

    Fukumura, Kumiko; van Gompel, Roger P. G.; Harley, Trevor; Pickering, Martin J.

    2011-01-01

    We tested a cue-based retrieval model that predicts how similarity between discourse entities influences the speaker's choice of referring expressions. In Experiment 1, speakers produced fewer pronouns (relative to repeated noun phrases) when the competitor was in the same situation as the referent (both on a horse) rather than in a different…

  16. A Rasch-Based Validation of the Vocabulary Size Test

    ERIC Educational Resources Information Center

    Beglar, David

    2010-01-01

    The primary purpose of this study was to provide preliminary validity evidence for a 140-item form of the Vocabulary Size Test, which is designed to measure written receptive knowledge of the first 14,000 words of English. Nineteen native speakers of English and 178 native speakers of Japanese participated in the study. Analyses based on the Rasch…

  17. Arctic Visiting Speakers Series (AVS)

    NASA Astrophysics Data System (ADS)

    Fox, S. E.; Griswold, J.

    2011-12-01

    The Arctic Visiting Speakers (AVS) Series funds researchers and other arctic experts to travel and share their knowledge in communities where they might not otherwise connect. Speakers cover a wide range of arctic research topics and can address a variety of audiences, including K-12 students, graduate and undergraduate students, and the general public. Host applications are accepted on an ongoing basis, depending on funding availability. Applications need to be submitted at least 1 month prior to the expected tour dates. Interested hosts can choose speakers from an online Speakers Bureau or invite a speaker of their choice. Preference is given to individuals and organizations hosting speakers that reach a broad audience and the general public. AVS tours are encouraged to span several days, allowing ample time for interactions with faculty, students, local media, and community members. Applications for both domestic and international visits will be considered. Applications for international visits should involve the participation of more than one host organization and must include either a US-based speaker or a US-based organization. This is a small but important program that educates the public about Arctic issues. There have been 27 tours since 2007 that have impacted communities across the globe, including Gatineau, Quebec, Canada; St. Petersburg, Russia; Piscataway, New Jersey; Cordova, Alaska; Nuuk, Greenland; Elizabethtown, Pennsylvania; Oslo, Norway; Inari, Finland; Borgarnes, Iceland; San Francisco, California; and Wolcott, Vermont, to name a few. Tours have included lectures to K-12 schools, college and university students, tribal organizations, Boy Scout troops, science center and museum patrons, and the general public. With approximately 300 attendees enjoying each AVS tour, roughly 4,100 people have been reached since 2007. The expectations for each tour are extremely manageable. Hosts must submit a schedule of events and a tour summary to be posted online. Hosts must acknowledge the National Science Foundation Office of Polar Programs and ARCUS in all promotional materials. The host agrees to send ARCUS photographs, fliers, and if possible a video of the main lecture. Host and speaker agree to collect data on the number of attendees in each audience to submit as part of a post-tour evaluation. The grants can generally cover all the expenses of a tour, depending on the location. A maximum of $2,000 will be provided for the travel-related expenses of a speaker on a domestic visit. A maximum of $2,500 will be provided for the travel-related expenses of a speaker on an international visit. Each speaker will receive an honorarium of $300.

  18. Continuing Medical Education Speakers with High Evaluation Scores Use more Image-based Slides.

    PubMed

    Ferguson, Ian; Phillips, Andrew W; Lin, Michelle

    2017-01-01

    Although continuing medical education (CME) presentations are common across health professions, it is unknown whether slide design is independently associated with audience evaluations of the speaker. Based on the conceptual framework of Mayer's theory of multimedia learning, this study aimed to determine whether image use and text density in presentation slides are associated with overall speaker evaluations. This retrospective analysis of six sequential CME conferences (two annual emergency medicine conferences over a three-year period) used a mixed linear regression model to assess whether post-conference speaker evaluations were associated with image fraction (percentage of image-based slides per presentation) and text density (number of words per slide). A total of 105 unique lectures were given by 49 faculty members, and 1,222 evaluations (70.1% response rate) were available for analysis. On average, 47.4% (SD=25.36) of slides had at least one educationally relevant image (image fraction). Image fraction significantly predicted overall higher evaluation scores [F(1, 100.676)=6.158, p=0.015] in the mixed linear regression model. The mean (SD) text density was 25.61 (8.14) words/slide but was not a significant predictor [F(1, 86.293)=0.55, p=0.815]. Of note, the individual speaker [χ²(1)=2.952, p=0.003] and speaker seniority [F(3, 59.713)=4.083, p=0.011] significantly predicted higher scores. This is the first published study to date assessing the linkage between slide design and CME speaker evaluations by an audience of practicing clinicians. The incorporation of images was associated with higher evaluation scores, in alignment with Mayer's theory of multimedia learning. Contrary to this theory, however, text density showed no significant association, suggesting that these scores may be multifactorial. Professional development efforts should focus on teaching best practices in both slide design and presentation skills.
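
    Such a mixed linear regression can be reproduced in outline with the statsmodels mixed-model API; the toy data frame below only mirrors the study's variable names (score, image fraction, text density, speaker) and is not the study's data.

        import pandas as pd
        import statsmodels.formula.api as smf

        # Toy rows that only mirror the study's variables; not the study's data.
        df = pd.DataFrame({
            "score": [4.2, 4.8, 3.9, 4.5, 4.1, 4.7, 3.8, 4.4, 4.6],
            "image_fraction": [0.30, 0.65, 0.20, 0.55, 0.40, 0.70, 0.25, 0.50, 0.60],
            "text_density": [28.0, 18.0, 31.0, 22.0, 26.0, 17.0, 30.0, 24.0, 20.0],
            "speaker": ["a", "b", "a", "c", "b", "c", "a", "b", "c"],
        })

        # Fixed effects for slide design; a random intercept per speaker
        # absorbs the clustering of evaluations within each lecturer.
        model = smf.mixedlm("score ~ image_fraction + text_density",
                            df, groups=df["speaker"])
        print(model.fit().summary())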

  19. Comparison of Magnetic Resonance Imaging-based vocal tract area functions obtained from the same speaker in 1994 and 2002

    PubMed Central

    Story, Brad H.

    2008-01-01

    A new set of area functions for vowels has been obtained with Magnetic Resonance Imaging (MRI) from the same speaker as that previously reported in 1996 [Story, Titze, & Hoffman, JASA, 100, 537–554 (1996)]. The new area functions were derived from image data collected in 2002, whereas the previously reported area functions were based on MR images obtained in 1994. When compared, the new area function sets indicated a tendency toward a constricted pharyngeal region and expanded oral cavity relative to the previous set. Based on calculated formant frequencies and sensitivity functions, these morphological differences were shown to have the primary acoustic effect of systematically shifting the second formant (F2) downward in frequency. Multiple instances of target vocal tract shapes from a specific speaker provide additional sampling of the possible area functions that may be produced during speech production. This may be of benefit for understanding intra-speaker variability in vowel production and for further development of speech synthesizers and speech models that utilize area function information. PMID:18177162

  20. The impact of musical training and tone language experience on talker identification

    PubMed Central

    Xie, Xin; Myers, Emily

    2015-01-01

    Listeners can use pitch changes in speech to identify talkers. Individuals exhibit large variability in sensitivity to pitch and in accuracy perceiving talker identity. In particular, people who have musical training or long-term tone language use are found to have enhanced pitch perception. In the present study, the influence of pitch experience on talker identification was investigated as listeners identified talkers in native language as well as non-native languages. Experiment 1 was designed to explore the influence of pitch experience on talker identification in two groups of individuals with potential advantages for pitch processing: musicians and tone language speakers. Experiment 2 further investigated individual differences in pitch processing and the contribution to talker identification by testing a mediation model. Cumulatively, the results suggested that (a) musical training confers an advantage for talker identification, supporting a shared resources hypothesis regarding music and language and (b) linguistic use of lexical tones also increases accuracy in hearing talker identity. Importantly, these two types of hearing experience enhance talker identification by sharpening pitch perception skills in a domain-general manner. PMID:25618071

  2. Infants' understanding of false labeling events: the referential roles of words and the speakers who use them.

    PubMed

    Koenig, Melissa A; Echols, Catharine H

    2003-04-01

    The four studies reported here examine whether 16-month-old infants' responses to true and false utterances interact with their knowledge of human agents. In Study 1, infants heard repeated instances either of true or false labeling of common objects; labels came from an active human speaker seated next to the infant. In Study 2, infants experienced the same stimuli and procedure; however, we replaced the human speaker of Study 1 with an audio speaker in the same location. In Study 3, labels came from a hidden audio speaker. In Study 4, a human speaker labeled the objects while facing away from them. In Study 1, infants looked significantly longer to the human agent when she falsely labeled than when she truthfully labeled the objects. Infants did not show a similar pattern of attention for the audio speaker of Study 2, the silent human of Study 3 or the facing-backward speaker of Study 4. In fact, infants who experienced truthful labeling looked significantly longer to the facing-backward labeler of Study 4 than to true labelers of the other three contexts. Additionally, infants were more likely to correct false labels when produced by the human labeler of Study 1 than in any of the other contexts. These findings suggest, first, that infants are developing a critical conception of other human speakers as truthful communicators, and second, that infants understand that human speakers may provide uniquely useful information when a word fails to match its referent. These findings are consistent with the view that infants can recognize differences in knowledge and that such differences can be based on differences in the availability of perceptual experience.

  3. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment

    DTIC Science & Technology

    2015-10-01

    Scoring, Gaussian Backend, etc.) as shown in Fig. 39. The methods in this domain also emphasized the ability to perform data purification for both... investigation using the same infrastructure was undertaken to explore Lombard effect "flavor" detection for improved speaker ID. The study... The presence of... dimension selection and compared to a common N-gram frequency-based selection. 2.1.2: Exploration on NN/DBN backend: Since Deep Neural Networks (DNN) have

  4. Effects of low speed wind on the recognition/identification and pass-through communication tasks of auditory situation awareness afforded by military hearing protection/enhancement devices and tactical communication and protective systems.

    PubMed

    Lee, Kichol; Casali, John G

    2016-01-01

    To investigate the effect of controlled low-speed wind noise on the auditory situation awareness afforded by military hearing protection/enhancement devices (HPEDs) and tactical communication and protective systems (TCAPS). Recognition/identification and pass-through communication tasks were conducted separately under three wind conditions (0, 5, and 10 mph). Subjects wore two in-ear-type TCAPS, one earmuff-type TCAPS, a Combat Arms Earplug in its 'open' (pass-through) setting, and an EB-15LE electronic earplug. Devices with electronic gain systems were tested under two gain settings: 'unity' and 'max'. Testing without any device (open ear) served as a control. Ten subjects were recruited from the student population at Virginia Tech; audiometric requirements were 25 dB HL or better at 500, 1000, 2000, 4000, and 8000 Hz in both ears. Performance on the communication task-by-device interaction differed significantly only at the 0-mph wind speed, and between-device performance differences varied with azimuthal speaker location. It is evident from this study that stable (non-gusting) wind speeds up to 10 mph did not significantly degrade recognition/identification or pass-through communication performance for the group of HPEDs and TCAPS tested. However, the devices performed differently as the location of the test-signal speaker was varied, and it appears that physical as well as electronic features may have contributed to this directional result.

  5. Prediction of acoustic feature parameters using myoelectric signals.

    PubMed

    Lee, Ki-Seung

    2010-07-01

    It is well-known that a clear relationship exists between human voices and myoelectric signals (MESs) from the area of the speaker's mouth. In this study, we utilized this information to implement a speech synthesis scheme in which MES alone was used to predict the parameters characterizing the vocal-tract transfer function of specific speech signals. Several feature parameters derived from MES were investigated to find the optimal feature for maximization of the mutual information between the acoustic and the MES features. After the optimal feature was determined, an estimation rule for the acoustic parameters was proposed, based on a minimum mean square error (MMSE) criterion. In a preliminary study, 60 isolated words were used for both objective and subjective evaluations. The results showed that the average Euclidean distance between the original and predicted acoustic parameters was reduced by about 30% compared with the average Euclidean distance of the original parameters. The intelligibility of the synthesized speech signals using the predicted features was also evaluated. A word-level identification ratio of 65.5% and a syllable-level identification ratio of 73% were obtained through a listening test.
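
    Under a joint-Gaussian assumption, the MMSE estimate of the acoustic parameters given an MES feature vector is simply the conditional mean. The numpy sketch below illustrates that estimator on synthetic data; the dimensions and the linear generative model are assumptions made for demonstration, not details from the paper.

      import numpy as np

      rng = np.random.default_rng(1)
      n, dx, dy = 5000, 4, 6                         # dx acoustic params, dy MES features
      A = rng.normal(size=(dx, dy))
      Y = rng.normal(size=(n, dy))                   # MES feature vectors
      X = Y @ A.T + 0.3 * rng.normal(size=(n, dx))   # acoustic parameters, noisily related

      mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
      Xc, Yc = X - mu_x, Y - mu_y
      S_xy = Xc.T @ Yc / n                           # cross-covariance
      S_yy = Yc.T @ Yc / n                           # MES covariance

      def mmse_estimate(y):
          # Conditional mean E[x | y] for jointly Gaussian (x, y)
          return mu_x + S_xy @ np.linalg.solve(S_yy, y - mu_y)

      print(mmse_estimate(Y[0]), "vs", X[0])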

  6. Evolving Spiking Neural Networks for Recognition of Aged Voices.

    PubMed

    Silva, Marco; Vellasco, Marley M B R; Cataldo, Edson

    2017-01-01

    The aging of the voice, known as presbyphonia, is a natural process that can greatly change the vocal quality of an individual. This is a relevant problem for people who use their voices professionally, and its early identification can help determine a suitable treatment to halt its progress or even eliminate the problem. This work focuses on the development of a new model for the identification of aged voices (independently of the speaker's chronological age), using as input attributes parameters extracted from the voice and glottal signals. The proposed model, named Quantum binary-real evolving Spiking Neural Network (QbrSNN), is based on spiking neural networks (SNNs) with an unsupervised training algorithm and a Quantum-Inspired Evolutionary Algorithm that automatically determines the most relevant attributes and the optimal parameters configuring the SNN. The QbrSNN model was evaluated on a database of 120 recordings containing samples from three groups of speakers. The results indicate that the proposed model provides better accuracy than other approaches while using fewer input attributes. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  7. Applying Rasch model analysis in the development of the cantonese tone identification test (CANTIT).

    PubMed

    Lee, Kathy Y S; Lam, Joffee H S; Chan, Kit T Y; van Hasselt, Charles Andrew; Tong, Michael C F

    2017-01-01

    Applying Rasch analysis to evaluate the internal structure of a lexical tone perception test known as the Cantonese Tone Identification Test (CANTIT). A 75-item pool (CANTIT-75) with pictures and sound tracks was developed; respondents were required to make a four-alternative forced choice on each item. A short version of 30 items (CANTIT-30) was developed based on fit statistics, difficulty estimates, and content evaluation. Internal structure was evaluated by fit statistics and Rasch Factor Analysis (RFA). Two hundred children with normal hearing and 141 children with hearing impairment were recruited. For CANTIT-75, all infit and 97% of outfit values were < 2.0; RFA revealed that the Rasch measure explained 40.1% of total variance, and the first residual component explained 2.5% of total variance with an eigenvalue of 3.1. For CANTIT-30, all infit and outfit values were < 2.0; the Rasch measure explained 38.8% of total variance, and the first residual component explained 3.9% of total variance with an eigenvalue of 1.9. The Rasch model provides excellent guidance for the development of short forms. Both CANTIT-75 and CANTIT-30 possess a satisfactory internal structure as construct validity evidence in measuring the lexical tone identification ability of Cantonese speakers.
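
    For readers unfamiliar with the model, the dichotomous Rasch probability and the outfit statistic behind the < 2.0 criterion above can be sketched in a few lines of Python. The implementation is illustrative; operational Rasch software estimates abilities and difficulties jointly rather than taking them as given.

      import numpy as np

      def rasch_p(theta, b):
          """P(correct) for a person of ability theta on an item of difficulty b."""
          return 1.0 / (1.0 + np.exp(-(theta - b)))

      def outfit(responses, thetas, b):
          """Outfit mean-square for one item: mean squared standardized residual.
          Values near 1 indicate good fit; > 2 is the misfit criterion cited above."""
          p = rasch_p(thetas, b)
          return np.mean((responses - p) ** 2 / (p * (1 - p)))

      thetas = np.array([-1.0, 0.0, 0.5, 2.0])   # toy person abilities
      responses = np.array([0, 1, 0, 1])         # toy right/wrong scores on one item
      print(outfit(responses, thetas, b=0.2))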

  8. Intonation as an encoder of speaker certainty: information and confirmation yes-no questions in Catalan.

    PubMed

    Vanrell, Maria del Mar; Mascaró, Ignasi; Torres-Tamarit, Francesc; Prieto, Pilar

    2013-06-01

    Recent studies in the field of intonational phonology have shown that information-seeking questions can be distinguished from confirmation-seeking questions by prosodic means in a variety of languages (Armstrong, 2010, for Puerto Rican Spanish; Grice & Savino, 1997, for Bari Italian; Kügler, 2003, for Leipzig German; Mata & Santos, 2010, for European Portuguese; Vanrell, Mascaró, Prieto, & Torres-Tamarit, 2010, for Catalan). However, all these studies have relied on production experiments, and little is known about the perceptual relevance of these intonational cues. This paper explores whether Majorcan Catalan listeners distinguish information- and confirmation-seeking questions by means of two distinct nuclear falling pitch accents. Three behavioral tasks were conducted with 20 Majorcan Catalan subjects, namely a semantic congruity test, a rating test, and a classical categorical perception identification/discrimination test. The results show that a difference in pitch scaling on the leading H tone of the H+L* nuclear pitch accent is the main cue used by Majorcan Catalan listeners to distinguish confirmation questions from information-seeking questions. Thus, while an upstepped ¡H+L* pitch accent signals an information-seeking question (i.e., the speaker has no expectation about the nature of the answer), the H+L* pitch accent indicates that the speaker is asking about mutually shared information. We argue that these results have implications for the representation of tonal height distinctions in Catalan. The results also support the claim that phonological contrasts in intonation, together with other linguistic strategies, can signal the speaker's beliefs about the certainty of the proposition expressed.

  9. Influences on preschool children's oral health-related quality of life as reported by English and Spanish-speaking parents and caregivers.

    PubMed

    Born, Catherine D; Divaris, Kimon; Zeldin, Leslie P; Rozier, R Gary

    2016-09-01

    This study examined young, preschool children's oral health-related quality of life (OHRQoL) among a community-based cohort of English- and Spanish-speaking parent-child dyads in North Carolina, and sought to quantify the association of parent/caregiver characteristics, including spoken language, with OHRQoL impacts. Data from structured interviews with 1,111 parents of children aged 6-23 months enrolled in the Zero-Out Early Childhood Caries study in 2010-2012 were used. OHRQoL was measured using the overall score (range: 0-52) of the Early Childhood Oral Health Impact Scale (ECOHIS). We examined associations with parents' sociodemographic characteristics, spoken language, self-reported oral and general health, oral health knowledge, children's dental attendance, and dental care needs. Analyses included descriptive, bivariate, and multivariate methods based upon zero-inflated negative binomial regression. To determine differences between English and Spanish speakers, language-stratified model estimates were contrasted using homogeneity χ2 tests. The mean overall ECOHIS score was 3.9 [95% confidence interval (CI) = 3.6-4.2]: 4.7 among English speakers and 1.5 among Spanish speakers. In multivariate analyses, caregivers' education showed a positive association with OHRQoL impacts among Spanish speakers [prevalence ratio (PR) = 1.12 (95% CI = 1.03-1.22) for every added year of schooling], whereas caregivers' fair/poor oral health showed a positive association among English speakers (PR = 1.20; 95% CI = 1.02-1.41). The overall severity of ECOHIS impacts was low among this population-based sample of young, preschool children, and substantially lower among Spanish versus English speakers. Further studies are warranted to identify sources of these differences in actual or reported OHRQoL impacts. © 2016 American Association of Public Health Dentistry.
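
    Zero-inflated negative binomial regression, the model behind the prevalence ratios above, is available in statsmodels. The sketch below fits it to simulated ECOHIS-like counts; the data-generating values and variable names are assumptions for illustration only.

      import numpy as np
      import statsmodels.api as sm
      from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

      rng = np.random.default_rng(2)
      n = 500
      educ = rng.normal(size=n)                    # caregiver education (standardized)
      mean = np.exp(0.8 + 0.1 * educ)              # count part: expected ECOHIS score
      r = 5.0
      counts = rng.negative_binomial(r, r / (r + mean))
      structural_zero = rng.random(n) < 0.5        # children with no impacts at all
      y = np.where(structural_zero, 0, counts)

      X = sm.add_constant(educ)
      res = ZeroInflatedNegativeBinomialP(y, X, exog_infl=X, inflation='logit').fit(disp=0)
      print(np.exp(res.params))                    # exponentiated count-model coefficients
                                                   # approximate the prevalence ratios reported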

  10. Perceptual Learning of Time-Compressed Speech: More than Rapid Adaptation

    PubMed Central

    Banai, Karen; Lavner, Yizhar

    2012-01-01

    Background Time-compressed speech, a form of rapidly presented speech, is harder to comprehend than natural speech, especially for non-native speakers. Although it is possible to adapt to time-compressed speech after a brief exposure, it is not known whether additional perceptual learning occurs with further practice. Here, we ask whether multiday training on time-compressed speech yields more learning than that observed during the initial adaptation phase and whether the pattern of generalization following successful learning is different than that observed with initial adaptation only. Methodology/Principal Findings Two groups of non-native Hebrew speakers were tested on five different conditions of time-compressed speech identification in two assessments conducted 10–14 days apart. Between those assessments, one group of listeners received five practice sessions on one of the time-compressed conditions. Between the two assessments, trained listeners improved significantly more than untrained listeners on the trained condition. Furthermore, the trained group generalized its learning to two untrained conditions in which different talkers presented the trained speech materials. In addition, when the performance of the non-native speakers was compared to that of a group of naïve native Hebrew speakers, performance of the trained group was equivalent to that of the native speakers on all conditions on which learning occurred, whereas performance of the untrained non-native listeners was substantially poorer. Conclusions/Significance Multiday training on time-compressed speech results in significantly more perceptual learning than brief adaptation. Compared to previous studies of adaptation, the training induced learning is more stimulus specific. Taken together, the perceptual learning of time-compressed speech appears to progress from an initial, rapid adaptation phase to a subsequent prolonged and more stimulus specific phase. These findings are consistent with the predictions of the Reverse Hierarchy Theory of perceptual learning and suggest constraints on the use of perceptual-learning regimens during second language acquisition. PMID:23056592
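
    Time-compressed stimuli of the kind used in this study are typically generated with a pitch-preserving time-scale modification. A minimal sketch with librosa follows; the file name and compression rates are placeholders, not the study's materials.

      import librosa
      import soundfile as sf

      y, sr = librosa.load("sentence.wav", sr=None)             # hypothetical stimulus recording
      for rate in (1.5, 2.0, 3.0):                              # compression factors
          y_fast = librosa.effects.time_stretch(y, rate=rate)   # shorter duration, pitch preserved
          sf.write(f"sentence_rate{rate}.wav", y_fast, sr)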

  11. Phonetic complexity and stuttering in Spanish

    PubMed Central

    Howell, Peter; Au-Yeung, James

    2007-01-01

    The current study investigated whether phonetic complexity affected stuttering rate for Spanish speakers. The speakers were assigned to three age groups (6-11, 12-17 and 18 years plus) that were similar to those used in an earlier study on English. The analysis was performed using Jakielski's (1998) Index of Phonetic Complexity (IPC) scheme in which each word is given an IPC score based on the number of complex attributes it includes for each of eight factors. Stuttering on function words for Spanish did not correlate with IPC score for any age group. This mirrors the finding for English that stuttering on these words is not affected by phonetic complexity. The IPC scores of content words correlated positively with stuttering rate for 6-11 year old and adult speakers. Comparison was made between the languages to establish whether or not experience with the factors determines the problem they pose for speakers (revealed by differences in stuttering rate). Evidence was obtained that four factors found to be important determinants of stuttering on content words in English for speakers aged 12 and above, also affected Spanish speakers. This occurred despite large differences in frequency of usage of these factors. It is concluded that phonetic factors affect stuttering rate irrespective of a speaker's experience with that factor. PMID:17364620

  12. Phonetic complexity and stuttering in Spanish.

    PubMed

    Howell, Peter; Au-Yeung, James

    2007-02-01

    The current study investigated whether phonetic complexity affected stuttering rate for Spanish speakers. The speakers were assigned to three age groups (6-11, 12-17, and 18 years plus) that were similar to those used in an earlier study on English. The analysis was performed using Jakielski's Index of Phonetic Complexity (IPC) scheme, in which each word is given an IPC score based on the number of complex attributes it includes for each of eight factors. Stuttering on function words for Spanish did not correlate with IPC score for any age group. This mirrors the finding for English that stuttering on these words is not affected by phonetic complexity. The IPC scores of content words correlated positively with stuttering rate for 6-11-year-old and adult speakers. Comparison was made between the languages to establish whether or not experience with the factors determines the problem they pose for speakers (revealed by differences in stuttering rate). Evidence was obtained that four factors found to be important determinants of stuttering on content words in English for speakers aged 12 and above also affected Spanish speakers. This occurred despite large differences in frequency of usage of these factors. It is concluded that phonetic factors affect stuttering rate irrespective of a speaker's experience with that factor.
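
    The IPC scheme referenced in both versions of this study assigns each word one point per "complex" attribute across eight factors. The toy scorer below implements a simplified subset of those factors; the phone classes and the example word are assumptions for demonstration, not Jakielski's full definitions.

      DORSALS = set("kgx")            # place factor: dorsal consonants (simplified)
      COMPLEX_MANNER = set("fvszrl")  # manner factor: fricatives and liquids (simplified)
      VOWELS = set("aeiou")

      def ipc_score(phones, n_syllables):
          score = sum(p in DORSALS for p in phones)
          score += sum(p in COMPLEX_MANNER for p in phones)
          score += 1 if n_syllables >= 3 else 0           # word-length factor
          score += 1 if phones[-1] not in VOWELS else 0   # closed word shape
          # cluster factor: adjacent consonant pairs
          score += sum(phones[i] not in VOWELS and phones[i + 1] not in VOWELS
                       for i in range(len(phones) - 1))
          return score

      print(ipc_score("krokodilo", 4))   # hypothetical Spanish-like word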

  13. 24. AIRCONDITIONING DUCT, WINCH CONTROL BOX, AND SPEAKER AT STATION ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    24. AIR-CONDITIONING DUCT, WINCH CONTROL BOX, AND SPEAKER AT STATION 85.5 OF MST. FOLDED-UP PLATFORM ON RIGHT OF PHOTO. - Vandenberg Air Force Base, Space Launch Complex 3, Launch Pad 3 East, Napa & Alden Roads, Lompoc, Santa Barbara County, CA

  14. Educators Using Information Technology. GIS Video Series. [Videotape].

    ERIC Educational Resources Information Center

    A M Productions Inc., Vancouver (British Columbia).

    This 57-minute videotape covers the "Florida Educators Using Information Technology" session of the "Eco-Informa '96" conference. Two speakers presented examples of environmental educators using information technology. The first speaker, Brenda Maxwell, is the Director and Developer of the Florida Science Institute based at…

  15. Core values versus common sense: consequentialist views appear less rooted in morality.

    PubMed

    Kreps, Tamar A; Monin, Benoît

    2014-11-01

    When a speaker presents an opinion, an important factor in audiences' reactions is whether the speaker seems to be basing his or her decision on ethical (as opposed to more pragmatic) concerns. We argue that, despite a consequentialist philosophical tradition that views utilitarian consequences as the basis for moral reasoning, lay perceivers think that speakers using arguments based on consequences do not construe the issue as a moral one. Five experiments show that, for both political views (including real State of the Union quotations) and organizational policies, consequentialist views are seen to express less moralization than deontological views, and even sometimes than views presented with no explicit justification. We also demonstrate that perceived moralization in turn affects speakers' perceived commitment to the issue and authenticity. These findings shed light on lay conceptions of morality and have practical implications for people considering how to express moral opinions publicly. © 2014 by the Society for Personality and Social Psychology, Inc.

  16. Analysis and Classification of Voice Pathologies Using Glottal Signal Parameters.

    PubMed

    Forero M, Leonardo A; Kohler, Manoela; Vellasco, Marley M B R; Cataldo, Edson

    2016-09-01

    The classification of voice diseases has many applications in health care, in disease treatment, and in the design of new medical equipment for helping doctors diagnose pathologies related to the voice. This work uses the parameters of the glottal signal to help identify two types of voice disorders related to pathologies of the vocal folds: nodule and unilateral paralysis. The parameters of the glottal signal are obtained through a known inverse filtering method and are used as inputs to an Artificial Neural Network, a Support Vector Machine, and a Hidden Markov Model, in order to classify the voice signals into three different groups and compare the results: speakers with a nodule in the vocal folds; speakers with unilateral paralysis of the vocal folds; and speakers with normal voices, that is, without nodule or unilateral paralysis present in the vocal folds. The database is composed of 248 voice recordings (signals of vowel production) containing samples corresponding to the three groups mentioned. In this study, a larger database was used for classification than in similar studies, and the classification rate is superior to other studies, reaching 97.2%. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  17. Integrated Robust Open-Set Speaker Identification System (IROSIS)

    DTIC Science & Technology

    2012-05-01

    LIST OF TABLES: Table 1. Detail of NIST Data Used for Training and Testing; Table 2... scenarios are referred to as VB-YB, VL-YL, VB-YL and VL-YB respectively. Table 1. Detail of NIST Data Used for Training and Testing Purpose Source No... M is the UBM supervector, and the difference between L(m) and Q(M, m) is the Kullback-Leibler divergence between the "alignment" of the

  18. A new droplet generator

    NASA Technical Reports Server (NTRS)

    Slack, W. E.

    1982-01-01

    A new droplet generator is described. A loudspeaker-driven extractor needle was immersed in a pendant drop. Pulsing the speaker extracted the needle, forming a fluid ligament that decayed into a droplet. The droplets were sized from stroboscopic photographs. The droplet size was changed by varying the amplitude of the speaker pulses and the extractor needle diameter. The mechanism of droplet formation is discussed and photographs of ligament decay are presented. The droplet generator worked well on both oil- and water-based pesticide formulations. Current applications and results are discussed.

  19. Intonational Phrasing Is Constrained by Meaning, Not Balance

    ERIC Educational Resources Information Center

    Breen, Mara; Watson, Duane G.; Gibson, Edward

    2011-01-01

    This paper evaluates two classes of hypotheses about how people prosodically segment utterances: (1) meaning-based proposals, with a focus on Watson and Gibson's (2004) proposal, according to which speakers tend to produce boundaries before and after long constituents; and (2) balancing proposals, according to which speakers tend to produce…

  20. Listening: The Second Speaker.

    ERIC Educational Resources Information Center

    Erway, Ella Anderson

    1972-01-01

    Scholars agree that listening is an active rather than a passive process. The listening which makes people achieve higher scores on current listening tests is "second speaker" listening or active participation in the encoding of the message. Most of the instructional suggestions in listening curriculum guides are based on this concept. In terms of…

  1. Classification and Counter-Classification of Language on Saint Barthelemy.

    ERIC Educational Resources Information Center

    Pressman, Jon F.

    1998-01-01

    Analyzes the use of metapragmatic description in the ethnoclassification of language by native speakers on the Franco-Antillean island of Saint Barthelemy. A prevalent technique for metapragmatic description based on honorific pronouns that reflects the varied geolinguistic and generational attributes of the speakers is described. (Author/MSE)

  2. Collaborative Dialogue in Learner-Learner and Learner-Native Speaker Interaction

    ERIC Educational Resources Information Center

    Dobao, Ana Fernandez

    2012-01-01

    This study analyses intermediate and advanced learner-learner and learner-native speaker (NS) interaction looking for collaborative dialogue. It investigates how the presence of a NS interlocutor affects the frequency and nature of lexical language-related episodes (LREs) spontaneously generated during task-based interaction. Twenty-four learners…

  3. An Integrated Approach to ESL Teaching.

    ERIC Educational Resources Information Center

    De Luca, Rosemary J.

    A University of Waikato (New Zealand) course in English for academic purposes is described. The credit course was originally designed for native English-speaking students to address their academic writing needs. However, based on the idea that the writing tasks of native speakers and non-native speakers are similar and that their writing…

  4. Children's Sociolinguistic Evaluations of Nice Foreigners and Mean Americans

    ERIC Educational Resources Information Center

    Kinzler, Katherine D.; DeJesus, Jasmine M.

    2013-01-01

    Three experiments investigated 5- to 6-year-old monolingual English-speaking American children's sociolinguistic evaluations of others based on their accent (native, foreign) and social actions (nice, mean, neutral). In Experiment 1, children expressed social preferences for native-accented English speakers over foreign-accented speakers, and they…

  5. Bilingual Computerized Speech Recognition Screening for Depression Symptoms

    ERIC Educational Resources Information Center

    Gonzalez, Gerardo; Carter, Colby; Blanes, Erika

    2007-01-01

    The Voice-Interactive Depression Assessment System (VIDAS) is a computerized speech recognition application for screening depression based on the Center for Epidemiological Studies--Depression scale in English and Spanish. Study 1 included 50 English and 47 Spanish speakers. Study 2 involved 108 English and 109 Spanish speakers. Participants…

  6. Sentence Comprehension in Swahili-English Bilingual Agrammatic Speakers

    ERIC Educational Resources Information Center

    Abuom, Tom O.; Shah, Emmah; Bastiaanse, Roelien

    2013-01-01

    For this study, sentence comprehension was tested in Swahili-English bilingual agrammatic speakers. The sentences were controlled for four factors: (1) order of the arguments (base vs. derived); (2) embedding (declarative vs. relative sentences); (3) overt use of the relative pronoun "who"; (4) language (English and Swahili). Two…

  7. Espanol para el hispanolhablante (Spanish for the Spanish Speaker).

    ERIC Educational Resources Information Center

    Blanco, George M.

    This guide provides Texas teachers and administrators with guidelines, goals, instructional strategies, and activities for teaching Spanish to secondary level native speakers. It is based on the principle that the Spanish speaking student is the strongest linguistic and cultural resource to Texas teachers of languages other than English, and one…

  8. Romanian Basic Course.

    ERIC Educational Resources Information Center

    Defense Language Inst., Washington, DC.

    The "Romanian Basic Course," consisting of 89 lesson units in eight volumes, is designed to train native English language speakers to Level 3 proficiency in comprehension, speaking, reading, and writing Romanian (based on a 1-5 scale in which Level 5 is native speaker proficiency). Volume 1, which introduces basic sentences in dialog form with…

  9. Use of listening strategies for the speech of individuals with dysarthria and cerebral palsy.

    PubMed

    Hustad, Katherine C; Dardis, Caitlin M; Kramper, Amy J

    2011-03-01

    This study examined listeners' endorsement of cognitive, linguistic, segmental, and suprasegmental strategies employed when listening to speakers with dysarthria. The study also examined whether strategy endorsement differed between listeners who earned the highest and lowest intelligibility scores. Speakers were eight individuals with dysarthria and cerebral palsy. Listeners were 80 individuals who transcribed speech stimuli and rated their use of each of 24 listening strategies on a 4-point scale. Results showed that cognitive and linguistic strategies were most highly endorsed. Use of listening strategies did not differ between listeners with the highest and lowest intelligibility scores. Results suggest that there may be a core of strategies common to listeners of speakers with dysarthria that may be supplemented by additional strategies, based on characteristics of the speaker and speech signal.

  10. General contrast effects in speech perception: effect of preceding liquid on stop consonant identification.

    PubMed

    Lotto, A J; Kluender, K R

    1998-05-01

    When members of a series of synthesized stop consonants varying acoustically in F3 characteristics and varying perceptually from /da/ to /ga/ are preceded by /al/, subjects report hearing more /ga/ syllables relative to when each member is preceded by /ar/ (Mann, 1980). It has been suggested that this result demonstrates the existence of a mechanism that compensates for coarticulation via tacit knowledge of articulatory dynamics and constraints, or through perceptual recovery of vocal-tract dynamics. The present study was designed to assess the degree to which these perceptual effects are specific to qualities of human articulatory sources. In three experiments, series of consonant-vowel (CV) stimuli varying in F3-onset frequency (/da/-/ga/) were preceded by speech versions or nonspeech analogues of /al/ and /ar/. The effect of liquid identity on stop consonant labeling remained when the preceding VC was produced by a female speaker and the CV syllable was modeled after a male speaker's productions. Labeling boundaries also shifted when the CV was preceded by a sine wave glide modeled after F3 characteristics of /al/ and /ar/. Identifications shifted even when the preceding sine wave was of constant frequency equal to the offset frequency of F3 from a natural production. These results suggest an explanation in terms of general auditory processes as opposed to recovery of or knowledge of specific articulatory dynamics.

  11. Phoneme Error Pattern by Heritage Speakers of Spanish on an English Word Recognition Test.

    PubMed

    Shi, Lu-Feng

    2017-04-01

    Heritage speakers acquire their native language from home use in their early childhood. As the native language is typically a minority language in the society, these individuals receive their formal education in the majority language and eventually develop greater competency with the majority than their native language. To date, there have not been specific research attempts to understand word recognition by heritage speakers. It is not clear if and to what degree we may infer from evidence based on bilingual listeners in general. This preliminary study investigated how heritage speakers of Spanish perform on an English word recognition test and analyzed their phoneme errors. A prospective, cross-sectional, observational design was employed. Twelve normal-hearing adult Spanish heritage speakers (four men, eight women, 20-38 yr old) participated in the study. Their language background was obtained through the Language Experience and Proficiency Questionnaire. Nine English monolingual listeners (three men, six women, 20-41 yr old) were also included for comparison purposes. Listeners were presented with 200 Northwestern University Auditory Test No. 6 words in quiet. They repeated each word orally and in writing. Their responses were scored by word, word-initial consonant, vowel, and word-final consonant. Performance was compared between groups with Student's t test or analysis of variance. Group-specific error patterns were primarily descriptive, but intergroup comparisons were made using 95% or 99% confidence intervals for proportional data. The two groups of listeners yielded comparable scores when their responses were examined by word, vowel, and final consonant. However, heritage speakers of Spanish misidentified significantly more word-initial consonants and had significantly more difficulty with initial /p, b, h/ than their monolingual peers. The two groups yielded similar patterns for vowel and word-final consonants, but heritage speakers made significantly fewer errors with /e/ and more errors with word-final /p, k/. Data reported in the present study lead to a twofold conclusion. On the one hand, normal-hearing heritage speakers of Spanish may misidentify English phonemes in patterns different from those of English monolingual listeners. Not all phoneme errors can be readily understood by comparing Spanish and English phonology, suggesting that Spanish heritage speakers differ in performance from other Spanish-English bilingual listeners. On the other hand, the absolute number of errors and the error pattern of most phonemes were comparable between English monolingual listeners and Spanish heritage speakers, suggesting that audiologists may assess word recognition in quiet in the same way for these two groups of listeners, if diagnosis is based on words, not phonemes. American Academy of Audiology

  12. Effective Prediction of Errors by Non-native Speakers Using Decision Tree for Speech Recognition-Based CALL System

    NASA Astrophysics Data System (ADS)

    Wang, Hongcui; Kawahara, Tatsuya

    CALL (Computer Assisted Language Learning) systems using ASR (Automatic Speech Recognition) for second language learning have received increasing interest recently. However, it remains a challenge to achieve high speech recognition performance, including accurate detection of erroneous utterances by non-native speakers. Conventionally, possible error patterns, based on linguistic knowledge, are added to the lexicon and language model, or to the ASR grammar network. However, this approach quickly runs into a trade-off between the coverage of errors and the increase in perplexity. To solve the problem, we propose a method based on a decision tree to learn effective prediction of errors made by non-native speakers. An experimental evaluation with a number of foreign students learning Japanese shows that the proposed method can effectively generate an ASR grammar network, given a target sentence, that achieves both better coverage of errors and smaller perplexity, resulting in a significant improvement in ASR accuracy.
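
    The core of the proposal is a classifier that predicts, per target phoneme and context, whether a learner error is likely, so that only likely errors are added to the ASR grammar network. A scikit-learn sketch of that idea follows; the features, labels, and toy data are invented for illustration.

      from sklearn.tree import DecisionTreeClassifier

      # Each row: (phoneme id, position in word, learner L1 group) -> 1 if an error is likely
      X = [[3, 0, 1], [3, 2, 1], [7, 0, 2], [7, 1, 2], [1, 0, 1], [1, 1, 2]]
      y = [1, 0, 1, 1, 0, 0]

      tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
      # Only contexts predicted error-prone would be expanded in the grammar network,
      # keeping error coverage high without inflating perplexity.
      print(tree.predict([[3, 1, 1]]))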

  13. Reaching Spanish-speaking smokers online: a 10-year worldwide research program

    PubMed Central

    Muñoz, Ricardo Felipe; Chen, Ken; Bunge, Eduardo Liniers; Bravin, Julia Isabela; Shaughnessy, Elizabeth Annelly; Pérez-Stable, Eliseo Joaquín

    2014-01-01

    Objective To describe a 10-year proof-of-concept smoking cessation research program evaluating the reach of online health interventions throughout the Americas. Methods Recruitment occurred from 2002–2011, primarily using Google.com AdWords. Over 6 million smokers from the Americas entered keywords related to smoking cessation; 57 882 smokers (15 912 English speakers and 41 970 Spanish speakers) were recruited into online self-help automated intervention studies. To examine disparities in utilization of methods to quit smoking, cessation aids used by English speakers and Spanish speakers were compared. To determine whether online interventions reduce disparities, abstinence rates were also compared. Finally, the reach of the intervention was illustrated for three large Spanish-speaking countries of the Americas—Argentina, Mexico, and Peru—and the United States of America. Results Few participants had utilized other methods to stop smoking before coming to the Internet site; most reported using no previous smoking cessation aids: 69.2% of Spanish speakers versus 51.8% of English speakers (P < 0.01). The most used method was nicotine gum, 13.9%. Nicotine dependence levels were similar to those reported for in-person smoking cessation trials. Overall observed quit rate for English speakers was 38.1% and for Spanish speakers, 37.0%; quit rates in which participants with missing data were considered to be smoking were 11.1% and 10.6%, respectively. Neither comparison was significantly different. Conclusions The systematic use of evidence-based Internet interventions for health problems could have a broad impact throughout the Americas, at little or no cost to individuals or to ministries of health. PMID:25211569
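
    The two quit-rate conventions reported above differ only in the denominator. The worked example below uses invented counts, chosen only to approximate the English-speaker figures (38.1% observed; 11.1% with missing outcomes counted as smoking):

      enrolled = 1000                 # all participants (illustrative count)
      followed_up = 291               # participants providing outcome data (illustrative)
      quit = 111                      # abstinent at follow-up (illustrative)

      observed_rate = quit / followed_up   # responders-only rate, ~38.1%
      itt_rate = quit / enrolled           # missing counted as smoking, ~11.1%
      print(f"observed {observed_rate:.1%}, conservative {itt_rate:.1%}")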

  14. On the optimization of a mixed speaker array in an enclosed space using the virtual-speaker weighting method

    NASA Astrophysics Data System (ADS)

    Peng, Bo; Zheng, Sifa; Liao, Xiangning; Lian, Xiaomin

    2018-03-01

    In order to achieve sound field reproduction over a wide frequency band, speakers of multiple types are used. The reproduction accuracy is affected not only by the signals sent to the speakers but also by the position and the number of speakers of each type. The method of optimizing such a mixed speaker array is investigated in this paper. A virtual-speaker weighting method is proposed to optimize both the position and the number of speakers of each type. In this method, a virtual-speaker model is proposed to quantify the increment in controllability of the speaker array as the number of speakers increases. While optimizing a mixed speaker array, the gain of the virtual-speaker transfer function is used to determine the priority order of the candidate speaker positions, which optimizes the position of each type of speaker. The relative gain of the virtual-speaker transfer function is then used to determine whether speakers are redundant, which optimizes the number of each type of speaker. Finally, the virtual-speaker weighting method is verified by reproduction experiments on the interior sound field of a passenger car. The results confirm that the optimum mixed speaker array can be obtained using the proposed method.
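
    The selection logic described above can be sketched as a greedy loop: rank candidate positions by the gain of the virtual-speaker transfer function, then stop adding speakers once the relative gain increment marks them as redundant. The gains and the 5% threshold below are invented for illustration.

      import numpy as np

      rng = np.random.default_rng(3)
      gains = rng.random(10)                    # virtual-speaker TF gain per candidate position

      order = np.argsort(gains)[::-1]           # priority order of candidate positions
      chosen, total = [], 0.0
      for idx in order:
          relative_gain = gains[idx] / (total + gains[idx])
          if chosen and relative_gain < 0.05:   # further speakers judged redundant
              break
          chosen.append(int(idx))
          total += gains[idx]
      print("selected positions:", chosen)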

  15. Evaluating the lexico-grammatical differences in the writing of native and non-native speakers of English in peer-reviewed medical journals in the field of pediatric oncology: Creation of the genuine index scoring system

    PubMed Central

    Gayle, Alberto Alexander; Shimaoka, Motomu

    2017-01-01

    Introduction The predominance of English in scientific research has created hurdles for “non-native speakers” of English. Here we present a novel application of native language identification (NLI) for the assessment of medical-scientific writing. For this purpose, we created a novel classification system whereby scoring would be based solely on text features found to be distinctive among native English speakers (NS) within a given context. We dubbed this the “Genuine Index” (GI). Methodology This methodology was validated using a small set of journals in the field of pediatric oncology. Our dataset consisted of 5,907 abstracts, representing work from 77 countries. A support vector machine (SVM) was used to generate our model and for scoring. Results Accuracy, precision, and recall of the classification model were 93.3%, 93.7%, and 99.4%, respectively. Class specific F-scores were 96.5% for NS and 39.8% for our benchmark class, Japan. Overall kappa was calculated to be 37.2%. We found significant differences between countries with respect to the GI score. Significant correlation was found between GI scores and two validated objective measures of writing proficiency and readability. Two sets of key terms and phrases differentiating NS and non-native writing were identified. Conclusions Our GI model was able to detect, with a high degree of reliability, subtle differences between the terms and phrasing used by native and non-native speakers in peer reviewed journals, in the field of pediatric oncology. In addition, L1 language transfer was found to be very likely to survive revision, especially in non-Western countries such as Japan. These findings show that even when the language used is technically correct, there may still be some phrasing or usage that impact quality. PMID:28212419
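
    A GI-style scorer can be approximated with a standard text-classification pipeline: train an SVM to separate NS from non-NS texts, then read the signed distance to the hyperplane as the score. The scikit-learn sketch below is an assumption-laden stand-in (toy texts, generic n-gram features), not the authors' exact feature set.

      from sklearn.pipeline import make_pipeline
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.svm import LinearSVC

      abstracts = ["we report a randomized trial of treatment outcomes",
                   "in this study, it was investigated that the outcome improves"]
      labels = [1, 0]                               # 1 = native English speaker (NS)

      gi = make_pipeline(TfidfVectorizer(ngram_range=(1, 3)), LinearSVC())
      gi.fit(abstracts, labels)
      # Signed distance from the separating hyperplane acts as a GI-like score
      print(gi.decision_function(["the present paper proposes a novel method"]))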

  16. Nonoccurrence of Negotiation of Meaning in Task-Based Synchronous Computer-Mediated Communication

    ERIC Educational Resources Information Center

    Van Der Zwaard, Rose; Bannink, Anne

    2016-01-01

    This empirical study investigated the occurrence of meaning negotiation in an interactive synchronous computer-mediated second language (L2) environment. Sixteen dyads (N = 32) consisting of nonnative speakers (NNSs) and native speakers (NSs) of English performed 2 different tasks using videoconferencing and written chat. The data were coded and…

  17. Deeper than Shallow: Evidence for Structure-Based Parsing Biases in Second-Language Sentence Processing

    ERIC Educational Resources Information Center

    Witzel, Jeffrey; Witzel, Naoko; Nicol, Janet

    2012-01-01

    This study examines the reading patterns of native speakers (NSs) and high-level (Chinese) nonnative speakers (NNSs) on three English sentence types involving temporarily ambiguous structural configurations. The reading patterns on each sentence type indicate that both NSs and NNSs were biased toward specific structural interpretations. These…

  18. Negotiation for Action: English Language Learning in Game-Based Virtual Worlds

    ERIC Educational Resources Information Center

    Zheng, Dongping; Young, Michael F.; Wagner, Manuela Maria; Brewer, Robert A.

    2009-01-01

    This study analyzes the user chat logs and other artifacts of a virtual world, "Quest Atlantis" (QA), and proposes the concept of Negotiation for Action (NfA) to explain how interaction, specifically, avatar-embodied collaboration between native English speakers and nonnative English speakers, provided resources for English language acquisition.…

  19. A Corpus-Based Study on Turkish Spoken Productions of Bilingual Adults

    ERIC Educational Resources Information Center

    Agçam, Reyhan; Bulut, Adem

    2016-01-01

    The current study investigated whether monolingual adult speakers of Turkish and bilingual adult speakers of Arabic and Turkish significantly differ regarding their spoken productions in Turkish. Accordingly, two groups of undergraduate students studying Turkish Language and Literature at a state university in Turkey were presented two videos on a…

  20. Language Skills of Bidialectal and Bilingual Children: Considering a Strengths-Based Perspective

    ERIC Educational Resources Information Center

    Lee-James, Ryan; Washington, Julie A.

    2018-01-01

    This article examines the language and cognitive skills of bidialectal and bilingual children, focusing on African American English bidialectal speakers and Spanish-English bilingual speakers. It contributes to the discussion by considering two themes in the extant literature: (1) linguistic and cognitive strengths can be found in speaking two…

  1. Long-Term Speech Results of Cleft Palate Speakers with Marginal Velopharyngeal Competence.

    ERIC Educational Resources Information Center

    Hardin, Mary A.; And Others

    1990-01-01

    This study of the longitudinal speech performance of 48 cleft palate speakers with marginal velopharyngeal competence, from age 6 to adolescence, found that the adolescent subjects' velopharyngeal status could be predicted based on 2 variables at age 6: the severity ratings of articulation defectiveness and nasality. (Author/JDD)

  2. Analysis of VOT in Turkish Speakers with Aphasia

    ERIC Educational Resources Information Center

    Kopkalli-Yavuz, Handan; Mavis, Ilknur; Akyildiz, Didem

    2011-01-01

    Studies investigating voice onset time (VOT) production by speakers with aphasia have shown that nonfluent aphasics show a deficit in the articulatory programming of speech sounds, based on the range of VOT values produced by aphasic individuals. If the VOT value lies between the normal ranges of VOT for the voiced and voiceless categories, then…

  3. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced speech, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

  4. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced speech, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.

  5. Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces

    PubMed Central

    Bocquelet, Florent; Hueber, Thomas; Girin, Laurent; Savariaux, Christophe; Yvert, Blaise

    2016-01-01

    Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the position of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained as assessed by perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open to future speech BCI applications using such articulatory-based speech synthesizer. PMID:27880768
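
    The frame-wise articulatory-to-acoustic mapping can be prototyped with any feed-forward regressor. The sketch below uses scikit-learn's MLPRegressor on synthetic EMA-like data; the sensor count, feature dimensions, and random data are assumptions, since the paper trains a DNN on real EMA recordings and drives a vocoder with the output.

      import numpy as np
      from sklearn.neural_network import MLPRegressor

      rng = np.random.default_rng(4)
      n = 2000
      ema = rng.normal(size=(n, 12))                        # x/y of 6 articulator sensors per frame
      acoustic = np.tanh(ema @ rng.normal(size=(12, 25)))   # stand-in spectral parameters

      dnn = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=300, random_state=0)
      dnn.fit(ema, acoustic)                                # frame-by-frame regression
      params = dnn.predict(ema[:1])                         # would feed a vocoder in the real system
      print(params.shape)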

  6. The Use of Epistemic Markers as a Means of Hedging and Boosting in the Discourse of L1 and L2 Speakers of Modern Greek: A Corpus-Based Study in Informal Letter-Writing

    ERIC Educational Resources Information Center

    Efstathiadi, Lia

    2010-01-01

    The paper investigates the semantic area of Epistemic Modality in Modern Greek, by means of a corpus-based research. A comparative, quantitative study was performed between written corpora (informal letter-writing) of non-native informants with various language backgrounds and Greek native speakers. A number of epistemic markers were selected for…

  7. Training Japanese listeners to identify English /r/ and /l/: A first report

    PubMed Central

    Logan, John S.; Lively, Scott E.; Pisoni, David B.

    2012-01-01

    Native speakers of Japanese learning English generally have difficulty differentiating the phonemes /r/ and /l/, even after years of experience with English. Previous research that attempted to train Japanese listeners to distinguish this contrast using synthetic stimuli reported little success, especially when transfer to natural tokens containing /r/ and /l/ was tested. In the present study, a different training procedure that emphasized variability among stimulus tokens was used. Japanese subjects were trained in a minimal pair identification paradigm using multiple natural exemplars contrasting /r/ and /l/ from a variety of phonetic environments as stimuli. A pretest–posttest design containing natural tokens was used to assess the effects of training. Results from six subjects showed that the new procedure was more robust than earlier training techniques. Small but reliable differences in performance were obtained between pretest and posttest scores. The results demonstrate the importance of stimulus variability and task-related factors in training nonnative speakers to perceive novel phonetic contrasts that are not distinctive in their native language. PMID:2016438

  8. Invariant principles of speech motor control that are not language-specific.

    PubMed

    Chakraborty, Rahul

    2012-12-01

    Bilingual speakers must learn to modify their speech motor control mechanism based on the linguistic parameters and rules specified by the target language. This study examines if there are aspects of speech motor control which remain invariant regardless of the first (L1) and second (L2) language targets. Based on the age of academic exposure and proficiency in L2, 21 Bengali-English bilingual participants were classified into high (n = 11) and low (n = 10) L2 (English) proficiency groups. Using the Optotrak 3020 motion sensitive camera system, the lips and jaw movements were recorded while participants produced Bengali (L1) and English (L2) sentences. Based on kinematic analyses of the lip and jaw movements, two different variability measures (i.e., lip aperture and lower lip/jaw complex) were computed for English and Bengali sentences. Analyses demonstrated that the two groups of bilingual speakers produced lip aperture complexes (a higher order synergy) that were more consistent in co-ordination than were the lower lip/jaw complexes (a lower order synergy). Similar findings were reported earlier in monolingual English speakers by Smith and Zelaznik. Thus, this hierarchical organization may be viewed as a fundamental principle of speech motor control, since it is maintained even in bilingual speakers.
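
    Consistency of coordination in such studies is typically quantified with a spatiotemporal-index-style measure: each movement record is amplitude- and time-normalized, and the standard deviations across records at fixed relative time points are summed. The numpy sketch below follows that general recipe; the normalization details are assumptions rather than the study's exact procedure.

      import numpy as np

      def spatiotemporal_index(records, n_points=50):
          """Lower values indicate more consistent coordination across repetitions."""
          norm = []
          for r in records:
              r = np.asarray(r, dtype=float)
              r = (r - r.mean()) / r.std()                  # amplitude normalization
              t = np.linspace(0.0, 1.0, len(r))
              norm.append(np.interp(np.linspace(0.0, 1.0, n_points), t, r))  # time normalization
          return np.std(np.vstack(norm), axis=0).sum()

      reps = [np.sin(np.linspace(0, np.pi, n)) for n in (90, 100, 110)]  # toy movement records
      print(spatiotemporal_index(reps))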

  9. Interactive voice technology: Variations in the vocal utterances of speakers performing a stress-inducing task

    NASA Astrophysics Data System (ADS)

    Mosko, J. D.; Stevens, K. N.; Griffin, G. R.

    1983-08-01

    Acoustical analyses were conducted of words produced by four speakers in a motion stress-inducing situation. The aim of the analyses was to document the kinds of changes that occur in the vocal utterances of speakers who are exposed to motion stress and to comment on the implications of these results for the design and development of voice interactive systems. The speakers differed markedly in the types and magnitudes of the changes that occurred in their speech. For some speakers, the stress-inducing experimental condition caused an increase in fundamental frequency, changes in the pattern of vocal fold vibration, shifts in vowel production, and changes in the relative amplitudes of sounds containing turbulence noise. All speakers showed greater variability in the experimental condition than in the more relaxed control situation. The variability was manifested in the acoustical characteristics of individual phonetic elements, particularly in unstressed syllables. The kinds of changes and variability observed serve to emphasize the limitations of speech recognition systems based on template matching of patterns that are stored in the system during a training phase. There is a need for a better understanding of these phonetic modifications and for developing ways of incorporating knowledge about these changes within a speech recognition system.
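
    The fundamental-frequency comparisons this kind of study relies on are straightforward to reproduce with modern tools. A minimal sketch using librosa's pYIN tracker follows; the file names and the F0 search range are placeholders.

      import numpy as np
      import librosa

      def f0_stats(path):
          y, sr = librosa.load(path, sr=None)
          f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
          f0 = f0[~np.isnan(f0)]                    # keep voiced frames only
          return float(np.mean(f0)), float(np.std(f0))

      for condition in ("control.wav", "stress.wav"):   # hypothetical recordings
          print(condition, f0_stats(condition))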

  10. A Computational Wireless Network Backplane: Performance in a Distributed Speaker Identification Application Postprint

    DTIC Science & Technology

    2008-12-01

    H. T. Kung, Chit-Kwan Lin, Chia-Yung Su, Dario Vlah (Harvard University); John Grieco†, Mark Huggins‡, Bruce Suter† (†Air Force Research Lab, ‡Oasis)... contributing its C7 processors used in our wireless testbed. REFERENCES: [1] R. North, N. Browne, and L. Schiavone, "Joint tactical radio system - connecting

  11. Knowledge of Connectors as Cohesion in Text: A Comparative Study of Native English and ESL (English as a Second Language) Speakers

    DTIC Science & Technology

    1989-08-18

  12. A model of acoustic interspeaker variability based on the concept of formant-cavity affiliation

    NASA Astrophysics Data System (ADS)

    Apostol, Lian; Perrier, Pascal; Bailly, Gérard

    2004-01-01

    A method is proposed to model the interspeaker variability of formant patterns for oral vowels. It is assumed that this variability originates in the differences existing among speakers in the respective lengths of their front and back vocal-tract cavities. In order to characterize, from the spectral description of the acoustic speech signal, these vocal-tract differences between speakers, each formant is interpreted, according to the concept of formant-cavity affiliation, as a resonance of a specific vocal-tract cavity. Its frequency can thus be directly related to the corresponding cavity length, and a transformation model can be proposed from a speaker A to a speaker B on the basis of the frequency ratios of the formants corresponding to the same resonances. To minimize the number of sounds to be recorded for each speaker in order to carry out this speaker transformation, the frequency ratios are computed exactly only for the three extreme cardinal vowels [i, a, u] and are approximated for the remaining vowels through an interpolation function. The method is evaluated through its capacity to transform the (F1,F2) formant patterns of eight oral vowels pronounced by five male speakers into the (F1,F2) patterns of the corresponding vowels generated by an articulatory model of the vocal tract. The resulting formant patterns are compared to those provided by normalization techniques published in the literature. The proposed method is found to be efficient, but a number of limitations are also observed and discussed. These limitations can be associated with the formant-cavity affiliation model itself or with a possible influence of speaker-specific vocal-tract geometry in the cross-sectional direction, which the model might not have taken into account.
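
    The transformation itself reduces to per-formant frequency ratios anchored at the corner vowels [i, a, u] and interpolated elsewhere. The numpy sketch below uses inverse-distance interpolation in the (F1, F2) plane; the formant values and the interpolation function are illustrative assumptions, not the paper's fitted model.

      import numpy as np

      corner_A = {"i": (250, 2300), "a": (700, 1300), "u": (280, 700)}   # speaker A (F1, F2), Hz
      corner_B = {"i": (300, 2600), "a": (800, 1450), "u": (330, 800)}   # speaker B (F1, F2), Hz
      ratios = {v: np.array(corner_B[v]) / np.array(corner_A[v]) for v in corner_A}

      def transform(f1, f2):
          pts = np.array([corner_A[v] for v in corner_A])
          w = 1.0 / (np.linalg.norm(pts - (f1, f2), axis=1) + 1e-6)   # inverse-distance weights
          r = (w[:, None] * np.array([ratios[v] for v in corner_A])).sum(0) / w.sum()
          return np.array([f1, f2]) * r

      print(transform(450.0, 1800.0))   # a mid vowel of speaker A mapped toward speaker B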

  13. Speech enhancement based on neural networks improves speech intelligibility in noise for cochlear implant users.

    PubMed

    Goehring, Tobias; Bolner, Federico; Monaghan, Jessica J M; van Dijk, Bas; Zarowski, Andrzej; Bleeck, Stefan

    2017-02-01

    Speech understanding in noisy environments is still one of the major challenges for cochlear implant (CI) users in everyday life. We evaluated a speech enhancement algorithm based on neural networks (NNSE) for improving speech intelligibility in noise for CI users. The algorithm decomposes the noisy speech signal into time-frequency units, extracts a set of auditory-inspired features and feeds them to the neural network to produce an estimation of which frequency channels contain more perceptually important information (higher signal-to-noise ratio, SNR). This estimate is used to attenuate noise-dominated and retain speech-dominated CI channels for electrical stimulation, as in traditional n-of-m CI coding strategies. The proposed algorithm was evaluated by measuring the speech-in-noise performance of 14 CI users using three types of background noise. Two NNSE algorithms were compared: a speaker-dependent algorithm, that was trained on the target speaker used for testing, and a speaker-independent algorithm, that was trained on different speakers. Significant improvements in the intelligibility of speech in stationary and fluctuating noises were found relative to the unprocessed condition for the speaker-dependent algorithm in all noise types and for the speaker-independent algorithm in 2 out of 3 noise types. The NNSE algorithms used noise-specific neural networks that generalized to novel segments of the same noise type and worked over a range of SNRs. The proposed algorithm has the potential to improve the intelligibility of speech in noise for CI users while meeting the requirements of low computational complexity and processing delay for application in CI devices. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
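
    Once the network has produced per-channel SNR estimates, the n-of-m selection step is straightforward. The sketch below applies it to envelope channels with numpy; the channel counts and the SNR matrix are placeholders for the network output described above.

      import numpy as np

      def select_channels(envelopes, snr_db, n=8):
          """envelopes, snr_db: arrays of shape (m_channels, n_frames)."""
          out = np.zeros_like(envelopes)
          for t in range(envelopes.shape[1]):
              keep = np.argsort(snr_db[:, t])[-n:]      # n most speech-dominated channels
              out[keep, t] = envelopes[keep, t]         # noise-dominated channels set to zero
          return out

      rng = np.random.default_rng(6)
      env = rng.random((22, 100))                       # 22 channels, 100 frames (illustrative)
      snr = rng.normal(size=(22, 100))                  # stand-in for the NN's SNR estimates
      stimulation = select_channels(env, snr, n=8)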

  14. Speech watermarking: an approach for the forensic analysis of digital telephonic recordings.

    PubMed

    Faundez-Zanuy, Marcos; Lucena-Molina, Jose J; Hagmüller, Martin

    2010-07-01

    In this article, the authors discuss the problem of forensic authentication of digital audio recordings. Although forensic audio has been addressed in several articles, the existing approaches focus on analog magnetic recordings, which are now less prevalent given the large number of digital recorders on the market (optical, solid state, hard disk, etc.). An approach based on digital signal processing, using spread-spectrum techniques for speech watermarking, is presented. This approach has the advantage that authentication is based on the signal itself rather than on the recording format; it is thus valid for the recording devices commonly used in police-controlled telephone intercepts. In addition, the proposal allows relevant information, such as the recording date and time, to be embedded in the signal (this is not always possible with classical systems). Experimental results reveal that the speech watermarking procedure does not significantly interfere with subsequent forensic speaker identification.
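
    The article does not disclose its exact embedding scheme, but a bare-bones additive spread-spectrum watermark works as follows: each payload bit modulates a pseudo-random ±1 carrier that is added to the speech at low amplitude, and detection correlates against the same carrier. The sketch below is a generic illustration with invented parameters:

      import numpy as np

      def embed(speech, bits, chip_rate=1000, alpha=0.01, seed=42):
          """Add a bit-modulated pseudo-random carrier to the signal."""
          rng = np.random.default_rng(seed)
          carrier = rng.choice([-1.0, 1.0], size=len(bits) * chip_rate)
          symbols = np.repeat(np.where(np.asarray(bits) > 0, 1.0, -1.0), chip_rate)
          marked = speech.copy()
          n = min(len(marked), len(carrier))
          marked[:n] += alpha * carrier[:n] * symbols[:n]
          return marked

      def recover(marked, num_bits, chip_rate=1000, seed=42):
          """Correlate each chip block against the carrier to read the bits."""
          rng = np.random.default_rng(seed)
          carrier = rng.choice([-1.0, 1.0], size=num_bits * chip_rate)
          out = []
          for i in range(num_bits):
              seg = slice(i * chip_rate, (i + 1) * chip_rate)
              out.append(1 if np.dot(marked[seg], carrier[seg]) > 0 else 0)
          return out

      speech = np.random.default_rng(1).normal(0, 0.1, 8000)   # stand-in signal
      print(recover(embed(speech, [1, 0, 1, 1]), num_bits=4))  # -> [1, 0, 1, 1]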

  15. American Voice Types: Towards a Vocal Typology for American English

    ERIC Educational Resources Information Center

    McPeek, Tyler

    2013-01-01

    Individual voices are not uniformly similar to others, even when factoring out speaker characteristics such as sex, age, dialect, and so on. Some speakers share common features and can cohere into groups based on gross vocal similarity but, to date, no attempt has been made to describe these features systematically or to generate a taxonomy based…

  16. Apprendre l'orthographe avec un correcteur orthographique (Learning Spelling with a Spell-Checker?)?

    ERIC Educational Resources Information Center

    Desmarais, Lise

    1998-01-01

    Reports a study with 27 adults, both native French-speakers and native English-speakers, on the effectiveness of using a spell-checker as the core element to teach French spelling. The method used authentic materials, individualized monitoring, screen and hard-copy text reading, and content sequencing based on errors. The approach generated…

  17. Teaching Standard Italian to Dialect Speakers: A Pedagogical Perspective of Linguistic Systems in Contact

    ERIC Educational Resources Information Center

    Danesi, Marcel

    1974-01-01

    The teaching of standard Italian to speakers of Italian dialects both in Italy and in North America is discussed, specifically through a specialized pedagogical program within the framework of a sociolinguistic and psycholinguistic perspective, and based on a structural analysis of linguistic systems in contact. Italian programs in Toronto are…

  18. Lifting the Curtain on the Wizard of Oz: Biased Voice-Based Impressions of Speaker Size

    ERIC Educational Resources Information Center

    Rendall, Drew; Vokey, John R.; Nemeth, Christie

    2007-01-01

    The consistent, but often wrong, impressions people form of the size of unseen speakers are not random but rather point to a consistent misattribution bias, one that the advertising, broadcasting, and entertainment industries also routinely exploit. The authors report 3 experiments examining the perceptual basis of this bias. The results indicate…

  19. Proficiency and Working Memory Based Explanations for Nonnative Speakers' Sensitivity to Agreement in Sentence Processing

    ERIC Educational Resources Information Center

    Coughlin, Caitlin E.; Tremblay, Annie

    2013-01-01

    This study examines the roles of proficiency and working memory (WM) capacity in second-/foreign-language (L2) learners' processing of agreement morphology. It investigates the processing of grammatical and ungrammatical short- and long-distance number agreement dependencies by native English speakers at two proficiencies in French, and the…

  20. Accent Detection and Social Cognition: Evidence of Protracted Learning

    ERIC Educational Resources Information Center

    Creel, Sarah C.

    2018-01-01

    How and when do children become aware that speakers have different accents? While adults readily make a variety of subtle social inferences based on speakers' accents, findings from children are more mixed: while one line of research suggests that even infants may be acutely sensitive to accent unfamiliarity, other studies suggest that 5-year-olds…

  1. Is Mandarin Chinese a Truth-Based Language? Rejecting Responses to Negative Assertions and Questions

    PubMed Central

    Li, Feifei; González-Fuente, Santiago; Prieto, Pilar; Espinal, M. Teresa

    2016-01-01

    This paper addresses the central question of whether Mandarin Chinese (MC) is a canonical truth-based language, i.e., a language expected to express the speaker's disagreement with a negative proposition by means of a negative particle followed by a positive sentence. Eight native speakers of MC participated in an oral Discourse Completion Task that elicited rejecting responses to negative assertions/questions and broad focus statements (control condition). Results show that MC speakers convey rejection by relying on a combination of lexico-syntactic strategies (e.g., negative particles such as bù, méi(yǒu), and positive sentences) together with prosodic (e.g., mean pitch) and gestural strategies (mainly, the use of head nods). Importantly, the use of a negative particle, which would be the expected outcome in a truth-based language, appeared in only 52% of the rejecting answers. This pattern calls into question the macroparametric division between truth-based and polarity-based languages and calls for a more general view of the instantiation of a reject speech act, one that integrates lexical and syntactic strategies with prosodic and gestural strategies. PMID:28066292

  2. Speaker normalization for chinese vowel recognition in cochlear implants.

    PubMed

    Luo, Xin; Fu, Qian-Jie

    2005-07-01

    Because of the limited spectro-temporal resolution associated with cochlear implants, implant patients often have greater difficulty with multitalker speech recognition. The present study investigated whether multitalker speech recognition can be improved by applying speaker normalization techniques to cochlear implant speech processing. Multitalker Chinese vowel recognition was tested with normal-hearing Chinese-speaking subjects listening to a 4-channel cochlear implant simulation, with and without speaker normalization. For each subject, speaker normalization was referenced to the speaker that produced the best recognition performance under conditions without speaker normalization. To match the remaining speakers to this "optimal" output pattern, the overall frequency range of the analysis filter bank was adjusted for each speaker according to the ratio of the mean third formant (F3) frequency values between the specific speaker and the reference speaker. Results showed that speaker normalization provided a small but significant improvement in subjects' overall recognition performance. After speaker normalization, subjects' patterns of recognition performance across speakers changed, demonstrating the potential for speaker-dependent effects with the proposed normalization technique.
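
    One plausible reading of the F3-ratio adjustment is sketched below: the corner frequencies of a reference analysis filter bank are rescaled by the ratio of the speaker's mean F3 to the reference speaker's mean F3. The corner frequencies and F3 values are invented for illustration:

      import numpy as np

      def normalized_corners(reference_corners_hz, speaker_mean_f3, reference_mean_f3):
          """Rescale the analysis filter bank's frequency range by the
          ratio of mean F3 values (speaker vs reference)."""
          return np.asarray(reference_corners_hz) * (speaker_mean_f3 / reference_mean_f3)

      corners = [200.0, 700.0, 1400.0, 2500.0, 7000.0]   # 4-channel bank (illustrative)
      print(normalized_corners(corners, speaker_mean_f3=2600.0, reference_mean_f3=2900.0))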

  3. Oral-diadochokinesis rates across languages: English and Hebrew norms.

    PubMed

    Icht, Michal; Ben-David, Boaz M

    2014-01-01

    Oro-facial and speech motor control disorders accompany a variety of speech and language pathologies. Early identification of such problems is important and carries clinical implications. A common and simple tool for gauging the presence and severity of speech motor control impairments is oral-diadochokinesis (oral-DDK). Surprisingly, norms for adult performance are missing from the literature. The goals of this study were: (1) to establish a norm for oral-DDK rate for (young to middle-age) adult English speakers, by collecting data from the literature (five studies, N = 141); (2) to investigate the possible effect of language (and culture) on oral-DDK performance, by analyzing studies conducted in other languages (five studies, N = 140), alongside the English norm; and (3) to establish a new norm for adult Hebrew speakers, by testing 115 speakers. We first offer an English norm with a mean of 6.2 syllables/s (SD = 0.8) and a lower boundary of 5.4 syllables/s that can be used to indicate possible abnormality. Next, we found significant differences between four tested languages (English, Portuguese, Farsi and Greek) in oral-DDK rates. These results suggest the need to set language- and culture-sensitive norms for the application of the oral-DDK task worldwide. Finally, we found the oral-DDK performance of adult Hebrew speakers to be 6.4 syllables/s (SD = 0.8), not significantly different from the English norm, which implies possible phonological similarities between English and Hebrew. We further note that no gender effects were found in our study. We recommend using oral-DDK as an important tool in the speech-language pathologist's arsenal. Yet, application of this task should be done carefully, comparing individual performance to a norm set within the specific language. Readers will be able to: (1) describe the speech-language pathology assessment process using the oral-DDK task, comparing an individual's performance to the present English norm; (2) describe the impact of language on oral-DDK performance; and (3) accurately assess Hebrew-speaking patients using this tool.
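
    Applying the reported English norm is a one-line computation; the sketch below flags a rate against the mean, SD and lower boundary given above (the z-score framing is our shorthand, not the paper's procedure):

      def ddk_rate(num_syllables, duration_s):
          """Oral-DDK rate in syllables per second."""
          return num_syllables / duration_s

      def check_against_norm(rate, norm_mean=6.2, norm_sd=0.8, lower_bound=5.4):
          z = (rate - norm_mean) / norm_sd
          return {"rate": rate, "z": round(z, 2), "below_boundary": rate < lower_bound}

      print(check_against_norm(ddk_rate(30, 6.0)))   # 5.0 syll/s -> flagged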

  4. Multimodal Speaker Diarization.

    PubMed

    Noulas, A; Englebienne, G; Kröse, B J A

    2012-01-01

    We present a novel probabilistic framework that fuses information coming from the audio and video modalities to perform speaker diarization. The proposed framework is a Dynamic Bayesian Network (DBN) that extends a factorial Hidden Markov Model (fHMM) and models the people appearing in an audiovisual recording as multimodal entities that generate observations in the audio stream, the video stream, and the joint audiovisual space. The framework is robust to different contexts, makes no assumptions about the location of the recording equipment, and does not require labeled training data, as it acquires the model parameters using the Expectation-Maximization (EM) algorithm. We apply the proposed model to two meeting videos and a news broadcast video, all from publicly available data sets. The speaker diarization results favor the proposed multimodal framework, which outperforms single-modality analysis and improves over state-of-the-art audio-based speaker diarization.
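
    The full DBN is beyond a short example, but the core fusion idea, combining evidence from independent audio and video streams, reduces in the simplest case to adding per-frame log-likelihoods. The sketch below uses invented likelihoods for three speakers and is only a naive-Bayes caricature of the paper's model:

      import numpy as np

      log_p_audio = np.log([[0.7, 0.2, 0.1],    # frames x speakers
                            [0.3, 0.5, 0.2]])
      log_p_video = np.log([[0.6, 0.3, 0.1],
                            [0.2, 0.6, 0.2]])

      # Assuming conditional independence given the speaker, add the
      # log-likelihoods and take the best speaker per frame.
      fused = log_p_audio + log_p_video
      print(fused.argmax(axis=1))               # diarization label per frame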

  5. Phonological awareness of English by Chinese and Korean bilinguals

    NASA Astrophysics Data System (ADS)

    Chung, Hyunjoo; Schmidt, Anna; Cheng, Tse-Hsuan

    2002-05-01

    This study examined non-native speakers' phonological awareness of spoken English. Chinese-speaking adults, Korean-speaking adults, and English-speaking adults were tested. The L2 speakers had been in the US for less than 6 months. Chinese and Korean allow no consonant clusters and have limited numbers of consonants allowable in syllable-final position, whereas English allows a variety of clusters and various consonants in syllable-final position. Subjects participated in eight phonological awareness tasks (four replacement tasks and four deletion tasks) based on English phonology. In addition, digit span was measured. Preliminary analysis indicates that Chinese and Korean speakers' errors appear to reflect L1 influences (such as orthography, phonotactic constraints, and phonology). All three groups of speakers showed more difficulty with manipulation of rimes than onsets, especially with postvocalic nasals. Results will be discussed in terms of syllable structure, L1 influence, and association with short-term memory.

  6. Age-related differences in communication and audience design.

    PubMed

    Horton, William S; Spieler, Daniel H

    2007-06-01

    This article reports an experiment examining the extent to which younger and older speakers engage in audience design, the process of adapting one's speech for particular addressees. Through an initial card-matching task, pairs of younger adults and pairs of older adults established common ground for sets of picture cards. Subsequently, the same individuals worked separately on a computer-based picture-description task that involved a novel partner-cuing paradigm. Younger speakers' descriptions to the familiar partner were shorter and were initiated more quickly than were descriptions to an unfamiliar partner. In addition, younger speakers' descriptions to the familiar partner exhibited a higher proportion of lexical overlap with previous descriptions than did descriptions to an unfamiliar partner. Older speakers showed no equivalent evidence for audience design, which may reflect difficulties with retrieving partner-specific information from memory during conversation.

  7. Using perturbed handwriting to support writer identification in the presence of severe data constraints

    NASA Astrophysics Data System (ADS)

    Chen, Jin; Cheng, Wen; Lopresti, Daniel

    2011-01-01

    Since real data are time-consuming and expensive to collect and label, researchers have proposed approaches using synthetic variations for tasks such as signature verification, speaker authentication, handwriting recognition, and keyword spotting. However, the limitation of real data is particularly critical in the field of writer identification, since in forensics, adversaries cannot be expected to provide sufficient data to train a classifier. It is therefore unrealistic to assume that sufficient real data will always be available to train classifiers extensively for writer identification. In addition, this field differs from many others in that we strive to preserve inter-writer variation as much as possible, and model-perturbed handwriting might break such discriminability among writers. Building on work described in another paper, in which human subjects were involved in calibrating realistic-looking transformations, we measured the effects of incorporating perturbed handwriting into the training dataset. Experimental results supported our hypothesis that, with limited real data, model-perturbed handwriting improves the performance of writer identification. In particular, when only a single sample per writer was available, incorporating perturbed data achieved a 36x performance gain.

  8. An Exploratory Study of Mexican-Origin Fathers' Involvement in Their Child's Education: The Role of Linguistic Acculturation

    ERIC Educational Resources Information Center

    Lopez, Vera

    2007-01-01

    The present exploratory study examined the involvement of 77 Mexican-origin fathers in their school-age (grades 4-6) child's education. Fathers were classified into one of three groups based on their linguistic acculturation status. The three groups were predominantly English-speakers (n = 25), English/Spanish-speakers (n = 27), and predominantly…

  9. Gender Related Differences in Using Intensive Adverbs in Turkish

    ERIC Educational Resources Information Center

    Önem, Engin E.

    2017-01-01

    This study aims to find out whether there is a gender based difference between male and female native speakers of Turkish in using intensive adverbs in Turkish. To achieve this, 182 voluntary native speakers of Turkish (89 female/93 male) with age ranging from 18 to 22 were asked to complete a photo description task. The task required choosing one…

  10. Basic Report for Targeted Communications; Teaching a Standard English to Speakers of Other Dialects.

    ERIC Educational Resources Information Center

    Hess, Karen M.

    Designed to interpret and synthesize the existing research and related information about dialects for those people who are involved in teaching a standard English to speakers of other dialects, the information in this report is based on an analysis and synthesis of over 1,250 articles and reports dealing with dialects and dialect learning. The…

  11. Structural Correlates for Lexical Efficiency and Number of Languages in Non-Native Speakers of English

    ERIC Educational Resources Information Center

    Grogan, A.; Parker Jones, O.; Ali, N.; Crinion, J.; Orabona, S.; Mechias, M. L.; Ramsden, S.; Green, D. W.; Price, C. J.

    2012-01-01

    We used structural magnetic resonance imaging (MRI) and voxel based morphometry (VBM) to investigate whether the efficiency of word processing in the non-native language (lexical efficiency) and the number of non-native languages spoken (2+ versus 1) were related to local differences in the brain structure of bilingual and multilingual speakers.…

  12. Native Speaker Norms and China English: From the Perspective of Learners and Teachers in China

    ERIC Educational Resources Information Center

    He, Deyuan; Zhang, Qunying

    2010-01-01

    This article explores the question of whether the norms based on native speakers of English should be kept in English teaching in an era when English has become World Englishes. This is an issue that has been keenly debated in recent years, not least in the pages of "TESOL Quarterly." However, "China English" in such debates…

  13. A Normative-Speaker Validation Study of Two Indices Developed to Quantify Tongue Dorsum Activity from Midsagittal Tongue Shapes

    ERIC Educational Resources Information Center

    Zharkova, Natalia

    2013-01-01

    This study reported adult scores on two measures of tongue shape, based on midsagittal tongue shape data from ultrasound imaging. One of the measures quantified the extent of tongue dorsum excursion, and the other measure represented the place of maximal excursion. Data from six adult speakers of Scottish Standard English without speech disorders…

  14. Native-Speaker/Non-Native-Speaker Discourse in the MOO: Topic Negotiation and Initiation in a Synchronous Text-Based Environment

    ERIC Educational Resources Information Center

    Schwienhorst, Klaus

    2004-01-01

    A number of researchers in computer-mediated communication have pointed towards its potential to stimulate learner participation and engagement in the classroom. However, in many cases only anecdotal reports were provided. In addition, it is unclear whether the pedagogical set-up or the technology involved is responsible for changes in learner…

  15. Empathy matters: ERP evidence for inter-individual differences in social language processing

    PubMed Central

    Van Berkum, Jos J.A.; Bastiaansen, Marcel C.M.; Tesink, Cathelijne M.J.Y.; Kos, Miriam; Buitelaar, Jan K.; Hagoort, Peter

    2012-01-01

    When an adult claims he cannot sleep without his teddy bear, people tend to react with surprise. Language interpretation is thus influenced by social context, such as who the speaker is. The present study reveals inter-individual differences in brain reactivity to social aspects of language. Whereas women showed brain reactivity when stereotype-based inferences about a speaker conflicted with the content of the message, men did not. This sex difference in social information processing can be explained by a specific cognitive trait, one's ability to empathize. Individuals who empathize to a greater degree revealed larger N400 effects (as well as a larger increase in γ-band power) to socially relevant information. These results indicate that individuals with high empathizing skills are able to rapidly integrate information about the speaker with the content of the message, as they make use of voice-based inferences about the speaker to process language in a top-down manner. By contrast, individuals with lower empathizing skills did not use information about social stereotypes in implicit sentence comprehension, but rather took a more bottom-up approach to the processing of these social-pragmatic sentences. PMID:21148175

  16. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzrichter, J.F.; Ng, L.C.

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as combined voiced and unvoiced, speech. The methods also include deconvolving the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.
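
    The deconvolution step can be illustrated with a regularized frequency-domain division: given an (EM-derived) excitation frame and the acoustic output frame, the transfer function is estimated as H = S/E. The toy excitation and filter below are stand-ins, not the patent's processing chain:

      import numpy as np

      def frame_transfer_function(speech_frame, excitation_frame, eps=1e-8):
          """Estimate the vocal-tract transfer function for one time frame
          by deconvolving the excitation from the acoustic output."""
          w = np.hanning(len(speech_frame))
          S = np.fft.rfft(speech_frame * w)
          E = np.fft.rfft(excitation_frame * w)
          return S * np.conj(E) / (np.abs(E) ** 2 + eps)   # regularized S/E

      rng = np.random.default_rng(0)
      e = rng.normal(0, 1, 512)                        # stand-in excitation
      s = np.convolve(e, [1.0, -0.9], mode="same")     # toy vocal-tract filter
      print(frame_transfer_function(s, e).shape)       # one bin per rfft frequency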

  17. Blue-green color categorization in Mandarin-English speakers.

    PubMed

    Wuerger, Sophie; Xiao, Kaida; Mylonas, Dimitris; Huang, Qingmei; Karatzas, Dimosthenis; Hird, Emily; Paramei, Galina

    2012-02-01

    Observers are faster to detect a target among a set of distracters if the targets and distracters come from different color categories. This cross-boundary advantage seems to be limited to the right visual field, which is consistent with the dominance of the left hemisphere for language processing [Gilbert et al., Proc. Natl. Acad. Sci. USA 103, 489 (2006)]. Here we study whether a similar visual field advantage is found in the color identification task in speakers of Mandarin, a language that uses a logographic system. Forty late Mandarin-English bilinguals performed a blue-green color categorization task, in a blocked design, in their first language (L1: Mandarin) or second language (L2: English). Eleven color singletons ranging from blue to green were presented for 160 ms, randomly in the left visual field (LVF) or right visual field (RVF). Color boundary and reaction times (RTs) at the color boundary were estimated in L1 and L2, for both visual fields. We found that the color boundary did not differ between the languages; RTs at the color boundary, however, were on average more than 100 ms shorter in the English compared to the Mandarin sessions, but only when the stimuli were presented in the RVF. The finding may be explained by the script nature of the two languages: Mandarin logographic characters are analyzed visuospatially in the right hemisphere, which conceivably facilitates identification of color presented to the LVF.

  18. Speech endpoint detection with non-language speech sounds for generic speech processing applications

    NASA Astrophysics Data System (ADS)

    McClain, Matthew; Romanowski, Brian

    2009-05-01

    Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter out NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) of speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS, such as filled pauses, will require future research.
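
    The paper's detector is HMM-based; as a much-simplified stand-in, the sketch below classifies individual feature frames as LSS or NLSS with a per-class Gaussian likelihood ratio. Features and class statistics are synthetic:

      import numpy as np

      class DiagGaussian:
          """Diagonal-covariance Gaussian fit to feature frames."""
          def fit(self, feats):
              self.mu = feats.mean(axis=0)
              self.var = feats.var(axis=0) + 1e-6
              return self
          def log_likelihood(self, feats):
              return -0.5 * (np.log(2 * np.pi * self.var)
                             + (feats - self.mu) ** 2 / self.var).sum(axis=1)

      rng = np.random.default_rng(0)
      lss = DiagGaussian().fit(rng.normal(0.0, 1.0, (200, 4)))   # synthetic LSS frames
      nlss = DiagGaussian().fit(rng.normal(1.5, 1.0, (200, 4)))  # synthetic NLSS frames

      test = rng.normal(1.5, 1.0, (5, 4))      # frames from an NLSS-like segment
      print(nlss.log_likelihood(test) > lss.log_likelihood(test))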

  19. [The contribution of different cochlear insertion region to Mandarin speech perception in users of cochlear implant].

    PubMed

    Qi, Beier; Liu, Bo; Liu, Sha; Liu, Haihong; Dong, Ruijuan; Zhang, Ning; Gong, Shusheng

    2011-05-01

    To study the effect of cochlear electrode coverage and insertion region on speech recognition, especially tone perception, in cochlear implant users whose native language is Mandarin Chinese. Seven test conditions were created in the fitting software by switching the respective channels on or off to simulate different insertion positions. Mandarin-speaking CI users then completed four speech tests: a vowel identification test, a consonant identification test, a tone identification test (male speaker), and the Mandarin HINT test (SRS) in quiet and in noise. Across test conditions, the average vowel identification score differed significantly, ranging from 56% to 91% (rank-sum test, P < 0.05), and the average consonant identification score differed significantly, ranging from 72% to 85% (ANOVA, P < 0.05). The average tone identification score did not differ significantly (ANOVA, P > 0.05); however, the more channels that were activated, the higher the scores obtained, from 68% to 81%. This study shows a correlation between insertion depth and speech recognition. Because all parts of the basilar membrane can help CI users improve their speech recognition ability, increasing insertion depth and actively stimulating the apical region of the cochlea are important for enhancing the verbal communication and social interaction abilities of CI users.

  20. The Wildcat Corpus of Native- and Foreign-Accented English: Communicative Efficiency across Conversational Dyads with Varying Language Alignment Profiles

    PubMed Central

    Van Engen, Kristin J.; Baese-Berk, Melissa; Baker, Rachel E.; Choi, Arim; Kim, Midam; Bradlow, Ann R.

    2012-01-01

    This paper describes the development of the Wildcat Corpus of native- and foreign-accented English, a corpus containing scripted and spontaneous speech recordings from 24 native speakers of American English and 52 non-native speakers of English. The core element of this corpus is a set of spontaneous speech recordings, for which a new method of eliciting dialogue-based, laboratory-quality speech recordings was developed (the Diapix task). Dialogues between two native speakers of English, between two non-native speakers of English (with either shared or different L1s), and between one native and one non-native speaker of English are included and analyzed in terms of general measures of communicative efficiency. The overall finding was that pairs of native talkers were most efficient, followed by mixed native/non-native pairs and non-native pairs with shared L1. Non-native pairs with different L1s were least efficient. These results support the hypothesis that successful speech communication depends both on the alignment of talkers to the target language and on the alignment of talkers to one another in terms of native language background. PMID:21313992

  1. On compensation of mismatched recording conditions in the Bayesian approach for forensic automatic speaker recognition.

    PubMed

    Botti, F; Alexander, A; Drygajlo, A

    2004-12-02

    This paper deals with a procedure to compensate for mismatched recording conditions in forensic speaker recognition, using a statistical score normalization. Bayesian interpretation of the evidence in forensic automatic speaker recognition depends on three sets of recordings in order to perform forensic casework: reference (R) and control (C) recordings of the suspect, and a potential population database (P), as well as a questioned recording (QR). The requirement of similar recording conditions between the suspect control database (C) and the questioned recording (QR) is often not satisfied in real forensic cases. The aim of this paper is to investigate a score normalization procedure, based on an adaptation of the Test-normalization (T-norm) [2] technique used in the speaker verification domain, to compensate for the mismatch. The Polyphone IPSC-02 database and ASPIC (an automatic speaker recognition system developed by EPFL and IPS-UNIL in Lausanne, Switzerland) were used to test the normalization procedure. Experimental results for three different recording condition scenarios are presented using Tippett plots, and the effect of the compensation on the evaluation of the strength of the evidence is discussed.
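
    T-norm itself is a small computation: a raw score is centred and scaled by the statistics the questioned recording obtains against a cohort of impostor models. A minimal sketch with invented scores:

      import numpy as np

      def t_norm(raw_score, cohort_scores):
          """Test-normalization: standardize a raw score by the mean and
          standard deviation of the cohort (impostor-model) scores."""
          c = np.asarray(cohort_scores, dtype=float)
          return (raw_score - c.mean()) / (c.std() + 1e-12)

      print(t_norm(2.4, [0.1, -0.3, 0.5, 0.2, -0.1]))   # hypothetical numbers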

  2. Validity of Single-Item Screening for Limited Health Literacy in English and Spanish Speakers.

    PubMed

    Bishop, Wendy Pechero; Craddock Lee, Simon J; Skinner, Celette Sugg; Jones, Tiffany M; McCallister, Katharine; Tiro, Jasmin A

    2016-05-01

    To evaluate 3 single-item screening measures for limited health literacy in a community-based population of English and Spanish speakers. We recruited 324 English and 314 Spanish speakers from a community research registry in Dallas, Texas, enrolled between 2009 and 2012. We used 3 screening measures: (1) How would you rate your ability to read?; (2) How confident are you filling out medical forms by yourself?; and (3) How often do you have someone help you read hospital materials? In analyses stratified by language, we used area under the receiver operating characteristic (AUROC) curves to compare each item with the validated 40-item Short Test of Functional Health Literacy in Adults. For English speakers, no difference was seen among the items. For Spanish speakers, "ability to read" identified inadequate literacy better than "help reading hospital materials" (AUROC curve = 0.76 vs 0.65; P = .019). The "ability to read" item performed the best, supporting use as a screening tool in safety-net systems caring for diverse populations. Future studies should investigate how to implement brief measures in safety-net settings and whether highlighting health literacy level influences providers' communication practices and patient outcomes.
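
    The AUROC comparison at the heart of the analysis is easy to reproduce on synthetic data; the sketch below (assuming scikit-learn is available) scores two mock screener items against a binary criterion. All numbers are invented and do not reproduce the study's values:

      import numpy as np
      from sklearn.metrics import roc_auc_score

      rng = np.random.default_rng(0)
      inadequate = rng.integers(0, 2, 200)      # 1 = inadequate literacy (criterion)
      item_read = inadequate * 1.0 + rng.normal(0, 0.7, 200)   # stronger item
      item_help = inadequate * 0.5 + rng.normal(0, 0.7, 200)   # weaker item

      print("ability-to-read AUROC:", round(roc_auc_score(inadequate, item_read), 2))
      print("help-reading    AUROC:", round(roc_auc_score(inadequate, item_help), 2))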

  3. Facilities to assist people to research into stammered speech

    PubMed Central

    Howell, Peter; Huckvale, Mark

    2008-01-01

    The purpose of this article is to indicate how access can be obtained, through Stammering Research, to audio recordings and transcriptions of spontaneous speech data from speakers who stammer. Selections of the first author’s data are available in several formats. We describe where to obtain free software for manipulation and analysis of the data in their respective formats. Papers reporting analyses of these data are invited as submissions to this section of Stammering Research. It is intended that subsequent analyses that employ these data will be published in Stammering Research on an on-going basis. Plans are outlined to provide similar data from young speakers (ones developing fluently and ones who stammer), follow-up data from speakers who stammer, data from speakers who stammer who do not speak English and from speakers who have other speech disorders, for comparison, all through the pages of Stammering Research. The invitation is extended to those promulgating evidence-based practice approaches (see the Journal of Fluency Disorders, volume 28, number 4 which is a special issue devoted to this topic) and anyone with other interesting data related to stammering to prepare them in a form that can be made accessible to others via Stammering Research. PMID:18418475

  4. An Acoustically Based Sociolinguistic Analysis of Variable Coda /s/ Production in the Spanish of New York City

    ERIC Educational Resources Information Center

    Erker, Daniel Gerard

    2012-01-01

    This study examines a major linguistic event underway in New York City. Of its 10 million inhabitants, nearly a third are speakers of Spanish. This community is socially and linguistically diverse: Some speakers are recent arrivals from Latin America while others are lifelong New Yorkers. Some have origins in the Caribbean, the historic source of…

  5. Learning to Think in a Second Language: Effects of Proficiency and Length of Exposure in English Learners of German

    ERIC Educational Resources Information Center

    Athanasopoulos, Panos; Damjanovic, Ljubica; Burnand, Julie; Bylund, Emanuel

    2015-01-01

    The aim of the current study is to investigate motion event cognition in second language learners in a higher education context. Based on recent findings that speakers of grammatical aspect languages like English attend less to the endpoint (goal) of events than do speakers of nonaspect languages like Swedish in a nonverbal categorization task…

  6. The Influence of Orthography on the Production of Alphabetic, Second-Language Allophones by Speakers of a Non-Alphabetic Language

    ERIC Educational Resources Information Center

    Han, Jeong-Im; Kim, Joo-Yeon

    2017-01-01

    This study investigated the influence of orthographic information on the production of allophones in a second language (L2). Two proficiency levels of native Mandarin speakers learned novel Korean words with potential variants of /h/ based on auditory stimuli, and then they were provided various types of spellings for the variants, including the…

  7. Managing Discourse in Intercultural Business Email Interactions: A Case Study of a British and Italian Business Transaction

    ERIC Educational Resources Information Center

    Incelli, Ersilia

    2013-01-01

    This paper investigates native speaker (NS) and non-native speaker (NNS) interaction in the workplace in computer-mediated communication (CMC). Based on empirical data from a 10-month email exchange between a medium-sized British company and a small-sized Italian company, the general aim of this study is to explore the nature of the intercultural…

  8. Comparing Ease-of-Processing Values of the Same Set of Words for Native English Speakers and Japanese Learners of English

    ERIC Educational Resources Information Center

    Takashima, Hiroomi

    2009-01-01

    Ease of processing of 3,969 English words for native speakers and Japanese learners was investigated using lexical decision and naming latencies taken from the English Lexicon Project (Balota et al. The English Lexicon Project: A web-based repository of descriptive and behavioral measures for 40,481 English words and nonwords, 2002) and accuracy…

  9. Speaking fundamental frequency and vowel formant frequencies: effects on perception of gender.

    PubMed

    Gelfer, Marylou Pausewang; Bennett, Quinn E

    2013-09-01

    The purpose of the present study was to investigate the contribution of vowel formant frequencies to gender identification in connected speech, the distinctiveness of vowel formants in males versus females, and how ambiguous speaking fundamental frequencies (SFFs) and vowel formants might affect perception of gender. Multivalent experimental design. Speaker subjects (eight tall males, eight short females, and seven males and seven females of "middle" height) were recorded saying two carrier phrases, to elicit the vowels /i/ and /α/, and a sentence. The gender/height groups were selected to (presumably) maximize formant differences between some groups (tall vs short) and minimize differences between others (middle height). Each subject's samples were digitally altered to distinct SFFs (116, 145, 155, 165, and 207 Hz) to represent SFFs typical of average males, typical of average females, and in an ambiguous range. Listeners judged the gender of each randomized altered speech sample. Results indicated that female speakers were perceived as female even with an SFF in the typical male range. For male speakers, gender perception was less accurate at SFFs of 165 Hz and higher. Although the ranges of vowel formants overlapped considerably between genders, significant differences in the formant frequencies of males and females were seen. Vowel formants appeared to be important to the perception of gender, especially for SFFs in the range of 145-165 Hz; however, formants may be a more salient cue in connected speech than in isolated vowels or syllables.

  10. Effect of gender on communication of health information to older adults.

    PubMed

    Dearborn, Jennifer L; Panzer, Victoria P; Burleson, Joseph A; Hornung, Frederick E; Waite, Harrison; Into, Frances H

    2006-04-01

    To examine the effect of gender on three key elements of communication with elderly individuals: effectiveness of the communication, perceived relevance to the individual, and effect of gender-stereotyped content. Survey. University of Connecticut Health Center. Thirty-three subjects (17 female), aged 69 to 91 (mean ± SD 82 ± 5.4). Older adults listened to 16 brief narratives randomized in order and by the sex of the speaker (Narrator Voice). Effectiveness was measured according to the ability to identify key features (Risks), and subjects were asked to rate the relevance (Plausibility). The number of Risks detected and determinations of plausibility were analyzed according to Subject Gender and Narrator Voice. Narratives were written for either sex or included male or female bias (Neutral or Stereotyped). Female subjects identified a significantly higher number of Risks across all narratives (P=.01). Subjects perceived a significantly higher number of Risks with a female Narrator Voice (P=.03). A significant Voice-by-Stereotype interaction was present for female-stereotyped narratives (P=.009). In narratives rated as Plausible, subjects detected more Risks (P=.02). Subject Gender influenced communication effectiveness. A female speaker resulted in identification of more Risks for subjects of both sexes, particularly for Stereotyped narratives. There was no significant effect of matching Subject Gender and Narrator Voice. This study suggests that the sex of the speaker influences the effectiveness of communication with older adults. These findings should motivate future research into the means by which medical providers can improve communication with their patients.

  11. Verification of endocrinological functions at a short distance between parametric speakers and the human body.

    PubMed

    Lee, Soomin; Katsuura, Tetsuo; Shimomura, Yoshihiro

    2011-01-01

    In recent years, a new type of loudspeaker, the parametric speaker, has been used to generate highly directional sound, and such speakers are now commercially available. Our previous study verified that parametric speakers place a lower burden on endocrine function than general speakers. However, the effects of distances shorter than 2.6 m between parametric speakers and the human body had not yet been examined. We therefore investigated the effect of distance on endocrinological function and subjective evaluation. Nine male subjects participated in this study. They completed three consecutive sessions: a 20-min quiet period as a baseline, a 30-min mental task period with general or parametric speakers, and a 20-min recovery period. We measured salivary cortisol and chromogranin A (CgA) concentrations. Subjects also took the Kwansei-gakuin Sleepiness Scale (KSS) test before and after the task, and completed a sound quality evaluation after it. Four experiments, crossing speaker condition (general vs parametric) with distance condition (0.3 m vs 1.0 m), were conducted at the same time of day on separate days. We used a three-way repeated-measures ANOVA (speaker factor × distance factor × time factor) to examine the effects of the parametric speaker. We found that endocrinological function did not differ significantly between speaker conditions or distance conditions. The results also showed that the physiological burden increased over time, independent of speaker condition and distance condition.

  12. The Communication of Public Speaking Anxiety: Perceptions of Asian and American Speakers.

    ERIC Educational Resources Information Center

    Martini, Marianne; And Others

    1992-01-01

    Finds that U.S. audiences perceive Asian speakers to have more speech anxiety than U.S. speakers, even though Asian speakers do not self-report higher anxiety levels. Confirms that speech state anxiety is not communicated effectively between speakers and audiences for Asian or U.S. speakers. (SR)

  13. An Investigation of Syntactic Priming among German Speakers at Varying Proficiency Levels

    ERIC Educational Resources Information Center

    Ruf, Helena T.

    2011-01-01

    This dissertation investigates syntactic priming in second language (L2) development among three speaker populations: (1) less proficient L2 speakers; (2) advanced L2 speakers; and (3) L1 speakers. Using confederate scripting, this study examines how German speakers choose certain word orders in locative constructions (e.g., "Auf dem Tisch…

  14. Modeling Speaker Proficiency, Comprehensibility, and Perceived Competence in a Language Use Domain

    ERIC Educational Resources Information Center

    Schmidgall, Jonathan Edgar

    2013-01-01

    Research suggests that listener perceptions of a speaker's oral language use, or a speaker's "comprehensibility," may be influenced by a variety of speaker-, listener-, and context-related factors. Primary speaker factors include aspects of the speaker's proficiency in the target language such as pronunciation and…

  15. Methods and apparatus for non-acoustic speech characterization and recognition

    DOEpatents

    Holzrichter, John F.

    1999-01-01

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  16. Methods and apparatus for non-acoustic speech characterization and recognition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzrichter, J.F.

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  17. The power of your vocal image.

    PubMed

    McCoy, L A

    1996-03-01

    Your vocal image is the impression that listeners form of you based on the sound of your voice. In a dental office, where the initial patient contact usually occurs over the phone, your vocal image is vitally important. According to social psychologists, people begin to form relatively durable first impressions within six to 12 seconds of perceiving a sensory cue. This means that patients begin to form their impressions of a telephone speaker almost immediately. Based on the qualities of the speaker's voice and how it is used, they form impressions of everything from the speaker's physical and personality characteristics to his or her intellectual ability, and eventually generalize those impressions to the office the speaker represents. If you want to improve your vocal image, you must first be aware of exactly what that image is. Two factors combine to create a vocal impression: the speaker's physical vocal tools and the sound created by them. The five physical tools involved are the lungs, vocal cords, throat, mouth and ears. At each stage in the sound production process, we can easily fall into negative habits and lazy patterns if we're not careful. Although we can't do much about our physical voice mechanism, we can certainly exercise a great deal of control over how our voice is used. A strong, confident voice is an essential part of effective interpersonal communication. If you want to project an image of confidence and professionalism, don't overlook the subtle benefits of effective vocal power.

  18. Do children go for the nice guys? The influence of speaker benevolence and certainty on selective word learning.

    PubMed

    Bergstra, Myrthe; de Mulder, Hannah N M; Coopmans, Peter

    2018-04-06

    This study investigated how speaker certainty (a rational cue) and speaker benevolence (an emotional cue) influence children's willingness to learn words in a selective learning paradigm. In two experiments four- to six-year-olds learnt novel labels from two speakers and, after a week, their memory for these labels was reassessed. Results demonstrated that children retained the label-object pairings for at least a week. Furthermore, children preferred to learn from certain over uncertain speakers, but they had no significant preference for nice over nasty speakers. When the cues were combined, children followed certain speakers, even if they were nasty. However, children did prefer to learn from nice and certain speakers over nasty and certain speakers. These results suggest that rational cues regarding a speaker's linguistic competence trump emotional cues regarding a speaker's affective status in word learning. However, emotional cues were found to have a subtle influence on this process.

  19. Speaker Recognition Through NLP and CWT Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown-VanHoozer, S.A.; Kercel, S.W.; Tucker, R.W.

    The objective of this research is to develop a system capable of identifying speakers on wiretaps from a large database (>500 speakers) with a short search time (<30 seconds) and better than 90% accuracy. Much previous research in speaker recognition has led to algorithms that produced encouraging preliminary results but were overwhelmed when applied to populations of more than a dozen or so different speakers. The authors are investigating a solution to the "large population" problem by seeking two completely different kinds of characterizing features. These features are extracted using the techniques of Neuro-Linguistic Programming (NLP) and the continuous wavelet transform (CWT). NLP extracts precise neurological, verbal and non-verbal information, and assimilates the information into useful patterns. These patterns are based on specific cues demonstrated by each individual, and provide ways of determining congruency between verbal and non-verbal cues. The primary NLP modalities are characterized through word spotting (of verbal predicate cues, e.g., see, sound, feel, etc.), while the secondary modalities are characterized through the speech transcription used by the individual. This has the practical effect of reducing the size of the search space and greatly speeding up the process of identifying an unknown speaker. The wavelet-based line of investigation concentrates on using vowel phonemes and non-verbal cues, such as tempo. The rationale for concentrating on vowels is that there are a limited number of vowel phonemes, and at least one of them usually appears in even the shortest of speech segments. Using the fast CWT algorithm, the details of both the formant frequency and the glottal excitation characteristics can be easily extracted from voice waveforms. The differences in the glottal excitation waveforms, as well as the formant frequency, are evident in the CWT output. More significantly, the CWT reveals significant detail of the glottal excitation waveform.

  20. Speaker recognition through NLP and CWT modeling.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown-VanHoozer, A.; Kercel, S. W.; Tucker, R. W.

    The objective of this research is to develop a system capable of identifying speakers on wiretaps from a large database (>500 speakers) with a short search time (<30 seconds) and better than 90% accuracy. Much previous research in speaker recognition has led to algorithms that produced encouraging preliminary results but were overwhelmed when applied to populations of more than a dozen or so different speakers. The authors are investigating a solution to the "huge population" problem by seeking two completely different kinds of characterizing features. These features are extracted using the techniques of Neuro-Linguistic Programming (NLP) and the continuous wavelet transform (CWT). NLP extracts precise neurological, verbal and non-verbal information, and assimilates the information into useful patterns. These patterns are based on specific cues demonstrated by each individual, and provide ways of determining congruency between verbal and non-verbal cues. The primary NLP modalities are characterized through word spotting (of verbal predicate cues, e.g., see, sound, feel, etc.), while the secondary modalities are characterized through the speech transcription used by the individual. This has the practical effect of reducing the size of the search space and greatly speeding up the process of identifying an unknown speaker. The wavelet-based line of investigation concentrates on using vowel phonemes and non-verbal cues, such as tempo. The rationale for concentrating on vowels is that there are a limited number of vowel phonemes, and at least one of them usually appears in even the shortest of speech segments. Using the fast CWT algorithm, the details of both the formant frequency and the glottal excitation characteristics can be easily extracted from voice waveforms. The differences in the glottal excitation waveforms, as well as the formant frequency, are evident in the CWT output. More significantly, the CWT reveals significant detail of the glottal excitation waveform.
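
    As a flavour of the wavelet side of this work, the sketch below computes a continuous wavelet transform of a toy vowel-like signal and averages scale-wise energy into a crude feature vector. It assumes the PyWavelets package (neither record names a library), and the signal is synthetic:

      import numpy as np
      import pywt   # PyWavelets; an assumed choice of CWT implementation

      fs = 8000
      t = np.arange(0, 0.05, 1 / fs)
      # Toy "vowel": a 120 Hz pulse train shaped by two formant-like sinusoids
      vowel = (np.sign(np.sin(2 * np.pi * 120 * t)) + 1) * 0.5
      vowel *= np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

      scales = np.arange(1, 64)
      coef, freqs = pywt.cwt(vowel, scales, "morl", sampling_period=1 / fs)

      feature = (np.abs(coef) ** 2).mean(axis=1)   # per-scale energy profile
      print(feature.shape, freqs[:3])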

  1. Adaptive Communication: Languages with More Non-Native Speakers Tend to Have Fewer Word Forms

    PubMed Central

    Bentz, Christian; Verkerk, Annemarie; Kiela, Douwe; Hill, Felix; Buttery, Paula

    2015-01-01

    Explaining the diversity of languages across the world is one of the central aims of typological, historical, and evolutionary linguistics. We consider the effect of language contact (the number of non-native speakers a language has) on the way languages change and evolve. By analysing hundreds of languages within and across language families, regions, and text types, we show that languages with greater levels of contact typically employ fewer word forms to encode the same information content (a property we refer to as lexical diversity). Based on three types of statistical analyses, we demonstrate that this variance can in part be explained by the impact of non-native speakers on information encoding strategies. Finally, we argue that languages are information encoding systems shaped by the varying needs of their speakers. Language evolution and change should be modeled as the co-evolution of multiple intertwined adaptive systems: on one hand, the structure of human societies and human learning capabilities, and on the other, the structure of language. PMID:26083380
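
    The "fewer word forms" measure can be approximated by a type-token ratio over equal-length samples, as in this minimal sketch (the paper's actual analyses are corpus-scale and statistically controlled):

      def lexical_diversity(tokens):
          """Distinct word forms per token; compare only equal-length samples,
          since the ratio is sensitive to text length."""
          return len(set(tokens)) / len(tokens)

      text_a = "the dogs ran and the dog runs and the dogs run".split()
      text_b = "the dog run and the dog run and the dog run".split()
      print(lexical_diversity(text_a), lexical_diversity(text_b))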

  2. Examining intuitive cancer risk perceptions in Haitian-Creole and Spanish-speaking populations

    PubMed Central

    Hay, Jennifer; Brennessel, Debra; Kemeny, M. Margaret; Lubetkin, Erica

    2017-01-01

    Background There is a developing emphasis on intuition and affect in the illness risk perception process, yet there have been no available strategies to measure these constructs in non-English speakers. This study examined the comprehensibility and acceptability of translations of cancer risk beliefs in Haitian-Creole and Spanish. Methods An established, iterative, team-based translation process was employed. Cognitive interviews (n=20 in Haitian-Creole speakers; n=23 in Spanish speakers) were conducted in an inner city primary care clinic by trained interviewers who were native speakers of each language. Use of an established coding scheme for problematic terms and ambiguous concepts resulted in rewording and dropping items. Results Most items (90% in the Haitian-Creole version; 87% in the Spanish version) were highly comprehensible. Discussion This work will allow for further research examining health outcomes associated with risk perceptions across diverse, non-English language subgroups, paving the way for targeted risk communication with these populations. PMID:25505052

  3. Depression recognition and capacity for self-report among ethnically diverse nursing homes residents: Evidence of disparities in screening.

    PubMed

    Chun, Audrey; Reinhardt, Joann P; Ramirez, Mildred; Ellis, Julie M; Silver, Stephanie; Burack, Orah; Eimicke, Joseph P; Cimarolli, Verena; Teresi, Jeanne A

    2017-12-01

    To examine agreement between Minimum Data Set clinician ratings and researcher assessments of depression among ethnically diverse nursing home residents, using the 9-item Patient Health Questionnaire (PHQ-9). Although depression is common among nursing home residents, its recognition remains a challenge. Observational baseline data from a longitudinal intervention study. Sample of 155 residents from 12 long-term care units in one US facility; 50 were interviewed in Spanish. Convergence between clinician and researcher ratings was examined for (i) self-report capacity, (ii) suicidal ideation, (iii) at least moderate depression, and (iv) Patient Health Questionnaire severity scores. Clinical raters' experiences with the depression assessment were also analysed. The intraclass correlation coefficient was used to examine concordance and Cohen's kappa to examine agreement between clinicians and researchers. Moderate agreement (κ = 0.52) was observed in determination of capacity, and poor to fair agreement in reporting suicidal ideation (κ = 0.10-0.37) across time intervals. Poor agreement was observed in classification of at least moderate depression (κ = -0.02 to 0.24), lower than the maximum kappa obtainable (0.58-0.85). Eight assessors indicated problems assessing Spanish-speaking residents. Among Spanish speakers, researchers identified 16% with Patient Health Questionnaire scores of 10 or greater and 14% with thoughts of self-harm, whilst clinicians identified 6% and 0%, respectively. This study advances the field of depression recognition in long-term care by identifying possible challenges in assessing Spanish speakers. Use of the Patient Health Questionnaire requires further investigation, particularly among non-English speakers. Depression screening for ethnically diverse nursing home residents is required, as underreporting of depression and suicidal ideation among Spanish speakers may result in a lack of depression recognition and referral for evaluation and treatment. Training in depression recognition is imperative to improve the recognition, evaluation and treatment of depression in older people living in nursing homes.
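
    The agreement statistic used above is straightforward to compute; the sketch below (assuming scikit-learn) applies Cohen's kappa to invented binary ratings for ten residents:

      from sklearn.metrics import cohen_kappa_score

      # 1 = at least moderate depression; hypothetical paired ratings
      clinician  = [0, 0, 1, 0, 1, 0, 0, 0, 1, 0]
      researcher = [0, 1, 1, 1, 1, 0, 0, 1, 1, 0]
      print(round(cohen_kappa_score(clinician, researcher), 2))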

  4. Speaker's voice as a memory cue.

    PubMed

    Campeanu, Sandra; Craik, Fergus I M; Alain, Claude

    2015-02-01

    Speaker's voice is a cornerstone of auditory social interaction. Here, we review the evidence suggesting that speaker's voice constitutes an integral context cue in auditory memory. Investigation into the nature of voice representation as a memory cue is essential to understanding auditory memory and the neural correlates that underlie it. Evidence from behavioral and electrophysiological studies suggests that while specific voice reinstatement (i.e., the same speaker) often appears to facilitate word memory even without attention to voice at study, the presence of a partial benefit of similar voices between study and test is less clear. In explicit memory experiments utilizing unfamiliar voices, encoding methods appear to play a pivotal role. Voice congruency effects have been found when voice is specifically attended at study (i.e., when relatively shallow, perceptual encoding takes place). These behavioral findings coincide with neural indices of memory performance such as the parietal old/new recollection effect and the late right frontal effect. The former distinguishes between correctly identified old words and correctly identified new words, and reflects voice congruency only when voice is attended at study. Characterization of the latter likely depends upon voice memory rather than word memory. There is also evidence to suggest that voice effects can be found in implicit memory paradigms; however, their presence appears to depend greatly on the task employed. In a word identification task, perceptual similarity between study and test conditions is, as for explicit memory tests, crucial. In addition, the type of noise employed appears to have a differential effect: voice effects have been observed when white noise is used at both study and test, but using multi-talker babble does not confer the same results. In neuroimaging research, characterization of an implicit memory effect reflective of voice congruency is currently lacking.

  5. Multisensory and modality specific processing of visual speech in different regions of the premotor cortex

    PubMed Central

    Callan, Daniel E.; Jones, Jeffery A.; Callan, Akiko

    2014-01-01

    Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex (PMC) has been shown to be active during both observation and execution of action (“Mirror System” properties), and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI) study, participants identified vowels produced by a speaker in audio-visual (saw the speaker's articulating face and heard her voice), visual only (only saw the speaker's articulating face), and audio only (only heard the speaker's voice) conditions with varying audio signal-to-noise ratios in order to determine the regions of the PMC involved with multisensory and modality-specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual only stimuli, to control for task difficulty and differences in intelligibility. The results of the fMRI analysis for visual only and audio-visual conditions showed overlapping activity in inferior frontal gyrus and PMC. The left ventral inferior premotor cortex (PMvi) showed properties of multimodal (audio-visual) enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex (PMvs/PMd) did not show this multisensory enhancement effect, but there was greater activity for the visual only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas more superior and dorsal regions of the PMC are involved with mapping unimodal (in this case visual) sensory features of the speech signal onto articulatory speech gestures. PMID:24860526

  6. Improvements of ModalMax High-Fidelity Piezoelectric Audio Device

    NASA Technical Reports Server (NTRS)

    Woodard, Stanley E.

    2005-01-01

    ModalMax audio speakers have been enhanced by innovative means of tailoring the vibration response of thin piezoelectric plates to produce a high-fidelity audio response. The ModalMax audio speakers are 1 mm in thickness. The device completely supplants the need for a separate driver and speaker cone. ModalMax speakers can serve the same applications as cone speakers, but unlike cone speakers, ModalMax speakers can function in harsh environments such as high humidity or extreme wetness. New design features allow the speakers to be completely submersed in salt water, making them well suited for maritime applications. The sound produced by ModalMax audio speakers has spatial resolution that is readily discernible to headset users.

  7. What does it take to stress a word? Digital manipulation of stress markers in ataxic dysarthria.

    PubMed

    Lowit, Anja; Ijitona, Tolulope; Kuschmann, Anja; Corson, Stephen; Soraghan, John

    2018-05-18

    Stress production is important for effective communication, but this skill is frequently impaired in people with motor speech disorders. The literature reports successful treatment of these deficits in this population, thus highlighting the therapeutic potential of this area. However, no specific guidance is currently available to clinicians about whether any of the stress markers are more effective than others, to what degree they have to be manipulated, and whether strategies need to differ according to the underlying symptoms. In order to provide detailed information on how stress production problems can be addressed, the study investigated (1) the minimum amount of change in a single stress marker necessary to achieve significant improvement in stress target identification; and (2) whether stress can be signalled more effectively with a combination of stress markers. Data were sourced from a sentence stress task performed by 10 speakers with ataxic dysarthria and 10 healthy matched control participants. Fifteen utterances perceived as having incorrect stress patterns (no stress, all words stressed or inappropriate word stressed) were selected and digitally manipulated in a stepwise fashion based on typical speaker performance. Manipulations were performed on F0, intensity and duration, either in isolation or in combination with each other. In addition, pitch contours were modified for some utterances. A total of 50 naïve listeners scored which word they perceived as being stressed. Results showed that increases in duration and intensity at levels smaller than those produced by the control participants resulted in significant improvements in listener accuracy. The effectiveness of F0 increases depended on the underlying error pattern. Overall, intensity showed the most stable effects. Modifications of the pitch contour also resulted in significant improvements, but not to the same degree as amplification. Integration of two or more stress markers did not yield better results than manipulation of individual stress markers, unless they were combined with pitch contour modifications. The results highlight the potential for improvement of stress production in speakers with motor speech disorders. The fact that individual parameter manipulation is as effective as combined manipulation will facilitate the therapeutic process considerably, as will the finding that amplification at lower levels than seen in typical speakers is sufficient. The difference in results across utterance sets highlights the need to investigate the underlying error pattern in order to select the most effective compensatory strategy for clients. © 2018 Royal College of Speech and Language Therapists.
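
    For readers who want to reproduce this kind of stepwise manipulation offline, a minimal sketch using librosa follows; the signal, the word boundaries and the step sizes are all placeholders, and the study's own editing pipeline is not specified beyond the parameters named above.

      # Sketch of stepwise manipulation of the three stress markers on a
      # target word (synthetic placeholder signal; illustrative step sizes).
      import numpy as np
      import librosa
      import soundfile as sf

      sr = 16000
      t = np.linspace(0, 2.0, 2 * sr, endpoint=False)
      y = 0.3 * np.sin(2 * np.pi * 150 * t)        # stand-in for an utterance

      word = y[int(0.8 * sr):int(1.2 * sr)]        # assume the word spans 0.8-1.2 s

      longer = librosa.effects.time_stretch(word, rate=1 / 1.10)    # +10% duration
      louder = word * 10 ** (3 / 20)                                # +3 dB intensity
      higher = librosa.effects.pitch_shift(word, sr=sr, n_steps=1)  # +1 semitone F0

      for name, sig in [("longer", longer), ("louder", louder), ("higher", higher)]:
          sf.write(f"word_{name}.wav", sig, sr)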

  8. Frequency-Limiting Effects on Speech and Environmental Sound Identification for Cochlear Implant and Normal Hearing Listeners

    PubMed Central

    Chang, Son-A; Won, Jong Ho; Kim, HyangHee; Oh, Seung-Ha; Tyler, Richard S.; Cho, Chang Hyun

    2018-01-01

    Background and Objectives It is important to understand the frequency region of cues used, and not used, by cochlear implant (CI) recipients. Speech and environmental sound recognition by individuals with CIs and individuals with normal hearing (NH) was measured. Gradients were also computed to evaluate the pattern of change in identification performance with respect to the low-pass or high-pass filtering cutoff frequencies. Subjects and Methods Frequency-limiting effects were implemented in the acoustic waveforms by passing the signals through low-pass filters (LPFs) or high-pass filters (HPFs) with seven different cutoff frequencies. Identification of Korean vowels and consonants produced by a male and a female speaker, as well as environmental sounds, was measured. Crossover frequencies were determined for each identification test, where the LPF and HPF conditions show identical identification scores. Results CI and NH subjects showed changes in identification performance in a similar manner as a function of cutoff frequency for the LPF and HPF conditions, suggesting that the degraded spectral information in the acoustic signals may similarly constrain identification performance for both subject groups. However, CI subjects were generally less efficient than NH subjects in using the limited spectral information for speech and environmental sound identification, due to the inefficient coding of acoustic cues through the CI sound processors. Conclusions This finding provides vital information, in Korean, for understanding how the frequency information received through a CI processor differs from that received with normal hearing for speech and environmental sounds. PMID:29325391
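
    The frequency-limiting manipulation itself is easy to reproduce; below is a minimal sketch using SciPy Butterworth filters, with the cutoff list, filter order and placeholder signal chosen for illustration rather than taken from the paper. The crossover frequency for a given test is then simply the cutoff at which the low-pass and high-pass identification curves intersect.

      # Sketch of the low-pass / high-pass frequency-limiting manipulation
      # (illustrative cutoffs, filter order and placeholder signal).
      import numpy as np
      from scipy.signal import butter, filtfilt
      import soundfile as sf

      sr = 16000
      rng = np.random.default_rng(0)
      y = rng.normal(size=2 * sr)      # placeholder token; in practice a recording

      cutoffs_hz = [250, 500, 1000, 2000, 3000, 4000, 6000]   # seven cutoffs
      for fc in cutoffs_hz:
          for btype in ("low", "high"):
              b, a = butter(4, fc, btype=btype, fs=sr)   # 4th-order Butterworth
              filtered = filtfilt(b, a, y)               # zero-phase filtering
              sf.write(f"token_{btype}pass_{fc}Hz.wav", filtered, sr)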

  9. Frequency-Limiting Effects on Speech and Environmental Sound Identification for Cochlear Implant and Normal Hearing Listeners.

    PubMed

    Chang, Son-A; Won, Jong Ho; Kim, HyangHee; Oh, Seung-Ha; Tyler, Richard S; Cho, Chang Hyun

    2017-12-01

    It is important to understand the frequency region of cues used, and not used, by cochlear implant (CI) recipients. Speech and environmental sound recognition by individuals with CIs and individuals with normal hearing (NH) was measured. Gradients were also computed to evaluate the pattern of change in identification performance with respect to the low-pass or high-pass filtering cutoff frequencies. Frequency-limiting effects were implemented in the acoustic waveforms by passing the signals through low-pass filters (LPFs) or high-pass filters (HPFs) with seven different cutoff frequencies. Identification of Korean vowels and consonants produced by a male and a female speaker, as well as environmental sounds, was measured. Crossover frequencies were determined for each identification test, where the LPF and HPF conditions show identical identification scores. CI and NH subjects showed changes in identification performance in a similar manner as a function of cutoff frequency for the LPF and HPF conditions, suggesting that the degraded spectral information in the acoustic signals may similarly constrain identification performance for both subject groups. However, CI subjects were generally less efficient than NH subjects in using the limited spectral information for speech and environmental sound identification, due to the inefficient coding of acoustic cues through the CI sound processors. This finding provides vital information, in Korean, for understanding how the frequency information received through a CI processor differs from that received with normal hearing for speech and environmental sounds.

  10. The Influence of Orthography on the Production of Alphabetic, Second-Language Allophones by Speakers of a Non-alphabetic Language.

    PubMed

    Han, Jeong-Im; Kim, Joo-Yeon

    2017-08-01

    This study investigated the influence of orthographic information on the production of allophones in a second language (L2). Two proficiency levels of native Mandarin speakers learned novel Korean words with potential variants of /h/ based on auditory stimuli, and then they were provided various types of spellings for the variants, including the letters for [Formula: see text].

  11. DOMAIN MISMATCH COMPENSATION FOR SPEAKER RECOGNITION USING A LIBRARY OF WHITENERS

    DTIC Science & Technology

    2015-05-29

    DOMAIN MISMATCH COMPENSATION FOR SPEAKER RECOGNITION USING A LIBRARY OF WHITENERS. Elliot Singer and Douglas Reynolds, Massachusetts Institute of Technology. In-domain development data is assumed to be unavailable. The method is based on a generalization of data whitening used in association with i-vector length normalization and utilizes a library of whitening transforms trained at system development time using strictly out-of-domain data.
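
    As a sketch of the core operation: each out-of-domain dataset yields one whitening transform, and at test time a whitener from the library is applied to the i-vector before length normalization. Everything below (the toy data, the nearest-mean selection rule) is an illustrative assumption, not the paper's algorithm.

      # Library-of-whiteners sketch: train one whitening transform per
      # out-of-domain i-vector set, then whiten + length-normalize a test
      # i-vector. Toy data; the selection heuristic is hypothetical.
      import numpy as np

      rng = np.random.default_rng(0)
      out_of_domain_sets = [rng.normal(size=(500, 100)) for _ in range(3)]
      test_ivector = rng.normal(size=100)

      def train_whitener(ivectors):
          """Return (mean, W) such that W @ (x - mean) has identity covariance."""
          mu = ivectors.mean(axis=0)
          vals, vecs = np.linalg.eigh(np.cov(ivectors, rowvar=False))
          W = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, 1e-10))) @ vecs.T
          return mu, W

      library = [train_whitener(ivecs) for ivecs in out_of_domain_sets]

      # Hypothetical selection rule: the whitener whose source mean is nearest.
      mu, W = min(library, key=lambda mw: np.linalg.norm(test_ivector - mw[0]))
      z = W @ (test_ivector - mu)
      z /= np.linalg.norm(z)          # i-vector length normalization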

  12. Musical Sophistication and the Effect of Complexity on Auditory Discrimination in Finnish Speakers.

    PubMed

    Dawson, Caitlin; Aalto, Daniel; Šimko, Juraj; Vainio, Martti; Tervaniemi, Mari

    2017-01-01

    Musical experiences and native language are both known to affect auditory processing. The present work aims to disentangle the influences of native language phonology and musicality on behavioral and subcortical sound feature processing in a population of musically diverse Finnish speakers, as well as to investigate the specificity of enhancement from musical training. Finnish speakers are highly sensitive to duration cues, since in Finnish vowel and consonant duration determine word meaning. Using a correlational approach with a set of behavioral sound feature discrimination tasks, brainstem recordings, and a musical sophistication questionnaire, we find no evidence for an association between musical sophistication and more precise duration processing in Finnish speakers, either in the auditory brainstem response or in behavioral tasks; however, musically sophisticated speakers do show enhanced pitch discrimination compared to Finnish speakers with less musical experience, as well as greater duration modulation in a complex task. These results are consistent with a ceiling effect for certain sound features, set by the phonology of the native language, leaving an opportunity for music experience-based enhancement of sound features not explicitly encoded in the language (such as pitch, which is not explicitly encoded in Finnish). Finally, the pattern of duration modulation in more musically sophisticated Finnish speakers suggests integrated feature processing for greater efficiency in a real-world musical situation. These results have implications for research into the specificity of plasticity in the auditory system, as well as for the interaction of specific language features with musical experience.

  13. Musical Sophistication and the Effect of Complexity on Auditory Discrimination in Finnish Speakers

    PubMed Central

    Dawson, Caitlin; Aalto, Daniel; Šimko, Juraj; Vainio, Martti; Tervaniemi, Mari

    2017-01-01

    Musical experiences and native language are both known to affect auditory processing. The present work aims to disentangle the influences of native language phonology and musicality on behavioral and subcortical sound feature processing in a population of musically diverse Finnish speakers, as well as to investigate the specificity of enhancement from musical training. Finnish speakers are highly sensitive to duration cues, since in Finnish vowel and consonant duration determine word meaning. Using a correlational approach with a set of behavioral sound feature discrimination tasks, brainstem recordings, and a musical sophistication questionnaire, we find no evidence for an association between musical sophistication and more precise duration processing in Finnish speakers, either in the auditory brainstem response or in behavioral tasks; however, musically sophisticated speakers do show enhanced pitch discrimination compared to Finnish speakers with less musical experience, as well as greater duration modulation in a complex task. These results are consistent with a ceiling effect for certain sound features, set by the phonology of the native language, leaving an opportunity for music experience-based enhancement of sound features not explicitly encoded in the language (such as pitch, which is not explicitly encoded in Finnish). Finally, the pattern of duration modulation in more musically sophisticated Finnish speakers suggests integrated feature processing for greater efficiency in a real-world musical situation. These results have implications for research into the specificity of plasticity in the auditory system, as well as for the interaction of specific language features with musical experience. PMID:28450829

  14. The positive side of a negative reference: the delay between linguistic processing and common ground

    PubMed Central

    Noveck, Ira; Rivera, Natalia; Jaume-Guazzini, Francisco

    2017-01-01

    Interlocutors converge on names to refer to entities. For example, a speaker might refer to a novel-looking object as the jellyfish and, once identified, the listener will too. The hypothesized mechanism behind such referential precedents is a subject of debate. The common ground view claims that listeners register the object as well as the identity of the speaker who coined the label. The linguistic view claims that, once established, precedents are treated by listeners like any other linguistic unit, i.e. without needing to keep track of the speaker. To test predictions from each account, we used visual-world eyetracking, which allows observations in real time, during a standard referential communication task. Participants had to select objects based on instructions from two speakers. In the critical condition, listeners sought an object with a negative reference such as not the jellyfish. We aimed to determine the extent to which listeners rely on the linguistic input, common ground or both. We found that initial interpretations were based on linguistic processing only and that common ground considerations do emerge, but only after 1000 ms. Our findings support the idea that, at least temporally, linguistic processing can be isolated from common ground. PMID:28386440

  15. Teaching First Language Speakers to Communicate across Linguistic Difference: Addressing Attitudes, Comprehension, and Strategies

    ERIC Educational Resources Information Center

    Subtirelu, Nicholas Close; Lindemann, Stephanie

    2016-01-01

    While most research in applied linguistics has focused on second language (L2) speakers and their language capabilities, the success of interaction between such speakers and first language (L1) speakers also relies on the positive attitudes and communication skills of the L1 speakers. However, some research has suggested that many L1 speakers lack…

  16. Temporal and acoustic characteristics of Greek vowels produced by adults with cerebral palsy

    NASA Astrophysics Data System (ADS)

    Botinis, Antonis; Orfanidou, Ioanna; Fourakis, Marios

    2005-09-01

    The present investigation examined the temporal and spectral characteristics of Greek vowels as produced by speakers with intact (NO) versus cerebral palsy affected (CP) neuromuscular systems. Six NO and six CP native speakers of Greek produced the Greek vowels [i, e, a, o, u] in the first syllable of CVCV nonsense words in a short carrier phrase. Stress could be on either the first or second syllable. There were three female and three male speakers in each group. In terms of temporal characteristics, the results showed that vowels produced by CP speakers were longer than vowels produced by NO speakers; stressed vowels were longer than unstressed vowels; and vowels produced by female speakers were longer than vowels produced by male speakers. In terms of spectral characteristics, the results showed that the vowel space of the CP speakers was smaller than that of the NO speakers. This is similar to the results recently reported by Liu et al. [J. Acoust. Soc. Am. 117, 3879-3889 (2005)] for CP speakers of Mandarin. There was also a reduction of the acoustic vowel space defined by unstressed vowels, but this reduction was much more pronounced in the vowel productions of CP speakers than in those of NO speakers.

  17. Inferring speaker attributes in adductor spasmodic dysphonia: ratings from unfamiliar listeners.

    PubMed

    Isetti, Derek; Xuereb, Linnea; Eadie, Tanya L

    2014-05-01

    To determine whether unfamiliar listeners' perceptions of speakers with adductor spasmodic dysphonia (ADSD) differ from their perceptions of control speakers on the parameters of relative age, confidence, tearfulness, and vocal effort, and whether these perceptions are related to speaker-rated vocal effort or voice-specific quality of life. Twenty speakers with ADSD (including 6 speakers with ADSD plus tremor) and 20 age- and sex-matched controls provided speech recordings, completed a voice-specific quality-of-life instrument (Voice Handicap Index; Jacobson et al., 1997), and rated their own vocal effort. Twenty listeners evaluated speech samples for relative age, confidence, tearfulness, and vocal effort using rating scales. Listeners judged speakers with ADSD as sounding significantly older, less confident, more tearful, and more effortful than control speakers (p < .01). Increased vocal effort was strongly associated with decreased speaker confidence (rs = .88-.89) and sounding more tearful (rs = .83-.85). Self-rated speaker effort was moderately related (rs = .45-.52) to listener impressions. Listeners' perceptions of confidence and tearfulness were also moderately associated with higher Voice Handicap Index scores (rs = .65-.70). Unfamiliar listeners judge speakers with ADSD more negatively than control speakers, with judgments extending beyond typical clinical measures. The results have implications for counseling and for understanding the psychosocial effects of ADSD.

  18. The Memory Jog Service

    NASA Astrophysics Data System (ADS)

    Dimakis, Nikolaos; Soldatos, John; Polymenakos, Lazaros; Sturm, Janienke; Neumann, Joachim; Casas, Josep R.

    The CHIL Memory Jog service focuses on facilitating the collaboration of participants in meetings, lectures, presentations, and other human interactive events occurring in indoor CHIL spaces. It exploits the whole set of perceptual components that have been developed by the CHIL Consortium partners (e.g., person tracking, face identification, audio source localization) along with a wide range of actuating devices such as projectors, displays, targeted audio devices, and speakers. The underlying set of perceptual components provides a constant flow of elementary contextual information, such as “person at location x0,y0” or “speech at location x0,y0”; taken alone, this information is of little use. However, the CHIL Memory Jog service is accompanied by powerful situation identification techniques that fuse all the incoming information and create complex states that drive the actuating logic.
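
    Purely by way of illustration, a toy version of this fusion step might look like the sketch below; the event names and the co-location rule are invented, since the actual situation-identification machinery is far richer than this.

      # Toy fusion of elementary perceptual events into a complex state
      # that drives an actuator (invented event names and rule).
      from dataclasses import dataclass

      @dataclass
      class Event:
          kind: str     # e.g. "person" or "speech"
          x: float
          y: float

      def fuse(events, radius=1.0):
          """Infer 'speaker at (x, y)' when a person and speech co-occur."""
          people = [e for e in events if e.kind == "person"]
          speech = [e for e in events if e.kind == "speech"]
          for p in people:
              for s in speech:
                  if (p.x - s.x) ** 2 + (p.y - s.y) ** 2 <= radius ** 2:
                      return ("speaker_at", p.x, p.y)
          return None

      state = fuse([Event("person", 2.0, 3.1), Event("speech", 2.2, 3.0)])
      if state:
          print("actuate: steer display/camera toward", state[1:])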

  19. Perception of emotionally loaded vocal expressions and its connection to responses to music. A cross-cultural investigation: Estonia, Finland, Sweden, Russia, and the USA

    PubMed Central

    Waaramaa, Teija; Leisiö, Timo

    2013-01-01

    The present study focused on voice quality and the perception of the basic emotions from speech samples in cross-cultural conditions. It was examined whether voice quality, cultural or language background, age, or gender were related to the identification of the emotions. Professional actors (n = 2) and actresses (n = 2) produced nonsense sentences (n = 32) and protracted vowels (n = 8) expressing the six basic emotions, interest, and a neutral emotional state. The impact of musical interests on the ability to distinguish between emotions or valence (on an axis positivity – neutrality – negativity) from voice samples was studied. Listening tests were conducted on location in five countries: Estonia, Finland, Russia, Sweden, and the USA, with 50 randomly chosen participants (25 males and 25 females) in each country. The participants (total N = 250) completed a questionnaire eliciting their background information and musical interests. The responses in the listening test and the questionnaires were statistically analyzed. Voice quality parameters and the share of the emotions and valence identified correlated significantly with each other for both genders. The percentage of emotions and valence identified was clearly above the chance level in each of the five countries studied; however, the countries differed significantly from each other in the emotions identified and in the gender of the speaker. The samples produced by females were identified significantly better than those produced by males. Listeners' age was a significant variable. Only minor gender differences were found for the identification. Perceptual confusion in the listening test between emotions seemed to be dependent on their similar voice production types. Musical interests tended to have a positive effect on the identification of the emotions. The results also suggest that identifying emotions from speech samples may be easier for those listeners who share a similar language or cultural background with the speaker. PMID:23801972

  20. Syntactic learning by mere exposure - An ERP study in adult learners

    PubMed Central

    Mueller, Jutta L; Oberecker, Regine; Friederici, Angela D

    2009-01-01

    Background Artificial language studies have revealed the remarkable ability of humans to extract syntactic structures from a continuous sound stream by mere exposure. However, it remains unclear whether the processes acquired in such tasks are comparable to those applied during normal language processing. The present study compares the ERPs to auditory processing of simple Italian sentences in native and non-native speakers after brief exposure to Italian sentences of a similar structure. The sentences contained a non-adjacent dependency between an auxiliary and the morphologically marked suffix of the verb. Participants were presented with four alternating learning and testing phases. During learning phases only correct sentences were presented, while during testing phases 50 percent of the sentences contained a grammatical violation. Results The non-native speakers successfully learned the dependency and displayed an N400-like negativity and a subsequent anteriorly distributed positivity in response to rule violations. The native Italian group showed an N400 followed by a P600 effect. Conclusion The presence of the P600 suggests that native speakers applied a grammatical rule. In contrast, non-native speakers appeared to use a lexical form-based processing strategy. Thus, the processing mechanisms acquired in the language learning task were only partly comparable to those applied by competent native speakers. PMID:19640301

  1. Syntactic learning by mere exposure--an ERP study in adult learners.

    PubMed

    Mueller, Jutta L; Oberecker, Regine; Friederici, Angela D

    2009-07-29

    Artificial language studies have revealed the remarkable ability of humans to extract syntactic structures from a continuous sound stream by mere exposure. However, it remains unclear whether the processes acquired in such tasks are comparable to those applied during normal language processing. The present study compares the ERPs to auditory processing of simple Italian sentences in native and non-native speakers after brief exposure to Italian sentences of a similar structure. The sentences contained a non-adjacent dependency between an auxiliary and the morphologically marked suffix of the verb. Participants were presented with four alternating learning and testing phases. During learning phases only correct sentences were presented, while during testing phases 50 percent of the sentences contained a grammatical violation. The non-native speakers successfully learned the dependency and displayed an N400-like negativity and a subsequent anteriorly distributed positivity in response to rule violations. The native Italian group showed an N400 followed by a P600 effect. The presence of the P600 suggests that native speakers applied a grammatical rule. In contrast, non-native speakers appeared to use a lexical form-based processing strategy. Thus, the processing mechanisms acquired in the language learning task were only partly comparable to those applied by competent native speakers.

  2. Clinical Predictors of Shigella and Campylobacter Infection in Children in the United States

    PubMed Central

    Smith, Timothy; Ye, Xiangyang; Stockmann, Chris; Cohen, Daniel; Leber, Amy; Daly, Judy; Jackson, Jami; Selvarangan, Rangaraj; Kanwar, Neena; Bender, Jeffery; Bard, Jennifer Dien; Festekjian, Ara; Duffy, Susan; Larsen, Chari; Baca, Tanya; Holmberg, Kristen; Bourzac, Kevin; Chapin, Kimberle C; Pavia, Andrew; Leung, Daniel

    2017-01-01

    Abstract Background Infectious gastroenteritis is a major cause of morbidity and mortality among children worldwide. While most episodes are self-limiting, for select pathogens such as Shigella and Campylobacter, etiological diagnosis may allow effective antimicrobial therapy and aid public health interventions. Unfortunately, clinical predictors of such pathogens are not well established and are based on small studies using bacterial culture for identification. Methods We used prospectively collected data from a multi-center study of pediatric gastroenteritis employing multi-pathogen molecular diagnostics to determine clinical predictors associated with 1) Shigella and 2) Shigella or Campylobacter infection. We used machine learning algorithms for clinical predictor identification, then performed logistic regression on the features extracted plus pre-selected variables of interest. Results Of 993 children enrolled with acute diarrhea, we detected Shigella spp. in 56 (5.6%) and Campylobacter spp. in 24 (2.4%). Compared with children who had neither pathogen detected (of whom >70% had ≥1 potential pathogen identified), bloody diarrhea (odds ratio [OR] 4.0), headache (OR 2.2), fever (OR 7.1), summer (OR 3.3), and sick contact with GI illness (OR 2.2) were positively associated with Shigella, while out-of-state travel (OR 0.3) and vomiting and/or nausea (OR 0.4) were negatively associated. For Shigella or Campylobacter, predictors were similar, but season was no longer significantly associated with infection. Conclusion These results can inform prediction models and assist clinicians in identifying patients who would benefit from diagnostic testing and earlier antibiotic treatment. This may curtail unnecessary antibiotic use and help to direct and target appropriate use of stool diagnostics. Disclosures A. Leber, BioFire Diagnostics: Research Contractor and Scientific Advisor, Research support, Speaker honorarium and Travel expenses J. Daly, Biofire: Grant Investigator, Grant recipient R. Selvarangan, BioFire Diagnostics: Board Member and Investigator, Consulting fee and Research grant Luminex Diagnostics: Investigator, Research grant J. Dien Bard, BioFire: Consultant and Investigator, Research grant and Speaker honorarium K. Holmberg, BioFire Diagnostics: Employee, Salary K. Bourzac, BioFire Diagnostics: Employee, Salary K. C. Chapin, BioFire Diagnostics: Investigator, Research support A. Pavia, BioFire Diagnostics: Grant Investigator, Research grant
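
    The two-stage analysis (machine-learning screening of candidate predictors, followed by logistic regression whose exponentiated coefficients give the odds ratios) can be sketched as follows; the variables and data below are synthetic placeholders, not the study's dataset.

      # Two-stage sketch: tree-based predictor screening, then logistic
      # regression with odds ratios from exp(coefficients). Toy data only.
      import numpy as np
      import pandas as pd
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(42)
      X = pd.DataFrame({f: rng.integers(0, 2, 500) for f in
                        ["bloody_diarrhea", "fever", "headache",
                         "vomiting", "summer", "travel"]})
      y = rng.integers(0, 2, 500)            # 1 = Shigella detected (synthetic)

      # Stage 1: screen predictors by random-forest importance.
      rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
      top = X.columns[np.argsort(rf.feature_importances_)[::-1][:3]]

      # Stage 2: logistic regression on the screened predictors.
      lr = LogisticRegression().fit(X[top], y)
      for name, coef in zip(top, lr.coef_[0]):
          print(f"{name}: OR = {np.exp(coef):.2f}")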

  3. The effect of intensive speech rate and intonation therapy on intelligibility in Parkinson's disease.

    PubMed

    Martens, Heidi; Van Nuffelen, Gwen; Dekens, Tomas; Hernández-Díaz Huici, Maria; Kairuz Hernández-Díaz, Hector Arturo; De Letter, Miet; De Bodt, Marc

    2015-01-01

    Most studies on treatment of prosody in individuals with dysarthria due to Parkinson's disease are based on intensive treatment of loudness. The present study investigates the effect of intensive treatment of speech rate and intonation on the intelligibility of individuals with dysarthria due to Parkinson's disease. A one-group pretest-posttest design was used to compare intelligibility, speech rate, and intonation before and after treatment. Participants included eleven Dutch-speaking individuals with predominantly moderate dysarthria due to Parkinson's disease, who received five one-hour treatment sessions per week for three weeks. Treatment focused on lowering speech rate and magnifying the phrase-final intonation contrast between statements and questions. Intelligibility was perceptually assessed using a standardized sentence intelligibility test. Speech rate was automatically assessed during the sentence intelligibility test as well as during a passage reading task and a storytelling task. Intonation was perceptually assessed using a sentence reading task and a sentence repetition task, and also acoustically analyzed in terms of maximum fundamental frequency. After treatment, there was a significant improvement of sentence intelligibility (effect size .83), a significant increase of pause frequency during the passage reading task, a significant improvement of correct listener identification of statements and questions, and a significant increase of the maximum fundamental frequency in the final syllable of questions during both intonation tasks. The findings suggest that participants were more intelligible and more able to manipulate pause frequency and statement-question intonation after treatment. However, the relationship between the change in intelligibility on the one hand and the changes in speech rate and intonation on the other is not yet fully understood. Results are nuanced in light of the research design employed. The reader will be able to: (1) describe the effect of intensive speech rate and intonation treatment on intelligibility of speakers with dysarthria due to PD, (2) describe the effect of intensive speech rate treatment on rate manipulation by speakers with dysarthria due to PD, and (3) describe the effect of intensive intonation treatment on manipulation of the phrase-final intonation contrast between statements and questions by speakers with dysarthria due to PD. Copyright © 2015 Elsevier Inc. All rights reserved.

  4. Low-voltage Driven Graphene Foam Thermoacoustic Speaker.

    PubMed

    Fei, Wenwen; Zhou, Jianxin; Guo, Wanlin

    2015-05-20

    A low-voltage driven thermoacoustic speaker is fabricated based on three-dimensional graphene foams synthesized by a nickel-template-assisted chemical vapor deposition method. The corresponding thermoacoustic performance is found to be related to the foam's microstructure. Graphene foams exhibit low heat leakage to substrates and feasible tunability in structure and thermoacoustic performance, holding great promise for applications in flexible or ultrasonic acoustic devices. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. On the Time Course of Vocal Emotion Recognition

    PubMed Central

    Pell, Marc D.; Kotz, Sonja A.

    2011-01-01

    How quickly do listeners recognize emotions from a speaker's voice, and does the time course for recognition vary by emotion type? To address these questions, we adapted the auditory gating paradigm to estimate how much vocal information is needed for listeners to categorize five basic emotions (anger, disgust, fear, sadness, happiness) and neutral utterances produced by male and female speakers of English. Semantically anomalous pseudo-utterances (e.g., The rivix jolled the silling) conveying each emotion were divided into seven gate intervals according to the number of syllables that listeners heard from sentence onset. Participants (n = 48) judged the emotional meaning of stimuli presented at each gate duration interval, in a successive, blocked presentation format. Analyses looked at how recognition of each emotion evolves as an utterance unfolds and estimated the “identification point” for each emotion. Results showed that anger, sadness, fear, and neutral expressions are recognized more accurately at short gate intervals than happiness, and particularly disgust; however, as speech unfolds, recognition of happiness improves significantly towards the end of the utterance (and fear is recognized more accurately than other emotions). When the gate associated with the emotion identification point of each stimulus was calculated, the data indicated that fear (M = 517 ms), sadness (M = 576 ms), and neutral (M = 510 ms) expressions were identified from shorter acoustic events than the other emotions. These data reveal differences in the underlying time course for conscious recognition of basic emotions from vocal expressions, which should be accounted for in studies of emotional speech processing. PMID:22087275

  6. Identification and Evaluation of Medical Translator Mobile Applications Using an Adapted APPLICATIONS Scoring System.

    PubMed

    Khander, Amrin; Farag, Sara; Chen, Katherine T

    2017-12-22

    With an increasing number of patients requiring translator services, many providers are turning to mobile applications (apps) for assistance. However, there have been no published reviews of medical translator apps. The aim was to identify and evaluate medical translator mobile apps using an adapted APPLICATIONS scoring system. A list of apps was identified from the Apple iTunes and Google Play stores using the search term "medical translator." Apps not found on two different searches, not in an English-based platform, not used for translation, or not functional after purchase were excluded. The remaining apps were evaluated using an adapted APPLICATIONS scoring system, which included both objective and subjective criteria. App comprehensiveness was a weighted score defined by the number of non-English languages included in each app relative to the proportion of non-English speakers in the United States. The main outcome was a compilation of medical translator apps for provider usage. A total of 524 apps were initially found. After applying the exclusion criteria, 20 (8.2%) apps from the Google Play store and 26 (9.2%) apps from the Apple iTunes store remained for evaluation. The highest-scoring apps, Canopy Medical Translator, Universal Doctor Speaker, and Vocre Translate, scored 13.5 out of 18.7 possible points. A large proportion of the apps initially found did not function as medical translator apps. Using the APPLICATIONS scoring system, we have identified and evaluated medical translator apps for providers who care for non-English-speaking patients.
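
    The comprehensiveness component, as described, weights each supported non-English language by its share of non-English speakers in the United States; a toy version of that computation might look like this (the shares and app language lists below are invented for illustration).

      # Toy weighted-comprehensiveness computation (invented shares/apps).
      us_non_english_share = {"Spanish": 0.62, "Chinese": 0.05,
                              "Tagalog": 0.03, "Vietnamese": 0.03}

      def comprehensiveness(app_languages):
          """Sum the population shares of the languages an app covers."""
          return sum(us_non_english_share.get(lang, 0.0) for lang in app_languages)

      print(comprehensiveness(["Spanish", "Chinese"]))   # 0.67
      print(comprehensiveness(["Vietnamese"]))           # 0.03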

  7. Diabetes Applications for Arabic Speakers: A Critical Review of Available Apps for Android and iOS Operated Smartphones.

    PubMed

    Alhuwail, Dari

    2016-01-01

    Today, 415 million adults have diabetes; more than 35 million of these diabetic adults live in the Middle East and North Africa region. Smartphone penetration in the region is high, and applications, or apps, for diabetics have shown promising results in recent years. This study took place between September and December 2015 and reviewed all smartphone diabetes apps for Arabic speakers then available in both the Apple App and Google Play stores. There were only a few diabetes apps for Arabic speakers; just eighteen apps were discovered and considered for this study. Most apps were informational. Only three apps offered utilities such as glucose reading conversion. The apps had issues related to information quality and adherence to the latest evidence-based medical advice. There is a need for more evidence-based Arabic diabetes apps with improved functionality. Future research on Arabic diabetes apps should also focus on the involvement and engagement of patients in the design of these apps.

  8. Preverbal Infants Infer Third-Party Social Relationships Based on Language.

    PubMed

    Liberman, Zoe; Woodward, Amanda L; Kinzler, Katherine D

    2017-04-01

    Language provides rich social information about its speakers. For instance, adults and children make inferences about a speaker's social identity, geographic origins, and group membership based on her language and accent. Although infants prefer speakers of familiar languages (Kinzler, Dupoux, & Spelke, 2007), little is known about the developmental origins of humans' sensitivity to language as a marker of social identity. We investigated whether 9-month-olds use the language a person speaks as an indicator of that person's likely social relationships. Infants were familiarized with videos of two people who spoke the same or different languages, and then viewed test videos of those two individuals affiliating or disengaging. Results suggest that infants expected two people who spoke the same language to be more likely to affiliate than two people who spoke different languages. Thus, infants view language as a meaningful social marker and use language to make inferences about third-party social relationships. Copyright © 2016 Cognitive Science Society, Inc.

  9. Age Estimation Based on Children's Voice: A Fuzzy-Based Decision Fusion Strategy

    PubMed Central

    Ting, Hua-Nong

    2014-01-01

    Automatic estimation of a speaker's age is a challenging research topic in the area of speech analysis. In this paper, a novel approach to estimate a speaker's age is presented. The method features a “divide and conquer” strategy wherein the speech data are divided into six groups based on vowel classes. There are two reasons behind this strategy. First, reducing the complexity of the distribution of the processing data improves the classifier's learning performance. Second, different vowel classes contain complementary information for age estimation. Mel-frequency cepstral coefficients are computed for each group, and single-layer feed-forward neural networks based on a self-adaptive extreme learning machine are applied to the features to make a primary decision. Subsequently, fuzzy data fusion is employed to provide an overall decision by aggregating the classifiers' outputs. The results are then compared with a number of state-of-the-art age estimation methods. Experiments conducted on six age groups, including children aged between 7 and 12 years, revealed that fuzzy fusion of the classifiers' outputs resulted in a considerable improvement of up to 53.33% in age estimation accuracy. Moreover, the fuzzy fusion of decisions aggregated the complementary information of a speaker's age from various speech sources. PMID:25006595
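
    A much-simplified sketch of the pipeline (MFCC features per vowel group, one small classifier per group, and a soft fusion of the group outputs) is given below; ordinary scikit-learn pieces and probability averaging stand in for the paper's self-adaptive extreme learning machine and fuzzy fusion, and all data are synthetic.

      # Per-vowel-group classifiers with soft-vote fusion (synthetic data;
      # MLPClassifier + probability averaging stand in for ELM + fuzzy fusion).
      import numpy as np
      from sklearn.neural_network import MLPClassifier

      rng = np.random.default_rng(0)
      n_groups, n_mfcc, n_ages = 6, 13, 6     # six vowel classes, six age groups

      groups = [rng.normal(size=(300, n_mfcc)) for _ in range(n_groups)]
      ages = rng.integers(0, n_ages, 300)     # age-group label per sample

      models = [MLPClassifier(hidden_layer_sizes=(50,), max_iter=500,
                              random_state=0).fit(Xg, ages) for Xg in groups]

      # Fusion: average the per-group posteriors, then take the argmax.
      test = [rng.normal(size=(1, n_mfcc)) for _ in range(n_groups)]
      posterior = np.mean([m.predict_proba(x) for m, x in zip(models, test)], axis=0)
      print("fused age group:", int(posterior.argmax()))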

  10. The effect of tonal changes on voice onset time in Mandarin esophageal speech.

    PubMed

    Liu, Hanjun; Ng, Manwa L; Wan, Mingxi; Wang, Supin; Zhang, Yi

    2008-03-01

    The present study investigated the effect of tonal changes on voice onset time (VOT) in normal laryngeal (NL) and superior esophageal (SE) speakers of Mandarin Chinese. VOT values were measured from the syllables /pha/, /tha/, and /kha/ produced at four tone levels by eight NL and seven SE speakers who were native speakers of Mandarin. Results indicated that Mandarin tones were associated with significantly different VOT values for NL speakers, in that the high-falling tone was associated with significantly shorter VOT values than the mid-rising tone and the falling-rising tone. Regarding speaker group, SE speakers showed significantly shorter VOT values than NL speakers across all tone levels. This may be related to their use of the pharyngoesophageal (PE) segment as an alternative sound source. SE speakers appear to take a shorter time to start PE segment vibration than NL speakers take to start vocal fold vibration.

  11. Proficiency in English sentence stress production by Cantonese speakers who speak English as a second language (ESL).

    PubMed

    Ng, Manwa L; Chen, Yang

    2011-12-01

    The present study examined English sentence stress produced by native Cantonese speakers who were speaking English as a second language (ESL). Cantonese ESL speakers' proficiency in English stress production, as perceived by English-speaking listeners, was also studied. Acoustical parameters associated with sentence stress, including fundamental frequency (F0), vowel duration, and intensity, were measured from the English sentences produced by 40 Cantonese ESL speakers. Data were compared with those obtained from 40 native speakers of American English. The speech samples were also judged by eight listeners who were native speakers of American English for placement, degree, and naturalness of stress. Results showed that Cantonese ESL speakers were able to use F0, vowel duration, and intensity to differentiate sentence stress patterns. Yet, both female and male Cantonese ESL speakers exhibited consistently higher F0 in stressed words than English speakers. Overall, Cantonese ESL speakers were found to be proficient in using duration and intensity to signal sentence stress, in a way comparable with English speakers. In addition, F0 and intensity were found to correlate closely with perceptual judgement, and the degree of stress with the naturalness of stress.

  12. Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech

    PubMed Central

    Cao, Houwei; Verma, Ragini; Nenkova, Ani

    2014-01-01

    We introduce a ranking approach for emotion recognition which naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. It captures speaker-specific information even in speaker-independent training/testing conditions. It also incorporates the intuition that each utterance can express a mix of possible emotions, and that considering the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches exhibit significantly better performance compared to SVM classification, both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contain mostly neutral utterances with a relatively small portion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets we find dramatically higher accuracy for the test items on whose prediction the two methods agree, compared to the accuracy of the individual methods. Furthermore, on the spontaneous data the ranking and standard classification approaches are complementary, and we obtain a marked improvement when we combine the two classifiers by late-stage fusion. PMID:25422534
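
    The per-speaker ranking setup can be emulated with a standard pairwise transformation: within each speaker (query), difference vectors between target-emotion and other-emotion utterances become training points for a linear SVM, and the per-emotion rankers are combined by taking the highest-scoring emotion. The sketch below is such an approximation on synthetic data, not the authors' implementation.

      # Pairwise-ranking approximation of per-emotion ranking SVMs.
      # Synthetic data; an approximation, not the authors' implementation.
      import numpy as np
      from sklearn.svm import LinearSVC

      rng = np.random.default_rng(1)
      X = rng.normal(size=(200, 20))          # acoustic feature vectors
      speakers = rng.integers(0, 10, 200)     # query id per utterance
      emotions = rng.integers(0, 4, 200)      # 4 emotion classes

      def train_ranker(target):
          diffs, signs = [], []
          for s in np.unique(speakers):
              idx = speakers == s
              pos = X[idx & (emotions == target)]
              neg = X[idx & (emotions != target)]
              for p in pos:
                  for n in neg:
                      diffs += [p - n, n - p]    # both orderings for balance
                      signs += [1, -1]
          return LinearSVC().fit(np.array(diffs), np.array(signs))

      rankers = [train_ranker(e) for e in range(4)]

      # Multi-class prediction: the emotion whose ranker scores highest.
      scores = np.stack([r.decision_function(X) for r in rankers])
      predicted = scores.argmax(axis=0)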

  13. Characterizing opto-electret based paper speakers by using a real-time projection Moiré metrology system

    NASA Astrophysics Data System (ADS)

    Chang, Ya-Ling; Hsu, Kuan-Yu; Lee, Chih-Kung

    2016-03-01

    Advancement of distributed piezo-electret sensors and actuators facilitates the development of various smart systems, including paper speakers and opto-piezo/electret bio-chips. The array-based loudspeaker system possesses several advantages over conventional coil speakers, such as light weight, flexibility, low power consumption, and directivity. With the understanding that the performance of large-area piezo-electret loudspeakers, or even the transport behavior of microfluidic biochips, could be tailored by changing their dynamic behaviors, a full-field, real-time, high-resolution, non-contact metrology system was developed. In this paper, the influence of the resonance modes and the transient vibrations of an array-based loudspeaker system on the acoustic effect was measured by using a real-time projection moiré metrology system and microphones. To make the paper speaker even more versatile, we incorporated the photosensitive material TiOPc into the original electret loudspeaker. The vibration of this newly developed opto-electret loudspeaker could be manipulated by illuminating it with different light-intensity patterns. To facilitate the tailoring process of the opto-electret loudspeaker, projection moiré was adopted to measure its vibration. By recording the projected fringes, which are modulated by the contours of the testing sample, a phase unwrapping algorithm yields a continuous phase distribution that is proportional to the object height variations. With the aid of the projection moiré metrology system, the vibrations associated with each distinctive light pattern could be characterized. We therefore expect that the overall acoustic performance could be improved by finding suitable illuminating patterns. The system performance of the projection moiré setup and of the opto-electret paper speakers was cross-examined and verified by the experimental results obtained.
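
    The height-recovery step mentioned above, in which a wrapped fringe phase map is unwrapped into a continuous distribution proportional to surface height, can be sketched with standard tools; the wrapped phase below is synthetic, whereas a real system would first extract it from recorded fringe images (e.g., by phase shifting).

      # Phase-unwrapping sketch: wrapped fringe phase -> continuous map
      # proportional to height (synthetic wrapped phase for illustration).
      import numpy as np
      from skimage.restoration import unwrap_phase

      yy, xx = np.mgrid[0:256, 0:256]
      true_phase = 0.0005 * (xx - 128) ** 2 + 0.05 * yy   # smooth "surface"
      wrapped = np.angle(np.exp(1j * true_phase))         # wrap into (-pi, pi]

      recovered = unwrap_phase(wrapped)   # continuous phase map
      # Height follows from the recovered phase up to a calibration factor
      # determined by the projection geometry.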

  14. Discourse intonation and second language acquisition: Three genre-based studies

    NASA Astrophysics Data System (ADS)

    Wennerstrom, Ann Kristin

    1997-12-01

    This dissertation investigates intonation in the discourse of nonnative speakers of English. It is proposed that intonation functions as a grammar of cohesion, contributing to the coherence of the text. Based on a componential model of intonation adapted from Pierrehumbert and Hirschberg (1990), three empirical studies were conducted in different genres of spoken discourse: academic lectures, conversations, and oral narratives. Using computerized speech technology, excerpts of taped discourse were measured to determine how intonation was associated with various constituents of text. All speakers were tested for overall English level on tests adapted from the SPEAK Test (ETS, 1985). Comparisons using native speaker data were also conducted. The first study investigated intonation in lectures given by Chinese teaching assistants. Multivariate analyses showed that intonation was a significant factor contributing to better scores on an exam of overall comprehensibility in English. The second study investigated the role of intonation in the turn-taking system in conversations between native and nonnative speakers of English. The final study considered emotional aspects of intonation in narratives, using the framework of Labov and Waletzky (1967). In sum, adult nonnative speakers can acquire intonation as part of their overall language development, although there is evidence against any specific order of acquisition. Intonation contributes to coherence by indicating the relationship between the current utterance and what is assumed to already be in participants' mental representations of the discourse. It also performs a segmentation function, denoting hierarchical relationships among utterances and/or turns. It is suggested that while pitch can be a resource in cross-cultural communication to show emotion and attitude, the grammatical aspects of intonation must be acquired gradually.

  15. The Sociophonetic and Acoustic Vowel Dynamics of Michigan's Upper Peninsula English

    NASA Astrophysics Data System (ADS)

    Rankinen, Wil A.

    The present sociophonetic study examines the English variety in Michigan's Upper Peninsula (UP) based upon a 130-speaker sample from Marquette County. The linguistic variables of interest include seven monophthongs and four diphthongs: 1) front lax, 2) low back, and 3) high back monophthongs, and 4) short and 5) long diphthongs. The sample is stratified by the predictor variables of heritage-location, bilingualism, age, sex and class. The aim of the thesis is twofold: 1) to determine the extent of potential substrate effects on a 71-speaker older-aged bilingual and monolingual subset of these UP English speakers, focusing on the predictor variables of heritage-location and bilingualism, and 2) to determine the extent of potential exogenous influences on an 85-speaker subset of UP English monolingual speakers, focusing on the predictor variables of heritage-location, age, sex and class. All data were extracted from a reading passage task collected during a sociolinguistic interview and measured instrumentally. The findings of this apparent-time data reveal the presence of lingering effects from substrate sources and developing effects from exogenous sources, based upon American and Canadian models of diffusion. The linguistic changes-in-progress from above, led by middle-class females, are taking shape in the speech of UP residents, who are propagating linguistic phenomena typically associated with varieties of Canadian English (i.e., the low-back merger, the Canadian shift, and Canadian raising); however, the findings also report resistance to such norms by working-class females. Finally, the data also reveal substrate effects demonstrating cases of dialect leveling and maintenance. As a result, the speech spoken in Michigan's Upper Peninsula can presently be described as a unique variety of English comprising lingering substrate effects as well as exogenous effects modeled on both American and Canadian English linguistic norms.

  16. Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech

    PubMed

    Cao, Houwei; Verma, Ragini; Nenkova, Ani

    2015-01-01

    We introduce a ranking approach for emotion recognition which naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. It captures speaker-specific information even in speaker-independent training/testing conditions. It also incorporates the intuition that each utterance can express a mix of possible emotions, and that considering the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches exhibit significantly better performance compared to SVM classification, both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contain mostly neutral utterances with a relatively small portion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets we find dramatically higher accuracy for the test items on whose prediction the two methods agree, compared to the accuracy of the individual methods. Furthermore, on the spontaneous data the ranking and standard classification approaches are complementary, and we obtain a marked improvement when we combine the two classifiers by late-stage fusion.

  17. Deep bottleneck features for spoken language identification.

    PubMed

    Jiang, Bing; Song, Yan; Wei, Si; Liu, Jun-Hua; McLoughlin, Ian Vince; Dai, Li-Rong

    2014-01-01

    A key problem in spoken language identification (LID) is to design effective representations which are specific to language information. For example, in recent years, representations based on both phonotactic and acoustic features have proven their effectiveness for LID. Although advances in machine learning have led to significant improvements, LID performance is still lacking, especially for short-duration speech utterances. With the hypothesis that language information is weak and represented only latently in speech, and is largely dependent on the statistical properties of the speech content, existing representations may be insufficient. Furthermore, they may be susceptible to the variations caused by different speakers, the specific content of the speech segments, and background noise. To address this, we propose using Deep Bottleneck Features (DBF) for spoken LID, motivated by the success of Deep Neural Networks (DNN) in speech recognition. We show that DBFs can form a low-dimensional compact representation of the original inputs with a powerful descriptive and discriminative capability. To evaluate the effectiveness of this, we design two acoustic models, termed DBF-TV and parallel DBF-TV (PDBF-TV), using a DBF-based i-vector representation for each speech utterance. Results on the NIST language recognition evaluation 2009 (LRE09) show significant improvements over state-of-the-art systems. By fusing the output of phonotactic and acoustic approaches, we achieve an EER of 1.08%, 1.89% and 7.01% for 30 s, 10 s and 3 s test utterances respectively. Furthermore, various DBF configurations have been extensively evaluated, and an optimal system is proposed.
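
    The bottleneck idea itself is simple to illustrate: train a deep network on a frame-level classification task with one narrow hidden layer, then use that layer's activations as the compact feature. The PyTorch sketch below shows the shape of such an extractor; the layer sizes, input dimensionality and training targets are illustrative assumptions, not the paper's configuration.

      # Illustrative deep-bottleneck-feature extractor: a frame classifier
      # whose narrow hidden layer provides the DBF representation.
      import torch
      import torch.nn as nn

      class BottleneckNet(nn.Module):
          def __init__(self, n_in=440, n_hidden=1024, n_bneck=40, n_out=3000):
              super().__init__()
              self.front = nn.Sequential(
                  nn.Linear(n_in, n_hidden), nn.Sigmoid(),
                  nn.Linear(n_hidden, n_hidden), nn.Sigmoid(),
                  nn.Linear(n_hidden, n_bneck),     # narrow bottleneck layer
              )
              self.back = nn.Sequential(
                  nn.Sigmoid(),
                  nn.Linear(n_bneck, n_hidden), nn.Sigmoid(),
                  nn.Linear(n_hidden, n_out),       # frame targets (e.g., senones)
              )

          def forward(self, x):
              z = self.front(x)     # after training, keep z and discard self.back
              return self.back(z), z

      net = BottleneckNet()
      frames = torch.randn(8, 440)     # batch of stacked acoustic frames
      logits, dbf = net(frames)
      print(dbf.shape)                 # torch.Size([8, 40]) bottleneck features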

  18. Patient predictors of colposcopy comprehension of consent among English- and Spanish-speaking women.

    PubMed

    Krankl, Julia Tatum; Shaykevich, Shimon; Lipsitz, Stuart; Lehmann, Lisa Soleymani

    2011-01-01

    Patients with limited English proficiency may be at increased risk for diminished understanding of clinical procedures. This study sought to assess patient predictors of comprehension of colposcopy information during informed consent and to assess differences in understanding between English and Spanish speakers. Between June and August 2007, English- and Spanish-speaking colposcopy patients at two Boston hospitals were surveyed to assess their understanding of the purpose, risks, benefits, alternatives, and nature of colposcopy. Patient demographic information was collected. There were 183 women who consented to participate in the study. We obtained complete data on 111 English speakers and 38 Spanish speakers. English speakers were more likely to have a higher education, greater household income, and private insurance. Subjects correctly answered an average of 7.91 ± 2.16 (72%) of 11 colposcopy survey questions. English speakers answered more questions correctly than Spanish speakers (8.50 ± 1.92 [77%] vs 6.21 ± 1.93 [56%]; p < .001). Using linear regression to adjust for confounding variables, we found that language was not significantly associated with greater understanding (p = .46). Rather, education was the most significant predictor of colposcopy knowledge (p < .001). Many colposcopy patients did not understand the procedure well enough to give informed consent. The observed differences in colposcopy comprehension based on language were a proxy for differences in education. Education, not language, predicted subjects' understanding of colposcopy. These results demonstrate the need for greater attention to patients' educational background to ensure adequate understanding of clinical information. © 2011 Jacobs Institute of Women's Health. Published by Elsevier Inc.

  19. Generation, language, body mass index, and activity patterns in Hispanic children.

    PubMed

    Taverno, Sharon E; Rollins, Brandi Y; Francis, Lori A

    2010-02-01

    The acculturation hypothesis proposes an overall disadvantage in health outcomes for Hispanic immigrants with more time spent living in the U.S., but little is known about how generational status and language may influence Hispanic children's relative weight and activity patterns. To investigate associations among generation and language with relative weight (BMI z-scores), physical activity, screen time, and participation in extracurricular activities (i.e., sports, clubs) in a U.S.-based, nationally representative sample of Hispanic children. Participants included 2,012 Hispanic children aged 6-11 years from the cross-sectional 2003 National Survey of Children's Health. Children were grouped according to generational status (first, second, or third) and the primary language spoken in the home (English versus non-English). Primary analyses included adjusted logistic and multinomial logistic regression to examine the relationships among variables; all analyses were conducted between 2008 and 2009. Compared to third-generation English speakers, first- and second-generation non-English speakers were more than twice as likely to be obese. Moreover, first-generation non-English speakers were half as likely to engage in regular physical activity and sports. Both first- and second-generation non-English speakers were less likely to participate in clubs compared to second- and third-generation English speakers. Overall, non-English-speaking groups reported less screen time compared to third-generation English speakers. The hypothesis that Hispanics lose their health protection with more time spent in the U.S. was not supported in this sample of Hispanic children. Copyright 2010 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.

  20. Communication-related affective, behavioral, and cognitive reactions in speakers with spasmodic dysphonia.

    PubMed

    Watts, Christopher R; Vanryckeghem, Martine

    2017-12-01

    To investigate the self-perceived affective, behavioral, and cognitive reactions associated with communication of speakers with spasmodic dysphonia as a function of employment status. Prospective cross-sectional investigation. A total of 148 participants with spasmodic dysphonia (SD) completed an adapted version of the Behavior Assessment Battery (BAB-Voice), a multidimensional assessment of self-perceived reactions to communication. The BAB-Voice consisted of four subtests: the Speech Situation Checklist for A) Emotional Reaction (SSC-ER) and B) Speech Disruption (SSC-SD), C) the Behavior Checklist (BCL), and D) the Communication Attitude Test for Adults (BigCAT). Participants were assigned to groups based on employment status (working versus retired). Descriptive comparison of the BAB-Voice in speakers with SD to previously published non-dysphonic speaker data revealed substantially higher scores associated with SD across all four subtests. Multivariate Analysis of Variance (MANOVA) revealed no significantly different BAB-Voice subtest scores as a function of SD group status (working vs. retired). BAB-Voice scores revealed that speakers with SD experienced a substantial impact of their voice disorder on communication attitude, coping behaviors, and affective reactions in speaking situations, as reflected in their high BAB scores. These impacts do not appear to be influenced by work status, as speakers with SD who were employed or retired experienced similar levels of affective and behavioral reactions in various speaking situations and cognitive responses. These findings are consistent with previously published pilot data. The specificity of items assessed by means of the BAB-Voice may inform the clinician of valid patient-centered treatment goals which target the impairment extended beyond the physiological dimension. Level of evidence: 2b.

  1. Reflecting on Native Speaker Privilege

    ERIC Educational Resources Information Center

    Berger, Kathleen

    2014-01-01

    The issues surrounding native speakers (NSs) and nonnative speakers (NNSs) as teachers (NESTs and NNESTs, respectively) in the field of teaching English to speakers of other languages (TESOL) are a current topic of interest. In many contexts, the native speaker of English is viewed as the model teacher, thus putting the NEST into a position of…

  2. English Speakers Attend More Strongly than Spanish Speakers to Manner of Motion when Classifying Novel Objects and Events

    ERIC Educational Resources Information Center

    Kersten, Alan W.; Meissner, Christian A.; Lechuga, Julia; Schwartz, Bennett L.; Albrechtsen, Justin S.; Iglesias, Adam

    2010-01-01

    Three experiments provide evidence that the conceptualization of moving objects and events is influenced by one's native language, consistent with linguistic relativity theory. Monolingual English speakers and bilingual Spanish/English speakers tested in an English-speaking context performed better than monolingual Spanish speakers and bilingual…

  3. Speed-difficulty trade-off in speech: Chinese versus English

    PubMed Central

    Sun, Yao; Latash, Elizaveta M.; Mikaelian, Irina L.

    2011-01-01

    This study continues the investigation of the previously described speed-difficulty trade-off in picture description tasks. In particular, we tested the hypothesis that Mandarin Chinese and American English are similar in showing logarithmic dependences between speech time and index of difficulty (ID), while they differ significantly in the amount of time needed to describe simple pictures, that this difference increases for more complex pictures, and that it is associated with a proportional difference in the number of syllables used. Subjects (eight Chinese speakers and eight English speakers) were tested in pairs. One subject (the Speaker) described simple pictures, while the other subject (the Performer) tried to reproduce the pictures based on the verbal description as quickly as possible with a set of objects. The Chinese speakers initiated speech production significantly faster than the English speakers. Speech time scaled linearly with ln(ID) in all subjects, but the regression coefficient was significantly higher in the English speakers as compared with the Chinese speakers. The number of errors was somewhat lower in the Chinese participants (not significantly). The Chinese pairs also showed a shorter delay between the initiation of speech and the initiation of action by the Performer, shorter movement time by the Performer, and shorter overall performance time. The number of syllables scaled with ID, and the Chinese speakers used significantly fewer syllables. Speech rate was comparable between the two groups, about 3 syllables/s; it dropped for more complex pictures (higher ID). When asked to reproduce the same pictures without speaking, movement time scaled linearly with ln(ID); the Chinese performers were slower than the English performers. We conclude that natural languages show a speed-difficulty trade-off similar to Fitts’ law; the trade-offs in movement and speech production are likely to originate at a cognitive level. The time advantage of the Chinese participants originates neither from similarity between the simple pictures and Chinese written characters nor from sloppier performance. It is linked to using fewer syllables to transmit the same information. We suggest that natural languages may differ in informational density, defined as the amount of information transmitted by a given number of syllables. PMID:21479658
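
    The trade-off described above has the Fitts-like form T = a + b·ln(ID), where T is speech time and ID the index of difficulty; the regression coefficient b is what differed between the English and Chinese groups. A minimal sketch of fitting that relation follows; the data points are invented, not the study's measurements.

        import numpy as np

        # Invented (index of difficulty, speech time in seconds) pairs for one group.
        ids = np.array([1.5, 2.0, 3.0, 4.5, 6.0, 8.0])
        times = np.array([1.1, 1.4, 1.9, 2.3, 2.6, 2.9])

        # Least-squares fit of T = a + b * ln(ID); b is the slope compared across groups.
        b, a = np.polyfit(np.log(ids), times, deg=1)
        print(f"T = {a:.2f} + {b:.2f} * ln(ID)")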

  4. Mobile voice health monitoring using a wearable accelerometer sensor and a smartphone platform.

    PubMed

    Mehta, Daryush D; Zañartu, Matías; Feng, Shengran W; Cheyne, Harold A; Hillman, Robert E

    2012-11-01

    Many common voice disorders are chronic or recurring conditions that are likely to result from faulty and/or abusive patterns of vocal behavior, referred to generically as vocal hyperfunction. An ongoing goal in clinical voice assessment is the development and use of noninvasively derived measures to quantify and track the daily status of vocal hyperfunction so that the diagnosis and treatment of such behaviorally based voice disorders can be improved. This paper reports on the development of a new, versatile, and cost-effective clinical tool for mobile voice monitoring that acquires the high-bandwidth signal from an accelerometer sensor placed on the neck skin above the collarbone. Using a smartphone as the data acquisition platform, the prototype device provides a user-friendly interface for voice use monitoring, daily sensor calibration, and periodic alert capabilities. Pilot data are reported from three vocally normal speakers and three subjects with voice disorders to demonstrate the potential of the device to yield standard measures of fundamental frequency and sound pressure level and model-based glottal airflow properties. The smartphone-based platform enables future clinical studies for the identification of the best set of measures for differentiating between normal and hyperfunctional patterns of voice use.
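
    For a sense of the signal processing involved, here is a minimal sketch of estimating fundamental frequency (via autocorrelation) and an uncalibrated level measure from one accelerometer frame; the sampling rate, frame length, and pitch search range are assumptions for illustration, not the device's published algorithm.

        import numpy as np

        FS = 8000  # Hz; illustrative sampling rate

        def frame_f0_and_level(frame, fs=FS, fmin=60.0, fmax=400.0):
            """Autocorrelation-based f0 estimate and RMS level (dB) for one frame."""
            frame = frame - frame.mean()
            level_db = 20.0 * np.log10(np.sqrt(np.mean(frame ** 2)) + 1e-12)
            ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            lo, hi = int(fs / fmax), int(fs / fmin)  # lag range for 60-400 Hz
            lag = lo + int(np.argmax(ac[lo:hi]))
            return fs / lag, level_db

        # Example: a synthetic 120 Hz "phonation" frame of 40 ms.
        t = np.arange(int(0.04 * FS)) / FS
        f0, level = frame_f0_and_level(np.sin(2 * np.pi * 120 * t))
        print(f"f0 = {f0:.1f} Hz, level = {level:.1f} dB (relative)")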

  5. Mobile voice health monitoring using a wearable accelerometer sensor and a smartphone platform

    PubMed Central

    Mehta, Daryush D.; Zañartu, Matías; Feng, Shengran W.; Cheyne, Harold A.; Hillman, Robert E.

    2012-01-01

    Many common voice disorders are chronic or recurring conditions that are likely to result from faulty and/or abusive patterns of vocal behavior, referred to generically as vocal hyperfunction. An ongoing goal in clinical voice assessment is the development and use of noninvasively derived measures to quantify and track the daily status of vocal hyperfunction so that the diagnosis and treatment of such behaviorally based voice disorders can be improved. This paper reports on the development of a new, versatile, and cost-effective clinical tool for mobile voice monitoring that acquires the high-bandwidth signal from an accelerometer sensor placed on the neck skin above the collarbone. Using a smartphone as the data acquisition platform, the prototype device provides a user-friendly interface for voice use monitoring, daily sensor calibration, and periodic alert capabilities. Pilot data are reported from three vocally normal speakers and three subjects with voice disorders to demonstrate the potential of the device to yield standard measures of fundamental frequency and sound pressure level and model-based glottal airflow properties. The smartphone-based platform enables future clinical studies for the identification of the best set of measures for differentiating between normal and hyperfunctional patterns of voice use. PMID:22875236

  6. Sound Processing Features for Speaker-Dependent and Phrase-Independent Emotion Recognition in Berlin Database

    NASA Astrophysics Data System (ADS)

    Anagnostopoulos, Christos Nikolaos; Vovoli, Eftichia

    An emotion recognition framework based on sound processing could improve services in human-computer interaction. Various quantitative speech features obtained from sound processing of acted speech were tested to determine whether they are sufficient to discriminate among seven emotions. Multilayered perceptrons were trained to classify gender and emotions on the basis of a 24-input vector, which provides information about the prosody of the speaker over the entire sentence using statistics of sound features. Several experiments were performed and the results are presented analytically. Emotion recognition was successful when speakers and utterances were “known” to the classifier. However, severe misclassifications occurred in the utterance-independent framework. Nevertheless, the proposed feature vector achieved promising results for utterance-independent recognition of high- and low-arousal emotions.
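
    The classifier setup (a perceptron over a 24-element prosodic feature vector) can be sketched with scikit-learn as follows; the feature values, hidden-layer size, and training regime are placeholders rather than the authors' configuration.

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPClassifier

        N_CLASSES = 7  # seven emotion labels, as in the study

        # Placeholder data: 700 utterances x 24 prosodic statistics per utterance.
        rng = np.random.default_rng(1)
        X = rng.normal(size=(700, 24))
        y = rng.integers(0, N_CLASSES, size=700)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)
        clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=1)
        clf.fit(X_tr, y_tr)
        print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")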

  7. The Development of the Speaker Independent ARM Continuous Speech Recognition System

    DTIC Science & Technology

    1992-01-01

    spoken airborne reconnaissance reports using a speech recognition system based on phoneme-level hidden Markov models (HMMs). Previous versions of the ARM...will involve automatic selection from multiple model sets, corresponding to different speaker types, and that the most rudimentary partition of a...The vocabulary size for the ARM task is 497 words. These words are related to the phoneme-level symbols corresponding to the models in the model set

  8. Pharmaceutical speakers' bureaus, academic freedom, and the management of promotional speaking at academic medical centers.

    PubMed

    Boumil, Marcia M; Cutrell, Emily S; Lowney, Kathleen E; Berman, Harris A

    2012-01-01

    Pharmaceutical companies routinely engage physicians, particularly those with prestigious academic credentials, to deliver "educational" talks to groups of physicians in the community to help market the company's brand-name drugs. Although presented as educational, and even though they provide educational content, these events are intended to influence decisions about drug selection in ways that are not based on the suitability and effectiveness of the product, but on the prestige and persuasiveness of the speaker. A number of state legislatures and most academic medical centers have attempted to restrict physician participation in pharmaceutical marketing activities, though most restrictions are not absolute and have proven difficult to enforce. This article reviews the literature on why Speakers' Bureaus have become a lightning rod for academic/industry conflicts of interest and examines the arguments of those who defend physician participation. It considers whether the restrictions on Speakers' Bureaus are consistent with principles of academic freedom and concludes with the legal and institutional efforts to manage industry speaking. © 2012 American Society of Law, Medicine & Ethics, Inc.

  9. Memory for non-native language: the role of lexical processing in the retention of surface form.

    PubMed

    Sampaio, Cristina; Konopka, Agnieszka E

    2013-01-01

    Research on memory for native language (L1) has consistently shown that retention of surface form is inferior to that of gist (e.g., Sachs, 1967). This paper investigates whether the same pattern is found in memory for non-native language (L2). We apply a model of bilingual word processing to more complex linguistic structures and predict that memory for L2 sentences ought to contain more surface information than L1 sentences. Native and non-native speakers of English were tested on a set of sentence pairs with different surface forms but the same meaning (e.g., "The bullet hit/struck the bull's eye"). Memory for these sentences was assessed with a cued recall procedure. Responses showed that native and non-native speakers did not differ in the accuracy of gist-based recall but that non-native speakers outperformed native speakers in the retention of surface form. The results suggest that L2 processing involves more intensive encoding of lexical level information than L1 processing.

  10. Analysis of wolves and sheep. Final report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hogden, J.; Papcun, G.; Zlokarnik, I.

    1997-08-01

    In evaluating speaker verification systems, asymmetries have been observed in the ease with which people are able to break into other people's voice locks. People who are good at breaking into voice locks are called wolves, and people whose locks are easy to break into are called sheep. (Goats are people who have a difficult time opening their own voice locks.) Analyses of speaker verification algorithms could be used to understand wolf/sheep asymmetries. Using the notion of a "speaker space", it is demonstrated that such asymmetries could arise even though the similarity of voice 1 to voice 2 is the same as the inverse similarity. This partially explains the wolf/sheep asymmetries, although there may be other factors. The speaker space can be computed from interspeaker similarity data using multidimensional scaling, and such a speaker space can be used to give a good approximation of the interspeaker similarities. The derived speaker space can be used to predict which of the enrolled speakers are likely to be wolves and which are likely to be sheep. However, a speaker must first enroll in the speaker key system and then be compared to each of the other speakers; a good estimate of a person's speaker space position could be obtained using only a speech sample.
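
    The "speaker space" construction maps interspeaker dissimilarities into coordinates via multidimensional scaling. A minimal sketch with scikit-learn follows; the dissimilarity matrix is synthetic, and reading centrality as wolf/sheep propensity is only one plausible interpretation of the report.

        import numpy as np
        from sklearn.manifold import MDS

        # Synthetic symmetric dissimilarities for 10 enrolled speakers.
        rng = np.random.default_rng(2)
        d = rng.uniform(0.2, 1.0, size=(10, 10))
        dissim = (d + d.T) / 2.0
        np.fill_diagonal(dissim, 0.0)

        # Embed speakers in a 2-D speaker space from the dissimilarities alone.
        coords = MDS(n_components=2, dissimilarity="precomputed",
                     random_state=2).fit_transform(dissim)

        # Speakers near the centroid resemble many others.
        centrality = np.linalg.norm(coords - coords.mean(axis=0), axis=1)
        print("most central speaker:", int(np.argmin(centrality)))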

  11. Investigating Auditory Processing of Syntactic Gaps with L2 Speakers Using Pupillometry

    ERIC Educational Resources Information Center

    Fernandez, Leigh; Höhle, Barbara; Brock, Jon; Nickels, Lyndsey

    2018-01-01

    According to the Shallow Structure Hypothesis (SSH), second language (L2) speakers, unlike native speakers, build shallow syntactic representations during sentence processing. In order to test the SSH, this study investigated the processing of a syntactic movement in both native speakers of English and proficient late L2 speakers of English using…

  12. Literacy Skill Differences between Adult Native English and Native Spanish Speakers

    ERIC Educational Resources Information Center

    Herman, Julia; Cote, Nicole Gilbert; Reilly, Lenore; Binder, Katherine S.

    2013-01-01

    The goal of this study was to compare the literacy skills of adult native English and native Spanish ABE speakers. Participants were 169 native English speakers and 124 native Spanish speakers recruited from five prior research projects. The results showed that the native Spanish speakers were less skilled on morphology and passage comprehension…

  13. Grammatical Planning Units during Real-Time Sentence Production in Speakers with Agrammatic Aphasia and Healthy Speakers

    ERIC Educational Resources Information Center

    Lee, Jiyeon; Yoshida, Masaya; Thompson, Cynthia K.

    2015-01-01

    Purpose: Grammatical encoding (GE) is impaired in agrammatic aphasia; however, the nature of such deficits remains unclear. We examined grammatical planning units during real-time sentence production in speakers with agrammatic aphasia and control speakers, testing two competing models of GE. We queried whether speakers with agrammatic aphasia…

  14. Word Durations in Non-Native English

    PubMed Central

    Baker, Rachel E.; Baese-Berk, Melissa; Bonnasse-Gahot, Laurent; Kim, Midam; Van Engen, Kristin J.; Bradlow, Ann R.

    2010-01-01

    In this study, we compare the effects of English lexical features on word duration for native and non-native English speakers and for non-native speakers with different L1s and a range of L2 experience. We also examine whether non-native word durations lead to judgments of a stronger foreign accent. We measured word durations in English paragraphs read by 12 American English (AE), 20 Korean, and 20 Chinese speakers. We also had AE listeners rate the 'accentedness' of these non-native speakers. AE speech had shorter durations, greater within-speaker word duration variance, greater reduction of function words, and less between-speaker variance than non-native speech. However, both AE and non-native speakers showed sensitivity to lexical predictability by reducing second mentions and high frequency words. Non-native speakers with more native-like word durations, greater within-speaker word duration variance, and greater function word reduction were perceived as less accented. Overall, these findings identify word duration as an important and complex feature of foreign-accented English. PMID:21516172

  15. Neural correlates of pragmatic language comprehension in autism spectrum disorders.

    PubMed

    Tesink, C M J Y; Buitelaar, J K; Petersson, K M; van der Gaag, R J; Kan, C C; Tendolkar, I; Hagoort, P

    2009-07-01

    Difficulties with pragmatic aspects of communication are universal across individuals with autism spectrum disorders (ASDs). Here we focused on an aspect of pragmatic language comprehension that is relevant to social interaction in daily life: the integration of speaker characteristics inferred from the voice with the content of a message. Using functional magnetic resonance imaging (fMRI), we examined the neural correlates of the integration of voice-based inferences about the speaker's age, gender or social background, and sentence content in adults with ASD and matched control participants. Relative to the control group, the ASD group showed increased activation in right inferior frontal gyrus (RIFG; Brodmann area 47) for speaker-incongruent sentences compared to speaker-congruent sentences. Given that both groups performed behaviourally at a similar level on a debriefing interview outside the scanner, the increased activation in RIFG for the ASD group was interpreted as being compensatory in nature. It presumably reflects spill-over processing from the language dominant left hemisphere due to higher task demands faced by the participants with ASD when integrating speaker characteristics and the content of a spoken sentence. Furthermore, only the control group showed decreased activation for speaker-incongruent relative to speaker-congruent sentences in right ventral medial prefrontal cortex (vMPFC; Brodmann area 10), including right anterior cingulate cortex (ACC; Brodmann area 24/32). Since vMPFC is involved in self-referential processing related to judgments and inferences about self and others, the absence of such a modulation in vMPFC activation in the ASD group possibly points to atypical default self-referential mental activity in ASD. Our results show that in ASD compensatory mechanisms are necessary in implicit, low-level inferential processes in spoken language understanding. This indicates that pragmatic language problems in ASD are not restricted to high-level inferential processes, but encompass the most basic aspects of pragmatic language processing.

  16. Does the speaker matter? Online processing of semantic and pragmatic information in L2 speech comprehension.

    PubMed

    Foucart, Alice; Garcia, Xavier; Ayguasanosa, Meritxell; Thierry, Guillaume; Martin, Clara; Costa, Albert

    2015-08-01

    The present study investigated how pragmatic information is integrated during L2 sentence comprehension. We put forward that the differences often observed between L1 and L2 sentence processing may reflect differences in how various types of information are used to process a sentence, and not necessarily differences between native and non-native linguistic systems. Based on the idea that when a cue is missing or distorted, one relies more on the other cues available, we hypothesised that late bilinguals favour the cues that they master during sentence processing. To verify this hypothesis we investigated whether late bilinguals take the speaker's identity (inferred from the voice) into account when incrementally processing speech and whether this affects their online interpretation of the sentence. To do so, we adapted the study of Van Berkum, Van den Brink, Tesink, Kos, and Hagoort (2008, J. Cogn. Neurosci. 20(4), 580-591), in which sentences with either semantic violations or pragmatic inconsistencies were presented. While both the native and the non-native groups showed a similar response to semantic violations (N400), their responses to speaker inconsistencies diverged slightly; late bilinguals showed a positivity (LPP) much earlier than native speakers. These results suggest that, like native speakers, late bilinguals process semantic and pragmatic information incrementally; however, what seems to differ between L1 and L2 processing is the time-course of the different processes. We propose that this difference may originate from late bilinguals' sensitivity to pragmatic information and/or their ability to efficiently make use of the information provided by the sentence context to generate expectations in relation to pragmatic information during L2 sentence comprehension. In other words, late bilinguals may rely more on speaker identity than native speakers do when they face semantic integration difficulties. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Bystander capability to activate speaker function for continuous dispatcher assisted CPR in case of suspected cardiac arrest.

    PubMed

    Steensberg, Alvilda T; Eriksen, Mette M; Andersen, Lars B; Hendriksen, Ole M; Larsen, Heinrich D; Laier, Gunnar H; Thougaard, Thomas

    2017-06-01

    The European Resuscitation Council Guidelines 2015 recommend that bystanders activate their mobile phone speaker function, if possible, in case of suspected cardiac arrest. This is to facilitate continuous dialogue with the dispatcher, including (if required) cardiopulmonary resuscitation instructions. The aim of this study was to measure bystander capability to activate the speaker function in case of suspected cardiac arrest. Over 87 days, a systematic prospective registration of bystander capability to activate the speaker function, when cardiac arrest was suspected, was performed. For those asked "Can you activate your mobile phone's speaker function?", audio recordings were examined and categorized into groups according to the bystanders' capability to activate the speaker function on their own initiative, without instructions, or with instructions from the emergency medical dispatcher. Time delay was measured, in seconds, for the bystanders without pre-activated speaker function. Of the bystanders, 42.0% (58) were able to activate the speaker function without instructions, 2.9% (4) with instructions, and 18.1% (25) did so on their own initiative; 37.0% (51) were unable to activate the speaker function. The median time to activate the speaker function was 19 s with instructions and 8 s without. Dispatcher-assisted cardiopulmonary resuscitation with activated speaker function, in cases of suspected cardiac arrest, allows for continuous dialogue between the emergency medical dispatcher and the bystander. In this study, we found a 63.0% success rate of activating the speaker function in such situations. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. "I May Be a Native Speaker but I'm Not Monolingual": Reimagining "All" Teachers' Linguistic Identities in TESOL

    ERIC Educational Resources Information Center

    Ellis, Elizabeth M.

    2016-01-01

    Teacher linguistic identity has so far mainly been researched in terms of whether a teacher identifies (or is identified by others) as a native speaker (NEST) or nonnative speaker (NNEST) (Moussu & Llurda, 2008; Reis, 2011). Native speakers are presumed to be monolingual, and nonnative speakers, although by definition bilingual, tend to be…

  19. GMM-based speaker age and gender classification in Czech and Slovak

    NASA Astrophysics Data System (ADS)

    Přibil, Jiří; Přibilová, Anna; Matoušek, Jindřich

    2017-01-01

    The paper describes an experiment using Gaussian mixture models (GMM) for automatic classification of speaker age and gender. It analyses and compares the influence of different numbers of mixtures and different types of speech features on GMM gender/age classification. The dependence of computational complexity on the number of mixtures used is also analysed. Finally, the GMM classification accuracy is compared with the output of conventional listening tests. The results of these objective and subjective evaluations are in good agreement.
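
    The classification scheme (one GMM per age/gender class, decision by maximum likelihood) can be sketched as follows; the class set, feature dimensionality, and mixture count are illustrative, not the paper's exact configuration.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        CLASSES = ["male_adult", "female_adult", "child"]  # illustrative classes
        rng = np.random.default_rng(3)

        # Train one GMM per class on placeholder 13-dim spectral feature frames.
        models = {}
        for i, name in enumerate(CLASSES):
            frames = rng.normal(loc=i, scale=1.0, size=(400, 13))
            models[name] = GaussianMixture(n_components=8, random_state=3).fit(frames)

        # Classify an utterance by the model with the highest mean frame log-likelihood.
        test = rng.normal(loc=1.0, scale=1.0, size=(120, 13))
        scores = {name: gmm.score(test) for name, gmm in models.items()}
        print(max(scores, key=scores.get))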

  20. How Cognitive Load Influences Speakers' Choice of Referring Expressions.

    PubMed

    Vogels, Jorrig; Krahmer, Emiel; Maes, Alfons

    2015-08-01

    We report on two experiments investigating the effect of an increased cognitive load for speakers on the choice of referring expressions. Speakers produced story continuations to addressees, in which they referred to characters that were either salient or non-salient in the discourse. In Experiment 1, referents that were salient for the speaker were non-salient for the addressee, and vice versa. In Experiment 2, all discourse information was shared between speaker and addressee. Cognitive load was manipulated by the presence or absence of a secondary task for the speaker. The results show that speakers under load are more likely to produce pronouns, at least when referring to less salient referents. We take this finding as evidence that speakers under load have more difficulties taking discourse salience into account, resulting in the use of expressions that are more economical for themselves. © 2014 Cognitive Science Society, Inc.

  1. International perspectives on air quality: risk management principles for policy development--conference statement.

    PubMed

    Craig, Lorraine; Krewski, Dan; Samet, Jonathan; Shortreed, John; van Bree, Leendert; Krupnick, Alan J

    2008-01-01

    This statement is the result of discussions held at the 2005 NERAM IV Colloquium "International Perspectives on Air Quality: Risk Management Principles for Policy Development" and represents the collective views of 35 delegates, including international air quality policy analysts, academics, nongovernmental organizations, industry representatives, and decision makers from Mexico, Canada, the United States, the United Kingdom, Brazil, Hong Kong, and The Netherlands on principles for global air quality management. The objective of the colloquium was to "establish principles for air quality management based on the identification of international best practice in air quality policy development and implementation." This statement represents the main findings of a breakout group discussion session, presentations of an international panel of speakers from Canada, the United States, Mexico, and Hong Kong and views of the delegates expressed in plenary discussions. NERAM undertook a transparent process to try to ensure that the statement would accurately reflect the conference discussions, including documenting the proceedings and inviting delegates' comments on draft versions of the statement.

  2. A dynamic multi-channel speech enhancement system for distributed microphones in a car environment

    NASA Astrophysics Data System (ADS)

    Matheja, Timo; Buck, Markus; Fingscheidt, Tim

    2013-12-01

    Supporting multiple active speakers in automotive hands-free or speech dialog applications is an interesting issue, not least for reasons of comfort. Therefore, a multi-channel system for enhancement of speech signals captured by distributed distant microphones in a car environment is presented. Each of the potential speakers in the car has a dedicated directional microphone close to his or her position that captures the corresponding speech signal. The aim of the resulting overall system is twofold: On the one hand, a combination of an arbitrary pre-defined subset of speakers' signals can be performed, e.g., to create an output signal in a hands-free telephone conference call for a far-end communication partner. On the other hand, annoying cross-talk components from interfering sound sources occurring in multiple different mixed output signals are to be eliminated, motivated by the possibility of other hands-free applications being active in parallel. The system includes several signal processing stages. A dedicated signal processing block for interfering speaker cancellation attenuates the cross-talk components of undesired speech. Further signal enhancement comprises the reduction of residual cross-talk and background noise. Subsequently, a dynamic signal combination stage merges the processed single-microphone signals to obtain appropriate mixed signals at the system output that may be passed to applications such as telephony or a speech dialog system. Based on signal power ratios between the individual microphone signals, a speaker activity detection scheme, and with it a robust control mechanism for the whole system, is presented. The proposed system may be dynamically configured and has been evaluated for a car setup with four speakers sitting in the car cabin under various noise conditions.
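
    The power-ratio control idea can be sketched per frame: smoothed power is tracked for each microphone, and a channel is flagged active when its power dominates the others by some margin. The smoothing constant and threshold below are assumptions for illustration, not values from the paper.

        import numpy as np

        def active_speaker(frames, state=None, alpha=0.9, ratio_db=6.0):
            """frames: (n_mics, frame_len) array holding one frame per microphone.
            Returns (dominant channel index or None, updated smoothed power)."""
            power = np.mean(frames ** 2, axis=1)
            state = power if state is None else alpha * state + (1 - alpha) * power
            order = np.argsort(state)[::-1]
            margin = 10 * np.log10((state[order[0]] + 1e-12) / (state[order[1]] + 1e-12))
            return (int(order[0]) if margin >= ratio_db else None), state

        rng = np.random.default_rng(4)
        frames = rng.normal(0.0, 0.01, size=(4, 256))  # four quiet channels
        frames[2] += rng.normal(0.0, 0.5, size=256)    # the speaker at mic 2 talks
        idx, _ = active_speaker(frames)
        print("active channel:", idx)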

  3. Communication‐related affective, behavioral, and cognitive reactions in speakers with spasmodic dysphonia

    PubMed Central

    Vanryckeghem, Martine

    2017-01-01

    Objectives: To investigate the self-perceived affective, behavioral, and cognitive reactions associated with communication of speakers with spasmodic dysphonia as a function of employment status. Study Design: Prospective cross-sectional investigation. Methods: 148 participants with spasmodic dysphonia (SD) completed an adapted version of the Behavior Assessment Battery (BAB-Voice), a multidimensional assessment of self-perceived reactions to communication. The BAB-Voice consisted of four subtests: the Speech Situation Checklist for A) Emotional Reaction (SSC-ER) and B) Speech Disruption (SSC-SD), C) the Behavior Checklist (BCL), and D) the Communication Attitude Test for Adults (BigCAT). Participants were assigned to groups based on employment status (working versus retired). Results: Descriptive comparison of the BAB-Voice in speakers with SD to previously published non-dysphonic speaker data revealed substantially higher scores associated with SD across all four subtests. Multivariate Analysis of Variance (MANOVA) revealed no significantly different BAB-Voice subtest scores as a function of SD group status (working vs. retired). Conclusions: BAB-Voice scores revealed that speakers with SD experienced substantial impact of their voice disorder on communication attitude, coping behaviors, and affective reactions in speaking situations as reflected in their high BAB scores. These impacts do not appear to be influenced by work status, as speakers with SD who were employed or retired experienced similar levels of affective and behavioral reactions in various speaking situations and cognitive responses. These findings are consistent with previously published pilot data. The specificity of items assessed by means of the BAB-Voice may inform the clinician of valid patient-centered treatment goals which target the impairment extended beyond the physiological dimension. Level of Evidence: 2b. PMID:29299525

  4. Perception of intelligibility and qualities of non-native accented speakers.

    PubMed

    Fuse, Akiko; Navichkova, Yuliya; Alloggio, Krysteena

    To provide effective treatment to clients, speech-language pathologists must be understood, and be perceived to demonstrate the personal qualities necessary for therapeutic practice (e.g., resourcefulness and empathy). One factor that could interfere with the listener's perception of non-native speech is the speaker's accent. The current study explored the relationship between how accurately listeners could understand non-native speech and their perceptions of personal attributes of the speaker. Additionally, this study investigated how listeners' familiarity and experience with other languages may influence their perceptions of non-native accented speech. Through an online survey, native monolingual and bilingual English listeners rated four non-native accents (i.e., Spanish, Chinese, Russian, and Indian) on perceived intelligibility and perceived personal qualities (i.e., professionalism, intelligence, resourcefulness, empathy, and patience) necessary for speech-language pathologists. The results indicated significant relationships between the perception of intelligibility and the perception of personal qualities (i.e., professionalism, intelligence, and resourcefulness) attributed to non-native speakers. However, these findings were not supported for the Chinese accent. Bilingual listeners judged the non-native speech as more intelligible in comparison to monolingual listeners. No significant differences were found in the ratings between bilingual listeners who share the same language background as the speaker and other bilingual listeners. Based on the current findings, greater perception of intelligibility was the key to promoting a positive perception of personal qualities such as professionalism, intelligence, and resourcefulness, important for speech-language pathologists. The current study found evidence to support the claim that bilinguals have a greater ability in understanding non-native accented speech compared to monolingual listeners. The results, however, did not confirm an advantage for bilingual listeners sharing the same language backgrounds with the non-native speaker over other bilingual listeners. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. Long-Term Experience with Chinese Language Shapes the Fusiform Asymmetry of English Reading

    PubMed Central

    Mei, Leilei; Xue, Gui; Lu, Zhong-Lin; Chen, Chuansheng; Wei, Miao; He, Qinghua; Dong, Qi

    2015-01-01

    Previous studies have suggested differential engagement of the bilateral fusiform gyrus in the processing of Chinese and English. The present study tested the possibility that long-term experience with Chinese language affects the fusiform laterality of English reading by comparing three samples: Chinese speakers, English speakers with Chinese experience, and English speakers without Chinese experience. We found that, when reading words in their respective native language, Chinese and English speakers without Chinese experience differed in functional laterality of the posterior fusiform region (right laterality for Chinese speakers, but left laterality for English speakers). More importantly, compared with English speakers without Chinese experience, English speakers with Chinese experience showed more recruitment of the right posterior fusiform cortex for English words and pseudowords, which is similar to how Chinese speakers processed Chinese. These results suggest that long-term experience with Chinese shapes the fusiform laterality of English reading and have important implications for our understanding of the cross-language influences in terms of neural organization and of the functions of different fusiform subregions in reading. PMID:25598049

  6. Statistical Evaluation of Biometric Evidence in Forensic Automatic Speaker Recognition

    NASA Astrophysics Data System (ADS)

    Drygajlo, Andrzej

    Forensic speaker recognition is the process of determining if a specific individual (suspected speaker) is the source of a questioned voice recording (trace). This paper aims at presenting forensic automatic speaker recognition (FASR) methods that provide a coherent way of quantifying and presenting recorded voice as biometric evidence. In such methods, the biometric evidence consists of the quantified degree of similarity between speaker-dependent features extracted from the trace and speaker-dependent features extracted from recorded speech of a suspect. The interpretation of recorded voice as evidence in the forensic context presents particular challenges, including within-speaker (within-source) variability and between-speakers (between-sources) variability. Consequently, FASR methods must provide a statistical evaluation which gives the court an indication of the strength of the evidence given the estimated within-source and between-sources variabilities. This paper reports on the first ENFSI evaluation campaign through a fake case, organized by the Netherlands Forensic Institute (NFI), as an example, where an automatic method using the Gaussian mixture models (GMMs) and the Bayesian interpretation (BI) framework were implemented for the forensic speaker recognition task.
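
    The strength-of-evidence score in such systems is a likelihood ratio between a suspect model and a background population model. A minimal GMM-based sketch follows; the features are synthetic, and the mixture counts and dimensions are arbitrary stand-ins rather than the evaluation campaign's setup.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(5)

        # Background model (many speakers) vs. a model of the suspect's speech.
        ubm = GaussianMixture(n_components=16, random_state=5).fit(
            rng.normal(0.0, 1.5, size=(2000, 12)))
        suspect = GaussianMixture(n_components=8, random_state=5).fit(
            rng.normal(0.5, 1.0, size=(300, 12)))

        # Trace (questioned recording): mean log-likelihood under each hypothesis.
        trace = rng.normal(0.5, 1.0, size=(150, 12))
        log_lr = suspect.score(trace) - ubm.score(trace)
        print(f"mean log-likelihood ratio: {log_lr:.2f} (positive favors the suspect)")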

  7. The Effects of Self-Disclosure on Male and Female Perceptions of Individuals Who Stutter.

    PubMed

    Byrd, Courtney T; McGill, Megann; Gkalitsiou, Zoi; Cappellini, Colleen

    2017-02-01

    The purpose of this study was to examine the influence of self-disclosure on observers' perceptions of persons who stutter. Participants (N = 173) were randomly assigned to view 2 of 4 possible videos (i.e., male self-disclosure, male no self-disclosure, female self-disclosure, and female no self-disclosure). After viewing both videos, participants completed a survey assessing their perceptions of the speakers. Controlling for observer and speaker gender, listeners were more likely to select speakers who self-disclosed their stuttering as more friendly, outgoing, and confident compared with speakers who did not self-disclose. Observers were more likely to select speakers who did not self-disclose as unfriendly and shy compared with speakers who used a self-disclosure statement. Controlling for self-disclosure and observer gender, observers were less likely to choose the female speaker as friendlier, outgoing, and confident compared with the male speaker. Observers also were more likely to select the female speaker as unfriendly, shy, unintelligent, and insecure compared with the male speaker and were more likely to report that they were more distracted when viewing the videos. Results lend support to the effectiveness of self-disclosure as a technique that persons who stutter can use to positively influence the perceptions of listeners.

  8. Accounting for the listener: comparing the production of contrastive intonation in typically-developing speakers and speakers with autism.

    PubMed

    Kaland, Constantijn; Swerts, Marc; Krahmer, Emiel

    2013-09-01

    The present research investigates what drives the prosodic marking of contrastive information. For example, a typically developing speaker of a Germanic language like Dutch generally refers to a pink car as a "PINK car" (accented words in capitals) when a previously mentioned car was red. The main question addressed in this paper is whether contrastive intonation is produced with respect to the speaker's or (also) the listener's perspective on the preceding discourse. Furthermore, this research investigates the production of contrastive intonation by typically developing speakers and speakers with autism. The latter group is investigated because people with autism are argued to have difficulties accounting for another person's mental state and exhibit difficulties in the production and perception of accentuation and pitch range. To this end, utterances with contrastive intonation are elicited from both groups and analyzed in terms of function and form of prosody using production and perception measures. Contrary to expectations, typically developing speakers and speakers with autism produce functionally similar contrastive intonation as both groups account for both their own and their listener's perspective. However, typically developing speakers use a larger pitch range and are perceived as speaking more dynamically than speakers with autism, suggesting differences in their use of prosodic form.

  9. Measurement of trained speech patterns in stuttering: interjudge and intrajudge agreement of experts by means of modified time-interval analysis.

    PubMed

    Alpermann, Anke; Huber, Walter; Natke, Ulrich; Willmes, Klaus

    2010-09-01

    Improved fluency after stuttering therapy is usually measured by the percentage of stuttered syllables. However, outcome studies rarely evaluate the use of trained speech patterns that speakers use to manage stuttering. This study investigated whether the modified time-interval analysis can distinguish between trained speech patterns, fluent speech, and stuttered speech. Seventeen German experts on stuttering judged a speech sample on two occasions. Speakers of the sample were stuttering adults, who were not undergoing therapy, as well as participants in a fluency shaping and a stuttering modification therapy. Results showed satisfactory inter-judge and intra-judge agreement above 80%. Intervals with trained speech patterns were identified as consistently as stuttered and fluent intervals. We discuss limitations of the study, as well as implications of our findings for the development of training for identification of trained speech patterns and future outcome studies. The reader will be able to (a) explain different methods to measure the use of trained speech patterns, (b) evaluate whether German experts are able to discriminate intervals with trained speech patterns reliably from fluent and stuttered intervals and (c) describe how the measurement of trained speech patterns can contribute to outcome studies.
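
    Interval-based agreement of the kind reported here can be computed as percent agreement and, chance-corrected, as Cohen's kappa. A minimal sketch with invented per-interval labels (F = fluent, S = stuttered, T = trained pattern); the labels are not the study's data.

        import numpy as np
        from sklearn.metrics import cohen_kappa_score

        # Invented per-interval labels from two judges over the same sample.
        judge_a = np.array(list("FFSTFFTSFFFTSFFT"))
        judge_b = np.array(list("FFSTFFTTFFFTSFFF"))

        percent_agreement = 100.0 * np.mean(judge_a == judge_b)
        kappa = cohen_kappa_score(judge_a, judge_b)  # corrects for chance agreement
        print(f"agreement: {percent_agreement:.1f}%, kappa: {kappa:.2f}")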

  10. The human genome: Some assembly required. Final report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    NONE

    1994-12-31

    The Human Genome Project promises to be one of the most rewarding endeavors in modern biology. The cost and the ethical and social implications, however, have made this project the source of considerable debate both in the scientific community and in the public at large. The 1994 Graduate Student Symposium addresses the scientific merits of the project, the technical issues involved in accomplishing the task, as well as the medical and social issues which stem from the wealth of knowledge which the Human Genome Project will help create. To this end, speakers were brought together who represent the diverse areas of expertise characteristic of this multidisciplinary project. The keynote speaker addresses the project's motivations and goals in the larger context of biological and medical sciences. The first two sessions address relevant technical issues, data collection with a focus on high-throughput sequencing methods and data analysis with an emphasis on identification of coding sequences. The third session explores recent advances in the understanding of genetic diseases and possible routes to treatment. Finally, the last session addresses some of the ethical, social and legal issues which will undoubtedly arise from having a detailed knowledge of the human genome.

  11. Sentence durations and accentedness judgments

    NASA Astrophysics Data System (ADS)

    Bond, Z. S.; Stockmal, Verna; Markus, Dace

    2003-04-01

    Talkers in a second language can frequently be identified as speaking with a foreign accent. It is not clear to what degree a foreign accent represents specific deviations from a target language versus more general characteristics. We examined the identification of native and non-native talkers by listeners with various amounts of knowledge of the target language. Native and non-native speakers of Latvian provided materials. All the non-native talkers spoke Russian as their first language and were long-term residents of Latvia. A listening test, containing sentences excerpted from a short recorded passage, was presented to three groups of listeners: native speakers of Latvian, Russians for whom Latvian was a second language, and Americans with no knowledge of either of the two languages. The listeners were asked to judge whether each utterance was produced by a native or non-native talker. The Latvians identified the non-native talkers very accurately, 88%. The Russians were somewhat less accurate, 83%. The American listeners were least accurate, but still identified the non-native talkers at above-chance levels, 62%. Sentence durations correlated with the judgments provided by the American listeners but not with the judgments provided by native or L2 listeners.

  12. How Do Speakers Avoid Ambiguous Linguistic Expressions?

    ERIC Educational Resources Information Center

    Ferreira, V.S.; Slevc, L.R.; Rogers, E.S.

    2005-01-01

    Three experiments assessed how speakers avoid linguistically and nonlinguistically ambiguous expressions. Speakers described target objects (a flying mammal, bat) in contexts including foil objects that caused linguistic (a baseball bat) and nonlinguistic (a larger flying mammal) ambiguity. Speakers sometimes avoided linguistic ambiguity, and they…

  13. Combining Behavioral and ERP Methodologies to Investigate the Differences Between McGurk Effects Demonstrated by Cantonese and Mandarin Speakers.

    PubMed

    Zhang, Juan; Meng, Yaxuan; McBride, Catherine; Fan, Xitao; Yuan, Zhen

    2018-01-01

    The present study investigated the impact of Chinese dialects on the McGurk effect using behavioral and event-related potential (ERP) methodologies. Specifically, an intra-language comparison of the McGurk effect was conducted between Mandarin and Cantonese speakers. The behavioral results showed that Cantonese speakers exhibited a stronger McGurk effect in audiovisual speech perception compared to Mandarin speakers, although both groups performed equally in the auditory and visual conditions. ERP results revealed that Cantonese speakers were more sensitive to visual cues than Mandarin speakers, though this was not the case for the auditory cues. Taken together, the current findings suggest that the McGurk effect generated by Chinese speakers is mainly influenced by segmental phonology during audiovisual speech integration.

  14. Combining Behavioral and ERP Methodologies to Investigate the Differences Between McGurk Effects Demonstrated by Cantonese and Mandarin Speakers

    PubMed Central

    Zhang, Juan; Meng, Yaxuan; McBride, Catherine; Fan, Xitao; Yuan, Zhen

    2018-01-01

    The present study investigated the impact of Chinese dialects on the McGurk effect using behavioral and event-related potential (ERP) methodologies. Specifically, an intra-language comparison of the McGurk effect was conducted between Mandarin and Cantonese speakers. The behavioral results showed that Cantonese speakers exhibited a stronger McGurk effect in audiovisual speech perception compared to Mandarin speakers, although both groups performed equally in the auditory and visual conditions. ERP results revealed that Cantonese speakers were more sensitive to visual cues than Mandarin speakers, though this was not the case for the auditory cues. Taken together, the current findings suggest that the McGurk effect generated by Chinese speakers is mainly influenced by segmental phonology during audiovisual speech integration. PMID:29780312

  15. Can you hear my age? Influences of speech rate and speech spontaneity on estimation of speaker age

    PubMed Central

    Skoog Waller, Sara; Eriksson, Mårten; Sörqvist, Patrik

    2015-01-01

    Cognitive hearing science is mainly about the study of how cognitive factors contribute to speech comprehension, but cognitive factors also partake in speech processing to infer non-linguistic information from speech signals, such as the intentions of the talker and the speaker’s age. Here, we report two experiments on age estimation by “naïve” listeners. The aim was to study how speech rate influences estimation of speaker age by comparing the speakers’ natural speech rate with increased or decreased speech rate. In Experiment 1, listeners were presented with audio samples of read speech from three different speaker age groups (young, middle aged, and old adults). They estimated the speakers as younger when speech rate was faster than normal and as older when speech rate was slower than normal. This speech rate effect was slightly greater in magnitude for older (60–65 years) speakers in comparison with younger (20–25 years) speakers, suggesting that speech rate may gain greater importance as a perceptual age cue with increased speaker age. This pattern was more pronounced in Experiment 2, in which listeners estimated age from spontaneous speech. Faster speech rate was associated with lower age estimates, but only for older and middle aged (40–45 years) speakers. Taken together, speakers of all age groups were estimated as older when speech rate decreased, except for the youngest speakers in Experiment 2. The absence of a linear speech rate effect in estimates of younger speakers, for spontaneous speech, implies that listeners use different age estimation strategies or cues (possibly vocabulary) depending on the age of the speaker and the spontaneity of the speech. Potential implications for forensic investigations and other applied domains are discussed. PMID:26236259

  16. Objective eye-gaze behaviour during face-to-face communication with proficient alaryngeal speakers: a preliminary study.

    PubMed

    Evitts, Paul; Gallop, Robert

    2011-01-01

    There is a large body of research demonstrating the impact of visual information on speaker intelligibility in both normal and disordered speaker populations. However, there is minimal information on which specific visual features listeners find salient during conversational discourse. To investigate listeners' eye-gaze behaviour during face-to-face conversation with normal, laryngeal and proficient alaryngeal speakers. Sixty participants individually participated in a 10-min conversation with one of four speakers (typical laryngeal, tracheoesophageal, oesophageal, electrolaryngeal; 15 participants randomly assigned to one mode of speech). All speakers were > 85% intelligible and were judged to be 'proficient' by two certified speech-language pathologists. Participants were fitted with a head-mounted eye-gaze tracking device (Mobile Eye, ASL) that calculated the region of interest and mean duration of eye-gaze. Self-reported gaze behaviour was also obtained following the conversation using a 10 cm visual analogue scale. While listening, participants viewed the lower facial region of the oesophageal speaker more than the normal or tracheoesophageal speaker. Results of non-hierarchical cluster analyses showed that while listening, the pattern of eye-gaze was predominantly directed at the lower face of the oesophageal and electrolaryngeal speaker and more evenly dispersed among the background, lower face, and eyes of the normal and tracheoesophageal speakers. Finally, results show a low correlation between self-reported eye-gaze behaviour and objective regions of interest data. Overall, results suggest similar eye-gaze behaviour when healthy controls converse with normal and tracheoesophageal speakers and that participants had significantly different eye-gaze patterns when conversing with an oesophageal speaker. Results are discussed in terms of existing eye-gaze data and its potential implications on auditory-visual speech perception. © 2011 Royal College of Speech & Language Therapists.

  17. Speech serial control in healthy speakers and speakers with hypokinetic or ataxic dysarthria: effects of sequence length and practice

    PubMed Central

    Reilly, Kevin J.; Spencer, Kristie A.

    2013-01-01

    The current study investigated the processes responsible for selection of sounds and syllables during production of speech sequences in 10 adults with hypokinetic dysarthria from Parkinson’s disease, five adults with ataxic dysarthria, and 14 healthy control speakers. Speech production data from a choice reaction time task were analyzed to evaluate the effects of sequence length and practice on speech sound sequencing. Speakers produced sequences that were between one and five syllables in length over five experimental runs of 60 trials each. In contrast to the healthy speakers, speakers with hypokinetic dysarthria demonstrated exaggerated sequence length effects for both inter-syllable intervals (ISIs) and speech error rates. Conversely, speakers with ataxic dysarthria failed to demonstrate a sequence length effect on ISIs and were also the only group that did not exhibit practice-related changes in ISIs and speech error rates over the five experimental runs. The exaggerated sequence length effects in the hypokinetic speakers with Parkinson’s disease are consistent with an impairment of action selection during speech sequence production. The absent length effects observed in the speakers with ataxic dysarthria is consistent with previous findings that indicate a limited capacity to buffer speech sequences in advance of their execution. In addition, the lack of practice effects in these speakers suggests that learning-related improvements in the production rate and accuracy of speech sequences involves processing by structures of the cerebellum. Together, the current findings inform models of serial control for speech in healthy speakers and support the notion that sequencing deficits contribute to speech symptoms in speakers with hypokinetic or ataxic dysarthria. In addition, these findings indicate that speech sequencing is differentially impaired in hypokinetic and ataxic dysarthria. PMID:24137121

  18. Discourse comprehension in L2: Making sense of what is not explicitly said.

    PubMed

    Foucart, Alice; Romero-Rivas, Carlos; Gort, Bernharda Lottie; Costa, Albert

    2016-12-01

    Using ERPs, we tested whether L2 speakers can integrate multiple sources of information (e.g., semantic, pragmatic information) during discourse comprehension. We presented native speakers and L2 speakers with three-sentence scenarios in which the final sentence was highly causally related, intermediately related, or causally unrelated to its context; its interpretation therefore required simple or complex inferences. Native speakers revealed a gradual N400-like effect, larger in the causally unrelated condition than in the highly related condition, and falling in-between in the intermediately related condition, replicating previous results. In the crucial intermediately related condition, L2 speakers behaved like native speakers, however, showing extra processing in a later time-window. Overall, the results show that, when reading, L2 speakers are able to process information from the local context and prior information (e.g., world knowledge) to build global coherence, suggesting that they process different sources of information to make inferences online during discourse comprehension, like native speakers. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. The Voice of Emotion: Acoustic Properties of Six Emotional Expressions.

    NASA Astrophysics Data System (ADS)

    Baldwin, Carol May

    Studies in the perceptual identification of emotional states suggested that listeners seemed to depend on a limited set of vocal cues to distinguish among emotions. Linguistics and speech science literatures have indicated that this small set of cues included intensity, fundamental frequency, and temporal properties such as speech rate and duration. Little research has been done, however, to validate these cues in the production of emotional speech, or to determine if specific dimensions of each cue are associated with the production of a particular emotion for a variety of speakers. This study addressed deficiencies in understanding of the acoustical properties of duration and intensity as components of emotional speech by means of speech science instrumentation. Acoustic data were conveyed in a brief sentence spoken by twelve English speaking adult male and female subjects, half with dramatic training, and half without such training. Simulated expressions included: happiness, surprise, sadness, fear, anger, and disgust. The study demonstrated that the acoustic property of mean intensity served as an important cue for a vocal taxonomy. Overall duration was rejected as an element for a general taxonomy due to interactions involving gender and role. Findings suggested a gender-related taxonomy, however, based on differences in the ways in which men and women use the duration cue in their emotional expressions. Results also indicated that speaker training may influence greater use of the duration cue in expressions of emotion, particularly for male actors. Discussion of these results provided linkages to (1) practical management of emotional interactions in clinical and interpersonal environments, (2) implications for differences in the ways in which males and females may be socialized to express emotions, and (3) guidelines for future perceptual studies of emotional sensitivity.

  20. The perception of phonological quantity based on durational cues by native speakers, second-language users and nonspeakers of Finnish.

    PubMed

    Ylinen, Sari; Shestakova, Anna; Alku, Paavo; Huotilainen, Minna

    2005-01-01

    Some languages, such as Finnish, use speech-sound duration as the primary cue for a phonological quantity distinction. For second-language (L2) learners, quantity is often difficult to master if speech-sound duration plays a less important role in the phonology of their native language (L1). By comparing the categorization performance of native speakers of Finnish, Russian L2 users of Finnish, and non-Finnish-speaking Russians, the present study aimed to determine whether the L2 users, whose native language does not have a quantity distinction, have been able to establish categories for Finnish quantity. The results suggest that the native speakers and some of the L2 users that have been exposed to Finnish for a longer time have access to phonological quantity categories, whereas the L2 users with shorter exposure and the non-Finnish-speaking subjects do not. In addition, by comparing categorization and discrimination tasks it was found that the native speakers show a phoneme-boundary effect for quantity that is cued by duration only, whereas the non-Finnish-speaking subjects and the subjects with low proficiency in Finnish do not.

  1. Acoustic Calibration of the Exterior Effects Room at the NASA Langley Research Center

    NASA Technical Reports Server (NTRS)

    Faller, Kenneth J., II; Rizzi, Stephen A.; Klos, Jacob; Chapin, William L.; Surucu, Fahri; Aumann, Aric R.

    2010-01-01

    The Exterior Effects Room (EER) at the NASA Langley Research Center is a 39-seat auditorium built for psychoacoustic studies of aircraft community noise. The original reproduction system employed monaural playback and hence lacked sound localization capability. In an effort to more closely recreate field test conditions, a significant upgrade was undertaken to allow simulation of a three-dimensional audio and visual environment. The 3D audio system consists of 27 mid and high frequency satellite speakers and 4 subwoofers, driven by a real-time audio server running an implementation of Vector Base Amplitude Panning. The audio server is part of a larger simulation system, which controls the audio and visual presentation of recorded and synthesized aircraft flyovers. The focus of this work is on the calibration of the 3D audio system, including gains used in the amplitude panning algorithm, speaker equalization, and absolute gain control. Because the speakers are installed in an irregularly shaped room, the speaker equalization includes time delay and gain compensation due to different mounting distances from the focal point, filtering for color compensation due to different installations (half space, corner, baffled/unbaffled), and cross-over filtering.
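
    To make the amplitude-panning step concrete, the sketch below shows the standard two-loudspeaker (2D) form of Vector Base Amplitude Panning: the source direction is expressed in the basis of the two adjacent loudspeaker unit vectors, and the resulting gains are power-normalized. The angles here are illustrative assumptions, not the EER's layout; the room's 27-speaker 3D setup would use speaker triplets and 3x3 bases instead.

```python
import numpy as np

def vbap_2d_gains(source_deg, spk1_deg, spk2_deg):
    """Illustrative 2D vector base amplitude panning: solve for the
    gains of two adjacent loudspeakers so their weighted unit vectors
    sum to the source direction, then power-normalize the gains."""
    def unit(deg):
        rad = np.deg2rad(deg)
        return np.array([np.cos(rad), np.sin(rad)])

    # Columns are the loudspeaker unit vectors (the "vector base").
    L = np.column_stack([unit(spk1_deg), unit(spk2_deg)])
    g = np.linalg.solve(L, unit(source_deg))  # L @ g = source direction
    g = np.clip(g, 0.0, None)                 # keep the active pair only
    return g / np.linalg.norm(g)              # constant-power normalization

# Pan a source at 10 degrees between speakers at -30 and +30 degrees.
print(vbap_2d_gains(10.0, -30.0, 30.0))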

  2. How auditory discontinuities and linguistic experience affect the perception of speech and non-speech in English- and Spanish-speaking listeners

    NASA Astrophysics Data System (ADS)

    Hay, Jessica F.; Holt, Lori L.; Lotto, Andrew J.; Diehl, Randy L.

    2005-04-01

    The present study was designed to investigate the effects of long-term linguistic experience on the perception of non-speech sounds in English and Spanish speakers. Research using tone-onset-time (TOT) stimuli, a type of non-speech analogue of voice-onset-time (VOT) stimuli, has suggested that there is an underlying auditory basis for the perception of stop consonants based on a threshold for detecting onset asynchronies in the vicinity of +20 ms. For English listeners, stop consonant labeling boundaries are congruent with the positive auditory discontinuity, while Spanish speakers place their VOT labeling boundaries and discrimination peaks in the vicinity of 0 ms VOT. The present study addresses the question of whether long-term linguistic experience with different VOT categories affects the perception of non-speech stimuli that are analogous in their acoustic timing characteristics. A series of synthetic VOT stimuli and TOT stimuli were created for this study. Using language-appropriate labeling and ABX discrimination tasks, labeling boundaries (VOT) and discrimination peaks (VOT and TOT) were assessed for 24 monolingual English speakers and 24 monolingual Spanish speakers. The interplay between language experience and auditory biases is discussed. [Work supported by NIDCD.]

  3. The Climate Voices Speakers Network: Collaborating with Nontraditional, National Networks to Develop Climate Literacy on a Local Level

    NASA Astrophysics Data System (ADS)

    Wegner, K.; Schmidt, C.; Herrin, S.

    2015-12-01

    How can we leverage the successes of the numerous organizations in the climate change communication arena to build momentum rather than reinvent the wheel? Over the past two years, Climate Voices (climatevoices.org) has built a network of nearly 400 speakers and established partnerships to scale programs that address climate change communication and community engagement. In this presentation, we will describe how we have identified and fostered win-win partnerships with organizations, such as GreenFaith Interfaith Partners for the Environment and Rotary International, to reach the broader general public. We will also share how, by drawing on the resources of the National Climate Assessment and the expertise of our own community, we developed and provided our speakers the tools to give their audiences access to basic climate science - contributing to each audience's ability to understand local impacts, make informed decisions, and gain the confidence to engage in solutions-based actions in response to climate change. We will also discuss how we have created webinar coaching presentations by speakers who aren't climate scientists - and why we have chosen to do so.

  4. Online matchmaking: It's not just for dating sites anymore! Connecting the Climate Voices Science Speakers Network to Educators

    NASA Astrophysics Data System (ADS)

    Wegner, K.; Herrin, S.; Schmidt, C.

    2015-12-01

    Scientists play an integral role in the development of climate literacy skills - for both teachers and students alike. By partnering with local scientists, teachers can gain valuable insights into the science practices highlighted by the Next Generation Science Standards (NGSS), as well as a deeper understanding of cutting-edge scientific discoveries and local impacts of climate change. For students, connecting to local scientists can provide a relevant connection to climate science and STEM skills. Over the past two years, the Climate Voices Science Speakers Network (climatevoices.org) has grown to a robust network of nearly 400 climate science speakers across the United States. Formal and informal educators, K-12 students, and community groups connect with our speakers through our interactive map-based website and invite them to meet through face-to-face and virtual presentations, such as webinars and podcasts. But creating a common language between scientists and educators requires coaching on both sides. In this presentation, we will present the "nitty-gritty" of setting up scientist-educator collaborations, as well as the challenges and opportunities that arise from these partnerships. We will share the impact of these collaborations through case studies, including anecdotal feedback and metrics.

  5. Online Matchmaking: It's Not Just for Dating Sites Anymore! Connecting the Climate Voices Science Speakers Network to Educators

    NASA Technical Reports Server (NTRS)

    Wegner, Kristin; Herrin, Sara; Schmidt, Cynthia

    2015-01-01

    Scientists play an integral role in the development of climate literacy skills - for both teachers and students alike. By partnering with local scientists, teachers can gain valuable insights into the science practices highlighted by the Next Generation Science Standards (NGSS), as well as a deeper understanding of cutting-edge scientific discoveries and local impacts of climate change. For students, connecting to local scientists can provide a relevant connection to climate science and STEM skills. Over the past two years, the Climate Voices Science Speakers Network (climatevoices.org) has grown to a robust network of nearly 400 climate science speakers across the United States. Formal and informal educators, K-12 students, and community groups connect with our speakers through our interactive map-based website and invite them to meet through face-to-face and virtual presentations, such as webinars and podcasts. But creating a common language between scientists and educators requires coaching on both sides. In this presentation, we will present the "nitty-gritty" of setting up scientist-educator collaborations, as well as the challenges and opportunities that arise from these partnerships. We will share the impact of these collaborations through case studies, including anecdotal feedback and metrics.

  6. Accent modulates access to word meaning: Evidence for a speaker-model account of spoken word recognition.

    PubMed

    Cai, Zhenguang G; Gilbert, Rebecca A; Davis, Matthew H; Gaskell, M Gareth; Farrar, Lauren; Adler, Sarah; Rodd, Jennifer M

    2017-11-01

    Speech carries accent information relevant to determining the speaker's linguistic and social background. A series of web-based experiments demonstrate that accent cues can modulate access to word meaning. In Experiments 1-3, British participants were more likely to retrieve the American dominant meaning (e.g., hat meaning of "bonnet") in a word association task if they heard the words in an American than a British accent. In addition, results from a speeded semantic decision task (Experiment 4) and sentence comprehension task (Experiment 5) confirm that accent modulates on-line meaning retrieval such that comprehension of ambiguous words is easier when the relevant word meaning is dominant in the speaker's dialect. Critically, neutral-accent speech items, created by morphing British- and American-accented recordings, were interpreted in a similar way to accented words when embedded in a context of accented words (Experiment 2). This finding indicates that listeners do not use accent to guide meaning retrieval on a word-by-word basis; instead they use accent information to determine the dialectic identity of a speaker and then use their experience of that dialect to guide meaning access for all words spoken by that person. These results motivate a speaker-model account of spoken word recognition in which comprehenders determine key characteristics of their interlocutor and use this knowledge to guide word meaning access. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  7. Speaker Clustering for a Mixture of Singing and Reading (Preprint)

    DTIC Science & Technology

    2012-03-01

    diarization [2, 3], which answers the question of "who spoke when?", is a combination of speaker segmentation and clustering. Although it is possible to... focuses on speaker clustering, the techniques developed here can be applied to speaker diarization. For the remainder of this paper, the term "speech... and retrieval," Proceedings of the IEEE, vol. 88, 2000. [2] S. Tranter and D. Reynolds, "An overview of automatic speaker diarization systems," IEEE

  8. Applications of Hilbert Spectral Analysis for Speech and Sound Signals

    NASA Technical Reports Server (NTRS)

    Huang, Norden E.

    2003-01-01

    A new method for analyzing nonlinear and nonstationary data has been developed, and its natural applications are to speech and sound signals. The key part of the method is the Empirical Mode Decomposition (EMD) method, with which any complicated data set can be decomposed into a finite and often small number of Intrinsic Mode Functions (IMFs). An IMF is defined as any function having the same number of zero-crossings and extrema, and also having symmetric envelopes defined by the local maxima and minima, respectively. The IMF also admits a well-behaved Hilbert transform. This decomposition method is adaptive and, therefore, highly efficient. Since the decomposition is based on the local characteristic time scale of the data, it is applicable to nonlinear and nonstationary processes. With the Hilbert transform, the Intrinsic Mode Functions yield instantaneous frequencies as functions of time, which give sharp identifications of embedded structures. This method can be used to process all acoustic signals. Specifically, it can process speech signals for speech synthesis, speaker identification and verification, speech recognition, and sound signal enhancement and filtering. Additionally, the acoustical signals from machinery are essentially the way machines talk to us: whether carried as sound through the air or as vibration on the machine itself, they can tell us the operating condition of the machine. Thus, we can use the acoustic signal to diagnose the problems of machines.
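
    As a minimal illustration of the Hilbert step described above, the sketch below extracts an instantaneous-frequency track from a mono-component test signal, which plays the role an IMF would play after EMD (the EMD sifting itself is omitted). The chirp parameters and sampling rate are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.signal import hilbert

fs = 8000.0
t = np.arange(0.0, 1.0, 1.0 / fs)
# Mono-component test signal standing in for an IMF: a chirp whose
# instantaneous frequency sweeps from 100 Hz to 300 Hz over one second.
imf = np.cos(2 * np.pi * (100.0 * t + 100.0 * t ** 2))

analytic = hilbert(imf)                          # x(t) + j * H[x](t)
phase = np.unwrap(np.angle(analytic))            # instantaneous phase
inst_freq = np.diff(phase) * fs / (2.0 * np.pi)  # Hz, sample by sample

print(inst_freq[100], inst_freq[-100])  # ~100 Hz rising toward ~300 Hz
```

    Applied to each IMF of a real signal, stacking such frequency tracks against their amplitudes yields the Hilbert spectrum the abstract refers to.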

  9. Multicultural issues in test interpretation.

    PubMed

    Langdon, Henriette W; Wiig, Elisabeth H

    2009-11-01

    Designing the ideal test or series of tests to assess individuals who speak languages other than English is difficult. This article first describes some of the roadblocks - one of which is the lack of identification criteria for language and learning disabilities in monolingual and bilingual populations in most countries of the non-English-speaking world. This lag exists, in part, because access to general education is often limited. The second section describes tests that have been developed in the United States, primarily for Spanish-speaking individuals, because they now represent the largest first-language majority in the United States (80% of English-language learners [ELLs] speak Spanish at home). We discuss tests developed for monolingual and bilingual English-Spanish speakers in the United States and divide this coverage into two parts: the first addresses assessment of students' first language (L1) and second language (L2), usually English, with different versions of the same test; the second describes assessment of L1 and L2 using the same version of the test, administered in the two languages. Examples of tests that fit a priori-determined criteria are briefly discussed throughout the article. Suggestions on how to develop tests for speakers of languages other than English are also provided. In conclusion, we maintain that there will never be a perfect test or set of tests to adequately assess the communication skills of a bilingual individual. This is not surprising because we have yet to develop an ideal test or set of tests that fits monolingual Anglo speakers perfectly. Tests are tools, and the speech-language pathologist needs to know how to use those tools most effectively and equitably. The goal of this article is to provide such guidance. Thieme Medical Publishers.

  10. Do Listeners Store in Memory a Speaker's Habitual Utterance-Final Phonation Type?

    PubMed Central

    Bőhm, Tamás; Shattuck-Hufnagel, Stefanie

    2009-01-01

    Earlier studies report systematic differences across speakers in the occurrence of utterance-final irregular phonation; the work reported here investigated whether human listeners remember this speaker-specific information and can access it when necessary (a prerequisite for using this cue in speaker recognition). Listeners personally familiar with the voices of the speakers were presented with pairs of speech samples: one with the original and the other with transformed final phonation type. Asked to select the member of the pair that was closer to the talker's voice, most listeners tended to choose the unmanipulated token (even though they judged them to sound essentially equally natural). This suggests that utterance-final pitch period irregularity is part of the mental representation of individual speaker voices, although this may depend on the individual speaker and listener to some extent. PMID:19776665

  11. Second Microgravity Fluid Physics Conference

    NASA Technical Reports Server (NTRS)

    1994-01-01

    The conference's purpose was to inform the fluid physics community of research opportunities in reduced-gravity fluid physics, present the status of the existing and planned reduced gravity fluid physics research programs, and inform participants of the upcoming NASA Research Announcement in this area. The plenary sessions provided an overview of the Microgravity Fluid Physics Program information on NASA's ground-based and space-based flight research facilities. An international forum offered participants an opportunity to hear from French, German, and Russian speakers about the microgravity research programs in their respective countries. Two keynote speakers provided broad technical overviews on multiphase flow and complex fluids research. Presenters briefed their peers on the scientific results of their ground-based and flight research. Fifty-eight of the sixty-two technical papers are included here.

  12. Learning Words from Speakers with False Beliefs

    ERIC Educational Resources Information Center

    Papafragou, Anna; Fairchild, Sarah; Cohen, Matthew L.; Friedberg, Carlyn

    2017-01-01

    During communication, hearers try to infer the speaker's intentions to be able to understand what the speaker means. Nevertheless, whether (and how early) preschoolers track their interlocutors' mental states is still a matter of debate. Furthermore, there is disagreement about how children's ability to consult a speaker's belief in communicative…

  13. International Student Speaker Programs: "Someone from Another World."

    ERIC Educational Resources Information Center

    Wilson, Angene

    This study surveyed members of the Association of International Educators and community volunteers to find out how international student speaker programs actually work. An international student speaker program provides speakers (from the university foreign student population) for community organizations and schools. The results of the survey (49…

  14. Linguistic "Mudes" and the De-Ethnicization of Language Choice in Catalonia

    ERIC Educational Resources Information Center

    Pujolar, Joan; Gonzalez, Isaac

    2013-01-01

    Catalan speakers have traditionally constructed the Catalan language as the main emblem of their identity even as migration filled the country with substantial numbers of speakers of Castilian. Although Catalan speakers have been bilingual in Catalan and Castilian for generations, sociolinguistic research has shown how speakers' bilingual…

  15. Embodied Communication: Speakers' Gestures Affect Listeners' Actions

    ERIC Educational Resources Information Center

    Cook, Susan Wagner; Tanenhaus, Michael K.

    2009-01-01

    We explored how speakers and listeners use hand gestures as a source of perceptual-motor information during naturalistic communication. After solving the Tower of Hanoi task either with real objects or on a computer, speakers explained the task to listeners. Speakers' hand gestures, but not their speech, reflected properties of the particular…

  16. Speech Breathing in Speakers Who Use an Electrolarynx

    ERIC Educational Resources Information Center

    Bohnenkamp, Todd A.; Stowell, Talena; Hesse, Joy; Wright, Simon

    2010-01-01

    Speakers who use an electrolarynx following a total laryngectomy no longer require pulmonary support for speech. Subsequently, chest wall movements may be affected; however, chest wall movements in these speakers are not well defined. The purpose of this investigation was to evaluate speech breathing in speakers who use an electrolarynx during…

  17. High Performance Computing at NASA

    NASA Technical Reports Server (NTRS)

    Bailey, David H.; Cooper, D. M. (Technical Monitor)

    1994-01-01

    The speaker will give an overview of high performance computing in the U.S. in general and within NASA in particular, including a description of the recently signed NASA-IBM cooperative agreement. The latest performance figures of various parallel systems on the NAS Parallel Benchmarks will be presented. The speaker was one of the authors of the NAS (Numerical Aerodynamic Simulation) Parallel Benchmarks, which are now widely cited in the industry as a measure of sustained performance on realistic high-end scientific applications. It will be shown that significant progress has been made by the highly parallel supercomputer industry during the past year or so, with several new systems, based on high-performance RISC processors, that now deliver superior performance per dollar compared to conventional supercomputers. Various pitfalls in reporting performance will be discussed. The speaker will then conclude by assessing the general state of the high performance computing field.

  18. Relative fundamental frequency during vocal onset and offset in older speakers with and without Parkinson's disease.

    PubMed

    Stepp, Cara E

    2013-03-01

    The relative fundamental frequency (RFF) surrounding production of a voiceless consonant has previously been shown to be lower in speakers with hypokinetic dysarthria and Parkinson's disease (PD) relative to age/sex matched controls. Here RFF was calculated in 32 speakers with PD without overt hypokinetic dysarthria and 32 age and sex matched controls to better understand the relationships between RFF and PD progression, medication status, and sex. Results showed that RFF was statistically significantly lower in individuals with PD compared with healthy age-matched controls and was statistically significantly lower in individuals diagnosed at least 5 yrs prior to experimentation relative to individuals recorded less than 5 yrs past diagnosis. Contrary to previous trends, no effect of medication was found. However, a statistically significant effect of sex on offset RFF was shown, with lower values in males relative to females. Future work examining the physiological bases of RFF is warranted.
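
    RFF itself is a simple normalization once cycle-level F0 estimates are available: each vocal cycle's F0 is expressed in semitones relative to a steady-state reference cycle. A minimal sketch, assuming the per-cycle F0 values (which are hypothetical here) have already been extracted from the acoustic signal:

```python
import numpy as np

def rff_semitones(cycle_f0, ref_f0):
    """Relative fundamental frequency of vocal cycles adjacent to a
    voiceless consonant, in semitones re: a steady-state reference:
    RFF_i = 12 * log2(F0_i / F0_ref)."""
    return 12.0 * np.log2(np.asarray(cycle_f0, dtype=float) / ref_f0)

# Hypothetical offset cycles drifting downward before a voiceless consonant.
offset_cycles = [200, 199, 198, 196, 195, 193, 190, 188, 185, 182]
print(rff_semitones(offset_cycles, ref_f0=200.0))
```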

  19. Early Detection of Severe Apnoea through Voice Analysis and Automatic Speaker Recognition Techniques

    NASA Astrophysics Data System (ADS)

    Fernández, Ruben; Blanco, Jose Luis; Díaz, David; Hernández, Luis A.; López, Eduardo; Alcázar, José

    This study is part of an on-going collaborative effort between the medical and the signal processing communities to promote research on applying voice analysis and Automatic Speaker Recognition techniques (ASR) for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based diagnosis could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we present and discuss the possibilities of using generative Gaussian Mixture Models (GMMs), generally used in ASR systems, to model distinctive apnoea voice characteristics (i.e. abnormal nasalization). Finally, we present experimental findings regarding the discriminative power of speaker recognition techniques applied to severe apnoea detection. We have achieved an 81.25% correct classification rate, which is very promising and underpins the interest in this line of inquiry.
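
    As a rough sketch of the GMM-based modelling the abstract describes, the example below fits one Gaussian mixture per class to acoustic feature vectors and classifies a test recording by comparing average log-likelihoods. The feature dimensionality, mixture sizes, and synthetic data are all placeholder assumptions; a real system would use features extracted from the speech database.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Placeholder acoustic feature vectors (e.g., cepstral features);
# a real system would extract these from the recorded speech.
healthy_feats = rng.normal(0.0, 1.0, size=(500, 13))
apnoea_feats = rng.normal(0.7, 1.2, size=(500, 13))

gmm_healthy = GaussianMixture(n_components=8, covariance_type="diag",
                              random_state=0).fit(healthy_feats)
gmm_apnoea = GaussianMixture(n_components=8, covariance_type="diag",
                             random_state=0).fit(apnoea_feats)

def classify(frames):
    # Average per-frame log-likelihood under each class model;
    # with equal priors, picking the larger is the ML decision rule.
    return ("apnoea" if gmm_apnoea.score(frames) > gmm_healthy.score(frames)
            else "healthy")

print(classify(rng.normal(0.7, 1.2, size=(100, 13))))
```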

  20. A numerical study of defect detection in a plaster dome ceiling using structural acoustics.

    PubMed

    Bucaro, J A; Romano, A J; Valdivia, N; Houston, B H; Dey, S

    2009-07-01

    A numerical study is carried out to evaluate the effectiveness of using measured surface displacements resulting from acoustic speaker excitation to detect and localize flaws in a domed, plaster ceiling. The response of the structure to an incident acoustic pressure is obtained at four frequencies between 100 and 400 Hz using a parallel h-p structural acoustic finite element-based code. Three ceiling conditions are modeled: the pristine ceiling considered rigidly attached to the domed-shape support, partial detachment of a segment of the plaster layer from the support, and an interior pocket of plaster deconsolidation modeled as a heavy fluid. Spatial maps of the normal displacement resulting from speaker excitation are interpreted with the help of predictions based on static analysis. It is found that acoustic speaker excitation can provide displacement levels readily detected by commercially available laser Doppler vibrometer systems. Further, it is concluded that for 1 in. thick plaster layers, detachment sizes as small as 4 cm are detectable by direct observation of the measured displacement maps. Finally, spatial structure differences are observed in the displacement maps beneath the two defect types, which may provide a wavenumber-based feature useful for distinguishing plaster detachment from other defects such as deconsolidation.

  1. Gas Reservoir Identification Based on Deep Learning of Seismic-print Characteristics

    NASA Astrophysics Data System (ADS)

    Cao, J.; Wu, S.; He, X.

    2016-12-01

    Reservoir identification based on seismic data analysis is the core task in oil and gas geophysical exploration. The essence of reservoir identification is to identify the properties of rock pore fluid. We developed a novel gas reservoir identification method named seismic-print analysis, in imitation of the voice-print analysis techniques used in speaker identification. The term "seismic-print" refers to the characteristics of the seismic waveform that can decisively identify the property of a geological objective, for instance, a natural gas reservoir. A seismic-print can be characterized by one or a few parameters, named seismic-print parameters. Gas reservoirs have been shown to exhibit concurrent anomalies: a negative first-order cepstrum coefficient together with a positive second-order cepstrum coefficient. The method is valid for sandstone gas reservoirs, carbonate reservoirs and shale gas reservoirs, and the accuracy rate may reach up to 90%. There are two main problems to deal with in the application of the seismic-print analysis method. One is to identify the "ripple" of a reservoir on the seismogram, and the other is to construct the mapping relationship between the seismic-print and the gas reservoir. Deep learning, developed in recent years, is able to reveal the complex non-linear relationship between an attribute and the data, and to extract features of the objective from the data automatically. Thus, deep learning can be used to deal with these two problems. There are many deep learning algorithms, which can be roughly divided into two categories: Deep Belief Networks (DBNs) and Convolutional Neural Networks (CNNs). A DBN is a probabilistic generative model, which can establish a joint distribution over the observed data and tags. A CNN is a feedforward neural network, which can be used to extract the 2D structural features of the input data. Both DBNs and CNNs can be used to deal with seismic data. We used an improved DBN to identify carbonate rocks from log data, where the accuracy rate reached up to 83%; when DBNs are applied to seismic waveform data, more information is obtained. The work was supported by NSFC under grant No. 41430323 and No. 41274128, and the State Key Lab. of Oil and Gas Reservoir Geology and Exploration.
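
    The cepstrum coefficients the method keys on can be computed in a few lines of numerical code. The sketch below derives the low-order real cepstrum of a windowed trace; the window length, taper, and synthetic input are illustrative assumptions, and the sign tests on c[1] and c[2] simply mirror the anomaly pattern described above.

```python
import numpy as np

def real_cepstrum(trace):
    """Real cepstrum of a windowed trace: c = IFFT(log |FFT(x)|).
    The low-order coefficients c[1] and c[2] summarize the spectral
    envelope that seismic-print analysis inspects for anomalies."""
    spectrum = np.fft.rfft(trace * np.hanning(len(trace)))
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # guard against log(0)
    return np.fft.irfft(log_mag)

# Synthetic stand-in for one seismic waveform window.
trace = np.random.default_rng(1).normal(size=256)
c = real_cepstrum(trace)
gas_like = (c[1] < 0.0) and (c[2] > 0.0)  # the reported anomaly pattern
print(c[1], c[2], gas_like)
```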

  2. The speakers' bureau system: a form of peer selling.

    PubMed

    Reid, Lynette; Herder, Matthew

    2013-01-01

    In the speakers' bureau system, physicians are recruited and trained by pharmaceutical, biotechnology, and medical device companies to deliver information about products to other physicians, in exchange for a fee. Using publicly available disclosures, we assessed the thesis that speakers' bureau involvement is not a feature of academic medicine in Canada, by estimating the prevalence of participation in speakers' bureaus among Canadian faculty in one medical specialty, cardiology. We analyzed the relevant features of an actual contract made public by the physician addressee and applied the Canadian Medical Association (CMA) guidelines on physician-industry relations to participation in a speakers' bureau. We argue that speakers' bureau participation constitutes a form of peer selling that should be understood to contravene the prohibition on product endorsement in the CMA Code of Ethics. Academic medical institutions, in conjunction with regulatory colleges, should continue and strengthen their policies to address participation in speakers' bureaus.

  3. Simultaneous Talk--From the Perspective of Floor Management of English and Japanese Speakers.

    ERIC Educational Resources Information Center

    Hayashi, Reiko

    1988-01-01

    Investigates simultaneous talk in face-to-face conversation using the analytic framework of "floor" proposed by Edelsky (1981). Analysis of taped conversation among speakers of Japanese and among speakers of English shows that, while both groups use simultaneous talk, it is used more frequently by Japanese speakers. A reference list…

  4. Respiratory Control in Stuttering Speakers: Evidence from Respiratory High-Frequency Oscillations.

    ERIC Educational Resources Information Center

    Denny, Margaret; Smith, Anne

    2000-01-01

    This study examined whether stuttering speakers (N=10) differed from fluent speakers in relations between the neural control systems for speech and life support. It concluded that in some stuttering speakers the relations between respiratory controllers are atypical, but that high participation by the high frequency oscillation-producing circuitry…

  5. The Effects of Source Unreliability on Prior and Future Word Learning

    ERIC Educational Resources Information Center

    Faught, Gayle G.; Leslie, Alicia D.; Scofield, Jason

    2015-01-01

    Young children regularly learn words from interactions with other speakers, though not all speakers are reliable informants. Interestingly, children will reverse to trusting a reliable speaker when a previously endorsed speaker proves unreliable. When later asked to identify the referent of a novel word, children who reverse trust are less willing…

  6. Native-Speakerism and the Complexity of Personal Experience: A Duoethnographic Study

    ERIC Educational Resources Information Center

    Lowe, Robert J.; Kiczkowiak, Marek

    2016-01-01

    This paper presents a duoethnographic study into the effects of native-speakerism on the professional lives of two English language teachers, one "native", and one "non-native speaker" of English. The goal of the study was to build on and extend existing research on the topic of native-speakerism by investigating, through…

  7. Research Timeline: Second Language Communication Strategies

    ERIC Educational Resources Information Center

    Kennedy, Sara; Trofimovich, Pavel

    2016-01-01

    Speakers of a second language (L2), regardless of proficiency level, communicate for specific purposes. For example, an L2 speaker of English may wish to build rapport with a co-worker by chatting about the weather. The speaker will draw on various resources to accomplish her communicative purposes. For instance, the speaker may say "falling…

  8. Word Stress and Pronunciation Teaching in English as a Lingua Franca Contexts

    ERIC Educational Resources Information Center

    Lewis, Christine; Deterding, David

    2018-01-01

    Traditionally, pronunciation was taught by reference to native-speaker models. However, as speakers around the world increasingly interact in English as a lingua franca (ELF) contexts, there is less focus on native-speaker targets, and there is wide acceptance that achieving intelligibility is crucial while mimicking native-speaker pronunciation…

  9. Defining "Native Speaker" in Multilingual Settings: English as a Native Language in Asia

    ERIC Educational Resources Information Center

    Hansen Edwards, Jette G.

    2017-01-01

    The current study examines how and why speakers of English from multilingual contexts in Asia are identifying as native speakers of English. Eighteen participants from different contexts in Asia, including Singapore, Malaysia, India, Taiwan, and The Philippines, who self-identified as native speakers of English participated in hour-long interviews…

  10. Speaker Identity Supports Phonetic Category Learning

    ERIC Educational Resources Information Center

    Mani, Nivedita; Schneider, Signe

    2013-01-01

    Visual cues from the speaker's face, such as the discriminable mouth movements used to produce speech sounds, improve discrimination of these sounds by adults. The speaker's face, however, provides more information than just the mouth movements used to produce speech--it also provides a visual indexical cue of the identity of the speaker. The…

  11. The Interpretability Hypothesis: Evidence from Wh-Interrogatives in Second Language Acquisition

    ERIC Educational Resources Information Center

    Tsimpli, Ianthi Maria; Dimitrakopoulou, Maria

    2007-01-01

    The second language acquisition (SLA) literature reports numerous studies of proficient second language (L2) speakers who diverge significantly from native speakers despite the evidence offered by the L2 input. Recent SLA theories have attempted to account for native speaker/non-native speaker (NS/NNS) divergence by arguing for the dissociation…

  12. The interaction of glottal-pulse rate and vocal-tract length in judgements of speaker size, sex, and age

    NASA Astrophysics Data System (ADS)

    Smith, David R. R.; Patterson, Roy D.

    2005-11-01

    Glottal-pulse rate (GPR) and vocal-tract length (VTL) are related to the size, sex, and age of the speaker but it is not clear how the two factors combine to influence our perception of speaker size, sex, and age. This paper describes experiments designed to measure the effect of the interaction of GPR and VTL upon judgements of speaker size, sex, and age. Vowels were scaled to represent people with a wide range of GPRs and VTLs, including many well beyond the normal range of the population, and listeners were asked to judge the size and sex/age of the speaker. The judgements of speaker size show that VTL has a strong influence upon perceived speaker size. The results for the sex and age categorization (man, woman, boy, or girl) show that, for vowels with GPR and VTL values in the normal range, judgements of speaker sex and age are influenced about equally by GPR and VTL. For vowels with abnormal combinations of low GPRs and short VTLs, the VTL information appears to decide the sex/age judgement.

  13. Comparison of singer's formant, speaker's ring, and LTA spectrum among classical singers and untrained normal speakers.

    PubMed

    Oliveira Barrichelo, V M; Heuer, R J; Dean, C M; Sataloff, R T

    2001-09-01

    Many studies have described and analyzed the singer's formant. A similar phenomenon produced by trained speakers led some authors to examine the speaker's ring. If we consider these phenomena as resonance effects associated with vocal tract adjustments and training, can we hypothesize that trained singers can carry over their singing formant ability into speech, also obtaining a speaker's ring? Can we find similar differences for energy distribution in continuous speech? Forty classically trained singers and forty untrained normal speakers performed an all-voiced reading task and produced a sample of a sustained spoken vowel /a/. The singers were also requested to perform a sustained sung vowel /a/ at a comfortable pitch. The reading was analyzed by the long-term average spectrum (LTAS) method. The sustained vowels were analyzed through power spectrum analysis. The data suggest that singers show more energy concentration in the singer's formant/speaker's ring region in both sung and spoken vowels. The singers' spoken vowel energy in the speaker's ring area was found to be significantly larger than that of the untrained speakers. The LTAS showed similar findings suggesting that those differences also occur in continuous speech. This finding supports the value of further research on the effect of singing training on the resonance of the speaking voice.
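
    A long-term average spectrum of the kind used here is essentially a heavily averaged power spectrum. The sketch below computes an LTAS via Welch averaging and reports the share of energy in a band around the singer's formant / speaker's ring; the 2-4 kHz band edges, frame length, and synthetic input are illustrative assumptions only, not the study's analysis settings.

```python
import numpy as np
from scipy.signal import welch

def ltas_band_energy(signal, fs, band=(2000.0, 4000.0)):
    """Long-term average spectrum via Welch averaging, then the
    fraction of power (in dB) falling in a band around the singer's
    formant / speaker's ring (band edges here are illustrative)."""
    freqs, psd = welch(signal, fs=fs, nperseg=2048)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return 10.0 * np.log10(psd[in_band].sum() / psd.sum())

fs = 16000
t = np.arange(0, 2.0, 1.0 / fs)
speech_like = np.random.default_rng(2).normal(size=t.size)  # placeholder
print(ltas_band_energy(speech_like, fs), "dB re: total power")
```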

  14. Speaker and Observer Perceptions of Physical Tension during Stuttering.

    PubMed

    Tichenor, Seth; Leslie, Paula; Shaiman, Susan; Yaruss, J Scott

    2017-01-01

    Speech-language pathologists routinely assess physical tension during evaluation of those who stutter. If speakers experience tension that is not visible to clinicians, then judgments of severity may be inaccurate. This study addressed this potential discrepancy by comparing judgments of tension by people who stutter and expert clinicians to determine if clinicians could accurately identify the speakers' experience of physical tension. Ten adults who stutter were audio-video recorded in two speaking samples. Two board-certified specialists in fluency evaluated the samples using the Stuttering Severity Instrument-4 and a checklist adapted for this study. Speakers rated their tension using the same forms, and then discussed their experiences in a qualitative interview so that themes related to physical tension could be identified. The degree of tension reported by speakers was higher than that observed by specialists. Tension in parts of the body that were less visible to the observer (chest, abdomen, throat) was reported more by speakers than by specialists. The thematic analysis revealed that speakers' experience of tension changes over time and that these changes may be related to speakers' acceptance of stuttering. The lack of agreement between speaker and specialist perceptions of tension suggests that using self-reports is a necessary component for supporting the accurate diagnosis of tension in stuttering. © 2018 S. Karger AG, Basel.

  15. A report on the current status of grand rounds in radiology residency programs in the United States.

    PubMed

    Yablon, Corrie M; Wu, Jim S; Slanetz, Priscilla J; Eisenberg, Ronald L

    2011-12-01

    A national needs assessment of radiology program directors was performed to characterize grand rounds (GR) programs, assess the perceived educational value of GR programs, and determine the impact of the recent economic downturn on GR. A 28-question survey was developed querying the organizational logistics of GR programs, content of talks, honoraria, types of speakers invited, response to the economic downturn, types of speaker interaction with residents, and perceived educational value of GR. Questions were in multiple-choice, yes-or-no, and five-point Likert-type formats. The survey was distributed to the program directors of all radiology residencies within the United States. Fifty-seven of 163 programs responded, resulting in a response rate of 36%. Thirty-eight programs (67%) were university residencies and 10 (18%) were university affiliated. Eighty-two percent of university and 60% of university-affiliated residencies had their own GR programs, while only 14% of community and no military residencies held GR. GR were held weekly in 18% of programs, biweekly in 8%, monthly in 42%, bimonthly in 16%, and less frequently than every 2 months in 16%. All 38 programs hosting GR reported a broad spectrum of presentations, including talks on medical education (66%), clinical and evidence-based medicine (55%), professionalism (45%), ethics (45%), quality assurance (34%), global health (26%), and resident presentations (26%). All programs invited speakers from outside the institution, but there was variability with regard to the frequency of visits and whether invited speakers were from out of town. As a result of recent economic events, one radiology residency (3%) completely canceled its GR program. Others decreased the number of speakers from outside their cities (40%) or decreased the number of speakers from within their own cities (16%). Honoraria were paid to speakers by 95% of responding programs. Most program directors (79%) who had their own GR programs either strongly agreed or agreed that GR are an essential component of any academic radiology department, and this opinion was shared by a majority of all respondents (68%). Almost all respondents (97%) either strongly agreed or agreed that general radiologic education of imaging subspecialists is valuable in an academic radiology department. A majority (65%) either strongly agreed or agreed that attendance at GR should be expected of all attending radiologists. GR programs among radiology residencies tend to have similar formats involving invited speakers, although the frequency, types of talks, and honoraria may vary slightly. Most programs value GR, and all programs integrate GR within resident education to some degree. The recent economic downturn has led to a decrease in the number of invited visiting speakers but not to a decrease in the amounts of honoraria. Copyright © 2011 AUR. Published by Elsevier Inc. All rights reserved.

  16. Speech Prosody Across Stimulus Types for Individuals with Parkinson's Disease.

    PubMed

    K-Y Ma, Joan; Schneider, Christine B; Hoffmann, Rüdiger; Storch, Alexander

    2015-01-01

    Up to 89% of individuals with Parkinson's disease (PD) experience speech problems over the course of the disease. Speech prosody and intelligibility are two of the most affected areas in hypokinetic dysarthria. However, assessment of these areas can be problematic, as speech prosody and intelligibility may be affected by the type of speech materials employed. The aim was to comparatively explore the effects of different types of speech stimuli on speech prosody and intelligibility in PD speakers. Speech prosody and intelligibility of two groups of individuals with varying degrees of dysarthria resulting from PD were compared to those of a group of control speakers using sentence reading, passage reading and monologue. Acoustic analysis, including measures of fundamental frequency (F0), intensity and speech rate, was used to form a prosodic profile for each individual. Speech intelligibility was measured for the speakers with dysarthria using direct magnitude estimation. A difference in F0 variability between the speakers with dysarthria and control speakers was only observed in the sentence reading task. A difference in average intensity level relative to the control speakers was observed for speakers with mild dysarthria. Additionally, there were stimulus effects on both intelligibility and prosodic profile. The prosodic profile of PD speakers differed from that of the control speakers in the more structured task, and lower intelligibility was found in the less structured task. This highlights the value of both structured and natural stimuli for evaluating speech production in PD speakers.
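
    A prosodic profile of this kind can be approximated with standard speech-analysis tooling. The sketch below computes F0 statistics and mean intensity for a recording using the librosa library; the file name, F0 search range, and sampling rate are illustrative assumptions, and a speech-rate measure would additionally require syllable or word counts.

```python
import numpy as np
import librosa

def prosodic_profile(path):
    """Illustrative prosodic profile: F0 mean and variability (via
    librosa's YIN implementation) and mean intensity (RMS in dB)."""
    y, sr = librosa.load(path, sr=16000)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)   # per-frame F0 (Hz)
    rms = librosa.feature.rms(y=y)[0]               # per-frame RMS
    return {
        "f0_mean_hz": float(np.nanmean(f0)),
        "f0_sd_semitones": float(np.nanstd(12 * np.log2(f0 / np.nanmean(f0)))),
        "intensity_db": float(np.mean(20 * np.log10(rms + 1e-10))),
    }

# profile = prosodic_profile("monologue.wav")  # hypothetical recording
```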

  17. Speaker and Accent Variation Are Handled Differently: Evidence in Native and Non-Native Listeners

    PubMed Central

    Kriengwatana, Buddhamas; Terry, Josephine; Chládková, Kateřina; Escudero, Paola

    2016-01-01

    Listeners are able to cope with between-speaker variability in speech that stems from anatomical sources (i.e. individual and sex differences in vocal tract size) and sociolinguistic sources (i.e. accents). We hypothesized that listeners adapt to these two types of variation differently because prior work indicates that adapting to speaker/sex variability may occur pre-lexically while adapting to accent variability may require learning from attention to explicit cues (i.e. feedback). In Experiment 1, we tested our hypothesis by training native Dutch listeners and Australian-English (AusE) listeners without any experience with Dutch or Flemish to discriminate between the Dutch vowels /I/ and /ε/ from a single speaker. We then tested their ability to classify /I/ and /ε/ vowels of a novel Dutch speaker (i.e. speaker or sex change only), or vowels of a novel Flemish speaker (i.e. speaker or sex change plus accent change). We found that both Dutch and AusE listeners could successfully categorize vowels if the change involved a speaker/sex change, but not if the change involved an accent change. When AusE listeners were given feedback on their categorization responses to the novel speaker in Experiment 2, they were able to successfully categorize vowels involving an accent change. These results suggest that adapting to accents may be a two-step process, whereby the first step involves adapting to speaker differences at a pre-lexical level, and the second step involves adapting to accent differences at a contextual level, where listeners have access to word meaning or are given feedback that allows them to appropriately adjust their perceptual category boundaries. PMID:27309889

  18. Euclidean Distances as measures of speaker similarity including identical twin pairs: A forensic investigation using source and filter voice characteristics.

    PubMed

    San Segundo, Eugenia; Tsanas, Athanasios; Gómez-Vilda, Pedro

    2017-01-01

    There is a growing consensus that hybrid approaches are necessary for successful speaker characterization in Forensic Speaker Comparison (FSC); hence this study explores the forensic potential of voice features combining source and filter characteristics. The former relate to the action of the vocal folds while the latter reflect the geometry of the speaker's vocal tract. This set of features has been extracted from pause fillers, which are long enough for robust feature estimation while spontaneous enough to be extracted from voice samples in real forensic casework. Speaker similarity was measured using standardized Euclidean Distances (ED) between pairs of speakers: 54 different-speaker (DS) comparisons, 54 same-speaker (SS) comparisons and 12 comparisons between monozygotic twins (MZ). Results revealed that the differences between DS and SS comparisons were significant in both high quality and telephone-filtered recordings, with no false rejections and limited false acceptances; this finding suggests that this set of voice features is highly speaker-dependent and therefore forensically useful. The mean ED for MZ pairs lies between the average ED for SS comparisons and DS comparisons, as expected according to the literature on twin voices. Specific cases of MZ speakers with very high ED (i.e. strong dissimilarity) are discussed in the context of sociophonetic and twin studies. A preliminary simplification of the Vocal Profile Analysis (VPA) Scheme is proposed, which enables the quantification of voice quality features in the perceptual assessment of speaker similarity, and allows for the calculation of perceptual-acoustic correlations. The adequacy of z-score normalization for this study is also discussed, as well as the relevance of heat maps for detecting the so-called phantoms in recent approaches to the biometric menagerie. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
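
    The distance measure itself is straightforward: each squared feature difference is scaled by that feature's variance across the population before summing, so no single feature dominates. A minimal sketch with hypothetical feature vectors (the dimensions and data are placeholders, not the paper's source and filter features):

```python
import numpy as np
from scipy.spatial.distance import seuclidean

rng = np.random.default_rng(3)
# Hypothetical per-speaker feature vectors (rows), standing in for
# source+filter measurements such as jitter, shimmer, or formants.
population = rng.normal(size=(54, 8))
variances = population.var(axis=0, ddof=1)  # per-feature variances

speaker_a, speaker_b = population[0], population[1]
# Standardized Euclidean distance: sqrt(sum((a - b)**2 / variances)).
print(seuclidean(speaker_a, speaker_b, variances))
```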

  19. Visual abilities are important for auditory-only speech recognition: evidence from autism spectrum disorder.

    PubMed

    Schelinski, Stefanie; Riedel, Philipp; von Kriegstein, Katharina

    2014-12-01

    In auditory-only conditions, for example when we listen to someone on the phone, it is essential to recognize quickly and accurately what is said (speech recognition). Previous studies have shown that speech recognition performance in auditory-only conditions is better if the speaker is known not only by voice, but also by face. Here, we tested the hypothesis that such an improvement in auditory-only speech recognition depends on the ability to lip-read. To test this we recruited a group of adults with autism spectrum disorder (ASD), a condition associated with difficulties in lip-reading, and typically developed controls. All participants were trained to identify six speakers by name and voice. Three speakers were learned by a video showing their face and three others were learned in a matched control condition without face. After training, participants performed an auditory-only speech recognition test that consisted of sentences spoken by the trained speakers. As a control condition, the test also included speaker identity recognition on the same auditory material. The results showed that, in the control group, performance in speech recognition was improved for speakers known by face in comparison to speakers learned in the matched control condition without face. The ASD group lacked such a performance benefit. For the ASD group auditory-only speech recognition was even worse for speakers known by face compared to speakers not known by face. In speaker identity recognition, the ASD group performed worse than the control group independent of whether the speakers were learned with or without face. Two additional visual experiments showed that the ASD group performed worse in lip-reading whereas face identity recognition was within the normal range. The findings support the view that auditory-only communication involves specific visual mechanisms. Further, they indicate that in ASD, speaker-specific dynamic visual information is not available to optimize auditory-only speech recognition. Copyright © 2014 Elsevier Ltd. All rights reserved.

  20. Evaluation of Speakers with Foreign-Accented Speech in Japan: The Effect of Accent Produced by English Native Speakers

    ERIC Educational Resources Information Center

    Tsurutani, Chiharu

    2012-01-01

    Foreign-accented speakers are generally regarded as less educated, less reliable and less interesting than native speakers and tend to be associated with cultural stereotypes of their country of origin. This discrimination against foreign accents has, however, been discussed mainly using accented English in English-speaking countries. This study…

  1. The Employability of Non-Native-Speaker Teachers of EFL: A UK Survey

    ERIC Educational Resources Information Center

    Clark, Elizabeth; Paran, Amos

    2007-01-01

    The native speaker still has a privileged position in English language teaching, representing both the model speaker and the ideal teacher. Non-native-speaker teachers of English are often perceived as having a lower status than their native-speaking counterparts, and have been shown to face discriminatory attitudes when applying for teaching…

  2. Generic Language and Speaker Confidence Guide Preschoolers' Inferences about Novel Animate Kinds

    ERIC Educational Resources Information Center

    Stock, Hayli R.; Graham, Susan A.; Chambers, Craig G.

    2009-01-01

    We investigated the influence of speaker certainty on 156 four-year-old children's sensitivity to generic and nongeneric statements. An inductive inference task was implemented, in which a speaker described a nonobvious property of a novel creature using either a generic or a nongeneric statement. The speaker appeared to be confident, neutral, or…

  3. Modern Greek Language: Acquisition of Morphology and Syntax by Non-Native Speakers

    ERIC Educational Resources Information Center

    Andreou, Georgia; Karapetsas, Anargyros; Galantomos, Ioannis

    2008-01-01

    This study investigated the performance of native and non-native speakers of Modern Greek on morphology and syntax tasks. Non-native speakers of Greek whose native language was English, a language with strict word order and simple morphology, made more errors and answered more slowly than native speakers on morphology but not…

  4. A Comparison of Coverbal Gesture Use in Oral Discourse among Speakers with Fluent and Nonfluent Aphasia

    ERIC Educational Resources Information Center

    Kong, Anthony Pak-Hin; Law, Sam-Po; Chak, Gigi Wan-Chi

    2017-01-01

    Purpose: Coverbal gesture use, which is affected by the presence and degree of aphasia, can be culturally specific. The purpose of this study was to compare gesture use among Cantonese-speaking individuals: 23 neurologically healthy speakers, 23 speakers with fluent aphasia, and 21 speakers with nonfluent aphasia. Method: Multimedia data of…

  5. What's Learned Together Stays Together: Speakers' Choice of Referring Expression Reflects Shared Experience

    ERIC Educational Resources Information Center

    Gorman, Kristen S.; Gegg-Harrison, Whitney; Marsh, Chelsea R.; Tanenhaus, Michael K.

    2013-01-01

    When referring to named objects, speakers can choose either a name ("mbira") or a description ("that gourd-like instrument with metal strips"); whether the name provides useful information depends on whether the speaker's knowledge of the name is shared with the addressee. But, how do speakers determine what is shared? In 2…

  6. Accent Attribution in Speakers with Foreign Accent Syndrome

    ERIC Educational Resources Information Center

    Verhoeven, Jo; De Pauw, Guy; Pettinato, Michele; Hirson, Allen; Van Borsel, John; Marien, Peter

    2013-01-01

    Purpose: The main aim of this experiment was to investigate the perception of Foreign Accent Syndrome in comparison to speakers with an authentic foreign accent. Method: Three groups of listeners attributed accents to conversational speech samples of 5 FAS speakers which were embedded amongst those of 5 speakers with a real foreign accent and 5…

  7. Race in Conflict with Heritage: "Black" Heritage Language Speaker of Japanese

    ERIC Educational Resources Information Center

    Doerr, Neriko Musha; Kumagai, Yuri

    2014-01-01

    "Heritage language speaker" is a relatively new term to denote minority language speakers who grew up in a household where the language was used or those who have a family, ancestral, or racial connection to the minority language. In research on heritage language speakers, overlap between these 2 definitions is often assumed--that is,…

  8. Early Language Experience Facilitates the Processing of Gender Agreement in Spanish Heritage Speakers

    ERIC Educational Resources Information Center

    Montrul, Silvina; Davidson, Justin; De La Fuente, Israel; Foote, Rebecca

    2014-01-01

    We examined how age of acquisition in Spanish heritage speakers and L2 learners interacts with implicitness vs. explicitness of tasks in gender processing of canonical and non-canonical ending nouns. Twenty-three Spanish native speakers, 29 heritage speakers, and 33 proficiency-matched L2 learners completed three on-line spoken word recognition…

  9. The Role of Interaction in Native Speaker Comprehension of Nonnative Speaker Speech.

    ERIC Educational Resources Information Center

    Polio, Charlene; Gass, Susan M.

    1998-01-01

    Because interaction gives language learners an opportunity to modify their speech upon a signal of noncomprehension, it should also have a positive effect on native speakers' (NS) comprehension of nonnative speakers (NNS). This study shows that interaction does help NSs comprehend NNSs, contrasting the claims of an earlier study that found no…

  10. Cross-cultural adaptation, reliability, and validation of the Korean version of the identification functional ankle instability (IdFAI).

    PubMed

    Ko, Jupil; Rosen, Adam B; Brown, Cathleen N

    2017-09-12

    To cross-culturally adapt the Identification of Functional Ankle Instability (IdFAI) for use with Korean-speaking participants, the English version of the IdFAI was adapted into Korean following standard guidelines. The psychometric properties of the Korean version were then assessed for test-retest reliability, internal consistency, criterion-related validity, discriminative validity, and measurement error in 181 native Korean speakers. The intra-class correlation coefficient (ICC(2,1)) between the English and Korean versions of the IdFAI for test-retest reliability was 0.98 (standard error of measurement = 1.41). Cronbach's alpha was 0.89 for the Korean version. The Korean version had a strong correlation with the SF-36 (r_s = -0.69, p < .001) and with the Korean version of the Cumberland Ankle Instability Tool (r_s = -0.65, p < .001). A score of >10 was the optimal cutoff to distinguish between group memberships. The minimally detectable change of the Korean IdFAI score was 3.91. The Korean IdFAI has been shown to be an excellent, reliable, and valid instrument, and it can be utilized by researchers and clinicians to assess the presence of chronic ankle instability (CAI) among Korean-speaking populations. Implications for rehabilitation: the high recurrence rate of ankle sprains may result in CAI; the IdFAI has been validated and recommended for identifying patients with CAI; and the Korean version of the IdFAI may likewise be recommended to researchers and clinicians for assessing the presence of CAI in Korean-speaking populations.
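
    The reported reliability figures are internally consistent: the minimally detectable change follows from the standard error of measurement as MDC95 = 1.96 x SEM x sqrt(2). A minimal Python sketch using only the values quoted above (the back-calculated score SD is illustrative, not a reported value):

      import math

      icc = 0.98   # test-retest reliability, ICC(2,1), as reported
      sem = 1.41   # standard error of measurement, as reported

      # MDC95 = 1.96 * SEM * sqrt(2); the sqrt(2) reflects measurement
      # error in both the test and the retest score.
      mdc95 = 1.96 * sem * math.sqrt(2)
      print(f"MDC95 = {mdc95:.2f}")        # ~3.91, matching the abstract

      # SEM = SD * sqrt(1 - ICC) implies a score SD of about:
      sd = sem / math.sqrt(1 - icc)
      print(f"implied SD = {sd:.2f}")      # ~9.97 (illustrative only)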

  11. The perception of syllable affiliation of singleton stops in repetitive speech.

    PubMed

    de Jong, Kenneth J; Lim, Byung-Jin; Nagao, Kyoko

    2004-01-01

    Stetson (1951) noted that repeating singleton coda consonants at fast speech rates causes them to be perceived as onset consonants affiliated with a following vowel. The current study documents the perception of rate-induced resyllabification, as well as the temporal properties that give rise to the perception of syllable affiliation. Stimuli were extracted from a previous study of repeated stop + vowel and vowel + stop syllables (de Jong, 2001a, 2001b). Forced-choice identification tasks show that slow repetitions are clearly distinguished. As speakers increase rate, they reach a point after which listeners disagree as to the affiliation of the stop. This pattern is found for voiced and voiceless consonants using different stimulus extraction techniques. Acoustic models of the identifications indicate that the sudden shift in syllabification occurs with the loss of an acoustic hiatus between successive syllables. Acoustic models of the fast rate identifications indicate that various other qualities, such as consonant voicing, affect the probability that the consonants will be perceived as onsets. These results indicate a model of syllabic affiliation in which specific juncture-marking aspects of the signal dominate parsing, and in their absence other differences provide additional, weaker cues to syllabic affiliation.

  12. The cognitive neuroscience of person identification.

    PubMed

    Biederman, Irving; Shilowich, Bryan E; Herald, Sarah B; Margalit, Eshed; Maarek, Rafael; Meschke, Emily X; Hacker, Catrina M

    2018-02-14

    We compare and contrast five differences between person identification by voice and face. 1. There is little or no cost when a familiar face is to be recognized from an unrestricted set of possible faces, even at Rapid Serial Visual Presentation (RSVP) rates, but the accuracy of familiar voice recognition declines precipitously when the set of possible speakers is increased from one to a mere handful. 2. Whereas deficits in face recognition are typically perceptual in origin, those with normal perception of voices can manifest severe deficits in their identification. 3. Congenital prosopagnosics (CPros) and congenital phonagnosics (CPhon) are generally unable to imagine familiar faces and voices, respectively. Only in CPros, however, is this deficit a manifestation of a general inability to form visual images of any kind. CPhons report no deficit in imaging non-voice sounds. 4. The prevalence of CPhons of 3.2% is somewhat higher than the reported prevalence of approximately 2.0% for CPros in the population. There is evidence that CPhon represents a distinct condition statistically and not just normal variation. 5. Face and voice recognition proficiency are uncorrelated rather than reflecting limitations of a general capacity for person individuation. Copyright © 2018 Elsevier Ltd. All rights reserved.

  13. Working with Speakers.

    ERIC Educational Resources Information Center

    Pestel, Ann

    1989-01-01

    The author discusses working with speakers from business and industry to present career information at the secondary level. Advice for speakers is presented, as well as tips for program coordinators. (CH)

  14. Catalan speakers' perception of word stress in unaccented contexts.

    PubMed

    Ortega-Llebaria, Marta; del Mar Vanrell, Maria; Prieto, Pilar

    2010-01-01

    In unaccented contexts, formant frequency differences related to vowel reduction constitute a consistent cue to word stress in English, whereas in languages such as Spanish that have no systematic vowel reduction, stress perception is based on duration and intensity cues. This article examines the perception of word stress by speakers of Central Catalan, in which, due to its vowel reduction patterns, words either alternate stressed open vowels with unstressed mid-central vowels as in English or contain no vowel quality cues to stress, as in Spanish. Results show that Catalan listeners perceive stress based mainly on duration cues in both word types. Other cues pattern together with duration to make stress perception more robust. However, no single cue is absolutely necessary and trading effects compensate for a lack of differentiation in one dimension by changes in another dimension. In particular, speakers identify longer mid-central vowels as more stressed than shorter open vowels. These results and those obtained in other stress-accent languages provide cumulative evidence that word stress is perceived independently of pitch accents by relying on a set of cues with trading effects so that no single cue, including formant frequency differences related to vowel reduction, is absolutely necessary for stress perception.

  15. GB277: preview Women 2000. ILO examines progress, looks ahead to Beijing + 5.

    PubMed

    2000-01-01

    In the special Symposium on Decent Work for Women, conducted during the Governing Body meeting, the challenge of eliminating gender-based discrimination in the workplace was highlighted. Among the topics discussed were rights-based and development-based approaches; progress and gaps in decent work for men and women; promoting women workers' rights; a gender perspective on poverty, employment and social protection; management development and entrepreneurship for women; and gender in crisis response and reconstruction. This paper presents excerpts of the addresses of key speakers: Juan Somavia, International Labor Organization Director-General; Angela King, Special Advisor to the UN on Gender Issues and the Advancement of Women; and Bina Agarwal, Professor of Economics at the University of Delhi. In general, the speakers identified existing obstacles to gender equality and proposed initiatives and actions for the future.

  16. Grammatical Planning Units During Real-Time Sentence Production in Speakers With Agrammatic Aphasia and Healthy Speakers.

    PubMed

    Lee, Jiyeon; Yoshida, Masaya; Thompson, Cynthia K

    2015-08-01

    Grammatical encoding (GE) is impaired in agrammatic aphasia; however, the nature of such deficits remains unclear. We examined grammatical planning units during real-time sentence production in speakers with agrammatic aphasia and control speakers, testing two competing models of GE. We queried whether speakers with agrammatic aphasia produce sentences word by word without advanced planning or whether hierarchical syntactic structure (i.e., verb argument structure; VAS) is encoded as part of the advanced planning unit. Experiment 1 examined production of sentences with a predefined structure (i.e., "The A and the B are above the C") using eye tracking. Experiment 2 tested production of transitive and unaccusative sentences without a predefined sentence structure in a verb-priming study. In Experiment 1, both speakers with agrammatic aphasia and young and age-matched control speakers used word-by-word strategies, selecting the first lemma (noun A) only prior to speech onset. However, in Experiment 2, unlike controls, speakers with agrammatic aphasia preplanned transitive and unaccusative sentences, encoding VAS before speech onset. Speakers with agrammatic aphasia show incremental, word-by-word production for structurally simple sentences, requiring retrieval of multiple noun lemmas. However, when sentences involve functional (thematic to grammatical) structure building, advanced planning strategies (i.e., VAS encoding) are used. This early use of hierarchical syntactic information may provide a scaffold for impaired GE in agrammatism.

  17. Grammatical Encoding and Learning in Agrammatic Aphasia: Evidence from Structural Priming

    PubMed Central

    Cho-Reyes, Soojin; Mack, Jennifer E.; Thompson, Cynthia K.

    2017-01-01

    The present study addressed open questions about the nature of sentence production deficits in agrammatic aphasia. In two structural priming experiments, 13 aphasic and 13 age-matched control speakers repeated visually- and auditorily-presented prime sentences, and then used visually-presented word arrays to produce dative sentences. Experiment 1 examined whether agrammatic speakers form structural and thematic representations during sentence production, whereas Experiment 2 tested the lasting effects of structural priming in lags of two and four sentences. Results of Experiment 1 showed that, like unimpaired speakers, the aphasic speakers evinced intact structural priming effects, suggesting that they are able to generate such representations. Unimpaired speakers also evinced reliable thematic priming effects, whereas agrammatic speakers did so in some experimental conditions, suggesting that access to thematic representations may be intact. Results of Experiment 2 showed structural priming effects of comparable magnitude for aphasic and unimpaired speakers. In addition, both groups showed lasting structural priming effects in both lag conditions, consistent with implicit learning accounts. In both experiments, aphasic speakers with more severe language impairments exhibited larger priming effects, consistent with the “inverse preference” prediction of implicit learning accounts. The findings indicate that agrammatic speakers are sensitive to structural priming across levels of representation and that such effects are lasting, suggesting that structural priming may be beneficial for the treatment of sentence production deficits in agrammatism. PMID:28924328

  18. Brief Report: Relations between Prosodic Performance and Communication and Socialization Ratings in High Functioning Speakers with Autism Spectrum Disorders

    ERIC Educational Resources Information Center

    Paul, Rhea; Shriberg, Lawrence D.; McSweeny, Jane; Cicchetti, Domenic; Klin, Ami; Volkmar, Fred

    2005-01-01

    Shriberg "et al." [Shriberg, L. "et al." (2001). "Journal of Speech, Language and Hearing Research, 44," 1097-1115] described prosody-voice features of 30 high functioning speakers with autistic spectrum disorder (ASD) compared to age-matched control speakers. The present study reports additional information on the speakers with ASD, including…

  19. Investigating Holistic Measures of Speech Prosody

    ERIC Educational Resources Information Center

    Cunningham, Dana Aliel

    2012-01-01

    Speech prosody is a multi-faceted dimension of speech which can be measured and analyzed in a variety of ways. In this study, the speech prosody of Mandarin L1 speakers, English L2 speakers, and English L1 speakers was assessed by trained raters who listened to sound clips of the speakers responding to a graph prompt and reading a short passage.…

  20. Young Children's Sensitivity to Speaker Gender When Learning from Others

    ERIC Educational Resources Information Center

    Ma, Lili; Woolley, Jacqueline D.

    2013-01-01

    This research explores whether young children are sensitive to speaker gender when learning novel information from others. Four- and 6-year-olds ("N" = 144) chose between conflicting statements from a male versus a female speaker (Studies 1 and 3) or decided which speaker (male or female) they would ask (Study 2) when learning about the functions…

  1. Switches to English during French Service Encounters: Relationships with L2 French Speakers' Willingness to Communicate and Motivation

    ERIC Educational Resources Information Center

    McNaughton, Stephanie; McDonough, Kim

    2015-01-01

    This exploratory study investigated second language (L2) French speakers' service encounters in the multilingual setting of Montreal, specifically whether switches to English during French service encounters were related to L2 speakers' willingness to communicate or motivation. Over a two-week period, 17 French L2 speakers in Montreal submitted…

  2. A Respirometric Technique to Evaluate Velopharyngeal Function in Speakers with Cleft Palate, with and without Prostheses.

    ERIC Educational Resources Information Center

    Gilbert, Harvey R.; Ferrand, Carole T.

    1987-01-01

    Respirometric quotients (RQ), the ratio of oral air volume expended to total volume expended, were obtained from the productions of oral and nasal airflow of 10 speakers with cleft palate, with and without their prosthetic appliances, and 10 normal speakers. Cleft palate speakers without their appliances exhibited the lowest RQ values. (Author/DB)

  3. Using Stimulated Recall to Investigate Native Speaker Perceptions in Native-Nonnative Speaker Interaction

    ERIC Educational Resources Information Center

    Polio, Charlene; Gass, Susan; Chapin, Laura

    2006-01-01

    Implicit negative feedback has been shown to facilitate SLA, and the extent to which such feedback is given is related to a variety of task and interlocutor variables. The background of a native speaker (NS), in terms of amount of experience in interactions with nonnative speakers (NNSs), has been shown to affect the quantity of implicit negative…

  4. Compliment Responses: Comparing American Learners of Japanese, Native Japanese Speakers, and American Native English Speakers

    ERIC Educational Resources Information Center

    Tatsumi, Naofumi

    2012-01-01

    Previous research shows that American learners of Japanese (AJs) tend to differ from native Japanese speakers in their compliment responses (CRs). Yokota (1986) and Shimizu (2009) have reported that AJs tend to respond more negatively than native Japanese speakers. It has also been reported that AJs' CRs tend to lack the use of avoidance or…

  5. Intelligibility of clear speech: effect of instruction.

    PubMed

    Lam, Jennifer; Tjaden, Kris

    2013-10-01

    The authors investigated how clear speech instructions influence sentence intelligibility. Twelve speakers produced sentences in habitual, clear, hearing impaired, and overenunciate conditions. Stimuli were amplitude normalized and mixed with multitalker babble for orthographic transcription by 40 listeners. The main analysis investigated percentage-correct intelligibility scores as a function of the 4 conditions and speaker sex. Additional analyses included listener response variability, individual speaker trends, and an alternate intelligibility measure: proportion of content words correct. Relative to the habitual condition, the overenunciate condition was associated with the greatest intelligibility benefit, followed by the hearing impaired and clear conditions. Ten speakers followed this trend. The results indicated different patterns of clear speech benefit for male and female speakers. Greater listener variability was observed for speakers with inherently low habitual intelligibility compared to speakers with inherently high habitual intelligibility. Stable proportions of content words were observed across conditions. Clear speech instructions affected the magnitude of the intelligibility benefit. The instruction to overenunciate may be most effective in clear speech training programs. The findings may help explain the range of clear speech intelligibility benefit previously reported. Listener variability analyses suggested the importance of obtaining multiple listener judgments of intelligibility, especially for speakers with inherently low habitual intelligibility.

  6. On the same wavelength: predictable language enhances speaker-listener brain-to-brain synchrony in posterior superior temporal gyrus.

    PubMed

    Dikker, Suzanne; Silbert, Lauren J; Hasson, Uri; Zevin, Jason D

    2014-04-30

    Recent research has shown that the degree to which speakers and listeners exhibit similar brain activity patterns during human linguistic interaction is correlated with communicative success. Here, we used an intersubject correlation approach in fMRI to test the hypothesis that a listener's ability to predict a speaker's utterance increases such neural coupling between speakers and listeners. Nine subjects listened to recordings of a speaker describing visual scenes that varied in the degree to which they permitted specific linguistic predictions. In line with our hypothesis, the temporal profile of listeners' brain activity was significantly more synchronous with the speaker's brain activity for highly predictive contexts in left posterior superior temporal gyrus (pSTG), an area previously associated with predictive auditory language processing. In this region, predictability differentially affected the temporal profiles of brain responses in the speaker and listeners respectively, in turn affecting correlated activity between the two: whereas pSTG activation increased with predictability in the speaker, listeners' pSTG activity instead decreased for more predictable sentences. Listeners additionally showed stronger BOLD responses for predictive images before sentence onset, suggesting that highly predictable contexts lead comprehenders to preactivate predicted words.

  7. When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech.

    PubMed

    Tuninetti, Alba; Chládková, Kateřina; Peter, Varghese; Schiller, Niels O; Escudero, Paola

    2017-11-01

    Speech sound acoustic properties vary largely across speakers and accents. When perceiving speech, adult listeners normally disregard non-linguistic variation caused by speaker or accent differences, in order to comprehend the linguistic message, e.g. to correctly identify a speech sound or a word. Here we tested whether the process of normalizing speaker and accent differences, facilitating the recognition of linguistic information, is found at the level of neural processing, and whether it is modulated by the listeners' native language. In a multi-deviant oddball paradigm, native and nonnative speakers of Dutch were exposed to naturally-produced Dutch vowels varying in speaker, sex, accent, and phoneme identity. Unexpectedly, the analysis of mismatch negativity (MMN) amplitudes elicited by each type of change shows a large degree of early perceptual sensitivity to non-linguistic cues. This finding on perception of naturally-produced stimuli contrasts with previous studies examining the perception of synthetic stimuli wherein adult listeners automatically disregard acoustic cues to speaker identity. The present finding bears relevance to speech normalization theories, suggesting that at an unattended level of processing, listeners are indeed sensitive to changes in fundamental frequency in natural speech tokens. Copyright © 2017 Elsevier Inc. All rights reserved.

  8. When one person's mistake is another's standard usage: the effect of foreign accent on syntactic processing.

    PubMed

    Hanulíková, Adriana; van Alphen, Petra M; van Goch, Merel M; Weber, Andrea

    2012-04-01

    How do native listeners process grammatical errors that are frequent in non-native speech? We investigated whether the neural correlates of syntactic processing are modulated by speaker identity. ERPs to gender agreement errors in sentences spoken by a native speaker were compared with the same errors spoken by a non-native speaker. In line with previous research, gender violations in native speech resulted in a P600 effect (larger P600 for violations in comparison with correct sentences), but when the same violations were produced by the non-native speaker with a foreign accent, no P600 effect was observed. Control sentences with semantic violations elicited comparable N400 effects for both the native and the non-native speaker, confirming no general integration problem in foreign-accented speech. The results demonstrate that the P600 is modulated by speaker identity, extending our knowledge about the role of speaker's characteristics on neural correlates of speech processing.

  9. Factors affecting the perception of Korean-accented American English

    NASA Astrophysics Data System (ADS)

    Cho, Kwansun; Harris, John G.; Shrivastav, Rahul

    2005-09-01

    This experiment examines the relative contribution of two factors, intonation and articulation errors, to the perception of foreign accent in Korean-accented American English. Ten native speakers of Korean and ten native speakers of American English were asked to read ten English sentences. These sentences were then modified using high-quality speech resynthesis techniques [STRAIGHT; Kawahara et al., Speech Commun. 27, 187-207 (1999)] to generate four sets of stimuli. In the first two sets of stimuli, the intonation patterns of the Korean speakers and American speakers were switched with one another. The articulatory errors for each speaker were not modified. In the final two sets, the sentences from the Korean and American speakers were resynthesized without any modifications. Fifteen listeners were asked to rate all the stimuli for the degree of foreign accent. Preliminary results show that, for native speakers of American English, articulation errors may play a greater role in the perception of foreign accent than errors in intonation patterns. [Work supported by KAIM.]

  10. Vibration and sound radiation of an electrostatic speaker based on circular diaphragm.

    PubMed

    Chiang, Hsin-Yuan; Huang, Yu-Hsi

    2015-04-01

    This study investigated the lumped parameter method (LPM) and distributed parameter method (DPM) in the measurement of vibration and prediction of sound pressure levels (SPLs) produced by an electrostatic speaker with circular diaphragm. An electrostatic speaker with push-pull configuration was achieved by suspending the circular diaphragm (60 mm diameter) between two transparent conductive plates. The transparent plates included a two-dimensional array of holes to enable the visualization of vibrations and avoid acoustic distortion. LPM was used to measure the displacement amplitude at the center of the diaphragm using a scanning vibrometer with the aim of predicting symmetric modes using Helmholtz equations and SPLs using Rayleigh integral equations. DPM was used to measure the amplitude of displacement across the entire surface of the speaker and predict SPL curves. LPM results show that the prediction of SPL associated with the first three symmetric resonant modes is in good agreement with the results of DPM and acoustic measurement. Below the breakup frequency of 375 Hz, the SPL predicted by LPM and DPM are identical with the results of acoustic measurement. This study provides a rapid, accurate method with which to measure the SPL associated with the first three symmetric modes using semi-analytic LPM.
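
    As a rough illustration of the Rayleigh-integral prediction step, the on-axis pressure of a baffled rigid circular piston has a closed-form solution. A rigid piston with uniform velocity is only a stand-in for the measured (non-uniform) diaphragm motion used in the paper, and the velocity and distance values below are invented for the example:

      import numpy as np

      rho0, c = 1.21, 343.0   # air density (kg/m^3), speed of sound (m/s)
      a = 0.03                # 60 mm diaphragm -> 30 mm radius
      u0 = 1e-3               # assumed peak diaphragm velocity (m/s)
      z = 0.5                 # assumed on-axis measurement distance (m)

      def on_axis_spl(freq_hz):
          # Closed-form Rayleigh-integral result for a uniform circular piston.
          k = 2 * np.pi * freq_hz / c
          p = 2 * rho0 * c * u0 * np.abs(
              np.sin(0.5 * k * (np.sqrt(z**2 + a**2) - z)))  # pressure amplitude
          return 20 * np.log10(p / np.sqrt(2) / 20e-6)       # dB SPL re 20 uPa

      for f in (100, 375, 1000):
          print(f"{f} Hz: {on_axis_spl(f):.1f} dB SPL")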

  11. Detecting Infections Rapidly and Easily for Candidemia Trial (DIRECT1): A Prospective, Multicenter Study of the T2Candida Panel

    PubMed Central

    Clancy, Cornelius J; Pappas, Peter; Vazquez, Jose; Judson, Marc A; Tobin, Ellis; Kontoyiannis, Dimitrios P; Thompson, George R; Reboli, Annette; Garey, Kevin W; Greenberg, Richard N; Ostrosky-Zeichner, Luis; Wu, Alan; Lyon, G Marshall; Apewokin, Senu; Nguyen, M Hong; Caliendo, Angela

    2017-01-01

    Abstract: Background: Blood cultures (BC) are the diagnostic gold standard for candidemia, but sensitivity is <50%. T2Candida (T2) is a novel, FDA-approved nanodiagnostic panel which utilizes T2 magnetic resonance and a dedicated instrument to detect Candida within whole blood samples. Methods: Candidemic adults were identified at 14 centers by diagnostic BC (dBC). Follow-up blood samples were collected from all patients (pts) for testing by T2 and companion BC (cBC). T2 was run in batches at a central lab; results are reported qualitatively for three groups of spp. (Candida albicans/C. tropicalis (CA/CT), C. glabrata/C. krusei (CG/CK), or C. parapsilosis (CP)). T2 and cBC were defined as positive (+) if they detected a sp. identified in dBC. Results: 152 patients were enrolled (median age: 54 yrs (18–93); 54% (82) men). Candidemia risk factors included indwelling catheters (82%, 125), abdominal surgery (24%, 36), transplant (22%, 33), cancer (22%, 33), hemodialysis (17%, 26), and neutropenia (10%, 15). Mean times to Candida detection/spp. identification by dBC were 47/133 hours (2/5.5 d). dBC revealed CA (30%, 46), CG (29%, 45), CP (28%, 43), CT (11%, 17), and CK (3%, 4). Mean time to collection of T2/cBC was 62 hours (2.6 d). 74% (112) of patients received antifungal (AF) therapy prior to T2/cBC (mean: 55 hours (2.3 d)). Overall, T2 results were more likely than cBC to be + (P < 0.0001; see table below), a result driven by performance in AF-treated patients (P < 0.0001). T2 was more likely to be + among patients originally infected with CA (61% (28) vs. 20% (9); P = 0.001); there were trends toward higher positivity in patients infected with CT (59% (17) vs. 23% (4); P = 0.08) and CP (42% (18) vs. 28% (12); P = 0.26). T2 was + in 89% (32/36) of patients with + cBC. Conclusion: T2 was sensitive for diagnosing candidemia at the time of + cBC, and it was significantly more likely to be + than cBC among AF-treated patients. T2 is an important advance in the diagnosis of candidemia, which is likely to be particularly useful in patients receiving prophylactic, pre-emptive, or empiric AF therapy.

    Test results, n (%):

      Pt group (n)    T2+       T2-       cBC+      cBC-       T2+/cBC+  T2+/cBC-  T2-/cBC+  T2-/cBC-
      All (152)       69 (45%)  83 (55%)  36 (24%)  116 (76%)  32 (21%)  37 (24%)  4 (3%)    79 (52%)
      Prior AF (112)  55 (49%)  57 (51%)  23 (20%)  89 (80%)   20 (18%)  35 (31%)  3 (3%)    54 (48%)
      No AF (40)      14 (35%)  26 (65%)  13 (32%)  27 (68%)   12 (30%)  2 (5%)    1 (2%)    25 (62%)

    Disclosure: D. P. Kontoyiannis, Pfizer: research contractor, research support and speaker honorarium; Astellas: research contractor, research support and speaker honorarium; Merck: speaker honorarium; Cidara: speaker honorarium; Amplyx: speaker honorarium; F2G: speaker honorarium. L. Ostrosky-Zeichner, Astellas: consultant and grant investigator, consulting fee and research grant; Merck: scientific advisor and speaker's bureau, consulting fee and speaker honorarium; Pfizer: grant investigator and speaker's bureau, grant recipient and speaker honorarium; Scynexis: grant investigator and scientific advisor, consulting fee and grant recipient; Cidara: grant investigator and scientific advisor, consulting fee and research grant. S. Apewokin, T2 Biosystems: investigator, research support; Astellas: scientific advisor, consulting fee.
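
    The reported overall T2-versus-cBC comparison can be sanity-checked from the discordant cells of the table above. The abstract does not name its statistical test; McNemar's test on the paired results is one plausible reconstruction:

      # Discordant pairs for all 152 patients, read off the table:
      # T2+/cBC- = 37, T2-/cBC+ = 4.
      b, c = 37, 4
      chi2 = (abs(b - c) - 1) ** 2 / (b + c)  # continuity-corrected McNemar
      print(f"McNemar chi2 = {chi2:.2f}")      # ~24.98 on 1 df, p << 0.0001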

  12. RTP Speakers Bureau

    EPA Pesticide Factsheets

    The Research Triangle Park Speakers Bureau page is a free resource that schools, universities, and community groups in the Raleigh-Durham-Chapel Hill, N.C. area can use to request speakers and find educational resources.

  13. Speaker Introductions at Internal Medicine Grand Rounds: Forms of Address Reveal Gender Bias.

    PubMed

    Files, Julia A; Mayer, Anita P; Ko, Marcia G; Friedrich, Patricia; Jenkins, Marjorie; Bryan, Michael J; Vegunta, Suneela; Wittich, Christopher M; Lyle, Melissa A; Melikian, Ryan; Duston, Trevor; Chang, Yu-Hui H; Hayes, Sharonne N

    2017-05-01

    Gender bias has been identified as one of the drivers of gender disparity in academic medicine. Bias may be reinforced by gender-subordinating language or differential use of formality in forms of address. Professional titles may influence the perceived expertise and authority of the referenced individual. The objective of this study is to examine how professional titles were used in same- and mixed-gender speaker introductions at Internal Medicine Grand Rounds (IMGR). A retrospective observational study of video-archived speaker introductions at consecutive IMGR was conducted at two different locations (Arizona, Minnesota) of an academic medical center. Introducers and speakers at IMGR were physician and scientist peers holding MD, PhD, or MD/PhD degrees. The primary outcome was whether or not a speaker's professional title was used during the first form of address during speaker introductions at IMGR. As secondary outcomes, we evaluated whether or not the speaker's professional title was used in any form of address during the introduction. Three hundred twenty-one forms of address were analyzed. Female introducers were more likely to use professional titles when introducing any speaker during the first form of address compared with male introducers (96.2% [102/106] vs. 65.6% [141/215]; p < 0.001). Female dyads utilized formal titles during the first form of address 97.8% (45/46) of the time, compared with 72.4% (110/152) for male dyads (p = 0.007). In mixed-gender dyads where the introducer was female and the speaker male, formal titles were used 95.0% (57/60) of the time. Male introducers of female speakers utilized professional titles 49.2% (31/63) of the time (p < 0.001). In this study, women introduced by men at IMGR were less likely to be addressed by professional title than were men introduced by men. Differential formality in speaker introductions may amplify isolation, marginalization, and professional discomfiture expressed by women faculty in academic medicine.
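
    For the headline contrast (female introducers 96.2% [102/106] vs. male introducers 65.6% [141/215]), a pooled two-proportion z-test reproduces the reported significance. The paper's exact test is not stated in the abstract, so this is a plausible check rather than the authors' method:

      import math

      def two_prop_z(x1, n1, x2, n2):
          # Pooled two-proportion z-test.
          p1, p2 = x1 / n1, x2 / n2
          p = (x1 + x2) / (n1 + n2)                       # pooled proportion
          se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
          return (p1 - p2) / se

      # Professional-title use at first address: female vs. male introducers.
      z = two_prop_z(102, 106, 141, 215)
      print(f"z = {z:.2f}")  # ~6.0, consistent with the reported p < 0.001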

  14. Integrating industrial seminars within a graduate engineering programme

    NASA Astrophysics Data System (ADS)

    Ringwood, John. V.

    2013-05-01

    The benefit of external, often industry-based, speakers for a seminar series associated with both undergraduate and graduate programmes is relatively unchallenged. However, the means by which such a seminar series can be encapsulated within a structured learning module, and the appropriate design of an accompanying assessment methodology, is not so obvious. This paper examines how such a learning module can be formulated and addresses the main issues involved in the design of such a module, namely the selection of speakers, format of seminars, method of delivery and assessment methodology, informed by the objectives of the module.

  15. Pitch perception and production in congenital amusia: Evidence from Cantonese speakers.

    PubMed

    Liu, Fang; Chan, Alice H D; Ciocca, Valter; Roquet, Catherine; Peretz, Isabelle; Wong, Patrick C M

    2016-07-01

    This study investigated pitch perception and production in speech and music in individuals with congenital amusia (a disorder of musical pitch processing) who are native speakers of Cantonese, a tone language with a highly complex tonal system. Sixteen Cantonese-speaking congenital amusics and 16 controls performed a set of lexical tone perception, production, singing, and psychophysical pitch threshold tasks. Their tone production accuracy and singing proficiency were subsequently judged by independent listeners and subjected to acoustic analyses. Relative to controls, amusics showed impaired discrimination of lexical tones in both speech and non-speech conditions. They also received lower ratings for singing proficiency, producing larger pitch interval deviations and making more pitch interval errors compared to controls. Despite demonstrating higher pitch direction identification thresholds than controls for both speech syllables and piano tones, amusics nevertheless produced native lexical tones with pitch trajectories and intelligibility comparable to those of controls. Significant correlations were found between pitch threshold and lexical tone perception, music perception and production, but not between lexical tone perception and production for amusics. These findings provide further evidence that congenital amusia is a domain-general, language-independent pitch-processing deficit that is associated with severely impaired music perception and production, mildly impaired speech perception, and largely intact speech production.

  16. High-Resolution, Non-Invasive Imaging of Upper Vocal Tract Articulators Compatible with Human Brain Recordings

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bouchard, Kristofer E.; Conant, David F.; Anumanchipalli, Gopala K.

    A complete neurobiological understanding of speech motor control requires determination of the relationship between simultaneously recorded neural activity and the kinematics of the lips, jaw, tongue, and larynx. Many speech articulators are internal to the vocal tract, and therefore simultaneously tracking the kinematics of all articulators is nontrivial, especially in the context of human electrophysiology recordings. Here, we describe a noninvasive, multi-modal imaging system to monitor vocal tract kinematics, demonstrate this system in six speakers during production of nine American English vowels, and provide new analysis of such data. Classification and regression analysis revealed considerable variability in the articulator-to-acoustic relationship across speakers. Non-negative matrix factorization extracted basis sets capturing vocal tract shapes, allowing for higher vowel classification accuracy than traditional methods. Statistical speech synthesis generated speech from vocal tract measurements, and we demonstrate perceptual identification. We demonstrate the capacity to predict lip kinematics from ventral sensorimotor cortical activity. These results demonstrate a multi-modal system to non-invasively monitor articulator kinematics during speech production, describe novel analytic methods for relating kinematic data to speech acoustics, and provide the first decoding of speech kinematics from electrocorticography. These advances will be critical for understanding the cortical basis of speech production and the creation of vocal prosthetics.
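
    The NMF step described above factors a non-negative data matrix into a small set of additive basis "shapes" plus per-token weights. A minimal scikit-learn sketch on placeholder data; the matrix layout, dimensions, and random data are assumptions, not the paper's actual feature construction:

      import numpy as np
      from sklearn.decomposition import NMF

      # Hypothetical layout: rows = vowel tokens, columns = non-negative
      # vocal tract measurements (e.g., binned articulator positions).
      rng = np.random.default_rng(0)
      X = np.abs(rng.normal(size=(200, 40)))   # placeholder measurements

      # Factor X ~ W @ H: H holds the basis shapes, W the per-token
      # weights that could feed a downstream vowel classifier.
      model = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
      W = model.fit_transform(X)   # (200, 5) token-by-component weights
      H = model.components_        # (5, 40) component-by-feature shapes
      print(W.shape, H.shape)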

  17. Pitch perception and production in congenital amusia: Evidence from Cantonese speakers

    PubMed Central

    Liu, Fang; Chan, Alice H. D.; Ciocca, Valter; Roquet, Catherine; Peretz, Isabelle; Wong, Patrick C. M.

    2016-01-01

    This study investigated pitch perception and production in speech and music in individuals with congenital amusia (a disorder of musical pitch processing) who are native speakers of Cantonese, a tone language with a highly complex tonal system. Sixteen Cantonese-speaking congenital amusics and 16 controls performed a set of lexical tone perception, production, singing, and psychophysical pitch threshold tasks. Their tone production accuracy and singing proficiency were subsequently judged by independent listeners and subjected to acoustic analyses. Relative to controls, amusics showed impaired discrimination of lexical tones in both speech and non-speech conditions. They also received lower ratings for singing proficiency, producing larger pitch interval deviations and making more pitch interval errors compared to controls. Despite demonstrating higher pitch direction identification thresholds than controls for both speech syllables and piano tones, amusics nevertheless produced native lexical tones with pitch trajectories and intelligibility comparable to those of controls. Significant correlations were found between pitch threshold and lexical tone perception, music perception and production, but not between lexical tone perception and production for amusics. These findings provide further evidence that congenital amusia is a domain-general, language-independent pitch-processing deficit that is associated with severely impaired music perception and production, mildly impaired speech perception, and largely intact speech production. PMID:27475178

  18. Speech Processing to Improve the Perception of Speech in Background Noise for Children With Auditory Processing Disorder and Typically Developing Peers.

    PubMed

    Flanagan, Sheila; Zorilă, Tudor-Cătălin; Stylianou, Yannis; Moore, Brian C J

    2018-01-01

    Auditory processing disorder (APD) may be diagnosed when a child has listening difficulties but has normal audiometric thresholds. For adults with normal hearing and with mild-to-moderate hearing impairment, an algorithm called spectral shaping with dynamic range compression (SSDRC) has been shown to increase the intelligibility of speech when background noise is added after the processing. Here, we assessed the effect of such processing using 8 children with APD and 10 age-matched control children. The loudness of the processed and unprocessed sentences was matched using a loudness model. The task was to repeat back sentences produced by a female speaker when presented with either speech-shaped noise (SSN) or a male competing speaker (CS) at two signal-to-background ratios (SBRs). Speech identification was significantly better with SSDRC processing than without, for both groups. The benefit of SSDRC processing was greater for the SSN than for the CS background. For the SSN, scores were similar for the two groups at both SBRs. For the CS, the APD group performed significantly more poorly than the control group. The overall improvement produced by SSDRC processing could be useful for enhancing communication in a classroom where the teacher's voice is broadcast using a wireless system.
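
    SSDRC couples spectral shaping with dynamic range compression. The sketch below implements only a generic envelope-follower compressor, not the published SSDRC algorithm: the spectral-shaping stage is omitted and all parameter values are invented for illustration:

      import numpy as np

      def simple_drc(x, fs, threshold_db=-25.0, ratio=3.0, win_ms=10.0):
          # Generic dynamic range compression on a mono waveform x.
          win = max(1, int(fs * win_ms / 1000))
          # Short-term RMS envelope, converted to dB.
          env = np.sqrt(np.convolve(x**2, np.ones(win) / win, mode="same"))
          env_db = 20 * np.log10(np.maximum(env, 1e-9))
          # Above threshold, attenuate by (1 - 1/ratio) of the overshoot.
          overshoot = np.maximum(env_db - threshold_db, 0.0)
          gain_db = -overshoot * (1.0 - 1.0 / ratio)
          return x * 10 ** (gain_db / 20.0)

      # Example: compress one second of noise at 16 kHz.
      fs = 16000
      y = simple_drc(np.random.randn(fs) * 0.1, fs)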

  19. High-Resolution, Non-Invasive Imaging of Upper Vocal Tract Articulators Compatible with Human Brain Recordings

    PubMed Central

    Anumanchipalli, Gopala K.; Dichter, Benjamin; Chaisanguanthum, Kris S.; Johnson, Keith; Chang, Edward F.

    2016-01-01

    A complete neurobiological understanding of speech motor control requires determination of the relationship between simultaneously recorded neural activity and the kinematics of the lips, jaw, tongue, and larynx. Many speech articulators are internal to the vocal tract, and therefore simultaneously tracking the kinematics of all articulators is nontrivial—especially in the context of human electrophysiology recordings. Here, we describe a noninvasive, multi-modal imaging system to monitor vocal tract kinematics, demonstrate this system in six speakers during production of nine American English vowels, and provide new analysis of such data. Classification and regression analysis revealed considerable variability in the articulator-to-acoustic relationship across speakers. Non-negative matrix factorization extracted basis sets capturing vocal tract shapes allowing for higher vowel classification accuracy than traditional methods. Statistical speech synthesis generated speech from vocal tract measurements, and we demonstrate perceptual identification. We demonstrate the capacity to predict lip kinematics from ventral sensorimotor cortical activity. These results demonstrate a multi-modal system to non-invasively monitor articulator kinematics during speech production, describe novel analytic methods for relating kinematic data to speech acoustics, and provide the first decoding of speech kinematics from electrocorticography. These advances will be critical for understanding the cortical basis of speech production and the creation of vocal prosthetics. PMID:27019106

  20. High-Resolution, Non-Invasive Imaging of Upper Vocal Tract Articulators Compatible with Human Brain Recordings

    DOE PAGES

    Bouchard, Kristofer E.; Conant, David F.; Anumanchipalli, Gopala K.; ...

    2016-03-28

    A complete neurobiological understanding of speech motor control requires determination of the relationship between simultaneously recorded neural activity and the kinematics of the lips, jaw, tongue, and larynx. Many speech articulators are internal to the vocal tract, and therefore simultaneously tracking the kinematics of all articulators is nontrivial, especially in the context of human electrophysiology recordings. Here, we describe a noninvasive, multi-modal imaging system to monitor vocal tract kinematics, demonstrate this system in six speakers during production of nine American English vowels, and provide new analysis of such data. Classification and regression analysis revealed considerable variability in the articulator-to-acoustic relationship across speakers. Non-negative matrix factorization extracted basis sets capturing vocal tract shapes, allowing for higher vowel classification accuracy than traditional methods. Statistical speech synthesis generated speech from vocal tract measurements, and we demonstrate perceptual identification. We demonstrate the capacity to predict lip kinematics from ventral sensorimotor cortical activity. These results demonstrate a multi-modal system to non-invasively monitor articulator kinematics during speech production, describe novel analytic methods for relating kinematic data to speech acoustics, and provide the first decoding of speech kinematics from electrocorticography. These advances will be critical for understanding the cortical basis of speech production and the creation of vocal prosthetics.

  1. An acoustic comparison of two women's infant- and adult-directed speech

    NASA Astrophysics Data System (ADS)

    Andruski, Jean; Katz-Gershon, Shiri

    2003-04-01

    In addition to having prosodic characteristics that are attractive to infant listeners, infant-directed (ID) speech shares certain characteristics of adult-directed (AD) clear speech, such as increased acoustic distance between vowels, that might be expected to make ID speech easier for adults to perceive in noise than AD conversational speech. However, perceptual tests of two women's ID productions by Andruski and Bessega [J. Acoust. Soc. Am. 112, 2355] showed that this is not always the case. In a word identification task that compared ID speech with AD clear and conversational speech, one speaker's ID productions were less well identified than AD clear speech, but better identified than AD conversational speech. For the second woman, ID speech was the least accurately identified of the three speech registers. For both speakers, hard words (infrequent words with many lexical neighbors) were also at an increased disadvantage relative to easy words (frequent words with few lexical neighbors) in speech registers that were less accurately perceived. This study will compare several acoustic properties of these women's productions, including pitch and formant-frequency characteristics. Results of the acoustic analyses will be examined together with the original perceptual results to suggest reasons for differences in listeners' accuracy in identifying these two women's ID speech in noise.

  2. Crossmodal plasticity in the fusiform gyrus of late blind individuals during voice recognition.

    PubMed

    Hölig, Cordula; Föcker, Julia; Best, Anna; Röder, Brigitte; Büchel, Christian

    2014-12-01

    Blind individuals are trained in identifying other people through voices. In congenitally blind adults the anterior fusiform gyrus has been shown to be active during voice recognition. Such crossmodal changes have been associated with a superiority of blind adults in voice perception. The key question of the present functional magnetic resonance imaging (fMRI) study was whether visual deprivation that occurs in adulthood is followed by similar adaptive changes of the voice identification system. Late blind individuals and matched sighted participants were tested in a priming paradigm, in which two voice stimuli were subsequently presented. The prime (S1) and the target (S2) were either from the same speaker (person-congruent voices) or from two different speakers (person-incongruent voices). Participants had to classify the S2 as either coming from an old or a young person. Only in late blind but not in matched sighted controls, the activation in the anterior fusiform gyrus was modulated by voice identity: late blind volunteers showed an increase of the BOLD signal in response to person-incongruent compared with person-congruent trials. These results suggest that the fusiform gyrus adapts to input of a new modality even in the mature brain and thus demonstrate an adult type of crossmodal plasticity. Copyright © 2014 Elsevier Inc. All rights reserved.

  3. Brain systems mediating voice identity processing in blind humans.

    PubMed

    Hölig, Cordula; Föcker, Julia; Best, Anna; Röder, Brigitte; Büchel, Christian

    2014-09-01

    Blind people rely more on vocal cues when they recognize a person's identity than sighted people. Indeed, a number of studies have reported better voice recognition skills in blind than in sighted adults. The present functional magnetic resonance imaging study investigated changes in the functional organization of neural systems involved in voice identity processing following congenital blindness. A group of congenitally blind individuals and matched sighted control participants were tested in a priming paradigm, in which two voice stimuli (S1, S2) were subsequently presented. The prime (S1) and the target (S2) were either from the same speaker (person-congruent voices) or from two different speakers (person-incongruent voices). Participants had to classify the S2 as either an old or a young person. Person-incongruent voices (S2) compared with person-congruent voices elicited an increased activation in the right anterior fusiform gyrus in congenitally blind individuals but not in matched sighted control participants. In contrast, only matched sighted controls showed a higher activation in response to person-incongruent compared with person-congruent voices (S2) in the right posterior superior temporal sulcus. These results provide evidence for crossmodal plastic changes of the person identification system in the brain after visual deprivation. Copyright © 2014 Wiley Periodicals, Inc.

  4. Request a Speaker

    Science.gov Websites

    Northern Command Speakers Program: The U.S. Northern Command Speaker's Program works to increase face-to-face contact with our public to help build and sustain public understanding of our command missions and…

  5. Speakers of Different Languages Process the Visual World Differently

    PubMed Central

    Chabal, Sarah; Marian, Viorica

    2015-01-01

    Language and vision are highly interactive. Here we show that people activate language when they perceive the visual world, and that this language information impacts how speakers of different languages focus their attention. For example, when searching for an item (e.g., clock) in the same visual display, English and Spanish speakers look at different objects. Whereas English speakers searching for the clock also look at a cloud, Spanish speakers searching for the clock also look at a gift, because the Spanish names for gift (regalo) and clock (reloj) overlap phonologically. These different looking patterns emerge despite an absence of direct linguistic input, showing that language is automatically activated by visual scene processing. We conclude that the varying linguistic information available to speakers of different languages affects visual perception, leading to differences in how the visual world is processed. PMID:26030171

  6. Learning foreign labels from a foreign speaker: the role of (limited) exposure to a second language.

    PubMed

    Akhtar, Nameera; Menjivar, Jennifer; Hoicka, Elena; Sabbagh, Mark A

    2012-11-01

    Three- and four-year-olds (N = 144) were introduced to novel labels by an English speaker and a foreign speaker (of Nordish, a made-up language), and were asked to endorse one of the speakers' labels. Monolingual English-speaking children were compared to bilingual children and English-speaking children who were regularly exposed to a language other than English. All children tended to endorse the English speaker's labels when asked 'What do you call this?', but when asked 'What do you call this in Nordish?', children with exposure to a second language were more likely to endorse the foreign label than monolingual and bilingual children. The findings suggest that, at this age, exposure to, but not necessarily immersion in, more than one language may promote the ability to learn foreign words from a foreign speaker.

  7. Surmounting the Tower of Babel: Monolingual and bilingual 2-year-olds' understanding of the nature of foreign language words.

    PubMed

    Byers-Heinlein, Krista; Chen, Ke Heng; Xu, Fei

    2014-03-01

    Languages function as independent and distinct conventional systems, and so each language uses different words to label the same objects. This study investigated whether 2-year-old children recognize that speakers of their native language and speakers of a foreign language do not share the same knowledge. Two groups of children unfamiliar with Mandarin were tested: monolingual English-learning children (n=24) and bilingual children learning English and another language (n=24). An English speaker taught children the novel label fep. On English mutual exclusivity trials, the speaker asked for the referent of a novel label (wug) in the presence of the fep and a novel object. Both monolingual and bilingual children disambiguated the reference of the novel word using a mutual exclusivity strategy, choosing the novel object rather than the fep. On similar trials with a Mandarin speaker, children were asked to find the referent of a novel Mandarin label kuò. Monolinguals again chose the novel object rather than the object with the English label fep, even though the Mandarin speaker had no access to conventional English words. Bilinguals did not respond systematically to the Mandarin speaker, suggesting that they had an enhanced understanding of the Mandarin speaker's ignorance of English words. The results indicate that monolingual children initially expect words to be conventionally shared across all speakers, native and foreign. Early bilingual experience facilitates children's discovery of the nature of foreign language words. Copyright © 2013 Elsevier Inc. All rights reserved.

  8. Content-specific coordination of listeners' to speakers' EEG during communication.

    PubMed

    Kuhlen, Anna K; Allefeld, Carsten; Haynes, John-Dylan

    2012-01-01

    Cognitive neuroscience has recently begun to extend its focus from the isolated individual mind to two or more individuals coordinating with each other. In this study we uncover a coordination of neural activity between the ongoing electroencephalogram (EEG) of two people: a person speaking and a person listening. The EEG of one set of twelve participants ("speakers") was recorded while they were narrating short stories. The EEG of another set of twelve participants ("listeners") was recorded while watching audiovisual recordings of these stories. Specifically, listeners watched the superimposed videos of two speakers simultaneously and were instructed to attend either to one or the other speaker. This allowed us to isolate neural coordination due to processing the communicated content from the effects of sensory input. We find several neural signatures of communication: First, the EEG is more similar among listeners attending to the same speaker than among listeners attending to different speakers, indicating that listeners' EEG reflects content-specific information. Secondly, listeners' EEG activity correlates with the attended speakers' EEG, peaking at a time delay of about 12.5 s. This correlation takes place not only between homologous, but also between non-homologous brain areas in speakers and listeners. A semantic analysis of the stories suggests that listeners coordinate with speakers at the level of complex semantic representations, so-called "situation models". With this study we link a coordination of neural activity between individuals directly to verbally communicated information.

  9. A Cross-Language Study of Acoustic Predictors of Speech Intelligibility in Individuals With Parkinson's Disease

    PubMed Central

    Choi, Yaelin

    2017-01-01

    Purpose The present study aimed to compare acoustic models of speech intelligibility in individuals with the same disease (Parkinson's disease [PD]) and presumably similar underlying neuropathologies but with different native languages (American English [AE] and Korean). Method A total of 48 speakers from the 4 speaker groups (AE speakers with PD, Korean speakers with PD, healthy English speakers, and healthy Korean speakers) were asked to read a paragraph in their native languages. Four acoustic variables were analyzed: acoustic vowel space, voice onset time contrast scores, normalized pairwise variability index, and articulation rate. Speech intelligibility scores were obtained from scaled estimates of sentences extracted from the paragraph. Results The findings indicated that the multiple regression models of speech intelligibility were different in Korean and AE, even with the same set of predictor variables and with speakers matched on speech intelligibility across languages. Analysis of the descriptive data for the acoustic variables showed the expected compression of the vowel space in speakers with PD in both languages, lower normalized pairwise variability index scores in Korean compared with AE, and no differences within or across language in articulation rate. Conclusions The results indicate that the basis of an intelligibility deficit in dysarthria is likely to depend on the native language of the speaker and listener. Additional research is required to explore other potential predictor variables, as well as additional language comparisons to pursue cross-linguistic considerations in classification and diagnosis of dysarthria types. PMID:28821018
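
    Of the four acoustic predictors, the normalized pairwise variability index has a simple closed form: nPVI = (100 / (m - 1)) * sum_k |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2) over m successive interval durations. A minimal sketch with invented duration values:

      def npvi(durations):
          # Normalized pairwise variability index over successive intervals
          # (e.g., vowel durations in seconds).
          pairs = zip(durations[:-1], durations[1:])
          terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
          return 100 * sum(terms) / len(terms)

      # Alternating long/short vowels score higher than near-equal ones.
      print(npvi([0.12, 0.06, 0.14, 0.05]))  # ~80: high variability
      print(npvi([0.10, 0.09, 0.10, 0.11]))  # ~10: low variability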

  10. A Cognitively Grounded Measure of Pronunciation Distance

    PubMed Central

    Wieling, Martijn; Nerbonne, John; Bloem, Jelke; Gooskens, Charlotte; Heeringa, Wilbert; Baayen, R. Harald

    2014-01-01

    In this study we develop pronunciation distances based on naive discriminative learning (NDL). Measures of pronunciation distance are used in several subfields of linguistics, including psycholinguistics, dialectology and typology. In contrast to the commonly used Levenshtein algorithm, NDL is grounded in cognitive theory of competitive reinforcement learning and is able to generate asymmetrical pronunciation distances. In a first study, we validated the NDL-based pronunciation distances by comparing them to a large set of native-likeness ratings given by native American English speakers when presented with accented English speech. In a second study, the NDL-based pronunciation distances were validated on the basis of perceptual dialect distances of Norwegian speakers. Results indicated that the NDL-based pronunciation distances matched perceptual distances reasonably well with correlations ranging between 0.7 and 0.8. While the correlations were comparable to those obtained using the Levenshtein distance, the NDL-based approach is more flexible as it is also able to incorporate acoustic information other than sound segments. PMID:24416119
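
    For contrast with the NDL-based measure, the Levenshtein baseline mentioned above is a few lines of dynamic programming; note that it is symmetric by construction, which is precisely the property the NDL distances relax. The phone strings are invented toy examples:

      def levenshtein(a, b):
          # Classic edit distance between two symbol strings.
          prev = list(range(len(b) + 1))
          for i, ca in enumerate(a, 1):
              curr = [i]
              for j, cb in enumerate(b, 1):
                  curr.append(min(prev[j] + 1,                # deletion
                                  curr[j - 1] + 1,            # insertion
                                  prev[j - 1] + (ca != cb)))  # substitution
              prev = curr
          return prev[-1]

      # Two rough phone-string renderings of "twenty".
      print(levenshtein("twEnti", "tVEnti"))  # 1
      print(levenshtein("twEnti", "twEni"))   # 1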

  11. The ICSI+ Multilingual Sentence Segmentation System

    DTIC Science & Technology

    2006-01-01

    …these steps the ASR output needs to be enriched with information additional to words, such as speaker diarization, sentence segmentation, or story… and the output of a speaker diarization system is considered as well. We first detail extraction of the prosodic features, and then describe the classification… also takes into account the speaker turns estimated by the diarization system. In addition to the Max-… model speaker turn unigrams, trigram…

  12. Speaker Segmentation and Clustering Using Gender Information

    DTIC Science & Technology

    2006-02-01

    …used in the first stages of segmentation… gender information in the clustering of the opposite-gender… files for speaker diarization of news broadcasts… (Brian M. Ore, General Dynamics; AFRL-HE-WP-TP-2006-0026, Air Force Research Laboratory; proceedings, February 2006)

  13. The 2016 NIST Speaker Recognition Evaluation

    DTIC Science & Technology

    2017-08-20

    The 2016 NIST Speaker Recognition Evaluation. Seyed Omid Sadjadi, Timothée Kheyrkhah, Audrey Tong, Craig Greenberg, Douglas Reynolds, Elliot… the most recent in an ongoing series of speaker recognition evaluations (SRE) to foster research in robust text-independent speaker recognition, as well as… online evaluation platform, a fixed training data condition, more variability in test segment duration (uniformly distributed between 10 s and 60 s)…

  14. Magnetic Fluids Deliver Better Speaker Sound Quality

    NASA Technical Reports Server (NTRS)

    2015-01-01

    In the 1960s, Glenn Research Center developed a magnetized fluid to draw rocket fuel into spacecraft engines while in space. Sony has incorporated the technology into its line of slim speakers by using the fluid as a liquid stand-in for the speaker's dampers, which prevent the speaker from blowing out while adding stability. The fluid helps to deliver more volume and hi-fidelity sound while reducing distortion.

  15. Coronal View Ultrasound Imaging of Movement in Different Segments of the Tongue during Paced Recital: Findings from Four Normal Speakers and a Speaker with Partial Glossectomy

    ERIC Educational Resources Information Center

    Bressmann, Tim; Flowers, Heather; Wong, Willy; Irish, Jonathan C.

    2010-01-01

    The goal of this study was to quantitatively describe aspects of coronal tongue movement in different anatomical regions of the tongue. Four normal speakers and a speaker with partial glossectomy read four repetitions of a metronome-paced poem. Their tongue movement was recorded in four coronal planes using two-dimensional B-mode ultrasound…

  16. Making Math Real: Effective Qualities of Guest Speaker Presentations and the Impact of Speakers on Student Attitude and Achievement in the Algebra Classroom

    ERIC Educational Resources Information Center

    McKain, Danielle R.

    2012-01-01

    The term real world is often used in mathematics education, yet the definition of real-world problems and how to incorporate them in the classroom remains ambiguous. One way real-world connections can be made is through guest speakers. Guest speakers can offer different perspectives and share knowledge about various subject areas, yet the impact…

  17. When Pitch Accents Encode Speaker Commitment: Evidence from French Intonation.

    PubMed

    Michelas, Amandine; Portes, Cristel; Champagne-Lavau, Maud

    2016-06-01

    Recent studies on a variety of languages have shown that a speaker's commitment to the propositional content of his or her utterance can be encoded, among other strategies, by pitch accent types. Since prior research mainly relied on lexical-stress languages, our understanding of how speakers of a non-lexical-stress language encode speaker commitment is limited. This paper explores the contribution of the last pitch accent of an intonation phrase to convey speaker commitment in French, a language that has stress at the phrasal level as well as a restricted set of pitch accents. In a production experiment, participants had to produce sentences in two pragmatic contexts: unbiased questions (the speaker had no particular belief with respect to the expected answer) and negatively biased questions (the speaker believed the proposition to be false). Results revealed that negatively biased questions consistently exhibited an additional unaccented F0 peak in the preaccentual syllable (an H+!H* pitch accent) while unbiased questions were often realized with a rising pattern across the accented syllable (an H* pitch accent). These results provide evidence that pitch accent types in French can signal the speaker's belief about the certainty of the proposition expressed in French. It also has implications for the phonological model of French intonation.

  18. Challenging stereotypes and changing attitudes: Improving quality of care for people with hepatitis C through Positive Speakers programs.

    PubMed

    Brener, Loren; Wilson, Hannah; Rose, Grenville; Mackenzie, Althea; de Wit, John

    2013-01-01

    Positive Speakers programs consist of people who are trained to speak publicly about their illness. The focus of these programs, especially with stigmatised illnesses such as hepatitis C (HCV), is to inform others of the speakers' experiences, thereby humanising the illness and reducing ignorance associated with the disease. This qualitative research aimed to understand the perceived impact of Positive Speakers programs on changing audience members' attitudes towards people with HCV. Interviews were conducted with nine Positive Speakers and 16 of their audience members to assess the way in which these sessions were perceived by both speakers and the audience to challenge stereotypes and stigma associated with HCV and promote positive attitude change amongst the audience. Data were analysed using Intergroup Contact Theory to frame the analysis with a focus on whether the program met the optimal conditions to promote attitude change. Findings suggest that there are a number of vital components to this Positive Speakers program which ensures that the program meets the requirements for successful and equitable intergroup contact. This Positive Speakers program thereby helps to deconstruct stereotypes about people with HCV, while simultaneously increasing positive attitudes among audience members with the ultimate aim of improving quality of health care and treatment for people with HCV.

  19. Aeroacoustic Characterization of the NASA Ames Experimental Aero-Physics Branch 32- by 48-Inch Subsonic Wind Tunnel with a 24-Element Phased Microphone Array

    NASA Technical Reports Server (NTRS)

    Costanza, Bryan T.; Horne, William C.; Schery, S. D.; Babb, Alex T.

    2011-01-01

    The Aero-Physics Branch at NASA Ames Research Center utilizes a 32- by 48-inch subsonic wind tunnel for aerodynamics research. The feasibility of acquiring acoustic measurements with a phased microphone array was recently explored. Acoustic characterization of the wind tunnel was carried out with a floor-mounted 24-element array and two ceiling-mounted speakers. The minimum speaker level for accurate level measurement was evaluated for various tunnel speeds up to a Mach number of 0.15 and streamwise speaker locations. A variety of post-processing procedures, including conventional beamforming and deconvolutional processing such as TIDY, were used. The speaker measurements, with and without flow, were used to compare actual versus simulated in-flow speaker calibrations. Data for wind-off speaker sound and wind-on tunnel background noise were found valuable for predicting sound levels for which the speakers were detectable when the wind was on. Speaker sources were detectable 2 - 10 dB below the peak background noise level with conventional data processing. The effectiveness of background noise cross-spectral matrix subtraction was assessed and found to improve the detectability of test sound sources by approximately 10 dB over a wide frequency range.
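
    A minimal sketch (not the authors' processing code) of the background-noise cross-spectral-matrix subtraction step combined with a conventional beamformer, for a single frequency bin; csm_meas and csm_bg stand in for M x M cross-spectral matrices estimated from speaker-plus-wind and wind-only runs:

      import numpy as np

      def steering_vector(mic_xyz, focus_xyz, freq, c=343.0):
          """Free-field steering vector from a focus point to M microphones."""
          r = np.linalg.norm(mic_xyz - focus_xyz, axis=1)  # distances in m
          return np.exp(-2j * np.pi * freq * r / c) / r

      def beamform_power(csm_meas, csm_bg, v):
          """Conventional beamformer output after background CSM subtraction."""
          csm = csm_meas - csm_bg            # remove wind-tunnel background
          w = v / np.linalg.norm(v)          # normalized steering weights
          return np.real(w.conj().T @ csm @ w)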

  20. Engaging spaces: Intimate electro-acoustic display in alternative performance venues

    NASA Astrophysics Data System (ADS)

    Bahn, Curtis; Moore, Stephan

    2004-05-01

    In past presentations to the ASA, we have described the design and construction of four generations of unique spherical speakers (multichannel, outward-radiating geodesic speaker arrays) and Sensor-Speaker-Arrays (SenSAs: combinations of various sensor devices with outward-radiating multichannel speaker arrays). This presentation will detail the ways in which arrays of these speakers have been employed in alternative performance venues, providing presence and intimacy in the performance of electro-acoustic chamber music and sound installation, while engaging natural and unique acoustical qualities of various locations. We will present documentation of the use of multichannel sonic diffusion arrays in small clubs, "black-box" theaters, planetariums, and art galleries.

  1. Intonation and gender perception: applications for transgender speakers.

    PubMed

    Hancock, Adrienne; Colton, Lindsey; Douglas, Fiacre

    2014-03-01

    Intonation is commonly addressed in voice and communication feminization therapy, yet empirical evidence of gender differences for intonation is scarce and rarely do studies examine how it relates to gender perception of transgender speakers. This study examined intonation of 12 males, 12 females, six female-to-male, and 14 male-to-female transgender speakers describing a Norman Rockwell image. Several intonation measures were compared between biological gender groups, between perceived gender groups, and between male-to-female (MTF) speakers who were perceived as male, female, or ambiguous gender. Speakers with a larger percentage of utterances with upward intonation and a larger utterance semitone range were perceived as female by listeners, despite no significant differences between the actual intonation of the four gender groups. MTF speakers who do not pass as female appear to use less upward and more downward intonations than female and passing MTF speakers. Intonation has potential for use in transgender communication therapy because it can influence perception to some degree. Copyright © 2014 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
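
    A minimal sketch of two of the intonation measures used above, computed from an utterance's F0 contour in Hz over voiced frames (function names, the final-stretch fraction, and the example contour are illustrative):

      import numpy as np

      def semitone_range(f0_hz):
          """Utterance pitch range in semitones: 12 * log2(F0max / F0min)."""
          f0 = np.asarray(f0_hz, dtype=float)
          return 12.0 * np.log2(f0.max() / f0.min())

      def ends_upward(f0_hz, final_fraction=0.2):
          """Crude upward-intonation check: does F0 rise across the
          final stretch of the utterance?"""
          f0 = np.asarray(f0_hz, dtype=float)
          n = max(2, int(len(f0) * final_fraction))
          return f0[-1] > f0[-n]

      contour = [180, 185, 178, 190, 210, 230]   # Hz, hypothetical utterance
      print(round(semitone_range(contour), 1))   # ~4.4 semitones
      print(ends_upward(contour))                # True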

  2. What a speaker's choice of frame reveals: reference points, frame selection, and framing effects.

    PubMed

    McKenzie, Craig R M; Nelson, Jonathan D

    2003-09-01

    Framing effects are well established: Listeners' preferences depend on how outcomes are described to them, or framed. Less well understood is what determines how speakers choose frames. Two experiments revealed that reference points systematically influenced speakers' choices between logically equivalent frames. For example, speakers tended to describe a 4-ounce cup filled to the 2-ounce line as half full if it was previously empty but described it as half empty if it was previously full. Similar results were found when speakers could describe the outcome of a medical treatment in terms of either mortality or survival (e.g., 25% die vs. 75% survive). Two additional experiments showed that listeners made accurate inferences about speakers' reference points on the basis of the selected frame (e.g., if a speaker described a cup as half empty, listeners inferred that the cup used to be full). Taken together, the data suggest that frames reliably convey implicit information in addition to their explicit content, which helps explain why framing effects are so robust.

  3. The association between tobacco, alcohol, and drug use, stress, and depression among uninsured free clinic patients: U.S.-born English speakers, non-U.S.-born English speakers, and Spanish speakers.

    PubMed

    Kamimura, Akiko; Ashby, Jeanie; Tabler, Jennifer; Nourian, Maziar M; Trinh, Ha Ngoc; Chen, Jason; Reel, Justine J

    2017-01-01

    The abuse of substances is a significant public health issue. Perceived stress and depression have been found to be related to the abuse of substances. The purpose of this study is to examine the prevalence of substance use (i.e., alcohol problems, smoking, and drug use) and the association between substance use, perceived stress, and depression among free clinic patients. Patients completed a self-administered survey in 2015 (N = 504). The overall prevalence of substance use among free clinic patients was not high compared to the U.S. general population. U.S.-born English speakers reported a higher prevalence rate of tobacco smoking and drug use than did non-U.S.-born English speakers and Spanish speakers. Alcohol problems and smoking were significantly related to higher levels of perceived stress and depression. Substance use prevention and education should be included in general health education programs. U.S.-born English speakers would need additional attention. Mental health intervention would be essential to prevention and intervention.

  4. Cerebral bases of subliminal speech priming.

    PubMed

    Kouider, Sid; de Gardelle, Vincent; Dehaene, Stanislas; Dupoux, Emmanuel; Pallier, Christophe

    2010-01-01

    While the neural correlates of unconscious perception and subliminal priming have been largely studied for visual stimuli, little is known about their counterparts in the auditory modality. Here we used a subliminal speech priming method in combination with fMRI to investigate which regions of the cerebral network for language can respond in the absence of awareness. Participants performed a lexical decision task on target items preceded by subliminal primes, which were either phonetically identical or different from the target. Moreover, the prime and target could be spoken by the same speaker or by two different speakers. Word repetition reduced the activity in the insula and in the left superior temporal gyrus. Although the priming effect on reaction times was independent of voice manipulation, neural repetition suppression was modulated by speaker change in the superior temporal gyrus while the insula showed voice-independent priming. These results provide neuroimaging evidence of subliminal priming for spoken words and inform us on the first, unconscious stages of speech perception.

  5. Speaker box made of composite particle board based on mushroom growing media waste

    NASA Astrophysics Data System (ADS)

    Tjahjanti, P. H.; Sutarman, Widodo, E.; Kurniawan, A. R.; Winarno, A. T.; Yani, A.

    2017-06-01

    This research aimed to use mushroom growing media waste (MGMW), combined with urea, starch, and polyvinyl chloride (PVC) glue, as a composite particle board for the manufacture of speaker boxes. Physical and mechanical testing of the particle board, including density, moisture content, thickness swelling after immersion in water, water absorption, internal bonding, modulus of elasticity, modulus of rupture, and screw holding power, was carried out in accordance with Standar Nasional Indonesia (SNI) 03-2105-2006 and Japanese Industrial Standard (JIS) A 5908-2003. The optimum composition of the composite particle board was 60% MGMW + 39% (50% urea + 50% starch) + 1% PVC glue. A speaker box built at this optimum composition had a hardness of 14.9 (Brinell hardness number), and vibration testing yielded Z-axis amplitudes between 0.032007 (minimum) and 0.151575 (maximum). Acoustic testing showed a good sound absorption coefficient at 500 Hz and better damping absorption.

  6. Interface strategies in monolingual and end-state L2 Spanish grammars are not that different.

    PubMed

    Parafita Couto, María C; Mueller Gathercole, Virginia C; Stadthagen-González, Hans

    2014-01-01

    This study explores syntactic, pragmatic, and lexical influences on adherence to SV and VS orders in native and fluent L2 speakers of Spanish. A judgment task examined 20 native monolingual and 20 longstanding L2 bilingual Spanish speakers' acceptance of SV and VS structures. Seventy-six distinct verbs were tested under a combination of syntactic and pragmatic constraints. Our findings challenge the hypothesis that internal interfaces are acquired more easily than external interfaces (Sorace, 2005, 2011; Sorace and Filiaci, 2006; White, 2006). Additional findings are that (a) bilinguals' judgments are less firm overall than monolinguals' (i.e., monolinguals are more likely to give extreme "yes" or "no" judgments) and (b) individual verbs do not necessarily behave as predicted under standard definitions of unaccusatives and unergatives. Correlations of the patterns found in the data with verb frequencies suggest that usage-based accounts of grammatical knowledge could help provide insight into speakers' knowledge of these constructs.

  7. Electrophysiological evidence for a general auditory prediction deficit in adults who stutter

    PubMed Central

    Daliri, Ayoub; Max, Ludo

    2015-01-01

    We previously found that stuttering individuals do not show the typical auditory modulation observed during speech planning in nonstuttering individuals. In this follow-up study, we further elucidate this difference by investigating whether stuttering speakers’ atypical auditory modulation is observed only when sensory predictions are based on movement planning or also when predictable auditory input is not a consequence of one’s own actions. We recorded 10 stuttering and 10 nonstuttering adults’ auditory evoked potentials in response to random probe tones delivered while anticipating either speaking aloud or hearing one’s own speech played back and in a control condition without auditory input (besides probe tones). N1 amplitude of nonstuttering speakers was reduced prior to both speaking and hearing versus the control condition. Stuttering speakers, however, showed no N1 amplitude reduction in either the speaking or hearing condition as compared with control. Thus, findings suggest that stuttering speakers have general auditory prediction difficulties. PMID:26335995

  8. Self-, other-, and joint monitoring using forward models.

    PubMed

    Pickering, Martin J; Garrod, Simon

    2014-01-01

    In the psychology of language, most accounts of self-monitoring assume that it is based on comprehension. Here we outline and develop the alternative account proposed by Pickering and Garrod (2013), in which speakers construct forward models of their upcoming utterances and compare them with the utterance as they produce them. We propose that speakers compute inverse models derived from the discrepancy (error) between the utterance and the predicted utterance and use that to modify their production command or (occasionally) begin anew. We then propose that comprehenders monitor other people's speech by simulating their utterances using covert imitation and forward models, and then comparing those forward models with what they hear. They use the discrepancy to compute inverse models and modify their representation of the speaker's production command, or realize that their representation is incorrect and may develop a new production command. We then discuss monitoring in dialogue, paying attention to sequential contributions, concurrent feedback, and the relationship between monitoring and alignment.

  9. Tracking Multiple Statistics: Simultaneous Learning of Object Names and Categories in English and Mandarin Speakers.

    PubMed

    Chen, Chi-Hsin; Gershkoff-Stowe, Lisa; Wu, Chih-Yi; Cheung, Hintat; Yu, Chen

    2017-08-01

    Two experiments were conducted to examine adult learners' ability to extract multiple statistics in simultaneously presented visual and auditory input. Experiment 1 used a cross-situational learning paradigm to test whether English speakers were able to use co-occurrences to learn word-to-object mappings and concurrently form object categories based on the commonalities across training stimuli. Experiment 2 replicated the first experiment and further examined whether speakers of Mandarin, a language in which final syllables of object names are more predictive of category membership than English, were able to learn words and form object categories when trained with the same type of structures. The results indicate that both groups of learners successfully extracted multiple levels of co-occurrence and used them to learn words and object categories simultaneously. However, marked individual differences in performance were also found, suggesting possible interference and competition in processing the two concurrent streams of regularities. Copyright © 2016 Cognitive Science Society, Inc.

  10. EEG-based auditory attention decoding using unprocessed binaural signals in reverberant and noisy conditions

    PubMed

    Aroudi, Ali; Doclo, Simon

    2017-07-01

    To decode auditory attention from single-trial EEG recordings in an acoustic scenario with two competing speakers, a least-squares method has recently been proposed. This method however requires the clean speech signals of both the attended and the unattended speaker to be available as reference signals. Since in practice only the binaural signals consisting of a reverberant mixture of both speakers and background noise are available, in this paper we explore the potential of using these (unprocessed) signals as reference signals for decoding auditory attention in different acoustic conditions (anechoic, reverberant, noisy, and reverberant-noisy). In addition, we investigate whether it is possible to use these signals instead of the clean attended speech signal for filter training. The experimental results show that using the unprocessed binaural signals for filter training and for decoding auditory attention is feasible with relatively high decoding accuracy, although for most acoustic conditions the decoding performance is significantly lower than when using the clean speech signals.
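
    A minimal sketch of the least-squares (backward-model) decoding approach referred to above: train a ridge-regularized linear decoder that reconstructs a reference speech envelope from time-lagged EEG, then attribute attention to whichever speaker's reference envelope correlates best with the reconstruction. Shapes, lag count, and the ridge constant are illustrative assumptions:

      import numpy as np

      def lag_matrix(eeg, n_lags):
          """Stack 0..n_lags-1 sample delays of each channel: (T, C*n_lags)."""
          T, C = eeg.shape
          X = np.zeros((T, C * n_lags))
          for k in range(n_lags):
              X[k:, k * C:(k + 1) * C] = eeg[: T - k]
          return X

      def train_decoder(eeg, ref_envelope, n_lags=32, ridge=1e3):
          """Regularized least squares: w = (X'X + rI)^-1 X'y."""
          X = lag_matrix(eeg, n_lags)
          XtX = X.T @ X + ridge * np.eye(X.shape[1])
          return np.linalg.solve(XtX, X.T @ ref_envelope)

      def decode_attention(eeg, env_a, env_b, w, n_lags=32):
          """Pick the speaker whose envelope best matches the reconstruction."""
          rec = lag_matrix(eeg, n_lags) @ w
          r_a = np.corrcoef(rec, env_a)[0, 1]
          r_b = np.corrcoef(rec, env_b)[0, 1]
          return "A" if r_a > r_b else "B"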

  11. The Emergence of Systematic Review in Toxicology

    PubMed Central

    Stephens, Martin L.; Betts, Kellyn; Beck, Nancy B.; Cogliano, Vincent; Dickersin, Kay; Fitzpatrick, Suzanne; Freeman, James; Gray, George; Hartung, Thomas; McPartland, Jennifer; Rooney, Andrew A.; Scherer, Roberta W.; Verloo, Didier; Hoffmann, Sebastian

    2016-01-01

    The Evidence-based Toxicology Collaboration hosted a workshop on “The Emergence of Systematic Review and Related Evidence-based Approaches in Toxicology,” on November 21, 2014 in Baltimore, Maryland. The workshop featured speakers from agencies and organizations applying systematic review approaches to questions in toxicology, speakers with experience in conducting systematic reviews in medicine and healthcare, and stakeholders in industry, government, academia, and non-governmental organizations. Based on the workshop presentations and discussion, here we address the state of systematic review methods in toxicology, historical antecedents in both medicine and toxicology, challenges to the translation of systematic review from medicine to toxicology, and thoughts on the way forward. We conclude with a recommendation that as various agencies and organizations adapt systematic review methods, they continue to work together to ensure that there is a harmonized process for how the basic elements of systematic review methods are applied in toxicology. PMID:27208075

  12. Advanced Therapy Medicinal Products: How to Bring Cell-Based Medicinal Products Successfully to the Market - Report from the CAT-DGTI-GSCN Workshop at the DGTI Annual Meeting 2014.

    PubMed

    Celis, Patrick; Ferry, Nicolas; Hystad, Marit; Schüßler-Lenz, Martina; Doevendans, Pieter A; Flory, Egbert; Beuneu, Claire; Reischl, Ilona; Salmikangas, Paula

    2015-05-01

    On September 11, 2014, a workshop entitled 'Advanced Therapy Medicinal Products: How to Bring Cell-Based Medicinal Product Successfully to the Market' was held at the 47th annual meeting of the German Society for Transfusion Medicine and Immunohematology (DGTI), co-organised by the European Medicines Agency (EMA) and the DGTI in collaboration with the German Stem Cell Network (GSCN). The workshop brought together over 160 participants from academia, hospitals, small- or medium-sized enterprise developers and regulators. At the workshop, speakers from EMA, the Committee for Advanced Therapies (CAT), industry and academia addressed the regulatory aspects of development and authorisation of advanced therapy medicinal products (ATMPs), classification of ATMPs and considerations on cell-based therapies for cardiac repair. The open forum discussion session allowed for a direct interaction between ATMP developers and the speakers from EMA and CAT.

  13. Real-time feedback control of three-dimensional Tollmien-Schlichting waves using a dual-slot actuator geometry

    NASA Astrophysics Data System (ADS)

    Vemuri, SH. S.; Bosworth, R.; Morrison, J. F.; Kerrigan, E. C.

    2018-05-01

    The growth of Tollmien-Schlichting (TS) waves is experimentally attenuated using a single-input and single-output (SISO) feedback system, where the TS wave packet is generated by a surface point source in a flat-plate boundary layer. The SISO system consists of a single wall-mounted hot wire as the sensor and a miniature speaker as the actuator. The actuation is achieved through a dual-slot geometry to minimize the cavity near-field effects on the sensor. The experimental setup to generate TS waves or wave packets is very similar to that used by Li and Gaster [J. Fluid Mech. 550, 185 (2006), 10.1017/S0022112005008219]. The aim is to investigate the performance of the SISO control system in attenuating single-frequency, two-dimensional disturbances generated by these configurations. The necessary plant models are obtained using system identification, and the controllers are then designed based on the models and implemented in real-time to test their performance. Cancellation of the rms streamwise velocity fluctuation of TS waves is evident over a significant domain.
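
    A minimal sketch of the system-identification step described above, under the simplifying assumption of a linear FIR plant model from actuator input u to the downstream hot-wire signal y (the tap count and names are illustrative, not the study's models):

      import numpy as np

      def fit_fir(u, y, n_taps=64):
          """Least-squares FIR fit of y[t] ~ sum_k h[k] * u[t-k]."""
          T = len(u)
          U = np.zeros((T, n_taps))
          for k in range(n_taps):
              U[k:, k] = u[: T - k]     # delayed copies of the input
          h, *_ = np.linalg.lstsq(U, y, rcond=None)
          return h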

  14. Spatial hearing ability of the pigmented Guinea pig (Cavia porcellus): Minimum audible angle and spatial release from masking in azimuth.

    PubMed

    Greene, Nathaniel T; Anbuhl, Kelsey L; Ferber, Alexander T; DeGuzman, Marisa; Allen, Paul D; Tollin, Daniel J

    2018-08-01

    Despite the common use of guinea pigs in investigations of the neural mechanisms of binaural and spatial hearing, their behavioral capabilities in spatial hearing tasks have surprisingly not been thoroughly investigated. To begin to fill this void, we tested the spatial hearing of adult male guinea pigs in several experiments using a paradigm based on the prepulse inhibition (PPI) of the acoustic startle response. In the first experiment, we presented continuous broadband noise from one speaker location and switched to a second speaker location (the "prepulse") along the azimuth prior to presenting a brief, ∼110 dB SPL startle-eliciting stimulus. We found that the startle response amplitude was systematically reduced for larger changes in speaker swap angle (i.e., greater PPI), indicating that using the speaker "swap" paradigm is sufficient to assess stimulus detection of spatially separated sounds. In a second set of experiments, we swapped low- and high-pass noise across the midline to estimate their ability to utilize interaural time- and level-difference cues, respectively. The results reveal that guinea pigs can utilize both binaural cues to discriminate azimuthal sound sources. A third set of experiments examined spatial release from masking using a continuous broadband noise masker and a broadband chirp signal, both presented concurrently at various speaker locations. In general, animals displayed an increase in startle amplitude (i.e., lower PPI) when the masker was presented at speaker locations near that of the chirp signal, and reduced startle amplitudes (increased PPI) indicating lower detection thresholds when the noise was presented from more distant speaker locations. In summary, these results indicate that guinea pigs can: 1) discriminate changes in source location within a hemifield as well as across the midline, 2) discriminate sources of low- and high-pass sounds, demonstrating that they can effectively utilize both low-frequency interaural time and high-frequency level difference sound localization cues, and 3) utilize spatial release from masking to discriminate sound sources. This report confirms the guinea pig as a suitable spatial hearing model and reinforces prior estimates of guinea pig hearing ability from acoustical and physiological measurements. Copyright © 2018 Elsevier B.V. All rights reserved.
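
    A minimal sketch of the prepulse-inhibition measure underlying these experiments: PPI is conventionally expressed as the percentage reduction of startle amplitude relative to startle-alone trials (the formula is standard; the numbers below are made up):

      def ppi_percent(startle_alone, startle_with_prepulse):
          """PPI = 100 * (1 - startle with prepulse / startle alone)."""
          return 100.0 * (1.0 - startle_with_prepulse / startle_alone)

      # Larger speaker-swap angles producing smaller startle responses
      # correspond to greater PPI:
      print(ppi_percent(1.00, 0.85))   # small swap angle -> 15% PPI
      print(ppi_percent(1.00, 0.40))   # large swap angle -> 60% PPI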

  15. A Study on Metadiscoursive Interaction in the MA Theses of the Native Speakers of English and the Turkish Speakers of English

    ERIC Educational Resources Information Center

    Köroglu, Zehra; Tüm, Gülden

    2017-01-01

    This study has been conducted to evaluate the TM usage in the MA theses written by the native speakers (NSs) of English and the Turkish speakers (TSs) of English. The purpose is to compare the TM usage in the introduction, results and discussion, and conclusion sections by both groups' randomly selected MA theses in the field of ELT between the…

  16. Improving the Effectiveness of Speaker Verification Domain Adaptation With Inadequate In-Domain Data

    DTIC Science & Technology

    2017-08-20

    Improving the Effectiveness of Speaker Verification Domain Adaptation With Inadequate In-Domain Data. Bengt J. Borgström, Elliot Singer, Douglas… Abstract: This paper addresses speaker verification domain adaptation with… contain speakers with low channel diversity. Existing domain adaptation methods are reviewed, and their shortcomings are discussed. We derive an…

  17. Mortality inequality in two native population groups.

    PubMed

    Saarela, Jan; Finnäs, Fjalar

    2005-11-01

    A sample of people aged 40-67 years, taken from a longitudinal register compiled by Statistics Finland, is used to analyse mortality differences between Swedish speakers and Finnish speakers in Finland. Finnish speakers are known to have higher death rates than Swedish speakers. The purpose is to explore whether labour-market experience and partnership status, treated as proxies for measures of variation in health-related characteristics, are related to the mortality differential. Persons who are single, disability pensioners, and those having experienced unemployment are found to have substantially higher death rates than those with a partner and employed persons. Swedish speakers have a more favourable distribution on both variables, which thus notably helps to reduce the Finnish-Swedish mortality gradient. A conclusion from this study is that future analyses on the topic should focus on mechanisms that bring a greater proportion of Finnish speakers into the groups with poor health or supposed unhealthy behaviour.

  18. How Psychological Stress Affects Emotional Prosody.

    PubMed

    Paulmann, Silke; Furnes, Desire; Bøkenes, Anne Ming; Cozzolino, Philip J

    2016-01-01

    We explored how experimentally induced psychological stress affects the production and recognition of vocal emotions. In Study 1a, we demonstrate that sentences spoken by stressed speakers are judged by naïve listeners as sounding more stressed than sentences uttered by non-stressed speakers. In Study 1b, negative emotions produced by stressed speakers are generally less well recognized than the same emotions produced by non-stressed speakers. Multiple mediation analyses suggest this poorer recognition of negative stimuli was due to a mismatch between the variation of volume voiced by speakers and the range of volume expected by listeners. Together, this suggests that the stress level of the speaker affects judgments made by the receiver. In Study 2, we demonstrate that participants who were induced with a feeling of stress before carrying out an emotional prosody recognition task performed worse than non-stressed participants. Overall, findings suggest detrimental effects of induced stress on interpersonal sensitivity.

  19. In the eye of the beholder: eye contact increases resistance to persuasion.

    PubMed

    Chen, Frances S; Minson, Julia A; Schöne, Maren; Heinrichs, Markus

    2013-11-01

    Popular belief holds that eye contact increases the success of persuasive communication, and prior research suggests that speakers who direct their gaze more toward their listeners are perceived as more persuasive. In contrast, we demonstrate that more eye contact between the listener and speaker during persuasive communication predicts less attitude change in the direction advocated. In Study 1, participants freely watched videos of speakers expressing various views on controversial sociopolitical issues. Greater direct gaze at the speaker's eyes was associated with less attitude change in the direction advocated by the speaker. In Study 2, we instructed participants to look at either the eyes or the mouths of speakers presenting arguments counter to participants' own attitudes. Intentionally maintaining direct eye contact led to less persuasion than did gazing at the mouth. These findings suggest that efforts at increasing eye contact may be counterproductive across a variety of persuasion contexts.

  20. How Psychological Stress Affects Emotional Prosody

    PubMed Central

    Paulmann, Silke; Furnes, Desire; Bøkenes, Anne Ming; Cozzolino, Philip J.

    2016-01-01

    We explored how experimentally induced psychological stress affects the production and recognition of vocal emotions. In Study 1a, we demonstrate that sentences spoken by stressed speakers are judged by naïve listeners as sounding more stressed than sentences uttered by non-stressed speakers. In Study 1b, negative emotions produced by stressed speakers are generally less well recognized than the same emotions produced by non-stressed speakers. Multiple mediation analyses suggest this poorer recognition of negative stimuli was due to a mismatch between the variation of volume voiced by speakers and the range of volume expected by listeners. Together, this suggests that the stress level of the speaker affects judgments made by the receiver. In Study 2, we demonstrate that participants who were induced with a feeling of stress before carrying out an emotional prosody recognition task performed worse than non-stressed participants. Overall, findings suggest detrimental effects of induced stress on interpersonal sensitivity. PMID:27802287

  1. Don't Underestimate the Benefits of Being Misunderstood.

    PubMed

    Gibson, Edward; Tan, Caitlin; Futrell, Richard; Mahowald, Kyle; Konieczny, Lars; Hemforth, Barbara; Fedorenko, Evelina

    2017-06-01

    Being a nonnative speaker of a language poses challenges. Individuals often feel embarrassed by the errors they make when talking in their second language. However, here we report an advantage of being a nonnative speaker: Native speakers give foreign-accented speakers the benefit of the doubt when interpreting their utterances; as a result, apparently implausible utterances are more likely to be interpreted in a plausible way when delivered in a foreign than in a native accent. Across three replicated experiments, we demonstrated that native English speakers are more likely to interpret implausible utterances, such as "the mother gave the candle the daughter," as similar plausible utterances ("the mother gave the candle to the daughter") when the speaker has a foreign accent. This result follows from the general model of language interpretation in a noisy channel, under the hypothesis that listeners assume a higher error rate in foreign-accented than in nonaccented speech.

  2. Speakers of different languages process the visual world differently.

    PubMed

    Chabal, Sarah; Marian, Viorica

    2015-06-01

    Language and vision are highly interactive. Here we show that people activate language when they perceive the visual world, and that this language information impacts how speakers of different languages focus their attention. For example, when searching for an item (e.g., clock) in the same visual display, English and Spanish speakers look at different objects. Whereas English speakers searching for the clock also look at a cloud, Spanish speakers searching for the clock also look at a gift, because the Spanish names for gift (regalo) and clock (reloj) overlap phonologically. These different looking patterns emerge despite an absence of direct language input, showing that linguistic information is automatically activated by visual scene processing. We conclude that the varying linguistic information available to speakers of different languages affects visual perception, leading to differences in how the visual world is processed. (c) 2015 APA, all rights reserved.

  3. A Feature-Based Contrastive Approach to the L2 Acquisition of Specificity

    ERIC Educational Resources Information Center

    Cho, Jacee; Slabakova, Roumyana

    2017-01-01

    This study examined the acquisition of the Russian indefinite determiners ("kakoj-to" and "kakoj-nibud'") encoding scopal specificity by English and Korean native speakers within the feature-based contrastive framework (Lardiere 2008, 2009).…

  4. Birth order and mortality in two ethno-linguistic groups: Register-based evidence from Finland.

    PubMed

    Saarela, Jan; Cederström, Agneta; Rostila, Mikael

    2016-06-01

    Previous research has documented an association between birth order and suicide, although no study has examined whether it depends on the cultural context. Our aim was to study the association between birth order and cause-specific mortality in Finland, and whether it varies by ethno-linguistic affiliation. We used data from the Finnish population register, representing a 5% random sample of all Finnish speakers and a 20% random sample of Swedish speakers, who lived in Finland in any year 1987-2011. For each person, there was a link to all children who were alive in 1987. In total, there were 254,059 siblings in 96,387 sibling groups, and 9797 deaths. We used Cox regressions stratified by each sibling group and estimated all-cause and cause-specific mortality risks during the period 1987-2011. In line with previous research from Sweden, deaths from suicide were significantly associated with birth order. As compared to first-born, second-born had a suicide risk of 1.27, third-born of 1.35, and fourth- or higher-born of 1.72, while other causes of death did not display an evident and consistent birth-order pattern. Results for the Finnish-speaking sibling groups were almost identical to those based on both ethno-linguistic groups. In the Swedish-speaking sibling groups, there was no increase in the suicide risk by birth order, but a statistically non-significant tendency towards an association with other external causes of death and deaths from cardiovascular diseases. Our findings provided evidence for an association between birth order and suicide among Finnish speakers in Finland, while no such association was found for Swedish speakers, suggesting that the birth order effect might depend on the cultural context. Copyright © 2016 Elsevier Ltd. All rights reserved.
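
    A minimal sketch (hypothetical column names, synthetic data, using the lifelines package) of the stratified Cox design described above: each sibling group contributes its own baseline hazard, and birth order enters as a covariate shared across strata:

      import numpy as np
      import pandas as pd
      from lifelines import CoxPHFitter

      rng = np.random.default_rng(0)
      n_groups, size = 500, 4
      df = pd.DataFrame({
          "sibling_group": np.repeat(np.arange(n_groups), size),
          "birth_order": np.tile(np.arange(1, size + 1), n_groups),
      })
      df["follow_up_yrs"] = rng.exponential(30.0, len(df))  # fake follow-up
      df["died"] = rng.integers(0, 2, len(df))              # fake event flag

      cph = CoxPHFitter()
      cph.fit(df, duration_col="follow_up_yrs", event_col="died",
              strata=["sibling_group"])
      cph.print_summary()   # hazard ratio for birth_order across strata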

  5. Who Is He? Children with ASD and ADHD Take the Listener into Account in Their Production of Ambiguous Pronouns

    PubMed Central

    Kuijper, Sanne J. M.; Hartman, Catharina A.; Hendriks, Petra

    2015-01-01

    During conversation, speakers constantly make choices about how specific they wish to be in their use of referring expressions. In the present study we investigate whether speakers take the listener into account or whether they base their referential choices solely on their own representation of the discourse. We do this by examining the cognitive mechanisms that underlie the choice of referring expression at different discourse moments. Furthermore, we provide insights into how children with Autism Spectrum Disorder (ASD) and Attention Deficit Hyperactivity Disorder (ADHD) use referring expressions and whether their use differs from that of typically developing (TD) children. Children between 6 and 12 years old (ASD: n=46; ADHD: n=37; TD: n=38) were tested on their production of referring expressions and on Theory of Mind, response inhibition and working memory. We found support for the view that speakers take the listener into account when choosing a referring expression: Theory of Mind was related to referential choice only at those moments when speakers could not solely base their choice on their own discourse representation to be understood. Working memory appeared to be involved in keeping track of the different referents in the discourse. Furthermore, we found that TD children as well as children with ASD and children with ADHD took the listener into account in their choice of referring expression. In addition, children with ADHD were less specific than TD children in contexts with more than one referent. The previously observed problems with referential choice in children with ASD may lie in difficulties in keeping track of longer and more complex discourses, rather than in problems with taking into account the listener. PMID:26147200

  6. Flexible spatial perspective-taking: conversational partners weigh multiple cues in collaborative tasks.

    PubMed

    Galati, Alexia; Avraamides, Marios N

    2013-01-01

    Research on spatial perspective-taking often focuses on the cognitive processes of isolated individuals as they adopt or maintain imagined perspectives. Collaborative studies of spatial perspective-taking typically examine speakers' linguistic choices, while overlooking their underlying processes and representations. We review evidence from two collaborative experiments that examine the contribution of social and representational cues to spatial perspective choices in both language and the organization of spatial memory. Across experiments, speakers organized their memory representations according to the convergence of various cues. When layouts were randomly configured and did not afford intrinsic cues, speakers encoded their partner's viewpoint in memory, if available, but did not use it as an organizing direction. On the other hand, when the layout afforded an intrinsic structure, speakers organized their spatial memories according to the person-centered perspective reinforced by the layout's structure. Similarly, in descriptions, speakers considered multiple cues, whether available a priori or at the interaction. They used partner-centered expressions more frequently (e.g., "to your right") when the partner's viewpoint was misaligned by a small offset or coincided with the layout's structure. Conversely, they used egocentric expressions more frequently when their own viewpoint coincided with the intrinsic structure or when the partner was misaligned by a computationally difficult, oblique offset. Based on these findings we advocate for a framework for flexible perspective-taking: people weigh multiple cues (including social ones) to make attributions about the relative difficulty of perspective-taking for each partner, and adapt behavior to minimize their collective effort. This framework is not specialized for spatial reasoning but instead emerges from the same principles and memory-dependent processes that govern perspective-taking in non-spatial tasks.

  7. Flexible spatial perspective-taking: conversational partners weigh multiple cues in collaborative tasks

    PubMed Central

    Galati, Alexia; Avraamides, Marios N.

    2013-01-01

    Research on spatial perspective-taking often focuses on the cognitive processes of isolated individuals as they adopt or maintain imagined perspectives. Collaborative studies of spatial perspective-taking typically examine speakers' linguistic choices, while overlooking their underlying processes and representations. We review evidence from two collaborative experiments that examine the contribution of social and representational cues to spatial perspective choices in both language and the organization of spatial memory. Across experiments, speakers organized their memory representations according to the convergence of various cues. When layouts were randomly configured and did not afford intrinsic cues, speakers encoded their partner's viewpoint in memory, if available, but did not use it as an organizing direction. On the other hand, when the layout afforded an intrinsic structure, speakers organized their spatial memories according to the person-centered perspective reinforced by the layout's structure. Similarly, in descriptions, speakers considered multiple cues, whether available a priori or at the interaction. They used partner-centered expressions more frequently (e.g., “to your right”) when the partner's viewpoint was misaligned by a small offset or coincided with the layout's structure. Conversely, they used egocentric expressions more frequently when their own viewpoint coincided with the intrinsic structure or when the partner was misaligned by a computationally difficult, oblique offset. Based on these findings we advocate for a framework for flexible perspective-taking: people weigh multiple cues (including social ones) to make attributions about the relative difficulty of perspective-taking for each partner, and adapt behavior to minimize their collective effort. This framework is not specialized for spatial reasoning but instead emerges from the same principles and memory-dependent processes that govern perspective-taking in non-spatial tasks. PMID:24133432

  8. Predictors and Outcomes of Early vs. Later English Language Proficiency Among English Language Learners

    PubMed Central

    Halle, Tamara; Hair, Elizabeth; Wandner, Laura; McNamara, Michelle; Chien, Nina

    2011-01-01

    The development of English language learners (ELLs) was explored from kindergarten through eighth grade within a nationally representative sample of first-time kindergartners (N = 19,890). Growth curve analyses indicated that, compared to native English speakers, ELLs were rated by teachers more favorably on approaches to learning, self control, and externalizing behaviors in kindergarten and generally continued to grow in a positive direction on these social/behavioral outcomes at a steeper rate compared to their native English-speaking peers, holding other factors constant. Differences in reading and math achievement between ELLs and native English speakers varied based on the grade at which English proficiency is attained. Specifically, ELLs who were proficient in English by kindergarten entry kept pace with native English speakers in both reading and math initially and over time; ELLs who were proficient by first grade had modest gaps in reading and math achievement compared to native English speakers that closed narrowly or persisted over time; and ELLs who were not proficient by first grade had the largest initial gaps in reading and math achievement compared to native speakers but the gap narrowed over time in reading and grew over time in math. Among those whose home language is not English, acquiring English proficiency by kindergarten entry was associated with better cognitive and behavioral outcomes through eighth grade compared to taking longer to achieve proficiency. Multinomial regression analyses indicated that child, family, and school characteristics predict achieving English proficiency by kindergarten entry compared to achieving proficiency later. Results are discussed in terms of policies and practices that can support ELL children’s growth and development. PMID:22389551

  9. Speech and pause characteristics associated with voluntary rate reduction in Parkinson's disease and Multiple Sclerosis.

    PubMed

    Tjaden, Kris; Wilding, Greg

    2011-01-01

    The primary purpose of this study was to investigate how speakers with Parkinson's disease (PD) and Multiple Sclerosis (MS) accomplish voluntary reductions in speech rate. A group of talkers with no history of neurological disease was included for comparison. This study was motivated by the idea that knowledge of how speakers with dysarthria voluntarily accomplish a reduced speech rate would contribute toward a descriptive model of speaking rate change in dysarthria. Such a model has the potential to assist in identifying rate control strategies to receive focus in clinical treatment programs and also would advance understanding of global speech timing in dysarthria. All speakers read a passage in Habitual and Slow conditions. Speech rate, articulation rate, pause duration, and pause frequency were measured. All speaker groups adjusted articulation time as well as pause time to reduce overall speech rate. Group differences in how voluntary rate reduction was accomplished were primarily one of quantity or degree. Overall, a slower-than-normal rate was associated with a reduced articulation rate, shorter speech runs that included fewer syllables, and longer more frequent pauses. Taken together, these results suggest that existing skills or strategies used by patients should be emphasized in dysarthria training programs focusing on rate reduction. Results further suggest that a model of voluntary speech rate reduction based on neurologically normal speech shows promise as being applicable for mild to moderate dysarthria. The reader will be able to: (1) describe the importance of studying voluntary adjustments in speech rate in dysarthria, (2) discuss how speakers with Parkinson's disease and Multiple Sclerosis adjust articulation time and pause time to slow speech rate. Copyright © 2011 Elsevier Inc. All rights reserved.
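
    A minimal sketch of the distinction between the two rate measures analyzed above: speech rate includes pause time, while articulation rate excludes it (the numbers are illustrative):

      def rate_measures(n_syllables, total_s, pause_durations_s):
          """Return (speech_rate, articulation_rate) in syllables/second."""
          articulation_time = total_s - sum(pause_durations_s)
          return n_syllables / total_s, n_syllables / articulation_time

      speech, artic = rate_measures(n_syllables=120, total_s=40.0,
                                    pause_durations_s=[1.2, 0.8, 2.0])
      print(round(speech, 2), round(artic, 2))   # 3.0 vs 3.33 syll/s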

  10. Proficiency Differences in Syntactic Processing of Monolingual Native Speakers Indexed by Event-related Potentials

    PubMed Central

    Pakulak, Eric; Neville, Helen J.

    2010-01-01

    While anecdotally there appear to be differences in the way native speakers use and comprehend their native language, most empirical investigations of language processing study university students and none have studied differences in language proficiency which may be independent of resource limitations such as working memory span. We examined differences in language proficiency in adult monolingual native speakers of English using an event-related potential (ERP) paradigm. ERPs were recorded to insertion phrase structure violations in naturally spoken English sentences. Participants recruited from a wide spectrum of society were given standardized measures of English language proficiency, and two complementary ERP analyses were performed. In between-groups analyses, participants were divided, based on standardized proficiency scores, into Lower Proficiency (LP) and Higher Proficiency (HP) groups. Compared to LP participants, HP participants showed an early anterior negativity that was more focal, both spatially and temporally, and a larger and more widely distributed positivity (P600) to violations. In correlational analyses, we utilized a wide spectrum of proficiency scores to examine the degree to which individual proficiency scores correlated with individual neural responses to syntactic violations in regions and time windows identified in the between-group analyses. This approach also employed partial correlation analyses to control for possible confounding variables. These analyses provided evidence for the effects of proficiency that converged with the between-groups analyses. These results suggest that adult monolingual native speakers of English who vary in language proficiency differ in the recruitment of syntactic processes that are hypothesized to be at least in part automatic as well as of those thought to be more controlled. These results also suggest that in order to fully characterize neural organization for language in native speakers it is necessary to include participants of varying proficiency. PMID:19925188

  11. Processing ser and estar to locate objects and events: An ERP study with L2 speakers of Spanish.

    PubMed

    Dussias, Paola E; Contemori, Carla; Román, Patricia

    2014-01-01

    In Spanish locative constructions, a different form of the copula is selected in relation to the semantic properties of the grammatical subject: sentences that locate objects require estar while those that locate events require ser (both translated in English as 'to be'). In an ERP study, we examined whether second language (L2) speakers of Spanish are sensitive to the selectional restrictions that the different types of subjects impose on the choice of the two copulas. Twenty-four native speakers of Spanish and two groups of L2 Spanish speakers (24 beginners and 18 advanced speakers) were recruited to investigate the processing of 'object/event + estar/ser' permutations. Participants provided grammaticality judgments on correct (object + estar; event + ser) and incorrect (object + ser; event + estar) sentences while their brain activity was recorded. In line with previous studies (Leone-Fernández, Molinaro, Carreiras, & Barber, 2012; Sera, Gathje, & Pintado, 1999), the results of the grammaticality judgment for the native speakers showed that participants correctly accepted object + estar and event + ser constructions. In addition, while 'object + ser' constructions were considered grossly ungrammatical, 'event + estar' combinations were perceived as unacceptable to a lesser degree. For these same participants, ERP recording time-locked to the onset of the critical word 'en' showed a larger P600 for the ser predicates when the subject was an object than when it was an event (*La silla es en la cocina vs. La fiesta es en la cocina). This P600 effect is consistent with syntactic repair of the defining predicate when it does not fit with the adequate semantic properties of the subject. For estar predicates (La silla está en la cocina vs. *La fiesta está en la cocina), the findings showed a central-frontal negativity between 500-700 ms. Grammaticality judgment data for the L2 speakers of Spanish showed that beginners were significantly less accurate than native speakers in all conditions, while the advanced speakers only differed from the natives in the event + ser and event + estar conditions. For the ERPs, the beginning learners did not show any effects in the time-windows under analysis. The advanced speakers showed a pattern similar to that of native speakers: (1) a P600 response to the 'object + ser' violation, more central and frontally distributed, and (2) a central-frontal negativity between 500-700 ms for the 'event + estar' violation. Findings for the advanced speakers suggest that behavioral methods commonly used to assess grammatical knowledge in the L2 may be underestimating what L2 speakers have actually learned.

  12. Reasoning about knowledge: Children's evaluations of generality and verifiability.

    PubMed

    Koenig, Melissa A; Cole, Caitlin A; Meyer, Meredith; Ridge, Katherine E; Kushnir, Tamar; Gelman, Susan A

    2015-12-01

    In a series of experiments, we examined 3- to 8-year-old children's (N=223) and adults' (N=32) use of two properties of testimony to estimate a speaker's knowledge: generality and verifiability. Participants were presented with a "Generic speaker" who made a series of 4 general claims about "pangolins" (a novel animal kind), and a "Specific speaker" who made a series of 4 specific claims about "this pangolin" as an individual. To investigate the role of verifiability, we systematically varied whether the claim referred to a perceptually-obvious feature visible in a picture (e.g., "has a pointy nose") or a non-evident feature that was not visible (e.g., "sleeps in a hollow tree"). Three main findings emerged: (1) young children showed a pronounced reliance on verifiability that decreased with age. Three-year-old children were especially prone to credit knowledge to speakers who made verifiable claims, whereas 7- to 8-year-olds and adults credited knowledge to generic speakers regardless of whether the claims were verifiable; (2) children's attributions of knowledge to generic speakers was not detectable until age 5, and only when those claims were also verifiable; (3) children often generalized speakers' knowledge outside of the pangolin domain, indicating a belief that a person's knowledge about pangolins likely extends to new facts. Findings indicate that young children may be inclined to doubt speakers who make claims they cannot verify themselves, as well as a developmentally increasing appreciation for speakers who make general claims. Copyright © 2015 Elsevier Inc. All rights reserved.

  13. Why We Serve - U.S. Department of Defense Official Website

    Science.gov Websites

  14. Formant transitions in the fluent speech of Farsi-speaking people who stutter.

    PubMed

    Dehqan, Ali; Yadegari, Fariba; Blomgren, Michael; Scherer, Ronald C

    2016-06-01

    Second formant (F2) transitions can be used to infer attributes of articulatory transitions. This study compared formant transitions during fluent speech segments of Farsi (Persian) speaking people who stutter and normally fluent Farsi speakers. Ten Iranian males who stutter and 10 normally fluent Iranian males participated. Sixteen different "CVt" tokens were embedded within the phrase "Begu CVt an". Measures included overall F2 transition frequency extents, durations, and derived overall slopes, initial F2 transition slopes at 30 ms and 60 ms, and speaking rate. (1) Mean overall formant frequency extent was significantly greater in 14 of the 16 CVt tokens for the group of stuttering speakers. (2) Stuttering speakers exhibited significantly longer overall F2 transitions for all 16 tokens compared to the nonstuttering speakers. (3) The overall F2 slopes were similar between the two groups. (4) The stuttering speakers exhibited significantly greater initial F2 transition slopes (positive or negative) for five of the 16 tokens at 30 ms and six of the 16 tokens at 60 ms. (5) The stuttering group produced a slower syllable rate than the nonstuttering group. During perceptually fluent utterances, the stuttering speakers had greater F2 frequency extents during transitions, took longer to reach vowel steady state, exhibited some evidence of steeper slopes at the beginning of transitions, had overall similar F2 formant slopes, and had slower speaking rates compared to nonstuttering speakers. Findings support the notion of different speech motor timing strategies in stuttering speakers and are likely to be independent of the language spoken. Educational objectives: This study compares aspects of F2 formant transitions between 10 stuttering and 10 nonstuttering speakers. Readers will be able to describe: (a) characteristics of formant frequency as a specific acoustic feature used to infer speech movements in stuttering and nonstuttering speakers, (b) two methods of measuring second formant (F2) transitions: the visual criteria method and the fixed time criteria method, (c) characteristics of F2 transitions in the fluent speech of stuttering speakers and how those characteristics appear to differ from normally fluent speakers, and (d) possible cross-linguistic effects on acoustic analyses of stuttering. Copyright © 2016 Elsevier Inc. All rights reserved.
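
    The transition measures reported above reduce to simple arithmetic on a sampled F2 track: extent is the frequency change from transition onset to steady state, the overall slope is extent divided by duration, and the fixed-time-criteria initial slopes take the frequency change over the first 30 and 60 ms. A minimal Python sketch, with a hypothetical transition standing in for real formant data:

        import numpy as np

        def f2_transition_measures(times_ms, f2_hz):
            # Overall extent (Hz), duration (ms), derived slope (Hz/ms), and
            # fixed-time-criteria initial slopes of one F2 transition.
            extent = f2_hz[-1] - f2_hz[0]
            duration = times_ms[-1] - times_ms[0]
            overall_slope = extent / duration
            initial_slopes = {}
            for t in (30.0, 60.0):
                f2_at_t = np.interp(t, times_ms, f2_hz)   # F2 value t ms after onset
                initial_slopes[t] = (f2_at_t - f2_hz[0]) / t
            return extent, duration, overall_slope, initial_slopes

        # Hypothetical rising transition sampled every 10 ms, leveling off
        # toward a vowel steady state
        times = np.arange(0.0, 130.0, 10.0)
        f2 = 1200.0 + 600.0 * (1 - np.exp(-times / 40.0))
        print(f2_transition_measures(times, f2))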

  15. Referential first mention in narratives by mildly mentally retarded adults.

    PubMed

    Kernan, K T; Sabsay, S

    1987-01-01

    Referential first mentions in narrative reports of a short film by 40 mildly mentally retarded adults and 20 nonretarded adults were compared. The mentally retarded sample included equal numbers of male and female, and black and white speakers. The mentally retarded speakers made significantly fewer first mentions and significantly more errors in the form of the first mentions than did nonretarded speakers. A pattern of better performance by black males than by other mentally retarded speakers was found. It is suggested that task difficulty and incomplete mastery of the use of definite and indefinite forms for encoding old and new information, rather than some global type of egocentrism, accounted for the poorer performance by mentally retarded speakers.

  16. Somatotype and Body Composition of Normal and Dysphonic Adult Speakers.

    PubMed

    Franco, Débora; Fragoso, Isabel; Andrea, Mário; Teles, Júlia; Martins, Fernando

    2017-01-01

    Voice quality provides information about the anatomical characteristics of the speaker. The patterns of somatotype and body composition can provide essential knowledge to characterize the individuality of voice quality. The aim of this study was to verify whether there were significant differences in somatotype and body composition between normal and dysphonic speakers. Cross-sectional study. Anthropometric measurements were taken of a sample of 72 adult participants (40 normal speakers and 32 dysphonic speakers) according to International Society for the Advancement of Kinanthropometry standards, which allowed the calculation of the endomorphy, mesomorphy, and ectomorphy components, body density, body mass index, fat mass, percentage fat, and fat-free mass. Perceptual and acoustic evaluations, as well as nasoendoscopy, were used to assign speakers to the normal or dysphonic group. There were no significant differences between normal and dysphonic speakers in mean somatotype attitudinal distance and somatotype dispersion distance (although differences in both were marginally significant [P < 0.10]) or in the mean vector of the somatotype components. Furthermore, no significant differences were found between groups in mean percentage fat, fat mass, fat-free mass, body density, and body mass index after controlling for sex. The findings suggested no significant differences in the somatotype and body composition variables between normal and dysphonic speakers. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
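
    The body-composition variables above follow standard anthropometric arithmetic. As a hedged illustration (the study's exact ISAK-based procedure is not spelled out in this abstract), the sketch below computes body mass index directly and converts body density to percentage fat with the widely used Siri (1961) equation; the speaker's measurements are hypothetical.

        def bmi(weight_kg, height_m):
            # Body mass index (kg/m^2).
            return weight_kg / height_m ** 2

        def siri_percent_fat(body_density):
            # Percentage fat from whole-body density via the Siri (1961) equation.
            return 495.0 / body_density - 450.0

        # Hypothetical speaker: 70 kg, 1.75 m, body density 1.060 g/cm^3
        weight, height, density = 70.0, 1.75, 1.060
        pct_fat = siri_percent_fat(density)
        fat_mass = weight * pct_fat / 100.0
        fat_free_mass = weight - fat_mass
        print(f"BMI {bmi(weight, height):.1f}, fat {pct_fat:.1f}%, "
              f"fat mass {fat_mass:.1f} kg, fat-free mass {fat_free_mass:.1f} kg")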

  17. Strength of German accent under altered auditory feedback

    PubMed Central

    HOWELL, PETER; DWORZYNSKI, KATHARINA

    2007-01-01

    Borden’s (1979, 1980) hypothesis that speakers with vulnerable speech systems rely more heavily on feedback monitoring than do speakers with less vulnerable systems was investigated. The second language (L2) of a speaker is vulnerable, in comparison with the native language, so alteration to feedback should have a detrimental effect on it, according to this hypothesis. Here, we specifically examined whether altered auditory feedback has an effect on accent strength when speakers speak L2. There were three stages in the experiment. First, 6 German speakers who were fluent in English (their L2) were recorded under six conditions—normal listening, amplified voice level, voice shifted in frequency, delayed auditory feedback, and slowed and accelerated speech rate conditions. Second, judges were trained to rate accent strength. Training was assessed by whether it was successful in separating German speakers speaking English from native English speakers, also speaking English. In the final stage, the judges ranked recordings of each speaker from the first stage as to increasing strength of German accent. The results show that accents were more pronounced under frequency-shifted and delayed auditory feedback conditions than under normal or amplified feedback conditions. Control tests were done to ensure that listeners were judging accent, rather than fluency changes caused by altered auditory feedback. The findings are discussed in terms of Borden’s hypothesis and other accounts about why altered auditory feedback disrupts speech control. PMID:11414137
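
    Of the feedback alterations listed above, delayed auditory feedback is the simplest to make concrete: the speaker hears their own voice shifted later in time. A minimal offline Python sketch; the 200 ms delay and the synthetic signal are assumptions for illustration, not the study's settings.

        import numpy as np

        def delayed_feedback(signal, sample_rate, delay_s=0.2):
            # Feedback signal a speaker would hear under DAF: the input shifted
            # later by delay_s, zero-padded at the start, same length as input.
            n_delay = int(round(delay_s * sample_rate))
            return np.concatenate([np.zeros(n_delay), signal])[: len(signal)]

        # Hypothetical 1 s speech-like signal at 16 kHz
        sr = 16000
        t = np.arange(sr) / sr
        speech = np.sin(2 * np.pi * 150 * t) * np.exp(-3 * t)
        feedback = delayed_feedback(speech, sr, delay_s=0.2)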

  18. Political skill: explaining the effects of nonnative accent on managerial hiring and entrepreneurial investment decisions.

    PubMed

    Huang, Laura; Frideger, Marcia; Pearce, Jone L

    2013-11-01

    We propose and test a new theory explaining glass-ceiling bias against nonnative speakers as driven by perceptions that nonnative speakers have weak political skill. Although nonnative accent is a complex signal, its effects on assessments of the speakers' political skill are something that speakers can actively mitigate; this makes it an important bias to understand. In Study 1, White and Asian nonnative speakers using the same scripted responses as native speakers were found to be significantly less likely to be recommended for a middle-management position, and this bias was fully mediated by assessments of their political skill. The alternative explanations of race, communication skill, and collaborative skill were nonsignificant. In Study 2, entrepreneurial start-up pitches from national high-technology, new-venture funding competitions were shown to experienced executive MBA students. Nonnative speakers were found to have a significantly lower likelihood of receiving new-venture funding, and this was fully mediated by the coders' assessments of their political skill. The entrepreneurs' race, communication skill, and collaborative skill had no effect. We discuss the value of empirically testing various posited reasons for glass-ceiling biases, how the importance and ambiguity of political skill for executive success serve as an ostensibly meritocratic cover for nonnative speaker bias, and other theoretical and practical implications of this work. (c) 2013 APA, all rights reserved.
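
    Full-mediation claims of this kind are conventionally backed by an estimate of the indirect effect (predictor through mediator to outcome) with a bootstrap confidence interval. The sketch below illustrates that logic on synthetic data; the variable names, effect sizes, and procedure are illustrative assumptions, not the authors' analysis.

        import numpy as np

        rng = np.random.default_rng(1)
        n = 200
        # Synthetic data mimicking the design: accent -> perceived skill -> rating
        nonnative = rng.integers(0, 2, n).astype(float)           # 0 native, 1 nonnative
        skill = 5.0 - 1.2 * nonnative + rng.normal(0.0, 1.0, n)   # path a
        rating = 2.0 + 0.8 * skill + rng.normal(0.0, 1.0, n)      # path b; direct effect ~ 0

        def indirect_effect(x, m, y):
            a = np.polyfit(x, m, 1)[0]                       # slope of x -> mediator
            X = np.column_stack([np.ones_like(x), m, x])     # y ~ 1 + m + x
            b = np.linalg.lstsq(X, y, rcond=None)[0][1]      # mediator slope, controlling x
            return a * b

        boot = np.empty(2000)
        for i in range(2000):
            idx = rng.integers(0, n, n)                      # resample with replacement
            boot[i] = indirect_effect(nonnative[idx], skill[idx], rating[idx])
        lo, hi = np.percentile(boot, [2.5, 97.5])
        print(f"bootstrap 95% CI for the indirect effect: [{lo:.3f}, {hi:.3f}]")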

  19. The Phonetics and Phonology of the Polish Calling Melodies.

    PubMed

    Arvaniti, Amalia; Żygis, Marzena; Jaskuła, Marek

    2016-01-01

    Two calling melodies of Polish were investigated, the routine call, used to call someone for an everyday reason, and the urgent call, which conveys disapproval of the addressee's actions. A Discourse Completion Task was used to elicit the two melodies from Polish speakers using twelve names from one to four syllables long; there were three names per syllable count, and speakers produced three tokens of each name with each melody. The results, based on eleven speakers, show that the routine calling melody consists of a low F0 stretch followed by a rise-fall-rise; the urgent calling melody, on the other hand, is a simple rise-fall. Systematic differences were found in the scaling and alignment of tonal targets: the routine call showed late alignment of the accentual pitch peak, and in most instances lower scaling of targets. The accented vowel was also affected, being overall louder in the urgent call. Based on the data and comparisons with other Polish melodies, we analyze the routine call as LH* !H-H% and the urgent call as H* L-L%. We discuss the results and our analysis in light of recent findings on calling melodies in other languages, and explore their repercussions for intonational phonology and the modeling of intonation. © 2017 S. Karger AG, Basel.
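
    The scaling and alignment measures above can be made concrete: scaling is the height of the accentual F0 peak, and alignment can be expressed as the peak's position relative to the accented vowel. A minimal Python sketch on a hypothetical rise-fall contour; none of the values come from the study.

        import numpy as np

        def peak_scaling_and_alignment(times_s, f0_hz, vowel_onset_s, vowel_offset_s):
            # Scaling (peak F0 in Hz) and alignment (peak time as a proportion
            # of the accented vowel) of an accentual pitch peak.
            i = int(np.nanargmax(f0_hz))
            peak_hz, peak_t = f0_hz[i], times_s[i]
            alignment = (peak_t - vowel_onset_s) / (vowel_offset_s - vowel_onset_s)
            return peak_hz, alignment    # alignment > 1.0 means the peak falls after the vowel

        # Hypothetical contour with a peak just after the accented vowel,
        # i.e., the late alignment pattern described for the routine call
        t = np.linspace(0.0, 0.6, 120)
        f0 = 180 + 60 * np.exp(-((t - 0.35) ** 2) / 0.005)
        print(peak_scaling_and_alignment(t, f0, vowel_onset_s=0.20, vowel_offset_s=0.34))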

  20. Evaluating language environment analysis system performance for Chinese: a pilot study in Shanghai.

    PubMed

    Gilkerson, Jill; Zhang, Yiwen; Xu, Dongxin; Richards, Jeffrey A; Xu, Xiaojuan; Jiang, Fan; Harnsberger, James; Topping, Keith

    2015-04-01

    The purpose of this study was to evaluate the performance of the Language Environment Analysis (LENA) automated language-analysis system for the Chinese Shanghai dialect and Mandarin (SDM) languages. Volunteer parents of 22 children aged 3-23 months were recruited in Shanghai. Families provided daylong in-home audio recordings using LENA. A native speaker listened to 15 min of randomly selected audio samples per family to label speaker regions and provide Chinese character and SDM word counts for adult speakers. LENA segment labeling and counts were compared with these rater-based values. LENA demonstrated good sensitivity in identifying adult and child speech, comparable to that of American English validation samples. Precision was strong for adults but less so for children. The LENA adult word count correlated strongly with both Chinese character and SDM word counts. LENA conversational turn counts correlated similarly with rater-based counts after the exclusion of three unusual samples. Performance was related to some degree to child age. LENA adult word counts and conversational turns provided reasonably accurate estimates for SDM over the age range tested. Theoretical and practical considerations regarding LENA performance in non-English languages are discussed. Despite the pilot nature and other limitations of the study, the results are promising for broader cross-linguistic applications.
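
    The validation metrics reported here come down to segment-level sensitivity and precision against human labels, plus correlations between automated and rater-based counts. A minimal Python sketch with hypothetical labels and counts (not the study's data):

        import numpy as np

        def sensitivity_precision(human, auto, label):
            # Segment-level sensitivity and precision for one speaker label.
            human, auto = np.asarray(human), np.asarray(auto)
            tp = np.sum((human == label) & (auto == label))
            sens = tp / np.sum(human == label)   # of true segments, share found
            prec = tp / np.sum(auto == label)    # of flagged segments, share correct
            return sens, prec

        def pearson_r(x, y):
            x, y = np.asarray(x, float), np.asarray(y, float)
            return np.corrcoef(x, y)[0, 1]

        # Hypothetical per-segment labels and per-family word counts
        human = ["adult", "child", "adult", "other", "child", "adult"]
        auto = ["adult", "child", "adult", "adult", "child", "other"]
        print(sensitivity_precision(human, auto, "adult"))
        rater_counts = [1200, 950, 1800, 700, 1500]
        lena_counts = [1100, 1000, 1750, 650, 1600]
        print(f"r = {pearson_r(rater_counts, lena_counts):.2f}")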
