Advancements in robust algorithm formulation for speaker identification of whispered speech
NASA Astrophysics Data System (ADS)
Fan, Xing
Whispered speech is an alternative speech production mode to neutral speech, used intentionally by talkers in natural conversational scenarios to protect privacy and to avoid certain content being overheard or made public. Due to the profound differences between whispered and neutral speech in the production mechanism, and the absence of whispered adaptation data, the performance of speaker identification systems trained on neutral speech degrades significantly. This dissertation therefore focuses on developing a robust closed-set speaker recognition system for whispered speech using no or limited whispered adaptation data from non-target speakers. The dissertation proposes the concept of "high"/"low" performance whispered data for the purpose of speaker identification. A variety of acoustic properties are identified that contribute to the quality of whispered data. An acoustic analysis is also conducted to compare the phoneme/speaker dependency of the differences between whispered and neutral data in the feature domain. The observations from these acoustic analyses are new in this area and serve as guidance for developing robust speaker identification systems for whispered speech. The dissertation further proposes two systems for speaker identification of whispered speech. One system focuses on front-end processing: a two-dimensional feature space is proposed to detect "low"-performance whispered utterances, and separate feature mapping functions are applied to vowels and consonants in order to retain the speaker information shared between whispered and neutral speech. The other system focuses on speech-mode-independent model training: the proposed method generates pseudo-whispered features from neutral features using the statistical information contained in a whispered universal background model (UBM) trained on additional whispered data collected from non-target speakers. Four modeling methods are proposed for estimating the transformation that generates the pseudo-whispered features. Both systems demonstrate a significant improvement over the baseline system on the evaluation data. This dissertation has therefore contributed a scientific understanding of the differences between whispered and neutral speech, as well as improved front-end processing and modeling methods for speaker identification of whispered speech. Such advancements will ultimately contribute to improving the robustness of speech processing systems.
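To make the UBM-based pseudo-whisper idea concrete, here is a minimal Python sketch of one plausible form of such a transformation, assuming scikit-learn-style GMMs and a whispered UBM whose components correspond to a neutral UBM; the dissertation's four specific modeling methods are not reproduced here.

import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(features, n_components=64, seed=0):
    # Fit a diagonal-covariance UBM on stacked feature frames (n_frames x n_dims).
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          random_state=seed, max_iter=200)
    ubm.fit(features)
    return ubm

def pseudo_whisper(neutral_frames, neutral_ubm, whisper_ubm):
    # Shift each neutral frame toward whispered statistics using the
    # posterior-weighted offsets between corresponding component means.
    post = neutral_ubm.predict_proba(neutral_frames)    # (n_frames, K)
    offsets = whisper_ubm.means_ - neutral_ubm.means_   # (K, n_dims)
    return neutral_frames + post @ offsets

The component-correspondence assumption (e.g., a whispered UBM mean-adapted from the neutral one) is an illustration choice, not the dissertation's method.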
NASA Astrophysics Data System (ADS)
Tovarek, Jaromir; Partila, Pavol
2017-05-01
This article discusses speaker identification for improving the security of communication between law enforcement units. The main task of this research was to develop a text-independent speaker identification system that can be used for real-time recognition. The system is designed for identification in the open set, meaning the unknown speaker can be anyone. Communication itself is secured, but the authorization of the communicating parties must still be checked: we have to decide whether the unknown speaker is authorized for the given action. The calls are recorded by an IP telephony server and the recordings are then evaluated by classification. If the system determines that the speaker is not authorized, it sends a warning message to the administrator. This message can reveal, for example, a stolen phone or another unusual situation, and the administrator then performs the appropriate actions. Our proposed system uses a multilayer neural network for classification, consisting of three layers (input layer, hidden layer, and output layer). The number of neurons in the input layer corresponds to the length of the speech feature vector, and the output layer represents the classified speakers. The artificial neural network classifies the speech signal frame by frame, but the final decision is made over the complete recording; this rule substantially increases classification accuracy. The input data for the neural network are thirteen Mel-frequency cepstral coefficients (MFCCs), which describe the behavior of the vocal tract and are the most widely used parameters for speaker recognition. Parameters for training, testing, and validation were extracted from recordings of authorized users. Recording conditions for the training data correspond to the real traffic of the system (sampling frequency, bit rate). The main benefit of the research is a text-independent speaker identification system applied to secure communication between law enforcement units.
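A minimal sketch of the described pipeline, assuming librosa for the 13 MFCCs and scikit-learn's MLPClassifier in place of the authors' own network; the frame-by-frame classification and the whole-recording decision rule are shown.

import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def mfcc_frames(path, sr=8000):
    # 13 MFCCs per frame; sr=8000 is an assumed telephony-grade rate.
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # (n_frames, 13)

# X_train: stacked frames from authorized users; y_train: speaker label per frame
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
# clf.fit(X_train, y_train)

def identify(path, clf):
    # Classify frame by frame, then decide over the complete recording.
    frame_labels = clf.predict(mfcc_frames(path))
    values, counts = np.unique(frame_labels, return_counts=True)
    return values[np.argmax(counts)]   # majority vote across frames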
Unsupervised real-time speaker identification for daily movies
NASA Astrophysics Data System (ADS)
Li, Ying; Kuo, C.-C. Jay
2002-07-01
The problem of identifying speakers for movie content analysis is addressed in this paper. While most previous work on speaker identification was carried out in a supervised mode using pure audio data, more robust results can be obtained in real time by integrating knowledge from multiple media sources in an unsupervised mode. In this work, both audio and visual cues are employed and subsequently combined in a probabilistic framework to identify speakers. In particular, audio information is used to identify speakers with a maximum likelihood (ML)-based approach, while visual information is adopted to distinguish speakers by detecting and recognizing their talking faces based on face detection/recognition and mouth tracking techniques. Moreover, to accommodate speakers' acoustic variations over time, we update their models on the fly by adapting to their newly contributed speech data. Encouraging results have been achieved through extensive experiments, which show a promising future for the proposed audiovisual-based unsupervised speaker identification system.
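The following toy sketch illustrates one simple probabilistic fusion rule of the kind described: per-speaker audio log-likelihoods and face-recognition posteriors are combined for a segment. The names and the fixed weight are illustrative assumptions; the paper's framework is richer.

import numpy as np

def fuse_and_identify(audio_loglik, face_posterior, w=0.5):
    # audio_loglik, face_posterior: (n_speakers,) arrays for one segment.
    a = np.exp(audio_loglik - audio_loglik.max())   # stabilize, then normalize
    a /= a.sum()
    fused = w * a + (1.0 - w) * face_posterior      # convex score combination
    return int(np.argmax(fused))                    # identified speaker index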
NASA Astrophysics Data System (ADS)
Kamiński, K.; Dobrowolski, A. P.
2017-04-01
The paper presents the architecture and the results of optimization of selected elements of an Automatic Speaker Recognition (ASR) system that uses Gaussian Mixture Models (GMMs) in the classification process. Optimization was performed on the selection of individual features using a genetic algorithm and on the parameters of the Gaussian distributions used to describe individual voices. The developed system was tested to evaluate the impact of different compression methods used, among others, in landline, mobile, and VoIP telephony systems on the effectiveness of speaker identification. Results are also presented for identification effectiveness at specific levels of noise in the speech signal and in the presence of other disturbances that can appear during phone calls, which makes it possible to specify the range of applications of the presented ASR system.
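A compact sketch of genetic-algorithm feature selection of the kind described, where fitness(mask) would return identification accuracy using only the masked features; the population size, operators, and rates here are generic assumptions, not the paper's settings.

import numpy as np

def ga_select(n_feats, fitness, pop=20, gens=30, p_mut=0.05, seed=0):
    rng = np.random.default_rng(seed)
    P = rng.integers(0, 2, size=(pop, n_feats))         # random bit masks
    for _ in range(gens):
        scores = np.array([fitness(m) for m in P])
        P = P[np.argsort(scores)[::-1]]                 # best masks first
        children = []
        while len(children) < pop - pop // 2:
            a, b = P[rng.integers(0, pop // 2, 2)]      # parents from top half
            cut = int(rng.integers(1, n_feats))
            child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
            flip = rng.random(n_feats) < p_mut          # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        P[pop // 2:] = np.array(children)               # keep elites on top
    return P[0]                                         # best mask found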
Analysis of human scream and its impact on text-independent speaker verification.
Hansen, John H L; Nandwana, Mahesh Kumar; Shokouhi, Navid
2017-04-01
A scream is defined as a sustained, high-energy vocalization that lacks phonological structure; this lack of phonological structure is what distinguishes screams from other forms of loud vocalization, such as a "yell." This study investigates the acoustic aspects of screams and addresses those that are known to prevent standard speaker identification systems from recognizing the identity of screaming speakers. It is well established that speaker variability due to changes in vocal effort and the Lombard effect contributes to degraded performance in automatic speech systems (i.e., speech recognition, speaker identification, diarization, etc.). However, previous research in the general area of speaker variability has concentrated on human speech production, whereas less is known about non-speech vocalizations. The UT-NonSpeech corpus is developed here to investigate speaker verification from scream samples. This study considers a detailed analysis in terms of fundamental frequency, spectral peak shift, frame energy distribution, and spectral tilt. It is shown that traditional speaker recognition based on the Gaussian mixture model-universal background model framework is unreliable when evaluated with screams.
Performance enhancement for audio-visual speaker identification using dynamic facial muscle model.
Asadpour, Vahid; Towhidkhah, Farzad; Homayounpour, Mohammad Mehdi
2006-10-01
The science of human identification using physiological characteristics, or biometrics, has been of great concern in security systems. However, robust multimodal identification systems based on audio-visual information have not yet been thoroughly investigated. The aim of this work is therefore to propose a model-based feature extraction method that employs the physiological characteristics of the facial muscles producing lip movements. This approach adopts intrinsic muscle properties such as viscosity, elasticity, and mass, which are extracted from a dynamic lip model. These parameters depend exclusively on the neuro-muscular properties of the speaker; consequently, imitation of valid speakers could be reduced to a large extent. The parameters are applied to a hidden Markov model (HMM) audio-visual identification system. In this work, a combination of audio and video features is employed using a multistream pseudo-synchronized HMM training method. Noise-robust audio features, namely Mel-frequency cepstral coefficients (MFCCs), spectral subtraction (SS), and relative spectra perceptual linear prediction (J-RASTA-PLP), were used so that the performance of the multimodal system could be evaluated with efficient audio feature extraction in place. The superior performance of the proposed system is demonstrated on a large multispeaker database of continuously spoken digits, along with a phonetically rich sentence. To evaluate the robustness of the algorithms, some experiments were performed on genetically identical twins; furthermore, changes in speaker voice were simulated with drug inhalation tests. At a 3 dB signal-to-noise ratio (SNR), the dynamic muscle model improved the identification rate of the audio-visual system from 91% to 98%. Results on identical twins revealed an apparent improvement in performance for the dynamic muscle model-based system, with the identification rate of the audio-visual system enhanced from 87% to 96%.
Optimization of multilayer neural network parameters for speaker recognition
NASA Astrophysics Data System (ADS)
Tovarek, Jaromir; Partila, Pavol; Rozhon, Jan; Voznak, Miroslav; Skapa, Jan; Uhrin, Dominik; Chmelikova, Zdenka
2016-05-01
This article discusses the impact of multilayer neural network parameters on speaker identification. The main task of speaker identification is to find a specific person in a known set of speakers, i.e., to determine whether the voice of an unknown speaker (the wanted person) belongs to a group of reference speakers from the voice database. One of the requirements was to develop a text-independent system, which means classifying the wanted person regardless of content and language. A multilayer neural network was used for speaker identification in this research. An artificial neural network (ANN) requires setting parameters such as the activation function of the neurons, the steepness of the activation functions, the learning rate, the maximum number of iterations, and the number of neurons in the hidden and output layers. ANN accuracy and validation time are directly influenced by these parameter settings, and different tasks require different settings. Identification accuracy and ANN validation time were evaluated with the same input data but different parameter settings; the goal was to find the parameters giving the neural network the highest precision and shortest validation time. The input data for the neural network are Mel-frequency cepstral coefficients (MFCCs), which describe the properties of the vocal tract. Audio samples were recorded for all speakers in a laboratory environment, and the data were split into training, testing, and validation sets of 70%, 15%, and 15%, respectively. The result of the research described in this article is a set of parameter recommendations for the multilayer neural network for four speakers.
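A sketch of the kind of parameter sweep described, recording both identification accuracy and validation time per setting; scikit-learn's MLPClassifier stands in for the authors' ANN, and the swept values are illustrative.

import time
from itertools import product
from sklearn.neural_network import MLPClassifier

def sweep(X_tr, y_tr, X_val, y_val):
    results = []
    for act, lr, hidden in product(("tanh", "relu", "logistic"),
                                   (1e-3, 1e-2), (16, 32, 64)):
        clf = MLPClassifier(hidden_layer_sizes=(hidden,), activation=act,
                            learning_rate_init=lr, max_iter=400)
        clf.fit(X_tr, y_tr)
        t0 = time.perf_counter()
        acc = clf.score(X_val, y_val)               # validation accuracy
        results.append((acc, time.perf_counter() - t0, act, lr, hidden))
    # highest accuracy first; shortest validation time breaks ties
    return sorted(results, key=lambda r: (-r[0], r[1]))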
Using Avatars for Improving Speaker Identification in Captioning
NASA Astrophysics Data System (ADS)
Vy, Quoc V.; Fels, Deborah I.
Captioning is the main method for accessing television and film content by people who are deaf or hard-of-hearing. One major difficulty consistently identified by the community is that of knowing who is speaking, particularly for an off-screen narrator. A captioning system was created using a participatory design method to improve speaker identification. The final prototype contained avatars and a coloured border for identifying specific speakers. Evaluation results were very positive; however, participants also wanted to customize various components such as caption and avatar location.
Discriminative analysis of lip motion features for speaker identification and speech-reading.
Cetingül, H Ertan; Yemez, Yücel; Erzin, Engin; Tekalp, A Murat
2006-10-01
There have been several studies that jointly use audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two important issues: 1) is using explicit lip motion information useful, and, 2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are considered to be those that result in the highest discrimination of individual speakers in a population, whereas for speech-reading, the best features are those providing the highest phoneme/word/phrase recognition rate. Several lip motion feature candidates have been considered, including dense motion features within a bounding box about the lip, lip contour motion features, and combinations of these with lip shape features. Furthermore, a novel two-stage spatial and temporal discrimination analysis is introduced to select the best lip motion features for speaker identification and speech-reading applications. Experimental results using a hidden-Markov-model-based recognition system indicate that using explicit lip motion information provides additional performance gains in both applications, and that lip motion features prove more valuable in the speech-reading application.
NASA Astrophysics Data System (ADS)
S. Al-Kaltakchi, Musab T.; Woo, Wai L.; Dlay, Satnam; Chambers, Jonathon A.
2017-12-01
In this study, a speaker identification system is considered consisting of a feature extraction stage which utilizes both power normalized cepstral coefficients (PNCCs) and Mel-frequency cepstral coefficients (MFCCs). Normalization is applied by employing cepstral mean and variance normalization (CMVN) and feature warping (FW), together with acoustic modeling using a Gaussian mixture model-universal background model (GMM-UBM). The main contributions are comprehensive evaluations of the effect of both additive white Gaussian noise (AWGN) and non-stationary noise (NSN) (with and without a G.712 type handset) upon identification performance. In particular, three NSN types with varying signal-to-noise ratios (SNRs) were tested, corresponding to street traffic, a bus interior, and a crowded talking environment. The performance evaluation also considered the effect of late fusion techniques based on score fusion, namely mean, maximum, and linear weighted sum fusion. The databases employed were TIMIT, SITW, and NIST 2008; 120 speakers were selected from each database to yield 3600 speech utterances. As recommendations from the study, mean fusion is found to yield the overall best performance in terms of speaker identification accuracy (SIA) with noisy speech, whereas linear weighted sum fusion is overall best for the original database recordings.
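The three late-fusion rules evaluated above reduce to a few lines over per-speaker score vectors; a minimal sketch (the score arrays and the weight are placeholders):

import numpy as np

def fuse(pncc_scores, mfcc_scores, rule="mean", w=0.6):
    # pncc_scores, mfcc_scores: (n_speakers,) scores from the two subsystems.
    s = np.stack([pncc_scores, mfcc_scores])
    if rule == "mean":
        fused = s.mean(axis=0)
    elif rule == "max":
        fused = s.max(axis=0)
    else:                                   # linear weighted sum
        fused = w * s[0] + (1.0 - w) * s[1]
    return int(np.argmax(fused))            # identified speaker index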
Identification and tracking of particular speaker in noisy environment
NASA Astrophysics Data System (ADS)
Sawada, Hideyuki; Ohkado, Minoru
2004-10-01
Humans are able to exchange information smoothly by voice in a variety of situations, such as in a noisy crowd or in the presence of multiple speakers. We are able to detect the position of a sound source in 3D space, extract a particular sound from mixed sounds, and recognize who is talking. Realizing this mechanism with a computer would enable new applications, such as recording sound with high quality by reducing noise, presenting a clarified sound, and achieving microphone-free speech recognition by extracting a particular sound. This paper introduces real-time detection and identification of a particular speaker in a noisy environment using a microphone array, based on the speaker's location and individual voice characteristics. The study will be applied to developing an adaptive auditory system for a mobile robot that collaborates with a factory worker.
Cost-sensitive learning for emotion robust speaker recognition.
Li, Dongdong; Yang, Yingchun; Dai, Weihui
2014-01-01
In the field of information security, voice is one of the most important biometric modalities, especially as voice communication over the Internet and telephone systems makes huge voice data resources accessible. In speaker recognition, the voiceprint can be applied as a unique password for the user to prove his/her identity. However, speech carrying various emotions can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper deals with this problem by introducing a cost-sensitive learning technique to reweight the probability of test affective utterances at the pitch envelope level, which effectively enhances robustness in emotion-dependent speaker recognition. Based on that technique, a new recognition system architecture and its components are proposed. An experiment conducted on the Mandarin Affective Speech Corpus shows an 8% improvement in identification rate over traditional speaker recognition.
Recognition of speaker-dependent continuous speech with KEAL
NASA Astrophysics Data System (ADS)
Mercier, G.; Bigorgne, D.; Miclet, L.; Le Guennec, L.; Querre, M.
1989-04-01
A description of the speaker-dependent continuous speech recognition system KEAL is given. An unknown utterance is recognized by means of the following procedures: acoustic analysis, phonetic segmentation and identification, and word and sentence analysis. The combination of feature-based, speaker-independent coarse phonetic segmentation with speaker-dependent statistical classification techniques is one of the main design features of the acoustic-phonetic decoder. The lexical access component is essentially based on a statistical dynamic programming technique which aims at matching a phonemic lexical entry containing various phonological forms against a phonetic lattice. Sentence recognition is achieved by use of a context-free grammar and a parsing algorithm derived from Earley's parser. A speaker adaptation module allows some of the system parameters to be adjusted by matching known utterances with their acoustical representation. The task to be performed, described by its vocabulary and its grammar, is given as a parameter of the system. Continuously spoken sentences extracted from a 'pseudo-Logo' language are analyzed and results are presented.
Open-set speaker identification with diverse-duration speech data
NASA Astrophysics Data System (ADS)
Karadaghi, Rawande; Hertlein, Heinz; Ariyaeeinia, Aladdin
2015-05-01
The concern in this paper is an important category of applications of open-set speaker identification in criminal investigation, which involves operating with speech of short and varied duration. The study presents investigations into the adverse effects of such an operating condition on the accuracy of open-set speaker identification, based on both GMM-UBM and i-vector approaches. The experiments are conducted using a protocol developed for the identification task, based on the NIST speaker recognition evaluation corpus of 2008. In order to closely cover the real-world operating conditions in the considered application area, the study includes experiments with various combinations of training and testing data duration. The paper details the characteristics of the experimental investigations conducted and provides a thorough analysis of the results obtained.
Evaluation of speaker de-identification based on voice gender and age conversion
NASA Astrophysics Data System (ADS)
Přibil, Jiří; Přibilová, Anna; Matoušek, Jindřich
2018-03-01
Two basic tasks are covered in this paper. The first consists in the design and practical testing of a new method for voice de-identification that changes the apparent age and/or gender of a speaker by multi-segmental frequency scale transformation combined with prosody modification. The second task is aimed at verifying the applicability of a classifier based on Gaussian mixture models (GMMs) to detect the original Czech and Slovak speakers after the voice de-identification has been applied. The performed experiments confirm the functionality of the developed gender and age conversion for all selected types of de-identification, which can be objectively evaluated by the GMM-based open-set classifier. The original speaker detection accuracy was also compared for sentences uttered by German and English speakers, showing the language independence of the proposed method.
INTERPOL survey of the use of speaker identification by law enforcement agencies.
Morrison, Geoffrey Stewart; Sahito, Farhan Hyder; Jardine, Gaëlle; Djokic, Djordje; Clavet, Sophie; Berghs, Sabine; Goemans Dorny, Caroline
2016-06-01
A survey was conducted of the use of speaker identification by law enforcement agencies around the world. A questionnaire was circulated to law enforcement agencies in the 190 member countries of INTERPOL. 91 responses were received from 69 countries. 44 respondents reported that they had speaker identification capabilities in house or via external laboratories. Half of these came from Europe. 28 respondents reported that they had databases of audio recordings of speakers. The clearest pattern in the responses was that of diversity. A variety of different approaches to speaker identification were used: The human-supervised-automatic approach was the most popular in North America, the auditory-acoustic-phonetic approach was the most popular in Europe, and the spectrographic/auditory-spectrographic approach was the most popular in Africa, Asia, the Middle East, and South and Central America. Globally, and in Europe, the most popular framework for reporting conclusions was identification/exclusion/inconclusive. In Europe, the second most popular framework was the use of verbal likelihood ratio scales.
Johansson, Kerstin; Strömbergsson, Sofia; Robieux, Camille; McAllister, Anita
2017-01-01
Reduced respiratory function following lower cervical spinal cord injuries (CSCIs) may indirectly result in vocal dysfunction. Although self-reports indicate voice change and limitations following CSCI, earlier efforts using global perceptual ratings to distinguish speakers with CSCI from noninjured speakers have not been very successful. We investigate the use of an audience response system-based approach to distinguish speakers with CSCI from noninjured speakers, and explore whether specific vocal traits can be identified as characteristic of speakers with CSCI. Fourteen speech-language pathologists participated in a web-based perceptual task, where their overt reactions to vocal dysfunction were registered during the continuous playback of recordings of 36 speakers (18 with CSCI and 18 matched controls). Dysphonic events were identified through manual perceptual analysis, to allow the exploration of connections between dysphonic events and listener reactions. More dysphonic events, and more listener reactions, were registered for speakers with CSCI than for noninjured speakers. Strain (particularly in phrase-final position) and creak (particularly in nonphrase-final position) distinguish speakers with CSCI from noninjured speakers. For the identification of intermittent and subtle signs of vocal dysfunction, an approach in which the temporal distribution of symptoms is registered offers a viable means of distinguishing speakers affected by voice dysfunction from non-affected speakers. In speakers with CSCI, clinicians should listen for the presence of final strain and nonfinal creak, and pay attention to self-reported voice function and voice problems, to identify individuals in need of clinical assessment and intervention.
Speaker gender identification based on majority vote classifiers
NASA Astrophysics Data System (ADS)
Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri
2017-03-01
Speaker gender identification is considered among the most important tools in several multimedia applications, namely automatic speech recognition, interactive voice response systems, and audio browsing systems. The performance of gender identification systems is closely linked to the selected feature set and the employed classification model. Typical techniques are based on selecting the best-performing classification method or searching for the optimum tuning of one classifier's parameters through experimentation. In this paper, we consider a relevant and rich set of features involving pitch and MFCCs as well as other temporal and frequency-domain descriptors. Five classification models, including decision tree, discriminant analysis, naïve Bayes, support vector machine, and k-nearest neighbor, were evaluated. The three best-performing classifiers among the five contribute by majority voting on their scores. Experiments were performed on three datasets spoken in three languages, English, German, and Arabic, in order to validate the language independence of the proposed scheme. Results confirm that the presented system reaches a satisfactory accuracy rate and promising classification performance thanks to the discriminating abilities and diversity of the used features combined with mid-level statistics.
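A minimal scikit-learn sketch of the voting stage: rank the five candidate models on held-out accuracy, keep the best three, and combine them by majority vote (model settings are defaults, not the paper's tuned ones).

from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

candidates = {"tree": DecisionTreeClassifier(), "lda": LinearDiscriminantAnalysis(),
              "nb": GaussianNB(), "svm": SVC(), "knn": KNeighborsClassifier()}

def top3_majority(X_tr, y_tr, X_val, y_val):
    # Rank the five models on validation accuracy and keep the best three.
    ranked = sorted(candidates.items(),
                    key=lambda kv: kv[1].fit(X_tr, y_tr).score(X_val, y_val),
                    reverse=True)[:3]
    return VotingClassifier(estimators=ranked, voting="hard").fit(X_tr, y_tr)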
An automatic speech recognition system with speaker-independent identification support
NASA Astrophysics Data System (ADS)
Caranica, Alexandru; Burileanu, Corneliu
2015-02-01
The novelty of this work lies in the application of an open-source research software toolkit (CMU Sphinx) to train, build, and evaluate a speech recognition system, with speaker-independent support, for voice-controlled hardware applications. Moreover, we propose to use the trained acoustic model to successfully decode offline voice commands on embedded hardware, such as an ARMv6 low-cost SoC, the Raspberry Pi. This type of single-board computer, mainly used for educational and research activities, can serve as a proof-of-concept software and hardware stack for low-cost voice automation systems.
Noise Reduction with Microphone Arrays for Speaker Identification
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cohen, Z
Reducing acoustic noise in audio recordings is an ongoing problem that plagues many applications. This noise is hard to reduce because of interfering sources and the non-stationary behavior of the overall background noise. Many single-channel noise reduction algorithms exist but are limited in that the more the noise is reduced, the more the signal of interest is distorted, because the signal and noise overlap in frequency. Acoustic background noise specifically causes problems in the area of speaker identification: recording a speaker in the presence of acoustic noise ultimately limits the performance and confidence of speaker identification algorithms. In situations where it is impossible to control the environment where the speech sample is taken, noise reduction filtering algorithms need to be developed to clean the recorded speech of background noise. Because single-channel noise reduction algorithms would distort the speech signal, the overall challenge of this project was to see whether the spatial information provided by microphone arrays could be exploited to aid speaker identification. The goals are: (1) test the feasibility of using microphone arrays to reduce background noise in speech recordings; (2) characterize and compare different multichannel noise reduction algorithms; (3) provide recommendations for using these multichannel algorithms; and (4) ultimately answer the question: can the use of microphone arrays aid speaker identification?
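As a feasibility illustration of goal (1), a minimal delay-and-sum beamformer: each microphone channel is advanced by its steering delay toward the talker and the channels are averaged, attenuating off-axis noise (geometry-derived delays are assumed to be given).

import numpy as np

def delay_and_sum(channels, delays):
    # channels: list of 1-D arrays, one per microphone;
    # delays: steering delay of each channel in samples.
    n = min(len(c) for c in channels)
    out = np.zeros(n)
    for ch, d in zip(channels, delays):
        out += np.roll(ch[:n], -int(d))   # advance channel toward the talker
    return out / len(channels)            # coherent average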
The perception of FM sweeps by Chinese and English listeners.
Luo, Huan; Boemio, Anthony; Gordon, Michael; Poeppel, David
2007-02-01
Frequency-modulated (FM) signals are an integral acoustic component of ecologically natural sounds and are analyzed effectively in the auditory systems of humans and animals. Linearly frequency-modulated tone sweeps were used here to evaluate two questions. First, how rapid a sweep can listeners accurately perceive? Second, is there an effect of native language insofar as the language (phonology) is differentially associated with processing of FM signals? Speakers of English and Mandarin Chinese were tested to evaluate whether being a speaker of a tone language altered the perceptual identification of non-speech tone sweeps. In two psychophysical studies, we demonstrate that Chinese subjects perform better than English subjects in FM direction identification, but not in an FM discrimination task, in which English and Chinese speakers show similar detection thresholds of approximately 20 ms duration. We suggest that the better FM direction identification in Chinese subjects is related to their experience with FM direction analysis in the tone-language environment, even though supra-segmental tonal variation occurs over a longer time scale. Furthermore, the observed common discrimination temporal threshold across two language groups supports the conjecture that processing auditory signals at durations of approximately 20 ms constitutes a fundamental auditory perceptual threshold.
Ma, Joan K-Y; Whitehill, Tara L; So, Susanne Y-S
2010-08-01
Speech produced by individuals with hypokinetic dysarthria associated with Parkinson's disease (PD) is characterized by a number of features including impaired speech prosody. The purpose of this study was to investigate intonation contrasts produced by this group of speakers. Speech materials with a question-statement contrast were collected from 14 Cantonese speakers with PD. Twenty listeners then classified the productions as either questions or statements. Acoustic analyses of F0, duration, and intensity were conducted to determine which acoustic cues distinguished the production of questions from statements, and which cues appeared to be exploited by listeners in identifying intonational contrasts. The results show that listeners identified statements with a high degree of accuracy, but the accuracy of question identification ranged from 0.56% to 96% across the 14 speakers. The speakers with PD used similar acoustic cues as nondysarthric Cantonese speakers to mark the question-statement contrast, although the contrasts were not observed in all speakers. Listeners mainly used F0 cues at the final syllable for intonation identification. These data contribute to the researchers' understanding of intonation marking in speakers with PD, with specific application to the production and perception of intonation in a lexical tone language.
Sensing of Particular Speakers for the Construction of Voice Interface Utilized in Noisy Environment
NASA Astrophysics Data System (ADS)
Sawada, Hideyuki; Ohkado, Minoru
Humans are able to exchange information smoothly by voice in a variety of situations, such as in a noisy crowd or in the presence of multiple speakers. We are able to detect the position of a sound source in 3D space, extract a particular sound from mixed sounds, and recognize who is talking. Realizing this mechanism with a computer would enable new applications, such as recording sound with high quality by reducing noise, presenting a clarified sound, and achieving microphone-free speech recognition by extracting a particular sound. This paper introduces real-time detection and identification of a particular speaker in a noisy environment using a microphone array, based on the speaker's location and individual voice characteristics. The study will be applied to developing an adaptive auditory system for a mobile robot that collaborates with a factory worker.
ERIC Educational Resources Information Center
Yook, Cheongmin; Lindemann, Stephanie
2013-01-01
This study investigates how the attitudes of 60 Korean university students towards five varieties of English are affected by the identification of the speaker's nationality and ethnicity. The study employed both a verbal guise technique and questions eliciting overt beliefs and preferences related to learning English. While the majority of the…
Jung, Jeeyoun; Park, Bongki; Lee, Ju Ah; You, Sooseong; Alraek, Terje; Bian, Zhao-Xiang; Birch, Stephen; Kim, Tae-Hun; Xu, Hao; Zaslawski, Chris; Kang, Byoung-Kab; Lee, Myeong Soo
2016-09-01
An international brainstorming session on standardizing pattern identification (PI) was held at the Korea Institute of Oriental Medicine on October 1, 2013 in Daejeon, South Korea. This brainstorming session was convened to gather insights from international traditional East Asian medicine specialists regarding PI standardization. With eight presentations and discussion sessions, the meeting allowed participants to discuss research methods and diagnostic systems used in traditional medicine for PI. One speaker presented a talk titled "The diagnostic criteria for blood stasis syndrome: implications for standardization of PI". Four speakers presented on future strategies and objective measurement tools that could be used in PI research. Later, participants shared information and methodology for accurate diagnosis and PI. They also discussed the necessity for standardizing PI and methods for international collaborations in pattern research.
Landwehr, Markus; Fürstenberg, Dirk; Walger, Martin; von Wedel, Hasso; Meister, Hartmut
2014-01-01
Advances in speech coding strategies and electrode array designs for cochlear implants (CIs) predominantly aim at improving speech perception. Current efforts are also directed at transmitting appropriate cues of the fundamental frequency (F0) to the auditory nerve with respect to speech quality, prosody, and music perception. The aim of this study was to examine the effects of various electrode configurations and coding strategies on speech intonation identification, speaker gender identification, and music quality rating. In six MED-EL CI users, electrodes were selectively deactivated in order to simulate different insertion depths and inter-electrode distances when using the high definition continuous interleaved sampling (HDCIS) and fine structure processing (FSP) speech coding strategies. Identification of intonation and speaker gender was determined and music quality rating was assessed. For intonation identification, HDCIS was robust against the different electrode configurations, whereas fine structure processing showed significantly worse results when a short electrode depth was simulated. In contrast, speaker gender recognition was not affected by electrode configuration or speech coding strategy. Music quality rating was sensitive to electrode configuration. In conclusion, the three experiments revealed different outcomes, even though they all addressed the reception of F0 cues. Rapid changes in F0, as seen with intonation, were the most sensitive to electrode configurations and coding strategies. In contrast, electrode configurations and coding strategies did not show large effects when F0 information was available over a longer time period, as seen with speaker gender. Music quality relies on additional spectral cues other than F0, and was poorest when a shallow insertion was simulated.
Shibboleth: An Automated Foreign Accent Identification Program
ERIC Educational Resources Information Center
Frost, Wende
2013-01-01
The speech of non-native (L2) speakers of a language contains phonological rules that differentiate them from native speakers. These phonological rules characterize or distinguish accents in an L2. The Shibboleth program creates combinatorial rule-sets to describe the phonological pattern of these accents and classifies L2 speakers into their…
Single-Word Intelligibility in Speakers with Repaired Cleft Palate
ERIC Educational Resources Information Center
Whitehill, Tara; Chau, Cynthia
2004-01-01
Many speakers with repaired cleft palate have reduced intelligibility, but there are limitations with current procedures for assessing intelligibility. The aim of this study was to construct a single-word intelligibility test for speakers with cleft palate. The test used a multiple-choice identification format, and was based on phonetic contrasts…
ERIC Educational Resources Information Center
Bryden, James D.
The purpose of this study was to specify variables which function significantly in the racial identification and speech quality rating of Negro and white speakers by Negro and white listeners. Ninety-one adults served as subjects for the speech task; 86 of these subjects, 43 Negro and 43 white, provided the listener responses. Subjects were chosen…
Performance of wavelet analysis and neural networks for pathological voices identification
NASA Astrophysics Data System (ADS)
Salhi, Lotfi; Talbi, Mourad; Abid, Sabeur; Cherif, Adnane
2011-09-01
Within the medical environment, diverse techniques exist to assess the state of a patient's voice. The inspection technique is inconvenient for a number of reasons, such as its high cost, the duration of the inspection, and above all the fact that it is an invasive technique. This study focuses on a robust, rapid, and accurate system for automatic identification of pathological voices. The system employs a non-invasive, inexpensive, and fully automated method based on a hybrid approach: wavelet transform analysis and a neural network classifier. First, we present the results obtained in our previous study using classic feature parameters; these results allow visual identification of pathological voices. Second, quantified parameters derived from the wavelet analysis are proposed to characterize the speech sample. In addition, a system of multilayer neural networks (MNNs) has been developed which carries out the automatic detection of pathological voices. The developed method was evaluated using a voice database composed of recorded voice samples (continuous speech) from normophonic or dysphonic speakers. The dysphonic speakers were patients of the National Hospital 'RABTA' of Tunis, Tunisia, and a University Hospital in Brussels, Belgium. Experimental results indicate a success rate ranging between 75% and 98.61% for discrimination of normal and pathological voices using the proposed parameters and neural network classifier. We also compared the average classification rate based on the MNN, a Gaussian mixture model, and support vector machines.
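A sketch of the hybrid approach's core, assuming PyWavelets for the wavelet analysis and scikit-learn's MLP in place of the authors' MNN; the sub-band statistics used as features are illustrative choices.

import numpy as np
import pywt
from sklearn.neural_network import MLPClassifier

def wavelet_features(frame, wavelet="db4", level=4):
    # Energy and log-variance of each wavelet sub-band of one speech frame.
    coeffs = pywt.wavedec(frame, wavelet, level=level)
    feats = []
    for c in coeffs:
        feats += [float(np.sum(c ** 2)), float(np.log(np.var(c) + 1e-12))]
    return np.array(feats)

# X = np.array([wavelet_features(f) for f in frames]); y = 0/1 normal/pathological
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500)
# clf.fit(X, y)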
Sadakata, Makiko; McQueen, James M
2013-08-01
This study reports effects of a high-variability training procedure on nonnative learning of a Japanese geminate-singleton fricative contrast. Thirty native speakers of Dutch took part in a 5-day training procedure in which they identified geminate and singleton variants of the Japanese fricative /s/. Participants were trained with either many repetitions of a limited set of words recorded by a single speaker (low-variability training) or with fewer repetitions of a more variable set of words recorded by multiple speakers (high-variability training). Both types of training enhanced identification of speech but not of nonspeech materials, indicating that learning was domain specific. High-variability training led to superior performance in identification but not in discrimination tests, and supported better generalization of learning as shown by transfer from the trained fricatives to the identification of untrained stops and affricates. Variability thus helps nonnative listeners to form abstract categories rather than to enhance early acoustic analysis.
Gender identification from high-pass filtered vowel segments: the use of high-frequency energy.
Donai, Jeremy J; Lass, Norman J
2015-10-01
The purpose of this study was to examine the use of high-frequency information for making gender identity judgments from high-pass filtered vowel segments produced by adult speakers. Specifically, the effect of removing lower-frequency spectral detail (i.e., F3 and below) from vowel segments via high-pass filtering was evaluated. Thirty listeners (ages 18-35) with normal hearing participated in the experiment. A within-subjects design was used to measure gender identification for six 250-ms vowel segments (/æ/, /ɪ /, /ɝ/, /ʌ/, /ɔ/, and /u/), produced by ten male and ten female speakers. The results of this experiment demonstrated that despite the removal of low-frequency spectral detail, the listeners were accurate in identifying speaker gender from the vowel segments, and did so with performance significantly above chance. The removal of low-frequency spectral detail reduced gender identification by approximately 16 % relative to unfiltered vowel segments. Classification results using linear discriminant function analyses followed the perceptual data, using spectral and temporal representations derived from the high-pass filtered segments. Cumulatively, these findings indicate that normal-hearing listeners are able to make accurate perceptual judgments regarding speaker gender from vowel segments with low-frequency spectral detail removed via high-pass filtering. Therefore, it is reasonable to suggest the presence of perceptual cues related to gender identity in the high-frequency region of naturally produced vowel signals. Implications of these findings and possible mechanisms for performing the gender identification task from high-pass filtered stimuli are discussed.
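A sketch of the analysis pipeline implied above: high-pass filter each vowel segment to remove detail at and below F3 (the fixed cutoff here is an assumed placeholder for a per-speaker F3), then classify gender from the surviving high-frequency spectrum with a linear discriminant.

import numpy as np
from scipy.signal import butter, sosfilt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def hf_spectrum(segment, sr, cutoff_hz=3500.0):
    # Log-magnitude spectrum of the high-pass filtered segment.
    sos = butter(8, cutoff_hz, btype="highpass", fs=sr, output="sos")
    filtered = sosfilt(sos, segment)
    return np.log(np.abs(np.fft.rfft(filtered)) + 1e-12)

# Equal-length 250-ms segments give equal-length feature vectors.
# X = np.array([hf_spectrum(s, 16000) for s in segments]); y = gender labels
lda = LinearDiscriminantAnalysis()
# lda.fit(X, y)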
Shattuck-Hufnagel, S.; Choi, J. Y.; Moro-Velázquez, L.; Gómez-García, J. A.
2017-01-01
Although a large number of acoustic indicators have already been proposed in the literature to evaluate the hypokinetic dysarthria of people with Parkinson's Disease, the goal of this work is to identify and interpret new reliable and complementary articulatory biomarkers that could be applied to predict/evaluate Parkinson's Disease from a diadochokinetic test, contributing to the possibility of a further multidimensional analysis of the speech of parkinsonian patients. The new biomarkers proposed are based on the kinetic behaviour of the envelope trace, which is directly linked to the articulatory dysfunctions introduced by the disease from the early stages. The interest of these new articulatory indicators lies in their ease of identification and interpretation, and their potential to be translated into computer-based automatic methods to screen the disease from the speech. Throughout this paper, the accuracy provided by these acoustic kinetic biomarkers is compared with that obtained with a baseline system based on speaker identification techniques. Results show accuracies around 85% that are in line with those obtained with complex state-of-the-art speaker recognition techniques, but with an easier physical interpretation, which opens the possibility of transfer to a clinical setting. PMID: 29240814
Robust Recognition of Loud and Lombard speech in the Fighter Cockpit Environment
1988-08-01
the latter as inter-speaker variability. According to Zue [Z85], inter-speaker variabilities can be attributed to sociolinguistic background, dialect... Journal of the Acoustical Society of America, Vol 50, 1971. [At74] B. S. Atal, "Linear prediction for speaker identification," Journal of the Acoustical Society of America, Vol 55, 1974. [B77] B. Beek, E. P. Neuberg, and D. C. Hodge, "An Assessment of the Technology of Automatic Speech Recognition for
Yu, Chengzhu; Hansen, John H L
2017-03-01
Human physiology has evolved to accommodate environmental conditions, including the temperature, pressure, and air chemistry unique to Earth. However, the environment in space differs significantly from that on Earth, and variability is therefore expected in astronauts' speech production mechanism. In this study, the variations in astronaut voice characteristics during the NASA Apollo 11 mission are analyzed. Specifically, acoustical features such as fundamental frequency and phoneme formant structure, which are closely related to the speech production system, are studied. For a further understanding of astronauts' vocal tract spectrum variation in space, a maximum likelihood frequency warping based analysis is proposed to detect the vocal tract spectrum displacement during space conditions. The results from fundamental frequency, formant structure, and vocal tract spectrum displacement indicate that astronauts change their speech production mechanism when in space. Moreover, the experimental results for astronaut voice identification tasks indicate that current speaker recognition solutions are highly vulnerable to astronaut voice production variations in space conditions. Future recommendations from this study suggest that successful applications of speaker recognition during extended space missions require robust speaker modeling techniques that can effectively adapt to voice production variation caused by diverse space conditions.
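An illustrative simplification of a maximum-likelihood warping search: each candidate factor linearly warps the frequency axis of a magnitude spectrum, and the factor whose warped features score highest under a reference (e.g., pre-flight) voice model is selected; the paper's actual formulation is not reproduced here.

import numpy as np

def warp_spectrum(mag, alpha):
    # Resample a magnitude spectrum (n_bins,) along a scaled frequency axis.
    bins = np.arange(len(mag))
    return np.interp(bins, bins * alpha, mag)

def best_warp(mag_frames, feature_fn, model, alphas=np.linspace(0.9, 1.1, 21)):
    # model.score() is an average per-frame log-likelihood (e.g., sklearn GMM).
    scores = []
    for a in alphas:
        feats = np.array([feature_fn(warp_spectrum(m, a)) for m in mag_frames])
        scores.append(model.score(feats))
    return float(alphas[int(np.argmax(scores))])   # ML warp factor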
ERIC Educational Resources Information Center
Bullock, Heather E.; Fernald, Julian L.
2003-01-01
Drawing on a communications model of persuasion (Hovland, Janis, & Kelley, 1953), this study examined the effect of target appearance on feminists' and nonfeminists' perceptions of a speaker delivering a feminist or an antifeminist message. One hundred three college women watched one of four videotaped speeches that varied by content (profeminist…
Accent Identification by Adults with Aphasia
ERIC Educational Resources Information Center
Newton, Caroline; Burns, Rebecca; Bruce, Carolyn
2013-01-01
The UK is a diverse society where individuals regularly interact with speakers with different accents. Whilst there is a growing body of research on the impact of speaker accent on comprehension in people with aphasia, there is none which explores their ability to identify accents. This study investigated the ability of this group to identify the…
The non-trusty clown attack on model-based speaker recognition systems
NASA Astrophysics Data System (ADS)
Farrokh Baroughi, Alireza; Craver, Scott
2015-03-01
Biometric detectors for speaker identification commonly employ a statistical model of a subject's voice, such as a Gaussian mixture model, that combines multiple means to improve detector performance. This allows a malicious insider to amend or append a component of a subject's statistical model so that the detector behaves normally except under a carefully engineered circumstance, letting an attacker force a misclassification of his or her voice only when desired by smuggling data into a database far in advance of an attack. Note that the attack is possible if the attacker has access to the database, even for a limited time, in order to modify the victim's model. We exhibit such an attack on a speaker identification system, in which an attacker can force a misclassification by speaking in an unusual voice and replacing the least weighted component of the victim's model with the most heavily weighted component of the attacker's model of that unusual voice. The attacker makes his or her voice unusual during the attack because his or her normal voice model may already be in the database; by attacking with an unusual voice, the attacker has the option to be recognized as himself or herself when talking normally, or as the victim when talking in the unusual manner. By attaching an appropriately weighted vector to a victim's model, we can impersonate all users in our simulations, while avoiding unwanted false rejections.
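The model-tampering step can be illustrated in a few lines against scikit-learn-style GMMs (shown to clarify the threat, not as the paper's exact procedure; only the mean and weight are swapped so the model's cached covariance terms stay valid).

import numpy as np

def tamper(victim_gmm, attacker_gmm):
    # Replace the victim's least weighted component with the attacker's
    # most heavily weighted "unusual voice" component.
    v = int(np.argmin(victim_gmm.weights_))
    a = int(np.argmax(attacker_gmm.weights_))
    victim_gmm.means_[v] = attacker_gmm.means_[a]
    victim_gmm.weights_[v] = attacker_gmm.weights_[a]
    victim_gmm.weights_ /= victim_gmm.weights_.sum()   # weights must sum to 1
    return victim_gmm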
NASA Astrophysics Data System (ADS)
Zilletti, Michele; Marker, Arthur; Elliott, Stephen John; Holland, Keith
2017-05-01
In this study model identification of the nonlinear dynamics of a micro-speaker is carried out by purely electrical measurements, avoiding any explicit vibration measurements. It is shown that a dynamic model of the micro-speaker, which takes into account the nonlinear damping characteristic of the device, can be identified by measuring the response between the voltage input and the current flowing into the coil. An analytical formulation of the quasi-linear model of the micro-speaker is first derived and an optimisation method is then used to identify a polynomial function which describes the mechanical damping behaviour of the micro-speaker. The analytical results of the quasi-linear model are compared with numerical results. This study potentially opens up the possibility of efficiently implementing nonlinear echo cancellers.
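One way the damping identification could look in practice, as a hedged sketch: fit polynomial damping coefficients by least squares so a lumped model reproduces the damping force inferred from the electrical measurements (the model structure and variable names here are assumptions, not the paper's formulation).

import numpy as np
from scipy.optimize import least_squares

def residuals(p, velocity, damping_force):
    # p holds coefficients c0..cN of an odd polynomial damping model:
    # f_d(v) = sum_k c_k * v * |v|^k
    model = sum(c * velocity * np.abs(velocity) ** k for k, c in enumerate(p))
    return model - damping_force

# velocity and damping_force would be reconstructed from the voltage input
# and coil current via the quasi-linear model; placeholders here.
# fit = least_squares(residuals, x0=np.zeros(3), args=(velocity, damping_force))
# fitted_coeffs = fit.x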
Speaker Recognition by Combining MFCC and Phase Information in Noisy Conditions
NASA Astrophysics Data System (ADS)
Wang, Longbiao; Minami, Kazue; Yamamoto, Kazumasa; Nakagawa, Seiichi
In this paper, we investigate the effectiveness of phase information for speaker recognition in noisy conditions and combine the phase information with Mel-frequency cepstral coefficients (MFCCs). To date, almost all speaker recognition methods have been based on MFCCs, even in noisy conditions. For MFCCs, which dominantly capture vocal tract information, only the magnitude of the Fourier transform of time-domain speech frames is used, and phase information has been ignored. The phase information is expected to be highly complementary to MFCCs because it includes rich voice source information; furthermore, some studies have reported that phase-based features are robust to noise. In our previous study, a phase information extraction method was proposed that normalizes the variation in phase depending on the clipping position of the input speech, and the performance of the combination of the phase information and MFCCs was remarkably better than that of MFCCs alone. In this paper, we evaluate the robustness of the proposed phase information for speaker identification in noisy conditions. Spectral subtraction, a method that skips frames with low energy/signal-to-noise (SN) ratio, and noisy-speech training models are used to analyze the effect of the phase information and MFCCs in noisy conditions. The NTT database and the JNAS (Japanese Newspaper Article Sentences) database, with stationary/non-stationary noise added, were used to evaluate our proposed method. MFCCs outperformed the phase information for clean speech; on the other hand, the degradation of the phase information was significantly smaller than that of MFCCs for noisy speech, and the individual result of the phase information was even better than that of MFCCs in many cases with clean-speech training models. By deleting unreliable frames (frames having low energy/SN ratio), the speaker identification performance was improved significantly. By integrating the phase information with MFCCs, the speaker identification error reduction rate was about 30%-60% compared with the standard MFCC-based method.
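Two of the ingredients above are easy to sketch: dropping unreliable low-energy frames before scoring, and linearly combining per-speaker scores from the MFCC-based and phase-based systems (the threshold quantile and weight are assumptions, not the paper's values).

import numpy as np

def keep_reliable(frames, energies, quantile=0.2):
    # Discard the lowest-energy (least reliable) frames before scoring.
    thr = np.quantile(energies, quantile)
    return frames[energies > thr]

def combined_decision(mfcc_scores, phase_scores, w=0.6):
    # (n_speakers,) score arrays from the two systems; weighted integration.
    fused = w * mfcc_scores + (1.0 - w) * phase_scores
    return int(np.argmax(fused))   # identified speaker index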
Jiang, Jun; Liu, Fang; Wan, Xuan; Jiang, Cunmei
2015-07-01
Tone language experience benefits pitch processing in music and speech for typically developing individuals. No known studies have examined pitch processing in individuals with autism who speak a tone language. This study investigated discrimination and identification of melodic contour and speech intonation in a group of Mandarin-speaking individuals with high-functioning autism. Individuals with autism showed superior melodic contour identification but comparable contour discrimination relative to controls. In contrast, these individuals performed worse than controls on both discrimination and identification of speech intonation. These findings provide the first evidence for differential pitch processing in music and speech in tone language speakers with autism, suggesting that tone language experience may not compensate for speech intonation perception deficits in individuals with autism.
Speaker recognition with temporal cues in acoustic and electric hearing
NASA Astrophysics Data System (ADS)
Vongphoe, Michael; Zeng, Fan-Gang
2005-08-01
Natural spoken language processing includes not only speech recognition but also identification of the speaker's gender, age, emotional, and social status. Our purpose in this study is to evaluate whether temporal cues are sufficient to support both speech and speaker recognition. Ten cochlear-implant and six normal-hearing subjects were presented with vowel tokens spoken by three men, three women, two boys, and two girls. In one condition, the subject was asked to recognize the vowel. In the other condition, the subject was asked to identify the speaker. Extensive training was provided for the speaker recognition task. Normal-hearing subjects achieved nearly perfect performance in both tasks. Cochlear-implant subjects achieved good performance in vowel recognition but poor performance in speaker recognition. The level of the cochlear implant performance was functionally equivalent to normal performance with eight spectral bands for vowel recognition but only to one band for speaker recognition. These results show a disassociation between speech and speaker recognition with primarily temporal cues, highlighting the limitation of current speech processing strategies in cochlear implants. Several methods, including explicit encoding of fundamental frequency and frequency modulation, are proposed to improve speaker recognition for current cochlear implant users.
English Language Schooling, Linguistic Realities, and the Native Speaker of English in Hong Kong
ERIC Educational Resources Information Center
Hansen Edwards, Jette G.
2018-01-01
The study employs a case study approach to examine the impact of educational backgrounds on nine Hong Kong tertiary students' English and Cantonese language practices and identifications as native speakers of English and Cantonese. The study employed both survey and interview data to probe the participants' English and Cantonese language use at…
Priming of Non-Speech Vocalizations in Male Adults: The Influence of the Speaker's Gender
ERIC Educational Resources Information Center
Fecteau, Shirley; Armony, Jorge L.; Joanette, Yves; Belin, Pascal
2004-01-01
Previous research reported a priming effect for voices. However, the type of information primed is still largely unknown. In this study, we examined the influence of speaker's gender and emotional category of the stimulus on priming of non-speech vocalizations in 10 male participants, who performed a gender identification task. We found a…
Effects of Phonetic Similarity in the Identification of Mandarin Tones
ERIC Educational Resources Information Center
Li, Bin; Shao, Jing; Bao, Mingzhen
2017-01-01
Tonal languages differ in how they use phonetic correlates, e.g. average pitch height and pitch direction, for tonal contrasts. Thus, native speakers of a tonal language may need to adjust their attention to familiar or unfamiliar phonetic cues when perceiving non-native tones. On the other hand, speakers of a non-tonal language may need to…
Lee, Kichol; Casali, John G
2016-01-01
To investigate the effect of controlled low-speed wind noise on the auditory situation awareness afforded by military hearing protection/enhancement devices (HPEDs) and tactical communication and protective systems (TCAPS). Recognition/identification and pass-through communication tasks were conducted separately under three wind conditions (0, 5, and 10 mph). Subjects wore two in-ear-type TCAPS, one earmuff-type TCAPS, a Combat Arms Earplug in its 'open' or pass-through setting, and an EB-15LE electronic earplug. Devices with electronic gain systems were tested under two gain settings: 'unity' and 'max'. Testing without any device (open ear) was conducted as a control. Ten subjects were recruited from the student population at Virginia Tech. Audiometric requirements were 25 dB HL or better at 500, 1000, 2000, 4000, and 8000 Hz in both ears. The communication task-by-device interaction was significant only at the 0 mph wind speed. The between-device performance differences varied with azimuthal speaker location. It is evident from this study that stable (non-gusting) wind speeds up to 10 mph did not significantly degrade recognition/identification task performance or pass-through communication performance for the group of HPEDs and TCAPS tested. However, the devices performed differently as the test signal speaker location was varied, and it appears that physical as well as electronic features may have contributed to this directional result.
ERIC Educational Resources Information Center
Hayes-Harb, Rachel
2006-01-01
English as a second language (ESL) teachers have long noted that native speakers of Arabic exhibit exceptional difficulty with English reading comprehension (e.g., Thompson-Panos & Thomas-Ruzic, 1983). Most existing work in this area has looked to higher level aspects of reading such as familiarity with discourse structure and cultural knowledge…
The Effect of Scene Variation on the Redundant Use of Color in Definite Reference
ERIC Educational Resources Information Center
Koolen, Ruud; Goudbeek, Martijn; Krahmer, Emiel
2013-01-01
This study investigates to what extent the amount of variation in a visual scene causes speakers to mention the attribute color in their definite target descriptions, focusing on scenes in which this attribute is not needed for identification of the target. The results of our three experiments show that speakers are more likely to redundantly…
Experimental study on GMM-based speaker recognition
NASA Astrophysics Data System (ADS)
Ye, Wenxing; Wu, Dapeng; Nucci, Antonio
2010-04-01
Speaker recognition plays a very important role in the field of biometric security. In order to improve recognition performance, many pattern recognition techniques have been explored in the literature. Among these techniques, the Gaussian Mixture Model (GMM) has proved to be an effective statistical model for speaker recognition and is used in most state-of-the-art speaker recognition systems. The GMM is used to represent the 'voice print' of a speaker by modeling the spectral characteristics of the speaker's speech signals. In this paper, we implement a speaker recognition system, which consists of preprocessing, Mel-Frequency Cepstrum Coefficients (MFCC) based feature extraction, and GMM based classification. We test our system with the TIDIGITS data set (325 speakers) and our own recordings of more than 200 speakers; our system achieves a 100% correct recognition rate. Moreover, we also test our system under the scenario in which training samples are from one language but test samples are from a different language; our system again achieves a 100% correct recognition rate, which indicates that it is language independent.
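The pipeline described in this abstract (preprocessing, 13-dimensional MFCC extraction, one GMM per speaker, maximum-likelihood decision) can be sketched as follows. This is a hedged illustration, not the authors' implementation: librosa and scikit-learn are assumed stand-ins, and the 16-component diagonal-covariance GMMs and file layout are illustrative choices.

```python
# Minimal MFCC + GMM speaker-identification sketch (assumptions noted above).
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(path, n_mfcc=13):
    """Return a (frames, n_mfcc) matrix of MFCCs for one recording."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_speaker_models(train_files):
    """train_files: dict mapping speaker id -> list of wav paths."""
    models = {}
    for speaker, paths in train_files.items():
        X = np.vstack([mfcc_features(p) for p in paths])
        models[speaker] = GaussianMixture(
            n_components=16, covariance_type="diag").fit(X)
    return models

def identify(models, path):
    """Pick the speaker whose GMM gives the highest average log-likelihood."""
    X = mfcc_features(path)
    return max(models, key=lambda s: models[s].score(X))
```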
Parker, Mark; Cunningham, Stuart; Enderby, Pam; Hawley, Mark; Green, Phil
2006-01-01
The STARDUST project developed robust computer speech recognizers for use by eight people with severe dysarthria and concomitant physical disability to access assistive technologies. Independent computer speech recognizers trained with normal speech are of limited functional use to those with severe dysarthria due to limited and inconsistent proximity to "normal" articulatory patterns. Severe dysarthric output may also be characterized by a small set of distinguishable phonetic tokens, making the acoustic differentiation of target words difficult. Speaker-dependent computer speech recognition using Hidden Markov Models was achieved by the identification of robust phonetic elements within the individual speaker's output patterns. A new system of speech training using computer-generated visual and auditory feedback reduced the inconsistent production of key phonetic tokens over time.
Bent, Tessa; Holt, Rachael Frush
2018-02-01
Children's ability to understand speakers with a wide range of dialects and accents is essential for efficient language development and communication in a global society. Here, the impact of regional dialect and foreign-accent variability on children's speech understanding was evaluated in both quiet and noisy conditions. Five- to seven-year-old children (n = 90) and adults (n = 96) repeated sentences produced by three speakers with different accents (American English, British English, and Japanese-accented English) in quiet or noisy conditions. Adults had no difficulty understanding any speaker in quiet conditions. Their performance declined for the nonnative speaker with a moderate amount of noise; their performance only substantially declined for the British English speaker (i.e., below 93% correct) when their understanding of the American English speaker was also impeded. In contrast, although children showed accurate word recognition for the American and British English speakers in quiet conditions, they had difficulty understanding the nonnative speaker even under ideal listening conditions. With a moderate amount of noise, their perception of British English speech declined substantially and their ability to understand the nonnative speaker was particularly poor. These results suggest that although school-aged children can understand unfamiliar native dialects under ideal listening conditions, their ability to recognize words in these dialects may be highly susceptible to the influence of environmental degradation. Fully adult-like word identification for speakers with unfamiliar accents and dialects may exhibit a protracted developmental trajectory.
Response Identification in the Extremely Low Frequency Region of an Electret Condenser Microphone
Jeng, Yih-Nen; Yang, Tzung-Ming; Lee, Shang-Yin
2011-01-01
This study shows that a small electret condenser microphone connected to a notebook or a personal computer (PC) has a prominent response in the extremely low frequency region in a specific environment. It confines most acoustic waves within a tiny air cell as follows. The air cell is constructed by drilling a small hole in a digital versatile disk (DVD) plate. A small speaker and an electret condenser microphone are attached to the two sides of the hole. Thus, the acoustic energy emitted by the speaker and reaching the microphone is strong enough to actuate the diaphragm of the latter. The experiments showed that, once small air leakages are allowed on the margin of the speaker, the microphone captured the signal in the range of 0.5 to 20 Hz. Moreover, by removing the plastic cover of the microphone and attaching the microphone head to the vibration surface, the low frequency signal can be effectively captured too. Two examples are included to show the convenience of applying the microphone to pick up the low frequency vibration information of practical systems.
How Captain Amerika uses neural networks to fight crime
NASA Technical Reports Server (NTRS)
Rogers, Steven K.; Kabrisky, Matthew; Ruck, Dennis W.; Oxley, Mark E.
1994-01-01
Artificial neural network models can make amazing computations. These models are explained along with their application in problems associated with fighting crime. Specific problems addressed are identification of people using face recognition, speaker identification, and fingerprint and handwriting analysis (biometric authentication).
The Effects of the Literal Meaning of Emotional Phrases on the Identification of Vocal Emotions.
Shigeno, Sumi
2018-02-01
This study investigates the discrepancy between the literal emotional content of speech and emotional tone in the identification of speakers' vocal emotions, both in the listeners' native language (Japanese) and in an unfamiliar language (random-spliced Japanese). Both experiments involve a "congruent condition," in which the emotion contained in the literal meaning of speech (words and phrases) was compatible with the vocal emotion, and an "incongruent condition," in which these forms of emotional information were discordant. Results for Japanese indicated that performance in identifying emotions did not differ significantly between the congruent and incongruent conditions. However, the results for random-spliced Japanese indicated that vocal emotion was correctly identified more often in the congruent than in the incongruent condition. The different results for Japanese and random-spliced Japanese suggest that the literal meaning of emotional phrases influences the listener's perception of the speaker's emotion, and that Japanese participants could infer speakers' intended emotions in the incongruent condition.
A language-familiarity effect for speaker discrimination without comprehension.
Fleming, David; Giordano, Bruno L; Caldara, Roberto; Belin, Pascal
2014-09-23
The influence of language familiarity upon speaker identification is well established, to such an extent that it has been argued that "Human voice recognition depends on language ability" [Perrachione TK, Del Tufo SN, Gabrieli JDE (2011) Science 333(6042):595]. However, 7-mo-old infants discriminate speakers of their mother tongue better than they do foreign speakers [Johnson EK, Westrek E, Nazzi T, Cutler A (2011) Dev Sci 14(5):1002-1011] despite their limited speech comprehension abilities, suggesting that speaker discrimination may rely on familiarity with the sound structure of one's native language rather than the ability to comprehend speech. To test this hypothesis, we asked Chinese and English adult participants to rate speaker dissimilarity in pairs of sentences in English or Mandarin that were first time-reversed to render them unintelligible. Even in these conditions a language-familiarity effect was observed: Both Chinese and English listeners rated pairs of native-language speakers as more dissimilar than foreign-language speakers, despite their inability to understand the material. Our data indicate that the language familiarity effect is not based on comprehension but rather on familiarity with the phonology of one's native language. This effect may stem from a mechanism analogous to the "other-race" effect in face recognition.
Integrated Robust Open-Set Speaker Identification System (IROSIS)
2012-05-01
[Extraction fragment: only scattered details of this report survive. Table 1 lists the NIST data used for training and testing; four train/test scenarios are referred to as VB-YB, VL-YL, VB-YL, and VL-YB; and, with M denoting the UBM supervector, the difference between L(m) and Q(M, m) is the Kullback-Leibler divergence between the "alignments".]
ERIC Educational Resources Information Center
Blount, Ben G.; Padgug, Elise J.
Features of parental speech to young children were studied in four English-speaking and four Spanish-speaking families. Children ranged in age from 9 to 12 months for the English speakers and from 8 to 22 months for the Spanish speakers. Examination of the utterances led to the identification of 34 prosodic, paralinguistic, and interactional…
Parallel Processing of Large Scale Microphone Arrays for Sound Capture
NASA Astrophysics Data System (ADS)
Jan, Ea-Ee.
1995-01-01
Performance of microphone sound pickup is degraded by deleterious properties of the acoustic environment, such as multipath distortion (reverberation) and ambient noise. The degradation becomes more prominent in a teleconferencing environment in which the microphone is positioned far away from the speaker. Moreover, the ideal teleconference should feel as easy and natural as face-to-face communication with another person. This suggests hands-free sound capture with no tether or encumbrance by hand-held or body-worn sound equipment. Microphone arrays for this application represent an appropriate approach. This research develops new microphone array and signal processing techniques for high quality hands-free sound capture in noisy, reverberant enclosures. The new techniques combine matched-filtering of individual sensors and parallel processing to provide acute spatial volume selectivity which is capable of mitigating the deleterious effects of noise interference and multipath distortion. The new method outperforms traditional delay-and-sum beamformers, which provide only directional spatial selectivity. The research additionally explores truncated matched-filtering and random distribution of transducers to reduce complexity and improve sound capture quality. All designs are first established by computer simulation of array performance in reverberant enclosures. The simulation is achieved by a room model which can efficiently calculate the acoustic multipath in a rectangular enclosure up to a prescribed order of images. It also calculates the incident angle of the arriving signal. Experimental arrays were constructed and their performance was measured in real rooms. Real room data were collected in a hard-walled laboratory and a controllable variable-acoustics enclosure of similar size, approximately 6 x 6 x 3 m. An extensive speech database was also collected in these two enclosures for future research on microphone arrays. The simulation results are shown to be consistent with the real room data. Localization of sound sources has been explored using cross-power spectrum time delay estimation and has been evaluated using real room data under slightly, moderately, and highly reverberant conditions. To improve the accuracy and reliability of the source localization, an outlier detector that removes incorrect time delay estimates has been invented. To provide speaker selectivity for microphone array systems, a hands-free speaker identification system has been studied. A recently invented feature using selected spectrum information outperforms traditional recognition methods. Measured results demonstrate the capabilities of speaker selectivity from a matched-filtered array. In addition, simulation utilities, including matched-filtering processing of the array and hands-free speaker identification, have been implemented on the massively parallel nCube supercomputer. This parallel computation highlights the requirements for real-time processing of array signals.
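For reference, the traditional delay-and-sum baseline that this dissertation compares against can be sketched in a few lines; the dissertation's matched-filter processing is more elaborate. This sketch assumes integer sample delays per channel are already known from source localization.

```python
# Minimal delay-and-sum beamformer sketch (baseline method, not the
# dissertation's matched-filter approach).
import numpy as np

def delay_and_sum(channels, delays_samples):
    """channels: list of equal-rate 1-D signals; delays: samples to advance."""
    n = min(len(c) for c in channels)
    out = np.zeros(n)
    for ch, d in zip(channels, delays_samples):
        out += np.roll(ch[:n], -int(d))  # align wavefronts (note: roll wraps)
    return out / len(channels)
```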
Age as a Factor in Ethnic Accent Identification in Singapore
ERIC Educational Resources Information Center
Tan, Ying Ying
2012-01-01
This study seeks to answer two research questions. First, can listeners distinguish the ethnicity of the speakers on the basis of voice quality alone? Second, do demographic differences among the listeners affect discriminability? A simple but carefully designed and controlled ethnic identification test was carried out on 325 Singaporean…
Congenital amusia in speakers of a tone language: association with lexical tone agnosia.
Nan, Yun; Sun, Yanan; Peretz, Isabelle
2010-09-01
Congenital amusia is a neurogenetic disorder that affects the processing of musical pitch in speakers of non-tonal languages like English and French. We assessed whether this musical disorder exists among speakers of Mandarin Chinese who use pitch to alter the meaning of words. Using the Montreal Battery of Evaluation of Amusia, we tested 117 healthy young Mandarin speakers with no self-declared musical problems and 22 individuals who reported musical difficulties and scored two standard deviations below the mean obtained by the Mandarin speakers without amusia. These 22 amusic individuals showed a similar pattern of musical impairment as did amusic speakers of non-tonal languages, by exhibiting a more pronounced deficit in melody than in rhythm processing. Furthermore, nearly half the tested amusics had impairments in the discrimination and identification of Mandarin lexical tones. Six showed marked impairments, displaying what could be called lexical tone agnosia, but had normal tone production. Our results show that speakers of tone languages such as Mandarin may experience musical pitch disorder despite early exposure to speech-relevant pitch contrasts. The observed association between the musical disorder and lexical tone difficulty indicates that the pitch disorder as defining congenital amusia is not specific to music or culture but is rather general in nature.
Understanding of emotions and false beliefs among hearing children versus deaf children.
Ziv, Margalit; Most, Tova; Cohen, Shirit
2013-04-01
Emotion understanding and theory of mind (ToM) are two major aspects of social cognition in which deaf children demonstrate developmental delays. The current study investigated these aspects of social cognition in hearing children and in two subgroups of deaf children: those with cochlear implants who communicate orally (speakers) and those who communicate primarily using sign language (signers). Participants were 53 Israeli kindergartners: 20 speakers, 10 signers, and 23 hearing children. Tests included four emotion identification and understanding tasks and one false belief task (ToM). Results revealed similarities among all children's emotion labeling and affective perspective taking abilities, similarities between speakers and hearing children in false beliefs and in understanding emotions in typical contexts, and lower performance of signers on the latter three tasks. Adapting educational experiences to the unique characteristics and needs of speakers and signers is recommended.
Development of panel loudspeaker system: design, evaluation and enhancement.
Bai, M R; Huang, T
2001-06-01
Panel speakers are investigated in terms of structural vibration and acoustic radiation. A panel speaker primarily consists of a panel and an inertia exciter. Contrary to conventional speakers, flexural resonance is encouraged so that the panel vibrates as randomly as possible. Simulation tools are developed to facilitate system integration of panel speakers. In particular, electro-mechanical analogy, finite element analysis, and the fast Fourier transform are employed to predict panel vibration and acoustic radiation. Design procedures are also summarized. In order to compare panel speakers with conventional speakers, experimental investigations were undertaken to evaluate the frequency response, directional response, sensitivity, efficiency, and harmonic distortion of both speaker types. The results revealed that the panel speakers suffered from low sensitivity and efficiency. To alleviate the problem, a woofer using electronic compensation based on the H2 model-matching principle is utilized to supplement the bass response. As the results indicate, significant improvement over the panel speaker alone was achieved by using the combined panel-woofer system.
2008-12-01
Kung, H. T.; Lin, Chit-Kwan; Su, Chia-Yung; Vlah, Dario; Grieco, John; Huggins, Mark; Suter, Bruce (Harvard University; Air Force Research Laboratory)
Perception of musical and lexical tones by Taiwanese-speaking musicians.
Lee, Chao-Yang; Lee, Yuh-Fang; Shr, Chia-Lin
2011-07-01
This study explored the relationship between music and speech by examining absolute pitch and lexical tone perception. Taiwanese-speaking musicians were asked to identify musical tones without a reference pitch and multispeaker Taiwanese level tones without acoustic cues typically present for speaker normalization. The results showed that a high percentage of the participants (65% with an exact match required and 81% with one-semitone errors allowed) possessed absolute pitch, as measured by the musical tone identification task. A negative correlation was found between occurrence of absolute pitch and age of onset of musical training, suggesting that the acquisition of absolute pitch resembles the acquisition of speech. The participants were able to identify multispeaker Taiwanese level tones with above-chance accuracy, even though the acoustic cues typically present for speaker normalization were not available in the stimuli. No correlations were found between the performance in musical tone identification and the performance in Taiwanese tone identification. Potential reasons for the lack of association between the two tasks are discussed.
"Who" is saying "what"? Brain-based decoding of human voice and speech.
Formisano, Elia; De Martino, Federico; Bonte, Milene; Goebel, Rainer
2008-11-07
Can we decipher speech content ("what" is being said) and speaker identity ("who" is saying it) from observations of brain activity of a listener? Here, we combine functional magnetic resonance imaging with a data-mining algorithm and retrieve what and whom a person is listening to from the neural fingerprints that speech and voice signals elicit in the listener's auditory cortex. These cortical fingerprints are spatially distributed and insensitive to acoustic variations of the input so as to permit the brain-based recognition of learned speech from unknown speakers and of learned voices from previously unheard utterances. Our findings unravel the detailed cortical layout and computational properties of the neural populations at the basis of human speech recognition and speaker identification.
NASA Astrophysics Data System (ADS)
Adhi Pradana, Wisnu; Adiwijaya; Novia Wisesty, Untari
2018-03-01
Support Vector Machine (SVM) is a method that can be used for data classification. An SVM separates data from two different classes with a hyperplane. In this study, an SVM-based system was built for Arabic speech recognition. Two speaker conditions were tested: speaker-dependent and speaker-independent. The system achieves an accuracy of 85.32% for speaker-dependent recognition and 61.16% for speaker-independent recognition.
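As a minimal illustration of the two-class SVM idea (a separating hyperplane between feature vectors), the following sketch uses scikit-learn on synthetic 13-dimensional features; the data and kernel choice are placeholders, not the authors' setup.

```python
# Toy two-class SVM sketch; synthetic features stand in for the
# (unspecified) acoustic features used in the paper.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 13)),   # class 0 feature vectors
               rng.normal(3.0, 1.0, (50, 13))])  # class 1 feature vectors
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear").fit(X, y)  # hyperplane separating the two classes
print(clf.score(X, y))                # training accuracy
```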
Choi, Ji Eun; Moon, Il Joon; Kim, Eun Yeon; Park, Hee-Sung; Kim, Byung Kil; Chung, Won-Ho; Cho, Yang-Sun; Brown, Carolyn J; Hong, Sung Hwa
The aim of this study was to compare binaural performance on an auditory localization task and a speech-perception-in-babble measure between children who use a cochlear implant (CI) in one ear and a hearing aid (HA) in the other (bimodal fitting) and those who use bilateral CIs. Thirteen children (mean age ± SD = 10 ± 2.9 years) with bilateral CIs and 19 children with bimodal fitting were recruited to participate. Sound localization was assessed using a 13-loudspeaker array in a quiet sound-treated booth. Speakers were placed in an arc from -90° azimuth to +90° azimuth (15° intervals) in the horizontal plane. To assess the accuracy of sound location identification, we calculated the absolute error in degrees between the target speaker and the response speaker during each trial. The mean absolute error was computed by dividing the sum of absolute errors by the total number of trials. We also calculated the hemifield identification score to reflect the accuracy of right/left discrimination. Speech-in-babble perception was also measured in the sound field using target speech presented from the front speaker. Eight-talker babble was presented in four different listening conditions: from the front speaker (0°), from one of the two side speakers (+90° or -90°), or from both side speakers (±90°). The speech, spatial, and quality questionnaire was administered. When the two groups of children were directly compared with each other, there was no significant difference in localization accuracy or hemifield identification score under the binaural condition. Performance on the speech perception test was also similar under most babble conditions. However, when the babble was from the first device side (the CI side for children with bimodal stimulation or the first CI side for children with bilateral CIs), speech understanding in babble by bilateral CI users was significantly better than that by bimodal listeners. Speech, spatial, and quality scores were comparable between the two groups. Overall, binaural performance was similar between children fit with two CIs (CI + CI) and those who use bimodal stimulation (HA + CI) in most conditions. However, the bilateral CI group showed better speech perception than the bimodal group when babble was from the first device side (the first CI side for bilateral CI users or the CI side for bimodal listeners). Therefore, if bimodal performance is significantly below the mean bilateral CI performance on speech perception in babble, these results suggest that a child should be considered for transition from bimodal stimulation to bilateral CIs.
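The two localization metrics defined in this abstract are straightforward to compute; a minimal sketch, assuming azimuths in degrees with negative values for the left hemifield:

```python
# Sketch of the localization metrics described above: mean absolute error
# in degrees and the hemifield (left/right) identification score.
import numpy as np

def mean_absolute_error(target_deg, response_deg):
    t, r = np.asarray(target_deg), np.asarray(response_deg)
    return np.abs(t - r).sum() / len(t)   # sum of absolute errors / trials

def hemifield_score(target_deg, response_deg):
    t, r = np.sign(target_deg), np.sign(response_deg)
    keep = t != 0                         # 0-degree targets have no hemifield
    return np.mean(t[keep] == r[keep])
```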
Interface of Linguistic and Visual Information During Audience Design.
Fukumura, Kumiko
2015-08-01
Evidence suggests that speakers can take account of the addressee's needs when referring. However, what representations drive the speaker's audience design has been less clear. This study aims to go beyond previous studies by investigating the interplay between the visual and linguistic context during audience design. Speakers repeated subordinate descriptions (e.g., firefighter) given in the prior linguistic context less and used basic-level descriptions (e.g., man) more when the addressee did not hear the linguistic context than when s/he did. But crucially, this effect happened only when the referent lacked the visual attributes associated with the expressions (e.g., the referent was in plain clothes rather than in a firefighter uniform), so there was no other contextual cue available for the identification of the referent. This suggests that speakers flexibly use different contextual cues to help their addressee map the referring expression onto the intended referent. In addition, speakers used fewer pronouns when the addressee did not hear the linguistic antecedent than when s/he did. This suggests that although speakers may be egocentric during anaphoric reference (Fukumura & Van Gompel, 2012), they can cooperatively avoid pronouns when the linguistic antecedents were not shared with their addressee during initial reference.
Voice input/output capabilities at Perception Technology Corporation
NASA Technical Reports Server (NTRS)
Ferber, Leon A.
1977-01-01
Condensed resumes of key company personnel at the Perception Technology Corporation are presented. The staff possesses expertise in speech recognition, speech synthesis, speaker authentication, and language identification. The capabilities of its hardware and software engineers are included.
Noh, Heil; Lee, Dong-Hee
2012-01-01
To identify the quantitative differences between Korean and English long-term average speech spectra (LTASS). Twenty Korean speakers, who lived in the capital of Korea and spoke standard Korean as their first language, were compared with 20 native English speakers. For the Korean speakers, a passage from a novel and a passage from a leading newspaper article were chosen. For the English speakers, the Rainbow Passage was used. The speech was digitally recorded using a GenRad 1982 Precision Sound Level Meter and GoldWave® software and analyzed using a MATLAB program. There was no significant difference in the LTASS between the Korean subjects reading a news article and those reading a novel. For male subjects, the LTASS of Korean speakers was significantly lower than that of English speakers above 1.6 kHz except at 4 kHz, and the difference was more than 5 dB, especially at higher frequencies. For female subjects, the LTASS of Korean speakers showed significantly lower levels at 0.2, 0.5, 1, 1.25, 2, 2.5, 6.3, 8, and 10 kHz, but the differences were less than 5 dB. Compared with English speakers, the LTASS of Korean speakers showed significantly lower levels at frequencies above 2 kHz except at 4 kHz. The difference was less than 5 dB between 2 and 5 kHz but more than 5 dB above 6 kHz. To adjust the formula for fitting hearing aids for Koreans, our results based on the LTASS analysis suggest that one needs to raise the gain in high-frequency regions.
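The study's recording and MATLAB analysis chain is not public; as a rough sketch of computing an LTASS from a digitized recording, one can average the power spectrum over the whole passage. The segment length and the use of scipy below are assumptions, not the authors' procedure.

```python
# Hedged LTASS sketch via Welch's method.
import numpy as np
from scipy.signal import welch

def ltass_db(signal, sr, seg_seconds=0.125):
    """Long-term average spectrum (dB) of a speech recording."""
    f, psd = welch(signal, fs=sr, nperseg=int(sr * seg_seconds))
    return f, 10.0 * np.log10(psd + 1e-12)  # epsilon guards against log(0)
```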
Perception of English palatal codas by Korean speakers of English
NASA Astrophysics Data System (ADS)
Yeon, Sang-Hee
2003-04-01
This study examined the perception of English palatal codas by Korean speakers of English to determine whether perception problems are the source of production problems. In particular, the study first looked at the possible first-language effect on the perception of English palatal codas. Second, a possible perceptual source of vowel epenthesis after English palatal codas was investigated. In addition, individual factors, such as length of residence, TOEFL score, gender, and academic status, were compared to determine whether they affected the varying degree of perception accuracy. Eleven adult Korean speakers of English as well as three native speakers of English participated in the study. Three perception tests involving identification of minimally different English pseudo- or real words were carried out. The results showed that, first, the Korean speakers perceived the English codas significantly worse than the Americans. Second, the study supported the idea that Koreans perceived an extra /i/ after the final affricates due to final release. Finally, none of the individual factors explained the varying degree of perceptual accuracy. In particular, TOEFL scores and the perception test scores showed no statistically significant association.
Pruitt, John S; Jenkins, James J; Strange, Winifred
2006-03-01
Perception of second language speech sounds is influenced by one's first language. For example, speakers of American English have difficulty perceiving dental versus retroflex stop consonants in Hindi although English has both dental and retroflex allophones of alveolar stops. Japanese, unlike English, has a contrast similar to Hindi, specifically, the Japanese /d/ versus the flapped /r/ which is sometimes produced as a retroflex. This study compared American and Japanese speakers' identification of the Hindi contrast in CV syllable contexts where C varied in voicing and aspiration. The study then evaluated the participants' increase in identifying the distinction after training with a computer-interactive program. Training sessions progressively increased in difficulty by decreasing the extent of vowel truncation in stimuli and by adding new speakers. Although all participants improved significantly, Japanese participants were more accurate than Americans in distinguishing the contrast on pretest, during training, and on posttest. Transfer was observed to three new consonantal contexts, a new vowel context, and a new speaker's productions. Some abstract aspect of the contrast was apparently learned during training. It is suggested that allophonic experience with dental and retroflex stops may be detrimental to perception of the new contrast.
Speech recognition: Acoustic-phonetic knowledge acquisition and representation
NASA Astrophysics Data System (ADS)
Zue, Victor W.
1988-09-01
The long-term research goal is to develop and implement speaker-independent continuous speech recognition systems. It is believed that the proper utilization of speech-specific knowledge is essential for such advanced systems. This research is thus directed toward the acquisition, quantification, and representation of acoustic-phonetic and lexical knowledge, and the application of this knowledge to speech recognition algorithms. In addition, we are exploring new speech recognition alternatives based on artificial intelligence and connectionist techniques. We developed a statistical model for predicting the acoustic realization of stop consonants in various positions in the syllable template. A unification-based grammatical formalism was developed for incorporating this model into the lexical access algorithm. We provided an information-theoretic justification for the hierarchical structure of the syllable template. We analyzed segmental durations for vowels and fricatives in continuous speech. Based on contextual information, we developed durational models for vowels and fricatives that account for over 70 percent of the variance, using data from multiple, unknown speakers. We rigorously evaluated the ability of human spectrogram readers to identify stop consonants spoken by many talkers and in a variety of phonetic contexts. Incorporating the declarative knowledge used by the readers, we developed a knowledge-based system for stop identification. We achieved system performance comparable to that of the readers.
Code of Federal Regulations, 2011 CFR
2011-07-01
..., or transcriptions of electronic recordings including the identification of speakers, shall to the... cost of transcription. (c) The agency shall maintain a complete verbatim copy of the transcript, a...
Code of Federal Regulations, 2010 CFR
2010-07-01
..., or transcriptions of electronic recordings including the identification of speakers, shall to the... cost of transcription. (c) The agency shall maintain a complete verbatim copy of the transcript, a...
The ICSI+ Multilingual Sentence Segmentation System
2006-01-01
[Extraction fragment: the surviving text notes that the ASR output needs to be enriched with information additional to words, such as speaker diarization, sentence segmentation, or story boundaries; that the output of a speaker diarization system is considered as well; that the paper first details extraction of the prosodic features and then describes the classification; and that the system also takes into account the speaker turns estimated by the diarization system, modeling speaker-turn unigrams and trigrams.]
Bone, Daniel; Li, Ming; Black, Matthew P.; Narayanan, Shrikanth S.
2013-01-01
Segmental and suprasegmental speech signal modulations offer information about paralinguistic content such as affect, age and gender, pathology, and speaker state. Speaker state encompasses medium-term, temporary physiological phenomena influenced by internal or external biochemical actions (e.g., sleepiness, alcohol intoxication). Perceptual and computational research indicates that detecting speaker state from speech is a challenging task. In this paper, we present a system constructed with multiple representations of prosodic and spectral features that provided the best result at the Intoxication Subchallenge of Interspeech 2011 on the Alcohol Language Corpus. We discuss the details of each classifier and show that fusion improves performance. We additionally address the question of how best to construct a speaker state detection system in terms of robust and practical marginalization of associated variability such as through modeling speakers, utterance type, gender, and utterance length. As is the case in human perception, speaker normalization provides significant improvements to our system. We show that a held-out set of baseline (sober) data can be used to achieve gains comparable to other speaker normalization techniques. Our fused frame-level statistic-functional systems, fused GMM systems, and final combined system achieve unweighted average recalls (UARs) of 69.7%, 65.1%, and 68.8%, respectively, on the test set. More consistent numbers compared to development set results occur with matched-prompt training, where the UARs are 70.4%, 66.2%, and 71.4%, respectively. The combined system improves over the Challenge baseline by 5.5% absolute (8.4% relative), also improving upon our previous best result.
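Unweighted average recall (UAR), the metric quoted above, is the mean of per-class recalls; scikit-learn's macro-averaged recall computes exactly this quantity. The labels in the sketch below are illustrative only.

```python
# UAR = mean of per-class recalls.
from sklearn.metrics import recall_score

def uar(y_true, y_pred):
    return recall_score(y_true, y_pred, average="macro")

# Example: recall(sober) = 1/1, recall(intox) = 1/2, so UAR = 0.75.
print(uar(["sober", "intox", "intox"], ["sober", "intox", "sober"]))
```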
Report of an international symposium on drugs and driving
DOT National Transportation Integrated Search
1975-06-30
This report presents the proceedings of a Symposium on Drugs (other than alcohol) and Driving. Speaker's papers and work session summaries are included. Major topics include: Overview of Problem, Risk Identification, Drug Measurement in Biological Ma...
Individual differences in selective attention predict speech identification at a cocktail party.
Oberfeld, Daniel; Klöckner-Nowotny, Felicitas
2016-08-31
Listeners with normal hearing show considerable individual differences in speech understanding when competing speakers are present, as in a crowded restaurant. Here, we show that one source of this variance is individual differences in the ability to focus selective attention on a target stimulus in the presence of distractors. In 50 young normal-hearing listeners, performance in tasks measuring auditory and visual selective attention was associated with sentence identification in the presence of spatially separated competing speakers. Together, the measures of selective attention explained a proportion of variance similar to that explained by binaural sensitivity to the acoustic temporal fine structure. Working memory span, age, and audiometric thresholds showed no significant association with speech understanding. These results suggest that a reduced ability to focus attention on a target is one reason why some listeners with normal hearing sensitivity have difficulty communicating in situations with background noise.
Analysis of wolves and sheep. Final report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogden, J.; Papcun, G.; Zlokarnik, I.
1997-08-01
In evaluating speaker verification systems, asymmetries have been observed in the ease with which people are able to break into other people's voice locks. People who are good at breaking into voice locks are called wolves, and people whose locks are easy to break into are called sheep. (Goats are people who have a difficult time opening their own voice locks.) Analyses of speaker verification algorithms could be used to understand wolf/sheep asymmetries. Using the notion of a "speaker space," it is demonstrated that such asymmetries could arise even though the similarity of voice 1 to voice 2 is the same as the inverse similarity. This partially explains the wolf/sheep asymmetries, although there may be other factors. The speaker space can be computed from interspeaker similarity data using multidimensional scaling, and such a speaker space can be used to give a good approximation of the interspeaker similarities. The derived speaker space can be used to predict which of the enrolled speakers are likely to be wolves and which are likely to be sheep. However, a speaker must first enroll in the speaker key system and then be compared to each of the other speakers; a good estimate of a person's speaker space position could be obtained using only a speech sample.
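A speaker space of the kind described can be recovered from pairwise dissimilarity judgments with multidimensional scaling; a minimal sketch follows, in which scikit-learn, the symmetrization step, and the dimensionality are assumptions rather than the report's actual method.

```python
# Hedged sketch: derive a "speaker space" from interspeaker dissimilarities.
import numpy as np
from sklearn.manifold import MDS

def speaker_space(dissimilarity, n_dims=2):
    """dissimilarity: square matrix of pairwise speaker dissimilarities."""
    d = (dissimilarity + dissimilarity.T) / 2.0  # MDS needs a symmetric matrix
    mds = MDS(n_components=n_dims, dissimilarity="precomputed", random_state=0)
    return mds.fit_transform(d)                  # one coordinate row per speaker
```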
Entropy Based Classifier Combination for Sentence Segmentation
2007-01-01
[Extraction fragment: the surviving text notes that a speaker diarization system is used to divide the audio data into hypothetical speakers, and that the prosodic features also include turn-based features which describe the position of a word in relation to the diarization segmentation; the remainder is reference fragments, citing the ICSI-SRI fall 2004 diarization system (Proc. RT-04F Workshop, 2004) and the NIST rich transcription fall 2003 evaluation plan (http://nist.gov/speech/tests/rt/rt2003/fall/docs/rt03-fall-eval-plan-v9.pdf).]
The speakers' bureau system: a form of peer selling.
Reid, Lynette; Herder, Matthew
2013-01-01
In the speakers' bureau system, physicians are recruited and trained by pharmaceutical, biotechnology, and medical device companies to deliver information about products to other physicians, in exchange for a fee. Using publicly available disclosures, we assessed the thesis that speakers' bureau involvement is not a feature of academic medicine in Canada, by estimating the prevalence of participation in speakers' bureaus among Canadian faculty in one medical specialty, cardiology. We analyzed the relevant features of an actual contract made public by the physician addressee and applied the Canadian Medical Association (CMA) guidelines on physician-industry relations to participation in a speakers' bureau. We argue that speakers' bureau participation constitutes a form of peer selling that should be understood to contravene the prohibition on product endorsement in the CMA Code of Ethics. Academic medical institutions, in conjunction with regulatory colleges, should continue and strengthen their policies to address participation in speakers' bureaus.
Greek perception and production of an English vowel contrast: A preliminary study
NASA Astrophysics Data System (ADS)
Podlipský, Václav J.
2005-04-01
This study focused on language-independent principles functioning in the acquisition of second language (L2) contrasts. Specifically, it tested Bohn's Desensitization Hypothesis [in Speech Perception and Linguistic Experience: Issues in Cross-Language Research, edited by W. Strange (York Press, Baltimore, 1995)], which predicted that Greek speakers of English as an L2 would base their perceptual identification of English /i/ and /I/ on durational differences. Synthetic vowels differing orthogonally in duration and spectrum between the /i/ and /I/ endpoints served as stimuli for a forced-choice identification test. To assess L2 proficiency and to evaluate the possibility of cross-language category assimilation, productions of English /i/, /I/, and /ɛ/ and of Greek /i/ and /e/ were elicited and analyzed acoustically. The L2 utterances were also rated for degree of foreign accent. Two native speakers of Modern Greek with low experience and two with intermediate experience in English participated. Six native English (NE) listeners and six NE speakers tested in an earlier study constituted the control groups. Heterogeneous perceptual behavior was observed for the L2 subjects. It is concluded that until acquisition in completely naturalistic settings is tested, possible interference from formally induced meta-linguistic differentiation between a "short" and a "long" vowel cannot be eliminated.
Speaker diarization system on the 2007 NIST rich transcription meeting recognition evaluation
NASA Astrophysics Data System (ADS)
Sun, Hanwu; Nwe, Tin Lay; Koh, Eugene Chin Wei; Bin, Ma; Li, Haizhou
2007-09-01
This paper presents a speaker diarization system developed at the Institute for Infocomm Research (I2R) for the NIST Rich Transcription 2007 (RT-07) evaluation task. We describe in detail our primary approaches to speaker diarization under the Multiple Distant Microphone (MDM) conditions in the conference room scenario. Our proposed system consists of six modules: (1) a normalized least-mean-square (NLMS) adaptive filter for speaker direction estimation via Time Difference of Arrival (TDOA); (2) initial speaker clustering via a two-stage TDOA histogram distribution quantization approach; (3) multiple-microphone speaker data alignment via GCC-PHAT Time Delay Estimation (TDE) among all the distant microphone channel signals; (4) a speaker clustering algorithm based on a GMM modeling approach; (5) non-speech removal via a speech/non-speech verification mechanism; and (6) silence removal via a "Double-Layer Windowing" (DLW) method. We achieve an error rate of 31.02% on the 2006 Spring (RT-06s) MDM evaluation task and a competitive overall error rate of 15.32% on the NIST Rich Transcription 2007 (RT-07) MDM evaluation task.
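The GCC-PHAT time-delay estimate of module (3) has a standard form: whiten the cross-spectrum so only phase remains, then pick the lag of the correlation peak. A hedged sketch, with the padding and regularization constant as assumptions:

```python
# GCC-PHAT time-delay estimate between two microphone channels.
import numpy as np

def gcc_phat_delay(x, y, fs, max_tau=None):
    n = len(x) + len(y)
    R = np.fft.rfft(x, n=n) * np.conj(np.fft.rfft(y, n=n))
    R /= np.abs(R) + 1e-12               # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs  # delay of x w.r.t. y (s)
```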
Gayle, Alberto Alexander; Shimaoka, Motomu
2017-01-01
The predominance of English in scientific research has created hurdles for "non-native speakers" of English. Here we present a novel application of native language identification (NLI) for the assessment of medical-scientific writing. For this purpose, we created a novel classification system whereby scoring is based solely on text features found to be distinctive among native English speakers (NS) within a given context. We dubbed this the "Genuine Index" (GI). The methodology was validated using a small set of journals in the field of pediatric oncology. Our dataset consisted of 5,907 abstracts, representing work from 77 countries. A support vector machine (SVM) was used to generate our model and for scoring. Accuracy, precision, and recall of the classification model were 93.3%, 93.7%, and 99.4%, respectively. Class-specific F-scores were 96.5% for NS and 39.8% for our benchmark class, Japan. Overall kappa was calculated to be 37.2%. We found significant differences between countries with respect to the GI score. Significant correlation was found between GI scores and two validated objective measures of writing proficiency and readability. Two sets of key terms and phrases differentiating NS and non-native writing were identified. Our GI model was able to detect, with a high degree of reliability, subtle differences between the terms and phrasing used by native and non-native speakers in peer-reviewed journals in the field of pediatric oncology. In addition, L1 language transfer was found to be very likely to survive revision, especially in non-Western countries such as Japan. These findings show that even when the language used is technically correct, there may still be some phrasing or usage that impacts quality.
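A minimal sketch of the scoring idea: train an SVM on NS versus non-NS abstracts and treat the signed distance from the hyperplane as a "Genuine Index"-like score. The TF-IDF n-gram features and LinearSVC below are assumptions, not the authors' exact pipeline.

```python
# Hedged NLI-style scoring sketch.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def train_gi_model(texts, labels):
    """labels: 1 for native-English-speaker (NS) abstracts, 0 otherwise."""
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 3), sublinear_tf=True),
        LinearSVC(),
    )
    return model.fit(texts, labels)

def genuine_index(model, text):
    """Signed distance from the NS/non-NS hyperplane, used as a score."""
    return float(model.decision_function([text])[0])
```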
Robust speaker's location detection in a vehicle environment using GMM models.
Hu, Jwu-Sheng; Cheng, Chieh-Cheng; Liu, Wei-Han
2006-04-01
Human-computer interaction (HCI) using speech communication is becoming increasingly important, especially in driving, where safety is the primary concern. Knowing the speaker's location (i.e., speaker localization) not only improves the enhancement results for a corrupted signal, but also provides assistance to speaker identification. Since conventional speech localization algorithms suffer from the uncertainties of environmental complexity and noise, as well as from the microphone mismatch problem, they are frequently not robust in practice. Without high reliability, the acceptance of speech-based HCI will never be realized. This work presents a novel speaker location detection method and demonstrates high accuracy within a vehicle cabin using a single linear microphone array. The proposed approach utilizes Gaussian mixture models (GMMs) to model the distributions of the phase differences among the microphones caused by the complex characteristics of room acoustics and microphone mismatch. The model can be applied in both near-field and far-field situations in a noisy environment. The individual Gaussian components of a GMM represent general location-dependent but content- and speaker-independent phase difference distributions. Moreover, the scheme performs well not only in non-line-of-sight cases, but also when the speakers are aligned toward the microphone array but at different distances from it. This strong performance can be achieved by exploiting the fact that the phase difference distributions at different locations are distinguishable in the environment of a car. The experimental results also show that the proposed method outperforms the conventional multiple signal classification (MUSIC) technique at various SNRs.
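The location-dependent features being modeled here are per-frequency phase differences between microphone pairs. A sketch of extracting them, after which one GMM per candidate location can be fit as in standard GMM classification; the frame and hop sizes are illustrative assumptions.

```python
# Hedged sketch of phase-difference features for location-dependent GMMs.
import numpy as np

def phase_differences(x1, x2, frame=512, hop=256):
    """Per-frame phase-difference vectors between two microphone channels."""
    win = np.hanning(frame)
    feats = []
    for start in range(0, min(len(x1), len(x2)) - frame, hop):
        X1 = np.fft.rfft(x1[start:start + frame] * win)
        X2 = np.fft.rfft(x2[start:start + frame] * win)
        feats.append(np.angle(X1 * np.conj(X2)))  # phase of the cross-spectrum
    return np.array(feats)   # rows: training vectors for a location's GMM
```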
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bouchard, Kristofer E.; Conant, David F.; Anumanchipalli, Gopala K.
A complete neurobiological understanding of speech motor control requires determination of the relationship between simultaneously recorded neural activity and the kinematics of the lips, jaw, tongue, and larynx. Many speech articulators are internal to the vocal tract, and therefore simultaneously tracking the kinematics of all articulators is nontrivial, especially in the context of human electrophysiology recordings. Here, we describe a noninvasive, multi-modal imaging system to monitor vocal tract kinematics, demonstrate this system in six speakers during production of nine American English vowels, and provide new analysis of such data. Classification and regression analysis revealed considerable variability in the articulator-to-acoustic relationship across speakers. Non-negative matrix factorization extracted basis sets capturing vocal tract shapes allowing for higher vowel classification accuracy than traditional methods. Statistical speech synthesis generated speech from vocal tract measurements, and we demonstrate perceptual identification. We demonstrate the capacity to predict lip kinematics from ventral sensorimotor cortical activity. These results demonstrate a multi-modal system to non-invasively monitor articulator kinematics during speech production, describe novel analytic methods for relating kinematic data to speech acoustics, and provide the first decoding of speech kinematics from electrocorticography. These advances will be critical for understanding the cortical basis of speech production and the creation of vocal prosthetics.
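The NMF step (extracting basis sets that capture vocal tract shapes from non-negative kinematic measurements) can be sketched as follows; scikit-learn, the feature layout, and the component count are assumptions, not the authors' implementation.

```python
# Hedged NMF sketch for extracting vocal tract shape bases.
from sklearn.decomposition import NMF

def vocal_tract_bases(X, n_bases=8):
    """X: (trials, articulator features), non-negative kinematic matrix."""
    nmf = NMF(n_components=n_bases, init="nndsvda", max_iter=500)
    W = nmf.fit_transform(X)  # per-trial weights, usable for vowel classification
    H = nmf.components_       # basis set capturing vocal tract shapes
    return W, H
```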
Brain systems mediating voice identity processing in blind humans.
Hölig, Cordula; Föcker, Julia; Best, Anna; Röder, Brigitte; Büchel, Christian
2014-09-01
Blind people rely more on vocal cues when they recognize a person's identity than sighted people do. Indeed, a number of studies have reported better voice recognition skills in blind than in sighted adults. The present functional magnetic resonance imaging study investigated changes in the functional organization of neural systems involved in voice identity processing following congenital blindness. A group of congenitally blind individuals and matched sighted control participants were tested in a priming paradigm, in which two voice stimuli (S1, S2) were presented in succession. The prime (S1) and the target (S2) were either from the same speaker (person-congruent voices) or from two different speakers (person-incongruent voices). Participants had to classify the S2 as either an old or a young person. Person-incongruent voices (S2) compared with person-congruent voices elicited increased activation in the right anterior fusiform gyrus in congenitally blind individuals but not in matched sighted control participants. In contrast, only matched sighted controls showed higher activation in response to person-incongruent compared with person-congruent voices (S2) in the right posterior superior temporal sulcus. These results provide evidence for crossmodal plastic changes of the person identification system in the brain after visual deprivation.
NASA Astrophysics Data System (ADS)
Karam, Walid; Mokbel, Chafic; Greige, Hanna; Chollet, Gerard
2006-05-01
A GMM-based audio-visual speaker verification system is described, and an Active Appearance Model with a linear speaker transformation system is used to evaluate the robustness of the verification. An Active Appearance Model (AAM) is used to automatically locate and track a speaker's face in a video recording. A Gaussian Mixture Model (GMM) based classifier (BECARS) is used for face verification. GMM training and testing is accomplished on DCT-based features extracted from the detected faces. On the audio side, speech features are extracted and used for speaker verification with the GMM-based classifier. Fusion of the audio and video modalities for audio-visual speaker verification is compared with the face verification and speaker verification systems alone. To improve the robustness of the multimodal biometric identity verification system, an audio-visual imposture system is envisioned. It consists of an automatic voice transformation technique that an impostor may use to assume the identity of an authorized client. Features of the transformed voice are then combined with the corresponding appearance features and fed into the GMM-based system BECARS for training. An attempt is made to increase the acceptance rate of the impostor and to analyze the robustness of the verification system. Experiments are being conducted on the BANCA database, with the prospect of experimenting on the newly developed PDAtabase created within the scope of the SecurePhone project.
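Score-level fusion of the two modalities can be as simple as a weighted sum of per-modality verification scores; the abstract does not specify the fusion rule, so the weight and threshold below are purely illustrative.

```python
# Hedged score-level fusion sketch; alpha and the threshold are assumptions.
def fused_score(audio_score, video_score, alpha=0.5):
    """Weighted sum of audio and video verification scores."""
    return alpha * audio_score + (1.0 - alpha) * video_score

def accept(audio_score, video_score, threshold=0.0):
    return fused_score(audio_score, video_score) >= threshold
```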
Speaker-Machine Interaction in Automatic Speech Recognition. Technical Report.
ERIC Educational Resources Information Center
Makhoul, John I.
The feasibility and limitations of speaker adaptation in improving the performance of a "fixed" (speaker-independent) automatic speech recognition system were examined. A fixed vocabulary of 55 syllables is used in the recognition system which contains 11 stops and fricatives and five tense vowels. The results of an experiment on speaker…
Respiratory Control in Stuttering Speakers: Evidence from Respiratory High-Frequency Oscillations.
ERIC Educational Resources Information Center
Denny, Margaret; Smith, Anne
2000-01-01
This study examined whether stuttering speakers (N=10) differed from fluent speakers in relations between the neural control systems for speech and life support. It concluded that in some stuttering speakers the relations between respiratory controllers are atypical, but that high participation by the high frequency oscillation-producing circuitry…
ERIC Educational Resources Information Center
Binder, Richard
The thesis of this paper is that the "do so" test described by Lakoff and Ross (1966) is a test of the speaker's belief system regarding the relationship of verbs to their surface subject, and that judgments of grammaticality concerning "do so" are based on the speaker's underlying semantic beliefs. ("Speaker" refers here to both speakers and…
Multilevel Analysis in Analyzing Speech Data
ERIC Educational Resources Information Center
Guddattu, Vasudeva; Krishna, Y.
2011-01-01
The speech produced by the human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…
ERIC Educational Resources Information Center
Penta, Darrell J.
2017-01-01
The sentence production system transforms preverbal messages in the mind of a speaker into coherent grammatical utterances. During this process, which unfolds rapidly, the system has to link meaning information from the speaker's message to appropriate lexical and grammatical information from the speaker's memory. It usually does so with fluency…
A cross-language study of perception of lexical stress in English.
Yu, Vickie Y; Andruski, Jean E
2010-08-01
This study investigates the question of whether language background affects the perception of lexical stress in English. Thirty native English speakers and 30 native Chinese learners of English participated in a stressed-syllable identification task and a discrimination task involving three types of stimuli (real words/pseudowords/hums). The results show that both language groups were able to identify and discriminate stress patterns. Lexical and segmental information affected the English and Chinese speakers to varying degrees. English and Chinese speakers showed different response patterns to trochaic vs. iambic stress across the three types of stimuli. An acoustic analysis revealed that the two language groups used different acoustic cues to process lexical stress. The findings suggest that the different degrees of lexical and segmental effects can be explained by language background, which in turn supports the hypothesis that language background affects the perception of lexical stress in English.
Brain Plasticity in Speech Training in Native English Speakers Learning Mandarin Tones
NASA Astrophysics Data System (ADS)
Heinzen, Christina Carolyn
The current study employed behavioral and event-related potential (ERP) measures to investigate brain plasticity associated with second-language (L2) phonetic learning based on an adaptive computer training program. The program utilized the acoustic characteristics of Infant-Directed Speech (IDS) to train monolingual American English-speaking listeners to perceive Mandarin lexical tones. Behavioral identification and discrimination tasks were conducted using naturally recorded speech, carefully controlled synthetic speech, and non-speech control stimuli. The ERP experiments were conducted with selected synthetic speech stimuli in a passive listening oddball paradigm. Identical pre- and post-tests were administered to nine adult listeners, who completed two to three hours of perceptual training. The perceptual training sessions used pair-wise lexical tone identification and progressed through seven levels of difficulty for each tone pair. The levels of difficulty included progression in speaker variability from one to four speakers and progression through four levels of acoustic exaggeration of duration, pitch range, and pitch contour. Behavioral results for the natural speech stimuli revealed significant training-induced improvement in identification of Tones 1, 3, and 4. Improvements in identification of Tone 4 generalized to novel stimuli as well. Additionally, comparison between discrimination of across-category and within-category stimulus pairs taken from a synthetic continuum revealed a training-induced shift toward more native-like categorical perception of the Mandarin lexical tones. Analysis of the Mismatch Negativity (MMN) responses in the ERP data revealed increased amplitude and decreased latency for pre-attentive processing of across-category discrimination as a result of training. There were also laterality changes in the MMN responses to the non-speech control stimuli, which could reflect reallocation of brain resources in processing pitch patterns for the across-category lexical tone contrast. Overall, the results support the use of IDS characteristics in training non-native speech contrasts and provide impetus for further research.
Speaker Clustering for a Mixture of Singing and Reading (Preprint)
2012-03-01
diarization [2, 3], which answers the question of "who spoke when?", is a combination of speaker segmentation and clustering. Although it is possible to...focuses on speaker clustering, the techniques developed here can be applied to speaker diarization. For the remainder of this paper, the term "speech...and retrieval," Proceedings of the IEEE, vol. 88, 2000. [2] S. Tranter and D. Reynolds, "An overview of automatic speaker diarization systems," IEEE
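The clustering half of that segmentation-plus-clustering pipeline can be sketched compactly. The snippet below assumes per-segment speaker embeddings are already available (the array and threshold are hypothetical) and groups them by cosine distance; it illustrates the general technique, not the preprint's method.

```python
# Sketch of the clustering stage of diarization: group per-segment
# speaker embeddings by cosine distance (requires scikit-learn >= 1.2
# for the `metric` parameter).
from sklearn.cluster import AgglomerativeClustering

def cluster_speakers(embeddings, distance_threshold=0.7):
    clu = AgglomerativeClustering(n_clusters=None,
                                  distance_threshold=distance_threshold,
                                  metric="cosine", linkage="average")
    return clu.fit_predict(embeddings)   # one speaker label per segment
```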
"The perceptual bases of speaker identity" revisited
NASA Astrophysics Data System (ADS)
Voiers, William D.
2003-10-01
A series of experiments begun 40 years ago [W. D. Voiers, J. Acoust. Soc. Am. 36, 1065-1073 (1964)] was concerned with identifying the perceived voice traits (PVTs) on which human recognition of voices depends. It culminated with the development of a voice taxonomy based on 20 PVTs and a set of highly reliable rating scales for classifying voices with respect to those PVTs. The development of a perceptual voice taxonomy was motivated by the need for a practical method of evaluating speaker recognizability in voice communication systems. The Diagnostic Speaker Recognition Test (DSRT) evaluates the effects of systems on speaker recognizability as reflected in changes in the inter-listener reliability of voice ratings on the 20 PVTs. The DSRT thus provides a qualitative, as well as quantitative, evaluation of the effects of a system on speaker recognizability. A fringe benefit of this project is PVT rating data for a sample of 680 voices. [Work partially supported by USAFRL.]
Application of the wavelet transform for speech processing
NASA Technical Reports Server (NTRS)
Maes, Stephane
1994-01-01
Speaker identification and word spotting will shortly play a key role in space applications. An approach based on the wavelet transform is presented that, in the context of the 'modulation model,' enables extraction of speech features which are used as input for the classification process.
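As a rough illustration of wavelet-derived speech features (not the paper's specific modulation-model features), subband log-energies from a discrete wavelet decomposition give a compact vector for a downstream classifier; the sketch assumes the PyWavelets package and an arbitrary wavelet and depth.

```python
# Sketch: wavelet subband log-energies as classifier input features.
import numpy as np
import pywt

def wavelet_features(signal, wavelet="db4", level=5):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # one log-energy per subband, coarsest to finest
    return np.array([np.log(np.sum(c ** 2) + 1e-12) for c in coeffs])
```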
Individual differences in selective attention predict speech identification at a cocktail party
Oberfeld, Daniel; Klöckner-Nowotny, Felicitas
2016-01-01
Listeners with normal hearing show considerable individual differences in speech understanding when competing speakers are present, as in a crowded restaurant. Here, we show that one source of this variance is individual differences in the ability to focus selective attention on a target stimulus in the presence of distractors. In 50 young normal-hearing listeners, performance in tasks measuring auditory and visual selective attention was associated with sentence identification in the presence of spatially separated competing speakers. Together, the measures of selective attention explained a proportion of variance similar to that explained by binaural sensitivity to the acoustic temporal fine structure. Working memory span, age, and audiometric thresholds showed no significant association with speech understanding. These results suggest that a reduced ability to focus attention on a target is one reason why some listeners with normal hearing sensitivity have difficulty communicating in situations with background noise. DOI: http://dx.doi.org/10.7554/eLife.16747.001 PMID:27580272
Blue-green color categorization in Mandarin-English speakers.
Wuerger, Sophie; Xiao, Kaida; Mylonas, Dimitris; Huang, Qingmei; Karatzas, Dimosthenis; Hird, Emily; Paramei, Galina
2012-02-01
Observers are faster to detect a target among a set of distracters if the targets and distracters come from different color categories. This cross-boundary advantage seems to be limited to the right visual field, which is consistent with the dominance of the left hemisphere for language processing [Gilbert et al., Proc. Natl. Acad. Sci. USA 103, 489 (2006)]. Here we study whether a similar visual field advantage is found in the color identification task in speakers of Mandarin, a language that uses a logographic system. Forty late Mandarin-English bilinguals performed a blue-green color categorization task, in a blocked design, in their first language (L1: Mandarin) or second language (L2: English). Eleven color singletons ranging from blue to green were presented for 160 ms, randomly in the left visual field (LVF) or right visual field (RVF). Color boundary and reaction times (RTs) at the color boundary were estimated in L1 and L2, for both visual fields. We found that the color boundary did not differ between the languages; RTs at the color boundary, however, were on average more than 100 ms shorter in the English compared to the Mandarin sessions, but only when the stimuli were presented in the RVF. The finding may be explained by the script nature of the two languages: Mandarin logographic characters are analyzed visuospatially in the right hemisphere, which conceivably facilitates identification of color presented to the LVF. © 2012 Optical Society of America
Flege, J E; Hillenbrand, J
1986-02-01
This study examined the effect of linguistic experience on perception of the English /s/-/z/ contrast in word-final position. The durations of the periodic ("vowel") and aperiodic ("fricative") portions of stimuli, ranging from peas to peace, were varied in a 5 × 5 factorial design. Forced-choice identification judgments were elicited from two groups of native speakers of American English differing in dialect, and from two groups each of native speakers of French, Swedish, and Finnish differing in English-language experience. The results suggested that the non-native subjects used cues established for the perception of phonetic contrasts in their native language to identify fricatives as /s/ or /z/. Lengthening vowel duration increased /z/ judgments in all eight subject groups, although the effect was smaller for native speakers of French than for native speakers of the other languages. Shortening fricative duration, on the other hand, significantly decreased /z/ judgments only by the English and French subjects. It did not influence voicing judgments by the Swedish and Finnish subjects, even those who had lived for a year or more in an English-speaking environment. These findings raise the question of whether adults who learn a foreign language can acquire the ability to integrate multiple acoustic cues to a phonetic contrast which does not exist in their native language.
Temporal and acoustic characteristics of Greek vowels produced by adults with cerebral palsy
NASA Astrophysics Data System (ADS)
Botinis, Antonis; Orfanidou, Ioanna; Fourakis, Marios
2005-09-01
The present investigation examined the temporal and spectral characteristics of Greek vowels as produced by speakers with intact (NO) versus cerebral palsy affected (CP) neuromuscular systems. Six NO and six CP native speakers of Greek produced the Greek vowels [i, e, a, o, u] in the first syllable of CVCV nonsense words in a short carrier phrase. Stress could be on either the first or second syllable. There were three female and three male speakers in each group. In terms of temporal characteristics, the results showed that: vowels produced by CP speakers were longer than vowels produced by NO speakers; stressed vowels were longer than unstressed vowels; vowels produced by female speakers were longer than vowels produced by male speakers. In terms of spectral characteristics the results showed that the vowel space of the CP speakers was smaller than that of the NO speakers. This is similar to the results recently reported by Liu et al. [J. Acoust. Soc. Am. 117, 3879-3889 (2005)] for CP speakers of Mandarin. There was also a reduction of the acoustic vowel space defined by unstressed vowels, but this reduction was much more pronounced in the vowel productions of CP speakers than NO speakers.
NASA Astrophysics Data System (ADS)
Mosko, J. D.; Stevens, K. N.; Griffin, G. R.
1983-08-01
Acoustical analyses were conducted of words produced by four speakers in a motion stress-inducing situation. The aim of the analyses was to document the kinds of changes that occur in the vocal utterances of speakers who are exposed to motion stress and to comment on the implications of these results for the design and development of voice interactive systems. The speakers differed markedly in the types and magnitudes of the changes that occurred in their speech. For some speakers, the stress-inducing experimental condition caused an increase in fundamental frequency, changes in the pattern of vocal fold vibration, shifts in vowel production, and changes in the relative amplitudes of sounds containing turbulence noise. All speakers showed greater variability in the experimental condition than in the more relaxed control condition. The variability was manifested in the acoustical characteristics of individual phonetic elements, particularly in unstressed syllables. The kinds of changes and variability observed serve to emphasize the limitations of speech recognition systems based on template matching of patterns that are stored in the system during a training phase. There is a need for a better understanding of these phonetic modifications and for developing ways of incorporating knowledge about these changes within a speech recognition system.
Feedforward and feedback control in apraxia of speech: effects of noise masking on vowel production.
Maas, Edwin; Mailend, Marja-Liisa; Guenther, Frank H
2015-04-01
This study was designed to test two hypotheses about apraxia of speech (AOS) derived from the Directions Into Velocities of Articulators (DIVA) model (Guenther et al., 2006): the feedforward system deficit hypothesis and the feedback system deficit hypothesis. The authors used noise masking to minimize auditory feedback during speech. Six speakers with AOS and aphasia, 4 with aphasia without AOS, and 2 groups of speakers without impairment (younger and older adults) participated. Acoustic measures of vowel contrast, variability, and duration were analyzed. Younger, but not older, speakers without impairment showed significantly reduced vowel contrast with noise masking. Relative to older controls, the AOS group showed longer vowel durations overall (regardless of masking condition) and a greater reduction in vowel contrast under masking conditions. There were no significant differences in variability. Three of the 6 speakers with AOS demonstrated the group pattern. Speakers with aphasia without AOS did not differ from controls in contrast, duration, or variability. The greater reduction in vowel contrast with masking noise for the AOS group is consistent with the feedforward system deficit hypothesis but not with the feedback system deficit hypothesis; however, effects were small and not present in all individual speakers with AOS. Theoretical implications and alternative interpretations of these findings are discussed.
Factor analysis of auto-associative neural networks with application in speaker verification.
Garimella, Sri; Hermansky, Hynek
2013-04-01
Auto-associative neural network (AANN) is a fully connected feed-forward neural network, trained to reconstruct its input at its output through a hidden compression layer, which has fewer nodes than the dimensionality of the input. AANNs are used to model speakers in speaker verification, where a speaker-specific AANN model is obtained by adapting (or retraining) the universal background model (UBM) AANN, an AANN trained on multiple held-out speakers, using the corresponding speaker data. When the amount of speaker data is limited, this adaptation procedure may lead to overfitting, as all the parameters of the UBM-AANN are adapted. In this paper, we introduce and develop the factor analysis theory of AANNs to alleviate this problem. We hypothesize that only the weight matrix connecting the last nonlinear hidden layer and the output layer is speaker-specific, and further restrict it to a common low-dimensional subspace during adaptation. The subspace is learned using large amounts of development data and is held fixed during adaptation. Thus, only the coordinates in the subspace, also known as an i-vector, need to be estimated using speaker-specific data. The update equations are derived for learning both the common low-dimensional subspace and the i-vectors corresponding to speakers in the subspace. The resultant i-vector representation is used as a feature for the probabilistic linear discriminant analysis model. The proposed system shows promising results on the NIST-08 speaker recognition evaluation (SRE), and yields a 23% relative improvement in equal error rate over the previously proposed weighted least squares-based subspace AANN system. The experiments on NIST-10 SRE confirm that these improvements are consistent and generalize across datasets.
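The auto-associative idea itself is easy to demonstrate: a network trained to reproduce its input through a narrow bottleneck, with reconstruction error serving as a match score. The stand-in below uses scikit-learn's MLPRegressor with arbitrary layer sizes; it illustrates only the basic AANN, not the paper's factor-analysis extension.

```python
# Minimal AANN stand-in: an MLP trained to reconstruct its own input
# through a narrow hidden (compression) layer.
from sklearn.neural_network import MLPRegressor

def train_aann(features, compression_dim=13):
    aann = MLPRegressor(hidden_layer_sizes=(64, compression_dim, 64),
                        activation="tanh", max_iter=500)
    return aann.fit(features, features)      # target equals input

def reconstruction_error(aann, features):
    recon = aann.predict(features)
    return ((features - recon) ** 2).mean(axis=1)   # low for matched speaker
```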
DOE Office of Scientific and Technical Information (OSTI.GOV)
McClanahan, Richard; De Leon, Phillip L.
2014-08-20
The majority of state-of-the-art speaker recognition (SR) systems utilize speaker models that are derived from an adapted universal background model (UBM) in the form of a Gaussian mixture model (GMM). This is true for GMM supervector systems, joint factor analysis systems, and most recently i-vector systems. In all of these systems, the posterior probability and sufficient statistics calculations represent a computational bottleneck in both enrollment and testing. We propose a multi-layered hash system, employing a tree-structured GMM-UBM which uses Runnalls' Gaussian mixture reduction technique, in order to reduce the number of these calculations. With this tree-structured hash, we can trade off a reduction in computation against a corresponding degradation of equal error rate (EER). As an example, we reduce this computation by a factor of 15× while incurring less than 10% relative degradation of EER (or 0.3% absolute EER) when evaluated with NIST 2010 speaker recognition evaluation (SRE) telephone data.
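The speedup idea can be shown generically: score a handful of coarse Gaussians first, then evaluate only the UBM components assigned to the winning coarse node. The sketch below assumes a precomputed node-to-component mapping and a diagonal-covariance UBM stored in a dict; it illustrates the preselection pattern, not the authors' Runnalls-based construction.

```python
# Generic tree/hash-style component preselection for a GMM-UBM.
import numpy as np
from scipy.stats import multivariate_normal

def select_node(frame, coarse_means, coarse_covs):
    scores = [multivariate_normal.logpdf(frame, m, c)
              for m, c in zip(coarse_means, coarse_covs)]
    return int(np.argmax(scores))            # best-scoring coarse Gaussian

def fast_frame_loglik(frame, node, node_to_components, ubm):
    idx = node_to_components[node]           # small subset of component ids
    lls = [np.log(ubm["w"][i]) + multivariate_normal.logpdf(
               frame, ubm["mu"][i], np.diag(ubm["var"][i])) for i in idx]
    return np.logaddexp.reduce(lls)          # log-sum over surviving components
```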
Neural Systems Involved When Attending to a Speaker
Kamourieh, Salwa; Braga, Rodrigo M.; Leech, Robert; Newbould, Rexford D.; Malhotra, Paresh; Wise, Richard J. S.
2015-01-01
Remembering what a speaker said depends on attention. During conversational speech, the emphasis is on working memory, but listening to a lecture encourages episodic memory encoding. With simultaneous interference from background speech, the need for auditory vigilance increases. We recreated these context-dependent demands on auditory attention in 2 ways. The first was to require participants to attend to one speaker in either the absence or presence of a distracting background speaker. The second was to alter the task demand, requiring either an immediate or delayed recall of the content of the attended speech. Across 2 fMRI studies, common activated regions associated with segregating attended from unattended speech were the right anterior insula and adjacent frontal operculum (aI/FOp), the left planum temporale, and the precuneus. In contrast, activity in a ventral right frontoparietal system was dependent on both the task demand and the presence of a competing speaker. Additional multivariate analyses identified other domain-general frontoparietal systems, where activity increased during attentive listening but was modulated little by the need for speech stream segregation in the presence of 2 speakers. These results make predictions about impairments in attentive listening in different communicative contexts following focal or diffuse brain pathology. PMID:25596592
Korean Word Frequency and Commonality Study for Augmentative and Alternative Communication
ERIC Educational Resources Information Center
Shin, Sangeun; Hill, Katya
2016-01-01
Background: Vocabulary frequency results have been reported to design and support augmentative and alternative communication (AAC) interventions. A few studies exist for adult speakers and for other natural languages. With the increasing demand on AAC treatment for Korean adults, identification of high-frequency or core vocabulary (CV) becomes…
A Report by the Air Pollution Committee
ERIC Educational Resources Information Center
Kirkpatrick, Lane
1972-01-01
Description of a "Symposium on Air Resource Protection and the Environment," held at the 1972 Environmental Health Conference and Exposition. Reports included a mathematical model for predicting future levels of air pollution, evaluation and identification of transportation controls, and a panel discussion of points raised by the speakers. (LK)
Myths and Political Rhetoric: Jimmy Carter Accepts the Nomination.
ERIC Educational Resources Information Center
Corso, Dianne M.
Like other political speakers who have drawn on the personification, identification, and dramatic encounter images of mythology to pressure and persuade audiences, Jimmy Carter evoked the myths of the hero, the American Dream, and the ideal political process in his presidential nomination acceptance speech. By stressing his unknown status, his…
Inside-in, alternative paradigms for sound spatialization
NASA Astrophysics Data System (ADS)
Bahn, Curtis; Moore, Stephan
2003-04-01
Arrays of widely spaced mono-directional loudspeakers (P.A.-style stereo configurations or "outside-in" surround-sound systems) have long provided the dominant paradigms for electronic sound diffusion. So prevalent are these models that alternatives have largely been ignored and electronic sound, regardless of musical aesthetic, has come to be inseparably associated with single-channel speakers, or headphones. We recognize the value of these familiar paradigms, but believe that electronic sound can and should have many alternative, idiosyncratic voices. Through the design and construction of unique sound diffusion structures, one can reinvent the nature of electronic sound; when allied with new sensor technologies, these structures offer alternative modes of interaction with techniques of sonic computation. This paper describes several recent applications of spherical speakers (multichannel, outward-radiating geodesic speaker arrays) and Sensor-Speaker-Arrays (SenSAs: combinations of various sensor devices with outward-radiating multi-channel speaker arrays). This presentation introduces the development of four generations of spherical speakers (over a hundred individual speakers of various configurations) and their use in many different musical situations including live performance, recording, and sound installation. We describe the design and construction of these systems and, more generally, the new "voices" they give to electronic sound.
A dynamic multi-channel speech enhancement system for distributed microphones in a car environment
NASA Astrophysics Data System (ADS)
Matheja, Timo; Buck, Markus; Fingscheidt, Tim
2013-12-01
Supporting multiple active speakers in automotive hands-free or speech dialog applications is an interesting issue, not least for reasons of comfort. Therefore, a multi-channel system for enhancement of speech signals captured by distributed distant microphones in a car environment is presented. Each of the potential speakers in the car has a dedicated directional microphone close to their position that captures the corresponding speech signal. The aim of the resulting overall system is twofold: On the one hand, a combination of an arbitrary pre-defined subset of speakers' signals can be performed, e.g., to create an output signal in a hands-free telephone conference call for a far-end communication partner. On the other hand, annoying cross-talk components from interfering sound sources occurring in multiple different mixed output signals are to be eliminated, motivated by the possibility of other hands-free applications being active in parallel. The system includes several signal processing stages. A dedicated signal processing block for interfering speaker cancellation attenuates the cross-talk components of undesired speech. Further signal enhancement comprises the reduction of residual cross-talk and background noise. Subsequently, a dynamic signal combination stage merges the processed single-microphone signals to obtain appropriate mixed signals at the system output that may be passed to applications such as telephony or a speech dialog system. Based on signal power ratios between the individual microphone signals, a speaker activity detection scheme, and with it a robust control mechanism for the whole system, is presented. The proposed system may be dynamically configured and has been evaluated for a car setup with four speakers sitting in the car cabin under various noise conditions.
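The power-ratio control idea can be reduced to a few lines: a speaker counts as active in a frame when the short-term power at their dedicated microphone sufficiently exceeds the power at every other microphone. The threshold below is an illustrative assumption, not the paper's tuned value.

```python
# Sketch of power-ratio-based speaker activity detection for
# distributed (one-microphone-per-seat) car microphones.
import numpy as np

def active_speakers(frames_per_mic, ratio_db=6.0):
    # frames_per_mic: (n_mics, frame_len) array, one aligned frame per mic
    p = np.mean(frames_per_mic ** 2, axis=1) + 1e-12   # short-term powers
    active = []
    for m in range(len(p)):
        others = np.delete(p, m)
        if 10 * np.log10(p[m] / others.max()) > ratio_db:
            active.append(m)                # this seat's speaker is talking
    return active
```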
Sixth International Conference on Systems Biology (ICSB 2005)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Professor Andrew Murray
2005-10-22
This grant supported the Sixth International Conference on Systems Biology (ICSB 2005), held in Boston, Massachusetts from October 19th to 22nd, 2005. The ICSB is the only major, annual, international conference focused exclusively on the important emerging field of systems biology. It draws together scientists with expertise in theoretical, computational and experimental approaches to understanding biological systems at many levels. Previous ICSB meetings have been held in Tokyo (2000), at Caltech (2001), at the Karolinska Institute (2002), at Washington University in St. Louis (2003), and in Heidelberg (2004). These conferences have been increasingly successful at bringing together the growing community of established and junior researchers with interests in this area. Boston is home to several groups that have shown leadership in the field and was therefore an ideal place to hold this conference. The executive committee for the conference comprised Jim Collins (Biomedical Engineering, Boston University), Marc Kirschner (chair of the new Department of Systems Biology at Harvard Medical School), Eric Lander (director of the Broad Institute of MIT and Harvard), Andrew Murray (director of Harvard's Bauer Center for Genomics Research) and Peter Sorger (director of MIT's Computational and Systems Biology Initiative). There are almost as many definitions of systems biology as there are systems biologists. We take a broad view of the field, and we succeeded in one of our major aims in organizing a conference that bridges two types of divide. The first is that between traditional academic disciplines: each of our sessions includes speakers from biology and from one or more physical or quantitative sciences. The second type includes those that separate experimental biologists from their colleagues who work on theory or computation. Here again, each session included representatives from at least two of these three categories; indeed, many of the speakers combined at least two of the categories in their own research activities. We define systems biology as a widening of focus in biology from individual genes or proteins to the complex networks of these molecules that allow cells and organisms to function. In the same way that conscious thought cannot be said to reside in any single neuron in the brain, simpler biological functions such as cell division arise from the interactions among many components in a network or 'functional module'. For us, systems biology is characterized by the recognition that a higher-order description of biological function, accompanied by quantitative methods of analysis — often borrowed from disciplines such as physics, engineering, computer science or mathematics — can lead to the identification of general principles that underlie the structure, behavior, and evolution of cells and organisms. The heart of the conference was sessions on six topics: intracellular dynamics (featuring measurements on single cells, and their interpretation); biology by design (synthetic biology); intracellular networks (signal transduction and transcriptional regulation); multicellular networks (development and pattern formation); mechanics and scale in cellular behavior (featuring work on cytoskeletal mechanics, and on scaling relationships in biology); and evolution in action (including experimental evolution, of both real and artificial life-forms). Each session had four invited speakers; 23 of the 24 invited speakers attended (see below).
We have selected these speakers not only for the interest of their research, but for their skills as communicators, thereby giving us the best chance of bridging the divides mentioned above. We also made a point of including women, younger investigators and people from outside the United States among the speakers. In addition to the invited speakers, we allotted time in the program for at least five contributed talks, which were selected from the poster submissions. Our aim in selecting these contributions was to showcase work that was “hot off the bench” (or computer) at the time of the conference, and also to create additional opportunities for younger investigators to present their work. The main conference was preceded by a day of tutorials, and followed by two days of workshops, on a range of topics in quantitative, computational and systems biology.
Strength of German accent under altered auditory feedback
HOWELL, PETER; DWORZYNSKI, KATHARINA
2007-01-01
Borden’s (1979, 1980) hypothesis that speakers with vulnerable speech systems rely more heavily on feedback monitoring than do speakers with less vulnerable systems was investigated. The second language (L2) of a speaker is vulnerable, in comparison with the native language, so alteration to feedback should have a detrimental effect on it, according to this hypothesis. Here, we specifically examined whether altered auditory feedback has an effect on accent strength when speakers speak L2. There were three stages in the experiment. First, 6 German speakers who were fluent in English (their L2) were recorded under six conditions—normal listening, amplified voice level, voice shifted in frequency, delayed auditory feedback, and slowed and accelerated speech rate conditions. Second, judges were trained to rate accent strength. Training was assessed by whether it was successful in separating German speakers speaking English from native English speakers, also speaking English. In the final stage, the judges ranked recordings of each speaker from the first stage as to increasing strength of German accent. The results show that accents were more pronounced under frequency-shifted and delayed auditory feedback conditions than under normal or amplified feedback conditions. Control tests were done to ensure that listeners were judging accent, rather than fluency changes caused by altered auditory feedback. The findings are discussed in terms of Borden’s hypothesis and other accounts about why altered auditory feedback disrupts speech control. PMID:11414137
Toddlers learn words in a foreign language: The role of native vocabulary knowledge
Koenig, Melissa A.; Woodward, Amanda L.
2013-01-01
The current study examined monolingual English-speaking toddlers’ (N=50) ability to learn word-referent links from native speakers of Dutch versus English and secondly, whether children generalized or sequestered their extensions when terms were tested by a subsequent speaker of English. Overall, children performed better in the English than in the Dutch condition; however, children with high native vocabularies successfully selected the target object for terms trained in fluent Dutch. Furthermore, children with higher vocabularies did not indicate their comprehension of Dutch terms when subsequently tested by an English speaker whereas children with low vocabulary scores responded at chance levels to both the original Dutch speaker and the second English speaker. These findings demonstrate that monolingual toddlers with proficiency in their native language are capable of learning words outside of their conventional system and may be sensitive to the boundaries that exist between language systems. PMID:22310327
Effect of delayed auditory feedback on normal speakers at two speech rates
NASA Astrophysics Data System (ADS)
Stuart, Andrew; Kalinowski, Joseph; Rastatter, Michael P.; Lynch, Kerry
2002-05-01
This study investigated the effect of short and long auditory feedback delays at two speech rates with normal speakers. Seventeen participants spoke under delayed auditory feedback (DAF) at 0, 25, 50, and 200 ms at normal and fast rates of speech. Participants displayed two to three times more dysfluencies at 200 ms (p<0.05) than at no delay or the shorter delays. Significantly more dysfluencies were also observed at the fast rate of speech (p=0.028). These findings implicate the peripheral feedback system(s) of fluent speakers in the disruptive effects of DAF on normal speech production at long auditory feedback delays. Considering the contrast in fluency/dysfluency exhibited between normal speakers and those who stutter at short and long delays, it appears that speech disruption of normal speakers under DAF is a poor analog of stuttering.
Gender Differences in the Recognition of Vocal Emotions
Lausen, Adi; Schacht, Annekathrin
2018-01-01
The conflicting findings from the few studies conducted with regard to gender differences in the recognition of vocal expressions of emotion have left the exact nature of these differences unclear. Several investigators have argued that a comprehensive understanding of gender differences in vocal emotion recognition can only be achieved by replicating these studies while accounting for influential factors such as stimulus type, gender-balanced samples, number of encoders, decoders, and emotional categories. This study aimed to account for these factors by investigating whether emotion recognition from vocal expressions differs as a function of both listeners' and speakers' gender. A total of N = 290 participants were randomly and equally allocated to two groups. One group listened to words and pseudo-words, while the other group listened to sentences and affect bursts. Participants were asked to categorize the stimuli with respect to the expressed emotions in a fixed-choice response format. Overall, females were more accurate than males when decoding vocal emotions; however, when testing for specific emotions, these differences were small in magnitude. Speakers' gender had a significant impact on how listeners judged emotions from the voice. The group listening to words and pseudo-words had higher identification rates for emotions spoken by male than by female actors, whereas in the group listening to sentences and affect bursts the identification rates were higher when emotions were uttered by female than male actors. The mixed pattern for emotion-specific effects, however, indicates that, in the vocal channel, the reliability of emotion judgments is not systematically influenced by speakers' gender and the related stereotypes of emotional expressivity. Together, these results extend previous findings by showing effects of listeners' and speakers' gender on the recognition of vocal emotions. They stress the importance of distinguishing these factors to explain recognition ability in the processing of emotional prosody. PMID:29922202
Hearing history influences voice gender perceptual performance in cochlear implant users.
Kovačić, Damir; Balaban, Evan
2010-12-01
The study was carried out to assess the role that five hearing history variables (chronological age, age at onset of deafness, age of first cochlear implant [CI] activation, duration of CI use, and duration of known deafness) play in the ability of CI users to identify speaker gender. Forty-one juvenile CI users participated in two voice gender identification tasks. In a fixed, single-interval task, subjects listened to a single speech item from one of 20 adult male or 20 adult female speakers and had to identify speaker gender. In an adaptive speech-based voice gender discrimination task with the fundamental frequency difference between the voices as the adaptive parameter, subjects listened to a pair of speech items presented in sequential order, one of which was always spoken by an adult female and the other by an adult male. Subjects had to identify the speech item spoken by the female voice. Correlation and regression analyses between perceptual scores in the two tasks and the hearing history variables were performed. Subjects fell into three performance groups: (1) those who could distinguish voice gender in both tasks, (2) those who could distinguish voice gender in the adaptive but not the fixed task, and (3) those who could not distinguish voice gender in either task. Gender identification performance for single voices in the fixed task was significantly and negatively related to the duration of deafness before cochlear implantation (shorter deafness yielded better performance), whereas performance in the adaptive task was weakly but significantly related to age at first activation of the CI device, with earlier activations yielding better scores. The existence of a group of subjects able to perform adaptive discrimination but unable to identify the gender of singly presented voices demonstrates the potential dissociability of the skills required for these two tasks, suggesting that duration of deafness and age of cochlear implantation could have dissociable effects on the development of different skills required by CI users to identify speaker gender.
Acoustic and perceptual effects of overall F0 range in a lexical pitch accent distinction
NASA Astrophysics Data System (ADS)
Wade, Travis
2002-05-01
A speaker's overall fundamental frequency range is generally considered a variable, nonlinguistic element of intonation. This study examined the precision with which overall F0 is predictable based on previous intonational context and the extent to which it may be perceptually significant. Speakers of Tokyo Japanese produced pairs of sentences differing lexically only in the presence or absence of a single pitch accent as responses to visual and prerecorded speech cues presented in an interactive manner. F0 placement of high tones (previously observed to be relatively variable in pitch contours) was found to be consistent across speakers and uniformly dependent on the intonation of the different sentences used as cues. In a subsequent perception experiment, continuous manipulations of these same sentences between typical accented and typical non-accent-containing versions were presented to Japanese listeners for lexical identification. Results showed that listeners' perception was not significantly altered in compensation for artificial manipulation of the preceding intonation. Implications are discussed within an autosegmental analysis of tone. The current results are consistent with the notion that pitch range (i.e., specific vertical locations of tonal peaks) does not simply vary gradiently across speakers and situations but constitutes a predictable part of the phonetic specification of tones.
NASA Technical Reports Server (NTRS)
Simpson, C. A.
1985-01-01
In the present study of the responses of pairs of pilots to aircraft warning classification tasks using an isolated word, speaker-dependent speech recognition system, the induced stress was manipulated by means of different scoring procedures for the classification task and by the inclusion of a competitive manual control task. Both speech patterns and recognition accuracy were analyzed, and recognition errors were recorded by type for the isolated word speaker-dependent system and, by an offline technique, for a connected word speaker-dependent system. While errors increased with task loading for the isolated word system, there was no such effect for the connected word system.
2004-11-01
this paper we describe the systems developed by MITLL and used in the DARPA EARS Rich Transcription Fall 2004 (RT-04F) speaker diarization evaluation...many types of audio sources, the focus of the DARPA EARS project and the NIST Rich Transcription evaluations is primarily speaker diarization...present or samples of any of the speakers. An overview of the general diarization problem and approaches can be found in [1]. In this paper, we
NASA Technical Reports Server (NTRS)
1980-01-01
Many manufacturers of loudspeakers are now using a magnetic liquid cooling agent known as ferrofluid. Commercialized by Ferrofluids Corporation, ferrofluid is a liquid material in which sub-microscopic particles of iron oxide are permanently suspended. Injected into the voice coil segment of a speaker system, the magnetic liquid serves as a superior heat transfer medium for cooling the voice coil, thus substantially increasing the system's ability to handle higher power levels and decreasing the chance of speaker failure. Ferrofluid offers several additional advantages which add up to improved speaker performance, lower manufacturing costs, and fewer rejects.
The Language of Persuasion, English, Vocabulary: 5114.68.
ERIC Educational Resources Information Center
Groff, Irvin
Developed for a high school quinmester unit on the language of persuasion, this guide provides the teacher with teaching strategies for a study of the speaker or writer as a persuader, the identification of the logical and psychological tools of persuasion, an examination of the levels of abstraction, the techniques of propaganda, and the…
ERIC Educational Resources Information Center
Jiang, Jun; Liu, Fang; Wan, Xuan; Jiang, Cunmei
2015-01-01
Tone language experience benefits pitch processing in music and speech for typically developing individuals. No known studies have examined pitch processing in individuals with autism who speak a tone language. This study investigated discrimination and identification of melodic contour and speech intonation in a group of Mandarin-speaking…
Processing of Acoustic Cues in Lexical-Tone Identification by Pediatric Cochlear-Implant Recipients
ERIC Educational Resources Information Center
Peng, Shu-Chen; Lu, Hui-Ping; Lu, Nelson; Lin, Yung-Song; Deroche, Mickael L. D.; Chatterjee, Monita
2017-01-01
Purpose: The objective was to investigate acoustic cue processing in lexical-tone recognition by pediatric cochlear-implant (CI) recipients who are native Mandarin speakers. Method: Lexical-tone recognition was assessed in pediatric CI recipients and listeners with normal hearing (NH) in 2 tasks. In Task 1, participants identified naturally…
Acquisition of L2 Vowel Duration in Japanese by Native English Speakers
ERIC Educational Resources Information Center
Okuno, Tomoko
2013-01-01
Research has demonstrated that focused perceptual training facilitates L2 learners' segmental perception and spoken word identification. Hardison (2003) and Motohashi-Saigo and Hardison (2009) found benefits of visual cues in the training for acquisition of L2 contrasts. The present study examined factors affecting perception and production…
Improving Speaker Recognition by Biometric Voice Deconstruction
Mazaira-Fernandez, Luis Miguel; Álvarez-Marquina, Agustín; Gómez-Vilda, Pedro
2015-01-01
Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have been forcibly replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from the advances achieved in recent years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers, combined with the use of a set of features derived from the components resulting from the deconstruction of the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description of the main hypothesis and the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on one recorded over a mobile phone network under non-controlled acoustic conditions. PMID:26442245
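One conventional route to the glottal-source/vocal-tract split that this line of work builds on is LPC inverse filtering. The sketch below is a generic illustration with an assumed input file and LPC order, not the authors' parameterization.

```python
# Sketch: separate rough vocal tract and glottal source estimates by
# LPC analysis and inverse filtering.
import librosa
import numpy as np
from scipy.signal import lfilter

y, sr = librosa.load("speaker.wav", sr=16000)   # hypothetical recording
a = librosa.lpc(y, order=16)                    # all-pole vocal tract estimate
residual = lfilter(a, [1.0], y)                 # inverse filter: source estimate

# Speaker features can then be computed separately from `a` (tract)
# and `residual` (source) and pooled per gender-dependent model.
```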
NASA Astrophysics Data System (ADS)
Kuroki, Hayato; Ino, Shuichi; Nakano, Satoko; Hori, Kotaro; Ifukube, Tohru
The authors of this paper have been studying a real-time speech-to-caption system that uses speech recognition technology with a “repeat-speaking” method. In this system, a “repeat-speaker” listens to a lecturer's voice and then speaks the lecturer's utterances back into a speech recognition computer. The resulting system showed caption accuracy of about 97% for Japanese-Japanese conversion and a voice-to-caption conversion time of about 4 seconds for English-English conversion at several international conferences, although achieving these high performances required considerable cost. In human communication, speech understanding depends not only on verbal information but also on non-verbal information such as the speaker's gestures and face and mouth movements. The authors therefore explored briefly buffering the information in a computer and then displaying the captions and the speaker's face movement images in a way that achieves higher comprehension. In this paper, we investigate the relationship between the display sequence and display timing of captions that contain speech recognition errors and the speaker's face movement images. The results show that the sequence “display the caption before the speaker's face image” improves comprehension of the captions. The sequence “display both simultaneously” shows an improvement only a few percent higher than the question sentence, and the sequence “display the speaker's face image before the caption” shows almost no change. In addition, the sequence “display the caption 1 second before the speaker's face image” shows the most significant improvement of all the conditions.
Robust Speaker Authentication Based on Combined Speech and Voiceprint Recognition
NASA Astrophysics Data System (ADS)
Malcangi, Mario
2009-08-01
Personal authentication is becoming increasingly important in many applications that have to protect proprietary data. Passwords and personal identification numbers (PINs) prove not to be robust enough to ensure that unauthorized people do not use them. Biometric authentication technology may offer a secure, convenient, accurate solution but sometimes fails due to its intrinsically fuzzy nature. This research aims to demonstrate that combining two basic speech processing methods, voiceprint identification and speech recognition, can provide a very high degree of robustness, especially if fuzzy decision logic is used.
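A fuzzy fusion rule of the kind alluded to can be kept very small. The membership shapes and acceptance level below are illustrative assumptions, not the paper's rules: both the voiceprint score and the spoken pass-phrase confidence must be "high" to some degree, and the fuzzy AND of the two degrees drives the decision.

```python
# Hand-rolled sketch of fuzzy fusion of two authentication scores in [0, 1].
import numpy as np

def high(x, lo=0.5, hi=0.8):
    # ramp membership: 0 below lo, 1 above hi, linear in between
    return float(np.clip((x - lo) / (hi - lo), 0.0, 1.0))

def authenticate(voiceprint_score, phrase_score, accept_level=0.6):
    degree = min(high(voiceprint_score), high(phrase_score))  # fuzzy AND
    return degree >= accept_level, degree
```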
Event identification by acoustic signature recognition
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dress, W.B.; Kercel, S.W.
1995-07-01
Many events of interest to the security community produce acoustic emissions that are, in principle, identifiable as to cause. Some obvious examples are gunshots, breaking glass, takeoffs and landings of small aircraft, vehicular engine noises, footsteps (high frequencies when on gravel, very low frequencies when on soil), and voices (whispers to shouts). We are investigating wavelet-based methods to extract unique features of such events for classification and identification. We also discuss methods of classification and pattern recognition specifically tailored for acoustic signatures obtained by wavelet analysis. The paper is divided into three parts: completed work, work in progress, and future applications. The completed phase has led to the successful recognition of aircraft types on landing and takeoff. Both small aircraft (twin-engine turboprop) and large aircraft (commercial airliners) were included in the study. The project considered the design of a small, field-deployable, inexpensive device. The techniques developed during the aircraft identification phase were then adapted to a multispectral electromagnetic interference monitoring device now deployed in a nuclear power plant. This is a general-purpose wavelet analysis engine, spanning 14 octaves, and can be adapted for other specific tasks. Work in progress is focused on applying the methods previously developed to speaker identification. Some of the problems to be overcome include recognition of sounds as voice patterns and as distinct from possible background noises (e.g., music), as well as identification of the speaker from a short-duration voice sample. A generalization of the completed work and the work in progress is a device capable of classifying any number of acoustic events, particularly quasi-stationary events such as engine noises and voices, and singular events such as gunshots and breaking glass. We will show examples of both kinds of events and discuss their recognition likelihood.
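In the same spirit, a toy event identifier can pair wavelet subband energies with a nearest-centroid rule. This is a minimal illustration of the approach described, not the deployed system; class names and signals are placeholders.

```python
# Toy acoustic event classifier: wavelet subband log-energies plus
# a nearest-centroid decision rule.
import numpy as np
import pywt

def subband_energies(sig, wavelet="db4", level=6):
    return np.array([np.log(np.sum(c ** 2) + 1e-12)
                     for c in pywt.wavedec(sig, wavelet, level=level)])

def train_centroids(labeled_clips):
    # labeled_clips: {"gunshot": [sig, ...], "glass": [sig, ...], ...}
    return {lab: np.mean([subband_energies(s) for s in sigs], axis=0)
            for lab, sigs in labeled_clips.items()}

def classify(sig, centroids):
    f = subband_energies(sig)
    return min(centroids, key=lambda lab: np.linalg.norm(f - centroids[lab]))
```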
The contribution of dynamic visual cues to audiovisual speech perception.
Jaekl, Philip; Pesquita, Ana; Alsius, Agnes; Munhall, Kevin; Soto-Faraco, Salvador
2015-08-01
Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues, two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this aim, we measured word identification performance in noise using unimodal auditory stimuli and audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point light displays achieved via motion capture of the original talker. Point light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined effect of dynamic configural cues and local motion cues. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared to an auditory-only condition, demonstrating, for the first time, the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech. Copyright © 2015 Elsevier Ltd. All rights reserved.
Nelson, Jessica R.; Liu, Ying; Fiez, Julie; Perfetti, Charles A.
2017-01-01
Using fMRI, we compared the patterns of fusiform activity produced by viewing English and Chinese for readers who were either English speakers learning Chinese, or Chinese-English bilinguals. The pattern of fusiform activity depended on both the writing system and the reader’s native language. Native Chinese speakers fluent in English recruited bilateral fusiform areas when viewing both Chinese and English. English speakers learning Chinese, however, used heavily left-lateralized fusiform regions when viewing English, but recruited an additional right fusiform region for viewing Chinese. Thus, English learners of Chinese show an accommodation pattern, in which the reading network accommodates the new writing system by adding neural resources that support its specific graphic requirements. Chinese speakers show an assimilation pattern, in which the reading network established for L1 includes procedures sufficient for the graphic demands of L2 without major change. PMID:18381767
Center for Neural Engineering: applications of pulse-coupled neural networks
NASA Astrophysics Data System (ADS)
Malkani, Mohan; Bodruzzaman, Mohammad; Johnson, John L.; Davis, Joel
1999-03-01
The Pulse-Coupled Neural Network (PCNN) is an oscillatory neural network model in which cells form groups, and groups interact, on the basis of the synchronicity of their oscillations; the number of cells that fire at each input presentation forms an output time series, also called an `icon'. Recent work by Johnson and others demonstrated the functional capabilities of networks containing such elements for invariant feature extraction using intensity maps. The PCNN thus presents itself as a more biologically plausible model with solid functional potential. This paper presents a summary of several projects, and their results, in which we successfully applied the PCNN. In project one, the PCNN was applied to object recognition and classification through a robotic vision system. The features (icons) generated by the PCNN were then fed into a feedforward neural network for classification. In project two, we developed techniques for sensory data fusion. The PCNN algorithm was implemented and tested on a B14 mobile robot. The PCNN-based features were extracted from images taken by the robot vision system and used in conjunction with the map generated by data fusion of the sonar and wheel encoder data for navigation of the mobile robot. In our third project, we applied the PCNN to speaker recognition. The spectrogram images of speech signals are fed into the PCNN to produce invariant feature icons, which are then fed into a feedforward neural network for speaker identification.
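For readers unfamiliar with the model, a minimal sketch of a discrete PCNN iteration that emits an icon time series is given below; the coupling kernel and the decay/gain constants are illustrative assumptions, not the values used in the projects above.

```python
# A minimal discrete PCNN producing an "icon" time series from an intensity
# map (e.g., a spectrogram patch), in the standard Eckhorn/Johnson style:
# feeding and linking inputs, multiplicative coupling, and a dynamic
# threshold that jumps after a cell fires. All parameters are illustrative.
import numpy as np
from scipy.ndimage import convolve

def pcnn_icon(image, steps=50, beta=0.2,
              dF=0.7, dL=0.6, dT=0.8, vF=0.1, vL=0.3, vT=20.0):
    """Return the icon: number of firing cells at each iteration."""
    kernel = np.array([[0.5, 1.0, 0.5],
                       [1.0, 0.0, 1.0],
                       [0.5, 1.0, 0.5]])          # local linking weights
    F = np.zeros_like(image, dtype=float)          # feeding input
    L = np.zeros_like(F)                           # linking input
    T = np.ones_like(F)                            # dynamic threshold
    Y = np.zeros_like(F)                           # binary pulses
    icon = []
    for _ in range(steps):
        F = dF * F + vF * convolve(Y, kernel) + image
        L = dL * L + vL * convolve(Y, kernel)
        U = F * (1.0 + beta * L)                   # internal activity
        Y = (U > T).astype(float)                  # cells that fire now
        T = dT * T + vT * Y                        # raise fired thresholds
        icon.append(int(Y.sum()))
    return np.array(icon)

rng = np.random.default_rng(0)
print(pcnn_icon(rng.random((64, 64)))[:10])
```

Feeding a spectrogram image in place of the random map yields speaker icons of the kind referred to in the third project.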
Khander, Amrin; Farag, Sara; Chen, Katherine T
2017-12-22
With an increasing number of patients requiring translator services, many providers are turning to mobile applications (apps) for assistance. However, there have been no published reviews of medical translator apps. The objective of this study was to identify and evaluate medical translator mobile apps using an adapted APPLICATIONS scoring system. A list of apps was identified from the Apple iTunes and Google Play stores, using the search term "medical translator." Apps not found on two different searches, not in an English-based platform, not used for translation, or not functional after purchase were excluded. The remaining apps were evaluated using the adapted APPLICATIONS scoring system, which included both objective and subjective criteria. App comprehensiveness was a weighted score defined by the number of non-English languages included in each app relative to the proportion of non-English speakers in the United States. A total of 524 apps were initially found. After applying the exclusion criteria, 20 (8.2%) apps from the Google Play store and 26 (9.2%) apps from the Apple iTunes store remained for evaluation. The highest scoring apps, Canopy Medical Translator, Universal Doctor Speaker, and Vocre Translate, scored 13.5 out of 18.7 possible points. A large proportion of the apps initially found did not function as medical translator apps. Using the APPLICATIONS scoring system, we identified and evaluated medical translator apps for providers who care for non-English speaking patients.
Perception of Non-Native Consonant Length Contrast: The Role of Attention in Phonetic Processing
ERIC Educational Resources Information Center
Porretta, Vincent J.; Tucker, Benjamin V.
2015-01-01
The present investigation examines English speakers' ability to identify and discriminate non-native consonant length contrast. Three groups (L1 English No-Instruction, L1 English Instruction, and L1 Finnish control) performed a speeded forced-choice identification task and a speeded AX discrimination task on Finnish non-words (e.g.…
The Effect of Pitch Peak Alignment on Sentence Type Identification in Russian
ERIC Educational Resources Information Center
Makarova, Veronika
2007-01-01
This paper reports the results of an experimental phonetic study examining pitch peak alignment in production and perception of three-syllable one-word sentences with phonetic rising-falling pitch movement by speakers of Russian. The first part of the study (Experiment 1) utilizes 22 one-word three-syllable utterances read by five female speakers…
ERIC Educational Resources Information Center
Liu, Fang; Xu, Yi; Patel, Aniruddh D.; Francart, Tom; Jiang, Cunmei
2012-01-01
This study examined whether "melodic contour deafness" (insensitivity to the direction of pitch movement) in congenital amusia is associated with specific types of pitch patterns (discrete versus gliding pitches) or stimulus types (speech syllables versus complex tones). Thresholds for identification of pitch direction were obtained using discrete…
Automatic Method of Pause Measurement for Normal and Dysarthric Speech
ERIC Educational Resources Information Center
Rosen, Kristin; Murdoch, Bruce; Folker, Joanne; Vogel, Adam; Cahill, Louise; Delatycki, Martin; Corben, Louise
2010-01-01
This study proposes an automatic method for the detection of pauses and identification of pause types in conversational speech for the purpose of measuring the effects of Friedreich's Ataxia (FRDA) on speech. Speech samples of [approximately] 3 minutes were recorded from 13 speakers with FRDA and 18 healthy controls. Pauses were measured from the…
Effects of Audio-Visual Integration on the Detection of Masked Speech and Non-Speech Sounds
ERIC Educational Resources Information Center
Eramudugolla, Ranmalee; Henderson, Rachel; Mattingley, Jason B.
2011-01-01
Integration of simultaneous auditory and visual information about an event can enhance our ability to detect that event. This is particularly evident in the perception of speech, where the articulatory gestures of the speaker's lips and face can significantly improve the listener's detection and identification of the message, especially when that…
NASA Astrophysics Data System (ADS)
Kayasith, Prakasith; Theeramunkong, Thanaruk
It is a tedious and subjective task to measure the severity of dysarthria by manually evaluating a speaker's speech using available standard assessment methods based on human perception. This paper presents an automated approach to assessing the speech quality of a dysarthric speaker with cerebral palsy. With the consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce consistent speech signals for a given word and distinguishable speech signals for different words. As an application, it can be used to assess speech quality and forecast the speech recognition rate for an individual dysarthric speaker before exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square difference. The evaluations were done by comparing its predicted recognition rates with those predicted by the standard methods, the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were done on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
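The two ingredients of such an index can be illustrated with standard tools. The sketch below, which assumes librosa is installed and at least two recordings per word, measures consistency as the mean DTW distance between repetitions of the same word and distinction as the mean distance between different words; the combination rule at the end is an illustrative assumption, since the paper's exact definition of Ψ is not reproduced here.

```python
# Toy illustration of consistency vs. distinction for a clarity-style index.
# Requires numpy and librosa.
import numpy as np
import librosa

def mfcc(y, sr):
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

def dtw_dist(a, b):
    D, _ = librosa.sequence.dtw(a, b, metric='euclidean')
    return D[-1, -1] / (a.shape[1] + b.shape[1])   # length-normalized cost

def clarity_index(words):
    """words: dict word -> list of (y, sr) repetitions by one speaker."""
    feats = {w: [mfcc(y, sr) for y, sr in reps] for w, reps in words.items()}
    # consistency: mean DTW distance between repetitions of the same word
    within = np.mean([dtw_dist(a, b)
                      for reps in feats.values()
                      for i, a in enumerate(reps) for b in reps[i + 1:]])
    # distinction: mean DTW distance between tokens of different words
    ws = list(feats)
    across = np.mean([dtw_dist(a, b)
                      for i, w1 in enumerate(ws) for w2 in ws[i + 1:]
                      for a in feats[w1] for b in feats[w2]])
    return across / within     # higher = more consistent and distinct (assumed)

# Tiny synthetic demo: two "words" as different tones, two noisy repetitions each.
sr = 8000
t = np.arange(4000) / sr
rng = np.random.default_rng(0)
mk = lambda f: (np.sin(2 * np.pi * f * t) + 0.05 * rng.standard_normal(t.size), sr)
print(round(clarity_index({'wordA': [mk(200), mk(200)],
                           'wordB': [mk(400), mk(400)]}), 2))
```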
NASA Technical Reports Server (NTRS)
Dillon, Christina
2013-01-01
The goal of this project was to design, model, build, and test a flat panel speaker and frame for a spherical dome structure being made into a simulator. The simulator will be a test bed for evaluating an immersive environment for human interfaces. This project focused on the loudspeakers and a sound diffuser for the dome. The rest of the team worked on an Ambisonics 3D sound system, a video projection system, and a multi-direction treadmill to create the most realistic scene possible. The main programs utilized in this project were Pro-E and COMSOL. Pro-E was used for creating detailed figures for the fabrication of a frame that held a flat panel loudspeaker. The loudspeaker was made from a thin sheet of Plexiglas and 4 acoustic exciters. COMSOL, a multiphysics finite element analysis simulator, was used to model and evaluate all stages of the loudspeaker, frame, and sound diffuser. Acoustical testing measurements were used to create polar plots from the working prototype, which were then compared to the COMSOL simulations to select the optimal design for the dome. The final goal of the project was to install the flat panel loudspeaker design, in addition to a sound diffuser, on the wall of the dome. After running tests in COMSOL on various speaker configurations, including a warped Plexiglas version, the optimal speaker design included a flat piece of Plexiglas with a rounded frame to match the curvature of the dome. Eight of these loudspeakers will be mounted into an inch and a half of high performance acoustic insulation, or Thinsulate, that will cover the inside of the dome. The following technical paper discusses these projects and explains the engineering processes used, knowledge gained, and the projected future goals of this project.
Gayle, Alberto Alexander; Shimaoka, Motomu
2017-01-01
Introduction: The predominance of English in scientific research has created hurdles for “non-native speakers” of English. Here we present a novel application of native language identification (NLI) for the assessment of medical-scientific writing. For this purpose, we created a novel classification system whereby scoring would be based solely on text features found to be distinctive among native English speakers (NS) within a given context. We dubbed this the “Genuine Index” (GI). Methodology: This methodology was validated using a small set of journals in the field of pediatric oncology. Our dataset consisted of 5,907 abstracts, representing work from 77 countries. A support vector machine (SVM) was used to generate our model and for scoring. Results: Accuracy, precision, and recall of the classification model were 93.3%, 93.7%, and 99.4%, respectively. Class-specific F-scores were 96.5% for NS and 39.8% for our benchmark class, Japan. Overall kappa was calculated to be 37.2%. We found significant differences between countries with respect to the GI score. Significant correlation was found between GI scores and two validated objective measures of writing proficiency and readability. Two sets of key terms and phrases differentiating native and non-native writing were identified. Conclusions: Our GI model was able to detect, with a high degree of reliability, subtle differences between the terms and phrasing used by native and non-native speakers in peer-reviewed journals in the field of pediatric oncology. In addition, L1 language transfer was found to be very likely to survive revision, especially in non-Western countries such as Japan. These findings show that even when the language used is technically correct, there may still be some phrasing or usage that impacts quality. PMID:28212419
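The core of such a system is an ordinary supervised text classifier whose decision value is reinterpreted as a score. A minimal sketch with scikit-learn follows; the tf-idf word n-gram features and the tiny inline dataset are assumptions for illustration, not the paper's feature set or calibration.

```python
# Native-language-identification scoring in the spirit of the Genuine Index:
# train a linear SVM to separate native-speaker (NS) abstracts from others,
# then read the signed distance from the hyperplane as a raw score.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["...abstract text...", "...another abstract..."]   # training data
labels = [1, 0]                                             # 1 = NS, 0 = non-NS

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),  # word n-gram features
    LinearSVC(),                                    # linear SVM classifier
)
model.fit(texts, labels)

# Larger values look more like native-speaker writing to the model.
print(model.decision_function(["a new abstract to score"]))
```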
Speaker verification system using acoustic data and non-acoustic data
Gable, Todd J [Walnut Creek, CA; Ng, Lawrence C [Danville, CA; Holzrichter, John F [Berkeley, CA; Burnett, Greg C [Livermore, CA
2006-03-21
A method and system for speech characterization. One embodiment includes a method for speaker verification which includes collecting data from a speaker, wherein the data comprises acoustic data and non-acoustic data. The data is used to generate a template that includes a first set of "template" parameters. The method further includes receiving a real-time identity claim from a claimant, and using acoustic data and non-acoustic data from the identity claim to generate a second set of parameters. The method further includes comparing the first set of parameters to the second set of parameters to determine whether the claimant is the speaker. The first set of parameters and the second set of parameters include at least one purely non-acoustic parameter, including a non-acoustic glottal shape parameter derived from averaging multiple glottal cycle waveforms.
How do speakers resist distraction? Evidence from a taboo picture-word interference task.
Dhooge, Elisah; Hartsuiker, Robert J
2011-07-01
Even in the presence of irrelevant stimuli, word production is a highly accurate and fluent process. But how do speakers prevent themselves from naming the wrong things? One possibility is that an attentional system inhibits task-irrelevant representations. Alternatively, a verbal self-monitoring system might check speech for accuracy and remove errors stemming from irrelevant information. Because self-monitoring is sensitive to social appropriateness, taboo errors should be intercepted more than neutral errors are. To prevent embarrassment, speakers might also speak more slowly when confronted with taboo distractors. Our results from two experiments are consistent with the self-monitoring account: Examining picture-naming speed (Experiment 1) and accuracy (Experiment 2), we found fewer naming errors but longer picture-naming latencies for pictures presented with taboo distractors than for pictures presented with neutral distractors. These results suggest that when intrusions of irrelevant words are highly undesirable, speakers do not simply inhibit these words: Rather, the language-production system adjusts itself to the context and filters out the undesirable words.
Comprehension of Competing Argument Marking Systems in Two Australian Mixed Languages
ERIC Educational Resources Information Center
O'Shannessy, Carmel; Meakins, Felicity
2012-01-01
Crosslinguistic influence has been seen in bilingual adult and child learners when compared to monolingual learners. For speakers of Light Warlpiri and Gurindji Kriol there is no monolingual group for comparison, yet crosslinguistic influence can be seen in how the speakers resolve competition between case-marking and word order systems in each…
A "Musical Pathway" for Spatially Disoriented Blind Residents of a Skilled Nursing Facility.
ERIC Educational Resources Information Center
Uslan, M. M.; And Others
1988-01-01
The "Auditory Directional System" designed to help blind persons get to interior destinations in an institutional setting, uses a compact disc player, a network of speakers, infrared "people" detection equipment, and a computer controlled speaker-sequencing system. After initial destination selection, musical cues are activated as the person…
The Mechanism of Speech Processing in Congenital Amusia: Evidence from Mandarin Speakers
Liu, Fang; Jiang, Cunmei; Thompson, William Forde; Xu, Yi; Yang, Yufang; Stewart, Lauren
2012-01-01
Congenital amusia is a neuro-developmental disorder of pitch perception that causes severe problems with music processing but only subtle difficulties in speech processing. This study investigated speech processing in a group of Mandarin speakers with congenital amusia. Thirteen Mandarin amusics and thirteen matched controls participated in a set of tone and intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance on word discrimination in natural speech and their gliding tone analogs. They also performed worse than controls on discriminating gliding tone sequences derived from statements and questions, and showed elevated thresholds for pitch change detection and pitch direction discrimination. However, they performed as well as controls on word identification, and on statement-question identification and discrimination in natural speech. Overall, tasks that involved multiple acoustic cues to communicative meaning were not impacted by amusia. Only when the tasks relied mainly on pitch sensitivity did amusics show impaired performance compared to controls. These findings help explain why amusia only affects speech processing in subtle ways. Further studies on a larger sample of Mandarin amusics and on amusics of other language backgrounds are needed to consolidate these results. PMID:22347374
Voice recognition through phonetic features with Punjabi utterances
NASA Astrophysics Data System (ADS)
Kaur, Jasdeep; Juglan, K. C.; Sharma, Vishal; Upadhyay, R. K.
2017-07-01
This paper deals with perception and disorders of speech with regard to the Punjabi language. In view of the importance of voice identification, various parameters of speaker identification have been studied. Speech material was recorded with a tape recorder in normal and disguised modes of utterance. From the recorded speech materials, utterances free from noise were selected for auditory and acoustic spectrographic analysis. The comparison of normal and disguised speech of seven subjects is reported. The fundamental frequency (F0) at similar places, plosive duration at certain phonemes, the amplitude ratio (A1:A2), etc., were compared in normal and disguised speech. It was found that the formant frequency of normal and disguised speech remains almost similar only if it is compared at the position of the same vowel quality and quantity. If the vowel is more closed or more open in the disguised utterance, the formant frequency will change in comparison to the normal utterance. The ratio of the amplitudes (A1:A2) was found to be speaker dependent. It remains unchanged in the disguised utterance. However, this value may shift in the disguised utterance if cross-sectioning is not done at the same location.
NASA Astrophysics Data System (ADS)
Chang, Ya-Ling; Hsu, Kuan-Yu; Lee, Chih-Kung
2016-03-01
Advances in distributed piezo-electret sensors and actuators facilitate the development of various smart systems, including paper speakers and opto-piezo/electret bio-chips. An array-based loudspeaker system possesses several advantages over conventional coil speakers, such as light weight, flexibility, low power consumption, and directivity. With the understanding that the performance of large-area piezo-electret loudspeakers, or even microfluidic biochip transport behavior, can be tailored by changing their dynamic behavior, a full-field, real-time, high-resolution, non-contact metrology system was developed. In this paper, the influence of the resonance modes and transient vibrations of an array-based loudspeaker system on the acoustic effect was measured by using a real-time projection moiré metrology system and microphones. To make the paper speaker even more versatile, we combined the photosensitive material TiOPc with the original electret loudspeaker. The vibration of this newly developed opto-electret loudspeaker can be manipulated by illuminating it with different light-intensity patterns. To facilitate the tailoring process of the opto-electret loudspeaker, projection moiré was adopted to measure its vibration. By recording the projected fringes, which are modulated by the contours of the testing sample, a phase unwrapping algorithm yields a continuous phase distribution that is proportional to the object height variations. With the aid of the projection moiré metrology system, the vibrations associated with each distinctive light pattern can be characterized. We therefore expect that the overall acoustic performance can be improved by finding suitable illuminating patterns. In this manuscript, the performance of the projection moiré system and the opto-electret paper speakers was cross-examined and verified against the experimental results obtained.
NASA Astrophysics Data System (ADS)
Morishima, Shigeo; Nakamura, Satoshi
2004-12-01
We introduce a multimodal English-to-Japanese and Japanese-to-English translation system that also translates the speaker's speech motion by synchronizing it to the translated speech. This system also introduces both a face synthesis technique that can generate any viseme lip shape and a face tracking technique that can estimate the original position and rotation of a speaker's face in an image sequence. To retain the speaker's facial expression, we substitute only the speech organ's image with the synthesized one, which is made by a 3D wire-frame model that is adaptable to any speaker. Our approach provides translated image synthesis with an extremely small database. The tracking motion of the face from a video image is performed by template matching. In this system, the translation and rotation of the face are detected by using a 3D personal face model whose texture is captured from a video frame. We also propose a method to customize the personal face model by using our GUI tool. By combining these techniques and the translated voice synthesis technique, an automatic multimodal translation can be achieved that is suitable for video mail or automatic dubbing systems into other languages.
Entertainment and Pacification System For Car Seat
NASA Technical Reports Server (NTRS)
Elrod, Susan Vinz (Inventor); Dabney, Richard W. (Inventor)
2006-01-01
An entertainment and pacification system for use with a child car seat has speakers mounted in the child car seat with a plurality of audio sources and an anti-noise audio system coupled to the child car seat. A controllable switching system provides for, at any given time, the selective activation of i) one of the audio sources such that the audio signal generated thereby is coupled to one or more of the speakers, and ii) the anti-noise audio system such that an ambient-noise-canceling audio signal generated thereby is coupled to one or more of the speakers. The controllable switching system can receive commands generated at one of first controls located at the child car seat and second controls located remotely with respect to the child car seat with commands generated by the second controls overriding commands generated by the first controls.
Degraded Vowel Acoustics and the Perceptual Consequences in Dysarthria
NASA Astrophysics Data System (ADS)
Lansford, Kaitlin L.
Distorted vowel production is a hallmark characteristic of dysarthric speech, irrespective of the underlying neurological condition or dysarthria diagnosis. A variety of acoustic metrics have been used to study the nature of vowel production deficits in dysarthria; however, not all demonstrate sensitivity to the exhibited deficits. Less attention has been paid to quantifying the vowel production deficits associated with the specific dysarthrias. Attempts to characterize the relationship between naturally degraded vowel production in dysarthria with overall intelligibility have met with mixed results, leading some to question the nature of this relationship. It has been suggested that aberrant vowel acoustics may be an index of overall severity of the impairment and not an "integral component" of the intelligibility deficit. A limitation of previous work detailing perceptual consequences of disordered vowel acoustics is that overall intelligibility, not vowel identification accuracy, has been the perceptual measure of interest. A series of three experiments were conducted to address the problems outlined herein. The goals of the first experiment were to identify subsets of vowel metrics that reliably distinguish speakers with dysarthria from non-disordered speakers and differentiate the dysarthria subtypes. Vowel metrics that capture vowel centralization and reduced spectral distinctiveness among vowels differentiated dysarthric from non-disordered speakers. Vowel metrics generally failed to differentiate speakers according to their dysarthria diagnosis. The second and third experiments were conducted to evaluate the relationship between degraded vowel acoustics and the resulting percept. In the second experiment, correlation and regression analyses revealed vowel metrics that capture vowel centralization and distinctiveness and movement of the second formant frequency were most predictive of vowel identification accuracy and overall intelligibility. The third experiment was conducted to evaluate the extent to which the nature of the acoustic degradation predicts the resulting percept. Results suggest distinctive vowel tokens are better identified and, likewise, better-identified tokens are more distinctive. Further, an above-chance level agreement between nature of vowel misclassification and misidentification errors was demonstrated for all vowels, suggesting degraded vowel acoustics are not merely an index of severity in dysarthria, but rather are an integral component of the resultant intelligibility disorder.
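Two widely used metrics of the centralization/distinctiveness kind described above can be computed from corner-vowel formant means, as in the following sketch (vowel space area via a convex hull, and Sapir's formant centralization ratio). These are representative examples, not the dissertation's full metric set, and the formant values shown are made up.

```python
# Vowel space area (smaller = more centralized vowels) and formant
# centralization ratio (larger = more centralized), from mean F1/F2 in Hz
# of the corner vowels /i a u/.
import numpy as np
from scipy.spatial import ConvexHull

def vowel_space_area(formants):
    """formants: dict vowel -> (F1, F2); convex-hull area in Hz^2."""
    pts = np.array(list(formants.values()))
    return ConvexHull(pts).volume        # in 2-D, .volume is the area

def fcr(formants):
    """Sapir's formant centralization ratio from /i a u/ means."""
    F1i, F2i = formants['i']
    F1a, F2a = formants['a']
    F1u, F2u = formants['u']
    return (F2u + F2a + F1i + F1u) / (F2i + F1a)

speaker = {'i': (300, 2300), 'a': (750, 1300), 'u': (320, 900)}
print(vowel_space_area(speaker), round(fcr(speaker), 2))
```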
ERIC Educational Resources Information Center
Danesi, Marcel
1974-01-01
The teaching of standard Italian to speakers of Italian dialects both in Italy and in North America is discussed, specifically through a specialized pedagogical program within the framework of a sociolinguistic and psycholinguistic perspective, and based on a structural analysis of linguistic systems in contact. Italian programs in Toronto are…
Pitch perception and production in congenital amusia: Evidence from Cantonese speakers.
Liu, Fang; Chan, Alice H D; Ciocca, Valter; Roquet, Catherine; Peretz, Isabelle; Wong, Patrick C M
2016-07-01
This study investigated pitch perception and production in speech and music in individuals with congenital amusia (a disorder of musical pitch processing) who are native speakers of Cantonese, a tone language with a highly complex tonal system. Sixteen Cantonese-speaking congenital amusics and 16 controls performed a set of lexical tone perception, production, singing, and psychophysical pitch threshold tasks. Their tone production accuracy and singing proficiency were subsequently judged by independent listeners, and subjected to acoustic analyses. Relative to controls, amusics showed impaired discrimination of lexical tones in both speech and non-speech conditions. They also received lower ratings for singing proficiency, producing larger pitch interval deviations and making more pitch interval errors compared to controls. Demonstrating higher pitch direction identification thresholds than controls for both speech syllables and piano tones, amusics nevertheless produced native lexical tones with comparable pitch trajectories and intelligibility as controls. Significant correlations were found between pitch threshold and lexical tone perception, music perception and production, but not between lexical tone perception and production for amusics. These findings provide further evidence that congenital amusia is a domain-general language-independent pitch-processing deficit that is associated with severely impaired music perception and production, mildly impaired speech perception, and largely intact speech production.
Flanagan, Sheila; Zorilă, Tudor-Cătălin; Stylianou, Yannis; Moore, Brian C J
2018-01-01
Auditory processing disorder (APD) may be diagnosed when a child has listening difficulties but has normal audiometric thresholds. For adults with normal hearing and with mild-to-moderate hearing impairment, an algorithm called spectral shaping with dynamic range compression (SSDRC) has been shown to increase the intelligibility of speech when background noise is added after the processing. Here, we assessed the effect of such processing using 8 children with APD and 10 age-matched control children. The loudness of the processed and unprocessed sentences was matched using a loudness model. The task was to repeat back sentences produced by a female speaker when presented with either speech-shaped noise (SSN) or a male competing speaker (CS) at two signal-to-background ratios (SBRs). Speech identification was significantly better with SSDRC processing than without, for both groups. The benefit of SSDRC processing was greater for the SSN than for the CS background. For the SSN, scores were similar for the two groups at both SBRs. For the CS, the APD group performed significantly more poorly than the control group. The overall improvement produced by SSDRC processing could be useful for enhancing communication in a classroom where the teacher's voice is broadcast using a wireless system.
Crossmodal plasticity in the fusiform gyrus of late blind individuals during voice recognition.
Hölig, Cordula; Föcker, Julia; Best, Anna; Röder, Brigitte; Büchel, Christian
2014-12-01
Blind individuals are trained in identifying other people through voices. In congenitally blind adults the anterior fusiform gyrus has been shown to be active during voice recognition. Such crossmodal changes have been associated with a superiority of blind adults in voice perception. The key question of the present functional magnetic resonance imaging (fMRI) study was whether visual deprivation that occurs in adulthood is followed by similar adaptive changes of the voice identification system. Late blind individuals and matched sighted participants were tested in a priming paradigm, in which two voice stimuli were subsequently presented. The prime (S1) and the target (S2) were either from the same speaker (person-congruent voices) or from two different speakers (person-incongruent voices). Participants had to classify the S2 as either coming from an old or a young person. Only in late blind individuals, but not in matched sighted controls, was activation in the anterior fusiform gyrus modulated by voice identity: late blind volunteers showed an increase of the BOLD signal in response to person-incongruent compared with person-congruent trials. These results suggest that the fusiform gyrus adapts to input of a new modality even in the mature brain and thus demonstrate an adult type of crossmodal plasticity.
You had me at "Hello": Rapid extraction of dialect information from spoken words.
Scharinger, Mathias; Monahan, Philip J; Idsardi, William J
2011-06-15
Research on the neuronal underpinnings of speaker identity recognition has identified voice-selective areas in the human brain with evolutionary homologues in non-human primates who have comparable areas for processing species-specific calls. Most studies have focused on estimating the extent and location of these areas. In contrast, relatively few experiments have investigated the time-course of speaker identity, and in particular, dialect processing and identification by electro- or neuromagnetic means. We show here that dialect extraction occurs speaker-independently, pre-attentively and categorically. We used Standard American English and African-American English exemplars of 'Hello' in a magnetoencephalographic (MEG) Mismatch Negativity (MMN) experiment. The MMN as an automatic change detection response of the brain reflected dialect differences that were not entirely reducible to acoustic differences between the pronunciations of 'Hello'. Source analyses of the M100, an auditory evoked response to the vowels, suggested additional processing in voice-selective areas whenever a dialect change was detected. These findings are not only relevant for the cognitive neuroscience of language, but also for the social sciences concerned with dialect and race perception.
Development of equally intelligible Telugu sentence-lists to test speech recognition in noise.
Tanniru, Kishore; Narne, Vijaya Kumar; Jain, Chandni; Konadath, Sreeraj; Singh, Niraj Kumar; Sreenivas, K J Ramadevi; K, Anusha
2017-09-01
The aim was to develop sentence lists in the Telugu language for the assessment of the speech recognition threshold (SRT) in the presence of background noise, through identification of the mean signal-to-noise ratio required to attain a 50% sentence recognition score (SRTn). This study was conducted in three phases. The first phase involved the selection and recording of Telugu sentences. In the second phase, 20 lists, each consisting of 10 sentences with equal intelligibility, were formulated using a numerical optimisation procedure. In the third phase, the SRTn of the developed lists was estimated using adaptive procedures on individuals with normal hearing. A total of 68 native Telugu speakers with normal hearing participated in the study. Of these, 18 (including the speakers) performed various subjective measures in the first phase, 20 performed sentence/word recognition in noise in the second phase, and 30 participated in the list equivalency procedures in the third phase. In all, 15 lists of comparable difficulty were formulated as test material. The mean SRTn across these lists was -2.74 dB (SD = 0.21). The developed sentence lists provide a valid and reliable tool to measure SRTn in native Telugu speakers.
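Adaptive SRTn estimation of this kind is typically a simple up-down staircase. The sketch below tracks the 50% point by lowering the SNR after each correct sentence and raising it after each error; the step size, trial count, and simulated listener are illustrative assumptions, not the study's protocol.

```python
# 1-up/1-down adaptive staircase converging on the 50% sentence
# recognition point (SRTn), with a simulated logistic listener.
import numpy as np

def estimate_srtn(present_trial, n_trials=20, start_snr=4.0, step=2.0):
    """present_trial(snr_db) -> True if the sentence was repeated correctly."""
    snr, track = start_snr, []
    for _ in range(n_trials):
        track.append(snr)
        snr += -step if present_trial(snr) else step   # 1-up/1-down rule
    return float(np.mean(track[4:]))    # mean SNR after the initial approach

# Simulated listener with a logistic psychometric function, true SRTn -2.7 dB.
rng = np.random.default_rng(1)
def simulated(snr, true_srt=-2.7, slope=1.0):
    return rng.random() < 1.0 / (1.0 + np.exp(-slope * (snr - true_srt)))

print(round(estimate_srtn(simulated), 1))
```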
2014-07-25
composition of simple temporal structures to a speaker diarization task with the goal of segmenting conference audio in the presence of an unknown number of... application domains including neuroimaging, diverse document selection, speaker diarization, stock modeling, and target tracking. We detail each of... recall performance than competing methods in a task of discovering articles preferred by the user • a gold-standard speaker diarization method, as
Acoustic Calibration of the Exterior Effects Room at the NASA Langley Research Center
NASA Technical Reports Server (NTRS)
Faller, Kenneth J., II; Rizzi, Stephen A.; Klos, Jacob; Chapin, William L.; Surucu, Fahri; Aumann, Aric R.
2010-01-01
The Exterior Effects Room (EER) at the NASA Langley Research Center is a 39-seat auditorium built for psychoacoustic studies of aircraft community noise. The original reproduction system employed monaural playback and hence lacked sound localization capability. In an effort to more closely recreate field test conditions, a significant upgrade was undertaken to allow simulation of a three-dimensional audio and visual environment. The 3D audio system consists of 27 mid- and high-frequency satellite speakers and 4 subwoofers, driven by a real-time audio server running an implementation of Vector Base Amplitude Panning. The audio server is part of a larger simulation system, which controls the audio and visual presentation of recorded and synthesized aircraft flyovers. The focus of this work is on the calibration of the 3D audio system, including gains used in the amplitude panning algorithm, speaker equalization, and absolute gain control. Because the speakers are installed in an irregularly shaped room, the speaker equalization includes time delay and gain compensation due to different mounting distances from the focal point, filtering for color compensation due to different installations (half space, corner, baffled/unbaffled), and cross-over filtering.
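Amplitude panning gains of the kind computed by the audio server are easy to sketch in two dimensions: the source direction is expressed in the basis of a speaker pair's unit vectors and then normalized for constant power. The speaker angles below are illustrative; the EER's 3-D system solves the analogous 3x3 problem for speaker triplets.

```python
# 2-D vector base amplitude panning (VBAP): solve g L = p for the pair
# gains, then normalize so that g1^2 + g2^2 = 1 (constant power).
import numpy as np

def vbap_pair_gains(source_az_deg, spk_az_deg):
    """Gains (g1, g2) for a source between two speakers (azimuths in degrees)."""
    p = np.array([np.cos(np.radians(source_az_deg)),
                  np.sin(np.radians(source_az_deg))])       # source direction
    L = np.array([[np.cos(np.radians(a)), np.sin(np.radians(a))]
                  for a in spk_az_deg])                     # speaker unit vectors
    g = p @ np.linalg.inv(L)                                # solve g L = p
    return g / np.linalg.norm(g)                            # constant power

print(vbap_pair_gains(15.0, (-30.0, 30.0)))   # source nearer the +30 deg speaker
```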
Liu, Hanjun; Wang, Emily Q.; Chen, Zhaocong; Liu, Peng; Larson, Charles R.; Huang, Dongfeng
2010-01-01
The purpose of this cross-language study was to examine whether the online control of voice fundamental frequency (F0) during vowel phonation is influenced by language experience. Native speakers of Cantonese and Mandarin, both tonal languages spoken in China, participated in the experiments. Subjects were asked to vocalize a vowel sound /u/ at their comfortable habitual F0, during which their voice pitch was unexpectedly shifted (±50, ±100, ±200, or ±500 cents, 200 ms duration) and fed back instantaneously to them over headphones. The results showed that Cantonese speakers produced significantly smaller responses than Mandarin speakers when the stimulus magnitude varied from 200 to 500 cents. Further, response magnitudes decreased along with the increase in stimulus magnitude in Cantonese speakers, which was not observed in Mandarin speakers. These findings suggest that online control of voice F0 during vocalization is sensitive to language experience. Further, systematic modulations of vocal responses across stimulus magnitude were observed in Cantonese speakers but not in Mandarin speakers, which indicates that this highly automatic feedback mechanism is sensitive to the specific tonal system of each language. PMID:21218905
Byers-Heinlein, Krista; Chen, Ke Heng; Xu, Fei
2014-03-01
Languages function as independent and distinct conventional systems, and so each language uses different words to label the same objects. This study investigated whether 2-year-old children recognize that speakers of their native language and speakers of a foreign language do not share the same knowledge. Two groups of children unfamiliar with Mandarin were tested: monolingual English-learning children (n=24) and bilingual children learning English and another language (n=24). An English speaker taught children the novel label fep. On English mutual exclusivity trials, the speaker asked for the referent of a novel label (wug) in the presence of the fep and a novel object. Both monolingual and bilingual children disambiguated the reference of the novel word using a mutual exclusivity strategy, choosing the novel object rather than the fep. On similar trials with a Mandarin speaker, children were asked to find the referent of a novel Mandarin label kuò. Monolinguals again chose the novel object rather than the object with the English label fep, even though the Mandarin speaker had no access to conventional English words. Bilinguals did not respond systematically to the Mandarin speaker, suggesting that they had an enhanced understanding of the Mandarin speaker's ignorance of English words. The results indicate that monolingual children initially expect words to be conventionally shared across all speakers, native and foreign. Early bilingual experience facilitates children's discovery of the nature of foreign language words.
The Human Communication Research Centre dialogue database.
Anderson, A H; Garrod, S C; Clark, A; Boyle, E; Mullin, J
1992-10-01
The HCRC dialogue database consists of over 700 transcribed and coded dialogues from pairs of speakers aged from seven to fourteen. The speakers are recorded while tackling co-operative problem-solving tasks and the same pairs of speakers are recorded over two years tackling 10 different versions of our two tasks. In addition there are over 200 dialogues recorded between pairs of undergraduate speakers engaged on versions of the same tasks. Access to the database, and to its accompanying custom-built search software, is available electronically over the JANET system by contacting liz@psy.glasgow.ac.uk, from whom further information about the database and a user's guide to the database can be obtained.
NASA Astrophysics Data System (ADS)
Poock, G. K.; Martin, B. J.
1984-02-01
This was an applied investigation examining the ability of a speech recognition system to recognize speakers' inputs when the speakers were under different stress levels. Subjects were asked to speak to a voice recognition system under three conditions: (1) normal office environment, (2) emotional stress, and (3) perceptual-motor stress. Results indicate a definite relationship between voice recognition system performance and the type of low stress reference patterns used to achieve recognition.
Computer-Mediated Assessment of Intelligibility in Aphasia and Apraxia of Speech
Haley, Katarina L.; Roth, Heidi; Grindstaff, Enetta; Jacks, Adam
2011-01-01
Background: Previous work indicates that single word intelligibility tests developed for dysarthria are sensitive to segmental production errors in aphasic individuals with and without apraxia of speech. However, potential listener learning effects and difficulties adapting elicitation procedures to coexisting language impairments limit their applicability to left hemisphere stroke survivors. Aims: The main purpose of this study was to examine basic psychometric properties of a new monosyllabic intelligibility test developed for individuals with aphasia and/or AOS. A related purpose was to examine clinical feasibility and the potential to standardize a computer-mediated administration approach. Methods & Procedures: A 600-item monosyllabic single word intelligibility test was constructed by assembling sets of phonetically similar words. Custom software was used to select 50 target words from this test in a pseudo-random fashion and to elicit and record production of these words by 23 speakers with aphasia and 20 neurologically healthy participants. To evaluate test-retest reliability, two identical sets of 50-word lists were elicited by requesting repetition after a live speaker model. To examine the effect of a different word set and auditory model, an additional set of 50 different words was elicited with a pre-recorded model. The recorded words were presented to normal-hearing listeners for identification via orthographic and multiple-choice response formats. To examine construct validity, production accuracy for each speaker was estimated via phonetic transcription and rating of overall articulation. Outcomes & Results: Recording and listening tasks were completed in less than six minutes for all speakers and listeners. Aphasic speakers were significantly less intelligible than neurologically healthy speakers and displayed a wide range of intelligibility scores. Test-retest and inter-listener reliability estimates were strong. No significant difference was found in scores based on recordings from a live model versus a pre-recorded model, but some individual speakers favored the live model. Intelligibility test scores correlated highly with segmental accuracy derived from broad phonetic transcription of the same speech sample and a motor speech evaluation. Scores correlated moderately with rated articulation difficulty. Conclusions: We describe a computerized, single-word intelligibility test that yields clinically feasible, reliable, and valid measures of segmental speech production in adults with aphasia. This tool can be used in clinical research to facilitate appropriate participant selection and to establish matching across comparison groups. For a majority of speakers, elicitation procedures can be standardized by using a pre-recorded auditory model for repetition. This assessment tool has potential utility for both clinical assessment and outcomes research. PMID:22215933
Modelling Errors in Automatic Speech Recognition for Dysarthric Speakers
NASA Astrophysics Data System (ADS)
Caballero Morales, Santiago Omar; Cox, Stephen J.
2009-12-01
Dysarthria is a motor speech disorder characterized by weakness, paralysis, or poor coordination of the muscles responsible for speech. Although automatic speech recognition (ASR) systems have been developed for disordered speech, factors such as low intelligibility and limited phonemic repertoire decrease speech recognition accuracy, making conventional speaker adaptation algorithms perform poorly on dysarthric speakers. In this work, rather than adapting the acoustic models, we model the errors made by the speaker and attempt to correct them. For this task, two techniques have been developed: (1) a set of "metamodels" that incorporate a model of the speaker's phonetic confusion matrix into the ASR process; (2) a cascade of weighted finite-state transducers at the confusion matrix, word, and language levels. Both techniques attempt to correct the errors made at the phonetic level and make use of a language model to find the best estimate of the correct word sequence. Our experiments show that both techniques outperform standard adaptation techniques.
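The error-modelling idea can be illustrated with a toy noisy-channel decoder: given the recognizer's phone string and a speaker-specific confusion matrix, each lexicon entry is scored by how plausibly it would be confused into the observed string. The matrix and lexicon below are invented for illustration, and insertions/deletions and the language model, which the metamodels and WFST cascade handle, are omitted.

```python
# Toy confusion-matrix correction: pick the lexicon word that maximizes
# the product of P(recognized phone | intended phone). Equal-length phone
# strings are assumed for brevity.
import numpy as np

phones = ['p', 'b', 't', 'd']
idx = {p: i for i, p in enumerate(phones)}
# Rows: intended phone; columns: phone the recognizer produced.
confusion = np.array([[0.6, 0.3, 0.1, 0.0],    # this speaker's /p/ often
                      [0.2, 0.7, 0.0, 0.1],    # comes out as /b/
                      [0.1, 0.0, 0.7, 0.2],
                      [0.0, 0.1, 0.2, 0.7]])

def correct(recognized, lexicon):
    def score(word):
        return np.prod([confusion[idx[w], idx[r]]
                        for w, r in zip(word, recognized)])
    return max(lexicon, key=score)

# The recognizer output 'b t', but the speaker tends to voice /p/ -> /b/:
print(correct(['b', 't'], [['p', 't'], ['b', 'd'], ['d', 't']]))
```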
Artificially intelligent recognition of Arabic speaker using voice print-based local features
NASA Astrophysics Data System (ADS)
Mahmood, Awais; Alsulaiman, Mansour; Muhammad, Ghulam; Akram, Sheeraz
2016-11-01
Local features for any pattern recognition system are based on information extracted locally. In this paper, a local feature extraction technique was developed. The feature was extracted in the time-frequency plane by taking the moving average along the diagonal directions of the time-frequency plane. This feature captures time-frequency events, producing a unique pattern for each speaker that can be viewed as a voice print of the speaker. Hence, we refer to this technique as a voice print-based local feature. The proposed feature was compared to other features, including the mel-frequency cepstral coefficient (MFCC), for speaker recognition using two different databases. One of the databases used in the comparison is a subset of an LDC database that consists of two short sentences uttered by 182 speakers. The proposed feature attained a 98.35% recognition rate, compared to 96.7% for MFCC, using the LDC subset.
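A minimal sketch of the diagonal moving-average idea follows: compute a log spectrogram, then take a moving average along each diagonal of the time-frequency plane (summarized here to one number per diagonal for brevity). The window length and spectrogram settings are illustrative assumptions.

```python
# Diagonal moving averages over a log time-frequency plane as a crude
# "voice print" feature vector.
import numpy as np
from scipy.signal import spectrogram

def diagonal_voiceprint(y, sr, win=3):
    _, _, S = spectrogram(y, fs=sr, nperseg=256)
    S = np.log(S + 1e-10)                             # log-magnitude TF plane
    feats = []
    for k in range(-S.shape[0] + 1, S.shape[1]):      # every diagonal
        d = np.diagonal(S, offset=k)
        if d.size >= win:                             # moving average along it
            kernel = np.ones(win) / win
            feats.append(np.convolve(d, kernel, mode='valid').mean())
    return np.array(feats)

rng = np.random.default_rng(0)
print(diagonal_voiceprint(rng.standard_normal(8000), 8000).shape)
```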
Alternative Speech Communication System for Persons with Severe Speech Disorders
NASA Astrophysics Data System (ADS)
Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas
2009-12-01
Assistive speech-enabled systems are proposed to help both French- and English-speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech, making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm, and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. Improvements in the Perceptual Evaluation of Speech Quality (PESQ) value of 5% and of more than 20% are achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.
NASA Astrophysics Data System (ADS)
O'Sullivan, James; Chen, Zhuo; Herrero, Jose; McKhann, Guy M.; Sheth, Sameer A.; Mehta, Ashesh D.; Mesgarani, Nima
2017-10-01
Objective. People who suffer from hearing impairments can find it difficult to follow a conversation in a multi-speaker environment. Current hearing aids can suppress background noise; however, there is little that can be done to help a user attend to a single conversation amongst many without knowing which speaker the user is attending to. Cognitively controlled hearing aids that use auditory attention decoding (AAD) methods are the next step in offering help. Translating the successes in AAD research to real-world applications poses a number of challenges, including the lack of access to the clean sound sources in the environment with which to compare with the neural signals. We propose a novel framework that combines single-channel speech separation algorithms with AAD. Approach. We present an end-to-end system that (1) receives a single audio channel containing a mixture of speakers that is heard by a listener along with the listener’s neural signals, (2) automatically separates the individual speakers in the mixture, (3) determines the attended speaker, and (4) amplifies the attended speaker’s voice to assist the listener. Main results. Using invasive electrophysiology recordings, we identified the regions of the auditory cortex that contribute to AAD. Given appropriate electrode locations, our system is able to decode the attention of subjects and amplify the attended speaker using only the mixed audio. Our quality assessment of the modified audio demonstrates a significant improvement in both subjective and objective speech quality measures. Significance. Our novel framework for AAD bridges the gap between the most recent advancements in speech processing technologies and speech prosthesis research and moves us closer to the development of cognitively controlled hearable devices for the hearing impaired.
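The decoding step in such a system is commonly a linear stimulus-reconstruction model. The sketch below, with invented data, trains a ridge-regression decoder to map time-lagged neural channels to a speech envelope and then labels as attended whichever separated speaker's envelope best correlates with the reconstruction; the lag count, regularization, and toy signals are assumptions, and a real system would evaluate on held-out data.

```python
# Stimulus-reconstruction auditory attention decoding: backward linear
# model from lagged neural channels to an envelope, then pick the speaker
# whose envelope correlates best with the reconstruction.
import numpy as np
from sklearn.linear_model import Ridge

def lagged(X, n_lags=16):
    """Stack time-lagged copies of each channel as regression features."""
    return np.hstack([np.roll(X, k, axis=0) for k in range(n_lags)])[n_lags:]

def decode_attention(neural, env_a, env_b, env_attended_train, n_lags=16):
    Z = lagged(neural, n_lags)
    model = Ridge(alpha=1.0).fit(Z, env_attended_train[n_lags:])  # train decoder
    rec = model.predict(Z)                                        # reconstruction
    corr = lambda e: np.corrcoef(rec, e[n_lags:])[0, 1]
    return 'A' if corr(env_a) > corr(env_b) else 'B'

rng = np.random.default_rng(0)
env_a, env_b = rng.random(2000), rng.random(2000)
neural = np.outer(env_a, rng.random(32)) + 0.5 * rng.standard_normal((2000, 32))
print(decode_attention(neural, env_a, env_b, env_a))   # expected: 'A'
```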
Hollien, Harry; Huntley Bahr, Ruth; Harnsberger, James D
2014-03-01
The following article provides a general review of an area that can be referred to as Forensic Voice. Its goals will be outlined and that discussion will be followed by a description of its major elements. Considered are (1) the processing and analysis of spoken utterances, (2) distorted speech, (3) enhancement of speech intelligibility (re: surveillance and other recordings), (4) transcripts, (5) authentication of recordings, (6) speaker identification, and (7) the detection of deception, intoxication, and emotions in speech. Stress in speech and the psychological stress evaluation systems (that some individuals attempt to use as lie detectors) also will be considered. Points of entry will be suggested for individuals with the kinds of backgrounds possessed by professionals already working in the voice area.
Brainstem Correlates of Speech-in-Noise Perception in Children
Anderson, Samira; Skoe, Erika; Chandrasekaran, Bharath; Zecker, Steven; Kraus, Nina
2010-01-01
Children often have difficulty understanding speech in challenging listening environments. In the absence of peripheral hearing loss, these speech perception difficulties may arise from dysfunction at more central levels in the auditory system, including subcortical structures. We examined brainstem encoding of pitch in a speech syllable in 38 school-age children. In children with poor speech-in-noise perception, we find impaired encoding of the fundamental frequency and the second harmonic, two important cues for pitch perception. Pitch, an important factor in speaker identification, aids the listener in tracking a specific voice from a background of voices. These results suggest that the robustness of subcortical neural encoding of pitch features in time-varying signals is an important factor in determining success with speech perception in noise. PMID:20708671
Simulation and testing of a multichannel system for 3D sound localization
NASA Astrophysics Data System (ADS)
Matthews, Edward Albert
Three-dimensional (3D) audio involves the ability to localize sound anywhere in a three-dimensional space. 3D audio can be used to provide the listener with the perception of moving sounds and can provide a realistic listening experience for applications such as gaming, video conferencing, movies, and concerts. The purpose of this research is to simulate and test 3D audio by incorporating auditory localization techniques in a multi-channel speaker system. The objective is to develop an algorithm that can place an audio event in a desired location by calculating and controlling the gain factors of each speaker. A MATLAB simulation displays the location of the speakers and perceived sound, which is verified through experimentation. The scenario in which the listener is not equidistant from each of the speakers is also investigated and simulated. This research is envisioned to lead to a better understanding of human localization of sound, and will contribute to a more realistic listening experience.
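For the non-equidistant listener, a standard adjustment is per-speaker delay and gain compensation so that all signals arrive time-aligned and level-matched, as in this sketch; the 1/r amplitude law and the speed-of-sound value are textbook assumptions rather than details from the thesis.

```python
# Per-speaker delay and gain compensation for a listener who is not
# equidistant from the speakers.
import numpy as np

def compensation(speaker_xy, listener_xy, c=343.0):
    d = np.linalg.norm(np.asarray(speaker_xy) - listener_xy, axis=1)
    delays = (d.max() - d) / c        # delay nearer speakers (seconds)
    gains = d / d.min()               # boost farther speakers (1/r law)
    return delays, gains

speakers = [(1.0, 0.0), (0.0, 1.5), (-2.0, 0.0)]
print(compensation(speakers, np.array([0.2, 0.1])))
```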
Adaptive Communication: Languages with More Non-Native Speakers Tend to Have Fewer Word Forms
Bentz, Christian; Verkerk, Annemarie; Kiela, Douwe; Hill, Felix; Buttery, Paula
2015-01-01
Explaining the diversity of languages across the world is one of the central aims of typological, historical, and evolutionary linguistics. We consider the effect of language contact-the number of non-native speakers a language has-on the way languages change and evolve. By analysing hundreds of languages within and across language families, regions, and text types, we show that languages with greater levels of contact typically employ fewer word forms to encode the same information content (a property we refer to as lexical diversity). Based on three types of statistical analyses, we demonstrate that this variance can in part be explained by the impact of non-native speakers on information encoding strategies. Finally, we argue that languages are information encoding systems shaped by the varying needs of their speakers. Language evolution and change should be modeled as the co-evolution of multiple intertwined adaptive systems: On one hand, the structure of human societies and human learning capabilities, and on the other, the structure of language. PMID:26083380
The impact of musical training and tone language experience on talker identification
Xie, Xin; Myers, Emily
2015-01-01
Listeners can use pitch changes in speech to identify talkers. Individuals exhibit large variability in sensitivity to pitch and in accuracy perceiving talker identity. In particular, people who have musical training or long-term tone language use are found to have enhanced pitch perception. In the present study, the influence of pitch experience on talker identification was investigated as listeners identified talkers in native language as well as non-native languages. Experiment 1 was designed to explore the influence of pitch experience on talker identification in two groups of individuals with potential advantages for pitch processing: musicians and tone language speakers. Experiment 2 further investigated individual differences in pitch processing and the contribution to talker identification by testing a mediation model. Cumulatively, the results suggested that (a) musical training confers an advantage for talker identification, supporting a shared resources hypothesis regarding music and language and (b) linguistic use of lexical tones also increases accuracy in hearing talker identity. Importantly, these two types of hearing experience enhance talker identification by sharpening pitch perception skills in a domain-general manner. PMID:25618071
Speech transformations based on a sinusoidal representation
NASA Astrophysics Data System (ADS)
Quatieri, T. E.; McAulay, R. J.
1986-05-01
A new speech analysis/synthesis technique is presented which provides the basis for a general class of speech transformations including time-scale modification, frequency scaling, and pitch modification. These modifications can be performed with a time-varying change, permitting continuous adjustment of a speaker's fundamental frequency and rate of articulation. The method is based on a sinusoidal representation of the speech production mechanism that has been shown to produce synthetic speech that preserves the waveform shape and is essentially perceptually indistinguishable from the original. Although the analysis/synthesis system was originally designed for single-speaker signals, it is equally capable of recovering and modifying nonspeech signals such as music, multiple speakers, marine biologic sounds, and speakers in the presence of interference such as noise and musical backgrounds.
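For readers who want the core of the sinusoidal analysis/synthesis idea in code, a heavily simplified per-frame sketch follows: pick the strongest spectral peaks, then resynthesize the frame as a sum of sinusoids, optionally over a stretched time base. The McAulay-Quatieri system additionally tracks peaks across frames and interpolates phase, which this toy version omits.

```python
import numpy as np

def analyze_frame(frame, sr, n_peaks=20):
    """Return (freqs, amps, phases) of the strongest spectral peaks."""
    win = np.hanning(len(frame))
    spec = np.fft.rfft(frame * win)
    mag = np.abs(spec)
    peaks = [k for k in range(1, len(mag) - 1)
             if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]]
    peaks = sorted(peaks, key=lambda k: mag[k], reverse=True)[:n_peaks]
    freqs = np.array(peaks) * sr / len(frame)
    amps = 2 * mag[peaks] / np.sum(win)
    phases = np.angle(spec)[peaks]
    return freqs, amps, phases

def synth_frame(freqs, amps, phases, n, sr, stretch=1.0):
    """Resynthesize (and optionally time-stretch) a frame of sinusoids."""
    t = np.arange(int(n * stretch)) / sr
    return sum(a * np.cos(2 * np.pi * f * t + p)
               for f, a, p in zip(freqs, amps, phases))

sr, n = 16000, 512
t = np.arange(n) / sr
frame = 0.6 * np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 440 * t)
f, a, p = analyze_frame(frame, sr)
out = synth_frame(f, a, p, n, sr, stretch=2.0)  # crude 2x time-scale change
```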
ERIC Educational Resources Information Center
Defense Language Inst., Washington, DC.
These 14 volumes of the Defense Language Institute's basic course in Turkish consist of 112 lesson units designed to train native English language speakers to Level 3 proficiency in comprehending, speaking, reading, and writing Turkish. (Native-speaker fluency is Level 5.) An introduction to the sound system, vowel harmony, and syllable division…
Vanrell, Maria del Mar; Mascaró, Ignasi; Torres-Tamarit, Francesc; Prieto, Pilar
2013-06-01
Recent studies in the field of intonational phonology have shown that information-seeking questions can be distinguished from confirmation-seeking questions by prosodic means in a variety of languages (Armstrong, 2010, for Puerto Rican Spanish; Grice & Savino, 1997, for Bari Italian; Kügler, 2003, for Leipzig German; Mata & Santos, 2010, for European Portuguese; Vanrell, Mascaró, Prieto, & Torres-Tamarit, 2010, for Catalan). However, all these studies have relied on production experiments and little is known about the perceptual relevance of these intonational cues. This paper explores whether Majorcan Catalan listeners distinguish information- and confirmation-seeking questions by means of two distinct nuclear falling pitch accents. Three behavioral tasks were conducted with 20 Majorcan Catalan subjects, namely a semantic congruity test, a rating test, and a classical categorical perception identification/discrimination test. The results show that a difference in pitch scaling on the leading H tone of the H+L* nuclear pitch accent is the main cue used by Majorcan Catalan listeners to distinguish confirmation questions from information-seeking questions. Thus, while an ¡H+L* pitch accent signals an information-seeking question (i.e., the speaker has no expectation about the nature of the answer), the H+L* pitch accent indicates that the speaker is asking about mutually shared information. We argue that these results have implications for the representation of tonal height distinctions in Catalan. The results also support the claim that phonological contrasts in intonation, together with other linguistic strategies, can signal the speakers' beliefs about the certainty of the proposition expressed.
Voice Recognition Software Accuracy with Second Language Speakers of English.
ERIC Educational Resources Information Center
Coniam, D.
1999-01-01
Explores the potential of the use of voice-recognition technology with second-language speakers of English. Involves the analysis of the output produced by a small group of very competent second-language subjects reading a text into the voice recognition software Dragon Systems "Dragon NaturallySpeaking." (Author/VWL)
Beyond the School: What Else Educates?
ERIC Educational Resources Information Center
Hansen, Kenneth H., Ed.
Eleven essays explore the educational effects of social forces and agencies outside of the formal school environment. Speakers at the 1977 Chief State School Officers' Institute examined how these social forces can be used to enhance the work of the American school system. Speakers represented schools of education, research institutes, media…
Bilingual Computerized Speech Recognition Screening for Depression Symptoms
ERIC Educational Resources Information Center
Gonzalez, Gerardo; Carter, Colby; Blanes, Erika
2007-01-01
The Voice-Interactive Depression Assessment System (VIDAS) is a computerized speech recognition application for screening depression based on the Center for Epidemiological Studies--Depression scale in English and Spanish. Study 1 included 50 English and 47 Spanish speakers. Study 2 involved 108 English and 109 Spanish speakers. Participants…
The Speech Community in Evolutionary Language Dynamics
ERIC Educational Resources Information Center
Blythe, Richard A.; Croft, William A.
2009-01-01
Language is a complex adaptive system: Speakers are agents who interact with each other, and their past and current interactions feed into speakers' future behavior in complex ways. In this article, we describe the social cognitive linguistic basis for this analysis of language and a mathematical model developed in collaboration between…
Evidential Uses in the Spanish of Quechua Speakers in Peru.
ERIC Educational Resources Information Center
Escobar, Anna Maria
1994-01-01
Analysis of recordings of spontaneous speech of native speakers of Quechua speaking Spanish as a second language reveals that, using verbal morphological resources of Spanish, they have grammaticalized an epistemic marking system resembling that of Quechua. Sources of this process in both Quechua and Spanish are analyzed. (MSE)
Perceptual Learning of Time-Compressed Speech: More than Rapid Adaptation
Banai, Karen; Lavner, Yizhar
2012-01-01
Background Time-compressed speech, a form of rapidly presented speech, is harder to comprehend than natural speech, especially for non-native speakers. Although it is possible to adapt to time-compressed speech after a brief exposure, it is not known whether additional perceptual learning occurs with further practice. Here, we ask whether multiday training on time-compressed speech yields more learning than that observed during the initial adaptation phase and whether the pattern of generalization following successful learning is different than that observed with initial adaptation only. Methodology/Principal Findings Two groups of non-native Hebrew speakers were tested on five different conditions of time-compressed speech identification in two assessments conducted 10–14 days apart. Between those assessments, one group of listeners received five practice sessions on one of the time-compressed conditions. Between the two assessments, trained listeners improved significantly more than untrained listeners on the trained condition. Furthermore, the trained group generalized its learning to two untrained conditions in which different talkers presented the trained speech materials. In addition, when the performance of the non-native speakers was compared to that of a group of naïve native Hebrew speakers, performance of the trained group was equivalent to that of the native speakers on all conditions on which learning occurred, whereas performance of the untrained non-native listeners was substantially poorer. Conclusions/Significance Multiday training on time-compressed speech results in significantly more perceptual learning than brief adaptation. Compared to previous studies of adaptation, the training induced learning is more stimulus specific. Taken together, the perceptual learning of time-compressed speech appears to progress from an initial, rapid adaptation phase to a subsequent prolonged and more stimulus specific phase. These findings are consistent with the predictions of the Reverse Hierarchy Theory of perceptual learning and suggest constraints on the use of perceptual-learning regimens during second language acquisition. PMID:23056592
Analysis and Classification of Voice Pathologies Using Glottal Signal Parameters.
Forero M, Leonardo A; Kohler, Manoela; Vellasco, Marley M B R; Cataldo, Edson
2016-09-01
The classification of voice diseases has many applications in health, in disease treatment, and in the design of new medical equipment that helps doctors diagnose pathologies related to the voice. This work uses the parameters of the glottal signal to help identify two types of voice disorders related to pathologies of the vocal folds: nodule and unilateral paralysis. The parameters of the glottal signal are obtained through a known inverse filtering method and are used as inputs to an Artificial Neural Network, a Support Vector Machine, and a Hidden Markov Model in order to classify the voice signals into three different groups and compare the results: speakers with a nodule in the vocal folds; speakers with unilateral paralysis of the vocal folds; and speakers with normal voices, that is, without a nodule or unilateral paralysis of the vocal folds. The database is composed of 248 voice recordings (vowel productions) containing samples corresponding to the three groups mentioned. Compared with similar studies, this study used a larger database for classification and achieved a superior classification rate of 97.2%. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
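As an illustration of the classification stage only (the inverse-filtering step that extracts glottal parameters is a separate algorithm), the sketch below runs the three-group comparison on placeholder feature vectors with one of the three classifiers named in the abstract; the feature values, per-group sample sizes, and SVM settings are all hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Placeholder glottal-signal parameters (e.g., open quotient, speed quotient,
# normalized amplitude quotient); real values come from inverse filtering.
centers = np.repeat([[0, 0, 0], [1, 0, 1], [0, 1, -1]], [90, 79, 79], axis=0)
X = rng.normal(scale=0.5, size=(248, 3)) + centers
y = np.repeat(["normal", "nodule", "paralysis"], [90, 79, 79])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```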
Groenewold, Rimke; Armstrong, Elizabeth
2018-05-14
Previous research has shown that speakers with aphasia rely on enactment more often than non-brain-damaged language users. Several studies have been conducted to explain this observed increase, demonstrating that spoken language containing enactment is easier to produce and is more engaging to the conversation partner. This paper examines how the occurrence of enactment in casual conversation involving individuals with aphasia affects the level of conversational assertiveness, evaluating whether and to what extent the occurrence of enactment in the speech of individuals with aphasia contributes to its conversational assertiveness. Conversations between a speaker with aphasia and his wife (drawn from AphasiaBank) were analysed in several steps. First, the transcripts were divided into moves, and all moves were coded according to the systemic functional linguistics (SFL) framework. Next, all moves were labelled in terms of their level of conversational assertiveness, as defined in the previous literature. Finally, all enactments were identified and their level of conversational assertiveness was compared with that of non-enactments. Throughout their conversations, the non-brain-damaged speaker was more assertive than the speaker with aphasia. However, the speaker with aphasia produced more enactments than the non-brain-damaged speaker, and his moves containing enactment were more assertive than those without enactment. The use of enactment in the conversations under study positively affected the level of conversational assertiveness of the speaker with aphasia, a competence that is important for speakers with aphasia because it contributes to their floor time, chances to be heard seriously and degree of control over the conversation topic. © 2018 The Authors International Journal of Language & Communication Disorders published by John Wiley & Sons Ltd on behalf of Royal College of Speech and Language Therapists.
Utterance selection model of language change
NASA Astrophysics Data System (ADS)
Baxter, G. J.; Blythe, R. A.; Croft, W.; McKane, A. J.
2006-04-01
We present a mathematical formulation of a theory of language change. The theory is evolutionary in nature and has close analogies with theories of population genetics. The mathematical structure we construct similarly has correspondences with the Fisher-Wright model of population genetics, but there are significant differences. The continuous time formulation of the model is expressed in terms of a Fokker-Planck equation. This equation is exactly soluble in the case of a single speaker and can be investigated analytically in the case of multiple speakers who communicate equally with all other speakers and give their utterances equal weight. Whilst the stationary properties of this system have much in common with the single-speaker case, time-dependent properties are richer. In the particular case where linguistic forms can become extinct, we find that the presence of many speakers causes a two-stage relaxation, the first being a common marginal distribution that persists for a long time as a consequence of ultimate extinction being due to rare fluctuations.
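For intuition about the model's absorbing dynamics, a minimal single-speaker simulation can be written in a few lines: the stored frequency of a linguistic variant drifts as each batch of utterances is sampled from it, until the form fixates or goes extinct through a rare fluctuation. All parameter values below are illustrative, not taken from the paper.

```python
import numpy as np

def utterance_selection(x0=0.5, tokens_per_step=20, lam=0.1,
                        steps=100_000, seed=1):
    """Single-speaker utterance selection: each step the speaker produces
    `tokens_per_step` utterances sampled from the stored frequency x of
    variant A, then nudges x toward the observed frequency (weight `lam`).
    Returns the trajectory; hitting 0 or 1 is extinction/fixation."""
    rng = np.random.default_rng(seed)
    x, traj = x0, [x0]
    for _ in range(steps):
        produced = rng.binomial(tokens_per_step, x) / tokens_per_step
        x = (1 - lam) * x + lam * produced
        traj.append(x)
        if x < 1e-9 or x > 1 - 1e-9:   # effectively absorbed
            break
    return np.array(traj)

traj = utterance_selection()
print(f"final frequency {traj[-1]:.4f} after {len(traj) - 1} steps")
```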
Getzmann, Stephan; Jasny, Julian; Falkenstein, Michael
2017-02-01
Verbal communication in a "cocktail-party situation" is a major challenge for the auditory system. In particular, changes in target speaker usually result in declined speech perception. Here, we investigated whether speech cues indicating a subsequent change in target speaker reduce the costs of switching in younger and older adults. We employed event-related potential (ERP) measures and a speech perception task, in which sequences of short words were simultaneously presented by four speakers. Changes in target speaker were either unpredictable or semantically cued by a word within the target stream. Cued changes resulted in a less decreased performance than uncued changes in both age groups. The ERP analysis revealed shorter latencies in the change-related N400 and late positive complex (LPC) after cued changes, suggesting an acceleration in context updating and attention switching. Thus, both younger and older listeners used semantic cues to prepare changes in speaker setting. Copyright © 2016 Elsevier Inc. All rights reserved.
Complexity Matters: On Gender Agreement in Heritage Scandinavian
Johannessen, Janne Bondi; Larsson, Ida
2015-01-01
This paper investigates aspects of the noun phrase from a Scandinavian heritage language perspective, with an emphasis on noun phrase-internal gender agreement and noun declension. Our results are somewhat surprising compared with earlier research: We find that noun phrase-internal agreement for the most part is rather stable. To the extent that we find attrition, it affects agreement in the noun phrase, but not the declension of the noun. We discuss whether this means that gender is lost and has been reduced to a pure declension class, or whether gender is retained. We argue that gender is actually retained in these heritage speakers. One argument for this is that the speakers who lack agreement in complex noun phrases have agreement intact in simpler phrases. We have thus found that the complexity of the noun phrase is crucial for some speakers. However, among the heritage speakers we also find considerable inter-individual variation, and different speakers can have partly different systems. PMID:26733114
On how the brain decodes vocal cues about speaker confidence.
Jiang, Xiaoming; Pell, Marc D
2015-05-01
In speech communication, listeners must accurately decode vocal cues that refer to the speaker's mental state, such as their confidence or 'feeling of knowing'. However, the time course and neural mechanisms associated with online inferences about speaker confidence are unclear. Here, we used event-related potentials (ERPs) to examine the temporal neural dynamics underlying a listener's ability to infer speaker confidence from vocal cues during speech processing. We recorded listeners' real-time brain responses while they evaluated statements wherein the speaker's tone of voice conveyed one of three levels of confidence (confident, close-to-confident, unconfident) or were spoken in a neutral manner. Neural responses time-locked to event onset show that the perceived level of speaker confidence could be differentiated at distinct time points during speech processing: unconfident expressions elicited a weaker P2 than all other expressions of confidence (or neutral-intending utterances), whereas close-to-confident expressions elicited a reduced negative response in the 330-500 msec and 550-740 msec time window. Neutral-intending expressions, which were also perceived as relatively confident, elicited a more delayed, larger sustained positivity than all other expressions in the 980-1270 msec window for this task. These findings provide the first piece of evidence of how quickly the brain responds to vocal cues signifying the extent of a speaker's confidence during online speech comprehension; first, a rough dissociation between unconfident and confident voices occurs as early as 200 msec after speech onset. At a later stage, further differentiation of the exact level of speaker confidence (i.e., close-to-confident, very confident) is evaluated via an inferential system to determine the speaker's meaning under current task settings. These findings extend three-stage models of how vocal emotion cues are processed in speech comprehension (e.g., Schirmer & Kotz, 2006) by revealing how a speaker's mental state (i.e., feeling of knowing) is simultaneously inferred from vocal expressions. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Vemuri, SH. S.; Bosworth, R.; Morrison, J. F.; Kerrigan, E. C.
2018-05-01
The growth of Tollmien-Schlichting (TS) waves is experimentally attenuated using a single-input and single-output (SISO) feedback system, where the TS wave packet is generated by a surface point source in a flat-plate boundary layer. The SISO system consists of a single wall-mounted hot wire as the sensor and a miniature speaker as the actuator. The actuation is achieved through a dual-slot geometry to minimize the cavity near-field effects on the sensor. The experimental setup to generate TS waves or wave packets is very similar to that used by Li and Gaster [J. Fluid Mech. 550, 185 (2006), 10.1017/S0022112005008219]. The aim is to investigate the performance of the SISO control system in attenuating single-frequency, two-dimensional disturbances generated by these configurations. The necessary plant models are obtained using system identification, and the controllers are then designed based on the models and implemented in real-time to test their performance. Cancellation of the rms streamwise velocity fluctuation of TS waves is evident over a significant domain.
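The "plant models obtained using system identification" step is commonly carried out with a least-squares ARX fit from measured actuator input and sensor output; a generic sketch (not the authors' specific model structure or data) is shown below.

```python
import numpy as np

def fit_arx(u, y, na=4, nb=4):
    """Least-squares ARX fit: y[t] ~ sum_i a_i*y[t-i] + sum_j b_j*u[t-j].
    Returns the output coefficients a and input coefficients b."""
    n = max(na, nb)
    rows = [np.concatenate([y[t - na:t][::-1], u[t - nb:t][::-1]])
            for t in range(n, len(y))]
    theta, *_ = np.linalg.lstsq(np.array(rows), y[n:], rcond=None)
    return theta[:na], theta[na:]

# Identify a toy second-order plant from simulated input/output data.
rng = np.random.default_rng(0)
u = rng.normal(size=2000)
y = np.zeros_like(u)
for t in range(2, len(u)):
    y[t] = 1.5 * y[t - 1] - 0.7 * y[t - 2] + 0.5 * u[t - 1] \
           + rng.normal(scale=0.01)
a, b = fit_arx(u, y, na=2, nb=2)
print("a:", np.round(a, 3), "b:", np.round(b, 3))  # ~[1.5, -0.7], ~[0.5, 0]
```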
Expression transmission using exaggerated animation for Elfoid
Hori, Maiya; Tsuruda, Yu; Yoshimura, Hiroki; Iwai, Yoshio
2015-01-01
We propose an expression transmission system using a cellular-phone-type teleoperated robot called Elfoid. Elfoid has a soft exterior that provides the look and feel of human skin, and is designed to transmit the speaker's presence to their communication partner using a camera and microphone. To transmit the speaker's presence, Elfoid sends not only the voice of the speaker but also the facial expression captured by the camera. In this research, facial expressions are recognized using a machine learning technique. Elfoid cannot, however, display facial expressions because of its compactness and a lack of sufficiently small actuator motors. To overcome this problem, facial expressions are displayed using Elfoid's head-mounted mobile projector. We built a prototype system and experimentally evaluated its subjective usability. PMID:26347686
NASA Astrophysics Data System (ADS)
Gillen, Mathew
2014-03-01
The speaker will address policy changes and improvements in visa processing that help scientists and students to visit and study in the United States. The speaker will also discuss challenges involved with balancing the needs of U.S. science with national security interests.
Attentional influences on functional mapping of speech sounds in human auditory cortex.
Obleser, Jonas; Elbert, Thomas; Eulitz, Carsten
2004-07-21
The speech signal contains both information about phonological features such as place of articulation and non-phonological features such as speaker identity. These are different aspects of the 'what'-processing stream (speaker vs. speech content), and here we show that they can be further segregated as they may occur in parallel but within different neural substrates. Subjects listened to two different vowels, each spoken by two different speakers. During one block, they were asked to identify a given vowel irrespectively of the speaker (phonological categorization), while during the other block the speaker had to be identified irrespectively of the vowel (speaker categorization). Auditory evoked fields were recorded using 148-channel magnetoencephalography (MEG), and magnetic source imaging was obtained for 17 subjects. During phonological categorization, a vowel-dependent difference of N100m source location perpendicular to the main tonotopic gradient replicated previous findings. In speaker categorization, the relative mapping of vowels remained unchanged but sources were shifted towards more posterior and more superior locations. These results imply that the N100m reflects the extraction of abstract invariants from the speech signal. This part of the processing is accomplished in auditory areas anterior to AI, which are part of the auditory 'what' system. This network seems to include spatially separable modules for identifying the phonological information and for associating it with a particular speaker that are activated in synchrony but within different regions, suggesting that the 'what' processing can be more adequately modeled by a stream of parallel stages. The relative activation of the parallel processing stages can be modulated by attentional or task demands.
The Development of the Speaker Independent ARM Continuous Speech Recognition System
1992-01-01
spoken airborne reconnaissance reports using a speech recognition system based on phoneme-level hidden Markov models (HMMs). Previous versions of the ARM... will involve automatic selection from multiple model sets, corresponding to different speaker types, and that the most rudimentary partition of a... The vocabulary size for the ARM task is 497 words. These words are related to the phoneme-level symbols corresponding to the models in the model set
Google Home: smart speaker as environmental control unit.
Noda, Kenichiro
2017-08-23
Environmental Control Units (ECU) are devices or systems that allow a person to control appliances in their home or work environment. Such a system can be utilized by clients with physical and/or functional disability to enhance their ability to control their environment, to promote independence and improve their quality of life. Over the last several years, there has been an emergence of several inexpensive, commercially available, voice-activated smart speakers on the market, such as Google Home and Amazon Echo. These smart speakers are equipped with far-field microphones that support voice recognition and allow for completely hands-free operation for various purposes, including playing music, information retrieval, and most importantly, environmental control. Clients with disability could utilize these features to turn the unit into a simple ECU that is completely voice activated and wirelessly connected to appliances. Smart speakers, with their ease of setup, low cost and versatility, may be a more affordable and accessible alternative to the traditional ECU. Implications for Rehabilitation Environmental Control Units (ECU) enable independence for physically and functionally disabled clients, and reduce burden and frequency of demands on carers. Traditional ECU can be costly and may require clients to learn specialized skills to use. Smart speakers have the potential to be used as a new-age ECU by overcoming these barriers, and can be used by a wider range of clients.
Lotto, A J; Kluender, K R
1998-05-01
When members of a series of synthesized stop consonants varying acoustically in F3 characteristics and varying perceptually from /da/ to /ga/ are preceded by /al/, subjects report hearing more /ga/ syllables relative to when each member is preceded by /ar/ (Mann, 1980). It has been suggested that this result demonstrates the existence of a mechanism that compensates for coarticulation via tacit knowledge of articulatory dynamics and constraints, or through perceptual recovery of vocal-tract dynamics. The present study was designed to assess the degree to which these perceptual effects are specific to qualities of human articulatory sources. In three experiments, series of consonant-vowel (CV) stimuli varying in F3-onset frequency (/da/-/ga/) were preceded by speech versions or nonspeech analogues of /al/ and /ar/. The effect of liquid identity on stop consonant labeling remained when the preceding VC was produced by a female speaker and the CV syllable was modeled after a male speaker's productions. Labeling boundaries also shifted when the CV was preceded by a sine wave glide modeled after F3 characteristics of /al/ and /ar/. Identifications shifted even when the preceding sine wave was of constant frequency equal to the offset frequency of F3 from a natural production. These results suggest an explanation in terms of general auditory processes as opposed to recovery of or knowledge of specific articulatory dynamics.
ERIC Educational Resources Information Center
Law, Wai Ling
2017-01-01
In diglossic contexts, when speakers typically use two different languages on a regular basis, bilingual speakers display a wide array of attitudes towards each of their languages and associated cultures (Galindo, 1995) and such variability in attitudes can affect their linguistic behaviors (Lambert, Hodgson, Gardner & Fillenbaum, 1960).…
ERIC Educational Resources Information Center
Mazuka, Reiko; Friedman, Ronald S.
2000-01-01
Tested claims by Lucy (1992a, 1992b) that differences between the number marking systems used by Yucatec Maya and English lead speakers of these languages to differentially attend to either the material composition or the shape of objects. Replicated Lucy's critical objects' classification experiments using speakers of English and Japanese.…
Optimizing Vowel Formant Measurements in Four Acoustic Analysis Systems for Diverse Speaker Groups
Derdemezis, Ekaterini; Kent, Ray D.; Fourakis, Marios; Reinicke, Emily L.; Bolt, Daniel M.
2016-01-01
Purpose This study systematically assessed the effects of select linear predictive coding (LPC) analysis parameter manipulations on vowel formant measurements for diverse speaker groups using 4 trademarked Speech Acoustic Analysis Software Packages (SAASPs): CSL, Praat, TF32, and WaveSurfer. Method Productions of 4 words containing the corner vowels were recorded from 4 speaker groups with typical development (male and female adults and male and female children) and 4 speaker groups with Down syndrome (male and female adults and male and female children). Formant frequencies were determined from manual measurements using a consensus analysis procedure to establish formant reference values, and from the 4 SAASPs (using both the default analysis parameters and with adjustments or manipulations to select parameters). Smaller differences between values obtained from the SAASPs and the consensus analysis implied more optimal analysis parameter settings. Results Manipulations of default analysis parameters in CSL, Praat, and TF32 yielded more accurate formant measurements, though the benefit was not uniform across speaker groups and formants. In WaveSurfer, manipulations did not improve formant measurements. Conclusions The effects of analysis parameter manipulations on accuracy of formant-frequency measurements varied by SAASP, speaker group, and formant. The information from this study helps to guide clinical and research applications of SAASPs. PMID:26501214
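For readers unfamiliar with what the manipulated LPC parameters do, the core computation that all four packages wrap (pre-emphasis, windowing, an all-pole fit, then poles to formant frequencies) can be sketched generically; this is a textbook autocorrelation-method implementation, not any package's actual code, and the test signal is synthetic.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_formants(frame, sr, order=12, preemph=0.97):
    """Estimate formant frequencies (Hz) from one voiced frame: fit an
    all-pole (LPC) model by the autocorrelation method, then keep the
    narrow-bandwidth complex poles as formant candidates."""
    x = np.append(frame[0], frame[1:] - preemph * frame[:-1])  # pre-emphasis
    x = x * np.hamming(len(x))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])   # normal equations
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]             # one per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)
    bands = -np.log(np.abs(roots)) * sr / np.pi   # 3-dB bandwidths
    keep = (freqs > 90) & (bands < 400)
    return np.sort(freqs[keep])

sr = 10000
t = np.arange(0, 0.03, 1 / sr)
# Crude /a/-like test frame with resonances near 700 and 1200 Hz.
frame = np.sin(2 * np.pi * 700 * t) + 0.7 * np.sin(2 * np.pi * 1200 * t)
frame += 1e-3 * np.random.default_rng(0).normal(size=t.size)  # stabilizer
print(lpc_formants(frame, sr))
```

Changing `order` here corresponds to the number-of-poles setting such packages expose: too few poles merge nearby formants, too many introduce spurious peaks, and the best value differs across speaker groups such as children versus adults.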
NASA Astrophysics Data System (ADS)
Peng, Bo; Zheng, Sifa; Liao, Xiangning; Lian, Xiaomin
2018-03-01
In order to achieve sound field reproduction in a wide frequency band, multiple-type speakers are used. The reproduction accuracy is not only affected by the signals sent to the speakers, but also depends on the position and the number of each type of speaker. The method of optimizing a mixed speaker array is investigated in this paper. A virtual-speaker weighting method is proposed to optimize both the position and the number of each type of speaker. In this method, a virtual-speaker model is proposed to quantify the increment of controllability of the speaker array when the speaker number increases. While optimizing a mixed speaker array, the gain of the virtual-speaker transfer function is used to determine the priority orders of the candidate speaker positions, which optimizes the position of each type of speaker. Then the relative gain of the virtual-speaker transfer function is used to determine whether the speakers are redundant, which optimizes the number of each type of speaker. Finally the virtual-speaker weighting method is verified by reproduction experiments of the interior sound field in a passenger car. The results validate that the optimum mixed speaker array can be obtained using the proposed method.
Whitehead, Tanya D
2006-01-01
Diversity of language among healthcare employees and nursing students is growing as diversity increases among the general population. Institutions have begun to develop systems to accommodate diversity and to assimilate workers. One barrier to nonnative English-speaking nurse hires may be posed by readiness for the licensure exam and the critical thinking assessments that are now an expected outcome of nursing programs, and act as a gatekeeper to graduation and to employment. To assist in preparing for high-stakes testing, the Assessment Technologies Institute Critical Thinking Assessment was developed in compliance with credentialing bodies' educational outcomes criteria. This pilot study of 209 nursing students was designed to reveal any possible language bias that might act as a barrier to nonnative English speakers. Whole classes of nursing students were entered into the study to control for selection bias. A sample representative of national nursing enrollment was obtained from 21 universities, with 192 (92%) native English-speaking students and 17 (8%) nonnative English speakers participating in the study. All students were given the Assessment Technologies Institute Critical Thinking Assessment at entry and exit to their nursing program. Average scores on entry were 66% for nonnative speakers and 72% for native speakers. At exit, the nonnative speakers had closed the gap in academic outcomes, with an average score of 72% compared to 73% for native speakers. The study found that the slight differences between the native and nonnative speakers on the two exit outcome measures, National Council licensure examination (NCLEX-RN) pass rates and the Critical Thinking Assessment, were not statistically significant, demonstrating that nonnative English speakers achieved parity with native English-speaking peers on the Critical Thinking Assessment tool, which is often believed to be related to employment readiness.
Speech coding, reconstruction and recognition using acoustics and electromagnetic waves
Holzrichter, J.F.; Ng, L.C.
1998-03-17
The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced speech, as well as for combined voiced and unvoiced speech. The methods also describe how to deconvolve the speech excitation function from the acoustic speech output in order to characterize the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.
Phonetic Encoding of Coda Voicing Contrast under Different Focus Conditions in L1 vs. L2 English.
Choi, Jiyoun; Kim, Sahayng; Cho, Taehong
2016-01-01
This study investigated how coda voicing contrast in English would be phonetically encoded in the temporal vs. spectral dimension of the preceding vowel (in vowel duration vs. F1/F2) by Korean L2 speakers of English, and how their L2 phonetic encoding pattern compares to that of native English speakers. Crucially, these questions were explored by taking into account the phonetics-prosody interface, testing effects of prominence by comparing target segments in three focus conditions (phonological focus, lexical focus, and no focus). Results showed that Korean speakers utilized the temporal dimension (vowel duration) to encode coda voicing contrast, but failed to use the spectral dimension (F1/F2), reflecting their native language experience; i.e., with a more sparsely populated vowel space in Korean, they are less sensitive to small changes in the spectral dimension, and hence fine-grained spectral cues in English are not readily accessible. Results also showed that along the temporal dimension, both the L1 and L2 speakers hyperarticulated coda voicing contrast under prominence (when phonologically or lexically focused), but hypoarticulated it in the non-prominent condition. This indicates that low-level phonetic realization and high-order information structure interact in a communicatively efficient way, regardless of the speakers' native language background. The Korean speakers, however, used the temporal phonetic space differently from the way the native speakers did, especially showing less reduction in the no focus condition. This was also attributable to their native language experience; i.e., the Korean speakers' use of the temporal dimension is constrained in a way that is not detrimental to the preservation of coda voicing contrast, given that they failed to add additional cues along the spectral dimension. The results imply that the L2 phonetic system can be more fully illuminated through an investigation of the phonetics-prosody interface in connection with the L2 speakers' native language experience.
Zhang, Xujin; Samuel, Arthur G.; Liu, Siyun
2011-01-01
Previous research has found that a speaker’s native phonological system has a great influence on perception of another language. In three experiments, we tested the perception and representation of Mandarin phonological contrasts by Guangzhou Cantonese speakers, and compared their performance to that of native Mandarin speakers. Despite their rich experience using Mandarin Chinese, the Cantonese speakers had problems distinguishing specific Mandarin segmental and tonal contrasts that do not exist in Guangzhou Cantonese. However, we found evidence that the subtle differences between two members of a contrast were nonetheless represented in the lexicon. We also found different processing patterns for non-native segmental versus non-native tonal contrasts. The results provide substantial new information about the representation and processing of segmental and prosodic information by individuals listening to a closely-related, very well-learned, but still non-native language. PMID:22707849
Heritage language and linguistic theory
Scontras, Gregory; Fuchs, Zuzanna; Polinsky, Maria
2015-01-01
This paper discusses a common reality in many cases of multilingualism: heritage speakers, or unbalanced bilinguals, simultaneous or sequential, who shifted early in childhood from one language (their heritage language) to their dominant language (the language of their speech community). To demonstrate the relevance of heritage linguistics to the study of linguistic competence more broadly defined, we present a series of case studies on heritage linguistics, documenting some of the deficits and abilities typical of heritage speakers, together with the broader theoretical questions they inform. We consider the reorganization of morphosyntactic feature systems, the reanalysis of atypical argument structure, the attrition of the syntax of relativization, and the simplification of scope interpretations; these phenomena implicate diverging trajectories and outcomes in the development of heritage speakers. The case studies also have practical and methodological implications for the study of multilingualism. We conclude by discussing more general concepts central to linguistic inquiry, in particular, complexity and native speaker competence. PMID:26500595
A study on (K, Na) NbO3 based multilayer piezoelectric ceramics micro speaker
NASA Astrophysics Data System (ADS)
Gao, Renlong; Chu, Xiangcheng; Huan, Yu; Sun, Yiming; Liu, Jiayi; Wang, Xiaohui; Li, Longtu
2014-10-01
A flat panel micro speaker was fabricated from (K, Na)NbO3 (KNN)-based multilayer piezoelectric ceramics by a tape casting and cofiring process using Ag-Pd alloys as the inner electrode. The interface between the ceramic and the electrode was investigated by scanning electron microscopy (SEM) and transmission electron microscopy (TEM). The acoustic response was characterized by a standard audio test system. We found that the micro speaker, with dimensions of 23 × 27 × 0.6 mm3 and using three layers of 30 μm thick KNN-based ceramic, has a high average sound pressure level (SPL) of 87 dB between 100 Hz and 20 kHz under a 5 V driving voltage. This result was even better than that of lead zirconate titanate (PZT)-based ceramics under the same conditions. The experimental results show that the KNN-based multilayer ceramics could be used in lead-free piezoelectric micro speakers.
Participation of Second Language and Second Dialect Speakers in the Legal System.
ERIC Educational Resources Information Center
Eades, Diana
2003-01-01
Overviews current theory and practice and research on second language and second dialect speakers and the language of the law. Suggests most of the studies on the topic have analyzed language in courtrooms, where access to data is much easier than in other legal settings, such as police interviews, mediation sessions, or lawyer-client interviews.…
ERIC Educational Resources Information Center
Folker, Joanne E.; Murdoch, Bruce E.; Cahill, Louise M.; Delatycki, Martin B.; Corben, Louise A.; Vogel, Adam P.
2011-01-01
Articulatory kinematics were investigated using electromagnetic articulography (EMA) in four dysarthric speakers with Friedreich's ataxia (FRDA). Specifically, tongue-tip and tongue-back movements were recorded by the AG-200 EMA system during production of the consonants t and k as produced within a sentence utterance and during a rapid syllable…
Brush Talk at the Conversation Table: Interaction between L1 and L2 Speakers of Chinese
ERIC Educational Resources Information Center
Hwang, Menq-Ju
2009-01-01
Chinese characters are used in both Chinese and Japanese writing systems. When literate speakers of either language experience problems in finding or understanding words, they often resort to using Chinese characters or "kanji" (i.e., Chinese characters used in Japanese writing) in their talk, a practice known as "brush talk" ("bitan" in Chinese,…
Training Japanese listeners to identify English /r/ and /l/: A first report
Logan, John S.; Lively, Scott E.; Pisoni, David B.
2012-01-01
Native speakers of Japanese learning English generally have difficulty differentiating the phonemes /r/ and /l/, even after years of experience with English. Previous research that attempted to train Japanese listeners to distinguish this contrast using synthetic stimuli reported little success, especially when transfer to natural tokens containing /r/ and /l/ was tested. In the present study, a different training procedure that emphasized variability among stimulus tokens was used. Japanese subjects were trained in a minimal pair identification paradigm using multiple natural exemplars contrasting /r/ and /l/ from a variety of phonetic environments as stimuli. A pretest–posttest design containing natural tokens was used to assess the effects of training. Results from six subjects showed that the new procedure was more robust than earlier training techniques. Small but reliable differences in performance were obtained between pretest and posttest scores. The results demonstrate the importance of stimulus variability and task-related factors in training nonnative speakers to perceive novel phonetic contrasts that are not distinctive in their native language. PMID:2016438
Short Stature Diagnosis and Referral
Maghnie, Mohamad; Labarta, José I.; Koledova, Ekaterina; Rohrer, Tilman R.
2018-01-01
The “360° GH in Europe” meeting, which examined various aspects of GH diseases, was held in Lisbon, Portugal, in June 2016. The Merck KGaA (Germany) funded meeting comprised three sessions entitled “Short Stature Diagnosis and Referral,” “Optimizing Patient Management,” and “Managing Transition.” Each session had three speaker presentations, followed by a discussion period, and is reported as a manuscript, authored by the speakers. The first session examined current processes of diagnosis and referral by endocrine specialists for pediatric patients with short stature. Requirements for referral vary widely, by country and by patient characteristics such as age. A balance must be made to ensure eligible patients get referred while healthcare systems are not over-burdened by excessive referrals. Late referral and diagnosis of non-GH deficiency conditions can result in increased morbidity and mortality. The consequent delays in making a diagnosis may compromise the effectiveness of GH treatment. Algorithms for growth monitoring and evaluation of skeletal disproportions can improve identification of non-GH deficiency conditions. Performance and validation of guidelines for diagnosis of GH deficiency have not been sufficiently tested. Provocative tests for investigation of GH deficiency remain equivocal, with insufficient information on variations due to patient characteristics, and cutoff values for definition differ not only by country but also by the assay used. When referring and diagnosing causes of short stature in pediatric patients, clinicians need to rely on many factors, but the most essential is clinical experience. PMID:29375479
NASA Astrophysics Data System (ADS)
Wang, Hongcui; Kawahara, Tatsuya
CALL (Computer Assisted Language Learning) systems using ASR (Automatic Speech Recognition) for second language learning have received increasing interest recently. However, it remains a challenge to achieve high speech recognition performance, including accurate detection of erroneous utterances by non-native speakers. Conventionally, possible error patterns, based on linguistic knowledge, are added to the lexicon and language model, or to the ASR grammar network. However, this approach quickly runs into a trade-off between the coverage of errors and the increase in perplexity. To solve this problem, we propose a method based on a decision tree to learn effective prediction of errors made by non-native speakers. An experimental evaluation with a number of foreign students learning Japanese shows that the proposed method can effectively generate an ASR grammar network, given a target sentence, that achieves both better coverage of errors and smaller perplexity, resulting in a significant improvement in ASR accuracy.
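A toy version of the learning step in the spirit described (a decision tree that predicts, from a phoneme's context, whether learners are likely to err on it, so that only likely error arcs are added to the grammar network) could look like the following; the features, labels, and data are invented for illustration.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical learner-corpus rows: [phoneme, word position, left context]
X = [["r", "initial", "vowel"],
     ["r", "medial", "consonant"],
     ["l", "initial", "vowel"],
     ["ts", "medial", "vowel"],
     ["ts", "initial", "consonant"],
     ["a", "final", "consonant"]]
y = ["error", "error", "correct", "error", "error", "correct"]

model = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                      DecisionTreeClassifier(max_depth=3, random_state=0))
model.fit(X, y)

# For a new target sentence, add error arcs only where errors are predicted.
print(model.predict([["r", "final", "vowel"], ["a", "initial", "consonant"]]))
```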
High Performance Computing at NASA
NASA Technical Reports Server (NTRS)
Bailey, David H.; Cooper, D. M. (Technical Monitor)
1994-01-01
The speaker will give an overview of high performance computing in the U.S. in general and within NASA in particular, including a description of the recently signed NASA-IBM cooperative agreement. The latest performance figures of various parallel systems on the NAS Parallel Benchmarks will be presented. The speaker was one of the authors of the NAS (Numerical Aerodynamic Simulation) Parallel Benchmarks, which are now widely cited in the industry as a measure of sustained performance on realistic high-end scientific applications. It will be shown that significant progress has been made by the highly parallel supercomputer industry during the past year or so, with several new systems, based on high-performance RISC processors, that now deliver superior performance per dollar compared to conventional supercomputers. Various pitfalls in reporting performance will be discussed. The speaker will then conclude by assessing the general state of the high performance computing field.
Linear array of photodiodes to track a human speaker for video recording
NASA Astrophysics Data System (ADS)
DeTone, D.; Neal, H.; Lougheed, R.
2012-12-01
Communication and collaboration using stored digital media has garnered more interest by many areas of business, government and education in recent years. This is due primarily to improvements in the quality of cameras and speed of computers. An advantage of digital media is that it can serve as an effective alternative when physical interaction is not possible. Video recordings that allow for viewers to discern a presenter's facial features, lips and hand motions are more effective than videos that do not. To attain this, one must maintain a video capture in which the speaker occupies a significant portion of the captured pixels. However, camera operators are costly, and often do an imperfect job of tracking presenters in unrehearsed situations. This creates motivation for a robust, automated system that directs a video camera to follow a presenter as he or she walks anywhere in the front of a lecture hall or large conference room. Such a system is presented. The system consists of a commercial, off-the-shelf pan/tilt/zoom (PTZ) color video camera, a necklace of infrared LEDs and a linear photodiode array detector. Electronic output from the photodiode array is processed to generate the location of the LED necklace, which is worn by a human speaker. The computer controls the video camera movements to record video of the speaker. The speaker's vertical position and depth are assumed to remain relatively constant; the video camera is sent only panning (horizontal) movement commands. The LED necklace is flashed at 70 Hz with a 50% duty cycle to provide noise-filtering capability. The benefit to using a photodiode array versus a standard video camera is its higher frame rate (4 kHz vs. 60 Hz). The higher frame rate allows for the filtering of infrared noise such as sunlight and indoor lighting, a capability absent from other tracking technologies. The system has been tested in a large lecture hall and is shown to be effective.
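The noise-filtering idea amounts to a lock-in measurement: with the LED flashing at 70 Hz and the array sampled at 4 kHz, each photodiode's power at 70 Hz separates the necklace from constant or slowly varying infrared such as sunlight. A simulated sketch is below; the array size, exposure window, and signal levels are assumptions.

```python
import numpy as np

SR, N_PIX, F_LED = 4000, 256, 70   # frame rate (Hz), pixels, LED flash rate

def led_position(samples):
    """samples: (n_frames, n_pixels) photodiode readings. Returns the pixel
    with the strongest 70 Hz component, rejecting DC and slow drift."""
    spec = np.fft.rfft(samples - samples.mean(axis=0), axis=0)
    k = round(F_LED * samples.shape[0] / SR)   # FFT bin nearest 70 Hz
    return int(np.argmax(np.abs(spec[k])))

# Simulate 0.25 s of data: steady sunlight on pixel 40, a 70 Hz
# 50%-duty-cycle LED on pixel 181, plus sensor noise everywhere.
t = np.arange(int(0.25 * SR)) / SR
frames = np.random.default_rng(0).normal(0.0, 0.1, (t.size, N_PIX))
frames[:, 40] += 5.0                                         # ambient light
frames[:, 181] += 0.5 * (np.sin(2 * np.pi * F_LED * t) > 0)  # flashing LED
print(led_position(frames))  # -> 181
```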
Speaker normalization for chinese vowel recognition in cochlear implants.
Luo, Xin; Fu, Qian-Jie
2005-07-01
Because of the limited spectro-temporal resolution associated with cochlear implants, implant patients often have greater difficulty with multitalker speech recognition. The present study investigated whether multitalker speech recognition can be improved by applying speaker normalization techniques to cochlear implant speech processing. Multitalker Chinese vowel recognition was tested with normal-hearing Chinese-speaking subjects listening to a 4-channel cochlear implant simulation, with and without speaker normalization. For each subject, speaker normalization was referenced to the speaker that produced the best recognition performance under conditions without speaker normalization. To match the remaining speakers to this "optimal" output pattern, the overall frequency range of the analysis filter bank was adjusted for each speaker according to the ratio of the mean third formant frequency values between the specific speaker and the reference speaker. Results showed that speaker normalization provided a small but significant improvement in subjects' overall recognition performance. After speaker normalization, subjects' patterns of recognition performance across speakers changed, demonstrating the potential for speaker-dependent effects with the proposed normalization technique.
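The normalization itself reduces to a one-line rescaling: the filter bank's corner frequencies are multiplied by the speaker-to-reference ratio of mean F3, so a given talker's formants land in the same output channels as the reference talker's. A minimal sketch, with made-up corner frequencies:

```python
def normalize_band_edges(edges_hz, f3_speaker, f3_reference):
    """Rescale analysis filter-bank corner frequencies by the
    speaker-to-reference mean-F3 ratio (F3-referenced normalization)."""
    ratio = f3_speaker / f3_reference
    return [round(f * ratio, 1) for f in edges_hz]

# Hypothetical 4-channel band edges (Hz) tuned to the reference talker;
# a talker with higher mean F3 gets an upward-shifted analysis range.
edges = [200, 700, 1500, 3000, 7000]
print(normalize_band_edges(edges, f3_speaker=2900, f3_reference=2500))
```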
Oral-diadochokinesis rates across languages: English and Hebrew norms.
Icht, Michal; Ben-David, Boaz M
2014-01-01
Oro-facial and speech motor control disorders represent a variety of speech and language pathologies. Early identification of such problems is important and carries clinical implications. A common and simple tool for gauging the presence and severity of speech motor control impairments is oral-diadochokinesis (oral-DDK). Surprisingly, norms for adult performance are missing from the literature. The goals of this study were: (1) to establish a norm for oral-DDK rate for (young to middle-age) adult English speakers, by collecting data from the literature (five studies, N=141); (2) to investigate the possible effect of language (and culture) on oral-DDK performance, by analyzing studies conducted in other languages (five studies, N=140), alongside the English norm; and (3) to find a new norm for adult Hebrew speakers, by testing 115 speakers. We first offer an English norm with a mean of 6.2 syllables/s (SD=.8), and a lower boundary of 5.4 syllables/s that can be used to indicate possible abnormality. Next, we found significant differences between four tested languages (English, Portuguese, Farsi and Greek) in oral-DDK rates. Results suggest the need to set language- and culture-sensitive norms for the application of the oral-DDK task world-wide. Finally, we found the oral-DDK performance for adult Hebrew speakers to be 6.4 syllables/s (SD=.8), not significantly different from the English norms. This implies possible phonological similarities between English and Hebrew. We further note that no gender effects were found in our study. We recommend using oral-DDK as an important tool in the speech-language pathologist's arsenal. Yet, application of this task should be done carefully, comparing individual performance to a set norm within the specific language. Readers will be able to: (1) describe the speech-language pathology assessment process using the oral-DDK task, comparing an individual's performance to the present English norm, (2) describe the impact of language on oral-DDK performance, and (3) accurately assess Hebrew-speaking patients using this tool. Copyright © 2014 Elsevier Inc. All rights reserved.
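Given the reported norms (English mean 6.2 syllables/s, SD 0.8, lower boundary 5.4; Hebrew mean 6.4, SD 0.8), a trivial screening helper is easy to write; note that treating the boundary as one SD below the mean is an inference from the English figures, not a rule stated for every language.

```python
def ddk_rate(n_syllables, seconds):
    """Oral-DDK rate in syllables per second."""
    return n_syllables / seconds

def below_norm(rate, mean, sd):
    """Flag rates more than one SD below the language norm (this matches
    the paper's English boundary: 5.4 = 6.2 - 0.8)."""
    return rate < mean - sd

rate = ddk_rate(n_syllables=30, seconds=5.2)  # e.g., 10 /pataka/ repetitions
print(round(rate, 2), below_norm(rate, mean=6.2, sd=0.8))  # English norm
```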
Hey kid! Wanna build a loudspeaker? The first one's free
NASA Astrophysics Data System (ADS)
Garrett, Steven
2005-09-01
In 2000, Penn State University instituted a First Year Seminar (FYS) requirement for every entering undergraduate student. This paper describes a hands-on FYS on audio engineering that has each freshman, in a class of 20, construct and test a two-way loudspeaker system during eight 2-hour meetings. Time and resource constraints dictated that the speaker system must be assembled using only hand tools and characterized using only an oscillator and digital multimeters. Due to limitations on the funds made available for each FYS by the College of Engineering, the total cost of entire system could not exceed $65/side. Each student is provided with a woofer, tweeter, crossover components, and enclosure parts to build one speaker system. The students are offered the option of purchasing a second set of parts for $65 so that they can complete the course with a stereo pair. Ninety percent of the students exercise the stereo option. This presentation will describe the speaker system, using an enclosure made primarily from PVC plumbing parts, and the four laboratory exercises that the students perform and write up that are designed to introduce basic engineering concepts including graphing, electrical impedance, resonance, transfer functions, mechanical and gas stiffness, and nondestructive parameter measurement.
Kong, Anthony Pak-Hin; Whiteside, Janet; Bargmann, Peggy
2016-10-01
Discourse from speakers with dementia and aphasia is associated with comparable but not identical deficits, necessitating appropriate methods to differentiate them. The current study aims to validate the Main Concept Analysis (MCA) for eliciting and quantifying discourse among typical native English speakers and to establish its norm, and to investigate the validity and sensitivity of the MCA for comparing discourse produced by individuals with fluent aphasia, non-fluent aphasia, or dementia of Alzheimer's type (DAT), and unimpaired elderly speakers. Discourse elicited through a sequential picture description task was collected from 60 unimpaired participants to determine the MCA scoring criteria; 12 speakers with fluent aphasia, 12 with non-fluent aphasia, 13 with DAT, and 20 elderly participants from the healthy group were compared on the finalized MCA. Results of MANOVA revealed significant univariate omnibus effects of speaker group as an independent variable on each main concept index. MCA profiles differed significantly between all participant groups except dementia versus fluent aphasia. Correlations between the MCA performances and the Western Aphasia Battery and Cognitive Linguistic Quick Test were found to be statistically significant among the clinical groups. The MCA was appropriate for use among native speakers of English. The results also provided further empirical evidence of discourse deficits in aphasia and dementia. Practitioners can use the MCA to evaluate discourse production systematically and objectively.
Judgement of Countability and Plural Marking in English by Native and Non-Native English Speakers
ERIC Educational Resources Information Center
Tsang, Art
2017-01-01
Learning whether English nouns are countable or not is a source of great difficulty for many ESL/EFL learners. In the present study, a grammaticality judgement task comprised of a range of nouns representative of the different facets of the countability system in English was distributed to 82 native speakers of English (NSs) and 98 non-native…
Parallel approach to incorporating face image information into dialogue processing
NASA Astrophysics Data System (ADS)
Ren, Fuji
2000-10-01
There are many kinds of so-called irregular expressions in natural dialogues. Even if the content of a conversation is the same in words, different meanings can be interpreted depending on the speaker's feeling or facial expression. To understand dialogues well, a flexible dialogue processing system must infer the speaker's view properly. However, it is difficult to obtain the meaning of the speaker's sentences in various scenes using traditional methods. In this paper, a new approach to dialogue processing that incorporates information from the speaker's face is presented. We first divide conversation statements into several simple tasks. Second, we process each simple task using an independent processor. Third, we employ the speaker's facial information to estimate the speaker's view and resolve ambiguities in dialogues. The approach presented in this paper can work efficiently, because independent processors run in parallel, writing partial results to a shared memory, incorporating partial results at appropriate points, and complementing each other. A parallel algorithm and a method for employing the face information in dialogue machine translation are discussed, and some results are included in this paper.
Botti, F; Alexander, A; Drygajlo, A
2004-12-02
This paper deals with a procedure to compensate for mismatched recording conditions in forensic speaker recognition, using a statistical score normalization. Bayesian interpretation of the evidence in forensic automatic speaker recognition depends on three sets of recordings in order to perform forensic casework: reference (R) and control (C) recordings of the suspect, and a potential population database (P), as well as a questioned recording (QR). The requirement of similar recording conditions between the suspect control database (C) and the questioned recording (QR) is often not satisfied in real forensic cases. The aim of this paper is to investigate a procedure of score normalization, based on an adaptation of the Test-normalization (T-norm) [2] technique used in the speaker verification domain, to compensate for the mismatch. The Polyphone IPSC-02 database and ASPIC (an automatic speaker recognition system developed by EPFL and IPS-UNIL in Lausanne, Switzerland) were used to test the normalization procedure. Experimental results for three different recording condition scenarios are presented using Tippett plots, and the effect of the compensation on the evaluation of the strength of the evidence is discussed.
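For readers unfamiliar with T-norm, the following is a minimal sketch of the score normalization this procedure adapts; the `score_fn` callable and the cohort of imposter models are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def t_norm(raw_score, questioned_features, cohort_models, score_fn):
    """Test-normalization (T-norm): rescale a suspect-model score by the
    mean and standard deviation of the scores that a cohort of imposter
    models produces against the same questioned recording."""
    cohort = np.array([score_fn(m, questioned_features) for m in cohort_models])
    return (raw_score - cohort.mean()) / cohort.std()
```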
Validity of Single-Item Screening for Limited Health Literacy in English and Spanish Speakers.
Bishop, Wendy Pechero; Craddock Lee, Simon J; Skinner, Celette Sugg; Jones, Tiffany M; McCallister, Katharine; Tiro, Jasmin A
2016-05-01
To evaluate 3 single-item screening measures for limited health literacy in a community-based population of English and Spanish speakers. We recruited 324 English and 314 Spanish speakers from a community research registry in Dallas, Texas, enrolled between 2009 and 2012. We used 3 screening measures: (1) How would you rate your ability to read?; (2) How confident are you filling out medical forms by yourself?; and (3) How often do you have someone help you read hospital materials? In analyses stratified by language, we used area under the receiver operating characteristic (AUROC) curves to compare each item with the validated 40-item Short Test of Functional Health Literacy in Adults. For English speakers, no difference was seen among the items. For Spanish speakers, "ability to read" identified inadequate literacy better than "help reading hospital materials" (AUROC curve = 0.76 vs 0.65; P = .019). The "ability to read" item performed the best, supporting use as a screening tool in safety-net systems caring for diverse populations. Future studies should investigate how to implement brief measures in safety-net settings and whether highlighting health literacy level influences providers' communication practices and patient outcomes.
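The AUROC comparison the authors describe can be reproduced in outline as follows; the variable names are hypothetical, and the statistical test for comparing two AUROC curves is not shown.

```python
from sklearn.metrics import roc_auc_score

def screener_auc(inadequate_literacy, item_responses):
    """AUROC of one single-item screener against the 40-item S-TOFHLA
    criterion (inadequate_literacy: 1 = inadequate, 0 = adequate;
    item_responses: ordinal answers, higher = worse self-rated literacy)."""
    return roc_auc_score(inadequate_literacy, item_responses)
```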
NASA Astrophysics Data System (ADS)
Chen, Jin; Cheng, Wen; Lopresti, Daniel
2011-01-01
Since real data is time-consuming and expensive to collect and label, researchers have proposed approaches using synthetic variations for the tasks of signature verification, speaker authentication, handwriting recognition, keyword spotting, etc. However, the limitation of real data is particularly critical in the field of writer identification because, in forensics, adversaries cannot be expected to provide sufficient data to train a classifier. It is therefore unrealistic to always assume sufficient real data to train classifiers extensively for writer identification. In addition, this field differs from many others in that we strive to preserve as much inter-writer variation as possible, but model-perturbed handwriting might break such discriminability among writers. Building on work described in another paper, where human subjects were involved in calibrating realistic-looking transformations, we measured the effects of incorporating perturbed handwriting into the training dataset. Experimental results supported our hypothesis that, with limited real data, model-perturbed handwriting improves the performance of writer identification. In particular, when only a single sample per writer was available, incorporating perturbed data achieved a 36× performance gain.
Multimodal fusion of polynomial classifiers for automatic person recognition
NASA Astrophysics Data System (ADS)
Broun, Charles C.; Zhang, Xiaozheng
2001-03-01
With the prevalence of the information age, privacy and personalization are at the forefront of today's society. As such, biometrics are viewed as essential components of current evolving technological systems. Consumers demand unobtrusive and non-invasive approaches. In our previous work, we demonstrated a speaker verification system that meets these criteria. However, there are additional constraints for fielded systems. The required recognition transactions are often performed in adverse environments and across diverse populations, necessitating robust solutions. There are two significant problem areas in current-generation speaker verification systems. The first is the difficulty of acquiring clean audio signals in all environments without encumbering the user with a head-mounted close-talking microphone. Second, unimodal biometric systems do not work with a significant percentage of the population. To combat these issues, multimodal techniques are being investigated to improve system robustness to environmental conditions, as well as to improve overall accuracy across the population. We propose a multimodal approach that builds on our current state-of-the-art speaker verification technology. In order to maintain the transparent nature of the speech interface, we focus on optical sensing technology to provide the additional modality, giving us an audio-visual person recognition system. For the audio domain, we use our existing speaker verification system. For the visual domain, we focus on lip motion. This is chosen, rather than static face or iris recognition, because it provides dynamic information about the individual. In addition, the lip dynamics can aid speech recognition to provide liveness testing. The visual processing method makes use of both color and edge information, combined within a Markov random field (MRF) framework, to localize the lips. Geometric features are extracted and input to a polynomial classifier for the person recognition process. A late-integration approach, based on a probabilistic model, is employed to combine the two modalities. The system is tested on the XM2VTS database combined with AWGN in the audio domain over a range of signal-to-noise ratios.
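As a sketch of the late-integration step described above, the snippet below combines per-class posteriors from the audio and visual classifiers with a reliability weight; the weighting scheme is an assumption for illustration, since the paper specifies only that a probabilistic late-integration model is used.

```python
import numpy as np

def late_fusion(p_audio, p_visual, w_audio=0.7):
    """Combine audio and visual class posteriors by weighted log-linear
    pooling; w_audio is a hypothetical reliability weight that could be
    tuned to the audio signal-to-noise ratio."""
    log_p = w_audio * np.log(p_audio + 1e-12) + (1.0 - w_audio) * np.log(p_visual + 1e-12)
    p = np.exp(log_p - log_p.max())  # exponentiate with overflow guard
    return p / p.sum()               # renormalize to a distribution
```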
Second NASA Workshop on Wiring for Space Applications
NASA Technical Reports Server (NTRS)
1994-01-01
This document contains the proceedings of the Second NASA Workshop on Wiring for Space Applications held at NASA LeRC in Cleveland, OH, 6-7 Oct. 1993. The workshop was sponsored by NASA Headquarters Code QW Office of Safety and Mission Quality, Technical Standards Division and hosted by NASA LeRC, Power Technology Division, Electrical Components and Systems Branch. The workshop addressed key technology issues in the field of electrical power wiring for space applications. Speakers from government, industry, and academia presented and discussed topics on arc tracking phenomena, wiring system design, insulation constructions, and system protection. Presentation materials provided by the various speakers are included in this document.
A hybrid active/passive exhaust noise control system for locomotives.
Remington, Paul J; Knight, J Scott; Hanna, Doug; Rowley, Craig
2005-01-01
A prototype hybrid system consisting of active and passive components for controlling far-field locomotive exhaust noise has been designed, assembled, and tested on a locomotive. The system consisted of a resistive passive silencer for controlling high-frequency broadband noise and a feedforward multiple-input, multiple-output active control system for suppressing low-frequency tonal noise. The active system used ten roof-mounted bandpass speaker enclosures, each with two 12-in. speakers, as actuators; eight roof-mounted electret microphones as residual sensors; and an optical tachometer that sensed locomotive engine speed as a reference sensor. The system was installed on a passenger locomotive and tested in an operating rail yard. Details of the system are described and the near-field and far-field noise reductions are compared against the design goal.
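The abstract does not name the adaptation algorithm, but feedforward tonal active noise control systems of this kind are commonly built on the filtered-x LMS update; a single-channel sketch of one adaptation step, under that assumption, is shown below.

```python
import numpy as np

def fxlms_step(w, x_filtered, e, mu=1e-4):
    """One filtered-x LMS update for feedforward tonal noise control
    (an assumed algorithm, not necessarily the paper's).
    w: adaptive filter taps; x_filtered: the tachometer-derived reference
    filtered through the secondary-path model (most recent sample first);
    e: the residual-microphone sample to be minimized."""
    return w - mu * e * x_filtered
```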
Speech coding, reconstruction and recognition using acoustics and electromagnetic waves
DOE Office of Scientific and Technical Information (OSTI.GOV)
Holzrichter, J.F.; Ng, L.C.
The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced, speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.
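As a rough sketch of the per-frame deconvolution this record describes (not the patented procedure itself), the excitation measured via EM sensing can be divided out of the acoustic output in the frequency domain:

```python
import numpy as np

def frame_transfer_function(excitation, output, eps=1e-8):
    """Estimate a vocal-tract transfer function for one analysis frame by
    regularized spectral division; both inputs are assumed to be
    equal-length sample frames."""
    E = np.fft.rfft(excitation)
    S = np.fft.rfft(output)
    return S * np.conj(E) / (np.abs(E) ** 2 + eps)  # Wiener-style division
```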
Speech endpoint detection with non-language speech sounds for generic speech processing applications
NASA Astrophysics Data System (ADS)
McClain, Matthew; Romanowski, Brian
2009-05-01
Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS, such as filled pauses, will require future research.
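A minimal sketch of the HMM-based decision the abstract describes might train one Gaussian HMM per class and compare log-likelihoods; the hmmlearn library and the state counts here are assumptions, not the authors' implementation.

```python
from hmmlearn import hmm  # assumed dependency for this sketch

def classify_segment(features, model_lss, model_nlss):
    """Label a segment of frame-level acoustic features (n_frames x n_dims)
    as language speech (LSS) or non-language speech (NLSS) by comparing
    log-likelihoods under two class-specific Gaussian HMMs."""
    return "LSS" if model_lss.score(features) >= model_nlss.score(features) else "NLSS"

# Hypothetical training on class-labeled feature arrays:
# model_lss = hmm.GaussianHMM(n_components=3).fit(lss_features)
# model_nlss = hmm.GaussianHMM(n_components=3).fit(nlss_features)
```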
Qi, Beier; Liu, Bo; Liu, Sha; Liu, Haihong; Dong, Ruijuan; Zhang, Ning; Gong, Shusheng
2011-05-01
To study the effect of cochlear electrode coverage and insertion region on speech recognition, especially tone perception, in cochlear implant (CI) users whose native language is Mandarin Chinese, seven test conditions were set with fitting software. All conditions were created by switching respective channels on/off in order to simulate different insertion positions. Mandarin-speaking CI users then completed four speech tests: a vowel identification test, a consonant identification test, a tone identification test (male speaker), and the Mandarin HINT test (SRS) in quiet and in noise. Across the test conditions, the average vowel identification score differed significantly, from 56% to 91% (rank sum test, P < 0.05), and the average consonant identification score differed significantly, from 72% to 85% (ANOVA, P < 0.05). The average tone identification score did not differ significantly (ANOVA, P > 0.05); however, the more channels were activated, the higher the scores obtained, from 68% to 81%. This study shows that there is a correlation between insertion depth and speech recognition. Because all parts of the basilar membrane can help CI users improve their speech recognition ability, it is very important to enhance the verbal communication and social interaction abilities of CI users by increasing insertion depth and actively stimulating the apical region of the cochlea.
Development of emergent processing loops as a system of systems concept
NASA Astrophysics Data System (ADS)
Gainey, James C., Jr.; Blasch, Erik P.
1999-03-01
This paper describes an engineering approach toward implementing the current neuroscientific understanding of how the primate brain fuses, or integrates, 'information' in the decision-making process. We describe a System of Systems (SoS) design for improving the overall performance, capabilities, operational robustness, and user confidence in Identification (ID) systems and show how it could be applied to biometrics security. We use the Physio-associative temporal sensor integration algorithm (PATSIA), which is motivated by observed functions and interactions of the thalamus, hippocampus, and cortical structures in the brain. PATSIA utilizes signal theory mathematics to model how the human efficiently perceives and uses information from the environment. The hybrid architecture implements a possible SoS-level realization of the US Joint Directors of Laboratories Fusion Working Group's functional description involving five levels of fusion and their associated definitions. This SoS architecture proposes dynamic sensor and knowledge-source integration by implementing multiple emergent processing loops for prediction, feature extraction, matching, and searching of both static and dynamic databases, like MSTAR's PEMS loops. Biologically, this effort demonstrates these objectives by modeling similar processes from the eyes, ears, and somatosensory channels, through the thalamus, and to the cortices as appropriate, while using the hippocampus for short-term memory search and storage as necessary. The particular approach demonstrated incorporates commercially available speaker verification and face recognition software and hardware to collect data and extract features for the PATSIA. The PATSIA maximizes the confidence levels for target identification or verification in dynamic situations using a belief filter. The proof of concept described here is easily adaptable and scalable to other military and nonmilitary sensor fusion applications.
DOMAIN MISMATCH COMPENSATION FOR SPEAKER RECOGNITION USING A LIBRARY OF WHITENERS
2015-05-29
In-domain development data is assumed to be unavailable. The method is based on a generalization of data whitening used in association with i-vector length normalization and utilizes a library of whitening transforms trained at system development time using strictly out-of-domain data. The approach is…
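A sketch of the idea, under the assumption of i-vector inputs and a simplified per-whitener selection rule (the report's own selection criterion may differ):

```python
import numpy as np

def length_normalize(ivec):
    return ivec / np.linalg.norm(ivec)

def whiten_with_library(ivec, whiteners):
    """whiteners: list of (mean, W) pairs, each trained on out-of-domain
    data. Select the transform under which the i-vector looks most
    in-distribution (smallest whitened norm, a simplification), apply it,
    then length-normalize as usual."""
    mean, W = min(whiteners, key=lambda mw: np.sum((mw[1] @ (ivec - mw[0])) ** 2))
    return length_normalize(W @ (ivec - mean))
```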
Fan, Samantha P; Liberman, Zoe; Keysar, Boaz; Kinzler, Katherine D
2015-07-01
Early language exposure is essential to developing a formal language system, but may not be sufficient for communicating effectively. To understand a speaker's intention, one must take the speaker's perspective. Multilingual exposure may promote effective communication by enhancing perspective taking. We tested children on a task that required perspective taking to interpret a speaker's intended meaning. Monolingual children failed to interpret the speaker's meaning dramatically more often than both bilingual children and children who were exposed to a multilingual environment but were not bilingual themselves. Children who were merely exposed to a second language performed as well as bilingual children, despite having lower executive-function scores. Thus, the communicative advantages demonstrated by the bilinguals may be social in origin, and not due to enhanced executive control. For millennia, multilingual exposure has been the norm. Our study shows that such an environment may facilitate the development of perspective-taking tools that are critical for effective communication. © The Author(s) 2015.
Musical Sophistication and the Effect of Complexity on Auditory Discrimination in Finnish Speakers.
Dawson, Caitlin; Aalto, Daniel; Šimko, Juraj; Vainio, Martti; Tervaniemi, Mari
2017-01-01
Musical experiences and native language are both known to affect auditory processing. The present work aims to disentangle the influences of native language phonology and musicality on behavioral and subcortical sound feature processing in a population of musically diverse Finnish speakers, as well as to investigate the specificity of enhancement from musical training. Finnish speakers are highly sensitive to duration cues, since in Finnish vowel and consonant duration determine word meaning. Using a correlational approach with a set of behavioral sound feature discrimination tasks, brainstem recordings, and a musical sophistication questionnaire, we find no evidence for an association between musical sophistication and more precise duration processing in Finnish speakers, either in the auditory brainstem response or in behavioral tasks. Musically sophisticated speakers do, however, show enhanced pitch discrimination compared with less musically experienced Finnish speakers, and greater duration modulation in a complex task. These results are consistent with a ceiling effect for certain sound features which corresponds to the phonology of the native language, leaving an opportunity for music experience-based enhancement of sound features not explicitly encoded in the language (such as pitch, which is not explicitly encoded in Finnish). Finally, the pattern of duration modulation in more musically sophisticated Finnish speakers suggests integrated feature processing for greater efficiency in a real-world musical situation. These results have implications for research into the specificity of plasticity in the auditory system, as well as into the effects of the interaction of specific language features with musical experiences.
Speaking fundamental frequency and vowel formant frequencies: effects on perception of gender.
Gelfer, Marylou Pausewang; Bennett, Quinn E
2013-09-01
The purpose of the present study was to investigate the contribution of vowel formant frequencies to gender identification in connected speech, the distinctiveness of vowel formants in males versus females, and how ambiguous speaking fundamental frequencies (SFFs) and vowel formants might affect perception of gender. Multivalent experimental design. Speaker subjects (eight tall males, eight short females, and seven males and seven females of "middle" height) were recorded saying two carrier phrases to elicit the vowels /i/ and /α/ and a sentence. The gender/height groups were selected to (presumably) maximize formant differences between some groups (tall vs short) and minimize differences between others (middle height). Each subject's samples were digitally altered to distinct SFFs (116, 145, 155, 165, and 207 Hz) to represent SFFs typical of average males, average females, and an ambiguous range. Listeners judged the gender of each randomized altered speech sample. Results indicated that female speakers were perceived as female even with an SFF in the typical male range. For male speakers, gender perception was less accurate at SFFs of 165 Hz and higher. Although the ranges of vowel formants had considerable overlap between genders, significant differences in the formant frequencies of males and females were seen. Vowel formants appeared to be important to the perception of gender, especially for SFFs in the range of 145-165 Hz; however, formants may be a more salient cue in connected speech when compared with isolated vowels or syllables. Copyright © 2013 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
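The paper does not state which algorithm performed the digital alteration, but the shift required to move a measured SFF to one of the study's target values can be computed as below; the helper is illustrative only.

```python
import numpy as np

def semitone_shift(current_sff_hz, target_sff_hz):
    """Pitch shift, in semitones, needed to move a speaker's measured SFF
    to a target value (e.g., one of 116, 145, 155, 165 or 207 Hz)."""
    return 12.0 * np.log2(target_sff_hz / current_sff_hz)
```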
Effect of gender on communication of health information to older adults.
Dearborn, Jennifer L; Panzer, Victoria P; Burleson, Joseph A; Hornung, Frederick E; Waite, Harrison; Into, Frances H
2006-04-01
To examine the effect of gender on three key elements of communication with elderly individuals: effectiveness of the communication, perceived relevance to the individual, and effect of gender-stereotyped content. Survey. University of Connecticut Health Center. Thirty-three subjects (17 female), aged 69 to 91 (mean ± SD: 82 ± 5.4). Older adults listened to 16 brief narratives randomized in order and by the sex of the speaker (Narrator Voice). Effectiveness was measured according to ability to identify key features (Risks), and subjects were asked to rate the relevance (Plausibility). The number of Risks detected and determinations of plausibility were analyzed according to Subject Gender and Narrator Voice. Narratives were written for either sex or included male or female bias (Neutral or Stereotyped). Female subjects identified a significantly higher number of Risks across all narratives (P=.01). Subjects perceived a significantly higher number of Risks with a female Narrator Voice (P=.03). A significant Voice-by-Stereotype interaction was present for female-stereotyped narratives (P=.009). In narratives rated as Plausible, subjects detected more Risks (P=.02). Subject Gender influenced communication effectiveness. A female speaker resulted in identification of more Risks for subjects of both sexes, particularly for Stereotyped narratives. There was no significant effect of matching Subject Gender and Narrator Voice. This study suggests that the sex of the speaker influences the effectiveness of communication with older adults. These findings should motivate future research into the means by which medical providers can improve communication with their patients.
Facial Expression Generation from Speaker's Emotional States in Daily Conversation
NASA Astrophysics Data System (ADS)
Mori, Hiroki; Ohshima, Koh
A framework for generating facial expressions from emotional states in daily conversation is described. It provides a mapping between emotional states and facial expressions, where the former are represented by vectors with psychologically-defined abstract dimensions and the latter are coded by the Facial Action Coding System. In order to obtain the mapping, parallel data with rated emotional states and facial expressions were collected for utterances of a female speaker, and a neural network was trained with the data. The effectiveness of the proposed method was verified by a subjective evaluation test: the mean opinion score with respect to the suitability of the generated facial expressions was 3.86 for the speaker, close to that of hand-made facial expressions.
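A one-hidden-layer mapping of the kind the paper trains might look like the following; the dimensions, activation choices, and weight names are assumptions for illustration.

```python
import numpy as np

def predict_action_units(emotion_vec, W1, b1, W2, b2):
    """Map an emotional-state vector (psychologically-defined dimensions)
    to Facial Action Coding System action-unit intensities in [0, 1]
    using a trained one-hidden-layer network (hypothetical weights)."""
    h = np.tanh(W1 @ emotion_vec + b1)           # hidden representation
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))  # sigmoid AU intensities
```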
Lee, Soomin; Katsuura, Tetsuo; Shimomura, Yoshihiro
2011-01-01
In recent years, a new type of speaker called the parametric speaker has been used to generate highly directional sound, and these speakers are now commercially available. In our previous study, we verified that the burden imposed by the parametric speaker on endocrine function was lower than that of a general speaker. However, the effects of distances shorter than 2.6 m between parametric speakers and the human body have not yet been examined. Therefore, we investigated the effect of distance on endocrinological function and subjective evaluation. Nine male subjects participated in this study. They completed three consecutive sessions: a 20-min quiet period as a baseline, a 30-min mental task period with general speakers or parametric speakers, and a 20-min recovery period. We measured salivary cortisol and chromogranin A (CgA) concentrations. Furthermore, subjects took the Kwansei-gakuin Sleepiness Scale (KSS) test before and after the task and also a sound quality evaluation test after it. Four experiments, crossing the speaker condition (general speaker vs. parametric speaker) with the distance condition (0.3 m vs. 1.0 m), were conducted at the same time of day on separate days. We used three-way repeated measures ANOVA (speaker factor × distance factor × time factor) to examine the effects of the parametric speaker. We found that the endocrinological functions were not significantly different between the speaker conditions or the distance conditions. The results also showed that the physiological burdens increased with time, independent of the speaker condition and distance condition.
The Communication of Public Speaking Anxiety: Perceptions of Asian and American Speakers.
ERIC Educational Resources Information Center
Martini, Marianne; And Others
1992-01-01
Finds that U.S. audiences perceive Asian speakers to have more speech anxiety than U.S. speakers, even though Asian speakers do not self-report higher anxiety levels. Confirms that speech state anxiety is not communicated effectively between speakers and audiences for Asian or U.S. speakers. (SR)
An Investigation of Syntactic Priming among German Speakers at Varying Proficiency Levels
ERIC Educational Resources Information Center
Ruf, Helena T.
2011-01-01
This dissertation investigates syntactic priming in second language (L2) development among three speaker populations: (1) less proficient L2 speakers; (2) advanced L2 speakers; and (3) L1 speakers. Using confederate scripting, this study examines how German speakers choose certain word orders in locative constructions (e.g., "Auf dem Tisch…
Modeling Speaker Proficiency, Comprehensibility, and Perceived Competence in a Language Use Domain
ERIC Educational Resources Information Center
Schmidgall, Jonathan Edgar
2013-01-01
Research suggests that listener perceptions of a speaker's oral language use, or a speaker's "comprehensibility," may be influenced by a variety of speaker-, listener-, and context-related factors. Primary speaker factors include aspects of the speaker's proficiency in the target language such as pronunciation and…
Methods and apparatus for non-acoustic speech characterization and recognition
Holzrichter, John F.
1999-01-01
By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.
Photovoltaic Performance and Reliability Workshop summary
NASA Astrophysics Data System (ADS)
Kroposki, Benjamin
1997-02-01
The objective of the Photovoltaic Performance and Reliability Workshop was to provide a forum where the entire photovoltaic (PV) community (manufacturers, researchers, system designers, and customers) could get together and discuss technical issues relating to PV. The workshop included presentations from twenty-five speakers and had more than one hundred attendees. This workshop also included several open sessions in which the audience and speakers could discuss technical subjects in depth. Several major topics were discussed including: PV characterization and measurements, service lifetimes for PV devices, degradation and failure mechanisms for PV devices, standardization of testing procedures, AC module performance and reliability testing, inverter performance and reliability testing, standardization of utility interconnect requirements, experience from field deployed systems, and system certification.
Molineaux, Benjamin J
2017-03-01
Today, virtually all speakers of Mapudungun (formerly Araucanian), an endangered language of Chile and Argentina, are bilingual in Spanish. As a result, the firmness of native speaker intuitions, especially regarding perceptually complex issues such as word stress, has been called into question. Even though native intuitions are unavoidable in the investigation of stress position, efforts can be made to clarify what the actual sources of the intuitions are, and how consistent and 'native' they remain given the language's asymmetrical contact conditions. In this article, the use of non-native speaker intuitions is proposed as a valid means of assessing the position of stress in Mapudungun and of evaluating whether it represents the unchanged, 'native' pattern. The alternative, of course, is that the patterns that present variability simply result from overlap of the bilingual speakers' phonological modules, hence displaying a contact-induced innovation. A forced-decision perception task is reported, showing that native and non-native perception of Mapudungun stress converges across speakers of six separate first languages, thus giving greater reliability to native judgements. The relative difference in the perception of Mapudungun stress given by Spanish monolinguals and Mapudungun-Spanish bilinguals is also taken to support the diachronic maintenance of the endangered language's stress system.
Bergstra, Myrthe; DE Mulder, Hannah N M; Coopmans, Peter
2018-04-06
This study investigated how speaker certainty (a rational cue) and speaker benevolence (an emotional cue) influence children's willingness to learn words in a selective learning paradigm. In two experiments four- to six-year-olds learnt novel labels from two speakers and, after a week, their memory for these labels was reassessed. Results demonstrated that children retained the label-object pairings for at least a week. Furthermore, children preferred to learn from certain over uncertain speakers, but they had no significant preference for nice over nasty speakers. When the cues were combined, children followed certain speakers, even if they were nasty. However, children did prefer to learn from nice and certain speakers over nasty and certain speakers. These results suggest that rational cues regarding a speaker's linguistic competence trump emotional cues regarding a speaker's affective status in word learning. However, emotional cues were found to have a subtle influence on this process.
Chun, Audrey; Reinhardt, Joann P; Ramirez, Mildred; Ellis, Julie M; Silver, Stephanie; Burack, Orah; Eimicke, Joseph P; Cimarolli, Verena; Teresi, Jeanne A
2017-12-01
To examine agreement between Minimum Data Set clinician ratings and researcher assessments of depression among ethnically diverse nursing home residents using the 9-item Patient Health Questionnaire. Although depression is common among nursing home residents, its recognition remains a challenge. Observational baseline data from a longitudinal intervention study. Sample of 155 residents from 12 long-term care units in one US facility; 50 were interviewed in Spanish. Convergence between clinician and researcher ratings was examined for (i) self-report capacity, (ii) suicidal ideation, (iii) at least moderate depression, and (iv) Patient Health Questionnaire severity scores. Experiences by clinical raters using the depression assessment were analysed. The intraclass correlation coefficient was used to examine concordance and Cohen's kappa to examine agreement between clinicians and researchers. Moderate agreement (κ = 0.52) was observed in determination of capacity and poor to fair agreement in reporting suicidal ideation (κ = 0.10-0.37) across time intervals. Poor agreement was observed in classification of at least moderate depression (κ = -0.02 to 0.24), lower than the maximum kappa obtainable (0.58-0.85). Eight assessors indicated problems assessing Spanish-speaking residents. Among Spanish speakers, researchers identified 16% with Patient Health Questionnaire scores of 10 or greater, and 14% with thoughts of self-harm, whilst clinicians identified 6% and 0%, respectively. This study advances the field of depression recognition in long-term care by identifying possible challenges in assessing Spanish-speaking residents. Use of the Patient Health Questionnaire requires further investigation, particularly among non-English speakers. Depression screening for ethnically diverse nursing home residents is required, as underreporting of depression and suicidal ideation among Spanish speakers may result in a lack of depression recognition and referral for evaluation and treatment. Training in depression recognition is imperative to improve the recognition, evaluation and treatment of depression in older people living in nursing homes. © 2017 John Wiley & Sons Ltd.
Speaker's voice as a memory cue.
Campeanu, Sandra; Craik, Fergus I M; Alain, Claude
2015-02-01
Speaker's voice occupies a central role as the cornerstone of auditory social interaction. Here, we review the evidence suggesting that speaker's voice constitutes an integral context cue in auditory memory. Investigation into the nature of voice representation as a memory cue is essential to understanding auditory memory and the neural correlates which underlie it. Evidence from behavioral and electrophysiological studies suggests that while specific voice reinstatement (i.e., same speaker) often appears to facilitate word memory even without attention to voice at study, the presence of a partial benefit of similar voices between study and test is less clear. In terms of explicit memory experiments utilizing unfamiliar voices, encoding methods appear to play a pivotal role. Voice congruency effects have been found when voice is specifically attended at study (i.e., when relatively shallow, perceptual encoding takes place). These behavioral findings coincide with neural indices of memory performance such as the parietal old/new recollection effect and the late right frontal effect. The former distinguishes between correctly identified old words and correctly identified new words, and reflects voice congruency only when voice is attended at study. Characterization of the latter likely depends upon voice memory, rather than word memory. There is also evidence to suggest that voice effects can be found in implicit memory paradigms. However, the presence of voice effects appears to depend greatly on the task employed. Using a word identification task, perceptual similarity between study and test conditions is, as for explicit memory tests, crucial. In addition, the type of noise employed appears to have a differential effect. While voice effects have been observed when white noise is used at both study and test, using multi-talker babble does not confer the same results. In terms of neuroimaging research, characterization of a neural modulation reflective of voice congruency in implicit memory is currently lacking. Copyright © 2014 Elsevier B.V. All rights reserved.
Callan, Daniel E.; Jones, Jeffery A.; Callan, Akiko
2014-01-01
Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex (PMC) has been shown to be active during both observation and execution of action ("Mirror System" properties), and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI) study, participants identified vowels produced by a speaker in audio-visual (saw the speaker's articulating face and heard her voice), visual only (only saw the speaker's articulating face), and audio only (only heard the speaker's voice) conditions with varying audio signal-to-noise ratios, in order to determine the regions of the PMC involved with multisensory and modality-specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual only stimuli, to control for task difficulty and differences in intelligibility. The results of the fMRI analysis for the visual only and audio-visual conditions showed overlapping activity in the inferior frontal gyrus and PMC. The left ventral inferior premotor cortex (PMvi) showed properties of multimodal (audio-visual) enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex (PMvs/PMd) did not show this multisensory enhancement effect, but there was greater activity for the visual only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas more superior and dorsal regions of the PMC are involved with mapping unimodal (in this case visual) sensory features of the speech signal onto articulatory speech gestures. PMID:24860526
Improvements of ModalMax High-Fidelity Piezoelectric Audio Device
NASA Technical Reports Server (NTRS)
Woodard, Stanley E.
2005-01-01
ModalMax audio speakers have been enhanced by innovative means of tailoring the vibration response of thin piezoelectric plates to produce a high-fidelity audio response. The ModalMax audio speakers are 1 mm in thickness. The device completely supplants the need for a separate driver and speaker cone. ModalMax speakers can serve the same applications as cone speakers, but unlike cone speakers, ModalMax speakers can function in harsh environments such as high humidity or extreme wetness. New design features allow the speakers to be completely submersed in salt water, making them well suited for maritime applications. The sound produced by the ModalMax audio speakers has spatial resolution that is readily discernible to headset users.
Chang, Son-A; Won, Jong Ho; Kim, HyangHee; Oh, Seung-Ha; Tyler, Richard S.; Cho, Chang Hyun
2018-01-01
Background and Objectives It is important to understand the frequency region of cues used, and not used, by cochlear implant (CI) recipients. Speech and environmental sound recognition by individuals with CI and normal hearing (NH) was measured. Gradients were also computed to evaluate the pattern of change in identification performance with respect to the low-pass or high-pass filtering cutoff frequencies. Subjects and Methods Frequency-limiting effects were implemented in the acoustic waveforms by passing the signals through low-pass filters (LPFs) or high-pass filters (HPFs) with seven different cutoff frequencies. Identification of Korean vowels and consonants produced by a male and a female speaker and of environmental sounds was measured. Crossover frequencies were determined for each identification test, where the LPF and HPF conditions show identical identification scores. Results CI and NH subjects showed changes in identification performance in a similar manner as a function of cutoff frequency for the LPF and HPF conditions, suggesting that the degraded spectral information in the acoustic signals may similarly constrain the identification performance for both subject groups. However, CI subjects were generally less efficient than NH subjects in using the limited spectral information for speech and environmental sound identification due to the inefficient coding of acoustic cues through the CI sound processors. Conclusions This finding provides vital information, in Korean, for understanding how the frequency information received through a CI processor differs from normal hearing for speech and environmental sounds. PMID:29325391
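The crossover computation described above can be sketched as a linear interpolation of the score difference between the LPF and HPF conditions; the array names are hypothetical.

```python
import numpy as np

def crossover_frequency(cutoffs, lpf_scores, hpf_scores):
    """Cutoff frequency at which low-pass and high-pass identification
    scores are equal, found by linear interpolation across the first
    sign change of their difference."""
    c = np.asarray(cutoffs, dtype=float)
    d = np.asarray(lpf_scores, dtype=float) - np.asarray(hpf_scores, dtype=float)
    i = np.where(np.signbit(d[:-1]) != np.signbit(d[1:]))[0][0]  # bracketing pair
    t = d[i] / (d[i] - d[i + 1])                                 # interpolation weight
    return c[i] + t * (c[i + 1] - c[i])
```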
Partially supervised speaker clustering.
Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S
2012-05-01
Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment of the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the Euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm: linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the "bag of acoustic features" representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used Euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance.
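As an outline of the advocated clustering step (omitting the feature transformation and LSDA projection that the paper learns from independent training data), cosine-distance agglomerative clustering of supervectors can be written as:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_supervectors(supervectors, n_speakers):
    """Average-linkage clustering of GMM mean supervectors under the
    cosine distance metric, returning one cluster label per utterance."""
    Z = linkage(np.asarray(supervectors), method="average", metric="cosine")
    return fcluster(Z, t=n_speakers, criterion="maxclust")
```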
Hoedemaker, Renske S; Ernst, Jessica; Meyer, Antje S; Belke, Eva
2017-01-01
This study assessed the effects of semantic context in the form of self-produced and other-produced words on subsequent language production. Pairs of participants performed a joint picture naming task, taking turns while naming a continuous series of pictures. In the single-speaker version of this paradigm, naming latencies have been found to increase for successive presentations of exemplars from the same category, a phenomenon known as Cumulative Semantic Interference (CSI). As expected, the joint-naming task showed a within-speaker CSI effect, such that naming latencies increased as a function of the number of category exemplars named previously by the participant (self-produced items). Crucially, we also observed an across-speaker CSI effect, such that naming latencies slowed as a function of the number of category members named by the participant's task partner (other-produced items). The magnitude of the across-speaker CSI effect did not vary as a function of whether or not the listening participant could see the pictures their partner was naming. The observation of across-speaker CSI suggests that the effect originates at the conceptual level of the language system, as proposed by Belke's (2013) Conceptual Accumulation account. Whereas self-produced and other-produced words both resulted in a CSI effect on naming latencies, post-experiment free recall rates were higher for self-produced than other-produced items. Together, these results suggest that both speaking and listening result in implicit learning at the conceptual level of the language system but that these effects are independent of explicit learning as indicated by item recall. Copyright © 2016 Elsevier B.V. All rights reserved.
Lee, Kichol; Casali, John G
2017-01-01
To design a test battery and conduct a proof-of-concept experiment of a test method that can be used to measure the detection performance afforded by military advanced hearing protection devices (HPDs) and tactical communication and protective systems (TCAPS). The detection test was conducted with each of four loudspeakers located at the front, right, rear and left of the participant. Participants wore two in-ear-type TCAPS, one earmuff-type TCAPS, a passive Combat Arms Earplug in its "open" or pass-through setting, and an EB-15LE™ electronic earplug. Devices with electronic gain systems were tested under two gain settings: "unity" and "max". Testing without any device (open ear) was conducted as a control. Ten participants met the audiometric requirement of 25 dB HL or better at 500, 1000, 2000, 4000 and 8000 Hz in both ears. Detection task performance varied with different signals and speaker locations. The test identified performance differences among certain TCAPS and protectors, and the open ear. A computer-controlled detection subtest of the Detection-Recognition/Identification-Localisation-Communication (DRILCOM) test battery was designed and implemented. Tested in a proof-of-concept experiment, it showed statistically significant sensitivity to device differences in detection effects with the small sample of participants (10). This result has important implications for the selection and deployment of TCAPS and HPDs for soldiers and workers in dynamic situations.
Deep bottleneck features for spoken language identification.
Jiang, Bing; Song, Yan; Wei, Si; Liu, Jun-Hua; McLoughlin, Ian Vince; Dai, Li-Rong
2014-01-01
A key problem in spoken language identification (LID) is to design effective representations which are specific to language information. For example, in recent years, representations based on both phonotactic and acoustic features have proven their effectiveness for LID. Although advances in machine learning have led to significant improvements, LID performance is still lacking, especially for short-duration speech utterances. With the hypothesis that language information is weak and represented only latently in speech, and is largely dependent on the statistical properties of the speech content, existing representations may be insufficient. Furthermore, they may be susceptible to the variations caused by different speakers, specific content of the speech segments, and background noise. To address this, we propose using Deep Bottleneck Features (DBF) for spoken LID, motivated by the success of Deep Neural Networks (DNN) in speech recognition. We show that DBFs can form a low-dimensional compact representation of the original inputs with a powerful descriptive and discriminative capability. To evaluate the effectiveness of this, we design two acoustic models, termed DBF-TV and parallel DBF-TV (PDBF-TV), using a DBF-based i-vector representation for each speech utterance. Results on NIST language recognition evaluation 2009 (LRE09) show significant improvements over state-of-the-art systems. By fusing the output of phonotactic and acoustic approaches, we achieve an EER of 1.08%, 1.89% and 7.01% for 30 s, 10 s and 3 s test utterances respectively. Furthermore, various DBF configurations have been extensively evaluated, and an optimal system proposed.
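DBF extraction reduces to reading out a narrow interior layer of a trained DNN; the sketch below assumes ReLU hidden layers and hypothetical weights, and stands in for whatever architecture the authors trained.

```python
import numpy as np

def extract_dbf(frames, layers, bottleneck_index):
    """Forward acoustic frames (n_frames x n_dims) through a trained DNN
    and return the bottleneck-layer activations as the frame-level
    representation from which an i-vector would then be estimated.
    layers: list of (W, b) weight/bias pairs (hypothetical values)."""
    h = frames
    for i, (W, b) in enumerate(layers):
        h = np.maximum(0.0, h @ W + b)  # ReLU hidden layer
        if i == bottleneck_index:
            return h                    # narrow, compact representation
    raise ValueError("bottleneck_index out of range")
```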
Kharlamov, Viktor; Campbell, Kenneth; Kazanina, Nina
2011-11-01
Speech sounds are not always perceived in accordance with their acoustic-phonetic content. For example, an early and automatic process of perceptual repair, which ensures conformity of speech inputs to the listener's native language phonology, applies to individual input segments that do not exist in the native inventory or to sound sequences that are illicit according to the native phonotactic restrictions on sound co-occurrences. The present study with Russian and Canadian English speakers shows that listeners may perceive phonetically distinct and licit sound sequences as equivalent when the native language system provides robust evidence for mapping multiple phonetic forms onto a single phonological representation. In Russian, due to an optional but productive t-deletion process that affects /stn/ clusters, the surface forms [sn] and [stn] may be phonologically equivalent and map to a single phonological form /stn/. In contrast, [sn] and [stn] clusters are usually phonologically distinct in (Canadian) English. Behavioral data from identification and discrimination tasks indicated that [sn] and [stn] clusters were more confusable for Russian than for English speakers. The EEG experiment employed an oddball paradigm with nonwords [asna] and [astna] used as the standard and deviant stimuli. A reliable mismatch negativity response was elicited approximately 100 ms post-change in the English group but not in the Russian group. These findings point to a perceptual repair mechanism that is engaged automatically at a prelexical level to ensure immediate encoding of speech inputs in phonological terms, which in turn enables efficient access to the meaning of a spoken utterance.
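The oddball analysis implied here is a deviant-minus-standard difference wave, quantified in a post-change window; a sketch under hypothetical variable names:

```python
import numpy as np

def mmn_amplitude(erp_deviant, erp_standard, times, window=(0.08, 0.15)):
    """Mean amplitude of the deviant-minus-standard difference wave in a
    window around the ~100 ms post-change latency reported above.
    times: sample times in seconds, aligned to the point of change."""
    diff = erp_deviant - erp_standard
    mask = (times >= window[0]) & (times <= window[1])
    return diff[mask].mean()
```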
ERIC Educational Resources Information Center
Subtirelu, Nicholas Close; Lindemann, Stephanie
2016-01-01
While most research in applied linguistics has focused on second language (L2) speakers and their language capabilities, the success of interaction between such speakers and first language (L1) speakers also relies on the positive attitudes and communication skills of the L1 speakers. However, some research has suggested that many L1 speakers lack…
Consistency between verbal and non-verbal affective cues: a clue to speaker credibility.
Gillis, Randall L; Nilsen, Elizabeth S
2017-06-01
Listeners are exposed to inconsistencies in communication; for example, when speakers' words (i.e. verbal) are discrepant with their demonstrated emotions (i.e. non-verbal). Such inconsistencies introduce ambiguity, which may render a speaker a less credible source of information. Two experiments examined whether children make credibility discriminations based on the consistency of speakers' affect cues. In Experiment 1, school-age children (7- to 8-year-olds) preferred to solicit information from consistent speakers (e.g. those who provided a negative statement with negative affect), over novel speakers, to a greater extent than they preferred to solicit information from inconsistent speakers (e.g. those who provided a negative statement with positive affect) over novel speakers. Preschoolers (4- to 5-year-olds) did not demonstrate this preference. Experiment 2 showed that school-age children's ratings of speakers were influenced by speakers' affect consistency when the attribute being judged was related to information acquisition (speakers' believability, "weird" speech), but not general characteristics (speakers' friendliness, likeability). Together, findings suggest that school-age children are sensitive to, and use, the congruency of affect cues to determine whether individuals are credible sources of information.
Inferring speaker attributes in adductor spasmodic dysphonia: ratings from unfamiliar listeners.
Isetti, Derek; Xuereb, Linnea; Eadie, Tanya L
2014-05-01
To determine whether unfamiliar listeners' perceptions of speakers with adductor spasmodic dysphonia (ADSD) differ from control speakers on the parameters of relative age, confidence, tearfulness, and vocal effort and are related to speaker-rated vocal effort or voice-specific quality of life. Twenty speakers with ADSD (including 6 speakers with ADSD plus tremor) and 20 age- and sex-matched controls provided speech recordings, completed a voice-specific quality-of-life instrument (Voice Handicap Index; Jacobson et al., 1997), and rated their own vocal effort. Twenty listeners evaluated speech samples for relative age, confidence, tearfulness, and vocal effort using rating scales. Listeners judged speakers with ADSD as sounding significantly older, less confident, more tearful, and more effortful than control speakers (p < .01). Increased vocal effort was strongly associated with decreased speaker confidence (rs = .88-.89) and sounding more tearful (rs = .83-.85). Self-rated speaker effort was moderately related (rs = .45-.52) to listener impressions. Listeners' perceptions of confidence and tearfulness were also moderately associated with higher Voice Handicap Index scores (rs = .65-.70). Unfamiliar listeners judge speakers with ADSD more negatively than control speakers, with judgments extending beyond typical clinical measures. The results have implications for counseling and understanding the psychosocial effects of ADSD.
NASA Astrophysics Data System (ADS)
Dimakis, Nikolaos; Soldatos, John; Polymenakos, Lazaros; Sturm, Janienke; Neumann, Joachim; Casas, Josep R.
The CHIL Memory Jog service focuses on facilitating the collaboration of participants in meetings, lectures, presentations, and other human interactive events occurring in indoor CHIL spaces. It exploits the whole set of perceptual components that have been developed by the CHIL Consortium partners (e.g., person tracking, face identification, audio source localization, etc.) along with a wide range of actuating devices such as projectors, displays, targeted audio devices, speakers, etc. The underlying set of perceptual components provides a constant flow of elementary contextual information, such as “person at location x0,y0” or “speech at location x0,y0”; such information alone is not of significant use. However, the CHIL Memory Jog service is accompanied by powerful situation identification techniques that fuse all the incoming information and create complex states that drive the actuating logic.
Waaramaa, Teija; Leisiö, Timo
2013-01-01
The present study focused on voice quality and the perception of the basic emotions from speech samples in cross-cultural conditions. It was examined whether voice quality, cultural or language background, age, or gender were related to the identification of the emotions. Professional actors (n = 2) and actresses (n = 2) produced nonsense sentences (n = 32) and protracted vowels (n = 8) expressing the six basic emotions, interest, and a neutral emotional state. The impact of musical interests on the ability to distinguish between emotions or valence (on an axis positivity – neutrality – negativity) from voice samples was studied. Listening tests were conducted on location in five countries: Estonia, Finland, Russia, Sweden, and the USA, with 50 randomly chosen participants (25 males and 25 females) in each country. The participants (total N = 250) completed a questionnaire eliciting their background information and musical interests. The responses in the listening test and the questionnaires were statistically analyzed. Voice quality parameters and the share of the emotions and valence identified correlated significantly with each other for both genders. The percentage of emotions and valence identified was clearly above the chance level in each of the five countries studied; however, the countries differed significantly from each other for the identified emotions and the gender of the speaker. The samples produced by females were identified significantly better than those produced by males. Listener's age was a significant variable. Only minor gender differences were found for the identification. Perceptual confusion in the listening test between emotions seemed to be dependent on their similar voice production types. Musical interests tended to have a positive effect on the identification of the emotions. The results also suggest that identifying emotions from speech samples may be easier for those listeners who share a similar language or cultural background with the speaker. PMID:23801972
2012-03-01
with each SVM discriminating between a pair of the N total speakers in the data set. The N(N − 1)/2 classifiers then vote on the final... classification of a test sample. The Random Forest classifier is an ensemble classifier that votes amongst decision trees generated with each node using... Forest vote, and the effects of overtraining will be mitigated by the fact that each decision tree is overtrained differently (due to the random
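For N speakers, one-versus-one training of this kind yields N(N − 1)/2 pairwise SVMs whose votes are tallied per test sample. A minimal sketch of that voting scheme with scikit-learn (the Gaussian toy features stand in for whatever front-end features the report used):

import numpy as np
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_speakers, per_spk, dim = 4, 20, 13  # 4 speakers -> 4*3/2 = 6 pairwise SVMs
X = np.vstack([rng.normal(loc=i, size=(per_spk, dim)) for i in range(n_speakers)])
y = np.repeat(np.arange(n_speakers), per_spk)

clf = OneVsOneClassifier(SVC(kernel="rbf")).fit(X, y)
print(len(clf.estimators_))  # 6 pairwise classifiers
print(clf.predict(X[:3]))    # label decided by majority vote over the pairs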
CNN: a speaker recognition system using a cascaded neural network.
Zaki, M; Ghalwash, A; Elkouny, A A
1996-05-01
The main emphasis of this paper is to present an approach for combining supervised and unsupervised neural network models for the task of speaker recognition. To enhance the overall operation and performance of recognition, the proposed strategy integrates the two techniques, forming one global model called the cascaded model. We first present a simple conventional technique based on the distance measured between a test vector and a reference vector for different speakers in the population. This particular distance metric has the property of weighting down the components in those directions along which the intraspeaker variance is large. The reason for presenting this method is to clarify the discrepancy in performance between the conventional and neural network approaches. We then introduce the idea of using an unsupervised learning technique, represented by the winner-take-all model, as a means of recognition. Based on several tests that have been conducted, and in order to enhance this model's performance on noisy patterns, we have preceded it with a supervised learning model--the pattern association model--which acts as a filtration stage. This work includes the design and implementation of both the conventional and neural network approaches to recognize the speakers' templates, which are introduced to the system via a voice master card and preprocessed before extracting the features used in recognition. The conclusion indicates that the system performance in the case of the neural network is better than that of the conventional one, achieving a smooth degradation on noisy patterns and higher performance on noise-free patterns.
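The conventional stage described above, which weights down directions of large intra-speaker variance, is essentially a Mahalanobis-style distance computed with a within-speaker covariance. A minimal sketch of that metric (the dimensionality and covariance below are invented for illustration):

import numpy as np

def within_speaker_distance(test_vec, ref_vec, within_cov):
    # De-emphasise feature directions along which the intra-speaker variance is large.
    d = test_vec - ref_vec
    return float(np.sqrt(d @ np.linalg.inv(within_cov) @ d))

rng = np.random.default_rng(1)
within_cov = np.diag(rng.uniform(0.5, 2.0, size=13))  # toy 13-dim diagonal covariance
test, ref = rng.normal(size=13), rng.normal(size=13)
print(within_speaker_distance(test, ref, within_cov))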
Speaker Linking and Applications using Non-Parametric Hashing Methods
2016-09-08
clustering method based on hashing—canopy clustering. We apply this method to a large corpus of speaker recordings, demonstrate performance tradeoffs... and compare to other hashing methods. Index Terms: speaker recognition, clustering, hashing, locality sensitive hashing. 1. Introduction We assume... speaker in our corpus. Second, given a QBE method, how can we perform speaker clustering—each clustering should be a single speaker, and a cluster should
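Canopy clustering, named in the index terms above, is a cheap pre-clustering pass controlled by two distance thresholds T1 > T2: every point within T1 of a seed joins its canopy, and points within T2 are removed from further seeding. A generic sketch (the thresholds, embedding size, and Euclidean distance are placeholder choices, not the paper's settings):

import numpy as np

def canopy_cluster(points, t1, t2, dist):
    # Requires t1 > t2; returns lists of point indices, one list per canopy.
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining.pop(0)
        canopy = [center]
        still_eligible = []
        for idx in remaining:
            d = dist(points[center], points[idx])
            if d < t1:
                canopy.append(idx)          # close enough to join this canopy
            if d >= t2:
                still_eligible.append(idx)  # may still seed a later canopy
        remaining = still_eligible
        canopies.append(canopy)
    return canopies

rng = np.random.default_rng(2)
pts = rng.normal(size=(30, 8))  # e.g., 30 recording-level speaker embeddings
print(canopy_cluster(pts, t1=4.0, t2=2.5, dist=lambda a, b: float(np.linalg.norm(a - b))))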
On the Time Course of Vocal Emotion Recognition
Pell, Marc D.; Kotz, Sonja A.
2011-01-01
How quickly do listeners recognize emotions from a speaker's voice, and does the time course for recognition vary by emotion type? To address these questions, we adapted the auditory gating paradigm to estimate how much vocal information is needed for listeners to categorize five basic emotions (anger, disgust, fear, sadness, happiness) and neutral utterances produced by male and female speakers of English. Semantically-anomalous pseudo-utterances (e.g., The rivix jolled the silling) conveying each emotion were divided into seven gate intervals according to the number of syllables that listeners heard from sentence onset. Participants (n = 48) judged the emotional meaning of stimuli presented at each gate duration interval, in a successive, blocked presentation format. Analyses looked at how recognition of each emotion evolves as an utterance unfolds and estimated the “identification point” for each emotion. Results showed that anger, sadness, fear, and neutral expressions are recognized more accurately at short gate intervals than happiness, and particularly disgust; however, as speech unfolds, recognition of happiness improves significantly towards the end of the utterance (and fear is recognized more accurately than other emotions). When the gate associated with the emotion identification point of each stimulus was calculated, data indicated that fear (M = 517 ms), sadness (M = 576 ms), and neutral (M = 510 ms) expressions were identified from shorter acoustic events than the other emotions. These data reveal differences in the underlying time course for conscious recognition of basic emotions from vocal expressions, which should be accounted for in studies of emotional speech processing. PMID:22087275
The effect of tonal changes on voice onset time in Mandarin esophageal speech.
Liu, Hanjun; Ng, Manwa L; Wan, Mingxi; Wang, Supin; Zhang, Yi
2008-03-01
The present study investigated the effect of tonal changes on voice onset time (VOT) between normal laryngeal (NL) and superior esophageal (SE) speakers of Mandarin Chinese. VOT values were measured from the syllables /pha/, /tha/, and /kha/ produced at four tone levels by eight NL and seven SE speakers who were native speakers of Mandarin. Results indicated that Mandarin tones were associated with significantly different VOT values for NL speakers, in which high-falling tone was associated with significantly shorter VOT values than mid-rising tone and falling-rising tone. Regarding speaker group, SE speakers showed significantly shorter VOT values than NL speakers across all tone levels. This may be related to their use of pharyngoesophageal (PE) segment as another sound source. SE speakers appear to take a shorter time to start PE segment vibration compared to NL speakers using the vocal folds for vibration.
Ng, Manwa L; Chen, Yang
2011-12-01
The present study examined English sentence stress produced by native Cantonese speakers who were speaking English as a second language (ESL). Cantonese ESL speakers' proficiency in English stress production as perceived by English-speaking listeners was also studied. Acoustical parameters associated with sentence stress, including fundamental frequency (F0), vowel duration, and intensity, were measured from the English sentences produced by 40 Cantonese ESL speakers. Data were compared with those obtained from 40 native speakers of American English. The speech samples were also judged by eight listeners who were native speakers of American English for placement, degree, and naturalness of stress. Results showed that Cantonese ESL speakers were able to use F0, vowel duration, and intensity to differentiate sentence stress patterns. Yet, both female and male Cantonese ESL speakers exhibited consistently higher F0 in stressed words than English speakers. Overall, Cantonese ESL speakers were found to be proficient in using duration and intensity to signal sentence stress, in a way comparable with English speakers. In addition, F0 and intensity were found to correlate closely with perceptual judgement, and the degree of stress with the naturalness of stress.
Foucart, Alice; Garcia, Xavier; Ayguasanosa, Meritxell; Thierry, Guillaume; Martin, Clara; Costa, Albert
2015-08-01
The present study investigated how pragmatic information is integrated during L2 sentence comprehension. We put forward that the differences often observed between L1 and L2 sentence processing may reflect differences in how various types of information are used to process a sentence, and not necessarily differences between native and non-native linguistic systems. Based on the idea that when a cue is missing or distorted, one relies more on the other cues available, we hypothesised that late bilinguals favour the cues that they master during sentence processing. To verify this hypothesis, we investigated whether late bilinguals take the speaker's identity (inferred from the voice) into account when incrementally processing speech and whether this affects their online interpretation of the sentence. To do so, we adapted the study of Van Berkum, Van den Brink, Tesink, Kos, and Hagoort (2008, J. Cogn. Neurosci. 20(4), 580-591), in which sentences with either semantic violations or pragmatic inconsistencies were presented. While both the native and the non-native groups showed a similar response to semantic violations (N400), their responses to speakers' inconsistencies slightly diverged; late bilinguals showed a positivity (LPP) much earlier than native speakers. These results suggest that, like native speakers, late bilinguals process semantic and pragmatic information incrementally; however, what seems to differ between L1 and L2 processing is the time course of the different processes. We propose that this difference may originate from late bilinguals' sensitivity to pragmatic information and/or their ability to efficiently make use of the information provided by the sentence context to generate expectations about pragmatic information during L2 sentence comprehension. In other words, late bilinguals may rely more on speaker identity than native speakers do when they face semantic integration difficulties. Copyright © 2015 Elsevier Ltd. All rights reserved.
The Speaker Gender Gap at Critical Care Conferences.
Mehta, Sangeeta; Rose, Louise; Cook, Deborah; Herridge, Margaret; Owais, Sawayra; Metaxa, Victoria
2018-06-01
To review women's participation as faculty at five critical care conferences over 7 years. Retrospective analysis of five scientific programs to identify the proportion of females and each speaker's profession based on conference conveners, program documents, or internet research. Three international (European Society of Intensive Care Medicine, International Symposium on Intensive Care and Emergency Medicine, Society of Critical Care Medicine) and two national (Critical Care Canada Forum, U.K. Intensive Care Society State of the Art Meeting) annual critical care conferences held between 2010 and 2016. Female faculty speakers. None. Male speakers outnumbered female speakers at all five conferences, in all 7 years. Overall, women represented 5-31% of speakers, and female physicians represented 5-26% of speakers. Nursing and allied health professional faculty represented 0-25% of speakers; in general, more than 50% of allied health professionals were women. Over the 7 years, Society of Critical Care Medicine had the highest representation of female (27% overall) and nursing/allied health professional (16-25%) speakers; notably, male physicians substantially outnumbered female physicians in all years (62-70% vs 10-19%, respectively). Women's representation on conference program committees ranged from 0% to 40%, with Society of Critical Care Medicine having the highest representation of women (26-40%). The female proportions of speakers, physician speakers, and program committee members increased significantly over time at the Society of Critical Care Medicine and U.K. Intensive Care Society State of the Art Meeting conferences (p < 0.05), but there was no temporal change at the other three conferences. There is a speaker gender gap at critical care conferences, with male faculty outnumbering female faculty. This gap is more marked among physician speakers than those speakers representing nursing and allied health professionals. Several organizational strategies can address this gender gap.
Reflecting on Native Speaker Privilege
ERIC Educational Resources Information Center
Berger, Kathleen
2014-01-01
The issues surrounding native speakers (NSs) and nonnative speakers (NNSs) as teachers (NESTs and NNESTs, respectively) in the field of teaching English to speakers of other languages (TESOL) are a current topic of interest. In many contexts, the native speaker of English is viewed as the model teacher, thus putting the NEST into a position of…
ERIC Educational Resources Information Center
Kersten, Alan W.; Meissner, Christian A.; Lechuga, Julia; Schwartz, Bennett L.; Albrechtsen, Justin S.; Iglesias, Adam
2010-01-01
Three experiments provide evidence that the conceptualization of moving objects and events is influenced by one's native language, consistent with linguistic relativity theory. Monolingual English speakers and bilingual Spanish/English speakers tested in an English-speaking context performed better than monolingual Spanish speakers and bilingual…
Hybrid Speaker Recognition Using Universal Acoustic Model
NASA Astrophysics Data System (ADS)
Nishimura, Jun; Kuroda, Tadahiro
We propose a novel speaker recognition approach using a speaker-independent universal acoustic model (UAM) for sensornet applications. In sensornet applications such as “Business Microscope”, interactions among knowledge workers in an organization can be visualized by sensing face-to-face communication using wearable sensor nodes. In conventional studies, speakers are detected by comparing the energy of input speech signals among the nodes. However, there are often synchronization errors among the nodes which degrade the speaker recognition performance. By focusing on properties of the speaker's acoustic channel, UAM can provide robustness against the synchronization error. The overall speaker recognition accuracy is improved by combining UAM with the energy-based approach. For 0.1 s speech inputs and 4 subjects, a speaker recognition accuracy of 94% is achieved for synchronization errors of less than 100 ms.
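The energy-based baseline that UAM is combined with amounts to comparing the short-term energy of time-aligned frames across the wearable nodes and attributing speech to the loudest node's wearer. A toy sketch of that comparison (the frame length, sampling rate, and signals are invented, not the paper's setup):

import numpy as np

def frame_energy(x):
    return float(np.sum(np.asarray(x, dtype=np.float64) ** 2))

def active_node(frames_per_node):
    # The node whose 0.1 s frame carries the most energy is taken as the speaker's node.
    return int(np.argmax([frame_energy(f) for f in frames_per_node]))

rng = np.random.default_rng(3)
fs = 8000
frames = [0.1 * rng.normal(size=int(0.1 * fs)) for _ in range(4)]
frames[2] = rng.normal(size=int(0.1 * fs))  # node 2's wearer is talking (louder)
print(active_node(frames))  # -> 2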
Speech watermarking: an approach for the forensic analysis of digital telephonic recordings.
Faundez-Zanuy, Marcos; Lucena-Molina, Jose J; Hagmüller, Martin
2010-07-01
In this article, the authors discuss the problem of forensic authentication of digital audio recordings. Although forensic audio has been addressed in several articles, the existing approaches are focused on analog magnetic recordings, which are less prevalent because of the large number of digital recorders available on the market (optical, solid state, hard disks, etc.). An approach based on digital signal processing that consists of spread spectrum techniques for speech watermarking is presented. This approach has the advantage that the authentication is based on the signal itself rather than the recording format. Thus, it is valid for the usual recording devices in police-controlled telephone intercepts. In addition, our proposal allows for the introduction of relevant information such as the recording date and time and all the relevant data (this is not always possible with classical systems). Our experimental results reveal that the speech watermarking procedure does not interfere in a significant way with subsequent forensic speaker identification.
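As a rough illustration of the spread-spectrum idea (a generic textbook variant, not the authors' exact scheme), each embedded bit flips the sign of a low-amplitude pseudo-random carrier added to the speech, and is recovered from the sign of the correlation with that carrier:

import numpy as np

def embed(signal, bits, chip_len=1024, alpha=0.01, seed=42):
    prn = np.random.default_rng(seed).choice([-1.0, 1.0], size=chip_len)
    marked = signal.copy()
    for i, b in enumerate(bits):  # b in {0, 1}
        s = slice(i * chip_len, (i + 1) * chip_len)
        marked[s] += alpha * (2 * b - 1) * prn  # bit modulates the carrier's sign
    return marked

def extract(marked, n_bits, chip_len=1024, seed=42):
    prn = np.random.default_rng(seed).choice([-1.0, 1.0], size=chip_len)
    return [1 if float(marked[i * chip_len:(i + 1) * chip_len] @ prn) > 0 else 0
            for i in range(n_bits)]

rng = np.random.default_rng(0)
speech = rng.normal(scale=0.1, size=4096)  # stand-in for a speech excerpt
print(extract(embed(speech, [1, 0, 1, 1]), 4))  # typically recovers [1, 0, 1, 1]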
Investigating Auditory Processing of Syntactic Gaps with L2 Speakers Using Pupillometry
ERIC Educational Resources Information Center
Fernandez, Leigh; Höhle, Barbara; Brock, Jon; Nickels, Lyndsey
2018-01-01
According to the Shallow Structure Hypothesis (SSH), second language (L2) speakers, unlike native speakers, build shallow syntactic representations during sentence processing. In order to test the SSH, this study investigated the processing of a syntactic movement in both native speakers of English and proficient late L2 speakers of English using…
A Model of Mandarin Tone Categories--A Study of Perception and Production
ERIC Educational Resources Information Center
Yang, Bei
2010-01-01
The current study lays the groundwork for a model of Mandarin tones based on both native speakers' and non-native speakers' perception and production. It demonstrates that there is variability in non-native speakers' tone productions and that there are differences in the perceptual boundaries in native speakers and non-native speakers. There…
Literacy Skill Differences between Adult Native English and Native Spanish Speakers
ERIC Educational Resources Information Center
Herman, Julia; Cote, Nicole Gilbert; Reilly, Lenore; Binder, Katherine S.
2013-01-01
The goal of this study was to compare the literacy skills of adult native English and native Spanish ABE speakers. Participants were 169 native English speakers and 124 native Spanish speakers recruited from five prior research projects. The results showed that the native Spanish speakers were less skilled on morphology and passage comprehension…
ERIC Educational Resources Information Center
Lee, Jiyeon; Yoshida, Masaya; Thompson, Cynthia K.
2015-01-01
Purpose: Grammatical encoding (GE) is impaired in agrammatic aphasia; however, the nature of such deficits remains unclear. We examined grammatical planning units during real-time sentence production in speakers with agrammatic aphasia and control speakers, testing two competing models of GE. We queried whether speakers with agrammatic aphasia…
And then I saw her race: Race-based expectations affect infants' word processing.
Weatherhead, Drew; White, Katherine S
2018-08-01
How do our expectations about speakers shape speech perception? Adults' speech perception is influenced by social properties of the speaker (e.g., race). When in development do these influences begin? In the current study, 16-month-olds heard familiar words produced in their native accent (e.g., "dog") and in an unfamiliar accent involving a vowel shift (e.g., "dag"), in the context of an image of either a same-race speaker or an other-race speaker. Infants' interpretation of the words depended on the speaker's race. For the same-race speaker, infants only recognized words produced in the familiar accent; for the other-race speaker, infants recognized both versions of the words. Two additional experiments showed that infants only recognized an other-race speaker's atypical pronunciations when they differed systematically from the native accent. These results provide the first evidence that expectations driven by unspoken properties of speakers, such as race, influence infants' speech processing. Copyright © 2018 Elsevier B.V. All rights reserved.
Word Durations in Non-Native English
Baker, Rachel E.; Baese-Berk, Melissa; Bonnasse-Gahot, Laurent; Kim, Midam; Van Engen, Kristin J.; Bradlow, Ann R.
2010-01-01
In this study, we compare the effects of English lexical features on word duration for native and non-native English speakers and for non-native speakers with different L1s and a range of L2 experience. We also examine whether non-native word durations lead to judgments of a stronger foreign accent. We measured word durations in English paragraphs read by 12 American English (AE), 20 Korean, and 20 Chinese speakers. We also had AE listeners rate the `accentedness' of these non-native speakers. AE speech had shorter durations, greater within-speaker word duration variance, greater reduction of function words, and less between-speaker variance than non-native speech. However, both AE and non-native speakers showed sensitivity to lexical predictability by reducing second mentions and high frequency words. Non-native speakers with more native-like word durations, greater within-speaker word duration variance, and greater function word reduction were perceived as less accented. Overall, these findings identify word duration as an important and complex feature of foreign-accented English. PMID:21516172
Steensberg, Alvilda T; Eriksen, Mette M; Andersen, Lars B; Hendriksen, Ole M; Larsen, Heinrich D; Laier, Gunnar H; Thougaard, Thomas
2017-06-01
The European Resuscitation Council Guidelines 2015 recommend that bystanders activate their mobile phone speaker function, if possible, in case of suspected cardiac arrest. This is to facilitate continuous dialogue with the dispatcher including (if required) cardiopulmonary resuscitation instructions. The aim of this study was to measure bystander capability to activate the speaker function in case of suspected cardiac arrest. Over 87 days, a systematic prospective registration of bystander capability to activate the speaker function, when cardiac arrest was suspected, was performed. For those asked "Can you activate your mobile phone's speaker function?", audio recordings were examined and categorized into groups according to the bystander's capability to activate the speaker function on their own initiative, without instructions, or with instructions from the emergency medical dispatcher. Time delay was measured, in seconds, for the bystanders without pre-activated speaker function. 42.0% (58) were able to activate the speaker function without instructions, 2.9% (4) with instructions, and 18.1% (25) on their own initiative; 37.0% (51) were unable to activate the speaker function. The median time to activate the speaker function was 19 s with instructions and 8 s without. Dispatcher-assisted cardiopulmonary resuscitation with activated speaker function, in cases of suspected cardiac arrest, allows for continuous dialogue between the emergency medical dispatcher and the bystander. In this study, we found a 63.0% success rate of activating the speaker function in such situations. Copyright © 2017 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Ellis, Elizabeth M.
2016-01-01
Teacher linguistic identity has so far mainly been researched in terms of whether a teacher identifies (or is identified by others) as a native speaker (NEST) or nonnative speaker (NNEST) (Moussu & Llurda, 2008; Reis, 2011). Native speakers are presumed to be monolingual, and nonnative speakers, although by definition bilingual, tend to be…
42: An Open-Source Simulation Tool for Study and Design of Spacecraft Attitude Control Systems
NASA Technical Reports Server (NTRS)
Stoneking, Eric
2018-01-01
Simulation is an important tool in the analysis and design of spacecraft attitude control systems. The speaker will discuss the simulation tool, called simply 42, that he has developed over the years to support his own work as an engineer in the Attitude Control Systems Engineering Branch at NASA Goddard Space Flight Center. 42 was intended from the outset to be high-fidelity and powerful, but also fast and easy to use. 42 is publicly available as open source since 2014. The speaker will describe some of 42's models and features, and discuss its applicability to studies ranging from early concept studies through the design cycle, integration, and operations. He will outline 42's architecture and share some thoughts on simulation development as a long-term project.
How Cognitive Load Influences Speakers' Choice of Referring Expressions.
Vogels, Jorrig; Krahmer, Emiel; Maes, Alfons
2015-08-01
We report on two experiments investigating the effect of an increased cognitive load for speakers on the choice of referring expressions. Speakers produced story continuations to addressees, in which they referred to characters that were either salient or non-salient in the discourse. In Experiment 1, referents that were salient for the speaker were non-salient for the addressee, and vice versa. In Experiment 2, all discourse information was shared between speaker and addressee. Cognitive load was manipulated by the presence or absence of a secondary task for the speaker. The results show that speakers under load are more likely to produce pronouns, at least when referring to less salient referents. We take this finding as evidence that speakers under load have more difficulties taking discourse salience into account, resulting in the use of expressions that are more economical for themselves. © 2014 Cognitive Science Society, Inc.
Yum, Yen Na; Holcomb, Phillip J.; Grainger, Jonathan
2011-01-01
Comparisons of word and picture processing using Event-Related Potentials (ERPs) are contaminated by gross physical differences between the two types of stimuli. In the present study, we tackle this problem by comparing picture processing with word processing in an alphabetic and a logographic script, which are also characterized by gross physical differences. Native Mandarin Chinese speakers viewed pictures (line drawings) and Chinese characters (Experiment 1), native English speakers viewed pictures and English words (Experiment 2), and naïve Chinese readers (native English speakers) viewed pictures and Chinese characters (Experiment 3) in a semantic categorization task. The varying pattern of differences in the ERPs elicited by pictures and words across the three experiments provided evidence for i) script-specific processing arising between 150–200 ms post-stimulus onset, ii) domain-specific but script-independent processing arising between 200–300 ms post-stimulus onset, and iii) processing that depended on stimulus meaningfulness in the N400 time window. The results are interpreted in terms of differences in the way visual features are mapped onto higher-level representations for pictures and words in alphabetic and logographic writing systems. PMID:21439991
Sound localization and auditory response capabilities in round goby (Neogobius melanostomus)
NASA Astrophysics Data System (ADS)
Rollo, Audrey K.; Higgs, Dennis M.
2005-04-01
A fundamental role in vertebrate auditory systems is determining the direction of a sound source. While fish show directional responses to sound, sound localization remains in dispute. The species used in the current study, Neogobius melanostomus (round goby) uses sound in reproductive contexts, with both male and female gobies showing directed movement towards a calling male. The two-choice laboratory experiment was used (active versus quiet speaker) to analyze behavior of gobies in response to sound stimuli. When conspecific male spawning sounds were played, gobies moved in a direct path to the active speaker, suggesting true localization to sound. Of the animals that responded to conspecific sounds, 85% of the females and 66% of the males moved directly to the sound source. Auditory playback of natural and synthetic sounds showed differential behavioral specificity. Of gobies that responded, 89% were attracted to the speaker playing Padogobius martensii sounds, 87% to 100 Hz tone, 62% to white noise, and 56% to Gobius niger sounds. Swimming speed, as well as mean path angle to the speaker, will be presented during the presentation. Results suggest a strong localization of the round goby to a sound source, with some differential sound specificity.
Lexical representation of novel L2 contrasts
NASA Astrophysics Data System (ADS)
Hayes-Harb, Rachel; Masuda, Kyoko
2005-04-01
There is much interest among psychologists and linguists in the influence of the native language sound system on the acquisition of second languages (Best, 1995; Flege, 1995). Most studies of second language (L2) speech focus on how learners perceive and produce L2 sounds, but we know of only two that have considered how novel sound contrasts are encoded in learners' lexical representations of L2 words (Pallier et al., 2001; Ota et al., 2002). In this study we investigated how native speakers of English encode Japanese consonant quantity contrasts in their developing Japanese lexicons at different stages of acquisition (Japanese contrasts singleton versus geminate consonants but English does not). Monolingual English speakers, native English speakers learning Japanese for one year, and native speakers of Japanese were taught a set of Japanese nonwords containing singleton and geminate consonants. Subjects then performed memory tasks eliciting perception and production data to determine whether they encoded the Japanese consonant quantity contrast lexically. Overall accuracy in these tasks was a function of Japanese language experience, and acoustic analysis of the production data revealed non-native-like patterns of differentiation of singleton and geminate consonants among the L2 learners of Japanese. Implications for theories of L2 speech are discussed.
The impact of rate reduction and increased vocal intensity on coarticulation in dysarthria
NASA Astrophysics Data System (ADS)
Tjaden, Kris
2003-04-01
The dysarthrias are a group of speech disorders resulting from impairment to nervous system structures important for the motor execution of speech. Although numerous studies have examined how dysarthria impacts articulatory movements or changes in vocal tract shape, few studies of dysarthria consider that articulatory events and their acoustic consequences overlap or are coarticulated in connected speech. The impact of rate, loudness, and clarity on coarticulatory patterns in dysarthria also are poorly understood, although these prosodic manipulations frequently are employed as therapy strategies to improve intelligibility in dysarthria and also are known to affect coarticulatory patterns for at least some neurologically healthy speakers. The current study examined the effects of slowed rate and increased vocal intensity on anticipatory coarticulation for speakers with dysarthria secondary to Multiple Sclerosis (MS), as inferred from the acoustic signal. Healthy speakers were studied for comparison purposes. Three repetitions of twelve target words embedded in the carrier phrase "It's a -- again" were produced in habitual, loud, and slow speaking conditions. F2 frequencies and first moment coefficients were used to infer coarticulation. Both group and individual speaker trends will be examined in the data analyses.
Long-Term Experience with Chinese Language Shapes the Fusiform Asymmetry of English Reading
Mei, Leilei; Xue, Gui; Lu, Zhong-Lin; Chen, Chuansheng; Wei, Miao; He, Qinghua; Dong, Qi
2015-01-01
Previous studies have suggested differential engagement of the bilateral fusiform gyrus in the processing of Chinese and English. The present study tested the possibility that long-term experience with Chinese language affects the fusiform laterality of English reading by comparing three samples: Chinese speakers, English speakers with Chinese experience, and English speakers without Chinese experience. We found that, when reading words in their respective native language, Chinese and English speakers without Chinese experience differed in functional laterality of the posterior fusiform region (right laterality for Chinese speakers, but left laterality for English speakers). More importantly, compared with English speakers without Chinese experience, English speakers with Chinese experience showed more recruitment of the right posterior fusiform cortex for English words and pseudowords, which is similar to how Chinese speakers processed Chinese. These results suggest that long-term experience with Chinese shapes the fusiform laterality of English reading and have important implications for our understanding of the cross-language influences in terms of neural organization and of the functions of different fusiform subregions in reading. PMID:25598049
Statistical Evaluation of Biometric Evidence in Forensic Automatic Speaker Recognition
NASA Astrophysics Data System (ADS)
Drygajlo, Andrzej
Forensic speaker recognition is the process of determining if a specific individual (suspected speaker) is the source of a questioned voice recording (trace). This paper aims at presenting forensic automatic speaker recognition (FASR) methods that provide a coherent way of quantifying and presenting recorded voice as biometric evidence. In such methods, the biometric evidence consists of the quantified degree of similarity between speaker-dependent features extracted from the trace and speaker-dependent features extracted from recorded speech of a suspect. The interpretation of recorded voice as evidence in the forensic context presents particular challenges, including within-speaker (within-source) variability and between-speaker (between-source) variability. Consequently, FASR methods must provide a statistical evaluation which gives the court an indication of the strength of the evidence, given the estimated within-source and between-source variabilities. This paper reports on the first ENFSI evaluation campaign through a fake case, organized by the Netherlands Forensic Institute (NFI), as an example, where an automatic method using Gaussian mixture models (GMMs) and the Bayesian interpretation (BI) framework was implemented for the forensic speaker recognition task.
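A skeletal numeric version of the similarity scoring described here is the GMM-UBM log-likelihood ratio: the trace is scored against a suspect model and against a background model, and the difference indicates the strength of evidence. The sketch below uses toy Gaussian features and maximum-likelihood fits; operational FASR systems use MAP-adapted GMMs and calibrated scores:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
background = rng.normal(size=(2000, 13))       # features pooled over many speakers
suspect = rng.normal(loc=0.3, size=(400, 13))  # suspect's reference recording
trace = rng.normal(loc=0.3, size=(100, 13))    # questioned recording

ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0).fit(background)
spk = GaussianMixture(n_components=8, covariance_type="diag", random_state=0).fit(suspect)

# .score() returns the mean log-likelihood per frame under each hypothesis.
llr = spk.score(trace) - ubm.score(trace)
print(f"mean log-likelihood ratio = {llr:.3f} (positive favours same-source)")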
NASA Astrophysics Data System (ADS)
Asai, Kikuo; Kondo, Kimio; Kobayashi, Hideaki; Saito, Fumihiko
We developed a prototype system to support telecommunication by using keywords selected by the speaker in a videoconference. In the traditional presentation style, a speaker talks and uses audiovisual materials, and the audience at remote sites looks at these materials. Unfortunately, the audience often loses concentration and attention during the talk. To overcome this problem, we investigate a keyword presentation style, in which the speaker holds keyword cards that enable the audience to see additional information. Although keyword captions were originally intended for use in video materials for learning foreign languages, they can also be used to improve the quality of distance lectures in videoconferences. Our prototype system recognizes printed keywords in a video image at a server, and transfers the data to clients as multimedia functions such as language translation, three-dimensional (3D) model visualization, and audio reproduction. The additional information is collocated with the keyword cards in the display window, thus forming a spatial relationship between them. We conducted an experiment to investigate the properties of the keyword presentation style for an audience. The results suggest the potential of the keyword presentation style for improving the audience's concentration and attention in distance lectures by providing an environment that facilitates eye contact during videoconferencing.
Effect of explicit dimension instruction on speech category learning
Chandrasekaran, Bharath; Yi, Han-Gyol; Smayda, Kirsten E.; Maddox, W. Todd
2015-01-01
Learning non-native speech categories is often considered a challenging task in adulthood. This difficulty is driven by cross-language differences in weighting critical auditory dimensions that differentiate speech categories. For example, previous studies have shown that differentiating Mandarin tonal categories requires attending to dimensions related to pitch height and direction. Relative to native speakers of Mandarin, the pitch direction dimension is under-weighted by native English speakers. In the current study, we examined the effect of explicit instructions (dimension instruction) on native English speakers' Mandarin tone category learning within the framework of a dual-learning systems (DLS) model. This model predicts that successful speech category learning is initially mediated by an explicit, reflective learning system that frequently utilizes unidimensional rules, with an eventual switch to a more implicit, reflexive learning system that utilizes multidimensional rules. Participants were explicitly instructed to focus on and/or ignore the pitch height dimension or the pitch direction dimension, or were given no explicit prime. Our results show that instructions directing participants to focus on pitch direction, and instructions diverting attention away from pitch height, resulted in enhanced tone categorization. Computational modeling of participant responses suggested that instruction related to pitch direction led to faster and more frequent use of multidimensional reflexive strategies, and enhanced perceptual selectivity along the previously under-weighted pitch direction dimension. PMID:26542400
PREFACE: European Workshop on Advanced Control and Diagnosis
NASA Astrophysics Data System (ADS)
Schulte, Horst; Georg, Sören
2014-12-01
The European Workshop on Advanced Control and Diagnosis is an annual event that has been organised since 2003 by Control Engineering departments of several European universities in Germany, France, the UK, Poland, Italy, Hungary and Denmark. The overall planning of the workshops is conducted by the Intelligent Control and Diagnosis (ICD) steering committee. This year's ACD workshop took place at HTW Berlin (University of Applied Sciences) and was organised by the Control Engineering group of School of Engineering I of HTW Berlin. 38 papers were presented at ACD 2014, with contributions spanning a variety of fields in modern control science: Discrete control, nonlinear control, model predictive control, system identification, fault diagnosis and fault-tolerant control, control applications, applications of fuzzy logic, as well as modelling and simulation, the latter two forming a basis for all tasks in modern control. Three interesting and high-quality plenary lectures were delivered. The first plenary speaker was Wolfgang Weber from Pepperl+Fuchs, a German manufacturer of state-of-the-art industrial sensors and process interfaces. The second and third plenary speakers were two internationally high-ranked researchers in their respective fields, Prof. Didier Theilliol from Université de Lorraine and Prof. Carsten Scherer from Universität Stuttgart. Taken together, the three plenary lectures sought to contribute to closing the gap between theory and applications. On behalf of the whole ACD 2014 organising committee, we would like to thank all those who submitted papers and participated in the workshop. We hope it was a fruitful and memorable event for all. Together we are looking forward to the next ACD workshop in 2015 in Pilsen, Czech Republic. Horst Schulte (General Chair), Sören Georg (Programme Chair)
Peeters, David; Snijders, Tineke M; Hagoort, Peter; Özyürek, Aslı
2017-01-27
In everyday communication speakers often refer in speech and/or gesture to objects in their immediate environment, thereby shifting their addressee's attention to an intended referent. The neurobiological infrastructure involved in the comprehension of such basic multimodal communicative acts remains unclear. In an event-related fMRI study, we presented participants with pictures of a speaker and two objects while they concurrently listened to her speech. In each picture, one of the objects was singled out, either through the speaker's index-finger pointing gesture or through a visual cue that made the object perceptually more salient in the absence of gesture. A mismatch (compared to a match) between speech and the object singled out by the speaker's pointing gesture led to enhanced activation in left IFG and bilateral pMTG, showing the importance of these areas in conceptual matching between speech and referent. Moreover, a match (compared to a mismatch) between speech and the object made salient through a visual cue led to enhanced activation in the mentalizing system, arguably reflecting an attempt to converge on a jointly attended referent in the absence of pointing. These findings shed new light on the neurobiological underpinnings of the core communicative process of comprehending a speaker's multimodal referential act and stress the power of pointing as an important natural device to link speech to objects. Copyright © 2016 Elsevier Ltd. All rights reserved.
A Study of Multiplexing Schemes for Voice and Data.
NASA Astrophysics Data System (ADS)
Sriram, Kotikalapudi
Voice traffic variations are characterized by on/off transitions of voice calls, and talkspurt/silence transitions of speakers in conversations. A speaker is known to be in silence for more than half the time during a telephone conversation. In this dissertation, we study some schemes which exploit speaker silences for an efficient utilization of the transmission capacity in integrated voice/data multiplexing and in digital speech interpolation. We study two voice/data multiplexing schemes. In each scheme, any time slots momentarily unutilized by the voice traffic are made available to data. In the first scheme, the multiplexer does not use speech activity detectors (SAD), and hence the voice traffic variations are due to call on/off only. In the second scheme, the multiplexer detects speaker silences using SAD and transmits voice only during talkspurts. The multiplexer with SAD performs digital speech interpolation (DSI) as well as dynamic channel allocation to voice and data. The performance of the two schemes is evaluated using discrete-time modeling and analysis. The data delay performance for the case of English speech is compared with that for the case of Japanese speech. A closed form expression for the mean data message delay is derived for the single-channel single-talker case. In a DSI system, occasional speech losses occur whenever the number of speakers in simultaneous talkspurt exceeds the number of TDM voice channels. In a buffered DSI system, speech loss is further reduced at the cost of delay. We propose a novel fixed-delay buffered DSI scheme. In this scheme, speech fill-in/hangover is not required because there are no variable delays. Hence, all silences that naturally occur in speech are fully utilized. Consequently, a substantial improvement in the DSI performance is made possible. The scheme is modeled and analyzed in discrete -time. Its performance is evaluated in terms of the probability of speech clipping, packet rejection ratio, DSI advantage, and the delay.
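For the unbuffered case, the probability of speech clipping mentioned above follows a standard back-of-the-envelope model: if each of N admitted speakers is independently in talkspurt with probability p (below 0.5, since speakers are silent more than half the time), loss occurs whenever more than the C available TDM channels are demanded. A sketch with illustrative numbers, not the dissertation's discrete-time analysis:

from scipy.stats import binom

def clipping_probability(n_speakers, n_channels, p_talk):
    # P(more than n_channels of n_speakers are in simultaneous talkspurt).
    return float(binom.sf(n_channels, n_speakers, p_talk))

# Illustrative: 48 conversations interpolated onto 24 channels, p_talk ~ 0.4.
print(f"{clipping_probability(48, 24, 0.4):.4f}")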
Lee, Jonathan S; Pérez-Stable, Eliseo J; Gregorich, Steven E; Crawford, Michael H; Green, Adrienne; Livaudais-Toman, Jennifer; Karliner, Leah S
2017-08-01
Language barriers disrupt communication and impede informed consent for patients with limited English proficiency (LEP) undergoing healthcare procedures. Effective interventions for this disparity remain unclear. Assess the impact of a bedside interpreter phone system intervention on informed consent for patients with LEP and compare outcomes to those of English speakers. Prospective, pre-post intervention implementation study using propensity analysis. Hospitalized patients undergoing invasive procedures on the cardiovascular, general surgery or orthopedic surgery floors. Installation of dual-handset interpreter phones at every bedside enabling 24-h immediate access to professional interpreters. Primary predictor: pre- vs. post-implementation group; secondary predictor: post-implementation patients with LEP vs. English speakers. Primary outcomes: three central informed consent elements, patient-reported understanding of the (1) reasons for and (2) risks of the procedure and (3) having had all questions answered. We considered consent adequately informed when all three elements were met. We enrolled 152 Chinese- and Spanish-speaking patients with LEP (84 pre- and 68 post-implementation) and 86 English speakers. Post-implementation (vs. pre-implementation) patients with LEP were more likely to meet criteria for adequately informed consent (54% vs. 29%, p = 0.001) and, after propensity score adjustment, had significantly higher odds of adequately informed consent (AOR 2.56; 95% CI, 1.15-5.72) as well as of each consent element individually. However, compared to post-implementation English speakers, post-implementation patients with LEP had significantly lower adjusted odds of adequately informed consent (AOR, 0.38; 95% CI, 0.16-0.91). A bedside interpreter phone system intervention to increase rapid access to professional interpreters was associated with improvements in patient-reported informed consent and should be considered by hospitals seeking to improve care for patients with LEP; however, these improvements did not eliminate the language-based disparity. Additional clinician educational interventions and more language-concordant care may be necessary for informed consent to equal that for English speakers.
The Effects of Self-Disclosure on Male and Female Perceptions of Individuals Who Stutter.
Byrd, Courtney T; McGill, Megann; Gkalitsiou, Zoi; Cappellini, Colleen
2017-02-01
The purpose of this study was to examine the influence of self-disclosure on observers' perceptions of persons who stutter. Participants (N = 173) were randomly assigned to view 2 of 4 possible videos (i.e., male self-disclosure, male no self-disclosure, female self-disclosure, and female no self-disclosure). After viewing both videos, participants completed a survey assessing their perceptions of the speakers. Controlling for observer and speaker gender, listeners were more likely to select speakers who self-disclosed their stuttering as more friendly, outgoing, and confident compared with speakers who did not self-disclose. Observers were more likely to select speakers who did not self-disclose as unfriendly and shy compared with speakers who used a self-disclosure statement. Controlling for self-disclosure and observer gender, observers were less likely to choose the female speaker as friendlier, outgoing, and confident compared with the male speaker. Observers also were more likely to select the female speaker as unfriendly, shy, unintelligent, and insecure compared with the male speaker and were more likely to report that they were more distracted when viewing the videos. Results lend support to the effectiveness of self-disclosure as a technique that persons who stutter can use to positively influence the perceptions of listeners.
Kong, Anthony Pak-Hin; Law, Sam-Po; Chak, Gigi Wan-Chi
2017-01-01
Purpose Coverbal gesture use, which is affected by the presence and degree of aphasia, can be culturally specific. The purpose of this study was to compare gesture use among Cantonese-speaking individuals: 23 neurologically healthy speakers, 23 speakers with fluent aphasia, and 21 speakers with nonfluent aphasia. Method Multimedia data of discourse samples from these speakers were extracted from the Cantonese AphasiaBank. Gestures were independently annotated on their forms and functions to determine how gesturing rate and distribution of gestures differed across speaker groups. A multiple regression was conducted to determine the most predictive variable(s) for gesture-to-word ratio. Results Although speakers with nonfluent aphasia gestured most frequently, the rate of gesture use in counterparts with fluent aphasia did not differ significantly from controls. Different patterns of gesture functions in the 3 speaker groups revealed that gesture plays a minor role in lexical retrieval whereas its role in enhancing communication dominates among the speakers with aphasia. The percentages of complete sentences and dysfluency strongly predicted the gesturing rate in aphasia. Conclusions The current results supported the sketch model of language–gesture association. The relationship between gesture production and linguistic abilities and clinical implications for gesture-based language intervention for speakers with aphasia are also discussed. PMID:28609510
Kaland, Constantijn; Swerts, Marc; Krahmer, Emiel
2013-09-01
The present research investigates what drives the prosodic marking of contrastive information. For example, a typically developing speaker of a Germanic language like Dutch generally refers to a pink car as a "PINK car" (accented words in capitals) when a previously mentioned car was red. The main question addressed in this paper is whether contrastive intonation is produced with respect to the speaker's or (also) the listener's perspective on the preceding discourse. Furthermore, this research investigates the production of contrastive intonation by typically developing speakers and speakers with autism. The latter group is investigated because people with autism are argued to have difficulties accounting for another person's mental state and exhibit difficulties in the production and perception of accentuation and pitch range. To this end, utterances with contrastive intonation are elicited from both groups and analyzed in terms of function and form of prosody using production and perception measures. Contrary to expectations, typically developing speakers and speakers with autism produce functionally similar contrastive intonation as both groups account for both their own and their listener's perspective. However, typically developing speakers use a larger pitch range and are perceived as speaking more dynamically than speakers with autism, suggesting differences in their use of prosodic form.
2002-06-07
Continue to Develop and Refine Emerging Technology • Some of the emerging biometric devices, such as iris scan systems, facial recognition systems, and speaker verification systems... (976301)
Razza, Sergio; Zaccone, Monica; Meli, Aannalisa; Cristofari, Eliana
2017-12-01
Children affected by hearing loss can experience difficulties in challenging and noisy environments even when deafness is corrected by cochlear implant (CI) devices. These patients have a selective attention deficit in multiple listening conditions. At present, the most effective ways to improve speech recognition performance in noise consist of providing CI processors with noise reduction (NR) algorithms and of providing patients with bilateral CIs. The aim of this study was to compare speech performance in noise, across increasing noise levels, in CI recipients using two kinds of wireless remote-microphone radio systems that use digital radio frequency transmission: the Roger Inspiro accessory and the Cochlear Wireless Mini Microphone (Mini Mic) accessory. Eleven young users of the Nucleus Cochlear CP910 CI were studied. The signal-to-noise ratio at a speech reception threshold (SRT) value of 50% was measured in different conditions for each patient: with the CI only, with the Roger accessory, or with the Mini Mic accessory. The effect of applying the NR algorithm in each of these conditions was also assessed. The tests were performed with the subject positioned in front of the main loudspeaker, at a distance of 2.5 m; another two loudspeakers were positioned at 3.50 m. The main loudspeaker presented disyllabic words at 65 dB, while babble noise of variable intensity was delivered through the other loudspeakers. The use of either wireless remote microphone improved the SRT results, and both systems improved speech performance. The gain was higher with the Mini Mic system (SRT = -4.76) than with the Roger system (SRT = -3.01). The addition of the NR algorithm did not further improve the results to a statistically significant degree. There is significant improvement in speech recognition results with both wireless digital remote microphone accessories, in particular with the Mini Mic system when used with the CP910 processor; the benefit of a remote microphone accessory surpasses that of applying the NR algorithm. Copyright © 2017. Published by Elsevier B.V.
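The 50% SRT used here is conventionally read off a psychometric function fitted to percent-correct scores measured at several signal-to-noise ratios. A generic sketch of that fit (the data points are invented, not the study's measurements):

import numpy as np
from scipy.optimize import curve_fit

def psychometric(snr, srt, slope):
    # Logistic intelligibility function; 50% correct exactly at snr == srt.
    return 1.0 / (1.0 + np.exp(-slope * (snr - srt)))

snr_db = np.array([-12, -9, -6, -3, 0, 3], dtype=float)
p_correct = np.array([0.05, 0.15, 0.40, 0.70, 0.90, 0.98])
(srt, slope), _ = curve_fit(psychometric, snr_db, p_correct, p0=(-5.0, 1.0))
print(f"SRT(50%) = {srt:.2f} dB SNR")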
Alpermann, Anke; Huber, Walter; Natke, Ulrich; Willmes, Klaus
2010-09-01
Improved fluency after stuttering therapy is usually measured by the percentage of stuttered syllables. However, outcome studies rarely evaluate the use of trained speech patterns that speakers use to manage stuttering. This study investigated whether the modified time interval analysis can distinguish between trained speech patterns, fluent speech, and stuttered speech. Seventeen German experts on stuttering judged a speech sample on two occasions. Speakers of the sample were stuttering adults, who were not undergoing therapy, as well as participants in a fluency shaping and a stuttering modification therapy. Results showed satisfactory inter-judge and intra-judge agreement above 80%. Intervals with trained speech patterns were identified as consistently as stuttered and fluent intervals. We discuss limitations of the study, as well as implications of our findings for the development of training for identification of trained speech patterns and future outcome studies. The reader will be able to (a) explain different methods to measure the use of trained speech patterns, (b) evaluate whether German experts are able to discriminate intervals with trained speech patterns reliably from fluent and stuttered intervals and (c) describe how the measurement of trained speech patterns can contribute to outcome studies.
The human genome: Some assembly required. Final report
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
1994-12-31
The Human Genome Project promises to be one of the most rewarding endeavors in modern biology. The cost and the ethical and social implications, however, have made this project the source of considerable debate both in the scientific community and in the public at large. The 1994 Graduate Student Symposium addresses the scientific merits of the project, the technical issues involved in accomplishing the task, as well as the medical and social issues which stem from the wealth of knowledge which the Human Genome Project will help create. To this end, speakers were brought together who represent the diverse areas of expertise characteristic of this multidisciplinary project. The keynote speaker addresses the project's motivations and goals in the larger context of biological and medical sciences. The first two sessions address relevant technical issues, data collection with a focus on high-throughput sequencing methods and data analysis with an emphasis on identification of coding sequences. The third session explores recent advances in the understanding of genetic diseases and possible routes to treatment. Finally, the last session addresses some of the ethical, social and legal issues which will undoubtedly arise from having a detailed knowledge of the human genome.
Sentence durations and accentedness judgments
NASA Astrophysics Data System (ADS)
Bond, Z. S.; Stockmal, Verna; Markus, Dace
2003-04-01
Talkers in a second language can frequently be identified as speaking with a foreign accent. It is not clear to what degree a foreign accent represents specific deviations from a target language versus more general characteristics. We examined the identifications of native and non-native talkers by listeners with various amounts of knowledge of the target language. Native and non-native speakers of Latvian provided materials. All the non-native talkers spoke Russian as their first language and were long-term residents of Latvia. A listening test, containing sentences excerpted from a short recorded passage, was presented to three groups of listeners: native speakers of Latvian, Russians for whom Latvian was a second language, and Americans with no knowledge of either of the two languages. The listeners were asked to judge whether each utterance was produced by a native or non-native talker. The Latvians identified the non-native talkers very accurately, 88%. The Russians were somewhat less accurate, 83%. The American listeners were least accurate, but still identified the non-native talkers at above chance levels, 62%. Sentence durations correlated with the judgments provided by the American listeners but not with the judgments provided by native or L2 listeners.
Comparison of different speech tasks among adults who stutter and adults who do not stutter
Ritto, Ana Paula; Costa, Julia Biancalana; Juste, Fabiola Staróbole; de Andrade, Claudia Regina Furquim
2016-01-01
OBJECTIVES: In this study, we compared the performance of both fluent speakers and people who stutter in three different speaking situations: monologue speech, oral reading and choral reading. This study follows the assumption that the neuromotor control of speech can be influenced by external auditory stimuli in both speakers who stutter and speakers who do not stutter. METHOD: Seventeen adults who stutter and seventeen adults who do not stutter were assessed in three speaking tasks: monologue, oral reading (solo reading aloud) and choral reading (reading in unison with the evaluator). Speech fluency and rate were measured for each task. RESULTS: The participants who stuttered had a lower frequency of stuttering during choral reading than during monologue and oral reading. CONCLUSIONS: According to the dual premotor system model, choral speech enhanced fluency by providing external cues for the timing of each syllable compensating for deficient internal cues. PMID:27074176
Early Detection of Severe Apnoea through Voice Analysis and Automatic Speaker Recognition Techniques
NASA Astrophysics Data System (ADS)
Fernández, Ruben; Blanco, Jose Luis; Díaz, David; Hernández, Luis A.; López, Eduardo; Alcázar, José
This study is part of an on-going collaborative effort between the medical and the signal processing communities to promote research on applying voice analysis and Automatic Speaker Recognition techniques (ASR) for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based diagnosis could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we present and discuss the possibilities of using generative Gaussian Mixture Models (GMMs), generally used in ASR systems, to model distinctive apnoea voice characteristics (i.e. abnormal nasalization). Finally, we present experimental findings regarding the discriminative power of speaker recognition techniques applied to severe apnoea detection. We have achieved an 81.25 % correct classification rate, which is very promising and underpins the interest in this line of inquiry.
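As a rough illustration of the GMM-based approach described above, the following sketch trains one generative model per class and classifies an utterance by comparing average log-likelihoods. It assumes precomputed frame-level MFCC matrices and the scikit-learn library; all data here are hypothetical stand-ins, not the study's database or pipeline.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)

    # Stand-ins for real frame-level MFCC matrices (rows = frames,
    # columns = coefficients) from healthy and apnoea speaker groups.
    healthy_mfcc = rng.normal(size=(5000, 13))
    apnoea_mfcc = rng.normal(size=(5000, 13))

    def train_gmm(features, n_components=16):
        # One diagonal-covariance GMM per class, as in typical
        # GMM-based speaker/condition modelling.
        return GaussianMixture(n_components=n_components,
                               covariance_type="diag",
                               max_iter=200).fit(features)

    gmm_healthy = train_gmm(healthy_mfcc)
    gmm_apnoea = train_gmm(apnoea_mfcc)

    def classify(utterance_mfcc):
        # score() returns the average per-frame log-likelihood;
        # the higher-scoring model gives the predicted class.
        return ("apnoea" if gmm_apnoea.score(utterance_mfcc)
                > gmm_healthy.score(utterance_mfcc) else "healthy")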
Syntactic mixing across generations in an environment of community-wide bilingualism
Stoll, Sabine; Zakharko, Taras; Moran, Steven; Schikowski, Robert; Bickel, Balthasar
2015-01-01
A quantitative analysis of a trans-generational, conversational corpus of Chintang (Tibeto-Burman) speakers with community-wide bilingualism in Nepali (Indo-European) reveals that children show more code-switching into Nepali than older speakers. This confirms earlier proposals in the literature that code-switching in bilingual children decreases when they gain proficiency in their dominant language, especially in vocabulary. Contradicting expectations from other studies, our corpus data also reveal that for adults, multi-word insertions of Nepali into Chintang are just as likely to undergo full syntactic integration as single-word insertions. Speakers of younger generations show less syntactic integration. We propose that this reflects a change between generations, from strongly asymmetrical, Chintang-dominated bilingualism in older generations to more balanced bilingualism where Chintang and Nepali operate as clearly separate systems in younger generations. This change is likely to have been triggered by the increase of Nepali presence over the past few decades. PMID:25741296
How Do Speakers Avoid Ambiguous Linguistic Expressions?
ERIC Educational Resources Information Center
Ferreira, V.S.; Slevc, L.R.; Rogers, E.S.
2005-01-01
Three experiments assessed how speakers avoid linguistically and nonlinguistically ambiguous expressions. Speakers described target objects (a flying mammal, bat) in contexts including foil objects that caused linguistic (a baseball bat) and nonlinguistic (a larger flying mammal) ambiguity. Speakers sometimes avoided linguistic-ambiguity, and they…
Zhang, Juan; Meng, Yaxuan; McBride, Catherine; Fan, Xitao; Yuan, Zhen
2018-01-01
The present study investigated the impact of Chinese dialects on McGurk effect using behavioral and event-related potential (ERP) methodologies. Specifically, intra-language comparison of McGurk effect was conducted between Mandarin and Cantonese speakers. The behavioral results showed that Cantonese speakers exhibited a stronger McGurk effect in audiovisual speech perception compared to Mandarin speakers, although both groups performed equally in the auditory and visual conditions. ERP results revealed that Cantonese speakers were more sensitive to visual cues than Mandarin speakers, though this was not the case for the auditory cues. Taken together, the current findings suggest that the McGurk effect generated by Chinese speakers is mainly influenced by segmental phonology during audiovisual speech integration. PMID:29780312
DOE Office of Scientific and Technical Information (OSTI.GOV)
McDonald, K; Curran, B
I. Information Security Background (Speaker = Kevin McDonald): Evolution of Medical Devices; Living and Working in a Hostile Environment; Attack Motivations; Attack Vectors; Simple Safety Strategies; Medical Device Security in the News; Medical Devices and Vendors; Summary.
II. Keeping Radiation Oncology IT Systems Secure (Speaker = Bruce Curran): Hardware Security (Double-lock Requirements, "Foreign" Computer Systems, Portable Device Encryption, Patient Data Storage System Requirements); Network Configuration (Isolating Critical Devices, Isolating Clinical Networks, Remote Access Considerations); Software Applications/Configuration (Passwords/Screen Savers, Restricted Services/Access, Software Configuration Restriction, Use of DNS to Restrict Access, Patches/Upgrades); Awareness (Intrusion Prevention, Intrusion Detection, Threat Risk Analysis); Conclusion.
Learning Objectives: Understanding how hospital IT requirements affect radiation oncology IT systems. Illustrating sample practices for hardware, network, and software security. Discussing implementation of good IT security practices in radiation oncology. Understanding the overall risk and threat scenario in a networked environment.
Opposing and following responses in sensorimotor speech control: Why responses go both ways.
Franken, Matthias K; Acheson, Daniel J; McQueen, James M; Hagoort, Peter; Eisner, Frank
2018-06-04
When talking, speakers continuously monitor and use the auditory feedback of their own voice to control and inform speech production processes. When speakers are provided with auditory feedback that is perturbed in real time, most of them compensate for this by opposing the feedback perturbation. But some responses follow the perturbation. In the present study, we investigated whether the state of the speech production system at perturbation onset may determine what type of response (opposing or following) is made. The results suggest that whether a perturbation-related response is opposing or following depends on ongoing fluctuations of the production system: The system initially responds by doing the opposite of what it was doing. This effect and the nontrivial proportion of following responses suggest that current production models are inadequate: They need to account for why responses to unexpected sensory feedback depend on the production system's state at the time of perturbation.
Can you hear my age? Influences of speech rate and speech spontaneity on estimation of speaker age
Skoog Waller, Sara; Eriksson, Mårten; Sörqvist, Patrik
2015-01-01
Cognitive hearing science is mainly about the study of how cognitive factors contribute to speech comprehension, but cognitive factors also partake in speech processing to infer non-linguistic information from speech signals, such as the intentions of the talker and the speaker’s age. Here, we report two experiments on age estimation by “naïve” listeners. The aim was to study how speech rate influences estimation of speaker age by comparing the speakers’ natural speech rate with increased or decreased speech rate. In Experiment 1, listeners were presented with audio samples of read speech from three different speaker age groups (young, middle aged, and old adults). They estimated the speakers as younger when speech rate was faster than normal and as older when speech rate was slower than normal. This speech rate effect was slightly greater in magnitude for older (60–65 years) speakers in comparison with younger (20–25 years) speakers, suggesting that speech rate may gain greater importance as a perceptual age cue with increased speaker age. This pattern was more pronounced in Experiment 2, in which listeners estimated age from spontaneous speech. Faster speech rate was associated with lower age estimates, but only for older and middle aged (40–45 years) speakers. Taken together, speakers of all age groups were estimated as older when speech rate decreased, except for the youngest speakers in Experiment 2. The absence of a linear speech rate effect in estimates of younger speakers, for spontaneous speech, implies that listeners use different age estimation strategies or cues (possibly vocabulary) depending on the age of the speaker and the spontaneity of the speech. Potential implications for forensic investigations and other applied domains are discussed. PMID:26236259
Evitts, Paul; Gallop, Robert
2011-01-01
There is a large body of research demonstrating the impact of visual information on speaker intelligibility in both normal and disordered speaker populations. However, there is minimal information on which specific visual features listeners find salient during conversational discourse. This study investigated listeners' eye-gaze behaviour during face-to-face conversation with normal laryngeal and proficient alaryngeal speakers. Sixty participants each took part in a 10-min conversation with one of four speakers (typical laryngeal, tracheoesophageal, oesophageal, electrolaryngeal; 15 participants randomly assigned to each mode of speech). All speakers were > 85% intelligible and were judged to be 'proficient' by two certified speech-language pathologists. Participants were fitted with a head-mounted eye-gaze tracking device (Mobile Eye, ASL) that calculated the region of interest and mean duration of eye-gaze. Self-reported gaze behaviour was also obtained following the conversation using a 10 cm visual analogue scale. While listening, participants viewed the lower facial region of the oesophageal speaker more than that of the normal or tracheoesophageal speaker. Results of non-hierarchical cluster analyses showed that, while listening, eye-gaze was predominantly directed at the lower face of the oesophageal and electrolaryngeal speakers and more evenly dispersed among the background, lower face, and eyes of the normal and tracheoesophageal speakers. Finally, results show a low correlation between self-reported eye-gaze behaviour and objective region-of-interest data. Overall, results suggest similar eye-gaze behaviour when healthy controls converse with normal and tracheoesophageal speakers, and that participants had significantly different eye-gaze patterns when conversing with an oesophageal speaker. Results are discussed in terms of existing eye-gaze data and their potential implications for auditory-visual speech perception. © 2011 Royal College of Speech & Language Therapists.
Reilly, Kevin J.; Spencer, Kristie A.
2013-01-01
The current study investigated the processes responsible for selection of sounds and syllables during production of speech sequences in 10 adults with hypokinetic dysarthria from Parkinson's disease, five adults with ataxic dysarthria, and 14 healthy control speakers. Speech production data from a choice reaction time task were analyzed to evaluate the effects of sequence length and practice on speech sound sequencing. Speakers produced sequences that were between one and five syllables in length over five experimental runs of 60 trials each. In contrast to the healthy speakers, speakers with hypokinetic dysarthria demonstrated exaggerated sequence length effects for both inter-syllable intervals (ISIs) and speech error rates. Conversely, speakers with ataxic dysarthria failed to demonstrate a sequence length effect on ISIs and were also the only group that did not exhibit practice-related changes in ISIs and speech error rates over the five experimental runs. The exaggerated sequence length effects in the hypokinetic speakers with Parkinson's disease are consistent with an impairment of action selection during speech sequence production. The absence of a length effect in the speakers with ataxic dysarthria is consistent with previous findings that indicate a limited capacity to buffer speech sequences in advance of their execution. In addition, the lack of practice effects in these speakers suggests that learning-related improvements in the production rate and accuracy of speech sequences involve processing by structures of the cerebellum. Together, the current findings inform models of serial control for speech in healthy speakers and support the notion that sequencing deficits contribute to speech symptoms in speakers with hypokinetic or ataxic dysarthria. In addition, these findings indicate that speech sequencing is differentially impaired in hypokinetic and ataxic dysarthria. PMID:24137121
Discourse comprehension in L2: Making sense of what is not explicitly said.
Foucart, Alice; Romero-Rivas, Carlos; Gort, Bernharda Lottie; Costa, Albert
2016-12-01
Using ERPs, we tested whether L2 speakers can integrate multiple sources of information (e.g., semantic, pragmatic information) during discourse comprehension. We presented native speakers and L2 speakers with three-sentence scenarios in which the final sentence was highly causally related, intermediately related, or causally unrelated to its context; its interpretation therefore required simple or complex inferences. Native speakers revealed a gradual N400-like effect, larger in the causally unrelated condition than in the highly related condition, and falling in between in the intermediately related condition, replicating previous results. In the crucial intermediately related condition, L2 speakers behaved like native speakers, although they showed extra processing in a later time window. Overall, the results show that, when reading, L2 speakers are able to process information from the local context and prior information (e.g., world knowledge) to build global coherence, suggesting that they process different sources of information to make inferences online during discourse comprehension, like native speakers. Copyright © 2016 Elsevier Inc. All rights reserved.
Gender differences in identifying emotions from auditory and visual stimuli.
Waaramaa, Teija
2017-12-01
The present study focused on gender differences in emotion identification from auditory and visual stimuli produced by two male and two female actors. Differences in emotion identification from nonsense samples, language samples and prolonged vowels were investigated. It was also studied whether auditory stimuli can convey the emotional content of speech without visual stimuli, and whether visual stimuli can convey the emotional content of speech without auditory stimuli. The aim was to gain better knowledge of vocal attributes and a more holistic understanding of the nonverbal communication of emotion. Females tended to be more accurate in emotion identification than males. Voice quality parameters played a role in emotion identification in both genders. The emotional content of the samples was conveyed best by nonsense sentences, better than by prolonged vowels or by a shared native language between speakers and participants. Thus, vocal nonverbal communication tends to affect the interpretation of emotion even in the absence of language. Both genders recognized emotions better from visual than from auditory stimuli. Visual information about speech may not be tied to the language; instead, it may be based on the human ability to understand the kinetic movements in speech production more readily than the characteristics of the acoustic cues.
Mühler, Roland; Ziese, Michael; Rostalski, Dorothea
2009-01-01
The purpose of the study was to develop a speaker discrimination test for cochlear implant (CI) users. The speech material was drawn from the Oldenburg Logatome (OLLO) corpus, which contains 150 different logatomes read by 40 German and 10 French native speakers. The prototype test battery included 120 logatome pairs spoken by 5 male and 5 female speakers with balanced representations of the conditions 'same speaker' and 'different speaker'. Ten adult normal-hearing listeners and 12 adult postlingually deafened CI users were included in a study to evaluate the suitability of the test. The mean speaker discrimination score for the CI users was 67.3% correct and for the normal-hearing listeners 92.2% correct. A significant influence of voice gender and fundamental frequency difference on the speaker discrimination score was found in CI users as well as in normal-hearing listeners. Since the test results of the CI users were significantly above chance level and no ceiling effect was observed, we conclude that subsets of the OLLO corpus are very well suited to speaker discrimination experiments in CI users. Copyright 2008 S. Karger AG, Basel.
Multicultural issues in test interpretation.
Langdon, Henriette W; Wiig, Elisabeth H
2009-11-01
Designing the ideal test or series of tests to assess individuals who speak languages other than English is difficult. This article first describes some of the roadblocks, one of which is the lack of identification criteria for language and learning disabilities in monolingual and bilingual populations in most countries of the non-English-speaking world. This lag exists, in part, because access to general education is often limited. The second section describes tests that have been developed in the United States, primarily for Spanish-speaking individuals, because they now represent the largest first-language majority in the United States (80% of English-language learners [ELLs] speak Spanish at home). We discuss tests developed for monolingual and bilingual English-Spanish speakers in the United States and divide this coverage into two parts: the first addresses assessment of students' first language (L1) and second language (L2), usually English, with different versions of the same test; the second describes assessment of L1 and L2 using the same version of the test, administered in the two languages. Examples of tests that fit a priori-determined criteria are briefly discussed throughout the article. Suggestions on how to develop tests for speakers of languages other than English are also provided. In conclusion, we maintain that there will never be a perfect test or set of tests to adequately assess the communication skills of a bilingual individual. This is not surprising because we have yet to develop an ideal test or set of tests that fits monolingual Anglo speakers perfectly. Tests are tools, and the speech-language pathologist needs to know how to use those tools most effectively and equitably. The goal of this article is to provide such guidance. Thieme Medical Publishers.
Do Listeners Store in Memory a Speaker's Habitual Utterance-Final Phonation Type?
Bőhm, Tamás; Shattuck-Hufnagel, Stefanie
2009-01-01
Earlier studies report systematic differences across speakers in the occurrence of utterance-final irregular phonation; the work reported here investigated whether human listeners remember this speaker-specific information and can access it when necessary (a prerequisite for using this cue in speaker recognition). Listeners personally familiar with the voices of the speakers were presented with pairs of speech samples: one with the original and the other with transformed final phonation type. Asked to select the member of the pair that was closer to the talker's voice, most listeners tended to choose the unmanipulated token (even though they judged them to sound essentially equally natural). This suggests that utterance-final pitch period irregularity is part of the mental representation of individual speaker voices, although this may depend on the individual speaker and listener to some extent. PMID:19776665
Learning Words from Speakers with False Beliefs
ERIC Educational Resources Information Center
Papafragou, Anna; Fairchild, Sarah; Cohen, Matthew L.; Friedberg, Carlyn
2017-01-01
During communication, hearers try to infer the speaker's intentions to be able to understand what the speaker means. Nevertheless, whether (and how early) preschoolers track their interlocutors' mental states is still a matter of debate. Furthermore, there is disagreement about how children's ability to consult a speaker's belief in communicative…
International Student Speaker Programs: "Someone from Another World."
ERIC Educational Resources Information Center
Wilson, Angene
This study surveyed members of the Association of International Educators and community volunteers to find out how international student speaker programs actually work. An international student speaker program provides speakers (from the university foreign student population) for community organizations and schools. The results of the survey (49…
Linguistic "Mudes" and the De-Ethnicization of Language Choice in Catalonia
ERIC Educational Resources Information Center
Pujolar, Joan; Gonzalez, Isaac
2013-01-01
Catalan speakers have traditionally constructed the Catalan language as the main emblem of their identity even as migration filled the country with substantial numbers of speakers of Castilian. Although Catalan speakers have been bilingual in Catalan and Castilian for generations, sociolinguistic research has shown how speakers' bilingual…
Embodied Communication: Speakers' Gestures Affect Listeners' Actions
ERIC Educational Resources Information Center
Cook, Susan Wagner; Tanenhaus, Michael K.
2009-01-01
We explored how speakers and listeners use hand gestures as a source of perceptual-motor information during naturalistic communication. After solving the Tower of Hanoi task either with real objects or on a computer, speakers explained the task to listeners. Speakers' hand gestures, but not their speech, reflected properties of the particular…
Speech Breathing in Speakers Who Use an Electrolarynx
ERIC Educational Resources Information Center
Bohnenkamp, Todd A.; Stowell, Talena; Hesse, Joy; Wright, Simon
2010-01-01
Speakers who use an electrolarynx following a total laryngectomy no longer require pulmonary support for speech. Subsequently, chest wall movements may be affected; however, chest wall movements in these speakers are not well defined. The purpose of this investigation was to evaluate speech breathing in speakers who use an electrolarynx during…
NASA Astrophysics Data System (ADS)
Chang, Wen-Chi; Chen, Yu-Chi; Chien, Chih-Jen; Wang, An-Bang; Lee, Chih-Kung
2011-04-01
A testing system containing an advanced vibrometer/interferometer device (AVID) and a high-speed electronic speckle pattern interferometer (ESPI) was developed. AVID is a laser Doppler vibrometer that can be used to detect single-point linear and angular velocity with DC to 20 MHz bandwidth and nanometer resolution. In swept-frequency mode, the frequency response of the structure of interest can be measured from mHz to MHz. The ESPI experimental setup can be used to measure full-field out-of-plane displacement. A 5-1 phase shifting method and a correlation algorithm were used to analyze the phase difference between the reference signal and the speckle signal scattered from the sample surface. To show the efficiency and effectiveness of AVID and ESPI, we designed a micro-speaker composed of a plate with fixed boundaries and two piezo-actuators attached to the sides of the plate. The AVID was used to measure the vibration of one of the piezo-actuators, and the ESPI was adopted to measure the two-dimensional out-of-plane displacement of the plate. A microphone was used to measure the acoustic response created by the micro-speaker. Driving signals included random signals, sinusoidal signals, and amplitude-modulated high-frequency carrier signals. The angular response induced by the amplitude-modulated high-frequency carrier signal was found to be significantly narrower than the frequency responses created by the other types of driving signals. The validity of our newly developed NDE system is detailed by comparing the relationship between the vibration signal of the micro-speaker and the acoustic field generated.
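The abstract names a 5-1 phase shifting method without giving details. As a hedged illustration only, the sketch below implements the widely used five-frame (Hariharan) phase-shifting algorithm and the standard out-of-plane ESPI displacement relation; whether the authors' variant matches this exactly is an assumption.

    import numpy as np

    def five_frame_phase(I1, I2, I3, I4, I5):
        # Hariharan five-frame algorithm: I1..I5 are speckle intensity
        # images captured at successive 90-degree phase shifts.
        # Returns the wrapped phase in radians.
        return np.arctan2(2.0 * (I2 - I4), 2.0 * I3 - I1 - I5)

    def out_of_plane_displacement(phi_ref, phi_def, wavelength):
        # For an out-of-plane-sensitive ESPI geometry, one fringe of
        # 2*pi phase difference corresponds to wavelength/2 of motion.
        dphi = np.angle(np.exp(1j * (phi_def - phi_ref)))  # wrap to (-pi, pi]
        return dphi * wavelength / (4.0 * np.pi)

In practice the wrapped phase difference map would still need spatial unwrapping before it yields an absolute displacement field.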
Cross-language differences in the brain network subserving intelligible speech.
Ge, Jianqiao; Peng, Gang; Lyu, Bingjiang; Wang, Yi; Zhuo, Yan; Niu, Zhendong; Tan, Li Hai; Leff, Alexander P; Gao, Jia-Hong
2015-03-10
How is language processed in the brain by native speakers of different languages? Is there one brain system for all languages or are different languages subserved by different brain systems? The first view emphasizes commonality, whereas the second emphasizes specificity. We investigated the cortical dynamics involved in processing two very diverse languages: a tonal language (Chinese) and a nontonal language (English). We used functional MRI and dynamic causal modeling analysis to compute and compare brain network models exhaustively with all possible connections among nodes of language regions in temporal and frontal cortex and found that the information flow from the posterior to anterior portions of the temporal cortex was commonly shared by Chinese and English speakers during speech comprehension, whereas the inferior frontal gyrus received neural signals from the left posterior portion of the temporal cortex in English speakers and from the bilateral anterior portion of the temporal cortex in Chinese speakers. Our results revealed that, although speech processing is largely carried out in the common left hemisphere classical language areas (Broca's and Wernicke's areas) and anterior temporal cortex, speech comprehension across different language groups depends on how these brain regions interact with each other. Moreover, the right anterior temporal cortex, which is crucial for tone processing, is equally important as its left homolog, the left anterior temporal cortex, in modulating the cortical dynamics in tone language comprehension. The current study pinpoints the importance of the bilateral anterior temporal cortex in language comprehension that is downplayed or even ignored by popular contemporary models of speech comprehension.
Kong, Anthony Pak-Hin
2009-01-01
Discourse produced by speakers with aphasia contains rich and valuable information for researchers to understand the manifestation of aphasia, as well as for clinicians to plan specific treatment components for their clients. Various approaches to investigating aphasic discourse have been proposed in the English literature. However, this is not the case in Chinese. As a result, clinical evaluation of aphasic discourse has not been a common practice. This problem is further compounded by the lack of validated stimuli that are culturally appropriate for language elicitation. The purpose of this study was twofold: (a) to develop and validate four sequential pictorial stimuli for elicitation of language samples in Cantonese speakers with aphasia, and (b) to investigate the use of a main concept measurement, a clinically oriented quantitative system, to analyze the elicited language samples. Twenty speakers with aphasia and ten normal speakers were invited to participate in this study. The aphasic group produced significantly less key information than the normal group. More importantly, a strong relationship was also found between aphasia severity and production of main concepts. The inter-rater and intra-rater reliability results suggested that the scoring system is reliable, and the test-retest results yielded strong, significant correlations across two testing sessions held one to three weeks apart. Readers will demonstrate better understanding of (1) the development and validation of newly devised sequential pictorial stimuli to elicit oral language production, and (2) the use of a main concept measurement to quantify aphasic connected speech in Cantonese Chinese.
Communications system for zero-g simulation tests in water
NASA Technical Reports Server (NTRS)
Smith, H. E.
1971-01-01
System connects seven observers, diver, and spare station, and utilizes public address system with underwater speakers to provide two-way communications between test subject and personnel in control of life support, so that test personnel are warned immediately of malfunction in pressure suit or equipment.
Simultaneous Talk--From the Perspective of Floor Management of English and Japanese Speakers.
ERIC Educational Resources Information Center
Hayashi, Reiko
1988-01-01
Investigates simultaneous talk in face-to-face conversation using the analytic framework of "floor" proposed by Edelsky (1981). Analysis of taped conversation among speakers of Japanese and among speakers of English shows that, while both groups use simultaneous talk, it is used more frequently by Japanese speakers. A reference list…
The Effects of Source Unreliability on Prior and Future Word Learning
ERIC Educational Resources Information Center
Faught, Gayle G.; Leslie, Alicia D.; Scofield, Jason
2015-01-01
Young children regularly learn words from interactions with other speakers, though not all speakers are reliable informants. Interestingly, children will revert to trusting a reliable speaker when a previously endorsed speaker proves unreliable. When later asked to identify the referent of a novel word, children who revert their trust are less willing…
Speaker Reliability Guides Children's Inductive Inferences about Novel Properties
ERIC Educational Resources Information Center
Kim, Sunae; Kalish, Charles W.; Harris, Paul L.
2012-01-01
Prior work shows that children can make inductive inferences about objects based on their labels rather than their appearance (Gelman, 2003). A separate line of research shows that children's trust in a speaker's label is selective. Children accept labels from a reliable speaker over an unreliable speaker (e.g., Koenig & Harris, 2005). In the…
Native-Speakerism and the Complexity of Personal Experience: A Duoethnographic Study
ERIC Educational Resources Information Center
Lowe, Robert J.; Kiczkowiak, Marek
2016-01-01
This paper presents a duoethnographic study into the effects of native-speakerism on the professional lives of two English language teachers, one "native", and one "non-native speaker" of English. The goal of the study was to build on and extend existing research on the topic of native-speakerism by investigating, through…
Research Timeline: Second Language Communication Strategies
ERIC Educational Resources Information Center
Kennedy, Sara; Trofimovich, Pavel
2016-01-01
Speakers of a second language (L2), regardless of proficiency level, communicate for specific purposes. For example, an L2 speaker of English may wish to build rapport with a co-worker by chatting about the weather. The speaker will draw on various resources to accomplish her communicative purposes. For instance, the speaker may say "falling…
Word Stress and Pronunciation Teaching in English as a Lingua Franca Contexts
ERIC Educational Resources Information Center
Lewis, Christine; Deterding, David
2018-01-01
Traditionally, pronunciation was taught by reference to native-speaker models. However, as speakers around the world increasingly interact in English as a lingua franca (ELF) contexts, there is less focus on native-speaker targets, and there is wide acceptance that achieving intelligibility is crucial while mimicking native-speaker pronunciation…
Defining "Native Speaker" in Multilingual Settings: English as a Native Language in Asia
ERIC Educational Resources Information Center
Hansen Edwards, Jette G.
2017-01-01
The current study examines how and why speakers of English from multilingual contexts in Asia are identifying as native speakers of English. Eighteen participants from different contexts in Asia, including Singapore, Malaysia, India, Taiwan, and The Philippines, who self-identified as native speakers of English participated in hour-long interviews…
Speaker Identity Supports Phonetic Category Learning
ERIC Educational Resources Information Center
Mani, Nivedita; Schneider, Signe
2013-01-01
Visual cues from the speaker's face, such as the discriminable mouth movements used to produce speech sounds, improve discrimination of these sounds by adults. The speaker's face, however, provides more information than just the mouth movements used to produce speech--it also provides a visual indexical cue of the identity of the speaker. The…
The Interpretability Hypothesis: Evidence from Wh-Interrogatives in Second Language Acquisition
ERIC Educational Resources Information Center
Tsimpli, Ianthi Maria; Dimitrakopoulou, Maria
2007-01-01
The second language acquisition (SLA) literature reports numerous studies of proficient second language (L2) speakers who diverge significantly from native speakers despite the evidence offered by the L2 input. Recent SLA theories have attempted to account for native speaker/non-native speaker (NS/NNS) divergence by arguing for the dissociation…
NASA Astrophysics Data System (ADS)
Smith, David R. R.; Patterson, Roy D.
2005-11-01
Glottal-pulse rate (GPR) and vocal-tract length (VTL) are related to the size, sex, and age of the speaker but it is not clear how the two factors combine to influence our perception of speaker size, sex, and age. This paper describes experiments designed to measure the effect of the interaction of GPR and VTL upon judgements of speaker size, sex, and age. Vowels were scaled to represent people with a wide range of GPRs and VTLs, including many well beyond the normal range of the population, and listeners were asked to judge the size and sex/age of the speaker. The judgements of speaker size show that VTL has a strong influence upon perceived speaker size. The results for the sex and age categorization (man, woman, boy, or girl) show that, for vowels with GPR and VTL values in the normal range, judgements of speaker sex and age are influenced about equally by GPR and VTL. For vowels with abnormal combinations of low GPRs and short VTLs, the VTL information appears to decide the sex/age judgement.
Oliveira Barrichelo, V M; Heuer, R J; Dean, C M; Sataloff, R T
2001-09-01
Many studies have described and analyzed the singer's formant. A similar phenomenon produced by trained speakers led some authors to examine the speaker's ring. If we consider these phenomena as resonance effects associated with vocal tract adjustments and training, can we hypothesize that trained singers can carry over their singing formant ability into speech, also obtaining a speaker's ring? Can we find similar differences for energy distribution in continuous speech? Forty classically trained singers and forty untrained normal speakers performed an all-voiced reading task and produced a sample of a sustained spoken vowel /a/. The singers were also requested to perform a sustained sung vowel /a/ at a comfortable pitch. The reading was analyzed by the long-term average spectrum (LTAS) method. The sustained vowels were analyzed through power spectrum analysis. The data suggest that singers show more energy concentration in the singer's formant/speaker's ring region in both sung and spoken vowels. The singers' spoken vowel energy in the speaker's ring area was found to be significantly larger than that of the untrained speakers. The LTAS showed similar findings suggesting that those differences also occur in continuous speech. This finding supports the value of further research on the effect of singing training on the resonance of the speaking voice.
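As a hedged illustration of the LTAS method mentioned above, the following sketch averages the power spectral density over a whole reading passage and reports the proportion of energy in the speaker's-ring region. The 2-4 kHz band and the file name are assumptions for illustration; the study does not specify either.

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import welch

    # Hypothetical file name; any mono speech recording would do.
    sr, x = wavfile.read("reading_task.wav")
    x = x.astype(np.float64)

    # Long-term average spectrum: average the power spectral density
    # over the full passage via Welch's method.
    freqs, psd = welch(x, fs=sr, nperseg=4096)

    # Relative energy in an assumed 2-4 kHz speaker's-ring band.
    band = (freqs >= 2000) & (freqs <= 4000)
    ring_ratio = psd[band].sum() / psd.sum()
    print(f"Proportion of spectral energy in 2-4 kHz band: {ring_ratio:.3f}")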
Talker and accent variability effects on spoken word recognition
NASA Astrophysics Data System (ADS)
Nyang, Edna E.; Rogers, Catherine L.; Nishi, Kanae
2003-04-01
A number of studies have shown that words in a list are recognized less accurately in noise and with longer response latencies when they are spoken by multiple talkers, rather than a single talker. These results have been interpreted as support for an exemplar-based model of speech perception, in which it is assumed that detailed information regarding the speaker's voice is preserved in memory and used in recognition, rather than being eliminated via normalization. In the present study, the effects of varying both accent and talker are investigated using lists of words spoken by (a) a single native English speaker, (b) six native English speakers, (c) three native English speakers and three Japanese-accented English speakers. Twelve /hVd/ words were mixed with multi-speaker babble at three signal-to-noise ratios (+10, +5, and 0 dB) to create the word lists. Native English-speaking listeners' percent-correct recognition for words produced by native English speakers across the three talker conditions (single talker native, multi-talker native, and multi-talker mixed native and non-native) and three signal-to-noise ratios will be compared to determine whether sources of speaker variability other than voice alone add to the processing demands imposed by simple (i.e., single accent) speaker variability in spoken word recognition.
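A minimal sketch of the stimulus-preparation step described above: scaling babble noise so that the speech-plus-noise mixture reaches a target SNR. The function name is illustrative, and the authors' exact mixing procedure is not specified.

    import numpy as np

    def mix_at_snr(speech, noise, snr_db):
        # Scale the noise so that the speech-to-noise power ratio
        # equals the requested SNR in dB, then mix the two signals.
        noise = noise[:len(speech)]
        p_speech = np.mean(speech ** 2)
        p_noise = np.mean(noise ** 2)
        target_p_noise = p_speech / (10 ** (snr_db / 10))
        return speech + noise * np.sqrt(target_p_noise / p_noise)

    # The three conditions used in the study would then be, e.g.:
    # for snr in (10, 5, 0): stimulus = mix_at_snr(word, babble, snr)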
Sulpizio, Simone; Fasoli, Fabio; Maass, Anne; Paladino, Maria Paola; Vespignani, Francesco; Eyssel, Friederike; Bentler, Dominik
2015-01-01
Empirical research had initially shown that English listeners are able to identify the speakers' sexual orientation based on voice cues alone. However, the accuracy of this voice-based categorization, as well as its generalizability to other languages (language-dependency) and to non-native speakers (language-specificity), has been questioned recently. Consequently, we address these open issues in 5 experiments: First, we tested whether Italian and German listeners are able to correctly identify sexual orientation of same-language male speakers. Then, participants of both nationalities listened to voice samples and rated the sexual orientation of both Italian and German male speakers. We found that listeners were unable to identify the speakers' sexual orientation correctly. However, speakers were consistently categorized as either heterosexual or gay on the basis of how they sounded. Moreover, a similar pattern of results emerged when listeners judged the sexual orientation of speakers of their own and of the foreign language. Overall, this research suggests that voice-based categorization of sexual orientation reflects the listeners' expectations of how gay voices sound rather than being an accurate detector of the speakers' actual sexual identity. Results are discussed with regard to accuracy, acoustic features of voices, language dependency and language specificity.
Speaker and Observer Perceptions of Physical Tension during Stuttering.
Tichenor, Seth; Leslie, Paula; Shaiman, Susan; Yaruss, J Scott
2017-01-01
Speech-language pathologists routinely assess physical tension during evaluation of those who stutter. If speakers experience tension that is not visible to clinicians, then judgments of severity may be inaccurate. This study addressed this potential discrepancy by comparing judgments of tension by people who stutter and expert clinicians to determine if clinicians could accurately identify the speakers' experience of physical tension. Ten adults who stutter were audio-video recorded in two speaking samples. Two board-certified specialists in fluency evaluated the samples using the Stuttering Severity Instrument-4 and a checklist adapted for this study. Speakers rated their tension using the same forms, and then discussed their experiences in a qualitative interview so that themes related to physical tension could be identified. The degree of tension reported by speakers was higher than that observed by specialists. Tension in parts of the body that were less visible to the observer (chest, abdomen, throat) was reported more by speakers than by specialists. The thematic analysis revealed that speakers' experience of tension changes over time and that these changes may be related to speakers' acceptance of stuttering. The lack of agreement between speaker and specialist perceptions of tension suggests that using self-reports is a necessary component for supporting the accurate diagnosis of tension in stuttering. © 2018 S. Karger AG, Basel.
Speech Prosody Across Stimulus Types for Individuals with Parkinson's Disease.
K-Y Ma, Joan; Schneider, Christine B; Hoffmann, Rüdiger; Storch, Alexander
2015-01-01
Up to 89% of individuals with Parkinson's disease (PD) experience speech problems over the course of the disease. Speech prosody and intelligibility are two of the most affected areas in hypokinetic dysarthria. However, assessment of these areas could be problematic, as speech prosody and intelligibility can be affected by the type of speech materials employed. This study comparatively explored the effects of different types of speech stimulus on speech prosody and intelligibility in PD speakers. Speech prosody and intelligibility in two groups of individuals with varying degrees of dysarthria resulting from PD were compared with those of a group of control speakers using sentence reading, passage reading and monologue. Acoustic analysis, including measures of fundamental frequency (F0), intensity and speech rate, was used to form a prosodic profile for each individual. Speech intelligibility was measured for the speakers with dysarthria using direct magnitude estimation. A difference in F0 variability between the speakers with dysarthria and the control speakers was observed only in the sentence reading task. A difference in average intensity level was observed between the speakers with mild dysarthria and the control speakers. Additionally, there were stimulus effects on both intelligibility and the prosodic profile. The prosodic profile of PD speakers differed from that of the control speakers in the more structured tasks, and lower intelligibility was found in the less structured task. This highlights the value of both structured and natural stimuli for evaluating speech production in PD speakers.
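As a hedged illustration of the prosodic-profile measures named above (F0 variability and intensity; speech rate is omitted here because it requires syllable annotation), the sketch below uses the librosa library. The file name is hypothetical, and this is not the authors' analysis pipeline.

    import numpy as np
    import librosa

    # Hypothetical file name for one speaker's monologue sample.
    y, sr = librosa.load("monologue.wav", sr=None)

    # F0 track via probabilistic YIN; unvoiced frames come back as NaN.
    f0, voiced_flag, voiced_probs = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    f0 = f0[~np.isnan(f0)]

    # Frame-wise RMS intensity, expressed in dB.
    rms_db = librosa.amplitude_to_db(librosa.feature.rms(y=y)[0])

    profile = {
        "f0_mean_hz": float(np.mean(f0)),
        "f0_sd_hz": float(np.std(f0)),      # one common F0-variability measure
        "intensity_mean_db": float(np.mean(rms_db)),
    }
    print(profile)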
Social dominance orientation, nonnative accents, and hiring recommendations.
Hansen, Karolina; Dovidio, John F
2016-10-01
Discrimination against nonnative speakers is widespread and largely socially acceptable. Nonnative speakers are evaluated negatively because accent is a sign that they belong to an outgroup and because understanding their speech requires unusual effort from listeners. The present research investigated intergroup bias, based on stronger support for hierarchical relations between groups (social dominance orientation [SDO]), as a predictor of hiring recommendations of nonnative speakers. In an online experiment using an adaptation of the thin-slices methodology, 65 U.S. adults (54% women; 80% White; Mage = 35.91, range = 18-67) heard a recording of a job applicant speaking with an Asian (Mandarin Chinese) or a Latino (Spanish) accent. Participants indicated how likely they would be to recommend hiring the speaker, answered questions about the text, and indicated how difficult it was to understand the applicant. Independent of objective comprehension, participants high in SDO reported that it was more difficult to understand a Latino speaker than an Asian speaker. SDO predicted hiring recommendations of the speakers, but this relationship was mediated by the perception that nonnative speakers were difficult to understand. This effect was stronger for speakers from lower status groups (Latinos relative to Asians) and was not related to objective comprehension. These findings suggest a cycle of prejudice toward nonnative speakers: Not only do perceptions of difficulty in understanding cause prejudice toward them, but also prejudice toward low-status groups can lead to perceived difficulty in understanding members of these groups. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Goller, Florian; Lee, Donghoon; Ansorge, Ulrich; Choi, Soonja
2017-01-01
Languages differ in how they categorize spatial relations: While German differentiates between containment (in) and support (auf) with distinct spatial words—(a) den Kuli IN die Kappe stecken (”put pen in cap”); (b) die Kappe AUF den Kuli stecken (”put cap on pen”)—Korean uses a single spatial word (kkita) collapsing (a) and (b) into one semantic category, particularly when the spatial enclosure is tight-fit. Korean uses a different word (i.e., netha) for loose-fits (e.g., apple in bowl). We tested whether these differences influence the attention of the speaker. In a crosslinguistic study, we compared native German speakers with native Korean speakers. Participants rated the similarity of two successive video clips of several scenes where two objects were joined or nested (either in a tight or loose manner). The rating data show that Korean speakers base their rating of similarity more on tight- versus loose-fit, whereas German speakers base their rating more on containment versus support (in vs. auf). Throughout the experiment, we also measured the participants’ eye movements. Korean speakers looked equally long at the moving Figure object and at the stationary Ground object, whereas German speakers were more biased to look at the Ground object. Additionally, Korean speakers also looked more at the region where the two objects touched than did German speakers. We discuss our data in the light of crosslinguistic semantics and the extent of their influence on spatial cognition and perception. PMID:29362644
An emergence of coordinated communication in populations of agents.
Kvasnicka, V; Pospichal, J
1999-01-01
The purpose of this article is to demonstrate that coordinated communication spontaneously emerges in a population composed of agents that are capable of specific cognitive activities. Internal states of agents are characterized by meaning vectors. Simple neural networks composed of one layer of hidden neurons perform the cognitive activities of agents. An elementary communication act consists of the following: (a) two agents are selected, where one of them is declared the speaker and the other the listener; (b) the speaker codes a selected meaning vector onto a sequence of symbols and sends it to the listener as a message; and finally, (c) the listener decodes this message into a meaning vector and adapts his or her neural network such that the difference between the speaker's and listener's meaning vectors is decreased. A Darwinian evolution enlarged by ideas from the Baldwin effect and Dawkins' memes is simulated by a simple version of an evolutionary algorithm without crossover. Agent fitness is determined by the success of mutual pairwise communications. It is demonstrated that, in the course of evolution, agents gradually do a better job of decoding received messages (the decoded vectors are closer to the speakers' meaning vectors) and all agents gradually converge on the same vocabulary for common communication. Moreover, if agent meaning vectors contain regularities, then these regularities are also manifested in messages created by agent speakers; that is, similar parts of meaning vectors are coded by similar symbol substrings. This observation is considered a manifestation of the emergence of a grammar system in the common coordinated communication.
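A minimal sketch of one elementary communication act as described in steps (a)-(c) above. The networks are reduced to single-hidden-layer encoders/decoders, and the adaptation is simplified to one gradient step on the listener's output weights; vector sizes and the learning rate are assumptions, not values from the article.

    import numpy as np

    rng = np.random.default_rng(0)
    M, S = 8, 12          # meaning-vector length and message length (assumed)

    def init_net(n_in, n_hidden, n_out):
        return [rng.normal(0, 0.1, (n_in, n_hidden)),
                rng.normal(0, 0.1, (n_hidden, n_out))]

    def forward(net, x):
        h = np.tanh(x @ net[0])
        return np.tanh(h @ net[1]), h

    speaker_net = init_net(M, 16, S)    # meaning -> message
    listener_net = init_net(S, 16, M)   # message -> meaning

    meaning = rng.uniform(-1, 1, M)               # (a) speaker's meaning vector
    message, _ = forward(speaker_net, meaning)    # (b) encode and "send" it
    symbols = np.sign(message)                    #     discretize into symbols
    decoded, h = forward(listener_net, symbols)   # (c) listener decodes

    # Listener adapts to shrink the speaker-listener meaning difference:
    # one plain gradient step on squared error for the output weights.
    err = decoded - meaning
    grad_out = err * (1 - decoded ** 2)           # tanh derivative
    listener_net[1] -= 0.1 * np.outer(h, grad_out)

Repeating such acts over many randomly paired agents, with fitness tied to decoding success, is the core of the evolutionary simulation the article describes.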
Speaker and Accent Variation Are Handled Differently: Evidence in Native and Non-Native Listeners
Kriengwatana, Buddhamas; Terry, Josephine; Chládková, Kateřina; Escudero, Paola
2016-01-01
Listeners are able to cope with between-speaker variability in speech that stems from anatomical sources (i.e. individual and sex differences in vocal tract size) and sociolinguistic sources (i.e. accents). We hypothesized that listeners adapt to these two types of variation differently because prior work indicates that adapting to speaker/sex variability may occur pre-lexically while adapting to accent variability may require learning from attention to explicit cues (i.e. feedback). In Experiment 1, we tested our hypothesis by training native Dutch listeners and Australian-English (AusE) listeners without any experience with Dutch or Flemish to discriminate between the Dutch vowels /I/ and /ε/ from a single speaker. We then tested their ability to classify /I/ and /ε/ vowels of a novel Dutch speaker (i.e. speaker or sex change only), or vowels of a novel Flemish speaker (i.e. speaker or sex change plus accent change). We found that both Dutch and AusE listeners could successfully categorize vowels if the change involved a speaker/sex change, but not if the change involved an accent change. When AusE listeners were given feedback on their categorization responses to the novel speaker in Experiment 2, they were able to successfully categorize vowels involving an accent change. These results suggest that adapting to accents may be a two-step process, whereby the first step involves adapting to speaker differences at a pre-lexical level, and the second step involves adapting to accent differences at a contextual level, where listeners have access to word meaning or are given feedback that allows them to appropriately adjust their perceptual category boundaries. PMID:27309889
San Segundo, Eugenia; Tsanas, Athanasios; Gómez-Vilda, Pedro
2017-01-01
There is a growing consensus that hybrid approaches are necessary for successful speaker characterization in Forensic Speaker Comparison (FSC); hence this study explores the forensic potential of voice features combining source and filter characteristics. The former relate to the action of the vocal folds while the latter reflect the geometry of the speaker's vocal tract. This set of features has been extracted from pause fillers, which are long enough for robust feature estimation while spontaneous enough to be extracted from voice samples in real forensic casework. Speaker similarity was measured using standardized Euclidean Distances (ED) between pairs of speakers: 54 different-speaker (DS) comparisons, 54 same-speaker (SS) comparisons and 12 comparisons between monozygotic twins (MZ). Results revealed that the differences between DS and SS comparisons were significant in both high-quality and telephone-filtered recordings, with no false rejections and limited false acceptances; this finding suggests that this set of voice features is highly speaker-dependent and therefore forensically useful. Mean ED for MZ pairs lies between the average ED for SS comparisons and DS comparisons, as expected according to the literature on twin voices. Specific cases of MZ speakers with very high ED (i.e. strong dissimilarity) are discussed in the context of sociophonetic and twin studies. A preliminary simplification of the Vocal Profile Analysis (VPA) Scheme is proposed, which enables the quantification of voice quality features in the perceptual assessment of speaker similarity, and allows for the calculation of perceptual-acoustic correlations. The adequacy of z-score normalization for this study is also discussed, as well as the relevance of heat maps for detecting the so-called phantoms in recent approaches to the biometric menagerie. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
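The standardized Euclidean distance used here has a simple form: each feature is scaled by its variance across the population (equivalent to z-scoring) before the ordinary Euclidean distance is taken. A minimal sketch with made-up feature vectors (the data and the same/different-speaker labels below are hypothetical, not the study's):

```python
import numpy as np

def standardized_euclidean(u, v, variances):
    """Euclidean distance with each feature scaled by its variance across
    the whole population (equivalent to z-scoring before a plain ED)."""
    return np.sqrt(np.sum((u - v) ** 2 / variances))

# hypothetical data: one row per pause filler, columns = source+filter features
rng = np.random.default_rng(1)
features = rng.normal(size=(60, 10))
pop_var = features.var(axis=0)

d_ss = standardized_euclidean(features[0], features[1], pop_var)   # same speaker, say
d_ds = standardized_euclidean(features[0], features[30], pop_var)  # different speakers
print(f"SS-like distance: {d_ss:.2f}, DS-like distance: {d_ds:.2f}")
```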
Speech to Text Translation for Malay Language
NASA Astrophysics Data System (ADS)
Al-khulaidi, Rami Ali; Akmeliawati, Rini
2017-11-01
The speech recognition system is a front-end and back-end process that receives an audio signal uttered by a speaker and converts it into a text transcription. Speech systems can be used in several fields, including therapeutic technology, education, social robotics, and computer entertainment. In control tasks, which are the target application of the proposed system, speed of performance and response matters because the system must integrate with other control platforms, such as voice-controlled robots. This creates a need for flexible platforms that can be easily adapted to the functionality of their surroundings, unlike software such as MATLAB and Phoenix that requires recorded audio and multiple training passes for every entry. In this paper, a speech recognition system for the Malay language is implemented using Microsoft Visual Studio C#. Ninety Malay phrases were tested by ten speakers of both genders in different contexts. The results show that the overall accuracy, calculated from a confusion matrix, is a satisfactory 92.69%.
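The reported overall accuracy is the standard quantity read off a confusion matrix: the sum of the diagonal (correct recognitions) divided by the total number of trials. A minimal illustration with invented counts for three phrases:

```python
import numpy as np

# invented confusion matrix for three phrases:
# rows = phrase actually spoken, columns = phrase recognized
cm = np.array([[88,  2,  0],
               [ 3, 85,  2],
               [ 1,  4, 90]])

accuracy = np.trace(cm) / cm.sum()   # correct recognitions / all trials
print(f"overall accuracy: {accuracy:.2%}")
```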
Schelinski, Stefanie; Riedel, Philipp; von Kriegstein, Katharina
2014-12-01
In auditory-only conditions, for example when we listen to someone on the phone, it is essential to recognize quickly and accurately what is said (speech recognition). Previous studies have shown that speech recognition performance in auditory-only conditions is better if the speaker is known not only by voice, but also by face. Here, we tested the hypothesis that such an improvement in auditory-only speech recognition depends on the ability to lip-read. To test this, we recruited a group of adults with autism spectrum disorder (ASD), a condition associated with difficulties in lip-reading, and typically developed controls. All participants were trained to identify six speakers by name and voice. Three speakers were learned by a video showing their face and three others were learned in a matched control condition without face. After training, participants performed an auditory-only speech recognition test that consisted of sentences spoken by the trained speakers. As a control condition, the test also included speaker identity recognition on the same auditory material. The results showed that, in the control group, performance in speech recognition was improved for speakers known by face in comparison to speakers learned in the matched control condition without face. The ASD group lacked such a performance benefit. For the ASD group, auditory-only speech recognition was even worse for speakers known by face compared to speakers not known by face. In speaker identity recognition, the ASD group performed worse than the control group independent of whether the speakers were learned with or without face. Two additional visual experiments showed that the ASD group performed worse in lip-reading, whereas face identity recognition was within the normal range. The findings support the view that auditory-only communication involves specific visual mechanisms. Further, they indicate that in ASD, speaker-specific dynamic visual information is not available to optimize auditory-only speech recognition. Copyright © 2014 Elsevier Ltd. All rights reserved.
Speaker Recognition Through NLP and CWT Modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown-VanHoozer, S.A.; Kercel, S.W.; Tucker, R.W.
The objective of this research is to develop a system capable of identifying speakers on wiretaps from a large database (>500 speakers) with a short search time (<30 seconds), and with better than 90% accuracy. Much previous research in speaker recognition has led to algorithms that produced encouraging preliminary results, but were overwhelmed when applied to populations of more than a dozen or so different speakers. The authors are investigating a solution to the "large population" problem by seeking two completely different kinds of characterizing features. These features are extracted using the techniques of Neuro-Linguistic Programming (NLP) and the continuous wavelet transform (CWT). NLP extracts precise neurological, verbal and non-verbal information, and assimilates the information into useful patterns. These patterns are based on specific cues demonstrated by each individual, and provide ways of determining congruency between verbal and non-verbal cues. The primary NLP modalities are characterized through word spotting (verbal predicate cues, e.g., see, sound, feel, etc.), while the secondary modalities are characterized through the speech transcription used by the individual. This has the practical effect of reducing the size of the search space, and greatly speeding up the process of identifying an unknown speaker. The wavelet-based line of investigation concentrates on using vowel phonemes and non-verbal cues, such as tempo. The rationale for concentrating on vowels is that there are a limited number of vowel phonemes, and at least one of them usually appears in even the shortest of speech segments. Using the fast CWT algorithm, the details of both the formant frequency and the glottal excitation characteristics can be easily extracted from voice waveforms. The differences in the glottal excitation waveforms as well as the formant frequency are evident in the CWT output. More significantly, the CWT reveals significant detail of the glottal excitation waveform.
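The CWT analysis of vowel segments described above can be prototyped in a few lines. The sketch below is an illustration, not the authors' fast CWT algorithm: it builds a crude synthetic vowel (a glottal-pulse train shaped by a single formant) and computes a Morlet-wavelet scalogram with PyWavelets; all parameters are invented.

```python
import numpy as np
import pywt  # PyWavelets, assumed available

fs = 16000
t = np.arange(0, 0.1, 1 / fs)
# crude synthetic vowel: a glottal-pulse train shaped by a single formant
f0, formant = 120.0, 700.0
vowel = np.sign(np.sin(2 * np.pi * f0 * t)) * np.sin(2 * np.pi * formant * t)

scales = np.arange(1, 128)
coeffs, freqs = pywt.cwt(vowel, scales, "morl", sampling_period=1 / fs)

# rows of |coeffs| near the formant frequency carry the resonance energy;
# slow modulation along the time axis reflects the glottal excitation period
print(coeffs.shape, f"{freqs.min():.0f}-{freqs.max():.0f} Hz")
```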
Phonetic Encoding of Coda Voicing Contrast under Different Focus Conditions in L1 vs. L2 English
Choi, Jiyoun; Kim, Sahayng; Cho, Taehong
2016-01-01
This study investigated how coda voicing contrast in English would be phonetically encoded in the temporal vs. spectral dimension of the preceding vowel (in vowel duration vs. F1/F2) by Korean L2 speakers of English, and how their L2 phonetic encoding pattern would be compared to that of native English speakers. Crucially, these questions were explored by taking into account the phonetics-prosody interface, testing effects of prominence by comparing target segments in three focus conditions (phonological focus, lexical focus, and no focus). Results showed that Korean speakers utilized the temporal dimension (vowel duration) to encode coda voicing contrast, but failed to use the spectral dimension (F1/F2), reflecting their native language experience—i.e., with a more sparsely populated vowel space in Korean, they are less sensitive to small changes in the spectral dimension, and hence fine-grained spectral cues in English are not readily accessible. Results also showed that along the temporal dimension, both the L1 and L2 speakers hyperarticulated coda voicing contrast under prominence (when phonologically or lexically focused), but hypoarticulated it in the non-prominent condition. This indicates that low-level phonetic realization and high-order information structure interact in a communicatively efficient way, regardless of the speakers’ native language background. The Korean speakers, however, used the temporal phonetic space differently from the way the native speakers did, especially showing less reduction in the no focus condition. This was also attributable to their native language experience—i.e., the Korean speakers’ use of temporal dimension is constrained in a way that is not detrimental to the preservation of coda voicing contrast, given that they failed to add additional cues along the spectral dimension. The results imply that the L2 phonetic system can be more fully illuminated through an investigation of the phonetics-prosody interface in connection with the L2 speakers’ native language experience. PMID:27242571
Invariant principles of speech motor control that are not language-specific.
Chakraborty, Rahul
2012-12-01
Bilingual speakers must learn to modify their speech motor control mechanism based on the linguistic parameters and rules specified by the target language. This study examines if there are aspects of speech motor control which remain invariant regardless of the first (L1) and second (L2) language targets. Based on the age of academic exposure and proficiency in L2, 21 Bengali-English bilingual participants were classified into high (n = 11) and low (n = 10) L2 (English) proficiency groups. Using the Optotrak 3020 motion sensitive camera system, the lips and jaw movements were recorded while participants produced Bengali (L1) and English (L2) sentences. Based on kinematic analyses of the lip and jaw movements, two different variability measures (i.e., lip aperture and lower lip/jaw complex) were computed for English and Bengali sentences. Analyses demonstrated that the two groups of bilingual speakers produced lip aperture complexes (a higher order synergy) that were more consistent in co-ordination than were the lower lip/jaw complexes (a lower order synergy). Similar findings were reported earlier in monolingual English speakers by Smith and Zelaznik. Thus, this hierarchical organization may be viewed as a fundamental principle of speech motor control, since it is maintained even in bilingual speakers.
The influence of ambient speech on adult speech productions through unintentional imitation.
Delvaux, Véronique; Soquet, Alain
2007-01-01
This paper deals with the influence of ambient speech on individual speech productions. A methodological framework is defined to gather the experimental data necessary to feed computer models simulating self-organisation in phonological systems. Two experiments were carried out. Experiment 1 was run on French native speakers from two regiolects of Belgium: two from Liège and two from Brussels. When exposed to the way of speaking of the other regiolect via loudspeakers, the speakers of one regiolect produced vowels that were significantly different from their typical realisations, and significantly closer to the way of speaking specific of the other regiolect. Experiment 2 achieved a replication of the results for 8 Mons speakers hearing a Liège speaker. A significant part of the imitative effect remained up to 10 min after the end of the exposure to the other regiolect productions. As a whole, the results suggest that: (i) imitation occurs automatically and unintentionally, (ii) the modified realisations leave a memory trace, in which case the mechanism may be better defined as 'mimesis' than as 'imitation'. The potential effects of multiple imitative speech interactions on sound change are discussed in this paper, as well as the implications for a general theory of phonetic implementation and phonetic representation.
On the Development of Speech Resources for the Mixtec Language
2013-01-01
The Mixtec language is one of the main native languages in Mexico. In general, due to urbanization, discrimination, and limited attempts to promote the culture, the native languages are disappearing. Most of the information available about the Mixtec language is in written form as in dictionaries which, although including examples about how to pronounce the Mixtec words, are not as reliable as listening to the correct pronunciation from a native speaker. Formal acoustic resources, as speech corpora, are almost non-existent for the Mixtec, and no speech technologies are known to have been developed for it. This paper presents the development of the following resources for the Mixtec language: (1) a speech database of traditional narratives of the Mixtec culture spoken by a native speaker (labelled at the phonetic and orthographic levels by means of spectral analysis) and (2) a native speaker-adaptive automatic speech recognition (ASR) system (trained with the speech database) integrated with a Mixtec-to-Spanish/Spanish-to-Mixtec text translator. The speech database, although small and limited to a single variant, was reliable enough to build the multiuser speech application which presented a mean recognition/translation performance up to 94.36% in experiments with non-native speakers (the target users). PMID:23710134
Second Language Writing Classification System Based on Word-Alignment Distribution
ERIC Educational Resources Information Center
Kotani, Katsunori; Yoshimi, Takehiko
2010-01-01
The present paper introduces an automatic classification system for assisting second language (L2) writing evaluation. This system, which classifies sentences written by L2 learners as either native speaker-like or learner-like sentences, is constructed by machine learning algorithms using word-alignment distributions as classification features…
Human Language Technology: Opportunities and Challenges
2005-01-01
… because of the connections to and reliance on signal processing. Audio diarization critically includes indexing of speakers [12], since speaker … to reduce inter-speaker variability in training. Standard techniques include vocal-tract length normalization, adaptation of acoustic models using … maximum likelihood linear regression (MLLR), and speaker-adaptive training based on MLLR. The acoustic models are mixtures of Gaussians, typically with …
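MLLR, mentioned twice in this fragment, has a compact textbook form (a general statement, not specific to this report): each Gaussian mean $\mu_k$ of the speaker-independent acoustic model is replaced by an affine transform of itself,

$$\hat{\mu}_k = A\,\mu_k + b,$$

where the transform $(A, b)$ is shared across a regression class of Gaussians, so that it can be estimated by maximum likelihood from a small amount of adaptation data; speaker-adaptive training then re-estimates the model with each training speaker's transform applied.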
ERIC Educational Resources Information Center
Tsurutani, Chiharu
2012-01-01
Foreign-accented speakers are generally regarded as less educated, less reliable and less interesting than native speakers and tend to be associated with cultural stereotypes of their country of origin. This discrimination against foreign accents has, however, been discussed mainly using accented English in English-speaking countries. This study…
The Employability of Non-Native-Speaker Teachers of EFL: A UK Survey
ERIC Educational Resources Information Center
Clark, Elizabeth; Paran, Amos
2007-01-01
The native speaker still has a privileged position in English language teaching, representing both the model speaker and the ideal teacher. Non-native-speaker teachers of English are often perceived as having a lower status than their native-speaking counterparts, and have been shown to face discriminatory attitudes when applying for teaching…
Generic Language and Speaker Confidence Guide Preschoolers' Inferences about Novel Animate Kinds
ERIC Educational Resources Information Center
Stock, Hayli R.; Graham, Susan A.; Chambers, Craig G.
2009-01-01
We investigated the influence of speaker certainty on 156 four-year-old children's sensitivity to generic and nongeneric statements. An inductive inference task was implemented, in which a speaker described a nonobvious property of a novel creature using either a generic or a nongeneric statement. The speaker appeared to be confident, neutral, or…
Modern Greek Language: Acquisition of Morphology and Syntax by Non-Native Speakers
ERIC Educational Resources Information Center
Andreou, Georgia; Karapetsas, Anargyros; Galantomos, Ioannis
2008-01-01
This study investigated the performance of native and non native speakers of Modern Greek language on morphology and syntax tasks. Non-native speakers of Greek whose native language was English, which is a language with strict word order and simple morphology, made more errors and answered more slowly than native speakers on morphology but not…
ERIC Educational Resources Information Center
Kong, Anthony Pak-Hin; Law, Sam-Po; Chak, Gigi Wan-Chi
2017-01-01
Purpose: Coverbal gesture use, which is affected by the presence and degree of aphasia, can be culturally specific. The purpose of this study was to compare gesture use among Cantonese-speaking individuals: 23 neurologically healthy speakers, 23 speakers with fluent aphasia, and 21 speakers with nonfluent aphasia. Method: Multimedia data of…
ERIC Educational Resources Information Center
Gorman, Kristen S.; Gegg-Harrison, Whitney; Marsh, Chelsea R.; Tanenhaus, Michael K.
2013-01-01
When referring to named objects, speakers can choose either a name ("mbira") or a description ("that gourd-like instrument with metal strips"); whether the name provides useful information depends on whether the speaker's knowledge of the name is shared with the addressee. But, how do speakers determine what is shared? In 2…
Accent Attribution in Speakers with Foreign Accent Syndrome
ERIC Educational Resources Information Center
Verhoeven, Jo; De Pauw, Guy; Pettinato, Michele; Hirson, Allen; Van Borsel, John; Marien, Peter
2013-01-01
Purpose: The main aim of this experiment was to investigate the perception of Foreign Accent Syndrome in comparison to speakers with an authentic foreign accent. Method: Three groups of listeners attributed accents to conversational speech samples of 5 FAS speakers which were embedded amongst those of 5 speakers with a real foreign accent and 5…
Race in Conflict with Heritage: "Black" Heritage Language Speaker of Japanese
ERIC Educational Resources Information Center
Doerr, Neriko Musha; Kumagai, Yuri
2014-01-01
"Heritage language speaker" is a relatively new term to denote minority language speakers who grew up in a household where the language was used or those who have a family, ancestral, or racial connection to the minority language. In research on heritage language speakers, overlap between these 2 definitions is often assumed--that is,…
ERIC Educational Resources Information Center
Montrul, Silvina; Davidson, Justin; De La Fuente, Israel; Foote, Rebecca
2014-01-01
We examined how age of acquisition in Spanish heritage speakers and L2 learners interacts with implicitness vs. explicitness of tasks in gender processing of canonical and non-canonical ending nouns. Twenty-three Spanish native speakers, 29 heritage speakers, and 33 proficiency-matched L2 learners completed three on-line spoken word recognition…
The Role of Interaction in Native Speaker Comprehension of Nonnative Speaker Speech.
ERIC Educational Resources Information Center
Polio, Charlene; Gass, Susan M.
1998-01-01
Because interaction gives language learners an opportunity to modify their speech upon a signal of noncomprehension, it should also have a positive effect on native speakers' (NS) comprehension of nonnative speakers (NNS). This study shows that interaction does help NSs comprehend NNSs, contrasting the claims of an earlier study that found no…
The persuasiveness of synthetic speech versus human speech.
Stern, S E; Mullennix, J W; Dyson, C; Wilson, S J
1999-12-01
Is computer-synthesized speech as persuasive as the human voice when presenting an argument? After completing an attitude pretest, 193 participants were randomly assigned to listen to a persuasive appeal under three conditions: a high-quality synthesized speech system (DECtalk Express), a low-quality synthesized speech system (Monologue), and a tape recording of a human voice. Following the appeal, participants completed a posttest attitude survey and a series of questionnaires designed to assess perceptions of speech qualities, perceptions of the speaker, and perceptions of the message. The human voice was generally perceived more favorably than the computer-synthesized voice, and the speaker was perceived more favorably when the voice was a human voice than when it was computer synthesized. There was, however, no evidence that computerized speech, as compared with the human voice, affected persuasion or perceptions of the message. Actual or potential applications of this research include issues that should be considered when designing synthetic speech systems.
Propensity and stickiness in the naming game: Tipping fractions of minorities
NASA Astrophysics Data System (ADS)
Thompson, Andrew M.; Szymanski, Boleslaw K.; Lim, Chjan C.
2014-10-01
Agent-based models of the binary naming game are generalized here to represent a family of models parameterized by two continuous parameters. These parameters define varying listener-speaker interactions at the individual level, with one parameter controlling the speaker and the other the listener of each interaction. The major finding presented here is that the generalized naming game preserves the existence of critical thresholds for the size of committed minorities. Above such a threshold, a committed minority causes fast convergence to consensus (in time logarithmic in the size of the network), even when there are other parameters influencing the system. Below such a threshold, reaching consensus requires time exponential in the size of the network. Moreover, the two introduced parameters cause bifurcations in the stabilities of the system's fixed points and may lead to changes in the system's consensus.
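The baseline dynamics underlying this family of models are easy to simulate. The sketch below implements the standard binary naming game with a committed minority (the paper's two propensity/stickiness parameters are omitted); the population size and committed fraction, set here just above the classic tipping value of roughly 10% for the baseline game, are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, COMMITTED_FRAC, STEPS = 500, 0.12, 200_000   # invented sizes

# each agent holds a vocabulary: {"A"}, {"B"}, or {"A", "B"}
vocab = [{"B"} for _ in range(N)]
committed = set(rng.choice(N, size=int(COMMITTED_FRAC * N), replace=False))
for i in committed:
    vocab[i] = {"A"}              # committed agents hold "A" and never change

for _ in range(STEPS):
    s, l = rng.integers(N), rng.integers(N)
    if s == l:
        continue
    word = rng.choice(sorted(vocab[s]))          # speaker utters a random word
    if word in vocab[l]:                          # success: both collapse to it
        if s not in committed:
            vocab[s] = {word}
        if l not in committed:
            vocab[l] = {word}
    elif l not in committed:                      # failure: listener adds the word
        vocab[l].add(word)

frac_a = sum(v == {"A"} for v in vocab) / N
print(f"fraction holding only 'A': {frac_a:.2f}")
```

Sweeping COMMITTED_FRAC around the threshold reproduces the qualitative picture in the abstract: just below it, the minority's word barely spreads within a feasible number of steps; just above it, the whole population converges on "A".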
The perception of syllable affiliation of singleton stops in repetitive speech.
de Jong, Kenneth J; Lim, Byung-Jin; Nagao, Kyoko
2004-01-01
Stetson (1951) noted that repeating singleton coda consonants at fast speech rates causes them to be perceived as onset consonants affiliated with the following vowel. The current study documents the perception of rate-induced resyllabification, as well as the temporal properties that give rise to the perception of syllable affiliation. Stimuli were extracted from a previous study of repeated stop + vowel and vowel + stop syllables (de Jong, 2001a, 2001b). Forced-choice identification tasks show that slow repetitions are clearly distinguished. As speakers increase rate, they reach a point after which listeners disagree as to the affiliation of the stop. This pattern is found for voiced and voiceless consonants using different stimulus extraction techniques. Acoustic models of the identifications indicate that the sudden shift in syllabification occurs with the loss of an acoustic hiatus between successive syllables. Acoustic models of the fast-rate identifications indicate that various other qualities, such as consonant voicing, affect the probability that the consonants will be perceived as onsets. These results indicate a model of syllabic affiliation in which specific juncture-marking aspects of the signal dominate parsing, and in their absence other differences provide additional, weaker cues to syllabic affiliation.
Person authentication using brainwaves (EEG) and maximum a posteriori model adaptation.
Marcel, Sébastien; Millán, José Del R
2007-04-01
In this paper, we investigate the use of brain activity for person authentication. It has been shown in previous studies that the brain-wave pattern of every individual is unique and that the electroencephalogram (EEG) can be used for biometric identification. EEG-based biometry is an emerging research topic and we believe that it may open new research directions and applications in the future. However, very little work has been done in this area, and it has focused mainly on person identification rather than person authentication. Person authentication aims to accept or to reject a person claiming an identity, i.e., comparing biometric data to one template, while the goal of person identification is to match the biometric data against all the records in a database. We propose the use of a statistical framework based on Gaussian Mixture Models and Maximum A Posteriori model adaptation, successfully applied to speaker and face authentication, which can deal with only one training session. We perform intensive experimental simulations using several strict train/test protocols to show the potential of our method. We also show that there are some mental tasks that are more appropriate for person authentication than others.
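The GMM/MAP framework referred to here follows a standard recipe widely used in speaker verification: fit a background model on pooled data, adapt its means toward one person's enrollment data with a relevance factor, and score test samples with a log-likelihood ratio. The sketch below is a schematic illustration on random stand-in features, not the paper's implementation; the relevance factor, model sizes, and data are invented.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# stand-in feature vectors: a background pool and one enrollee's session
background = rng.normal(size=(2000, 12))
enrollment = rng.normal(loc=0.3, size=(100, 12))

# 1) train the Universal Background Model (UBM) on pooled background data
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(background)

# 2) MAP-adapt the UBM means toward the enrollee (relevance factor r)
r = 16.0
post = ubm.predict_proba(enrollment)              # (n_frames, n_components)
n_k = post.sum(axis=0)                            # soft frame counts
x_bar = (post.T @ enrollment) / np.maximum(n_k[:, None], 1e-8)
alpha = (n_k / (n_k + r))[:, None]
adapted_means = alpha * x_bar + (1 - alpha) * ubm.means_

# 3) score a claim as a log-likelihood ratio: adapted model vs. UBM
client = GaussianMixture(n_components=8, covariance_type="diag")
client.weights_, client.covariances_ = ubm.weights_, ubm.covariances_
client.precisions_cholesky_ = ubm.precisions_cholesky_
client.means_ = adapted_means                     # only the means were adapted
test = rng.normal(loc=0.3, size=(20, 12))
print("avg log-likelihood ratio:", client.score(test) - ubm.score(test))
```

Accepting or rejecting the claimed identity then reduces to thresholding the log-likelihood ratio, with the threshold chosen on held-out data.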
The cognitive neuroscience of person identification.
Biederman, Irving; Shilowich, Bryan E; Herald, Sarah B; Margalit, Eshed; Maarek, Rafael; Meschke, Emily X; Hacker, Catrina M
2018-02-14
We compare and contrast five differences between person identification by voice and face. 1. There is little or no cost when a familiar face is to be recognized from an unrestricted set of possible faces, even at Rapid Serial Visual Presentation (RSVP) rates, but the accuracy of familiar voice recognition declines precipitously when the set of possible speakers is increased from one to a mere handful. 2. Whereas deficits in face recognition are typically perceptual in origin, those with normal perception of voices can manifest severe deficits in their identification. 3. Congenital prosopagnosics (CPros) and congenital phonagnosics (CPhon) are generally unable to imagine familiar faces and voices, respectively. Only in CPros, however, is this deficit a manifestation of a general inability to form visual images of any kind. CPhons report no deficit in imaging non-voice sounds. 4. The prevalence of CPhons of 3.2% is somewhat higher than the reported prevalence of approximately 2.0% for CPros in the population. There is evidence that CPhon represents a distinct condition statistically and not just normal variation. 5. Face and voice recognition proficiency are uncorrelated rather than reflecting limitations of a general capacity for person individuation. Copyright © 2018 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Pestel, Ann
1989-01-01
The author discusses working with speakers from business and industry to present career information at the secondary level. Advice for speakers is presented, as well as tips for program coordinators. (CH)
Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E.; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z.
2015-01-01
In the last decade, the debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. However, the exact role of the motor system in auditory speech processing remains elusive. Here we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. The patient’s spontaneous speech was marked by frequent phonological/articulatory errors, and those errors were caused, at least in part, by motor-level impairments with speech production. We found that the patient showed a normal phonemic categorical boundary when discriminating two nonwords that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the nonword stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labeling impairment. These data suggest that the identification (i.e. labeling) of nonword speech sounds may involve the speech motor system, but that the perception of speech sounds (i.e., discrimination) does not require the motor system. This means that motor processes are not causally involved in perception of the speech signal, and suggest that the motor system may be used when other cues (e.g., meaning, context) are not available. PMID:25951749
Lee, Jiyeon; Yoshida, Masaya; Thompson, Cynthia K
2015-08-01
Grammatical encoding (GE) is impaired in agrammatic aphasia; however, the nature of such deficits remains unclear. We examined grammatical planning units during real-time sentence production in speakers with agrammatic aphasia and control speakers, testing two competing models of GE. We queried whether speakers with agrammatic aphasia produce sentences word by word without advanced planning or whether hierarchical syntactic structure (i.e., verb argument structure; VAS) is encoded as part of the advanced planning unit. Experiment 1 examined production of sentences with a predefined structure (i.e., "The A and the B are above the C") using eye tracking. Experiment 2 tested production of transitive and unaccusative sentences without a predefined sentence structure in a verb-priming study. In Experiment 1, both speakers with agrammatic aphasia and young and age-matched control speakers used word-by-word strategies, selecting the first lemma (noun A) only prior to speech onset. However, in Experiment 2, unlike controls, speakers with agrammatic aphasia preplanned transitive and unaccusative sentences, encoding VAS before speech onset. Speakers with agrammatic aphasia show incremental, word-by-word production for structurally simple sentences, requiring retrieval of multiple noun lemmas. However, when sentences involve functional (thematic to grammatical) structure building, advanced planning strategies (i.e., VAS encoding) are used. This early use of hierarchical syntactic information may provide a scaffold for impaired GE in agrammatism.
Grammatical Encoding and Learning in Agrammatic Aphasia: Evidence from Structural Priming
Cho-Reyes, Soojin; Mack, Jennifer E.; Thompson, Cynthia K.
2017-01-01
The present study addressed open questions about the nature of sentence production deficits in agrammatic aphasia. In two structural priming experiments, 13 aphasic and 13 age-matched control speakers repeated visually- and auditorily-presented prime sentences, and then used visually-presented word arrays to produce dative sentences. Experiment 1 examined whether agrammatic speakers form structural and thematic representations during sentence production, whereas Experiment 2 tested the lasting effects of structural priming in lags of two and four sentences. Results of Experiment 1 showed that, like unimpaired speakers, the aphasic speakers evinced intact structural priming effects, suggesting that they are able to generate such representations. Unimpaired speakers also evinced reliable thematic priming effects, whereas agrammatic speakers did so in some experimental conditions, suggesting that access to thematic representations may be intact. Results of Experiment 2 showed structural priming effects of comparable magnitude for aphasic and unimpaired speakers. In addition, both groups showed lasting structural priming effects in both lag conditions, consistent with implicit learning accounts. In both experiments, aphasic speakers with more severe language impairments exhibited larger priming effects, consistent with the “inverse preference” prediction of implicit learning accounts. The findings indicate that agrammatic speakers are sensitive to structural priming across levels of representation and that such effects are lasting, suggesting that structural priming may be beneficial for the treatment of sentence production deficits in agrammatism. PMID:28924328
ERIC Educational Resources Information Center
Paul, Rhea; Shriberg, Lawrence D.; McSweeny, Jane; Cicchetti, Domenic; Klin, Ami; Volkmar, Fred
2005-01-01
Shriberg "et al." [Shriberg, L. "et al." (2001). "Journal of Speech, Language and Hearing Research, 44," 1097-1115] described prosody-voice features of 30 high functioning speakers with autistic spectrum disorder (ASD) compared to age-matched control speakers. The present study reports additional information on the speakers with ASD, including…
Investigating Holistic Measures of Speech Prosody
ERIC Educational Resources Information Center
Cunningham, Dana Aliel
2012-01-01
Speech prosody is a multi-faceted dimension of speech which can be measured and analyzed in a variety of ways. In this study, the speech prosody of Mandarin L1 speakers, English L2 speakers, and English L1 speakers was assessed by trained raters who listened to sound clips of the speakers responding to a graph prompt and reading a short passage.…
Young Children's Sensitivity to Speaker Gender When Learning from Others
ERIC Educational Resources Information Center
Ma, Lili; Woolley, Jacqueline D.
2013-01-01
This research explores whether young children are sensitive to speaker gender when learning novel information from others. Four- and 6-year-olds ("N" = 144) chose between conflicting statements from a male versus a female speaker (Studies 1 and 3) or decided which speaker (male or female) they would ask (Study 2) when learning about the functions…
ERIC Educational Resources Information Center
McNaughton, Stephanie; McDonough, Kim
2015-01-01
This exploratory study investigated second language (L2) French speakers' service encounters in the multilingual setting of Montreal, specifically whether switches to English during French service encounters were related to L2 speakers' willingness to communicate or motivation. Over a two-week period, 17 French L2 speakers in Montreal submitted…
ERIC Educational Resources Information Center
Gilbert, Harvey R.; Ferrand, Carole T.
1987-01-01
Respirometric quotients (RQ), the ratio of oral air volume expended to total volume expended, were obtained from the productions of oral and nasal airflow of 10 speakers with cleft palate, with and without their prosthetic appliances, and 10 normal speakers. Cleft palate speakers without their appliances exhibited the lowest RQ values. (Author/DB)
ERIC Educational Resources Information Center
Polio, Charlene; Gass, Susan; Chapin, Laura
2006-01-01
Implicit negative feedback has been shown to facilitate SLA, and the extent to which such feedback is given is related to a variety of task and interlocutor variables. The background of a native speaker (NS), in terms of amount of experience in interactions with nonnative speakers (NNSs), has been shown to affect the quantity of implicit negative…
ERIC Educational Resources Information Center
Tatsumi, Naofumi
2012-01-01
Previous research shows that American learners of Japanese (AJs) tend to differ from native Japanese speakers in their compliment responses (CRs). Yokota (1986) and Shimizu (2009) have reported that AJs tend to respond more negatively than native Japanese speakers. It has also been reported that AJs' CRs tend to lack the use of avoidance or…
Intelligibility of clear speech: effect of instruction.
Lam, Jennifer; Tjaden, Kris
2013-10-01
The authors investigated how clear speech instructions influence sentence intelligibility. Twelve speakers produced sentences in habitual, clear, hearing impaired, and overenunciate conditions. Stimuli were amplitude normalized and mixed with multitalker babble for orthographic transcription by 40 listeners. The main analysis investigated percentage-correct intelligibility scores as a function of the 4 conditions and speaker sex. Additional analyses included listener response variability, individual speaker trends, and an alternate intelligibility measure: proportion of content words correct. Relative to the habitual condition, the overenunciate condition was associated with the greatest intelligibility benefit, followed by the hearing impaired and clear conditions. Ten speakers followed this trend. The results indicated different patterns of clear speech benefit for male and female speakers. Greater listener variability was observed for speakers with inherently low habitual intelligibility compared to speakers with inherently high habitual intelligibility. Stable proportions of content words were observed across conditions. Clear speech instructions affected the magnitude of the intelligibility benefit. The instruction to overenunciate may be most effective in clear speech training programs. The findings may help explain the range of clear speech intelligibility benefit previously reported. Listener variability analyses suggested the importance of obtaining multiple listener judgments of intelligibility, especially for speakers with inherently low habitual intelligibility.
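The stimulus preparation step described here (amplitude normalization, then mixing with multitalker babble for transcription) follows a generic recipe; the abstract does not give the SNR or recording details, so everything in the sketch below is an invented stand-in:

```python
import numpy as np

def mix_at_snr(speech, babble, snr_db):
    """Amplitude-normalize speech, then add multitalker babble at a target SNR."""
    speech = speech / np.sqrt(np.mean(speech ** 2))      # unit-RMS speech
    babble = babble[: len(speech)]
    babble = babble / np.sqrt(np.mean(babble ** 2))      # unit-RMS babble
    gain = 10 ** (-snr_db / 20)                          # babble gain for target SNR
    mixture = speech + gain * babble
    return mixture / np.max(np.abs(mixture))             # rescale to avoid clipping

# random stand-ins for a recorded sentence and a babble track
fs = 22050
rng = np.random.default_rng(4)
sentence = rng.normal(size=2 * fs)
babble = rng.normal(size=5 * fs)
mixed = mix_at_snr(sentence, babble, snr_db=0.0)
```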
Smith, David R R; Walters, Thomas C; Patterson, Roy D
2007-12-01
A recent study [Smith and Patterson, J. Acoust. Soc. Am. 118, 3177-3186 (2005)] demonstrated that both the glottal-pulse rate (GPR) and the vocal-tract length (VTL) of vowel sounds have a large effect on the perceived sex and age (or size) of a speaker. The vowels for all of the "different" speakers in that study were synthesized from recordings of the sustained vowels of one, adult male speaker. This paper presents a follow-up study in which a range of vowels were synthesized from recordings of four different speakers--an adult man, an adult woman, a young boy, and a young girl--to determine whether the sex and age of the original speaker would have an effect upon listeners' judgments of whether a vowel was spoken by a man, woman, boy, or girl, after they were equated for GPR and VTL. The sustained vowels of the four speakers were scaled to produce the same combinations of GPR and VTL, which covered the entire range normally encountered in every day life. The results show that listeners readily distinguish children from adults based on their sustained vowels but that they struggle to distinguish the sex of the speaker.
Dikker, Suzanne; Silbert, Lauren J; Hasson, Uri; Zevin, Jason D
2014-04-30
Recent research has shown that the degree to which speakers and listeners exhibit similar brain activity patterns during human linguistic interaction is correlated with communicative success. Here, we used an intersubject correlation approach in fMRI to test the hypothesis that a listener's ability to predict a speaker's utterance increases such neural coupling between speakers and listeners. Nine subjects listened to recordings of a speaker describing visual scenes that varied in the degree to which they permitted specific linguistic predictions. In line with our hypothesis, the temporal profile of listeners' brain activity was significantly more synchronous with the speaker's brain activity for highly predictive contexts in left posterior superior temporal gyrus (pSTG), an area previously associated with predictive auditory language processing. In this region, predictability differentially affected the temporal profiles of brain responses in the speaker and listeners respectively, in turn affecting correlated activity between the two: whereas pSTG activation increased with predictability in the speaker, listeners' pSTG activity instead decreased for more predictable sentences. Listeners additionally showed stronger BOLD responses for predictive images before sentence onset, suggesting that highly predictable contexts lead comprehenders to preactivate predicted words.
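The intersubject correlation approach at the core of this study reduces, per region or voxel, to a Pearson correlation between the speaker's and a listener's BOLD time courses. The sketch below shows only that core computation on random stand-in data, ignoring hemodynamic lag correction, preprocessing, and group-level statistics:

```python
import numpy as np

def isc(speaker_ts, listener_ts):
    """Row-wise Pearson correlation between speaker and listener time courses
    (rows = regions/voxels, columns = time points)."""
    s = speaker_ts - speaker_ts.mean(axis=1, keepdims=True)
    l = listener_ts - listener_ts.mean(axis=1, keepdims=True)
    return (s * l).sum(axis=1) / np.sqrt((s**2).sum(axis=1) * (l**2).sum(axis=1))

rng = np.random.default_rng(5)
speaker = rng.normal(size=(100, 240))                    # 100 regions, 240 volumes
listener = 0.4 * speaker + rng.normal(size=(100, 240))   # partially coupled listener
print(isc(speaker, listener)[:5])
```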
When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech.
Tuninetti, Alba; Chládková, Kateřina; Peter, Varghese; Schiller, Niels O; Escudero, Paola
2017-11-01
Speech sound acoustic properties vary largely across speakers and accents. When perceiving speech, adult listeners normally disregard non-linguistic variation caused by speaker or accent differences, in order to comprehend the linguistic message, e.g. to correctly identify a speech sound or a word. Here we tested whether the process of normalizing speaker and accent differences, facilitating the recognition of linguistic information, is found at the level of neural processing, and whether it is modulated by the listeners' native language. In a multi-deviant oddball paradigm, native and nonnative speakers of Dutch were exposed to naturally-produced Dutch vowels varying in speaker, sex, accent, and phoneme identity. Unexpectedly, the analysis of mismatch negativity (MMN) amplitudes elicited by each type of change shows a large degree of early perceptual sensitivity to non-linguistic cues. This finding on perception of naturally-produced stimuli contrasts with previous studies examining the perception of synthetic stimuli wherein adult listeners automatically disregard acoustic cues to speaker identity. The present finding bears relevance to speech normalization theories, suggesting that at an unattended level of processing, listeners are indeed sensitive to changes in fundamental frequency in natural speech tokens. Copyright © 2017 Elsevier Inc. All rights reserved.
Patterns of lung volume use during an extemporaneous speech task in persons with Parkinson disease.
Bunton, Kate
2005-01-01
This study examined patterns of lung volume use in speakers with Parkinson disease (PD) during an extemporaneous speaking task. The performance of a control group was also examined. The behaviors described are based on acoustic, kinematic and linguistic measures. Group differences were found in breath group duration, lung volume initiation, and lung volume termination measures. Speakers in the control group alternated between longer and shorter breath groups, with starting lung volumes higher for the longer breath groups and lower for the shorter ones. Speech production was terminated before reaching tidal end expiratory level (EEL). This pattern was also seen in 4 of 7 speakers with PD. The remaining 3 PD speakers initiated speech at low starting lung volumes and continued speaking below EEL. This subgroup of PD speakers ended breath groups at agrammatical boundaries, whereas control speakers ended at appropriate grammatical boundaries. As a result of participating in this exercise, the reader will (1) be able to describe the patterns of lung volume use in speakers with Parkinson disease and compare them with those employed by control speakers; and (2) obtain information about the influence of speaking task on speech breathing.
Hanulíková, Adriana; van Alphen, Petra M; van Goch, Merel M; Weber, Andrea
2012-04-01
How do native listeners process grammatical errors that are frequent in non-native speech? We investigated whether the neural correlates of syntactic processing are modulated by speaker identity. ERPs to gender agreement errors in sentences spoken by a native speaker were compared with the same errors spoken by a non-native speaker. In line with previous research, gender violations in native speech resulted in a P600 effect (larger P600 for violations in comparison with correct sentences), but when the same violations were produced by the non-native speaker with a foreign accent, no P600 effect was observed. Control sentences with semantic violations elicited comparable N400 effects for both the native and the non-native speaker, confirming no general integration problem in foreign-accented speech. The results demonstrate that the P600 is modulated by speaker identity, extending our knowledge about the role of speaker's characteristics on neural correlates of speech processing.
Factors affecting the perception of Korean-accented American English
NASA Astrophysics Data System (ADS)
Cho, Kwansun; Harris, John G.; Shrivastav, Rahul
2005-09-01
This experiment examines the relative contribution of two factors, intonation and articulation errors, to the perception of foreign accent in Korean-accented American English. Ten native speakers of Korean and ten native speakers of American English were asked to read ten English sentences. These sentences were then modified using high-quality speech resynthesis techniques [STRAIGHT; Kawahara et al., Speech Commun. 27, 187-207 (1999)] to generate four sets of stimuli. In the first two sets of stimuli, the intonation patterns of the Korean speakers and American speakers were switched with one another. The articulatory errors for each speaker were not modified. In the final two sets, the sentences from the Korean and American speakers were resynthesized without any modifications. Fifteen listeners were asked to rate all the stimuli for the degree of foreign accent. Preliminary results show that, for native speakers of American English, articulation errors may play a greater role in the perception of foreign accent than errors in intonation patterns. [Work supported by KAIM.]
Eiesland, Eli Anne; Lind, Marianne
2012-03-01
Compounds are words made up of at least two other words (lexemes); they have both lexical and syntactic characteristics and are thus particularly interesting for the study of language processing. Most studies of compounds and language processing have been based on data from experimental single-word production and comprehension tasks. To enhance the ecological validity of morphological processing research, data from other contexts, such as discourse production, need to be considered. This study investigates the production of nominal compounds in semi-spontaneous spoken texts by a group of speakers with fluent types of aphasia compared to a group of neurologically healthy speakers. The speakers with aphasia produce significantly fewer nominal compound types in their texts than the non-aphasic speakers, and the compounds they produce exhibit fewer different types of semantic relations than the compounds produced by the non-aphasic speakers. The results are discussed in relation to theories of language processing.
Nursing practice and work environment issues in the 21st century: a leadership challenge.
Manojlovich, Milisa; Barnsteiner, Jane; Bolton, Linda Burnes; Disch, Joanne; Saint, Sanjay
2008-01-01
A leadership conference titled "Have Patient Safety and the Workforce Shortage Created the Perfect Storm?" was held in honor of Dr. Ada Sue Hinshaw, who was ending her tenure as dean of the University of Michigan School of Nursing. A morning panel on the preferred future for practice featured plenary speaker Dr. Linda Burnes Bolton and participating panelists Dr. Sanjay Saint, Dr. Jane Barnsteiner, and Dr. Joanne Disch. Each speaker presented a unique yet complementary perspective, with several common themes permeating the morning's presentations. For example, all of the speakers mentioned how important interprofessional collaboration is to promoting patient safety. The themes can be categorized broadly as nursing practice and work environment issues, with subthemes of interprofessional communication and collaboration, systems solutions to patient safety problems, and future directions in nursing education. A synopsis of comments made during the morning practice panel and empirical support for the themes and subthemes identified by panelists are provided in this article.
High-Fidelity Piezoelectric Audio Device
NASA Technical Reports Server (NTRS)
Woodward, Stanley E.; Fox, Robert L.; Bryant, Robert G.
2003-01-01
ModalMax is a very innovative means of harnessing the vibration of a piezoelectric actuator to produce an energy-efficient, low-profile device with high-bandwidth, high-fidelity audio response. The piezoelectric audio device outperforms many commercially available speakers made using speaker cones. The piezoelectric device weighs substantially less (4 g) than speaker cones that use magnets (10 g). ModalMax devices are extremely simple to fabricate: the entire audio device is made by lamination. The simplicity of the design lends itself to lower cost. The piezoelectric audio device can be used without its acoustic chambers, resulting in a very low thickness of 0.023 in. (0.58 mm). The piezoelectric audio device can be completely encapsulated, which makes it very attractive for use in wet environments. Encapsulation does not significantly alter the audio response. Its small size makes it applicable to many consumer electronic products, such as pagers, portable radios, headphones, laptop computers, computer monitors, toys, and electronic games. The audio device can also be used in automobile or aircraft sound systems.
Clancy, Cornelius J; Pappas, Peter; Vazquez, Jose; Judson, Marc A; Tobin, Ellis; Kontoyiannis, Dimitrios P; Thompson, George R; Reboli, Annette; Garey, Kevin W; Greenberg, Richard N; Ostrosky-Zeichner, Luis; Wu, Alan; Lyon, G Marshall; Apewokin, Senu; Nguyen, M Hong; Caliendo, Angela
2017-01-01
Abstract Background Blood cultures (BC) are the diagnostic gold standard for candidemia, but sensitivity is <50%. T2 Candida (T2) is a novel, FDA-approved nanodiagnostic panel, which utilizes T2 magnetic resonance and a dedicated instrument to detect Candida within whole blood samples. Methods Candidemic adults were identified at 14 centers by diagnostic BC (dBC). Follow-up blood samples were collected from all patients (pts) for testing by T2 and companion BC (cBC). T2 was run in batch at a central lab; results are reported qualitatively for three groups of spp. (Candida albicans/C. tropicalis (CA/CT), C. glabrata/C. krusei (CG/CK), or C. parapsilosis (CP)). T2 and cBC were defined as positive (+) if they detected a sp. identified in dBC. Results 152 patients were enrolled (median age: 54 yrs (18–93); 54% (82) men). Candidemia risk factors included indwelling catheters (82%, 125), abdominal surgery (24%, 36), transplant (22%, 33), cancer (22%, 33), hemodialysis (17%, 26), and neutropenia (10%, 15). Mean times to Candida detection/spp. identification by dBC were 47/133 hours (2/5.5 d). dBC revealed CA (30%, 46), CG (29%, 45), CP (28%, 43), CT (11%, 17) and CK (3%, 4). Mean time to collection of T2/cBC was 62 hours (2.6 d). 74% (112) of patients received antifungal (AF) therapy prior to T2/cBC (mean: 55 hours (2.3 d)). Overall, T2 results were more likely than cBC to be + (P < 0.0001; see table), a result driven by performance in AF-treated patients (P < 0.0001). T2 was more likely to be + among patients originally infected with CA (61% (28) vs. 20% (9); P = 0.001); there were trends toward higher positivity in patients infected with CT (59% (17) vs. 23% (4); P = 0.08) and CP (42% (18) vs. 28% (12); P = 0.26). T2 was + in 89% (32/36) of patients with + cBC. Conclusion T2 was sensitive for diagnosing candidemia at the time of + cBC, and it was significantly more likely to be + than cBC among AF-treated patients. T2 is an important advance in the diagnosis of candidemia, which is likely to be particularly useful in patients receiving prophylactic, pre-emptive or empiric AF therapy.

Test results, n (%):

Pt group (n)   | T2+      | T2-      | cBC+     | cBC-      | T2+/cBC+ | T2+/cBC- | T2-/cBC+ | T2-/cBC-
All (152)      | 69 (45%) | 83 (55%) | 36 (24%) | 116 (76%) | 32 (21%) | 37 (24%) | 4 (3%)   | 79 (52%)
Prior AF (112) | 55 (49%) | 57 (51%) | 23 (20%) | 89 (80%)  | 20 (18%) | 35 (31%) | 3 (3%)   | 54 (48%)
No AF (40)     | 14 (35%) | 26 (65%) | 13 (32%) | 27 (68%)  | 12 (30%) | 2 (5%)   | 1 (2%)   | 25 (62%)

Disclosure: D. P. Kontoyiannis, Pfizer: Research Contractor, Research support and Speaker honorarium; Astellas: Research Contractor, Research support and Speaker honorarium; Merck: Honorarium, Speaker honorarium; Cidara: Honorarium, Speaker honorarium; Amplyx: Honorarium, Speaker honorarium; F2G: Honorarium, Speaker honorarium; L. Ostrosky-Zeichner, Astellas: Consultant and Grant Investigator, Consulting fee and Research grant; Merck: Scientific Advisor and Speaker's Bureau, Consulting fee and Speaker honorarium; Pfizer: Grant Investigator and Speaker's Bureau, Grant recipient and Speaker honorarium; Scynexis: Grant Investigator and Scientific Advisor, Consulting fee and Grant recipient; Cidara: Grant Investigator and Scientific Advisor, Consulting fee and Research grant; S. Apewokin, T2 biosystems: Investigator, Research support; Astellas: Scientific Advisor, Consulting fee
Sewell, Justin L.; Kushel, Margot B.; Inadomi, John M.; Yee, Hal F.
2009-01-01
Goals We sought to identify factors associated with gastroenterology clinic attendance in an urban safety net healthcare system. Background Missed clinic appointments reduce the efficiency and availability of healthcare, but subspecialty clinic attendance among patients with established healthcare access has not been studied. Study We performed an observational study using secondary data from administrative sources to study patients referred to, and scheduled for an appointment in, the adult gastroenterology clinic serving the safety net healthcare system of San Francisco, California. Our dependent variable was whether subjects attended or missed a scheduled appointment. Analysis included multivariable logistic regression and classification tree analysis. 1,833 patients were referred and scheduled for an appointment between 05/2005 and 08/2006. Prisoners were excluded. All patients had a primary care provider. Results 683 patients (37.3%) missed their appointment; 1,150 (62.7%) attended. Language was highly associated with attendance in the logistic regression; non-English speakers were less likely than English speakers to miss an appointment (adjusted odds ratio 0.42 [0.28,0.63] for Spanish, 0.56 [0.38,0.82] for Asian language, p < 0.001). Other factors were also associated with attendance, but classification tree analysis identified language to be the most highly associated variable. Conclusions In an urban safety net healthcare population, among patients with established healthcare access and a scheduled gastroenterology clinic appointment, not speaking English was most strongly associated with higher attendance rates. Patient related factors associated with not speaking English likely influence subspecialty clinic attendance rates, and these factors may differ from those affecting general healthcare access. PMID:19169147
NPSNET: Aural cues for virtual world immersion
NASA Astrophysics Data System (ADS)
Dahl, Leif A.
1992-09-01
NPSNET is a low-cost visual and aural simulation system designed and implemented at the Naval Postgraduate School. NPSNET is an example of a virtual world simulation environment that incorporates real-time aural cues through software-hardware interaction. In the current implementation of NPSNET, a graphics workstation functions in the sound server role, which involves sending and receiving networked sound message packets across a Local Area Network composed of multiple graphics workstations. The network messages contain sound file identification information that is transmitted from the sound server across an RS-422 protocol communication line to a serial-to-Musical Instrument Digital Interface (MIDI) converter. The MIDI converter, in turn, relays the sound byte to a sampler, an electronic recording and playback device. The sampler correlates the hexadecimal input to a specific note or stored sound and sends it as an audio signal to speakers via an amplifier. The realism of a simulation is improved by involving multiple participant senses and removing external distractions. This thesis describes the incorporation of sound as aural cues and the enhancement they provide in the virtual simulation environment of NPSNET.
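The sound-server output step described above, writing a small MIDI message toward the serial-to-MIDI converter, can be sketched as follows. The port name, baud rate, and note mapping are illustrative assumptions (pyserial is used here; the original system predates it), not the thesis's actual implementation.

```python
# A minimal sketch of the sound-server output step, assuming a pyserial-visible
# port standing in for the RS-422 line; names and values are hypothetical.
import serial

def play_sound(port: serial.Serial, note: int, velocity: int = 100) -> None:
    """Send a 3-byte MIDI Note On (status 0x90, channel 1) for the given note."""
    port.write(bytes([0x90, note & 0x7F, velocity & 0x7F]))

# Standard MIDI runs at 31250 baud; adapters often accept other rates.
with serial.Serial("/dev/ttyS0", baudrate=31250) as port:
    play_sound(port, note=60)  # e.g., map a networked "explosion" event to note 60
```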
The Research Triangle Park Speakers Bureau page is a free resource that schools, universities, and community groups in the Raleigh-Durham-Chapel Hill, N.C. area can use to request speakers and find educational resources.
ERIC Educational Resources Information Center
Mitchell, Peter; Robinson, Elizabeth J.; Thompson, Doreen E.
1999-01-01
Three experiments examined 3- to 6-year-olds' ability to use a speaker's utterance based on false belief to identify which of several referents was intended. Found that many 4- to 5-year-olds performed correctly only when it was unnecessary to consider the speaker's belief. When the speaker gave an ambiguous utterance, many 3- to 6-year-olds…
Speaker Introductions at Internal Medicine Grand Rounds: Forms of Address Reveal Gender Bias.
Files, Julia A; Mayer, Anita P; Ko, Marcia G; Friedrich, Patricia; Jenkins, Marjorie; Bryan, Michael J; Vegunta, Suneela; Wittich, Christopher M; Lyle, Melissa A; Melikian, Ryan; Duston, Trevor; Chang, Yu-Hui H; Hayes, Sharonne N
2017-05-01
Gender bias has been identified as one of the drivers of gender disparity in academic medicine. Bias may be reinforced by gender-subordinating language or differential use of formality in forms of address. Professional titles may influence the perceived expertise and authority of the referenced individual. The objective of this study is to examine how professional titles were used in same- and mixed-gender speaker introductions at Internal Medicine Grand Rounds (IMGR). A retrospective observational study of video-archived speaker introductions at consecutive IMGR was conducted at two different locations (Arizona, Minnesota) of an academic medical center. Introducers and speakers at IMGR were physician and scientist peers holding MD, PhD, or MD/PhD degrees. The primary outcome was whether or not a speaker's professional title was used during the first form of address during speaker introductions at IMGR. As secondary outcomes, we evaluated whether or not the speaker's professional title was used in any form of address during the introduction. Three hundred twenty-one forms of address were analyzed. Female introducers were more likely than male introducers to use professional titles when introducing any speaker during the first form of address (96.2% [102/106] vs. 65.6% [141/215]; p < 0.001). Female dyads utilized formal titles during the first form of address 97.8% (45/46) of the time, compared with male dyads, who utilized a formal title 72.4% (110/152) of the time (p = 0.007). In mixed-gender dyads where the introducer was female and the speaker male, formal titles were used 95.0% (57/60) of the time. Male introducers of female speakers utilized professional titles 49.2% (31/63) of the time (p < 0.001). In this study, women introduced by men at IMGR were less likely to be addressed by professional title than were men introduced by men. Differential formality in speaker introductions may amplify the isolation, marginalization, and professional discomfiture expressed by women faculty in academic medicine.
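The group comparisons above are proportions of introductions using a title; the abstract does not state which test produced its p-values, so the sketch below assumes a chi-square test of the first comparison (female vs. male introducers) purely for illustration.

```python
# A minimal sketch, assuming a chi-square test; counts come from the
# "96.2% [102/106] vs. 65.6% [141/215]" comparison above.
from scipy.stats import chi2_contingency

#                    [title used, title not used]
female_introducers = [102, 106 - 102]
male_introducers   = [141, 215 - 141]

chi2, p, dof, expected = chi2_contingency([female_introducers, male_introducers])
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2g}")
```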
NASA Astrophysics Data System (ADS)
Wong, Nicole W.
The palatal lateral is a rare sound in the world's languages; a review of the literature reveals just 23 languages that currently possess the palatal lateral. Similarly, only 15 (or 3.33%) of the languages in the UCLA Phonological Segment Inventory Database (UPSID) (Maddieson and Precoda, 1991) can claim to currently possess the palatal lateral. While UPSID reports that an additional five languages (Basque, Guarani, Iate, Spanish, Turkish) possess the palatal lateral, these languages have either lost the palatal lateral or were included erroneously. Understanding the production and perception of rare speech sounds is important for understanding the distribution of speech sounds cross-linguistically, especially with regard to the establishment of a single phonetic alphabet (i.e. the International Phonetic Alphabet (IPA)) that can be used to describe and transcribe the languages of the world (Ladefoged and Everett, 1996). An investigation of rare speech sounds can also reveal important findings regarding the physical limitations of the vocal tract and human auditory system. Given that the palatal lateral is a rare speech sound, a complete description of the articulation, acoustics, and perception of this sound does not currently exist. Accounts of the palatal lateral vary with regard to terminology; the palatal lateral has also been referred to as a so-called "phonemically" palatalized lateral (Zilyns'kyj, 1979), a laminal post-alveolar lateral (Ladefoged and Maddieson, 1996), and an alveolopalatal lateral (Recasens, 2013). Furthermore, the current literature does not distinguish between the palatal lateral and a palatalized lateral. The lack of agreement in the literature regarding terminology can present problems when attempting to assess whether a palatal lateral in one language is similar to a palatal lateral in another language. This dissertation provides a comprehensive description of the palatal lateral, as a means of initiating cross-linguistic comparisons of the palatal lateral as well as understanding the difference between a palatal and a palatalized lateral. A two-part study of the articulation and acoustics of the palatal lateral in Brazilian Portuguese (BP) was undertaken in this dissertation. Articulatory data was collected using electromagnetic articulography (EMA) from 10 female native speakers of BP from Sao Paulo state in Brazil, which permitted the simultaneous collection of acoustic information. Study 1 investigated the articulation of the palatal lateral through a battery of measures and compared the palatal lateral against the palatalized lateral approximant, alveolar lateral approximant, palatal approximant, palatal nasal, palatalized nasal, and alveolar nasal. Study 2 analyzed the acoustics of the palatal lateral in comparison to the palatalized lateral approximant, alveolar lateral approximant, and palatal approximant. A third study was included in the appendix. This study incorporates a phone identification task to understand the role of acoustic saliency in the rareness of the palatal lateral, i.e. compared to other palatal sounds, is the palatal lateral more likely to be misidentified and, if so, as which sounds? This task also investigates whether there is a perceived difference between the palatal and palatalized lateral that may not be captured by Studies 1 and 2, in addition to whether native speakers of BP are better at distinguishing the two sounds than non-native speakers (here, native speakers of American English).
The palatal lateral was compared to the palatalized lateral, palatal approximant, alveolar lateral approximant, palatal nasal, palatalized nasal, alveolar nasal, voiced alveolar stop, and voiced palatalized alveolar stop. 25 (11 male, 14 female) native speakers of BP and 20 (11 male, 9 female) native speakers of American English with no extensive exposure to BP participated in this study. Results from Study 1 show that the palatal lateral is articulated laminally with a high front tongue body and a concave anterior tongue shape that gradually becomes straighter as the phone progresses. Acoustic results in Study 2 indicate a median F1, F2, and F3 of 367 Hz, 1954 Hz, and 3035 Hz, respectively, for female speakers of BP. Statistical analysis reveals little or no evidence of a significant difference between the palatal lateral and the palatalized lateral with regard to the shape of the tongue body, the duration of the phone, or formant frequencies. The perception study included in the appendix finds that while both native and non-native speakers of BP distinguish between the palatal lateral and the palatalized lateral at chance level, native speakers of BP perform better than non-native speakers at correctly identifying the palatal and palatalized nasal. This study also finds that of all the sounds included in this task, the palatal and palatalized lateral are the most likely to be misidentified as the palatal approximant for both participant groups, with the addition of -3 dB of speech-shaped noise greatly increasing the rate of confusion. However, the palatalized lateral is inaccurately identified as a palatal approximant at a confusion rate nearly double or more that of the palatal lateral. This dissertation reveals that the palatal and palatalized lateral are essentially the same sound in BP. Furthermore, there is no evidence that the palatal or palatalized lateral is composed of two separate phones, i.e. an alveolar lateral approximant followed by a palatal approximant. Findings from the perception study support the proposal that yeismo (i.e. the merger of the palatal lateral in favor of the palatal approximant (Colantoni, 2001; Hualde et al., 2005)) occurs because lateral sounds are less robust against added noise than nasal sounds. I argue here that this contributes directly to the rareness of the palatal lateral.
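Midpoint formant measurements like the F1-F3 medians reported in Study 2 can be sketched with the praat-parselmouth bindings to Praat's Burg formant tracker. The file name and segment times below are illustrative assumptions; this is not the dissertation's analysis script.

```python
# A minimal sketch of a midpoint F1-F3 measurement, assuming a hypothetical
# recording and phone interval; uses praat-parselmouth (Praat's Burg tracker).
import parselmouth

snd = parselmouth.Sound("bp_palatal_lateral.wav")
formants = snd.to_formant_burg(maximum_formant=5500.0)  # 5500 Hz ceiling, typical for female voices

t_mid = 0.5 * (0.212 + 0.287)  # midpoint of a hypothetical phone interval in seconds
f1, f2, f3 = (formants.get_value_at_time(n, t_mid) for n in (1, 2, 3))
print(f"F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz, F3 = {f3:.0f} Hz")
```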
20 CFR 631.41 - Allowable State activities.
Code of Federal Regulations, 2010 CFR
2010-04-01
... unemployment compensation system, in accordance with § 631.37(b) of this part; (5) Discretionary allocation for... than basic and remedial education, literacy and English for non-English speakers training, retraining...
Perceptual and Acoustic Reliability Estimates for the Speech Disorders Classification System (SDCS)
ERIC Educational Resources Information Center
Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.
2010-01-01
A companion paper describes three extensions to a classification system for paediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). The SDCS uses perceptual and acoustic data reduction methods to obtain information on a speaker's speech, prosody, and voice. The present paper provides reliability estimates for…
Koenig, Melissa A; Echols, Catharine H
2003-04-01
The four studies reported here examine whether 16-month-old infants' responses to true and false utterances interact with their knowledge of human agents. In Study 1, infants heard repeated instances either of true or false labeling of common objects; labels came from an active human speaker seated next to the infant. In Study 2, infants experienced the same stimuli and procedure; however, we replaced the human speaker of Study 1 with an audio speaker in the same location. In Study 3, labels came from a hidden audio speaker. In Study 4, a human speaker labeled the objects while facing away from them. In Study 1, infants looked significantly longer to the human agent when she falsely labeled than when she truthfully labeled the objects. Infants did not show a similar pattern of attention for the audio speaker of Study 2, the silent human of Study 3 or the facing-backward speaker of Study 4. In fact, infants who experienced truthful labeling looked significantly longer to the facing-backward labeler of Study 4 than to true labelers of the other three contexts. Additionally, infants were more likely to correct false labels when produced by the human labeler of Study 1 than in any of the other contexts. These findings suggest, first, that infants are developing a critical conception of other human speakers as truthful communicators, and second, that infants understand that human speakers may provide uniquely useful information when a word fails to match its referent. These findings are consistent with the view that infants can recognize differences in knowledge and that such differences can be based on differences in the availability of perceptual experience.
Wong, Raymond
2013-01-01
Voice is one kind of physiological characteristic that differs for each individual person. Due to this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), emotional state, identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX) for extracting important features in training a classification model, based on piecewise transformation treating an audio waveform as a time series. Using SFX we can faithfully remodel statistical characteristics of the time series; together with spectral analysis, a substantial number of features are extracted in combination. An ensemble is utilized in selecting only the influential features to be used in classification model induction. We focus on the comparison of effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results supporting the fact that heuristically choosing significant features from both time and frequency domains indeed produces better performance in voice classification than traditional signal processing techniques alone, like wavelets and LPC-to-CC. PMID:24288684
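The paper's SFX algorithm is not reproduced here, but its core idea, describing an audio waveform piecewise with summary statistics, can be sketched generically. Everything below (segment count, statistics chosen) is an assumption standing in for the actual method.

```python
# A minimal generic analogue of statistical feature extraction: split the
# waveform into segments and summarize each with simple statistics.
import numpy as np
from scipy.stats import skew, kurtosis

def statistical_features(signal: np.ndarray, n_segments: int = 20) -> np.ndarray:
    """Per-segment mean, std, skewness, and kurtosis, concatenated."""
    feats = []
    for seg in np.array_split(signal, n_segments):
        feats.extend([seg.mean(), seg.std(), skew(seg), kurtosis(seg)])
    return np.asarray(feats)

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)        # stand-in for one second of audio
print(statistical_features(x).shape)  # (80,) -> one input row for a classifier
```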
An acoustic comparison of two women's infant- and adult-directed speech
NASA Astrophysics Data System (ADS)
Andruski, Jean; Katz-Gershon, Shiri
2003-04-01
In addition to having prosodic characteristics that are attractive to infant listeners, infant-directed (ID) speech shares certain characteristics of adult-directed (AD) clear speech, such as increased acoustic distance between vowels, that might be expected to make ID speech easier for adults to perceive in noise than AD conversational speech. However, perceptual tests of two women's ID productions by Andruski and Bessega [J. Acoust. Soc. Am. 112, 2355] showed that this is not always the case. In a word identification task that compared ID speech with AD clear and conversational speech, one speaker's ID productions were less well identified than AD clear speech, but better identified than AD conversational speech. For the second woman, ID speech was the least accurately identified of the three speech registers. For both speakers, hard words (infrequent words with many lexical neighbors) were also at an increased disadvantage relative to easy words (frequent words with few lexical neighbors) in speech registers that were less accurately perceived. This study will compare several acoustic properties of these women's productions, including pitch and formant-frequency characteristics. Results of the acoustic analyses will be examined with the original perceptual results to suggest reasons for differences in listeners' accuracy in identifying these two women's ID speech in noise.
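One common operationalization of "acoustic distance between vowels" is the area of the polygon spanned by corner vowels in F1-F2 space, computed with the shoelace formula. The sketch below uses invented formant values, not the study's measurements.

```python
# A minimal sketch of vowel-space area via the shoelace formula; the corner
# vowel formants below are illustrative, not data from this study.
import numpy as np

def vowel_space_area(formants: np.ndarray) -> float:
    """Shoelace area of an (n, 2) array of (F1, F2) corner-vowel points, in Hz^2."""
    x, y = formants[:, 0], formants[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

corners_ad = np.array([[300, 2300], [700, 1700], [300, 900]])  # /i/, /a/, /u/ (AD)
corners_id = np.array([[280, 2500], [750, 1750], [290, 850]])  # expanded (ID)
print(vowel_space_area(corners_id) > vowel_space_area(corners_ad))  # True
```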
Discourse intonation and second language acquisition: Three genre-based studies
NASA Astrophysics Data System (ADS)
Wennerstrom, Ann Kristin
1997-12-01
This dissertation investigates intonation in the discourse of nonnative speakers of English. It is proposed that intonation functions as a grammar of cohesion, contributing to the coherence of the text. Based on a componential model of intonation adapted from Pierrehumbert and Hirschberg (1990), three empirical studies were conducted in different genres of spoken discourse: academic lectures, conversations, and oral narratives. Using computerized speech technology, excerpts of taped discourse were measured to determine how intonation associated with various constituents of text. All speakers were tested for overall English level on tests adapted from the SPEAK Test (ETS, 1985). Comparisons using native speaker data were also conducted. The first study investigated intonation in lectures given by Chinese teaching assistants. Multivariate analyses showed that intonation was a significant factor contributing to better scores on an exam of overall comprehensibility in English. The second study investigated the role of intonation in the turn-taking system in conversations between native and nonnative speakers of English. The final study considered emotional aspects of intonation in narratives, using the framework of Labov and Waletzky (1967). In sum, adult nonnative speakers can acquire intonation as part of their overall language development, although there is evidence against any specific order of acquisition. Intonation contributes to coherence by indicating the relationship between the current utterance and what is assumed to already be in participants' mental representations of the discourse. It also performs a segmentation function, denoting hierarchical relationships among utterances and/or turns. It is suggested that while pitch can be a resource in cross-cultural communication to show emotion and attitude, the grammatical aspects of intonation must be acquired gradually.
NASA Astrophysics Data System (ADS)
2014-03-01
The second International Conference on Mathematical Modeling in Physical Sciences (IC-MSQUARE) took place in Prague, Czech Republic, from Sunday 1 September to Thursday 5 September 2013. The Conference was attended by more than 280 participants, hosted about 400 oral, poster, and virtual presentations, and counted more than 600 pre-registered authors. The second IC-MSQUARE consisted of diverse workshops and thus covered various research fields where mathematical modeling is used, such as Theoretical/Mathematical Physics, Neutrino Physics, Non-Integrable Systems, Dynamical Systems, Computational Nanoscience, Biological Physics, Computational Biomechanics, Complex Networks, Stochastic Modeling, Fractional Statistics, DNA Dynamics, and Macroeconomics. The scientific program was dense: after the keynote and invited talks each morning, three parallel sessions ran every day. According to attendees, the program was excellent, the level of the talks was high, and the scientific environment was fruitful, so all attendees had a creative time. We would like to thank the Keynote Speaker and the Invited Speakers for their significant contribution to IC-MSQUARE. We also would like to thank the Members of the International Advisory and Scientific Committees as well as the Members of the Organizing Committee. Further information on the editors, speakers and committees is available in the attached pdf.
Evaluating language environment analysis system performance for Chinese: a pilot study in Shanghai.
Gilkerson, Jill; Zhang, Yiwen; Xu, Dongxin; Richards, Jeffrey A; Xu, Xiaojuan; Jiang, Fan; Harnsberger, James; Topping, Keith
2015-04-01
The purpose of this study was to evaluate the performance of the Language Environment Analysis (LENA) automated language-analysis system for the Chinese Shanghai dialect and Mandarin (SDM) languages. Volunteer parents of 22 children aged 3-23 months were recruited in Shanghai. Families provided daylong in-home audio recordings using LENA. A native speaker listened to 15 min of randomly selected audio samples per family to label speaker regions and provide Chinese character and SDM word counts for adult speakers. LENA segment labeling and counts were compared with rater-based values. LENA demonstrated good sensitivity in identifying adult and child segments; this sensitivity was comparable to that of American English validation samples. Precision was strong for adults but less so for children. LENA adult word count correlated strongly with both Chinese character and SDM word counts. LENA conversational turn counts correlated similarly with rater-based counts after the exclusion of three unusual samples. Performance was related to some degree to child age. LENA adult word count and conversational turn counts provided reasonably accurate estimates for SDM over the age range tested. Theoretical and practical considerations regarding LENA performance in non-English languages are discussed. Despite the pilot nature and other limitations of the study, the results are promising for broader cross-linguistic applications.
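The validation arithmetic described above reduces to correlating automated counts with human rater counts. A minimal sketch with invented numbers:

```python
# A minimal sketch of the correlation step; the counts below are invented,
# not the study's data.
from scipy.stats import pearsonr

lena_awc  = [310, 542, 128, 803, 455, 667, 240, 915]  # LENA adult word counts
rater_awc = [298, 575, 140, 770, 430, 702, 255, 940]  # rater-based counts

r, p = pearsonr(lena_awc, rater_awc)
print(f"r = {r:.2f}, p = {p:.3g}")
```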
EC5 Space Suit Assembly Team Internship
NASA Technical Reports Server (NTRS)
Maicke, Andrew
2016-01-01
There were three main projects in this internship. The first pertained to the Bearing Dust Cycle Test, in particular automating the test to allow for easier administration. The second concerned modifying the communication system setup in the Z2 suit, where speakers and mics were adjusted to allow for more space in the helmet. Finally, the last project concerned tensile strength testing of fabrics considered candidates for space suit materials, which were to be sent off for radiation testing. The major duties were split between the projects detailed above. For the Bearing Dust Cycle Test, the first objective was to find a way to automate administration of the test, as the previous version was long and tedious to perform. In order to do this, it was necessary to introduce additional electronics and perform programming to control the automation. Once this was done, it would be necessary to update documents concerning the test setup, procedure, and potential hazards. Finally, I was tasked with running tests using the new system to confirm system performance. For the Z2 communication system modifications, it was necessary to investigate alternative speakers and microphones that might perform better than those currently used in the suit. Further, new speaker and microphone positions needed to be identified to keep them out of the way of the suit user. Once this was done, appropriate hardware (such as speaker or microphone cases and holders) could be prototyped and fabricated. For the suit material strength testing, the first task was to gather and document various test fabrics to identify the best suit material candidates. Then, samples needed to be prepared for testing to establish baseline measurements and specify a testing procedure. Once the data was fully collected, additional test samples would be prepared and sent off-site to undergo irradiation before being tested again to observe changes in strength performance. For the Bearing Dust Cycle Test, automation was achieved through use of a servo motor and code written in LabVIEW. With this, a small electrical servo controller was constructed and added to the system. For the Z2 communication modifications, speaker cases were developed and printed, and new speakers and mics were selected. This allowed us to move the speakers and mics to locations that remain out of the suit user's way. For the suit material strength testing, five material candidates were identified and test samples were created. These samples underwent testing, and baseline test results were gathered, though these results are currently being investigated for accuracy. The main process efficiency developed during the course of this internship comes from automation of the Bearing Dust Cycle Test. In particular, many hours of human involvement and precise operation are replaced with a simple motor setup. Thus it is no longer required to man the test, saving valuable employee time. This internship has confirmed a few things for me, namely that I want to work as an engineer for an aerospace firm and, in particular, that I want to work for the Johnson Space Center. I am also confirmed in my desire to work with electronics, though I was surprised to enjoy prototyping 3D CAD design as much as I did. Therefore, I will make an effort to build my skills in this area so that I can continue to design mechanical models. In fact, I found the process of hands-on prototyping to be perhaps the most fun aspect of my time working here.
This internship has also furthered my excitement for continual education, and I will hopefully be pursuing a masters in my field in the near future.
Northern Command Speakers Program: The U.S. Northern Command Speaker's Program works to increase face-to-face contact with our public to help build and sustain public understanding of our command missions and
Speakers of Different Languages Process the Visual World Differently
Chabal, Sarah; Marian, Viorica
2015-01-01
Language and vision are highly interactive. Here we show that people activate language when they perceive the visual world, and that this language information impacts how speakers of different languages focus their attention. For example, when searching for an item (e.g., clock) in the same visual display, English and Spanish speakers look at different objects. Whereas English speakers searching for the clock also look at a cloud, Spanish speakers searching for the clock also look at a gift, because the Spanish names for gift (regalo) and clock (reloj) overlap phonologically. These different looking patterns emerge despite an absence of direct linguistic input, showing that language is automatically activated by visual scene processing. We conclude that the varying linguistic information available to speakers of different languages affects visual perception, leading to differences in how the visual world is processed. PMID:26030171
Learning foreign labels from a foreign speaker: the role of (limited) exposure to a second language.
Akhtar, Nameera; Menjivar, Jennifer; Hoicka, Elena; Sabbagh, Mark A
2012-11-01
Three- and four-year-olds (N = 144) were introduced to novel labels by an English speaker and a foreign speaker (of Nordish, a made-up language), and were asked to endorse one of the speaker's labels. Monolingual English-speaking children were compared to bilingual children and English-speaking children who were regularly exposed to a language other than English. All children tended to endorse the English speaker's labels when asked 'What do you call this?', but when asked 'What do you call this in Nordish?', children with exposure to a second language were more likely to endorse the foreign label than monolingual and bilingual children. The findings suggest that, at this age, exposure to, but not necessarily immersion in, more than one language may promote the ability to learn foreign words from a foreign speaker.
Content-specific coordination of listeners' to speakers' EEG during communication.
Kuhlen, Anna K; Allefeld, Carsten; Haynes, John-Dylan
2012-01-01
Cognitive neuroscience has recently begun to extend its focus from the isolated individual mind to two or more individuals coordinating with each other. In this study we uncover a coordination of neural activity between the ongoing electroencephalogram (EEG) of two people: a person speaking and a person listening. The EEG of one set of twelve participants ("speakers") was recorded while they were narrating short stories. The EEG of another set of twelve participants ("listeners") was recorded while watching audiovisual recordings of these stories. Specifically, listeners watched the superimposed videos of two speakers simultaneously and were instructed to attend either to one or the other speaker. This allowed us to isolate neural coordination due to processing the communicated content from the effects of sensory input. We find several neural signatures of communication: First, the EEG is more similar among listeners attending to the same speaker than among listeners attending to different speakers, indicating that listeners' EEG reflects content-specific information. Secondly, listeners' EEG activity correlates with the attended speakers' EEG, peaking at a time delay of about 12.5 s. This correlation takes place not only between homologous, but also between non-homologous brain areas in speakers and listeners. A semantic analysis of the stories suggests that listeners coordinate with speakers at the level of complex semantic representations, so-called "situation models". With this study we link a coordination of neural activity between individuals directly to verbally communicated information.
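The delayed speaker-listener correlation reported above (peaking near 12.5 s) can be sketched as a lagged correlation scan. The toy signals below are invented and the analysis is deliberately simplified; it is not the authors' pipeline.

```python
# A minimal sketch of a lagged correlation analysis: correlate a "listener"
# channel with a "speaker" channel over a range of delays and report the peak.
import numpy as np

fs = 10.0  # Hz, an assumed (heavily downsampled) EEG rate
rng = np.random.default_rng(1)
speaker = rng.standard_normal(3000)
delay = int(12.5 * fs)
listener = np.roll(speaker, delay) + 0.8 * rng.standard_normal(3000)

lags = np.arange(0, int(20 * fs))
corrs = [np.corrcoef(speaker[: -lag or None], listener[lag:])[0, 1] for lag in lags]
print(f"peak correlation at a delay of {lags[int(np.argmax(corrs))] / fs:.1f} s")
```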
Choi, Yaelin
2017-01-01
Purpose The present study aimed to compare acoustic models of speech intelligibility in individuals with the same disease (Parkinson's disease [PD]) and presumably similar underlying neuropathologies but with different native languages (American English [AE] and Korean). Method A total of 48 speakers from the 4 speaker groups (AE speakers with PD, Korean speakers with PD, healthy English speakers, and healthy Korean speakers) were asked to read a paragraph in their native languages. Four acoustic variables were analyzed: acoustic vowel space, voice onset time contrast scores, normalized pairwise variability index, and articulation rate. Speech intelligibility scores were obtained from scaled estimates of sentences extracted from the paragraph. Results The findings indicated that the multiple regression models of speech intelligibility were different in Korean and AE, even with the same set of predictor variables and with speakers matched on speech intelligibility across languages. Analysis of the descriptive data for the acoustic variables showed the expected compression of the vowel space in speakers with PD in both languages, lower normalized pairwise variability index scores in Korean compared with AE, and no differences within or across language in articulation rate. Conclusions The results indicate that the basis of an intelligibility deficit in dysarthria is likely to depend on the native language of the speaker and listener. Additional research is required to explore other potential predictor variables, as well as additional language comparisons to pursue cross-linguistic considerations in classification and diagnosis of dysarthria types. PMID:28821018
Speaker Segmentation and Clustering Using Gender Information
2006-02-01
AFRL-HE-WP-TP-2006-0026, Air Force Research Laboratory. Brian M. Ore, General Dynamics. Proceedings, February 2006. Gender information is used in the first stages of segmentation and in the clustering of opposite-gender files for speaker diarization of news broadcasts.
The 2016 NIST Speaker Recognition Evaluation
2017-08-20
Seyed Omid Sadjadi, Timothée Kheyrkhah, Audrey Tong, Craig Greenberg, Douglas Reynolds, Elliot … recent in an ongoing series of speaker recognition evaluations (SRE) to foster research in robust text-independent speaker recognition, as well as … online evaluation platform, a fixed training data condition, more variability in test segment duration (uniformly distributed between 10 s and 60 s
Magnetic Fluids Deliver Better Speaker Sound Quality
NASA Technical Reports Server (NTRS)
2015-01-01
In the 1960s, Glenn Research Center developed a magnetized fluid to draw rocket fuel into spacecraft engines while in space. Sony has incorporated the technology into its line of slim speakers by using the fluid as a liquid stand-in for the speaker's dampers, which prevent the speaker from blowing out while adding stability. The fluid helps to deliver more volume and higher-fidelity sound while reducing distortion.
Special Observance Planning Guide
2015-11-01
Finding the right speaker for an event can be a challenge. Many speakers are recommended based on word-of-mouth or through a group connected to...An unprepared, rambling speaker or one who intentionally or unintentionally attacks a group or its members can be extremely damaging to a program...Don’t assume that an organizational senior leader is an adequate speaker based on position, rank, and/or affiliation with a reference group
ERIC Educational Resources Information Center
Bressmann, Tim; Flowers, Heather; Wong, Willy; Irish, Jonathan C.
2010-01-01
The goal of this study was to quantitatively describe aspects of coronal tongue movement in different anatomical regions of the tongue. Four normal speakers and a speaker with partial glossectomy read four repetitions of a metronome-paced poem. Their tongue movement was recorded in four coronal planes using two-dimensional B-mode ultrasound…
ERIC Educational Resources Information Center
McKain, Danielle R.
2012-01-01
The term real world is often used in mathematics education, yet the definition of real-world problems and how to incorporate them in the classroom remains ambiguous. One way real-world connections can be made is through guest speakers. Guest speakers can offer different perspectives and share knowledge about various subject areas, yet the impact…
When pitch Accents Encode Speaker Commitment: Evidence from French Intonation.
Michelas, Amandine; Portes, Cristel; Champagne-Lavau, Maud
2016-06-01
Recent studies on a variety of languages have shown that a speaker's commitment to the propositional content of his or her utterance can be encoded, among other strategies, by pitch accent types. Since prior research mainly relied on lexical-stress languages, our understanding of how speakers of a non-lexical-stress language encode speaker commitment is limited. This paper explores the contribution of the last pitch accent of an intonation phrase to conveying speaker commitment in French, a language that has stress at the phrasal level as well as a restricted set of pitch accents. In a production experiment, participants had to produce sentences in two pragmatic contexts: unbiased questions (the speaker had no particular belief with respect to the expected answer) and negatively biased questions (the speaker believed the proposition to be false). Results revealed that negatively biased questions consistently exhibited an additional unaccented F0 peak in the preaccentual syllable (an H+!H* pitch accent), while unbiased questions were often realized with a rising pattern across the accented syllable (an H* pitch accent). These results provide evidence that pitch accent types in French can signal the speaker's belief about the certainty of the proposition expressed. The finding also has implications for the phonological model of French intonation.
Sociological effects on vocal aging: Age related F0 effects in two languages
NASA Astrophysics Data System (ADS)
Nagao, Kyoko
2005-04-01
Listeners can estimate the age of a speaker fairly accurately from their speech (Ptacek and Sander, 1966). It is generally considered that this perception is based on physiologically determined aspects of the speech. However, the degree to which it is due to conventional sociolinguistic aspects of speech is unknown. The current study examines the degree to which fundamental frequency (F0) changes with advancing age across two language groups of speakers. It also examines the degree to which speakers associate these changes with aging in a voice disguising task. Thirty native speakers each of English and Japanese, taken from three age groups, read a target phrase embedded in a carrier sentence in their native language. Each speaker also read the sentence pretending to be 20 years younger or 20 years older than their actual age. Preliminary analysis of eighteen Japanese speakers indicates that the mean and maximum F0 values were higher when the speakers pretended to be younger than when they pretended to be older. Some previous studies on age perception, however, suggested that F0 has minor effects on listeners' age estimation. The acoustic results will also be discussed in conjunction with the results of the listeners' age estimation of the speakers.
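The F0 statistics at issue (mean and maximum per reading) can be sketched with librosa's pYIN tracker. The file name and pitch bounds are illustrative assumptions, not the study's tooling.

```python
# A minimal sketch of per-utterance F0 statistics, assuming a hypothetical
# recording; uses librosa's pYIN pitch tracker.
import numpy as np
import librosa

y, sr = librosa.load("target_phrase.wav", sr=None)
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
)
voiced_f0 = f0[voiced_flag]  # keep voiced frames only
print(f"mean F0 = {np.mean(voiced_f0):.1f} Hz, max F0 = {np.max(voiced_f0):.1f} Hz")
```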
Brener, Loren; Wilson, Hannah; Rose, Grenville; Mackenzie, Althea; de Wit, John
2013-01-01
Positive Speakers programs consist of people who are trained to speak publicly about their illness. The focus of these programs, especially with stigmatised illnesses such as hepatitis C (HCV), is to inform others of the speakers' experiences, thereby humanising the illness and reducing ignorance associated with the disease. This qualitative research aimed to understand the perceived impact of Positive Speakers programs on changing audience members' attitudes towards people with HCV. Interviews were conducted with nine Positive Speakers and 16 of their audience members to assess the way in which these sessions were perceived by both speakers and the audience to challenge stereotypes and stigma associated with HCV and promote positive attitude change amongst the audience. Data were analysed using Intergroup Contact Theory to frame the analysis with a focus on whether the program met the optimal conditions to promote attitude change. Findings suggest that there are a number of vital components to this Positive Speakers program which ensures that the program meets the requirements for successful and equitable intergroup contact. This Positive Speakers program thereby helps to deconstruct stereotypes about people with HCV, while simultaneously increasing positive attitudes among audience members with the ultimate aim of improving quality of health care and treatment for people with HCV.
Maass, Anne; Paladino, Maria Paola; Vespignani, Francesco; Eyssel, Friederike; Bentler, Dominik
2015-01-01
Empirical research had initially shown that English listeners are able to identify the speakers' sexual orientation based on voice cues alone. However, the accuracy of this voice-based categorization, as well as its generalizability to other languages (language-dependency) and to non-native speakers (language-specificity), has been questioned recently. Consequently, we address these open issues in 5 experiments: First, we tested whether Italian and German listeners are able to correctly identify sexual orientation of same-language male speakers. Then, participants of both nationalities listened to voice samples and rated the sexual orientation of both Italian and German male speakers. We found that listeners were unable to identify the speakers' sexual orientation correctly. However, speakers were consistently categorized as either heterosexual or gay on the basis of how they sounded. Moreover, a similar pattern of results emerged when listeners judged the sexual orientation of speakers of their own and of the foreign language. Overall, this research suggests that voice-based categorization of sexual orientation reflects the listeners' expectations of how gay voices sound rather than being an accurate detector of the speakers' actual sexual identity. Results are discussed with regard to accuracy, acoustic features of voices, language dependency and language specificity. PMID:26132820
NASA Technical Reports Server (NTRS)
Costanza, Bryan T.; Horne, William C.; Schery, S. D.; Babb, Alex T.
2011-01-01
The Aero-Physics Branch at NASA Ames Research Center utilizes a 32- by 48-inch subsonic wind tunnel for aerodynamics research. The feasibility of acquiring acoustic measurements with a phased microphone array was recently explored. Acoustic characterization of the wind tunnel was carried out with a floor-mounted 24-element array and two ceiling-mounted speakers. The minimum speaker level for accurate level measurement was evaluated for various tunnel speeds up to a Mach number of 0.15 and streamwise speaker locations. A variety of post-processing procedures, including conventional beamforming and deconvolutional processing such as TIDY, were used. The speaker measurements, with and without flow, were used to compare actual versus simulated in-flow speaker calibrations. Data for wind-off speaker sound and wind-on tunnel background noise were found valuable for predicting sound levels for which the speakers were detectable when the wind was on. Speaker sources were detectable 2 - 10 dB below the peak background noise level with conventional data processing. The effectiveness of background noise cross-spectral matrix subtraction was assessed and found to improve the detectability of test sound sources by approximately 10 dB over a wide frequency range.
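The background-subtraction step assessed above has a simple linear-algebra core: estimate a cross-spectral matrix (CSM) with and without the test source and subtract. The sketch below uses toy data and is not the facility's processing code.

```python
# A minimal sketch of background cross-spectral matrix subtraction, with toy
# 24-microphone data; the subtracted matrix is what beamforming would consume.
import numpy as np
from scipy.signal import csd

def cross_spectral_matrix(frames: np.ndarray, fs: float, nperseg: int = 256):
    """CSM per frequency bin for an (n_mics, n_samples) array."""
    n = frames.shape[0]
    f, _ = csd(frames[0], frames[0], fs=fs, nperseg=nperseg)
    G = np.zeros((len(f), n, n), dtype=complex)
    for i in range(n):
        for j in range(n):
            _, G[:, i, j] = csd(frames[i], frames[j], fs=fs, nperseg=nperseg)
    return f, G

rng = np.random.default_rng(0)
background = rng.standard_normal((24, 8192))                      # wind-on, source off
with_source = background + 0.1 * rng.standard_normal((24, 8192))  # wind-on, source on
f, G_total = cross_spectral_matrix(with_source, fs=51200.0)
_, G_bg = cross_spectral_matrix(background, fs=51200.0)
G_source = G_total - G_bg  # beamform on this background-subtracted matrix
```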
Engaging spaces: Intimate electro-acoustic display in alternative performance venues
NASA Astrophysics Data System (ADS)
Bahn, Curtis; Moore, Stephan
2004-05-01
In past presentations to the ASA, we have described the design and construction of four generations of unique spherical speakers (multichannel, outward-radiating geodesic speaker arrays) and Sensor-Speaker-Arrays (SenSAs: combinations of various sensor devices with outward-radiating multichannel speaker arrays). This presentation will detail the ways in which arrays of these speakers have been employed in alternative performance venues, providing presence and intimacy in the performance of electro-acoustic chamber music and sound installation while engaging the natural and unique acoustical qualities of various locations. We will present documentation of the use of multichannel sonic diffusion arrays in small clubs, ``black-box'' theaters, planetariums, and art galleries.
A Limited-Vocabulary, Multi-Speaker Automatic Isolated Word Recognition System.
ERIC Educational Resources Information Center
Paul, James E., Jr.
Techniques for automatic recognition of isolated words are investigated, and a computer simulation of a word recognition system is effected. Considered in detail are data acquisition and digitizing, word detection, amplitude and time normalization, short-time spectral estimation including spectral windowing, spectral envelope approximation,…
Educational Systems (ES) in Mathematics and other related topics
NASA Astrophysics Data System (ADS)
Artemiadis, Nicolaos K.
2009-05-01
This work is specifically concerned with various Educational Systems that are used for the improvement of education in mathematics. It is based on the speaker's viewpoint, as well as those of others, arising from discussions of this issue at various levels and on different occasions.
Detection thresholds for gaps, overlaps, and no-gap-no-overlaps.
Heldner, Mattias
2011-07-01
Detection thresholds for gaps and overlaps, that is, for acoustic and perceived silences and stretches of overlapping speech at speaker changes, were determined. Subliminal gaps and overlaps were categorized as no-gap-no-overlaps. The established gap and overlap detection thresholds both corresponded to the duration of a long vowel, or about 120 ms. These detection thresholds are valuable for mapping the perceptual speaker-change categories gaps, overlaps, and no-gap-no-overlaps into the acoustic domain. Furthermore, the detection thresholds allow the generation and understanding of gaps, overlaps, and no-gap-no-overlaps in human-like spoken dialogue systems.
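Applying the reported ~120 ms thresholds to label speaker-change intervals is a one-line decision rule; the sketch below is a direct reading of the abstract, with the sign convention (negative offset = overlap) as an assumption.

```python
# A minimal sketch of the threshold rule; offsets below the ~120 ms detection
# threshold on either side are treated as no-gap-no-overlap.
THRESHOLD_S = 0.120  # detection threshold from the study, roughly a long vowel

def classify_speaker_change(offset_s: float) -> str:
    """Classify a floor-transfer offset in seconds (negative = overlapping speech)."""
    if offset_s >= THRESHOLD_S:
        return "gap"
    if offset_s <= -THRESHOLD_S:
        return "overlap"
    return "no-gap-no-overlap"

for offset in (0.45, -0.30, 0.05, -0.08):
    print(f"{offset:+.2f} s -> {classify_speaker_change(offset)}")
```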
Speaker emotion recognition: from classical classifiers to deep neural networks
NASA Astrophysics Data System (ADS)
Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri
2018-04-01
Speaker emotion recognition is considered among the most challenging tasks of recent years. Automatic systems for security, medicine, or education can be improved by taking the speech affective state into account. In this paper, a twofold approach to speech emotion classification is proposed: first, a relevant set of features is adopted; second, numerous supervised training techniques, involving classic methods as well as deep learning, are evaluated. Experimental results indicate that deep architectures can improve classification performance on two affective databases, the Berlin Database of Emotional Speech and the SAVEE (Surrey Audio-Visual Expressed Emotion) dataset.
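The paper's two-sided setup, one fixed feature vector per utterance evaluated under both classic and neural classifiers, can be sketched as below. Random features stand in for a real corpus such as Berlin EMO-DB; the paper's actual feature set and architectures are not reproduced.

```python
# A minimal sketch: the same features scored with a classic classifier and a
# small neural network. The 13-dim vectors mimic MFCC means; data are random.
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 13))  # placeholder per-utterance feature vectors
y = rng.integers(0, 4, size=200)    # placeholder labels for 4 emotion classes

for clf in (SVC(), MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=1000)):
    scores = cross_val_score(clf, X, y, cv=5)
    print(type(clf).__name__, f"accuracy = {scores.mean():.2f}")
```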
Intonation and gender perception: applications for transgender speakers.
Hancock, Adrienne; Colton, Lindsey; Douglas, Fiacre
2014-03-01
Intonation is commonly addressed in voice and communication feminization therapy, yet empirical evidence of gender differences in intonation is scarce, and studies rarely examine how it relates to gender perception of transgender speakers. This study examined the intonation of 12 male, 12 female, 6 female-to-male, and 14 male-to-female (MTF) transgender speakers describing a Norman Rockwell image. Several intonation measures were compared between biological gender groups, between perceived gender groups, and between MTF speakers who were perceived as male, female, or ambiguous in gender. Speakers with a larger percentage of utterances with upward intonation and a larger utterance semitone range were perceived as female by listeners, despite no significant differences between the actual intonation of the four gender groups. MTF speakers who do not pass as female appear to use less upward and more downward intonation than female and passing MTF speakers. Intonation has potential for use in transgender communication therapy because it can influence perception to some degree.
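The semitone range measure has a closed form, 12 * log2(F0_max / F0_min), and the "percentage of utterances with upward intonation" is a simple proportion. The sketch below uses invented contours and a crude rising-vs-falling proxy, not the study's measurement protocol.

```python
# A minimal sketch of two intonation measures; F0 contours are invented, and
# "upward intonation" is crudely proxied as final voiced F0 above initial F0.
import numpy as np

def semitone_range(f0: np.ndarray) -> float:
    f0 = f0[f0 > 0]  # drop unvoiced (zero) frames
    return 12.0 * np.log2(f0.max() / f0.min())

utterances = [np.array([180, 190, 205, 230.0]), np.array([220, 210, 0, 195.0])]
rising = [u[u > 0][-1] > u[u > 0][0] for u in utterances]
print([round(semitone_range(u), 1) for u in utterances])   # [4.2, 2.1]
print(f"upward intonation: {100 * np.mean(rising):.0f}% of utterances")
```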
What a speaker's choice of frame reveals: reference points, frame selection, and framing effects.
McKenzie, Craig R M; Nelson, Jonathan D
2003-09-01
Framing effects are well established: Listeners' preferences depend on how outcomes are described to them, or framed. Less well understood is what determines how speakers choose frames. Two experiments revealed that reference points systematically influenced speakers' choices between logically equivalent frames. For example, speakers tended to describe a 4-ounce cup filled to the 2-ounce line as half full if it was previously empty but described it as half empty if it was previously full. Similar results were found when speakers could describe the outcome of a medical treatment in terms of either mortality or survival (e.g., 25% die vs. 75% survive). Two additional experiments showed that listeners made accurate inferences about speakers' reference points on the basis of the selected frame (e.g., if a speaker described a cup as half empty, listeners inferred that the cup used to be full). Taken together, the data suggest that frames reliably convey implicit information in addition to their explicit content, which helps explain why framing effects are so robust.
Kamimura, Akiko; Ashby, Jeanie; Tabler, Jennifer; Nourian, Maziar M; Trinh, Ha Ngoc; Chen, Jason; Reel, Justine J
2017-01-01
The abuse of substances is a significant public health issue. Perceived stress and depression have been found to be related to the abuse of substances. The purpose of this study is to examine the prevalence of substance use (i.e., alcohol problems, smoking, and drug use) and the association between substance use, perceived stress, and depression among free clinic patients. Patients completed a self-administered survey in 2015 (N = 504). The overall prevalence of substance use among free clinic patients was not high compared to the U.S. general population. U.S.-born English speakers reported a higher prevalence rate of tobacco smoking and drug use than did non-U.S.-born English speakers and Spanish speakers. Alcohol problems and smoking were significantly related to higher levels of perceived stress and depression. Substance use prevention and education should be included in general health education programs. U.S.-born English speakers would need additional attention. Mental health intervention would be essential to prevention and intervention.
Modeling methods of MEMS micro-speaker with electrostatic working principle
NASA Astrophysics Data System (ADS)
Tumpold, D.; Kaltenbacher, M.; Glacer, C.; Nawaz, M.; Dehé, A.
2013-05-01
The market for mobile devices like tablets, laptops, or mobile phones is increasing rapidly. Device housings are getting thinner, and energy efficiency is more and more important. Micro-Electro-Mechanical-System (MEMS) loudspeakers, fabricated in complementary metal oxide semiconductor (CMOS) compatible technology, merge energy-efficient driving technology with economical fabrication processes. In most cases, the fabrication of such devices within the design process is a lengthy and costly task. Therefore, the need for computer modeling tools capable of precisely simulating the multi-field interactions is increasing. The accurate modeling of such MEMS devices results in a system of coupled partial differential equations (PDEs) describing the interaction between the electric, mechanical, and acoustic fields. For an efficient and accurate solution we apply the Finite Element (FE) method, fully taking the nonlinear effects into account: the electrostatic force, a charged moving body (the loaded membrane) in an electric field, geometric nonlinearities, and mechanical contact during snap-in between the loaded membrane and the stator. To efficiently handle the coupling between the mechanical and acoustic fields, we apply Mortar FE techniques, which allow different grid sizes along the coupling interface. Furthermore, we present a recently developed PML (Perfectly Matched Layer) technique, which allows limiting the acoustic computational domain even in the near field without spurious reflections. For computations towards the acoustic far field we use a Kirchhoff-Helmholtz integral (e.g., to compute the directivity pattern). We will present simulations of a MEMS speaker system based on a single-sided driving mechanism as well as an outlook on MEMS speakers using double-stator (pull-pull) systems, and discuss their efficiency (SPL) and quality (THD) towards the generated acoustic sound.
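The snap-in behavior that the paper resolves with full finite elements has a classic lumped-parameter analogue: a spring-suspended plate over a fixed electrode becomes unstable once deflection exceeds one third of the gap, at the pull-in voltage sqrt(8 k g0^3 / (27 eps0 A)). The sketch below uses invented parameter values, not the paper's MEMS design.

```python
# A minimal 1-DOF sketch of electrostatic pull-in; k, A, g0 are invented.
import numpy as np

EPS0 = 8.854e-12             # vacuum permittivity [F/m]
k, A, g0 = 50.0, 1e-6, 2e-6  # spring [N/m], electrode area [m^2], gap [m]

v_pull_in = np.sqrt(8.0 * k * g0**3 / (27.0 * EPS0 * A))

def equilibrium_gap(v: float) -> float:
    """Stable gap solving k*(g0 - g) = eps0*A*v^2 / (2 g^2); nan => snapped in."""
    roots = np.roots([k, -k * g0, 0.0, EPS0 * A * v**2 / 2.0])
    real = roots[np.abs(roots.imag) < 1e-12].real
    stable = real[real > 2.0 * g0 / 3.0]  # the stable branch keeps g > 2*g0/3
    return stable.max() if stable.size else float("nan")

print(f"pull-in voltage ~ {v_pull_in:.1f} V")
print(f"gap at 3 V: {equilibrium_gap(3.0) * 1e6:.2f} um")
```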
Beyond the language given: the neural correlates of inferring speaker meaning.
Bašnáková, Jana; Weber, Kirsten; Petersson, Karl Magnus; van Berkum, Jos; Hagoort, Peter
2014-10-01
Even though language allows us to say exactly what we mean, we often use language to say things indirectly, in a way that depends on the specific communicative context. For example, we can use an apparently straightforward sentence like "It is hard to give a good presentation" to convey deeper meanings, like "Your talk was a mess!" One of the big puzzles in language science is how listeners work out what speakers really mean, which is a skill absolutely central to communication. However, most neuroimaging studies of language comprehension have focused on the arguably much simpler, context-independent process of understanding direct utterances. To examine the neural systems involved in getting at contextually constrained indirect meaning, we used functional magnetic resonance imaging as people listened to indirect replies in spoken dialog. Relative to direct control utterances, indirect replies engaged dorsomedial prefrontal cortex, right temporo-parietal junction and insula, as well as bilateral inferior frontal gyrus and right medial temporal gyrus. This suggests that listeners take the speaker's perspective on both cognitive (theory of mind) and affective (empathy-like) levels. In line with classic pragmatic theories, our results also indicate that currently popular "simulationist" accounts of language comprehension fail to explain how listeners understand the speaker's intended message.
Noiray, Aude; Cathiard, Marie-Agnès; Ménard, Lucie; Abry, Christian
2011-01-01
The modeling of anticipatory coarticulation has been the subject of longstanding debates for more than 40 yr. Empirical investigations in the articulatory domain have converged toward two extreme modeling approaches: a maximal anticipation behavior (Look-ahead model) or a fixed pattern (Time-locked model). However, empirical support for any of these models has been hardly conclusive, both within and across languages. The present study tested the temporal organization of vocalic anticipatory coarticulation of the rounding feature from [i] to [u] transitions for adult speakers of American English and Canadian French. Articulatory data were synchronously recorded using an Optotrak for lip protrusion and a dedicated Lip-Shape-Tracking-System for lip constriction. Results show that (i) protrusion is an inconsistent parameter for tracking anticipatory rounding gestures across individuals, more specifically in English; (ii) labial constriction (between-lip area) is a more reliable correlate, allowing for the description of vocalic rounding in both languages; (iii) when tested on the constriction component, speakers show a lawful anticipatory behavior expanding linearly as the intervocalic consonant interval increases from 0 to 5 consonants. The Movement Expansion Model from Abry and Lallouache [(1995a) Bul. de la Comm. Parlée 3, 85–99; (1995b) Proceedings of ICPHS 4, 152–155.] predicted such a regular behavior, i.e., a lawful variability with a speaker-specific expansion rate, which is not language-specific. PMID:21303015
Visual input enhances selective speech envelope tracking in auditory cortex at a "cocktail party".
Zion Golumbic, Elana; Cogan, Gregory B; Schroeder, Charles E; Poeppel, David
2013-01-23
Our ability to selectively attend to one auditory signal amid competing input streams, epitomized by the "Cocktail Party" problem, continues to stimulate research from various approaches. How this demanding perceptual feat is achieved from a neural systems perspective remains unclear and controversial. It is well established that neural responses to attended stimuli are enhanced compared with responses to ignored ones, but responses to ignored stimuli are nonetheless highly significant, leading to interference in performance. We investigated whether congruent visual input of an attended speaker enhances cortical selectivity in auditory cortex, leading to diminished representation of ignored stimuli. We recorded magnetoencephalographic signals from human participants as they attended to segments of natural continuous speech. Using two complementary methods of quantifying the neural response to speech, we found that viewing a speaker's face enhances the capacity of auditory cortex to track the temporal speech envelope of that speaker. This mechanism was most effective in a Cocktail Party setting, promoting preferential tracking of the attended speaker, whereas without visual input no significant attentional modulation was observed. These neurophysiological results underscore the importance of visual input in resolving perceptual ambiguity in a noisy environment. Since visual cues in speech precede the associated auditory signals, they likely serve a predictive role in facilitating auditory processing of speech, perhaps by directing attentional resources to appropriate points in time when to-be-attended acoustic input is expected to arrive.
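The "temporal speech envelope" being tracked has a standard construction: take the magnitude of the analytic signal (Hilbert transform) and low-pass it to the slow modulation rates. The sketch below uses a toy amplitude-modulated signal; it is not the authors' pipeline.

```python
# A minimal sketch of speech-envelope extraction: Hilbert magnitude, then a
# low-pass filter keeping slow (< 10 Hz) modulations. Toy AM signal.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 1000.0
t = np.arange(0, 5, 1 / fs)
speech = np.sin(2 * np.pi * 150 * t) * (1 + np.sin(2 * np.pi * 3 * t))  # toy "speech"

envelope = np.abs(hilbert(speech))            # instantaneous amplitude
b, a = butter(4, 10 / (fs / 2), btype="low")  # 10 Hz cutoff, 4th-order Butterworth
slow_envelope = filtfilt(b, a, envelope)
print(slow_envelope.shape)  # one envelope sample per time point, here (5000,)
```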
The Future of Transportation: Safety, Opportunity, and Innovation
DOT National Transportation Integrated Search
2016-12-30
This report summarizes key findings from the Future of Transportation: Safety, Opportunity, and Innovation thought leadership speaker series held at Volpe, The National Transportation Systems Center, during the summer and fall of 2016.
NASA Technical Reports Server (NTRS)
Meeson, Blanche W.; Gabrys, Robert; Ireton, M. Frank; Einaudi, Franco (Technical Monitor)
2001-01-01
Education projects supported by federal agencies and carried out by a wide range of organizations foster learning about Earth and space systems science in a wide array of venues. Across these agencies, a range of strategies are employed to ensure that effective materials are created for these diverse venues and that these materials are deployed broadly, so that a large spectrum of the American public, adults and children alike, can learn and become excited by Earth and space system science. This session will highlight some of those strategies and will cover representative examples to illustrate their effectiveness. Invited speakers from selected formal and informal educational efforts will anchor this session. Speakers with representative examples are encouraged to submit abstracts to showcase the strategies they use.
ERIC Educational Resources Information Center
Köroglu, Zehra; Tüm, Gülden
2017-01-01
This study was conducted to evaluate TM usage in MA theses written by native speakers (NSs) of English and Turkish speakers (TSs) of English. The purpose is to compare TM usage in the introduction, results and discussion, and conclusion sections of randomly selected MA theses by both groups in the field of ELT between the…
Improving the Effectiveness of Speaker Verification Domain Adaptation With Inadequate In-Domain Data
Borgström, Bengt J.; Singer, Elliot; Douglas…
2017-08-20
This paper addresses speaker verification domain adaptation with… contain speakers with low channel diversity. Existing domain adaptation methods are reviewed, and their shortcomings are discussed. We derive an…
Mortality inequality in two native population groups.
Saarela, Jan; Finnäs, Fjalar
2005-11-01
A sample of people aged 40-67 years, taken from a longitudinal register compiled by Statistics Finland, is used to analyse mortality differences between Swedish speakers and Finnish speakers in Finland. Finnish speakers are known to have higher death rates than Swedish speakers. The purpose is to explore whether labour-market experience and partnership status, treated as proxies for measures of variation in health-related characteristics, are related to the mortality differential. Persons who are single, disability pensioners, and those having experienced unemployment are found to have substantially higher death rates than those with a partner and employed persons. Swedish speakers have a more favourable distribution on both variables, which thus notably helps to reduce the Finnish-Swedish mortality gradient. A conclusion from this study is that future analyses on the topic should focus on mechanisms that bring a greater proportion of Finnish speakers into the groups with poor health or supposed unhealthy behaviour.
In the eye of the beholder: eye contact increases resistance to persuasion.
Chen, Frances S; Minson, Julia A; Schöne, Maren; Heinrichs, Markus
2013-11-01
Popular belief holds that eye contact increases the success of persuasive communication, and prior research suggests that speakers who direct their gaze more toward their listeners are perceived as more persuasive. In contrast, we demonstrate that more eye contact between the listener and speaker during persuasive communication predicts less attitude change in the direction advocated. In Study 1, participants freely watched videos of speakers expressing various views on controversial sociopolitical issues. Greater direct gaze at the speaker's eyes was associated with less attitude change in the direction advocated by the speaker. In Study 2, we instructed participants to look at either the eyes or the mouths of speakers presenting arguments counter to participants' own attitudes. Intentionally maintaining direct eye contact led to less persuasion than did gazing at the mouth. These findings suggest that efforts at increasing eye contact may be counterproductive across a variety of persuasion contexts.
How Psychological Stress Affects Emotional Prosody
Paulmann, Silke; Furnes, Desire; Bøkenes, Anne Ming; Cozzolino, Philip J.
2016-01-01
We explored how experimentally induced psychological stress affects the production and recognition of vocal emotions. In Study 1a, we demonstrate that sentences spoken by stressed speakers are judged by naïve listeners as sounding more stressed than sentences uttered by non-stressed speakers. In Study 1b, negative emotions produced by stressed speakers are generally less well recognized than the same emotions produced by non-stressed speakers. Multiple mediation analyses suggest this poorer recognition of negative stimuli was due to a mismatch between the variation of volume voiced by speakers and the range of volume expected by listeners. Together, this suggests that the stress level of the speaker affects judgments made by the receiver. In Study 2, we demonstrate that participants who were induced with a feeling of stress before carrying out an emotional prosody recognition task performed worse than non-stressed participants. Overall, findings suggest detrimental effects of induced stress on interpersonal sensitivity. PMID:27802287
Don't Underestimate the Benefits of Being Misunderstood.
Gibson, Edward; Tan, Caitlin; Futrell, Richard; Mahowald, Kyle; Konieczny, Lars; Hemforth, Barbara; Fedorenko, Evelina
2017-06-01
Being a nonnative speaker of a language poses challenges. Individuals often feel embarrassed by the errors they make when talking in their second language. However, here we report an advantage of being a nonnative speaker: Native speakers give foreign-accented speakers the benefit of the doubt when interpreting their utterances; as a result, apparently implausible utterances are more likely to be interpreted in a plausible way when delivered in a foreign than in a native accent. Across three replicated experiments, we demonstrated that native English speakers are more likely to interpret implausible utterances, such as "the mother gave the candle the daughter," as similar plausible utterances ("the mother gave the candle to the daughter") when the speaker has a foreign accent. This result follows from the general model of language interpretation in a noisy channel, under the hypothesis that listeners assume a higher error rate in foreign-accented than in nonaccented speech.
Rhythmic patterning in Malaysian and Singapore English.
Tan, Rachel Siew Kuang; Low, Ee-Ling
2014-06-01
Previous work on the rhythm of Malaysian English has been based on impressionistic observations. This paper uses acoustic analysis to measure the rhythmic patterns of Malaysian English. Recordings of the read speech and spontaneous speech of 10 Malaysian English speakers were analyzed and compared with recordings of an equivalent sample of Singaporean English speakers, using two rhythm indexes, the PVI and VarcoV (sketched below). Although the rhythm of the Singaporean speakers' read speech was syllable-based, as described in previous studies, the rhythm of the Malaysian speakers was even more syllable-based: syllable-level analysis of specific utterances showed that Malaysian speakers reduced vowels less than Singaporean speakers did. Results for spontaneous speech confirmed the findings for read speech, with the same rhythmic patterning appearing in contexts that normally trigger vowel reduction.
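For reference, both indexes have standard formulations: the normalized PVI averages the duration difference between successive vocalic intervals scaled by their mean, and VarcoV is the rate-normalized standard deviation of vocalic interval durations. A minimal sketch; the duration values below are invented for illustration:

```python
import numpy as np

def npvi(durations):
    """Normalized Pairwise Variability Index over successive vocalic intervals."""
    d = np.asarray(durations, dtype=float)
    return 100.0 * np.mean(np.abs(d[1:] - d[:-1]) / ((d[1:] + d[:-1]) / 2.0))

def varco_v(durations):
    """VarcoV: standard deviation of vocalic durations normalized by their mean."""
    d = np.asarray(durations, dtype=float)
    return 100.0 * d.std() / d.mean()

# Invented vowel-interval durations (ms): alternating long/short intervals
# (typical of vowel reduction) score higher than near-uniform ones.
reduced = [120, 45, 140, 50, 130, 40]
uniform = [90, 85, 95, 88, 92, 90]
print(npvi(reduced), npvi(uniform))        # higher nPVI for the reducing speaker
print(varco_v(reduced), varco_v(uniform))  # likewise for VarcoV
```

Lower values on both indexes indicate more syllable-based rhythm, which is the direction reported here for the Malaysian speakers.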
Speakers of different languages process the visual world differently.
Chabal, Sarah; Marian, Viorica
2015-06-01
Language and vision are highly interactive. Here we show that people activate language when they perceive the visual world, and that this language information impacts how speakers of different languages focus their attention. For example, when searching for an item (e.g., clock) in the same visual display, English and Spanish speakers look at different objects. Whereas English speakers searching for the clock also look at a cloud, Spanish speakers searching for the clock also look at a gift, because the Spanish names for gift (regalo) and clock (reloj) overlap phonologically. These different looking patterns emerge despite an absence of direct language input, showing that linguistic information is automatically activated by visual scene processing. We conclude that the varying linguistic information available to speakers of different languages affects visual perception, leading to differences in how the visual world is processed. (c) 2015 APA, all rights reserved.
The Origins and Evolution of Molecules in Icy Solids
NASA Technical Reports Server (NTRS)
Hudson, Reggie L.; Moore, Marla H.
2010-01-01
Astronomical observations of the past few decades have revealed the existence of a variety of molecules in extraterrestrial ices. These molecules include H2O, CO, and CO2, and organics such as CH4, CH3OH, and C2H6. Some ices are dominated by polar molecules, while non-polar species appear to dominate others. Observations, mainly in the radio and IR regions, have allowed the inference of other solid-phase molecules whose formation remains difficult to explain by gas-phase chemistry alone. Several laboratory research groups have reported on extensive experiments on the solid-phase reaction chemistry of icy materials, generally as initiated by either ionizing radiation or vacuum-UV photons. These experiments not only permit molecular identifications to be made from astronomical observations, but also allow predictions of yet unidentified molecules. This laboratory approach has evolved over more than 30 years with much of the earliest work focusing on complex mixtures thought to represent either cometary or interstellar ices. Although those early experiments documented a rich solid-state photo- and radiation chemistry, they revealed few details of reactions for particular molecules, partly due to the multi-component nature of the samples. Since then, model systems have been examined that allow the chemistry of individual species and specific reactions to be probed. Reactions involving most of the smaller astronomical molecules have now been studied and specific processes identified. Current laboratory work suggests that a variety of reactions occur in extraterrestrial ices, including acid-base processes, radical dimerizations, proton transfers, oxidations, reductions, and isomerizations. This workshop presentation will focus on chemical reactions relevant to solar system and interstellar ices. While most of the work will be drawn from that to which the speaker has contributed, results from other laboratories also will be included. Suggestions for future studies will be made, with an emphasis on some present deficiencies. The speaker's work has been generously supported by these NASA research programs: Cassini Data Analysis, Exobiology, Mars Fundamental Research, Outer Planets Research, Planetary Atmospheres, Planetary Geology and Geophysics, and the NASA Astrobiology Institute.
Processing ser and estar to locate objects and events: An ERP study with L2 speakers of Spanish.
Dussias, Paola E; Contemori, Carla; Román, Patricia
2014-01-01
In Spanish locative constructions, a different form of the copula is selected in relation to the semantic properties of the grammatical subject: sentences that locate objects require estar while those that locate events require ser (both translated in English as 'to be'). In an ERP study, we examined whether second language (L2) speakers of Spanish are sensitive to the selectional restrictions that the different types of subjects impose on the choice of the two copulas. Twenty-four native speakers of Spanish and two groups of L2 Spanish speakers (24 beginners and 18 advanced speakers) were recruited to investigate the processing of 'object/event + estar/ser' permutations. Participants provided grammaticality judgments on correct (object + estar; event + ser) and incorrect (object + ser; event + estar) sentences while their brain activity was recorded. In line with previous studies (Leone-Fernández, Molinaro, Carreiras, & Barber, 2012; Sera, Gathje, & Pintado, 1999), the results of the grammaticality judgment for the native speakers showed that participants correctly accepted object + estar and event + ser constructions. In addition, while 'object + ser' constructions were considered grossly ungrammatical, 'event + estar' combinations were perceived as unacceptable to a lesser degree. For these same participants, ERP recording time-locked to the onset of the critical word 'en' showed a larger P600 for the ser predicates when the subject was an object than when it was an event (*La silla es en la cocina vs. La fiesta es en la cocina). This P600 effect is consistent with syntactic repair of the defining predicate when it does not fit with the adequate semantic properties of the subject. For estar predicates (La silla está en la cocina vs. *La fiesta está en la cocina), the findings showed a central-frontal negativity between 500-700 ms. Grammaticality judgment data for the L2 speakers of Spanish showed that beginners were significantly less accurate than native speakers in all conditions, while the advanced speakers only differed from the natives in the event + ser and event + estar conditions. For the ERPs, the beginning learners did not show any effects in the time-windows under analysis. The advanced speakers showed a pattern similar to that of native speakers: (1) a P600 response to the 'object + ser' violation, more central and frontally distributed, and (2) a central-frontal negativity between 500-700 ms for the 'event + estar' violation. Findings for the advanced speakers suggest that behavioral methods commonly used to assess grammatical knowledge in the L2 may be underestimating what L2 speakers have actually learned.
STS-41 Voice Command System Flight Experiment Report
NASA Technical Reports Server (NTRS)
Salazar, George A.
1981-01-01
This report presents the results of the Voice Command System (VCS) flight experiment on the five-day STS-41 mission. Two mission specialists, Bill Shepherd and Bruce Melnick, used the speaker-dependent system to evaluate the operational effectiveness of using voice to control a spacecraft system. In addition, data were gathered to analyze the effects of microgravity on speech recognition performance.
Reasoning about knowledge: Children's evaluations of generality and verifiability.
Koenig, Melissa A; Cole, Caitlin A; Meyer, Meredith; Ridge, Katherine E; Kushnir, Tamar; Gelman, Susan A
2015-12-01
In a series of experiments, we examined 3- to 8-year-old children's (N=223) and adults' (N=32) use of two properties of testimony to estimate a speaker's knowledge: generality and verifiability. Participants were presented with a "Generic speaker" who made a series of 4 general claims about "pangolins" (a novel animal kind), and a "Specific speaker" who made a series of 4 specific claims about "this pangolin" as an individual. To investigate the role of verifiability, we systematically varied whether the claim referred to a perceptually-obvious feature visible in a picture (e.g., "has a pointy nose") or a non-evident feature that was not visible (e.g., "sleeps in a hollow tree"). Three main findings emerged: (1) young children showed a pronounced reliance on verifiability that decreased with age. Three-year-old children were especially prone to credit knowledge to speakers who made verifiable claims, whereas 7- to 8-year-olds and adults credited knowledge to generic speakers regardless of whether the claims were verifiable; (2) children's attributions of knowledge to generic speakers were not detectable until age 5, and only when those claims were also verifiable; (3) children often generalized speakers' knowledge outside of the pangolin domain, indicating a belief that a person's knowledge about pangolins likely extends to new facts. Findings indicate that young children may be inclined to doubt speakers who make claims they cannot verify themselves, and show a developmentally increasing appreciation for speakers who make general claims. Copyright © 2015 Elsevier Inc. All rights reserved.
Why We Serve - U.S. Department of Defense Official Website
Formant transitions in the fluent speech of Farsi-speaking people who stutter.
Dehqan, Ali; Yadegari, Fariba; Blomgren, Michael; Scherer, Ronald C
2016-06-01
Second formant (F2) transitions can be used to infer attributes of articulatory transitions. This study compared formant transitions during fluent speech segments of Farsi (Persian)-speaking people who stutter and normally fluent Farsi speakers. Ten Iranian males who stutter and 10 normally fluent Iranian males participated. Sixteen different "CVt" tokens were embedded within the phrase "Begu CVt an". Measures included overall F2 transition frequency extents, durations, and derived overall slopes, initial F2 transition slopes at 30 ms and 60 ms, and speaking rate. (1) Mean overall formant frequency extent was significantly greater in 14 of the 16 CVt tokens for the group of stuttering speakers. (2) Stuttering speakers exhibited significantly longer overall F2 transitions for all 16 tokens compared to the nonstuttering speakers. (3) The overall F2 slopes were similar between the two groups. (4) The stuttering speakers exhibited significantly greater initial F2 transition slopes (positive or negative) for five of the 16 tokens at 30 ms and six of the 16 tokens at 60 ms. (5) The stuttering group produced a slower syllable rate than the non-stuttering group. During perceptually fluent utterances, the stuttering speakers had greater F2 frequency extents during transitions, took longer to reach vowel steady state, exhibited some evidence of steeper slopes at the beginning of transitions, had overall similar F2 formant slopes, and had slower speaking rates compared to nonstuttering speakers. Findings support the notion of different speech motor timing strategies in stuttering speakers. Findings are likely to be independent of the language spoken. Educational objectives This study compares aspects of F2 formant transitions between 10 stuttering and 10 nonstuttering speakers. Readers will be able to describe: (a) characteristics of formant frequency as a specific acoustic feature used to infer speech movements in stuttering and nonstuttering speakers, (b) two methods of measuring second formant (F2) transitions: the visual criteria method and fixed time criteria method, (c) characteristics of F2 transitions in the fluent speech of stuttering speakers and how those characteristics appear to differ from normally fluent speakers, and (d) possible cross-linguistic effects on acoustic analyses of stuttering. Copyright © 2016 Elsevier Inc. All rights reserved.
The International English Language Testing System (IELTS): The Speaking Test.
ERIC Educational Resources Information Center
Ingram, D. E.
1991-01-01
The International English Language Testing System (IELTS) assesses proficiency in English both generally and for special purposes of non-native English speakers studying, training, or learning English in English-speaking countries. The Speaking subtest of the IELTS measures a candidate's general proficiency in speaking in everyday situations via a…
Second Language Learning of Complex Inflectional Systems
ERIC Educational Resources Information Center
Kempe, Vera; Brooks, Patricia J.
2008-01-01
This study explored learning and generalization of parts of the Russian case-marking paradigm, an inflecting-fusional system in which affixes simultaneously mark several grammatical features (case, gender, number, animacy). In Experiment 1, adult English speakers (N = 43) were exposed to nouns with transparent gender marking in the nominative case…
The Role of the Limbic System in Human Communication.
ERIC Educational Resources Information Center
Lamendella, John T.
Linguistics has chosen as its niche the language component of human communication and, naturally enough, the neurolinguist has concentrated on lateralized language systems of the cerebral hemispheres. However, decoding a speaker's total message requires attention to gestures, facial expressions, and prosodic features, as well as other somatic and…
CONASTA Brings Teachers a Kaleidoscope of Science
ERIC Educational Resources Information Center
Teaching Science, 2015
2015-01-01
From star systems to social systems, CONASTA 64 connects teachers to researchers and scientists working on the cutting edge of modern science. We asked two CONASTA 64 keynote speakers, Steven Tingay and Ian Walker, to share their passion for their work and their dedication to giving back to the science community.
Developing a Weighted Measure of Speech Sound Accuracy
ERIC Educational Resources Information Center
Preston, Jonathan L.; Ramsdell, Heather L.; Oller, D. Kimbrough; Edwards, Mary Louise; Tobin, Stephen J.
2011-01-01
Purpose: To develop a system for numerically quantifying a speaker's phonetic accuracy through transcription-based measures. With a focus on normal and disordered speech in children, the authors describe a system for differentially weighting speech sound errors on the basis of various levels of phonetic accuracy using a Weighted Speech Sound…
/l/ Production in English-Arabic Bilingual Speakers.
ERIC Educational Resources Information Center
Khattab, Ghada
2002-01-01
Reports an analysis of /l/ production by English-Arabic bilingual children. Addresses the question of whether the bilingual develops one phonological system or two by calling for a refinement of the notion of system using insights from recent phonetic and sociolinguistic work on variability in speech. English-Arabic bilinguals were studied.…
Referential first mention in narratives by mildly mentally retarded adults.
Kernan, K T; Sabsay, S
1987-01-01
Referential first mentions in narrative reports of a short film by 40 mildly mentally retarded adults and 20 nonretarded adults were compared. The mentally retarded sample included equal numbers of male and female, and black and white speakers. The mentally retarded speakers made significantly fewer first mentions and significantly more errors in the form of the first mentions than did nonretarded speakers. A pattern of better performance by black males than by other mentally retarded speakers was found. It is suggested that task difficulty and incomplete mastery of the use of definite and indefinite forms for encoding old and new information, rather than some global type of egocentrism, accounted for the poorer performance by mentally retarded speakers.
Transfer Effects in Learning a Second Language Grammatical Gender System
ERIC Educational Resources Information Center
Sabourin, Laura; Stowe, Laurie A.; de Haan, Ger J.
2006-01-01
In this article second language (L2) knowledge of Dutch grammatical gender is investigated. Adult speakers of German, English and a Romance language (French, Italian or Spanish) were investigated to explore the role of transfer in learning the Dutch grammatical gender system. In the first language (L1) systems, German is the most similar to Dutch…
Somatotype and Body Composition of Normal and Dysphonic Adult Speakers.
Franco, Débora; Fragoso, Isabel; Andrea, Mário; Teles, Júlia; Martins, Fernando
2017-01-01
Voice quality provides information about the anatomical characteristics of the speaker. The patterns of somatotype and body composition can provide essential knowledge to characterize the individuality of voice quality. The aim of this study was to verify if there were significant differences in somatotype and body composition between normal and dysphonic speakers. Cross-sectional study. Anthropometric measurements were taken of a sample of 72 adult participants (40 normal speakers and 32 dysphonic speakers) according to International Society for the Advancement of Kinanthropometry standards, which allowed the calculation of endomorphism, mesomorphism, ectomorphism components, body density, body mass index, fat mass, percentage fat, and fat-free mass. Perception and acoustic evaluations as well as nasoendoscopy were used to assign speakers into normal or dysphonic groups. There were no significant differences between normal and dysphonic speakers in the mean somatotype attitudinal distance and somatotype dispersion distance (in spite of marginally significant differences [P < 0.10] in somatotype attitudinal distance and somatotype dispersion distance between groups) and in the mean vector of the somatotype components. Furthermore, no significant differences were found between groups concerning the mean of percentage fat, fat mass, fat-free mass, body density, and body mass index after controlling by sex. The findings suggested no significant differences in the somatotype and body composition variables, between normal and dysphonic speakers. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Huang, Laura; Frideger, Marcia; Pearce, Jone L
2013-11-01
We propose and test a new theory explaining glass-ceiling bias against nonnative speakers as driven by perceptions that nonnative speakers have weak political skill. Although nonnative accent is a complex signal, its effects on assessments of the speakers' political skill are something that speakers can actively mitigate; this makes it an important bias to understand. In Study 1, White and Asian nonnative speakers using the same scripted responses as native speakers were found to be significantly less likely to be recommended for a middle-management position, and this bias was fully mediated by assessments of their political skill. The alternative explanations of race, communication skill, and collaborative skill were nonsignificant. In Study 2, entrepreneurial start-up pitches from national high-technology, new-venture funding competitions were shown to experienced executive MBA students. Nonnative speakers were found to have a significantly lower likelihood of receiving new-venture funding, and this was fully mediated by the coders' assessments of their political skill. The entrepreneurs' race, communication skill, and collaborative skill had no effect. We discuss the value of empirically testing various posited reasons for glass-ceiling biases, how the importance and ambiguity of political skill for executive success serve as an ostensibly meritocratic cover for nonnative speaker bias, and other theoretical and practical implications of this work. (c) 2013 APA, all rights reserved.
Improved Audio Reproduction System
NASA Technical Reports Server (NTRS)
Chang, C. S.
1972-01-01
Circuitry utilizing electrical feedback of instantaneous speaker coil velocity compensates for loudspeaker resonance, transient peaks and frequency drop-off so that sounds of widely varying frequencies and amplitudes can be reproduced accurately from high fidelity recordings of any variety.
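To make the idea concrete, the sketch below simulates a driver as a damped resonator and shows how subtracting a sensed coil-velocity signal from the drive adds damping, flattening the resonant peak. All parameter values are illustrative assumptions, not taken from the NASA circuit:

```python
import numpy as np

fs = 48_000.0                 # simulation rate (Hz)
f0, q = 55.0, 3.0             # assumed free-cone resonance and mechanical Q
w0 = 2.0 * np.pi * f0

def cone_velocity(drive, k_fb):
    """Euler-integrate x'' = u_eff - (w0/q)x' - w0^2 x, where the effective
    drive u_eff subtracts the fed-back instantaneous coil velocity."""
    dt, x, v = 1.0 / fs, 0.0, 0.0
    out = np.empty_like(drive)
    for n, u in enumerate(drive):
        u_eff = u - k_fb * v                 # motional (velocity) feedback
        a = u_eff - (w0 / q) * v - w0**2 * x
        v += a * dt
        x += v * dt
        out[n] = v
    return out

# Chirp sweeping 20 -> 200 Hz through the resonance.
t = np.arange(int(0.5 * fs)) / fs
sweep = np.sin(2.0 * np.pi * (20.0 + 180.0 * t) * t)

open_loop = cone_velocity(sweep, k_fb=0.0)
closed_loop = cone_velocity(sweep, k_fb=300.0)  # gain chosen to rival w0/q
print(open_loop.max() / closed_loop.max())      # >1: feedback tames the peak
```

The design insight is that velocity feedback raises the effective damping term from w0/q to w0/q + k_fb, so transient peaks near resonance are suppressed without attenuating frequencies far from resonance.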
ERIC Educational Resources Information Center
Samuels, Alan R.; And Others
1987-01-01
These five papers by speakers at the Small Computers in Libraries 1987 conference include: "Acquiring and Using Shareware in Building Small Scale Automated Information Systems" (Samuels); "A Software Lending Collection" (Talab); "Providing Subject Access to Microcomputer Software" (Mitchell); "Interfacing Vendor…
Perception of speaker size and sex of vowel sounds
NASA Astrophysics Data System (ADS)
Smith, David R. R.; Patterson, Roy D.
2005-04-01
Glottal-pulse rate (GPR) and vocal-tract length (VTL) are both related to speaker size and sex; however, it is unclear how they interact to determine our perception of speaker size and sex. Experiments were designed to measure the relative contribution of GPR and VTL to judgements of speaker size and sex. Vowels were scaled to represent people with different GPRs and VTLs, including many well beyond the normal population values. In a single-interval, two-response rating paradigm, listeners judged the size (using a 7-point scale) and sex/age of the speaker (man, woman, boy, or girl) of these scaled vowels. Results from the size-rating experiments show that VTL has a much greater influence upon judgements of speaker size than GPR. Results from the sex-categorization experiments show that judgements of speaker sex are influenced about equally by GPR and VTL for vowels with normal GPR and VTL values. For abnormal combinations of GPR and VTL, where low GPRs are combined with short VTLs, VTL has more influence than GPR in sex judgements. [Work supported by the UK MRC (G9901257) and the German Volkswagen Foundation (VWF 1/79 783).]
Voice Handicap Index in Persian Speakers with Various Severities of Hearing Loss.
Aghadoost, Ozra; Moradi, Negin; Dabirmoghaddam, Payman; Aghadoost, Alireza; Naderifar, Ehsan; Dehbokri, Siavash Mohammadi
2016-01-01
The purpose of this study was to assess and compare the total score and subscale scores of the Voice Handicap Index (VHI) in speakers with and without hearing loss. A further aim was to determine whether severity of hearing loss correlates with total and subscale VHI scores. In this cross-sectional, descriptive analytical study, 100 participants, divided into two groups (with and without hearing loss), were studied. Background information was gathered by interview, and VHI questionnaires were filled in by all participants. For all variables, including mean total score and VHI subscale scores, there was a considerable difference between speakers with and without hearing loss (p < 0.05). The correlation between severity of hearing loss and both total and subscale VHI scores was significant. Speakers with hearing loss were found to have higher mean VHI scores than speakers with normal hearing, indicating a high voice-related handicap in this group. In addition, increased severity of hearing loss leads to more severe voice handicap. This finding emphasizes the need for a multilateral assessment and treatment of voice disorders in speakers with hearing loss. © 2017 S. Karger AG, Basel.
Understanding speaker attitudes from prosody by adults with Parkinson's disease.
Monetta, Laura; Cheang, Henry S; Pell, Marc D
2008-09-01
The ability to interpret vocal (prosodic) cues during social interactions can be disrupted by Parkinson's disease (PD), with notable effects on how emotions are understood from speech. This study investigated whether PD patients who have emotional prosody deficits exhibit further difficulties decoding the attitude of a speaker from prosody. Vocally inflected but semantically nonsensical 'pseudo-utterances' were presented to listener groups with and without PD in two separate rating tasks. Task 1 required participants to rate how confident a speaker sounded from their voice, and Task 2 required listeners to rate how polite the speaker sounded for a comparable set of pseudo-utterances. The results showed that PD patients were significantly less able than healthy control (HC) participants to use prosodic cues to differentiate intended levels of speaker confidence in speech, although the patients could accurately detect the polite/impolite attitude of the speaker from prosody in most cases. Our data suggest that many PD patients fail to use vocal cues to effectively infer a speaker's emotions as well as certain attitudes in speech such as confidence, consistent with the idea that the basal ganglia play a role in the meaningful processing of prosodic sequences in spoken language (Pell & Leonard, 2003).
von Lochow, Heike; Lyberg-Åhlander, Viveka; Sahlén, Birgitta; Kastberg, Tobias; Brännström, K Jonas
2018-04-01
This study explores the effect of voice quality and competing speaker(s) on children's performance in a passage comprehension task. Furthermore, it explores the interaction between passage comprehension and cognitive functioning. Forty-nine children (27 girls and 22 boys) with normal hearing (aged 7-12 years) participated. Passage comprehension was tested in six different listening conditions: a typical (non-dysphonic) voice in quiet, a typical voice with one competing speaker, a typical voice with four competing speakers, a dysphonic voice in quiet, a dysphonic voice with one competing speaker, and a dysphonic voice with four competing speakers. The children's working memory capacity and executive functioning were also assessed. The findings indicate no direct effect of voice quality on the children's performance, but a significant effect of background listening condition. Interaction effects were seen between voice quality, background listening condition, and executive functioning. The children's susceptibility to the effects of the dysphonic voice and the background listening conditions is related to their individual executive functions. The findings have several implications for the design of interventions in language learning environments such as classrooms.
San Juan, Valerie; Chambers, Craig G; Berman, Jared; Humphry, Chelsea; Graham, Susan A
2017-10-01
Two experiments examined whether 5-year-olds draw inferences about desire outcomes that constrain their online interpretation of an utterance. Children were informed of a speaker's positive (Experiment 1) or negative (Experiment 2) desire to receive a specific toy as a gift before hearing a referentially ambiguous statement ("That's my present") spoken with either a happy or sad voice. After hearing the speaker express a positive desire, children (N=24) showed an implicit (i.e., eye gaze) and explicit ability to predict reference to the desired object when the speaker sounded happy, but they showed only implicit consideration of the alternate object when the speaker sounded sad. After hearing the speaker express a negative desire, children (N=24) used only happy prosodic cues to predict the intended referent of the statement. Taken together, the findings indicate that the efficiency with which 5-year-olds integrate desire reasoning with language processing depends on the emotional valence of the speaker's voice but not on the type of desire representations (i.e., positive vs. negative) that children must reason about online. Copyright © 2017 Elsevier Inc. All rights reserved.
Four S's to Turn Your "Sex Talk" into a Super Program.
ERIC Educational Resources Information Center
Friedman, Jay
1995-01-01
Selection of campus speakers on sexuality is discussed, including assessment of speaker qualifications, the importance of teaching style and tone, choice of subject, program design for a meaningful event, and the sensitivity of both the speaker and the institution. (MSE)
NREL: International Activities - Fourth Renewable Energy Industries Forum
The Fourth Renewable Energy Industries Forum (REIF) speakers and presentations covered practices, opportunities, and challenges of utility and distributed projects, and renewable energy integration.
Electrophysiology of subject-verb agreement mediated by speakers' gender.
Hanulíková, Adriana; Carreiras, Manuel
2015-01-01
An important property of speech is that it explicitly conveys features of a speaker's identity such as age or gender. This event-related potential (ERP) study examined the effects of social information provided by a speaker's gender, i.e., the conceptual representation of gender, on subject-verb agreement. Despite numerous studies on agreement, little is known about syntactic computations generated by speaker characteristics extracted from the acoustic signal. Slovak is well suited to investigate this issue because it is a morphologically rich language in which agreement involves features for number, case, and gender. Grammaticality of a sentence can be evaluated by checking a speaker's gender as conveyed by his/her voice. We examined how conceptual information about speaker gender, which is not syntactic but rather social and pragmatic in nature, is interpreted for the computation of agreement patterns. ERP responses to verbs disagreeing with the speaker's gender (e.g., a sentence including a masculine verbal inflection spoken by a female person: 'the neighbors were upset because I *stole[MASC] plums') elicited a larger early posterior negativity compared to correct sentences. When the agreement was purely syntactic and did not depend on the speaker's gender, a disagreement between a formally marked subject and the verb inflection (e.g., 'the woman[FEM] *stole[MASC] plums') resulted in a larger P600 preceded by a larger anterior negativity compared to the control sentences. This result is in line with proposals according to which the recruitment of non-syntactic information such as the gender of the speaker results in N400-like effects, while formally marked syntactic features lead to structural integration as reflected in a LAN/P600 complex.
Speaker normalization and adaptation using second-order connectionist networks.
Watrous, R L
1993-01-01
A method for speaker normalization and adaptation using connectionist networks is developed. A speaker-specific linear transformation of observations of the speech signal is computed using second-order network units. Classification is accomplished by a multilayer feedforward network that operates on the normalized speech data. The network is adapted for a new talker by modifying the transformation parameters while leaving the classifier fixed. This is accomplished by backpropagating classification error through the classifier to the second-order transformation units. This method was evaluated for the classification of ten vowels for 76 speakers using the first two formant values of the Peterson-Barney data. The results suggest that rapid speaker adaptation resulting in high classification accuracy can be accomplished by this method.
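A minimal sketch of that adaptation loop, using PyTorch for the backpropagation (the original work predates this library): the network size, data, and learning rate are placeholders, and the paper's second-order transformation units are simplified here to a single affine transform learned by gradient descent.

```python
import torch
import torch.nn as nn

# Fixed multispeaker vowel classifier: two formant inputs -> ten vowel classes.
classifier = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 10))
for p in classifier.parameters():
    p.requires_grad_(False)                 # classifier stays frozen

# Speaker-specific linear (affine) normalization of the formant observations.
A = torch.eye(2, requires_grad=True)        # starts as the identity transform
b = torch.zeros(2, requires_grad=True)
opt = torch.optim.SGD([A, b], lr=0.05)
loss_fn = nn.CrossEntropyLoss()

# Placeholder adaptation data for a new talker: (F1, F2) in kHz plus vowel labels.
formants = torch.tensor([[0.3, 2.3], [0.7, 1.2], [0.5, 1.8]])
labels = torch.tensor([0, 3, 7])

for _ in range(200):
    opt.zero_grad()
    normalized = formants @ A.T + b         # speaker-specific transformation
    loss = loss_fn(classifier(normalized), labels)
    loss.backward()                         # error flows through the fixed classifier
    opt.step()                              # ...but only A and b are updated
```

Because only the two-by-two transform and bias are trained, very little adaptation data from the new talker is needed, which matches the rapid-adaptation result the abstract reports.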
Arithmetic processing in the brain shaped by cultures
Tang, Yiyuan; Zhang, Wutian; Chen, Kewei; Feng, Shigang; Ji, Ye; Shen, Junxian; Reiman, Eric M.; Liu, Yijun
2006-01-01
The universal use of Arabic numbers in mathematics raises a question whether these digits are processed the same way in people speaking various languages, such as Chinese and English, which reflect differences in Eastern and Western cultures. Using functional MRI, we demonstrated a differential cortical representation of numbers between native Chinese and English speakers. In contrast to native English speakers, who largely employ a language process that relies on the left perisylvian cortices for mental calculation such as a simple addition task, native Chinese speakers instead engage a visuo-premotor association network for the same task. Whereas in both groups the inferior parietal cortex was activated by a task for numerical quantity comparison, functional MRI connectivity analyses revealed a functional distinction between Chinese and English groups among the brain networks involved in the task. Our results further indicate that the different biological encoding of numbers may be shaped by visual reading experience during language acquisition and other cultural factors such as mathematics learning strategies and education systems, which cannot be explained completely by the differences in languages per se. PMID:16815966
Weber, Kirsten; Luther, Lisa; Indefrey, Peter; Hagoort, Peter
2016-05-01
When we learn a second language later in life, do we integrate it with the established neural networks in place for the first language or is at least a partially new network recruited? While there is evidence that simple grammatical structures in a second language share a system with the native language, the story becomes more multifaceted for complex sentence structures. In this study, we investigated the underlying brain networks in native speakers compared with proficient second language users while processing complex sentences. As hypothesized, complex structures were processed by the same large-scale inferior frontal and middle temporal language networks of the brain in the second language, as seen in native speakers. These effects were seen both in activations and task-related connectivity patterns. Furthermore, the second language users showed increased task-related connectivity from inferior frontal to inferior parietal regions of the brain, regions related to attention and cognitive control, suggesting less automatic processing for these structures in a second language.
TU-D-213AB-01: How You Can Be the Speaker and Communicator Everyone Wants You to Be.
Collins, J; Aydogan, B
2012-06-01
Effectiveness of an oral presentation depends on the ability of the speaker to communicate with the audience. An important part of this communication is focusing on two to five key points and emphasizing those points during the presentation. Every aspect of the presentation should be purposeful and directed at facilitating learners' achievement of the objectives. This necessitates that the speaker has carefully developed the objectives and built the presentation around attainment of the objectives. A presentation should be designed to include as much audience participation as possible, no matter the size of the audience. Techniques to encourage audience participation include questioning, brainstorming, small-group activities, role-playing, case-based examples, directed listening, and use of an audience response system. It is first necessary to motivate and gain the attention of the learner for learning to take place. This can be accomplished through appropriate use of humor, anecdotes, and quotations. This course will review adult learning principles and effective presentation skills. Learning Objectives: 1. Apply adult learning principles. 2. Demonstrate effective presentation skills. © 2012 American Association of Physicists in Medicine.