Science.gov

Sample records for acoustic speech features

  1. Adding articulatory features to acoustic features for automatic speech recognition

    SciTech Connect

    Zlokarnik, I.

    1995-05-01

    A hidden-Markov-model (HMM) based speech recognition system was evaluated that makes use of simultaneously recorded acoustic and articulatory data. The articulatory measurements were gathered by means of electromagnetic articulography and describe the movement of small coils fixed to the speakers' tongue and jaw during the production of German V1CV2 sequences [P. Hoole and S. Gfoerer, J. Acoust. Soc. Am. Suppl. 1 87, S123 (1990)]. Using the coordinates of the coil positions as an articulatory representation, acoustic and articulatory features were combined to make up an acoustic-articulatory feature vector. The discriminant power of this combined representation was evaluated for two subjects on a speaker-dependent isolated word recognition task. When the articulatory measurements were used both for training and testing the HMMs, the articulatory representation was capable of reducing the error rate of comparable acoustic-based HMMs by a relative percentage of more than 60%. In a separate experiment, the articulatory movements during the testing phase were estimated using a multilayer perceptron that performed an acoustic-to-articulatory mapping. Under these more realistic conditions, when articulatory measurements are only available during the training, the error rate could be reduced by a relative percentage of 18% to 25%.
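    The abstract above reports its gains as relative error-rate reductions. As a minimal sketch of that arithmetic (the function name and the example error rates are hypothetical, since the abstract gives no absolute error rates), the relative reduction is the drop in error rate divided by the baseline error rate:

```python
def relative_error_reduction(baseline_error, new_error):
    """Return the relative error-rate reduction in percent.

    Both arguments are error rates on the same scale (for example, percent
    of words misrecognized). The values used below are illustrative only.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical example: a baseline at 10% error and a combined system at 4%
# error differ by 6 points absolute, which is a 60% relative reduction.
reduction = relative_error_reduction(10.0, 4.0)
```

    Under this definition, a system can show a large relative reduction even when the absolute change is only a few percentage points.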

  2. Classification of stop place in consonant-vowel contexts using feature extrapolation of acoustic-phonetic features in telephone speech.

    PubMed

    Lee, Jung-Won; Choi, Jeung-Yoon; Kang, Hong-Goo

    2012-02-01

    Knowledge-based speech recognition systems extract acoustic cues from the signal to identify speech characteristics. For channel-deteriorated telephone speech, acoustic cues, especially those for stop consonant place, are expected to be degraded or absent. To investigate the use of knowledge-based methods in degraded environments, feature extrapolation of acoustic-phonetic features based on Gaussian mixture models is examined. This process is applied to a stop place detection module that uses burst release and vowel onset cues for consonant-vowel tokens of English. Results show that classification performance is enhanced in telephone channel-degraded speech, with extrapolated acoustic-phonetic features reaching or exceeding performance using estimated Mel-frequency cepstral coefficients (MFCCs). Results also show acoustic-phonetic features may be combined with MFCCs for best performance, suggesting these features provide information complementary to MFCCs. PMID:22352523

  3. Primary Progressive Apraxia of Speech: Clinical Features and Acoustic and Neurologic Correlates

    PubMed Central

    Strand, Edythe A.; Clark, Heather; Machulda, Mary; Whitwell, Jennifer L.; Josephs, Keith A.

    2015-01-01

    Purpose This study summarizes 2 illustrative cases of a neurodegenerative speech disorder, primary progressive apraxia of speech (AOS), as a vehicle for providing an overview of the disorder and an approach to describing and quantifying its perceptual features and some of its temporal acoustic attributes. Method Two individuals with primary progressive AOS underwent speech-language and neurologic evaluations on 2 occasions, ranging from 2.0 to 7.5 years postonset. Performance on several tests, tasks, and rating scales, as well as several acoustic measures, were compared over time within and between cases. Acoustic measures were compared with performance of control speakers. Results Both patients initially presented with AOS as the only or predominant sign of disease and without aphasia or dysarthria. The presenting features and temporal progression were captured in an AOS Rating Scale, an Articulation Error Score, and temporal acoustic measures of utterance duration, syllable rates per second, rates of speechlike alternating motion and sequential motion, and a pairwise variability index measure. Conclusions AOS can be the predominant manifestation of neurodegenerative disease. Clinical ratings of its attributes and acoustic measures of some of its temporal characteristics can support its diagnosis and help quantify its salient characteristics and progression over time. PMID:25654422
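    The abstract above mentions a pairwise variability index among its temporal acoustic measures but does not say which variant was used. The sketch below assumes the commonly used normalized form, in which the difference between each pair of successive interval durations is divided by the pair's mean; the function name and the sample durations are hypothetical.

```python
def normalized_pvi(durations):
    """Normalized pairwise variability index over successive durations.

    For each adjacent pair (a, b), take |a - b| divided by the pair mean,
    average over all pairs, and scale by 100. This is an assumed variant;
    the abstract does not specify the exact formula.
    """
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    terms = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical syllable durations in seconds.
even_timing = normalized_pvi([0.20, 0.20, 0.20])    # no variability
uneven_timing = normalized_pvi([0.10, 0.30, 0.10])  # strong alternation
```

    Equal successive durations give an index of zero, and larger alternation between neighboring durations gives a larger index.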

  4. Estimation of glottal source features from the spectral envelope of the acoustic speech signal

    NASA Astrophysics Data System (ADS)

    Torres, Juan Felix

    Speech communication encompasses diverse types of information, including phonetics, affective state, voice quality, and speaker identity. From a speech production standpoint, the acoustic speech signal can be mainly divided into glottal source and vocal tract components, which play distinct roles in rendering the various types of information it contains. Most deployed speech analysis systems, however, do not explicitly represent these two components as distinct entities, as their joint estimation from the acoustic speech signal becomes an ill-defined blind deconvolution problem. Nevertheless, because of the desire to understand glottal behavior and how it relates to perceived voice quality, there has been continued interest in explicitly estimating the glottal component of the speech signal. To this end, several inverse filtering (IF) algorithms have been proposed, but they are unreliable in practice because of the blind formulation of the separation problem. In an effort to develop a method that can bypass the challenging IF process, this thesis proposes a new glottal source information extraction method that relies on supervised machine learning to transform smoothed spectral representations of speech, which are already used in some of the most widely deployed and successful speech analysis applications, into a set of glottal source features. A transformation method based on Gaussian mixture regression (GMR) is presented and compared to current IF methods in terms of feature similarity, reliability, and speaker discrimination capability on a large speech corpus, and potential representations of the spectral envelope of speech are investigated for their ability to represent glottal source variation in a predictable manner. The proposed system was found to produce glottal source features that reasonably matched their IF counterparts in many cases, while being less susceptible to spurious errors. The development of the proposed method entailed a study into the aspects…

  5. [Influence of human personal features on acoustic correlates of speech emotional intonation characteristics].

    PubMed

    Dmitrieva, E S; Gel'man, V Ia; Zaĭtseva, K A; Orlov, A M

    2009-01-01

    Comparative study of acoustic correlates of emotional intonation was conducted on two types of speech material: sensible speech utterances and short meaningless words. The corpus of speech signals of different emotional intonations (happy, angry, frightened, sad and neutral) was created using the actor's method of simulation of emotions. Native Russian 20-70-year-old speakers (both professional actors and non-actors) participated in the study. In the corpus, the following characteristics were analyzed: mean values and standard deviations of the power, fundamental frequency, frequencies of the first and second formants, and utterance duration. Comparison of each emotional intonation with "neutral" utterances showed the greatest deviations of the fundamental frequency and frequencies of the first formant. The direction of these deviations was independent of the semantic content of speech utterance and its duration, age, gender, and being actor or non-actor, though the personal features of the speakers affected the absolute values of these frequencies. PMID:19947529

  6. Analysis of Acoustic Features in Speakers with Cognitive Disorders and Speech Impairments

    NASA Astrophysics Data System (ADS)

    Saz, Oscar; Simón, Javier; Rodríguez, W. Ricardo; Lleida, Eduardo; Vaquero, Carlos

    2009-12-01

    This work presents the results of an analysis of the acoustic features (formants and the three suprasegmental features: tone, intensity and duration) of the vowel production in a group of 14 young speakers suffering from different kinds of speech impairments due to physical and cognitive disorders. A corpus of unimpaired children's speech is used to determine the reference values for these features in speakers without any kind of speech impairment, within the same domain as the impaired speakers, namely 57 isolated words. The signal processing to extract the formant and pitch values is based on a Linear Prediction Coefficients (LPC) analysis of the segments considered as vowels in a Hidden Markov Model (HMM) based Viterbi forced alignment. Intensity and duration are also based on the outcome of the automated segmentation. As the main conclusion of the work, it is shown that intelligibility of the vowel production is lowered in impaired speakers even when the vowel is perceived as correct by human labelers. The decrease in intelligibility is due to a 30% increase in confusability in the formant map, a 50% reduction in the discriminative power of energy between stressed and unstressed vowels, and a 50% increase in the standard deviation of vowel length. On the other hand, impaired speakers keep good control of tone in the production of stressed and unstressed vowels.

  7. Speech Recognition and Acoustic Features in Combined Electric and Acoustic Stimulation

    ERIC Educational Resources Information Center

    Yoon, Yang-soo; Li, Yongxin; Fu, Qian-Jie

    2012-01-01

    Purpose: In this study, the authors aimed to identify speech information processed by a hearing aid (HA) that is additive to information processed by a cochlear implant (CI) as a function of signal-to-noise ratio (SNR). Method: Speech recognition was measured with CI alone, HA alone, and CI + HA. Ten participants were separated into 2 groups; good…

  8. Acoustic and Articulatory Features of Diphthong Production: A Speech Clarity Study

    ERIC Educational Resources Information Center

    Tasko, Stephen M.; Greilick, Kristin

    2010-01-01

    Purpose: The purpose of this study was to evaluate how speaking clearly influences selected acoustic and orofacial kinematic measures associated with diphthong production. Method: Forty-nine speakers, drawn from the University of Wisconsin X-Ray Microbeam Speech Production Database (J. R. Westbury, 1994), served as participants. Samples of clear…

  9. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

  10. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.

  11. Acoustic Feature Optimization Based on F-Ratio for Robust Speech Recognition

    NASA Astrophysics Data System (ADS)

    Sun, Yanqing; Zhou, Yu; Zhao, Qingwei; Yan, Yonghong

    This paper focuses on the problem of performance degradation in mismatched speech recognition. The F-Ratio analysis method is utilized to analyze the significance of different frequency bands for speech unit classification, and we find that frequencies around 1 kHz and 3 kHz, which are the upper bounds of the first and the second formants for most of the vowels, should be emphasized in comparison to the Mel-frequency cepstral coefficients (MFCC). The analysis result is further observed to be stable in several typical mismatched situations. Similar to the Mel-frequency scale, another frequency scale called the F-Ratio-scale is thus proposed to optimize the filter bank design for the MFCC features, so that each subband carries equal significance for speech unit classification. Under comparable conditions, with the modified features we get a relative 43.20% decrease compared with the MFCC in sentence error rate for emotion-affected speech recognition, 35.54% and 23.03% for noisy speech recognition at 15 dB and 0 dB SNR (signal-to-noise ratio) respectively, and 64.50% for the three years' 863 test data. The application of the F-Ratio analysis on the clean training set of the Aurora2 database demonstrates its robustness over languages, texts and sampling rates.
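    The abstract above does not give the exact F-Ratio formula the authors used. As a minimal sketch, assuming the common definition (variance of the class means divided by the mean within-class variance, computed separately for each feature dimension such as a frequency band), the analysis could look like the following. The function name and the toy data are hypothetical.

```python
import numpy as np


def f_ratio(features, labels):
    """Per-dimension F-ratio under an assumed textbook definition.

    features: array of shape (n_samples, n_dims), e.g. band energies.
    labels:   array of shape (n_samples,), one class label per sample.
    Returns an array of shape (n_dims,). Dimensions with zero
    within-class variance would divide by zero and are not handled here.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Mean and (population) variance of each class, per dimension.
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    class_vars = np.stack([features[labels == c].var(axis=0) for c in classes])
    between = class_means.var(axis=0)   # spread of the class means
    within = class_vars.mean(axis=0)    # average spread inside a class
    return between / within


# Toy data: dimension 0 separates the two classes, dimension 1 does not.
toy_features = [[0.0, 1.0], [1.0, 3.0], [10.0, 1.0], [11.0, 3.0]]
toy_labels = [0, 0, 1, 1]
ratios = f_ratio(toy_features, toy_labels)
```

    A dimension with a high ratio separates the classes well relative to its spread within each class, which is the sense in which a frequency band would be called significant for classification.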

  12. Exploration of Acoustic Features for Automatic Vowel Discrimination in Spontaneous Speech

    ERIC Educational Resources Information Center

    Tyson, Na'im R.

    2012-01-01

    In an attempt to understand what acoustic/auditory feature sets motivated transcribers towards certain labeling decisions, I built machine learning models that were capable of discriminating between canonical and non-canonical vowels excised from the Buckeye Corpus. Specifically, I wanted to model when the dictionary form and the transcribed-form…

  13. An acoustic feature-based similarity scoring system for speech rehabilitation assistance.

    PubMed

    Syauqy, Dahnial; Wu, Chao-Min; Setyawati, Onny

    2016-08-01

    The purpose of this study is to develop a tool to assist speech therapy and rehabilitation, focused on automatic scoring based on the comparison of the patient's speech with normal reference speech on several aspects, including pitch, vowel, voiced-unvoiced segments, strident fricative and sound intensity. The pitch estimation employed a cepstrum-based algorithm for its robustness; the vowel classification used a multilayer perceptron (MLP) to classify vowels from pitch and formants; and the strident fricative detection was based on the major peak spectral intensity, its location and the existence of pitch in the segment. In order to evaluate the performance of the system, this study analyzed eight patients' speech recordings (four males, four females; 4 to 58 years old), which had been recorded in a previous study in cooperation with Taipei Veterans General Hospital and Taoyuan General Hospital. The experimental result for the pitch algorithm showed that the cepstrum method had 5.3% gross pitch error from a total of 2086 frames. For the vowel classification algorithm, the MLP method provided 93% accuracy (men), 87% (women) and 84% (children). In total, the overall results showed that 156 of the tool's grading results (81%) were consistent with the 192 audio and visual observations made by four experienced respondents. Implication for Rehabilitation: Difficulties in communication may limit the ability of a person to transfer and exchange information. The fact that speech is one of the primary means of communication has encouraged the need for speech diagnosis and rehabilitation. Advances in computer-assisted speech therapy (CAST) technology improve the quality and time efficiency of the diagnosis and treatment of the disorders. The present study attempted to develop a tool to assist speech therapy and rehabilitation, which provided a simple interface to let the assessment be done even by the patient himself without the need of particular knowledge of speech processing while at the…
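    The record above says pitch was estimated with a cepstrum-based algorithm but gives no implementation details. The following is a minimal sketch of the general technique, not the authors' method: take the log magnitude spectrum of a windowed frame, transform it back to the quefrency domain, and pick the strongest peak within a plausible pitch-period range. The function name, the Hamming window, and the 60 to 400 Hz search range are assumptions.

```python
import numpy as np


def cepstral_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency of one voiced frame in Hz.

    Assumed approach: real cepstrum of a Hamming-windowed frame, peak
    picking between the quefrencies that correspond to fmax and fmin.
    No voiced/unvoiced decision is made here.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    # Log magnitude spectrum; the small constant avoids log(0).
    log_magnitude = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    # Real cepstrum: inverse transform of the log magnitude spectrum.
    cepstrum = np.fft.irfft(log_magnitude)
    q_min = int(sample_rate / fmax)   # shortest pitch period in samples
    q_max = int(sample_rate / fmin)   # longest pitch period in samples
    if q_max >= len(cepstrum):
        raise ValueError("frame too short for the requested fmin")
    peak = q_min + int(np.argmax(cepstrum[q_min:q_max + 1]))
    return sample_rate / peak


# Synthetic check: an impulse train with a period of 80 samples at 16 kHz
# corresponds to a 200 Hz fundamental.
test_frame = np.zeros(1024)
test_frame[::80] = 1.0
estimate = cepstral_pitch(test_frame, 16000)
```

    On the synthetic impulse train the estimate should come out close to 200 Hz. Real speech would also need a voicing decision and some smoothing across frames, which this sketch leaves out.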

  14. Time-Frequency Feature Representation Using Multi-Resolution Texture Analysis and Acoustic Activity Detector for Real-Life Speech Emotion Recognition

    PubMed Central

    Wang, Kun-Ching

    2015-01-01

    The classification of emotional speech is mostly considered in speech-related research on human-computer interaction (HCI). In this paper, the purpose is to present a novel feature extraction based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for characterization and classification of different emotions in a speech signal. The motivation is that emotions have different intensity values in different frequency bands. In terms of human visual perception, the multi-resolution texture properties of an emotional speech spectrogram should be a good feature set for emotion classification in speech. Furthermore, multi-resolution texture analysis can give a clearer discrimination between emotions than uniform-resolution texture analysis. In order to provide high accuracy of emotional discrimination, especially in real-life settings, an acoustic activity detection (AAD) algorithm must be applied in the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally occurring dialogs recorded in real-life call centers. Compared with the traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and the state-of-the-art features, the MRTII features can also improve the correct classification rates of the proposed systems across different language databases. Experimental results show that the proposed MRTII-based feature information, inspired by human visual perception of the spectrogram image, can provide significant classification for real-life emotional recognition in speech. PMID:25594590

  15. Speech recognition: Acoustic, phonetic and lexical

    NASA Astrophysics Data System (ADS)

    Zue, V. W.

    1985-10-01

    Our long-term research goal is the development and implementation of speaker-independent continuous speech recognition systems. It is our conviction that proper utilization of speech-specific knowledge is essential for advanced speech recognition systems. With this in mind, we have continued to make progress on the acquisition of acoustic-phonetic and lexical knowledge. We have completed the development of a continuous digit recognition system. The system was constructed to investigate the utilization of acoustic-phonetic knowledge in a speech recognition system. The significant developments of this study include a soft-failure procedure for lexical access and the discovery of a set of acoustic-phonetic features for verification. We have completed a study of the constraints provided by lexical stress on word recognition. We found that lexical stress information alone can, on the average, reduce the number of word candidates from a large dictionary by more than 80%. In conjunction with this study, we successfully developed a system that automatically determines the stress pattern of a word from the acoustic signal.

  16. Speech recognition: Acoustic, phonetic and lexical knowledge

    NASA Astrophysics Data System (ADS)

    Zue, V. W.

    1985-08-01

    During this reporting period we continued to make progress on the acquisition of acoustic-phonetic and lexical knowledge. We completed development of a continuous digit recognition system. The system was constructed to investigate the use of acoustic-phonetic knowledge in a speech recognition system. The significant achievements of this study include the development of a soft-failure procedure for lexical access and the discovery of a set of acoustic-phonetic features for verification. We completed a study of the constraints that lexical stress imposes on word recognition. We found that lexical stress information alone can, on the average, reduce the number of word candidates from a large dictionary by more than 80 percent. In conjunction with this study, we successfully developed a system that automatically determines the stress pattern of a word from the acoustic signal. We performed an acoustic study on the characteristics of nasal consonants and nasalized vowels. We have also developed recognition algorithms for nasal murmurs and nasalized vowels in continuous speech. We finished the preliminary development of a system that aligns a speech waveform with the corresponding phonetic transcription.

  17. Methods and apparatus for non-acoustic speech characterization and recognition

    DOEpatents

    Holzrichter, John F.

    1999-01-01

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  18. Methods and apparatus for non-acoustic speech characterization and recognition

    SciTech Connect

    Holzrichter, J.F.

    1999-12-21

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  19. Acoustics of Clear Speech: Effect of Instruction

    ERIC Educational Resources Information Center

    Lam, Jennifer; Tjaden, Kris; Wilding, Greg

    2012-01-01

    Purpose: This study investigated how different instructions for eliciting clear speech affected selected acoustic measures of speech. Method: Twelve speakers were audio-recorded reading 18 different sentences from the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1984). Sentences were produced in habitual, clear,…

  20. Acoustic Characteristics of Ataxic Speech in Japanese Patients with Spinocerebellar Degeneration (SCD)

    ERIC Educational Resources Information Center

    Ikui, Yukiko; Tsukuda, Mamoru; Kuroiwa, Yoshiyuki; Koyano, Shigeru; Hirose, Hajime; Taguchi, Takahide

    2012-01-01

    Background: In English- and German-speaking countries, ataxic speech is often described as showing scanning based on acoustic impressions. Although the term "scanning" is generally considered to represent abnormal speech features including prosodic excess or insufficiency, any precise acoustic analysis of ataxic speech has not been performed in…

  1. Near-Term Fetuses Process Temporal Features of Speech

    ERIC Educational Resources Information Center

    Granier-Deferre, Carolyn; Ribeiro, Aurelie; Jacquet, Anne-Yvonne; Bassereau, Sophie

    2011-01-01

    The perception of speech and music requires processing of variations in spectra and amplitude over different time intervals. Near-term fetuses can discriminate acoustic features, such as frequencies and spectra, but whether they can process complex auditory streams, such as speech sequences and more specifically their temporal variations, fast or…

  2. Acoustic assessment of speech privacy curtains in two nursing units.

    PubMed

    Pope, Diana S; Miller-Klein, Erik T

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s' standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption, and compact and more fragmented nursing unit floor plate shapes should be considered. PMID:26780959

  3. Acoustic assessment of speech privacy curtains in two nursing units

    PubMed Central

    Pope, Diana S.; Miller-Klein, Erik T.

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s' standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption, and compact and more fragmented nursing unit floor plate shapes should be considered. PMID:26780959

  4. Acoustic modeling of the speech organ

    NASA Astrophysics Data System (ADS)

    Kacprowski, J.

    The state of research on acoustic modeling of phonational and articulatory speech producing elements is reviewed. Consistent with the physical interpretation of the speech production process, the acoustic theory of speech production is expressed as the product of three factors: laryngeal involvement, sound transmission, and emanations from the mouth and/or nose. Each of these factors is presented in the form of a simplified mathematical description which provides the theoretical basis for the formation of physical models of the appropriate functional members of this complex bicybernetic system. Vocal tract wall impedance, vocal tract synthesizers, laryngeal dysfunction, vowel nasalization, resonance circuits, and sound wave propagation are discussed.

  5. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2004-03-23

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  6. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-02-14

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  7. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-08-08

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  8. Acoustic characteristics of listener-constrained speech

    NASA Astrophysics Data System (ADS)

    Ashby, Simone; Cummins, Fred

    2003-04-01

    Relatively little is known about the acoustical modifications speakers employ to meet the various constraints (auditory, linguistic and otherwise) of their listeners. Similarly, the manner by which perceived listener constraints interact with speakers' adoption of specialized speech registers is poorly understood. Hyper- and Hypo-articulation (H&H) theory offers a framework for examining the relationship between speech production and output-oriented goals for communication, suggesting that under certain circumstances speakers may attempt to minimize phonetic ambiguity by employing a "hyperarticulated" speaking style (Lindblom, 1990). It remains unclear, however, what the acoustic correlates of hyperarticulated speech are, and how, if at all, we might expect phonetic properties to change with respect to different listener-constrained conditions. This paper is part of a preliminary investigation concerned with comparing the prosodic characteristics of speech produced across a range of listener constraints. Analyses are drawn from a corpus of read hyperarticulated speech data comprising eight adult, female speakers of English. Specialized registers include speech to foreigners, infant-directed speech, speech produced under noisy conditions, and human-machine interaction. The authors gratefully acknowledge financial support of the Irish Higher Education Authority, allocated to Fred Cummins for collaborative work with Media Lab Europe.

  9. An Acoustic Measure for Word Prominence in Spontaneous Speech

    PubMed Central

    Wang, Dagen; Narayanan, Shrikanth

    2010-01-01

    An algorithm for automatic speech prominence detection is reported in this paper. We describe a comparative analysis of various acoustic features for word prominence detection and report results using a spoken dialog corpus with manually assigned prominence labels. The focus is on features such as spectral intensity and speech rate that are directly extracted from speech based on a correlation-based approach without requiring explicit linguistic or phonetic knowledge. Additionally, various pitch-based measures are studied with respect to their discriminating ability for prominence detection. A parametric scheme for modeling pitch plateau is proposed and this feature alone is found to outperform the traditional local pitch statistics. Two sets of experiments are used to explore the usefulness of the acoustic score generated using these features. The first set focuses on a more traditional way of word prominence detection based on a manually-tagged corpus. A 76.8% classification accuracy was achieved on a corpus of role-playing spoken dialogs. Due to difficulties in manually tagging speech prominence into discrete levels (categories), the second set of experiments focuses on evaluating the score indirectly. Specifically, through experiments on the Switchboard corpus, it is shown that the proposed acoustic score can discriminate between content words and function words in a statistically significant way. The relation between speech prominence and content/function words is also explored. Since prominent words tend to be predominantly content words, and since content words can be automatically marked from text-derived part of speech (POS) information, it is shown that the proposed acoustic score can be indirectly cross-validated through POS information. PMID:20454538

  10. Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation

    NASA Astrophysics Data System (ADS)

    Xiao, Xiong; Zhao, Shengkui; Ha Nguyen, Duc Hoang; Zhong, Xionghu; Jones, Douglas L.; Chng, Eng Siong; Li, Haizhou

    2016-01-01

    This paper investigates deep neural networks (DNN) based on nonlinear feature mapping and statistical linear feature adaptation approaches for reducing reverberation in speech signals. In the nonlinear feature mapping approach, a DNN is trained on a parallel clean/distorted speech corpus to map reverberant and noisy speech coefficients (such as log magnitude spectrum) to the underlying clean speech coefficients. The constraint imposed by dynamic features (i.e., the time derivatives of the speech coefficients) is used to enhance the smoothness of predicted coefficient trajectories in two ways. One is to obtain the enhanced speech coefficients with a least-squares estimation from the coefficients and dynamic features predicted by the DNN. The other is to incorporate the constraint of dynamic features directly into the DNN training process using a sequential cost function. In the linear feature adaptation approach, a sparse linear transform, called cross transform, is used to transform multiple frames of speech coefficients to a new feature space. The transform is estimated to maximize the likelihood of the transformed coefficients given a model of clean speech coefficients. Unlike the DNN approach, no parallel corpus is used and no assumption on distortion types is made. The two approaches are evaluated on the REVERB Challenge 2014 tasks. Both speech enhancement and automatic speech recognition (ASR) results show that the DNN-based mappings significantly reduce the reverberation in speech and improve both speech quality and ASR performance. For the speech enhancement task, the proposed dynamic feature constraint helps to improve cepstral distance, frequency-weighted segmental signal-to-noise ratio (SNR), and log likelihood ratio metrics while moderately degrading the speech-to-reverberation modulation energy ratio. In addition, the cross transform feature adaptation improves the ASR performance significantly for clean-condition trained acoustic models.
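
    The dynamic features in this abstract are time derivatives of the per-frame speech coefficients. As a minimal sketch, assuming the widely used regression formula for delta features (the abstract does not state which window the authors used), they could be computed as below. The function name `delta`, the half-window `N=2`, and the edge handling by frame replication are illustrative assumptions, not details taken from the paper.

```python
def delta(frames, N=2):
    """Regression-based time derivative of per-frame coefficient vectors.

    frames: list of equal-length lists, one coefficient vector per frame.
    N: half-window size in frames (assumed value, not from the paper).
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, N + 1):
                ahead = frames[min(t + n, T - 1)][k]   # replicate the last frame at the edge
                behind = frames[max(t - n, 0)][k]      # replicate the first frame at the edge
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


# A one-dimensional coefficient track rising by 0.5 per frame.
ramp = [[0.5 * t] for t in range(6)]
d = delta(ramp)  # interior frames recover the slope of 0.5 per frame
```

    The enhancement step described in the abstract additionally solves a least-squares problem over the predicted static and dynamic coefficients; that step is not shown here.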

  11. Acoustic Evidence for Phonologically Mismatched Speech Errors

    ERIC Educational Resources Information Center

    Gormley, Andrea

    2015-01-01

    Speech errors are generally said to accommodate to their new phonological context. This accommodation has been validated by several transcription studies. The transcription methodology is not the best choice for detecting errors at this level, however, as this type of error can be difficult to perceive. This paper presents an acoustic analysis of…

  12. Electrocorticographic representations of segmental features in continuous speech.

    PubMed

    Lotte, Fabien; Brumberg, Jonathan S; Brunner, Peter; Gunduz, Aysegul; Ritaccio, Anthony L; Guan, Cuntai; Schalk, Gerwin

    2015-01-01

    Acoustic speech output results from coordinated articulation of dozens of muscles, bones and cartilages of the vocal mechanism. While we commonly take the fluency and speed of our speech productions for granted, the neural mechanisms facilitating the requisite muscular control are not completely understood. Previous neuroimaging and electrophysiology studies of speech sensorimotor control have typically concentrated on speech sounds (i.e., phonemes, syllables and words) in isolation; sentence-length investigations have largely been used to inform coincident linguistic processing. In this study, we examined the neural representations of segmental features (place and manner of articulation, and voicing status) in the context of fluent, continuous speech production. We used recordings from the cortical surface [electrocorticography (ECoG)] to simultaneously evaluate the spatial topography and temporal dynamics of the neural correlates of speech articulation that may mediate the generation of hypothesized gestural or articulatory scores. We found that the representation of place of articulation involved broad networks of brain regions during all phases of speech production: preparation, execution and monitoring. In contrast, manner of articulation and voicing status were dominated by auditory cortical responses after speech had been initiated. These results provide a new insight into the articulatory and auditory processes underlying speech production in terms of their motor requirements and acoustic correlates. PMID:25759647

  13. Electrocorticographic representations of segmental features in continuous speech

    PubMed Central

    Lotte, Fabien; Brumberg, Jonathan S.; Brunner, Peter; Gunduz, Aysegul; Ritaccio, Anthony L.; Guan, Cuntai; Schalk, Gerwin

    2015-01-01

    Acoustic speech output results from coordinated articulation of dozens of muscles, bones and cartilages of the vocal mechanism. While we commonly take the fluency and speed of our speech productions for granted, the neural mechanisms facilitating the requisite muscular control are not completely understood. Previous neuroimaging and electrophysiology studies of speech sensorimotor control have typically concentrated on speech sounds (i.e., phonemes, syllables and words) in isolation; sentence-length investigations have largely been used to inform coincident linguistic processing. In this study, we examined the neural representations of segmental features (place and manner of articulation, and voicing status) in the context of fluent, continuous speech production. We used recordings from the cortical surface [electrocorticography (ECoG)] to simultaneously evaluate the spatial topography and temporal dynamics of the neural correlates of speech articulation that may mediate the generation of hypothesized gestural or articulatory scores. We found that the representation of place of articulation involved broad networks of brain regions during all phases of speech production: preparation, execution and monitoring. In contrast, manner of articulation and voicing status were dominated by auditory cortical responses after speech had been initiated. These results provide a new insight into the articulatory and auditory processes underlying speech production in terms of their motor requirements and acoustic correlates. PMID:25759647

  14. Perceptual centres in speech - an acoustic analysis

    NASA Astrophysics Data System (ADS)

    Scott, Sophie Kerttu

    Perceptual centres, or P-centres, represent the perceptual moments of occurrence of acoustic signals - the 'beat' of a sound. P-centres underlie the perception and production of rhythm in perceptually regular speech sequences. P-centres have been modelled both in speech and non-speech (music) domains. The three aims of this thesis were (a) to test out current P-centre models to determine which best accounted for the experimental data; (b) to identify a candidate parameter to map P-centres onto (a local approach), as opposed to the previous global models, which rely upon the whole signal to determine the P-centre; and (c) to develop a model of P-centre location which could be applied to speech and non-speech signals. The first aim was investigated by a series of experiments examining (a) speech from different speakers, to determine whether different models could account for variation between speakers; (b) whether rendering the amplitude time plot of a speech signal affects the P-centre of the signal; and (c) whether increasing the amplitude at the offset of a speech signal alters P-centres in the production and perception of speech. The second aim was carried out by (a) manipulating the rise time of different speech signals to determine whether the P-centre was affected, and whether the type of speech sound ramped affected the P-centre shift; (b) manipulating the rise time and decay time of a synthetic vowel to determine whether the onset alteration had more effect on the P-centre than the offset manipulation; and (c) determining whether the duration of a vowel affected the P-centre, if other attributes (amplitude, spectral contents) were held constant. The third aim - modelling P-centres - was based on these results. The Frequency dependent Amplitude Increase Model of P-centre location (FAIM) was developed using a modelling protocol, the APU GammaTone Filterbank and the speech from different speakers. The P-centres of the stimuli corpus were highly predicted by attributes of

  15. Clinical investigation of speech signal features among patients with schizophrenia

    PubMed Central

    ZHANG, Jing; PAN, Zhongde; GUI, Chao; CUI, Donghong

    2016-01-01

    Background A new area of interest in the search for biomarkers for schizophrenia is the study of the acoustic parameters of speech called 'speech signal features'. Several of these features have been shown to be related to emotional responsiveness, a characteristic that is notably restricted in patients with schizophrenia, particularly those with prominent negative symptoms. Aim Assess the relationship of selected acoustic parameters of speech to the severity of clinical symptoms in patients with chronic schizophrenia and compare these characteristics between patients and matched healthy controls. Methods Ten speech signal features (six prosody features, formant bandwidth and amplitude, and two spectral features) were assessed using 15-minute speech samples obtained by smartphone from 26 inpatients with chronic schizophrenia (at enrollment and 1 week later) and from 30 healthy controls (at enrollment only). Clinical symptoms of the patients were also assessed at baseline and 1 week later using the Positive and Negative Syndrome Scale, the Scale for the Assessment of Negative Symptoms, and the Clinical Global Impression-Schizophrenia scale. Results In the patient group the symptoms were stable over the 1-week interval and the 1-week test-retest reliability of the 10 speech features was good (intraclass correlation coefficients [ICC] ranging from 0.55 to 0.88). Comparison of the speech features between patients and controls found no significant differences in the six prosody features or in the formant bandwidth and amplitude features, but the two spectral features were different: the Mel-frequency cepstral coefficient (MFCC) scores were significantly lower in the patient group than in the control group, and the linear prediction coding (LPC) scores were significantly higher in the patient group than in the control group. Within the patient group, 10 of the 170 associations between the 10 speech features considered and the 17 clinical parameters considered were

  16. Emotion Identification Using Extremely Low Frequency Components of Speech Feature Contours

    PubMed Central

    Lin, Chang-Hong; Liao, Wei-Kai; Hsieh, Wen-Chi; Liao, Wei-Jiun

    2014-01-01

    The investigations of emotional speech identification can be divided into two main parts, features and classifiers. In this paper, how to extract an effective speech feature set for emotional speech identification is addressed. In our speech feature set, we use not only statistical analysis of frame-based acoustical features, but also the approximated speech feature contours, which are obtained by extracting extremely low frequency components from speech feature contours. Furthermore, principal component analysis (PCA) is applied to the approximated speech feature contours so that an efficient representation of approximated contours can be derived. The proposed speech feature set is fed into support vector machines (SVMs) to perform multiclass emotion identification. The experimental results demonstrate the performance of the proposed system with an 82.26% identification rate. PMID:24982991
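
    The abstract does not say how the extremely low frequency components of a feature contour are extracted. One plausible reading, shown here purely as an assumption, is to keep only the lowest few bins of the contour's discrete Fourier transform and invert the result. The function name `low_frequency_contour`, the cutoff `keep=2`, and the synthetic pitch contour are hypothetical and do not come from the paper.

```python
import cmath
import math


def low_frequency_contour(contour, keep=2):
    """Approximate a real-valued contour using only DFT bins 0..keep.

    Mirror-image bins are kept as well so the result stays real-valued.
    """
    T = len(contour)
    # Forward DFT (direct O(T^2) form, adequate for short contours).
    spectrum = [
        sum(contour[t] * cmath.exp(-2j * math.pi * k * t / T) for t in range(T))
        for k in range(T)
    ]
    # Zero every bin whose frequency index exceeds the cutoff.
    for k in range(T):
        if min(k, T - k) > keep:
            spectrum[k] = 0j
    # Inverse DFT; the imaginary part is only rounding noise.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / T) for k in range(T)) / T).real
        for t in range(T)
    ]


# Synthetic contour: a slow trend (1 cycle) plus a fast ripple (8 cycles).
T = 32
slow = [100.0 + 10.0 * math.cos(2 * math.pi * t / T) for t in range(T)]
fast = [3.0 * math.cos(2 * math.pi * 8 * t / T) for t in range(T)]
pitch_contour = [s + f for s, f in zip(slow, fast)]
smoothed = low_frequency_contour(pitch_contour, keep=2)  # ripple removed, trend kept
```

    In the pipeline the abstract describes, approximated contours of this kind would then be reduced with PCA and passed to an SVM classifier; neither step is sketched here.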

  17. Acoustic characterization of developmental speech disorders

    NASA Astrophysics Data System (ADS)

    Bunnell, H. Timothy; Polikoff, James; McNicholas, Jane; Walter, Rhonda; Winn, Matthew

    2001-05-01

    A novel approach to classifying children with developmental speech delays (DSD) involving /r/ was developed. The approach first derives an acoustic classification of /r/ tokens based on their forced Viterbi alignment to a five-state hidden Markov model (HMM) of normally articulated /r/. Children with DSD are then classified in terms of the proportion of their /r/ productions that fall into each broad acoustic class. This approach was evaluated using 953 examples of /r/ as produced by 18 DSD children and an approximately equal number of /r/ tokens produced by a much larger number of normally articulating children. The acoustic classification identified three broad categories of /r/ that differed substantially in how they aligned to the normal speech /r/ HMM. Additionally, these categories tended to partition tokens uttered by DSD children from those uttered by normally articulating children. Similarities among the DSD children and the average normal child, measured in terms of the proportion of their /r/ productions that fell into each of the three broad acoustic categories, were used to perform a hierarchical clustering. This clustering revealed groupings of DSD children who tended to approach /r/ production in one of several acoustically distinct manners.

  18. Acoustic analysis of speech under stress.

    PubMed

    Sondhi, Savita; Khan, Munna; Vijay, Ritu; Salhan, Ashok K; Chouhan, Satish

    2015-01-01

    When a person is emotionally charged, stress can be discerned in the voice. This paper presents a simplified and non-invasive approach to detect psycho-physiological stress by monitoring the acoustic modifications during a stressful conversation. The voice database consists of audio clips from eight different popular FM broadcasts wherein the host of the show vexes the subjects, who are otherwise unaware of the charade. The audio clips are obtained from real-life stressful conversations (no simulated emotions). Analysis is done using PRAAT software to evaluate mean fundamental frequency (F0) and formant frequencies (F1, F2, F3, F4) in both the neutral and the stressed state. Results suggest that F0 increases with stress; however, formant frequency decreases with stress. Comparison of Fourier and chirp spectra of a short vowel segment shows that for relaxed speech, the two spectra are similar; however, for stressed speech, they differ in the high frequency range due to increased pitch modulation. PMID:26558301
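
    The study measures F0 with PRAAT. As a rough illustration of what an F0 measurement involves, the sketch below picks the autocorrelation peak of a single frame. It is a deliberately simplified stand-in and not PRAAT's pitch tracker; the function name `estimate_f0`, the 75-400 Hz search range, and the synthetic 200 Hz tone are assumptions made for this example.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=400.0):
    """Return the F0 in Hz whose lag maximizes the frame's autocorrelation."""
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# A 50 ms synthetic tone at 200 Hz, sampled at 8 kHz (period of 40 samples).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200.0 * t / sample_rate) for t in range(400)]
f0 = estimate_f0(tone, sample_rate)
```

    Averaging such per-frame estimates over the voiced portions of a clip gives a mean F0 that can be compared between neutral and stressed recordings.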

  19. System And Method For Characterizing Voiced Excitations Of Speech And Acoustic Signals, Removing Acoustic Noise From Speech, And Synthesizi

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-04-25

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  20. Acoustic assessment of erygmophonic speech of Moroccan laryngectomized patients

    PubMed Central

    Ouattassi, Naouar; Benmansour, Najib; Ridal, Mohammed; Zaki, Zouheir; Bendahhou, Karima; Nejjari, Chakib; Cherkaoui, Abdeljabbar; El Alami, Mohammed Nouredine El Amine

    2015-01-01

    Introduction Acoustic evaluation of alaryngeal voices is among the most prominent issues in the field of speech analysis. Many methods have been developed to date to substitute for the classic perceptual evaluation. The aim of this study is to present our experience in the objective assessment of erygmophonic speech and to discuss the most widely used methods of acoustic speech appraisal. Through a prospective case-control study, we measured acoustic parameters of speech quality during one year of erygmophonic rehabilitation therapy of Moroccan laryngectomized patients. Methods We assessed acoustic parameters of erygmophonic speech samples from eleven laryngectomized patients over the course of speech rehabilitation therapy. Acoustic parameters were obtained by the perturbation analysis method and linear predictive coding algorithms, and also through the broadband spectrogram. Results Using perturbation analysis methods, we found erygmophonic voice to be significantly poorer than normal speech, and it exhibits higher formant frequency values. However, erygmophonic voice also shows higher and extremely variable error values that were greater than the acceptable level, which casts doubt on the reliability of the results of those analytic methods. Conclusion Acoustic parameters for objective evaluation of alaryngeal voices should allow a reliable representation of the perceptual evaluation of the quality of speech. This requirement has not been fulfilled by the common methods used so far. Therefore, acoustical assessment of erygmophonic speech needs more investigation. PMID:26587121

  1. Acoustic Study of Acted Emotions in Speech

    NASA Astrophysics Data System (ADS)

    Wang, Rong

    An extensive set of carefully recorded utterances provided a speech database for investigating acoustic correlates among eight emotional states. Four professional actors and four professional actresses simulated the emotional states of joy, conversation, nervousness, anger, sadness, hate, fear, and depression. The values of 14 acoustic parameters were extracted from analyses of the simulated portrayals. Normalization of the parameters was made to reduce the talker-dependence. Correlates of emotion were investigated by means of principal components analysis. Sadness and depression were found to be "ambiguous" with respect to each other, but "unique" with respect to joy and anger in the principal components space. Joy, conversation, nervousness, anger, hate, and fear did not separate well in the space and so exhibited ambiguity with respect to one another. The different talkers expressed joy, anger, sadness, and depression more consistently than the other four emotions. The analysis results were compared with the results of a subjective study using the same speech database and considerable consistency between the two was found.

  2. Acoustic differences among casual, conversational, and read speech

    NASA Astrophysics Data System (ADS)

    Pinnow, DeAnna

    Speech is a complex behavior that allows speakers to use many variations to satisfy the demands connected with multiple speaking environments. Speech research typically obtains speech samples in a controlled laboratory setting using read material, yet anecdotal observations of such speech, particularly from talkers with a speech and language impairment, have identified a "performance" effect in the produced speech which masks the characteristics of impaired speech outside of the lab (Goberman, Recker, & Parveen, 2010). The aim of the current study was to investigate acoustic differences among laboratory read, laboratory conversational, and casual speech through well-defined speech tasks in the laboratory and in talkers' natural environments. Eleven healthy research participants performed lab recording tasks (19 read sentences and a dialogue about their life) and collected natural-environment recordings of themselves over 3-day periods using portable recorders. Segments were analyzed for articulatory, voice, and prosodic acoustic characteristics using computer software and hand counting. The current study results indicate that lab-read speech was significantly different from casual speech: greater articulation range, improved voice quality measures, lower speech rate, and lower mean pitch. One implication of the results is that different laboratory techniques may be beneficial in obtaining speech samples that are more like casual speech, thus making it easier to correctly analyze abnormal speech characteristics with fewer errors.

  3. Acoustics in Halls for Speech and Music

    NASA Astrophysics Data System (ADS)

    Gade, Anders C.

    This chapter deals specifically with concepts, tools, and architectural variables of importance when designing auditoria for speech and music. The focus will be on cultivating the useful components of the sound in the room rather than on avoiding noise from outside or from installations, which is dealt with in Chap. 11. The chapter starts by presenting the subjective aspects of the room acoustic experience according to consensus at the time of writing. Then follows a description of their objective counterparts, the objective room acoustic parameters, among which the classical reverberation time measure is only one of many, but still of fundamental value. After explanations on how these parameters can be measured and predicted during the design phase, the remainder of the chapter deals with how the acoustic properties can be controlled by the architectural design of auditoria. This is done by presenting the influence of individual design elements as well as brief descriptions of halls designed for specific purposes, such as drama, opera, and symphonic concerts. Finally, some important aspects of loudspeaker installations in auditoria are briefly touched upon.

  4. Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training

    PubMed Central

    Narayanan, Arun; Wang, DeLiang

    2015-01-01

    Although deep neural network (DNN) acoustic models are known to be inherently noise robust, especially with matched training and testing data, the use of speech separation as a frontend and for deriving alternative feature representations has been shown to improve performance in challenging environments. We first present a supervised speech separation system that significantly improves automatic speech recognition (ASR) performance in realistic noise conditions. The system performs separation via ratio time-frequency masking; the ideal ratio mask (IRM) is estimated using DNNs. We then propose a framework that unifies separation and acoustic modeling via joint adaptive training. Since the modules for acoustic modeling and speech separation are implemented using DNNs, unification is done by introducing additional hidden layers with fixed weights and appropriate network architecture. On the CHiME-2 medium-large vocabulary ASR task, and with log mel spectral features as input to the acoustic model, an independently trained ratio masking frontend improves word error rates by 10.9% (relative) compared to the noisy baseline. In comparison, the jointly trained system improves performance by 14.4%. We also experiment with alternative feature representations to augment the standard log mel features, like the noise and speech estimates obtained from the separation module, and the standard feature set used for IRM estimation. Our best system obtains a word error rate of 15.4% (absolute), an improvement of 4.6 percentage points over the next best result on this corpus. PMID:26973851
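
    Two quantities in this abstract lend themselves to a short worked example: the ideal ratio mask and the difference between absolute and relative word error rate gains. The sketch below uses a commonly cited form of the IRM, the speech-to-mixture energy ratio in each time-frequency unit raised to an exponent `beta`. The value `beta=0.5`, the function names, and the toy energies are assumptions and are not taken from the paper.

```python
import math


def ideal_ratio_mask(speech_energy, noise_energy, beta=0.5):
    """Per time-frequency-unit mask in [0, 1], computed from premixed energies.

    Both arguments are 2-D lists indexed as [frame][frequency channel].
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            total = s + n
            row.append((s / total) ** beta if total > 0 else 0.0)
        mask.append(row)
    return mask


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative improvement in percent, given two absolute WERs in percent."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# One frame, three channels: equal energies, noise only, speech dominant.
m = ideal_ratio_mask([[1.0, 0.0, 3.0]], [[1.0, 2.0, 1.0]])

# The abstract reports 15.4% WER, 4.6 points better than the next best result,
# which implies a previous best of 20.0% and a relative gain of about 23%.
gain = relative_wer_reduction(20.0, 15.4)
```

    In the system the abstract describes, the mask is not computed from known speech and noise energies at test time; a DNN estimates it from the noisy input.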

  5. Age-Related Changes in Acoustic Characteristics of Adult Speech

    ERIC Educational Resources Information Center

    Torre, Peter, III; Barlow, Jessica A.

    2009-01-01

    This paper addresses effects of age and sex on certain acoustic properties of speech, given conflicting findings on such effects reported in prior research. The speech of 27 younger adults (15 women, 12 men; mean age 25.5 years) and 59 older adults (32 women, 27 men; mean age 75.2 years) was evaluated for identification of differences for sex and…

  6. Acoustic Markers of Prosodic Boundaries in Spanish Spontaneous Alaryngeal Speech

    ERIC Educational Resources Information Center

    Cuenca, M. H.; Barrio, M. M.

    2010-01-01

    Prosodic information aids segmentation of the continuous speech signal and thereby facilitates auditory speech processing. Durational and pitch variations are prosodic cues especially necessary to convey prosodic boundaries, but alaryngeal speakers have inconsistent control over acoustic parameters such as F0 and duration, being as a result noisy…

  7. Prediction of acoustic feature parameters using myoelectric signals.

    PubMed

    Lee, Ki-Seung

    2010-07-01

    It is well-known that a clear relationship exists between human voices and myoelectric signals (MESs) from the area of the speaker's mouth. In this study, we utilized this information to implement a speech synthesis scheme in which MES alone was used to predict the parameters characterizing the vocal-tract transfer function of specific speech signals. Several feature parameters derived from MES were investigated to find the optimal feature for maximization of the mutual information between the acoustic and the MES features. After the optimal feature was determined, an estimation rule for the acoustic parameters was proposed, based on a minimum mean square error (MMSE) criterion. In a preliminary study, 60 isolated words were used for both objective and subjective evaluations. The results showed that the average Euclidean distance between the original and predicted acoustic parameters was reduced by about 30% compared with the average Euclidean distance of the original parameters. The intelligibility of the synthesized speech signals using the predicted features was also evaluated. A word-level identification ratio of 65.5% and a syllable-level identification ratio of 73% were obtained through a listening test. PMID:20172775

  8. Study of acoustic correlates associate with emotional speech

    NASA Astrophysics Data System (ADS)

    Yildirim, Serdar; Lee, Sungbok; Lee, Chul Min; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Ebrahim; Narayanan, Shrikanth

    2004-10-01

    This study investigates the acoustic characteristics of four different emotions expressed in speech. The aim is to obtain detailed acoustic knowledge on how a speech signal is modulated by changes from neutral to a certain emotional state. Such knowledge is necessary for automatic emotion recognition and classification and emotional speech synthesis. Speech data obtained from two semi-professional actresses are analyzed and compared. Each subject produces 211 sentences with four different emotions: neutral, sad, angry, happy. We analyze changes in temporal and acoustic parameters such as magnitude and variability of segmental duration, fundamental frequency and the first three formant frequencies as a function of emotion. Acoustic differences among the emotions are also explored with mutual information computation, multidimensional scaling and acoustic likelihood comparison with normal speech. Results indicate that speech associated with anger and happiness is characterized by longer duration, shorter interword silence, higher pitch and rms energy with wider ranges. Sadness is distinguished from other emotions by lower rms energy and longer interword silence. Interestingly, the difference in formant pattern between [happiness/anger] and [neutral/sadness] is better reflected in back vowels such as /a/(/father/) than in front vowels. Detailed results on intra- and interspeaker variability will be reported.

  9. Speech recognition: Acoustic phonetic and lexical knowledge representation

    NASA Astrophysics Data System (ADS)

    Zue, V. W.

    1983-02-01

    The purpose of this program is to develop a speech data base facility under which the acoustic characteristics of speech sounds in various contexts can be studied conveniently; investigate the phonological properties of a large lexicon of, say, 10,000 words, and determine to what extent the phonotactic constraints can be utilized in speech recognition; study the acoustic cues that are used to mark word boundaries; develop a test bed in the form of a large-vocabulary IWR system to study the interactions of acoustic, phonetic and lexical knowledge; and develop a limited continuous speech recognition system with the goal of recognizing any English word from its spelling in order to assess the interactions of higher-level knowledge sources.

  10. Speech recognition: Acoustic phonetic and lexical knowledge representation

    NASA Astrophysics Data System (ADS)

    Zue, V. W.

    1984-02-01

    The purpose of this program is to develop a speech data base facility under which the acoustic characteristics of speech sounds in various contexts can be studied conveniently; investigate the phonological properties of a large lexicon of, say, 10,000 words, and determine to what extent the phonotactic constraints can be utilized in speech recognition; study the acoustic cues that are used to mark word boundaries; develop a test bed in the form of a large-vocabulary IWR system to study the interactions of acoustic, phonetic and lexical knowledge; and develop a limited continuous speech recognition system with the goal of recognizing any English word from its spelling in order to assess the interactions of higher-level knowledge sources.

  11. Researches of the Electrotechnical Laboratory. No. 955: Speech recognition by description of acoustic characteristic variations

    NASA Astrophysics Data System (ADS)

    Hayamizu, Satoru

    1993-09-01

    A new speech recognition technique is proposed. This technique systematically describes acoustic characteristic variations using a large-scale speech database, thereby obtaining high recognition accuracy. Rules are extracted to represent knowledge concerning acoustic characteristic variations by observing the actual speech database. A general framework based on maps of the sets of variation factors to the acoustic feature spaces is proposed. A single recognition model is not used for each element of descriptive units regardless of the states of the variation factors. Instead, large-scale, systematically different recognition models are used for different states. A technique to structure the representation of acoustic characteristic variations by clustering recognition models depending on variation factors is proposed. To investigate acoustic characteristic variations for phonetic contexts efficiently, word sets for reading texts of the speech database are selected so that the maximum number of three-phoneme sequences is covered in as small a number of words as possible. A selection algorithm, in which the first criterion is to maximize the number of different three-phoneme sequences in the word set and the second criterion is to maximize the entropy of the three phonemes, is proposed. Read speech data of the word sets are collected and labelled as acoustic-phonetic segments. Experiments of speaker-independent word recognition using this speech database were conducted to show the description effectiveness of the acoustic characteristic variations using networks of acoustic-phonetic segments. The experiments show that the recognition errors are reduced. A basic framework for estimating the acoustic characteristics in unknown phonetic contexts using decision trees is proposed.
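
    The first selection criterion in the abstract above (cover as many different three-phoneme sequences as possible with few words) can be illustrated with a greedy sketch. This is not the author's algorithm: the function names, the toy lexicon format, and the alphabetical tie-breaking are assumptions made for illustration, and the entropy-based second criterion is not implemented.

```python
def triphones(phonemes):
    """Return the set of consecutive three-phoneme sequences in a word."""
    return {tuple(phonemes[i:i + 3]) for i in range(len(phonemes) - 2)}


def select_words(lexicon, max_words):
    """Greedily pick words that add the most not-yet-covered triphones.

    lexicon maps a word to its phoneme list (a hypothetical toy format).
    Only the coverage criterion is sketched; the entropy criterion
    mentioned in the abstract is left out.
    """
    covered, chosen = set(), []
    remaining = dict(lexicon)
    while remaining and len(chosen) < max_words:
        # Sorting the names makes ties resolve alphabetically.
        best = max(sorted(remaining),
                   key=lambda w: len(triphones(remaining[w]) - covered))
        gain = triphones(remaining[best]) - covered
        if not gain:
            break  # no remaining word adds a new triphone
        covered |= gain
        chosen.append(best)
        del remaining[best]
    return chosen, covered


# Toy lexicon (invented for this example).
toy_lexicon = {
    "sakana": ["s", "a", "k", "a", "n", "a"],
    "kana": ["k", "a", "n", "a"],
    "asa": ["a", "s", "a"],
}
chosen, covered = select_words(toy_lexicon, max_words=2)
```

    In this toy run "kana" is never chosen, because both of its three-phoneme sequences are already covered by "sakana".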

  12. Optimizing acoustical conditions for speech intelligibility in classrooms

    NASA Astrophysics Data System (ADS)

    Yang, Wonyoung

    High speech intelligibility is imperative in classrooms where verbal communication is critical. However, the optimal acoustical conditions to achieve a high degree of speech intelligibility have previously been investigated with inconsistent results, and practical room-acoustical solutions to optimize the acoustical conditions for speech intelligibility have not been developed. This experimental study validated auralization for speech-intelligibility testing, investigated the optimal reverberation for speech intelligibility for both normal and hearing-impaired listeners using more realistic room-acoustical models, and proposed an optimal sound-control design for speech intelligibility based on the findings. The auralization technique was used to perform subjective speech-intelligibility tests. The validation study, comparing auralization results with those of real classroom speech-intelligibility tests, found that if the room to be auralized is not very absorptive or noisy, speech-intelligibility tests using auralization are valid. The speech-intelligibility tests were done in two different auralized sound fields---approximately diffuse and non-diffuse---using the Modified Rhyme Test and both normal and hearing-impaired listeners. A hybrid room-acoustical prediction program was used throughout the work, and it and a 1/8 scale-model classroom were used to evaluate the effects of ceiling barriers and reflectors. For both subject groups, in approximately diffuse sound fields, when the speech source was closer to the listener than the noise source, the optimal reverberation time was zero. When the noise source was closer to the listener than the speech source, the optimal reverberation time was 0.4 s (with another peak at 0.0 s) with relative output power levels of the speech and noise sources SNS = 5 dB, and 0.8 s with SNS = 0 dB. 
In non-diffuse sound fields, when the noise source was between the speaker and the listener, the optimal reverberation time was 0.6 s with

  13. Preserved Acoustic Hearing in Cochlear Implantation Improves Speech Perception

    PubMed Central

    Sheffield, Sterling W.; Jahn, Kelly; Gifford, René H.

    2015-01-01

    Background With improved surgical techniques and electrode design, an increasing number of cochlear implant (CI) recipients have preserved acoustic hearing in the implanted ear, thereby resulting in bilateral acoustic hearing. There are currently no guidelines, however, for clinicians with respect to audio-metric criteria and the recommendation of amplification in the implanted ear. The acoustic bandwidth necessary to obtain speech perception benefit from acoustic hearing in the implanted ear is unknown. Additionally, it is important to determine if, and in which listening environments, acoustic hearing in both ears provides more benefit than hearing in just one ear, even with limited residual hearing. Purpose The purposes of this study were to (1) determine whether acoustic hearing in an ear with a CI provides as much speech perception benefit as an equivalent bandwidth of acoustic hearing in the non-implanted ear, and (2) determine whether acoustic hearing in both ears provides more benefit than hearing in just one ear. Research Design A repeated-measures, within-participant design was used to compare performance across listening conditions. Study Sample Seven adults with CIs and bilateral residual acoustic hearing (hearing preservation) were recruited for the study. Data Collection and Analysis Consonant-nucleus-consonant word recognition was tested in four conditions: CI alone, CI + acoustic hearing in the nonimplanted ear, CI + acoustic hearing in the implanted ear, and CI + bilateral acoustic hearing. A series of low-pass filters were used to examine the effects of acoustic bandwidth through an insert earphone with amplification. Benefit was defined as the difference among conditions. The benefit of bilateral acoustic hearing was tested in both diffuse and single-source background noise. Results were analyzed using repeated-measures analysis of variance. Results Similar benefit was obtained for equivalent acoustic frequency bandwidth in either ear. 
Acoustic

  14. Evaluating a topographical mapping from speech acoustics to tongue positions

    SciTech Connect

    Hogden, J.; Heard, M.

    1995-05-01

    The continuity mapping algorithm---a procedure for learning to recover the relative positions of the articulators from speech signals---is evaluated using human speech data. The advantage of continuity mapping is that it is an unsupervised algorithm; that is, it can potentially be trained to make a mapping from speech acoustics to speech articulation without articulator measurements. The procedure starts by vector quantizing short windows of a speech signal so that each window is represented (encoded) by a single number. Next, multidimensional scaling is used to map quantization codes that were temporally close in the encoded speech to nearby points in a continuity map. Since speech sounds produced sufficiently close together in time must have been produced by similar articulator configurations, and speech sounds produced close together in time are close to each other in the continuity map, sounds produced by similar articulator positions should be mapped to similar positions in the continuity map. The data set used for evaluating the continuity mapping algorithm is comprised of simultaneously collected articulator and acoustic measurements made using an electromagnetic midsagittal articulometer on a human subject. Comparisons between measured articulator positions and those recovered using continuity mapping will be presented.

  15. Acoustic Analysis of Speech of Cochlear Implantees and Its Implications

    PubMed Central

    Patadia, Rajesh; Govale, Prajakta; Rangasayee, R.; Kirtane, Milind

    2012-01-01

    Objectives Cochlear implantees have improved speech production skills compared with those using hearing aids, as reflected in their acoustic measures. When compared to normal hearing controls, implanted children had fronted vowel space and their /s/ and /∫/ noise frequencies overlapped. Acoustic analysis of speech provides an objective index of perceived differences in speech production which can be precursory in planning therapy. The objective of this study was to compare acoustic characteristics of speech in cochlear implantees with those of normal hearing age matched peers to understand implications. Methods Group 1 consisted of 15 children with prelingual bilateral severe-profound hearing loss (age, 5-11 years; implanted between 4-10 years). Prior to an implant behind the ear, hearing aids were used; prior to and post implantation, subjects received at least 1 year of aural intervention. Group 2 consisted of 15 normal hearing age matched peers. Sustained productions of vowels and words with selected consonants were recorded. Using Praat software for acoustic analysis, digitized speech tokens were measured for F1, F2, and F3 of vowels; centre frequency (Hz) and energy concentration (dB) in burst; voice onset time (VOT in ms) for stops; centre frequency (Hz) of noise in /s/; rise time (ms) for affricates. A t-test was used to find significant differences between groups. Results Significant differences were found in VOT for /b/, F1 and F2 of /e/, and F3 of /u/. No significant differences were found for centre frequency of burst, energy concentration for stops, centre frequency of noise in /s/, or rise time for affricates. These findings suggest that auditory feedback provided by cochlear implants enables subjects to monitor production of speech sounds. Conclusion Acoustic analysis of speech is an essential method for discerning characteristics which have or have not been improved by cochlear implantation and thus for planning intervention. PMID:22701768

  16. Speech recognition: Acoustic-phonetic knowledge acquisition and representation

    NASA Astrophysics Data System (ADS)

    Zue, Victor W.

    1987-09-01

    A long-term research goal is the development and implementation of speaker-independent continuous speech recognition systems. It is believed that the proper utilization of speech-specific knowledge is essential for such advanced systems. Research is thus directed toward the acquisition of acoustic-phonetic and lexical knowledge, and the application of this knowledge to speech recognition algorithms. Investigation into the contextual variations of speech sounds has continued, emphasizing the role of the syllable in these variations. Analysis revealed that the acoustic realization of a stop depends greatly on its position within a syllable. In order to represent and utilize this information in speech recognition, a hierarchical syllable description has been adopted that enables us to specify the constraints in terms of an immediate constituent grammar. We will continue to quantify the effect of context on the acoustic realization of phonemes using larger constituent units such as syllables. In addition, a grammar will be developed to describe the relationship between phonemes and acoustic segments, and a parser that will make use of this grammar for phonetic recognition and lexical access.

  17. Investigation of the optimum acoustical conditions for speech using auralization

    NASA Astrophysics Data System (ADS)

    Yang, Wonyoung; Hodgson, Murray

    2001-05-01

    Speech intelligibility is mainly affected by reverberation and by signal-to-noise level difference, the difference between the speech-signal and background-noise levels at a receiver. An important question for the design of rooms for speech (e.g., classrooms) is, what are the optimal values of these factors? This question has been studied experimentally and theoretically. Experimental studies found zero optimal reverberation time, but theoretical predictions found nonzero reverberation times. These contradictory results are partly caused by the different ways of accounting for background noise. Background noise sources and their locations inside the room are the most detrimental factors in speech intelligibility. However, noise levels also interact with reverberation in rooms. In this project, two major room-acoustical factors for speech intelligibility were controlled using speech and noise sources of known relative output levels located in a virtual room with known reverberation. Speech intelligibility test signals were played in the virtual room and auralized for listeners. The Modified Rhyme Test (MRT) and babble noise were used to measure subjective speech intelligibility quality. Optimal reverberation times, and the optimal values of other speech intelligibility metrics, for normal-hearing people and for hard-of-hearing people, were identified and compared.

  18. Acoustic Speech Analysis Of Wayang Golek Puppeteer

    NASA Astrophysics Data System (ADS)

    Hakim, Faisal Abdul; Mandasari, Miranti Indar; Sarwono, Joko

    2010-12-01

    Actively disguised speech is one problem to be taken into account in forensic speaker verification or identification processes. Verification is usually carried out by comparing unknown samples with known samples. Active disguising can occur in both samples. To simulate the condition of speech disguising, voices of Wayang Golek puppeteers were used. It is assumed that a wayang golek puppeteer is a master of disguise who can manipulate his voice into many different characters' voices. This paper discusses the speech characteristics of 2 puppeteers. A comparison was made between each puppeteer's habitual voice and his manipulated voice.

  19. An overlapping-feature-based phonological model incorporating linguistic constraints: applications to speech recognition.

    PubMed

    Sun, Jiping; Deng, Li

    2002-02-01

    Modeling phonological units of speech is a critical issue in speech recognition. In this paper, our recent development of an overlapping-feature-based phonological model that represents long-span contextual dependency in speech acoustics is reported. In this model, high-level linguistic constraints are incorporated in automatic construction of the patterns of feature-overlapping and of the hidden Markov model (HMM) states induced by such patterns. The main linguistic information explored includes word and phrase boundaries, morpheme, syllable, syllable constituent categories, and word stress. A consistent computational framework developed for the construction of the feature-based model and the major components of the model are described. Experimental results on the use of the overlapping-feature model in an HMM-based system for speech recognition show improvements over the conventional triphone-based phonological model. PMID:11863165

  20. An overlapping-feature-based phonological model incorporating linguistic constraints: Applications to speech recognition

    NASA Astrophysics Data System (ADS)

    Sun, Jiping; Deng, Li

    2002-02-01

    Modeling phonological units of speech is a critical issue in speech recognition. In this paper, our recent development of an overlapping-feature-based phonological model that represents long-span contextual dependency in speech acoustics is reported. In this model, high-level linguistic constraints are incorporated in automatic construction of the patterns of feature-overlapping and of the hidden Markov model (HMM) states induced by such patterns. The main linguistic information explored includes word and phrase boundaries, morpheme, syllable, syllable constituent categories, and word stress. A consistent computational framework developed for the construction of the feature-based model and the major components of the model are described. Experimental results on the use of the overlapping-feature model in an HMM-based system for speech recognition show improvements over the conventional triphone-based phonological model.

  1. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2002-01-01

    Low power EM waves are used to detect motions of vocal tract tissues of the human speech system before, during, and after voiced speech. A voiced excitation function is derived. The excitation function provides speech production information to enhance speech characterization and to enable noise removal from human speech.

  2. Distributed neural representations of phonological features during speech perception.

    PubMed

    Arsenault, Jessica S; Buchsbaum, Bradley R

    2015-01-14

    A fundamental goal of the human auditory system is to map complex acoustic signals onto stable internal representations of the basic sound patterns of speech. Phonemes and the distinctive features that they comprise constitute the basic building blocks from which higher-level linguistic representations, such as words and sentences, are formed. Although the neural structures underlying phonemic representations have been well studied, there is considerable debate regarding frontal-motor cortical contributions to speech as well as the extent of lateralization of phonological representations within auditory cortex. Here we used functional magnetic resonance imaging (fMRI) and multivoxel pattern analysis to investigate the distributed patterns of activation that are associated with the categorical and perceptual similarity structure of 16 consonant exemplars in the English language used in Miller and Nicely's (1955) classic study of acoustic confusability. Participants performed an incidental task while listening to phonemes in the MRI scanner. Neural activity in bilateral anterior superior temporal gyrus and supratemporal plane was correlated with the first two components derived from a multidimensional scaling analysis of a behaviorally derived confusability matrix. We further showed that neural representations corresponding to the categorical features of voicing, manner of articulation, and place of articulation were widely distributed throughout bilateral primary, secondary, and association areas of the superior temporal cortex, but not motor cortex. Although classification of phonological features was generally bilateral, we found that multivariate pattern information was moderately stronger in the left compared with the right hemisphere for place but not for voicing or manner of articulation. PMID:25589757

  3. Speech Intelligibility Advantages using an Acoustic Beamformer Display

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Sunder, Kaushik; Godfroy, Martine; Otto, Peter

    2015-01-01

    A speech intelligibility test conforming to the Modified Rhyme Test of ANSI S3.2 "Method for Measuring the Intelligibility of Speech Over Communication Systems" was conducted using a prototype 12-channel acoustic beamformer system. The target speech material (signal) was identified against speech babble (noise), with calculated signal-noise ratios of 0, 5 and 10 dB. The signal was delivered at a fixed beam orientation of 135 deg (re 90 deg as the frontal direction of the array) and the noise at 135 deg (co-located) and 0 deg (separated). A significant improvement in intelligibility from 57% to 73% was found for spatial separation for the same signal-noise ratio (0 dB). Significant effects for improved intelligibility due to spatial separation were also found for higher signal-noise ratios (5 and 10 dB).

  4. Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

    NASA Astrophysics Data System (ADS)

    Heracleous, Panikos; Kaino, Tomomi; Saruwatari, Hiroshi; Shikano, Kiyohiro

    2006-12-01

    We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible) speech, but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech transform, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved for a 20 k dictation task a [InlineEquation not available: see fulltext.] word accuracy for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone with very promising results.

  5. Static and Dynamic Features for Improved HMM based Visual Speech Recognition

    NASA Astrophysics Data System (ADS)

    Rajavel, R.; Sathidevi, P. S.

    Visual speech recognition refers to the identification of utterances through the movements of lips, tongue, teeth, and other facial muscles of the speaker without using the acoustic signal. This work shows the relative benefits of both static and dynamic visual speech features for improved visual speech recognition. Two approaches for visual feature extraction have been considered: (1) an image transform based static feature approach in which Discrete Cosine Transform (DCT) is applied to each video frame and 6×6 triangle region coefficients are considered as features. Principal Component Analysis (PCA) is applied over all 60 features corresponding to the video frame to reduce the redundancy; the resultant 21 coefficients are taken as the static visual features. (2) A motion segmentation based dynamic feature approach in which the facial movements are segmented from the video file using motion history images (MHI). DCT is applied to the MHI and triangle region coefficients are taken as the dynamic visual features. Two types of experiments were done to identify the utterances: one with concatenated features and another with dimension-reduced features obtained using PCA. Left-right continuous HMMs are used as the visual speech classifier to classify nine MPEG-4 standard viseme consonants. The experimental results show that the concatenated as well as the dimension-reduced features improve visual speech recognition, with high accuracies of 92.45% and 92.15% respectively.
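
    The static-feature step in the abstract above (apply a DCT to a frame, then keep a triangular region of low-frequency coefficients) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the triangle region is defined, so the rule u + v < size is a guess, the DCT is left unnormalised, and the function names dct2 and triangle_coefficients are hypothetical. The PCA and HMM stages are not shown.

```python
import math


def dct2(block):
    """Unnormalised 2-D DCT-II of a square list-of-lists image block."""
    n = len(block)
    # 1-D DCT-II basis: basis[k][x] = cos(pi * (2x + 1) * k / (2n))
    basis = [[math.cos(math.pi * (2 * x + 1) * k / (2 * n))
              for x in range(n)] for k in range(n)]
    return [[sum(block[x][y] * basis[u][x] * basis[v][y]
                 for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def triangle_coefficients(coeffs, size):
    """Keep coefficients with u + v < size (assumed upper-left triangle)."""
    return [coeffs[u][v] for u in range(size) for v in range(size - u)]


# A flat 8x8 "frame" (invented data): all energy lands in the DC term.
flat_frame = [[1.0] * 8 for _ in range(8)]
features = triangle_coefficients(dct2(flat_frame), size=6)
```

    With this assumed triangle rule, a region of side 6 keeps 6 * 7 / 2 = 21 coefficients per frame.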

  6. Speech recognition: Acoustic-phonetic knowledge acquisition and representation

    NASA Astrophysics Data System (ADS)

    Zue, Victor W.

    1988-09-01

    The long-term research goal is to develop and implement speaker-independent continuous speech recognition systems. It is believed that the proper utilization of speech-specific knowledge is essential for such advanced systems. This research is thus directed toward the acquisition, quantification, and representation of acoustic-phonetic and lexical knowledge, and the application of this knowledge to speech recognition algorithms. In addition, we are exploring new speech recognition alternatives based on artificial intelligence and connectionist techniques. We developed a statistical model for predicting the acoustic realization of stop consonants in various positions in the syllable template. A unification-based grammatical formalism was developed for incorporating this model into the lexical access algorithm. We provided an information-theoretic justification for the hierarchical structure of the syllable template. We analyzed segmented duration for vowels and fricatives in continuous speech. Based on contextual information, we developed durational models for vowels and fricatives that account for over 70 percent of the variance, using data from multiple, unknown speakers. We rigorously evaluated the ability of human spectrogram readers to identify stop consonants spoken by many talkers and in a variety of phonetic contexts. Incorporating the declarative knowledge used by the readers, we developed a knowledge-based system for stop identification. The system achieved performance comparable to that of the readers.

  7. Predicting the intelligibility of deaf children's speech from acoustic measures

    NASA Astrophysics Data System (ADS)

    Uchanski, Rosalie M.; Geers, Ann E.; Brenner, Christine M.; Tobey, Emily A.

    2001-05-01

    A weighted combination of speech-acoustic measures may provide an objective assessment of speech intelligibility in deaf children that could be used to evaluate the benefits of sensory aids and rehabilitation programs. This investigation compared the accuracy of two different approaches, multiple linear regression and a simple neural net. These two methods were applied to identical sets of acoustic measures, including both segmental (e.g., voice-onset times of plosives, spectral moments of fricatives, second formant frequencies of vowels) and suprasegmental measures (e.g., sentence duration, number and frequency of intersentence pauses). These independent variables were obtained from digitized recordings of deaf children's imitations of 11 simple sentences. The dependent measure was the percentage of spoken words from the 36 McGarr Sentences understood by groups of naive listeners. The two predictive methods were trained on speech measures obtained from 123 out of 164 8- and 9-year-old deaf children who used cochlear implants. Then, predictions were obtained using speech measures from the remaining 41 children. Preliminary results indicate that multiple linear regression is a better predictor of intelligibility than the neural net, accounting for 79% as opposed to 65% of the variance in the data. [Work supported by NIH.]

  8. Prolonged Speech and Modification of Stuttering: Perceptual, Acoustic, and Electroglottographic Data.

    ERIC Educational Resources Information Center

    Packman, Ann; And Others

    1994-01-01

    This study investigated changes in the speech patterns of young adult male subjects when stuttering was modified by deliberately prolonging speech. Three subjects showed clinically significant stuttering reductions when using prolonged speech to reduce their stuttering. Resulting speech was perceptually stutter free. Acoustic and…

  9. Tongue-Palate Contact Pressure, Oral Air Pressure, and Acoustics of Clear Speech

    ERIC Educational Resources Information Center

    Searl, Jeff; Evitts, Paul M.

    2013-01-01

    Purpose: The authors compared articulatory contact pressure (ACP), oral air pressure (Po), and speech acoustics for conversational versus clear speech. They also assessed the relationship of these measures to listener perception. Method: Twelve adults with normal speech produced monosyllables in a phrase using conversational and clear speech.…

  10. Compensating Acoustic Mismatch Using Class-Based Histogram Equalization for Robust Speech Recognition

    NASA Astrophysics Data System (ADS)

    Suh, Youngjoo; Kim, Sungtak; Kim, Hoirin

    2007-12-01

    A new class-based histogram equalization method is proposed for robust speech recognition. The proposed method aims at not only compensating for an acoustic mismatch between training and test environments but also reducing the two fundamental limitations of the conventional histogram equalization method, the discrepancy between the phonetic distributions of training and test speech data, and the nonmonotonic transformation caused by the acoustic mismatch. The algorithm employs multiple class-specific reference and test cumulative distribution functions, classifies noisy test features into their corresponding classes, and equalizes the features by using their corresponding class reference and test distributions. The minimum mean-square error log-spectral amplitude (MMSE-LSA)-based speech enhancement is added just prior to the baseline feature extraction to reduce the corruption by additive noise. The experiments on the Aurora2 database proved the effectiveness of the proposed method by reducing relative errors by [InlineEquation not available: see fulltext.] over the mel-cepstral-based features and by [InlineEquation not available: see fulltext.] over the conventional histogram equalization method, respectively.
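
    The equalization step described in the abstract above can be sketched as CDF matching: each test feature is replaced by the reference value at the same empirical quantile, and the class-based variant does this separately per class. The sketch below is not the authors' implementation. The function names are hypothetical, the class labels are assumed to be given (the abstract does not say how test features are classified), and the MMSE-LSA enhancement stage is omitted.

```python
from bisect import bisect_right


def equalize(values, reference):
    """Map each value to the reference sample at the same empirical quantile."""
    ref = sorted(reference)
    ranked = sorted(values)
    n, m = len(ranked), len(ref)
    out = []
    for v in values:
        # Rank of v in the test data (1..n), i.e. n times the empirical CDF.
        rank = bisect_right(ranked, v)
        # Inverse reference CDF via integer ceiling of rank * m / n.
        idx = (rank * m + n - 1) // n - 1
        out.append(ref[idx])
    return out


def equalize_by_class(values, labels, reference_by_class):
    """Equalize each class separately using its own reference samples."""
    out = list(values)
    for cls in set(labels):
        positions = [i for i, lab in enumerate(labels) if lab == cls]
        mapped = equalize([values[i] for i in positions],
                          reference_by_class[cls])
        for i, new_value in zip(positions, mapped):
            out[i] = new_value
    return out


# Toy one-dimensional features with invented class labels.
plain = equalize([10, 30, 20], [1, 2, 3])
per_class = equalize_by_class(
    [5.0, 9.0, 1.0, 2.0],
    ["v", "v", "c", "c"],
    {"v": [0.0, 1.0], "c": [-1.0, -0.5]},
)
```

    Because the mapping depends only on ranks within the test data, it is monotonic within each class, which is the property the conventional single-distribution method can lose when classes are mixed.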

  11. Acoustic and Perceptual Consequences of Clear and Loud Speech

    PubMed Central

    Tjaden, Kris; Richards, Emily; Kuo, Christina; Wilding, Greg; Sussman, Joan

    2014-01-01

    Objective Several issues concerning F2 slope in dysarthria were addressed by obtaining speech acoustic measures and judgments of intelligibility for sentences produced in Habitual, Clear and Loud conditions by speakers with Parkinson's disease (PD) and healthy controls. Patients and Methods Acoustic measures of average and maximum F2 slope for diphthongs, duration and intensity were obtained. Listeners judged intelligibility using a visual analog scale. Differences in measures among groups and conditions as well as relationships among measures were examined. Results Average and maximum F2 slope metrics were strongly correlated, but only average F2 slope consistently differed among groups and conditions, with shallower slopes for the PD group and steeper slopes for Clear speech versus Habitual and Loud. Clear and Loud speech were also characterized by lengthened durations, increased intensity and improved intelligibility versus Habitual. F2 slope and intensity were unrelated, and F2 slope was a significant predictor of intelligibility. Conclusion Average diphthong F2 slope was more sensitive than maximum F2 slope to articulatory mechanism involvement in mild dysarthria in PD. F2 slope holds promise as an objective measure of treatment-related changes in the articulatory mechanism for therapeutic techniques that focus on articulation. PMID:24504015
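
    The two F2 slope metrics named in the abstract above can be illustrated with a short sketch. The definitions used here are assumptions, since the abstract does not give them: the average slope is taken as the net F2 change divided by the track duration, and the maximum slope as the steepest change between neighbouring samples. The function name f2_slopes and the sample track are invented for illustration.

```python
def f2_slopes(times_ms, f2_hz):
    """Return (average, maximum) F2 slope in Hz/ms for one formant track.

    Assumed definitions (not taken from the study): average slope is the
    net F2 change over the whole track divided by its duration; maximum
    slope is the largest-magnitude change between neighbouring samples.
    """
    average = (f2_hz[-1] - f2_hz[0]) / (times_ms[-1] - times_ms[0])
    steps = [(f2_hz[i + 1] - f2_hz[i]) / (times_ms[i + 1] - times_ms[i])
             for i in range(len(f2_hz) - 1)]
    maximum = max(steps, key=abs)
    return average, maximum


# Invented F2 track for a rising diphthong transition.
avg_slope, max_slope = f2_slopes([0, 10, 20, 30], [1000, 1100, 1400, 1500])
```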

  12. DWT features performance analysis for automatic speech recognition of Urdu.

    PubMed

    Ali, Hazrat; Ahmad, Nasir; Zhou, Xianwei; Iqbal, Khalid; Ali, Sahibzada Muhammad

    2014-01-01

    This paper presents work on automatic speech recognition of the Urdu language, using a comparative analysis of Discrete Wavelet Transform (DWT) based features and Mel Frequency Cepstral Coefficients (MFCC). These features have been extracted for one hundred isolated words of Urdu, each word uttered by ten different speakers. The words have been selected from the most frequently used words of Urdu. A variety of age and dialect has been covered by using a balanced corpus approach. After extraction of features, the classification has been achieved by using Linear Discriminant Analysis. After the classification task, the confusion matrix obtained for the DWT features has been compared with the one obtained for Mel-Frequency Cepstral Coefficients based speech recognition. The framework has been trained and tested for speech data recorded under controlled environments. The experimental results are useful in determining the optimum features for the speech recognition task. PMID:25674450
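
    As an illustration of what DWT-based features can look like, the sketch below runs a multi-level Haar decomposition and summarizes each sub-band by its energy. The abstract does not state which wavelet or which summary statistics were used, so the Haar wavelet, the energy features, and the function names are all assumptions made for this example.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def dwt_energy_features(signal, levels):
    """Energy of the detail band at each level, then the final approximation energy.

    The signal length must be divisible by 2**levels.
    """
    features = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in current))
    return features
```

    Because the Haar transform used here is orthonormal, the band energies sum to the energy of the input signal, which gives a quick sanity check on any feature vector produced this way.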

  13. The acoustic features of human laughter

    NASA Astrophysics Data System (ADS)

    Bachorowski, Jo-Anne; Owren, Michael J.

    2002-05-01

    Remarkably little is known about the acoustic features of laughter, despite laughter's ubiquitous role in human vocal communication. Outcomes are described for 1024 naturally produced laugh bouts recorded from 97 young adults. Acoustic analysis focused on temporal characteristics, production modes, source- and filter-related effects, and indexical cues to laugher sex and individual identity. The results indicate that laughter is a remarkably complex vocal signal, with evident diversity in both production modes and fundamental frequency characteristics. Also of interest was finding a consistent lack of articulation effects in supralaryngeal filtering. Outcomes are compared to previously advanced hypotheses and conjectures about this species-typical vocal signal.

  14. Canonicalization of Feature Parameters for Robust Speech Recognition Based on Distinctive Phonetic Feature (DPF) Vectors

    NASA Astrophysics Data System (ADS)

    Huda, Mohammad Nurul; Ghulam, Muhammad; Fukuda, Takashi; Katsurada, Kouichi; Nitta, Tsuneo

    This paper describes a robust automatic speech recognition (ASR) system with less computation. Acoustic models of a hidden Markov model (HMM)-based classifier include various types of hidden factors such as speaker-specific characteristics, coarticulation, and the acoustic environment. If there exists a canonicalization process that can recover the degraded margin of acoustic likelihoods between correct phonemes and other ones caused by hidden factors, the robustness of ASR systems can be improved. In this paper, we introduce a canonicalization method that is composed of multiple distinctive phonetic feature (DPF) extractors, each corresponding to the canonicalization of one hidden factor, and a DPF selector which selects an optimum DPF vector as an input of the HMM-based classifier. The proposed method resolves gender factors and speaker variability, and eliminates noise factors by applying the canonicalization based on the DPF extractors and two-stage Wiener filtering. In the experiment on AURORA-2J, the proposed method provides higher word accuracy under clean training and significant improvement of word accuracy in low signal-to-noise ratio (SNR) under multi-condition training compared to a standard ASR system with mel frequency cepstral coefficient (MFCC) parameters. Moreover, the proposed method requires only two-fifths of the Gaussian mixture components and less memory to achieve accurate ASR.

  15. Aging Affects Neural Synchronization to Speech-Related Acoustic Modulations

    PubMed Central

    Goossens, Tine; Vercammen, Charlotte; Wouters, Jan; van Wieringen, Astrid

    2016-01-01

    As people age, speech perception problems become highly prevalent, especially in noisy situations. In addition to peripheral hearing and cognition, temporal processing plays a key role in speech perception. Temporal processing of speech features is mediated by synchronized activity of neural oscillations in the central auditory system. Previous studies indicate that both the degree and hemispheric lateralization of synchronized neural activity relate to speech perception performance. Based on these results, we hypothesize that impaired speech perception in older persons may, in part, originate from deviances in neural synchronization. In this study, auditory steady-state responses that reflect synchronized activity of theta, beta, low and high gamma oscillations (i.e., 4, 20, 40, and 80 Hz ASSR, respectively) were recorded in young, middle-aged, and older persons. As all participants had normal audiometric thresholds and were screened for (mild) cognitive impairment, differences in synchronized neural activity across the three age groups were likely to be attributed to age. Our data yield novel findings regarding theta and high gamma oscillations in the aging auditory system. At an older age, synchronized activity of theta oscillations is increased, whereas high gamma synchronization is decreased. In contrast to young persons who exhibit a right hemispheric dominance for processing of high gamma range modulations, older adults show a symmetrical processing pattern. These age-related changes in neural synchronization may very well underlie the speech perception problems in aging persons. PMID:27378906

  16. Learning Speech Variability in Discriminative Acoustic Model Adaptation

    NASA Astrophysics Data System (ADS)

    Sato, Shoei; Oku, Takahiro; Homma, Shinichi; Kobayashi, Akio; Imai, Toru

    We present a new discriminative method of acoustic model adaptation that deals with task-dependent speech variability. We have focused on differences of expressions or speaking styles between tasks and set the objective of this method as improving the recognition accuracy of indistinctly pronounced phrases dependent on a speaking style. The adaptation appends subword models for frequently observable variants of subwords in the task. To find the task-dependent variants, low-confidence words are statistically selected from words with higher frequency in the task's adaptation data by using their word lattices. HMM parameters of subword models dependent on the words are discriminatively trained by using linear transforms with a minimum phoneme error (MPE) criterion. For the MPE training, subword accuracy discriminating between the variants and the originals is also investigated. In speech recognition experiments, the proposed adaptation with the subword variants reduced the word error rate by 12.0% relative in a Japanese conversational broadcast task.

  17. A Bayesian view on acoustic model-based techniques for robust speech recognition

    NASA Astrophysics Data System (ADS)

    Maas, Roland; Huemmer, Christian; Sehr, Armin; Kellermann, Walter

    2015-12-01

    This article provides a unifying Bayesian view on various approaches for acoustic model adaptation, missing feature, and uncertainty decoding that are well-known in the literature of robust automatic speech recognition. The representatives of these classes can often be deduced from a Bayesian network that extends the conventional hidden Markov models used in speech recognition. These extensions, in turn, can in many cases be motivated from an underlying observation model that relates clean and distorted feature vectors. By identifying and converting the observation models into a Bayesian network representation, we formulate the corresponding compensation rules. We thus summarize the various approaches as approximations or modifications of the same Bayesian decoding rule leading to a unified view on known derivations as well as to new formulations for certain approaches.

  18. Suppressed alpha oscillations predict intelligibility of speech and its acoustic details.

    PubMed

    Obleser, Jonas; Weisz, Nathan

    2012-11-01

    Modulations of human alpha oscillations (8-13 Hz) accompany many cognitive processes, but their functional role in auditory perception has proven elusive: Do oscillatory dynamics of alpha reflect acoustic details of the speech signal and are they indicative of comprehension success? Acoustically presented words were degraded in acoustic envelope and spectrum in an orthogonal design, and electroencephalogram responses in the frequency domain were analyzed in 24 participants, who rated word comprehensibility after each trial. First, the alpha power suppression during and after a degraded word depended monotonically on spectral and, to a lesser extent, envelope detail. The magnitude of this alpha suppression exhibited an additional and independent influence on later comprehension ratings. Second, source localization of alpha suppression yielded superior parietal, prefrontal, as well as anterior temporal brain areas. Third, multivariate classification of the time-frequency pattern across participants showed that patterns of late posterior alpha power allowed best for above-chance classification of word intelligibility. Results suggest that both magnitude and topography of late alpha suppression in response to single words can indicate a listener's sensitivity to acoustic features and the ability to comprehend speech under adverse listening conditions. PMID:22100354

  19. Suppressed Alpha Oscillations Predict Intelligibility of Speech and its Acoustic Details

    PubMed Central

    Weisz, Nathan

    2012-01-01

    Modulations of human alpha oscillations (8–13 Hz) accompany many cognitive processes, but their functional role in auditory perception has proven elusive: Do oscillatory dynamics of alpha reflect acoustic details of the speech signal and are they indicative of comprehension success? Acoustically presented words were degraded in acoustic envelope and spectrum in an orthogonal design, and electroencephalogram responses in the frequency domain were analyzed in 24 participants, who rated word comprehensibility after each trial. First, the alpha power suppression during and after a degraded word depended monotonically on spectral and, to a lesser extent, envelope detail. The magnitude of this alpha suppression exhibited an additional and independent influence on later comprehension ratings. Second, source localization of alpha suppression yielded superior parietal, prefrontal, as well as anterior temporal brain areas. Third, multivariate classification of the time–frequency pattern across participants showed that patterns of late posterior alpha power allowed best for above-chance classification of word intelligibility. Results suggest that both magnitude and topography of late alpha suppression in response to single words can indicate a listener's sensitivity to acoustic features and the ability to comprehend speech under adverse listening conditions. PMID:22100354

  20. On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common

    PubMed Central

    Weninger, Felix; Eyben, Florian; Schuller, Björn W.; Mortillaro, Marcello; Scherer, Klaus R.

    2013-01-01

    Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning either of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow’s pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of “the sound that something makes,” in order to evaluate the system’s auditory environment and its own audio output. This article aims at a first step toward a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selection of appropriate descriptors, cross-domain arousal, and valence regression is feasible achieving significant correlations with the observer annotations of up to 0.78 for arousal (training on sound and testing on enacted speech) and 0.60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects. PMID:23750144

  1. Denoising of human speech using combined acoustic and em sensor signal processing

    SciTech Connect

    Ng, L C; Burnett, G C; Holzrichter, J F; Gable, T J

    1999-11-29

    Low Power EM radar-like sensors have made it possible to measure properties of the human speech production system in real-time, without acoustic interference. This greatly enhances the quality and quantity of information for many speech related applications. See Holzrichter, Burnett, Ng, and Lea, J. Acoust. Soc. Am. 103 (1) 622 (1998). By using combined Glottal-EM-Sensor and Acoustic signals, segments of voiced, unvoiced, and no-speech can be reliably defined. Real-time Denoising filters can be constructed to remove noise from the user's corresponding speech signal.

  2. Effect of acoustic fine structure cues on the recognition of auditory-only and audiovisual speech.

    PubMed

    Meister, Hartmut; Fuersen, Katrin; Schreitmueller, Stefan; Walger, Martin

    2016-06-01

    This study addressed the hypothesis that an improvement in speech recognition due to combined envelope and fine structure cues is greater in the audiovisual than the auditory modality. Normal hearing listeners were presented with envelope vocoded speech in combination with low-pass filtered speech. The benefit of adding acoustic low-frequency fine structure to acoustic envelope cues was significantly greater for audiovisual than for auditory-only speech. It is suggested that this is due to complementary information of the different acoustic and visual cues. The results have potential implications for the assessment of bimodal cochlear implant fittings or electroacoustic stimulation. PMID:27369134

  3. Acoustic Predictors of Intelligibility for Segmentally Interrupted Speech: Temporal Envelope, Voicing, and Duration

    ERIC Educational Resources Information Center

    Fogerty, Daniel

    2013-01-01

    Purpose: Temporal interruption limits the perception of speech to isolated temporal glimpses. An analysis was conducted to determine the acoustic parameter that best predicts speech recognition from temporal fragments that preserve different types of speech information--namely, consonants and vowels. Method: Young listeners with normal hearing…

  4. Perceptual and Acoustic Reliability Estimates for the Speech Disorders Classification System (SDCS)

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.

    2010-01-01

    A companion paper describes three extensions to a classification system for paediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). The SDCS uses perceptual and acoustic data reduction methods to obtain information on a speaker's speech, prosody, and voice. The present paper provides reliability estimates for…

  5. Emotional speech acoustic model for Malay: iterative versus isolated unit training.

    PubMed

    Mustafa, Mumtaz Begum; Ainon, Raja Noor

    2013-10-01

    The ability of a speech synthesis system to synthesize emotional speech enhances the user's experience when using this kind of system and its related applications. However, the development of an emotional speech synthesis system is a daunting task in view of the complexity of human emotional speech. The more recent state-of-the-art speech synthesis systems, such as the one based on hidden Markov models, can synthesize emotional speech with acceptable naturalness with the use of a good emotional speech acoustic model. However, building an emotional speech acoustic model requires adequate resources including segment-phonetic labels of emotional speech, which is a problem for many under-resourced languages, including Malay. This research shows how it is possible to build an emotional speech acoustic model for Malay with minimal resources. To achieve this objective, two forms of initialization methods were considered: iterative training using the deterministic annealing expectation maximization algorithm and the isolated unit training. The seed model for the automatic segmentation is a neutral speech acoustic model, which was transformed to target emotion using two transformation techniques: model adaptation and context-dependent boundary refinement. Two forms of evaluation have been performed: an objective evaluation measuring the prosody error and a listening evaluation to measure the naturalness of the synthesized emotional speech. PMID:24116440

  6. Acoustic Analysis of the Voiced-Voiceless Distinction in Dutch Tracheoesophageal Speech

    ERIC Educational Resources Information Center

    Jongmans, Petra; Wempe, Ton G.; van Tinteren, Harm; Hilgers, Frans J. M.; Pols, Louis C. W.; van As-Brooks, Corina J.

    2010-01-01

    Purpose: Confusions between voiced and voiceless plosives and voiced and voiceless fricatives are common in Dutch tracheoesophageal (TE) speech. This study investigates (a) which acoustic measures are found to convey a correct voicing contrast in TE speech and (b) whether different measures are found in TE speech than in normal laryngeal (NL)…

  7. Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing

    PubMed Central

    Doelling, Keith; Arnal, Luc; Ghitza, Oded; Poeppel, David

    2013-01-01

    A growing body of research suggests that intrinsic neuronal slow (< 10 Hz) oscillations in auditory cortex appear to track incoming speech and other spectro-temporally complex auditory signals. Within this framework, several recent studies have identified critical-band temporal envelopes as the specific acoustic feature being reflected by the phase of these oscillations. However, how this alignment between speech acoustics and neural oscillations might underpin intelligibility is unclear. Here we test the hypothesis that the ‘sharpness’ of temporal fluctuations in the critical band envelope acts as a temporal cue to speech syllabic rate, driving delta-theta rhythms to track the stimulus and facilitate intelligibility. We interpret our findings as evidence that sharp events in the stimulus cause cortical rhythms to re-align and parse the stimulus into syllable-sized chunks for further decoding. Using magnetoencephalographic recordings, we show that by removing temporal fluctuations that occur at the syllabic rate, envelope-tracking activity is reduced. By artificially reinstating these temporal fluctuations, envelope-tracking activity is regained. These changes in tracking correlate with intelligibility of the stimulus. Together, the results suggest that the sharpness of fluctuations in the stimulus, as reflected in the cochlear output, drive oscillatory activity to track and entrain to the stimulus, at its syllabic rate. This process likely facilitates parsing of the stimulus into meaningful chunks appropriate for subsequent decoding, enhancing perception and intelligibility. PMID:23791839

  8. Logopenic and nonfluent variants of primary progressive aphasia are differentiated by acoustic measures of speech production.

    PubMed

    Ballard, Kirrie J; Savage, Sharon; Leyton, Cristian E; Vogel, Adam P; Hornberger, Michael; Hodges, John R

    2014-01-01

    Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging since it hinges on expert-based evaluation of speech and language production. In this study acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold-standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r² = 0.47) with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, only affected in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy. Findings suggest that a measure of relative vowel duration from a polysyllable word repetition task

  9. Logopenic and Nonfluent Variants of Primary Progressive Aphasia Are Differentiated by Acoustic Measures of Speech Production

    PubMed Central

    Ballard, Kirrie J.; Savage, Sharon; Leyton, Cristian E.; Vogel, Adam P.; Hornberger, Michael; Hodges, John R.

    2014-01-01

    Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging since it hinges on expert-based evaluation of speech and language production. In this study acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold-standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r² = 0.47) with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, only affected in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy. Findings suggest that a measure of relative vowel duration from a polysyllable word repetition task

  10. Moving to the Speed of Sound: Context Modulation of the Effect of Acoustic Properties of Speech

    ERIC Educational Resources Information Center

    Shintel, Hadas; Nusbaum, Howard C.

    2008-01-01

    Suprasegmental acoustic patterns in speech can convey meaningful information and affect listeners' interpretation in various ways, including through systematic analog mapping of message-relevant information onto prosody. We examined whether the effect of analog acoustic variation is governed by the acoustic properties themselves. For example, fast…

  11. Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech

    NASA Astrophysics Data System (ADS)

    Ge, Fengpei; Liu, Changliang; Shao, Jian; Pan, Fuping; Dong, Bin; Yan, Yonghong

    In this paper we present our investigation into improving the performance of our computer-assisted language learning (CALL) system through exploiting the acoustic model and features within the speech recognition framework. First, to alleviate channel distortion, speaker-dependent cepstrum mean normalization (CMN) is adopted and the average correlation coefficient (average CC) between machine and expert scores is improved from 78.00% to 84.14%. Second, heteroscedastic linear discriminant analysis (HLDA) is adopted to enhance the discriminability of the acoustic model, which successfully increases the average CC from 84.14% to 84.62%. Additionally, HLDA causes the scoring accuracy to be more stable at various pronunciation proficiency levels, and thus leads to an increase in the speaker correct-rank rate from 85.59% to 90.99%. Finally, we use maximum a posteriori (MAP) estimation to tune the acoustic model to fit strongly accented test speech. As a result, the average CC is improved from 84.62% to 86.57%. These three novel techniques improve the accuracy of evaluating pronunciation quality.
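
    The first technique, speaker-dependent cepstrum mean normalization, is simple enough to sketch directly: for one speaker, the mean of each cepstral coefficient over all frames is subtracted from every frame, which removes a constant channel offset in the cepstral domain. The function name and the list-of-lists frame layout below are assumptions made for illustration, not the authors' implementation.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean computed over all frames of one speaker.

    frames is a list of equal-length cepstral vectors (one per frame),
    pooled over all of that speaker's utterances.
    """
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient across the speaker's frames.
    means = [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]
```

    After normalization every coefficient has zero mean for that speaker, so a fixed convolutional channel effect, which appears as an additive constant in the cepstrum, no longer shifts the features.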

  12. Fluid-acoustic interactions and their impact on pathological voiced speech

    NASA Astrophysics Data System (ADS)

    Erath, Byron D.; Zanartu, Matias; Peterson, Sean D.; Plesniak, Michael W.

    2011-11-01

    Voiced speech is produced by vibration of the vocal fold structures. Vocal fold dynamics arise from aerodynamic pressure loadings, tissue properties, and acoustic modulation of the driving pressures. Recent speech science advancements have produced a physiologically-realistic fluid flow solver (BLEAP) capable of prescribing asymmetric intraglottal flow attachment that can be easily assimilated into reduced order models of speech. The BLEAP flow solver is extended to incorporate acoustic loading and sound propagation in the vocal tract by implementing a wave reflection analog approach for sound propagation based on the governing BLEAP equations. This enhanced physiological description of the physics of voiced speech is implemented into a two-mass model of speech. The impact of fluid-acoustic interactions on vocal fold dynamics is elucidated for both normal and pathological speech through linear and nonlinear analysis techniques. Supported by NSF Grant CBET-1036280.

  13. Acoustical Characteristics of Mastication Sounds: Application of Speech Analysis Techniques

    NASA Astrophysics Data System (ADS)

    Brochetti, Denise

    Food scientists have used acoustical methods to study characteristics of mastication sounds in relation to food texture. However, a model for analysis of the sounds has not been identified, and reliability of the methods has not been reported. Therefore, speech analysis techniques were applied to mastication sounds, and variation in measures of the sounds was examined. To meet these objectives, two experiments were conducted. In the first experiment, a digital sound spectrograph generated waveforms and wideband spectrograms of sounds by 3 adult subjects (1 male, 2 females) for initial chews of food samples differing in hardness and fracturability. Acoustical characteristics were described and compared. For all sounds, formants appeared in the spectrograms, and energy occurred across a 0 to 8000-Hz range of frequencies. Bursts characterized waveforms for peanut, almond, raw carrot, ginger snap, and hard candy. Duration and amplitude of the sounds varied with the subjects. In the second experiment, the spectrograph was used to measure the duration, amplitude, and formants of sounds for the initial 2 chews of cylindrical food samples (raw carrot, teething toast) differing in diameter (1.27, 1.90, 2.54 cm). Six adult subjects (3 males, 3 females) having normal occlusions and temporomandibular joints chewed the samples between the molar teeth and with the mouth open. Ten repetitions per subject were examined for each food sample. Analysis of estimates of variation indicated an inconsistent intrasubject variation in the acoustical measures. Food type and sample diameter also affected the estimates, indicating the variable nature of mastication. Generally, intrasubject variation was greater than intersubject variation. Analysis of ranks of the data indicated that the effect of sample diameter on the acoustical measures was inconsistent and depended on the subject and type of food. If inferences are to be made concerning food texture from acoustical measures of mastication

  14. How stable are acoustic metrics of contrastive speech rhythm?

    PubMed

    Wiget, Lukas; White, Laurence; Schuppler, Barbara; Grenon, Izabelle; Rauch, Olesya; Mattys, Sven L

    2010-03-01

    Acoustic metrics of contrastive speech rhythm, based on vocalic and intervocalic interval durations, are intended to capture stable typological differences between languages. They should consequently be robust to variation between speakers, sentence materials, and measurers. This paper assesses the impact of these sources of variation on the metrics %V (proportion of utterance comprised of vocalic intervals), VarcoV (rate-normalized standard deviation of vocalic interval duration), and nPVI-V (a measure of the durational variability between successive pairs of vocalic intervals). Five measurers analyzed the same corpus of speech: five sentences read by six speakers of Standard Southern British English. Differences between sentences were responsible for the greatest variation in rhythm scores. Inter-speaker differences were also a source of significant variability. However, there was relatively little variation due to segmentation differences between measurers following an agreed protocol. An automated phone alignment process was also used: Rhythm scores thus derived showed good agreement with the human measurers. A number of recommendations for researchers wishing to exploit contrastive rhythm metrics are offered in conclusion. PMID:20329856
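
    The three metrics named above can be computed from lists of interval durations as in the following sketch. It follows the parenthetical definitions in the abstract; the function names, the millisecond unit, and the choice of the population standard deviation for VarcoV are assumptions, since the abstract does not specify them.

```python
from statistics import mean, pstdev


def percent_v(vocalic, intervocalic):
    """%V: percentage of total utterance duration made up of vocalic intervals."""
    total = sum(vocalic) + sum(intervocalic)
    return 100.0 * sum(vocalic) / total


def varco_v(vocalic):
    """VarcoV: standard deviation of vocalic durations, normalized by their mean."""
    return 100.0 * pstdev(vocalic) / mean(vocalic)


def npvi_v(vocalic):
    """nPVI-V: mean normalized difference between successive vocalic durations."""
    if len(vocalic) < 2:
        raise ValueError("need at least two vocalic intervals")
    terms = [
        abs(a - b) / ((a + b) / 2.0)  # difference relative to the pair's mean
        for a, b in zip(vocalic, vocalic[1:])
    ]
    return 100.0 * mean(terms)
```

    Dividing by the mean (VarcoV) or by each pair's local mean (nPVI-V) is what makes these metrics less sensitive to overall speech rate, which is why the abstract describes VarcoV as rate-normalized.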

  15. Selecting Disorder-Specific Features for Speech Pathology Fingerprinting.

    PubMed

    Berisha, Visar; Sandoval, Steven; Utianski, Rene; Liss, Julie; Spanias, Andreas

    2013-01-01

    The general aim of this work is to learn a unique statistical signature for the state of a particular speech pathology. We pose this as a speaker identification problem for dysarthric individuals. To that end, we propose a novel algorithm for feature selection that aims to minimize the effects of speaker-specific features (e.g., fundamental frequency) and maximize the effects of pathology-specific features (e.g., vocal tract distortions and speech rhythm). We derive a cost function for optimizing feature selection that simultaneously trades off between these two competing criteria. Furthermore, we develop an efficient algorithm that optimizes this cost function and test the algorithm on a set of 34 dysarthric and 13 healthy speakers. Results show that the proposed method yields a set of features related to the speech disorder and not an individual's speaking style. When compared to other feature-selection algorithms, the proposed approach results in an improvement in a disorder fingerprinting task by selecting features that are specific to the disorder. PMID:25005047

  16. Selecting Disorder-Specific Features for Speech Pathology Fingerprinting

    PubMed Central

    Berisha, Visar; Sandoval, Steven; Utianski, Rene; Liss, Julie; Spanias, Andreas

    2014-01-01

    The general aim of this work is to learn a unique statistical signature for the state of a particular speech pathology. We pose this as a speaker identification problem for dysarthric individuals. To that end, we propose a novel algorithm for feature selection that aims to minimize the effects of speaker-specific features (e.g., fundamental frequency) and maximize the effects of pathology-specific features (e.g., vocal tract distortions and speech rhythm). We derive a cost function for optimizing feature selection that simultaneously trades off between these two competing criteria. Furthermore, we develop an efficient algorithm that optimizes this cost function and test the algorithm on a set of 34 dysarthric and 13 healthy speakers. Results show that the proposed method yields a set of features related to the speech disorder and not an individual's speaking style. When compared to other feature-selection algorithms, the proposed approach results in an improvement in a disorder fingerprinting task by selecting features that are specific to the disorder. PMID:25005047

  17. DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1

    NASA Astrophysics Data System (ADS)

    Garofolo, J. S.; Lamel, L. F.; Fisher, W. M.; Fiscus, J. G.; Pallett, D. S.

    1993-02-01

    The Texas Instruments/Massachusetts Institute of Technology (TIMIT) corpus of read speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems. TIMIT contains speech from 630 speakers representing 8 major dialect divisions of American English, each speaking 10 phonetically-rich sentences. The TIMIT corpus includes time-aligned orthographic, phonetic, and word transcriptions, as well as speech waveform data for each spoken sentence. The release of TIMIT contains several improvements over the Prototype CD-ROM released in December, 1988: (1) full 630-speaker corpus, (2) checked and corrected transcriptions, (3) word-alignment transcriptions, (4) NIST SPHERE-headered waveform files and header manipulation software, (5) phonemic dictionary, (6) new test and training subsets balanced for dialectal and phonetic coverage, and (7) more extensive documentation.

  18. Acoustic properties of naturally produced clear speech at normal speaking rates

    NASA Astrophysics Data System (ADS)

    Krause, Jean C.; Braida, Louis D.

    2004-01-01

    Sentences spoken "clearly" are significantly more intelligible than those spoken "conversationally" for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.

  19. Formant Centralization Ratio: A Proposal for a New Acoustic Measure of Dysarthric Speech

    ERIC Educational Resources Information Center

    Sapir, Shimon; Ramig, Lorraine O.; Spielman, Jennifer L.; Fox, Cynthia

    2010-01-01

    Purpose: The vowel space area (VSA) has been used as an acoustic metric of dysarthric speech, but with varying degrees of success. In this study, the authors aimed to test an alternative metric to the VSA--the "formant centralization ratio" (FCR), which is hypothesized to more effectively differentiate dysarthric from healthy speech and register…
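
    For orientation, the formant centralization ratio is commonly given as a ratio of corner-vowel formants, (F2u + F2a + F1i + F1u) / (F2i + F1a). The sketch below assumes that definition and formant values in Hz; the function and parameter names are illustrative and do not come from the truncated abstract.

```python
def formant_centralization_ratio(f1_i, f2_i, f1_u, f2_u, f1_a, f2_a):
    """FCR from the first two formants (Hz) of the corner vowels /i/, /u/, /a/.

    The numerator holds formants that rise as vowels centralize and the
    denominator holds formants that fall, so the ratio grows with centralization.
    """
    return (f2_u + f2_a + f1_i + f1_u) / (f2_i + f1_a)
```

    Because it is a ratio of formants from the same speaker, the measure is less sensitive to overall vocal tract scaling than an area in Hz squared would be.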

  20. Speech feature discrimination in deaf children following cochlear implantation

    NASA Astrophysics Data System (ADS)

    Bergeson, Tonya R.; Pisoni, David B.; Kirk, Karen Iler

    2002-05-01

    Speech feature discrimination is a fundamental perceptual skill that is often assumed to underlie word recognition and sentence comprehension performance. To investigate the development of speech feature discrimination in deaf children with cochlear implants, we conducted a retrospective analysis of results from the Minimal Pairs Test (Robbins et al., 1988) selected from patients enrolled in a longitudinal study of speech perception and language development. The MP test uses a 2AFC procedure in which children hear a word and select one of two pictures (bat-pat). All 43 children were prelingually deafened, received a cochlear implant before 6 years of age or between ages 6 and 9, and used either oral or total communication. Children were tested once every 6 months to 1 year for 7 years; not all children were tested at each interval. By 2 years postimplant, the majority of these children achieved near-ceiling levels of discrimination performance for vowel height, vowel place, and consonant manner. Most of the children also achieved plateaus but did not reach ceiling performance for consonant place and voicing. The relationship between speech feature discrimination, spoken word recognition, and sentence comprehension will be discussed. [Work supported by NIH/NIDCD Research Grant No. R01DC00064 and NIH/NIDCD Training Grant No. T32DC00012.]

  1. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians.

    PubMed

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J; Adatia, Ian

    2016-01-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% versus 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and could be used to screen for PH and encourage earlier specialist referral. PMID:27609672
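
    The classification step described here (one Gaussian mixture model per class, scored on cepstral feature frames) can be sketched as follows. This is a generic illustration, not the study's implementation: feature extraction is assumed to have happened already, the mixtures are assumed to have diagonal covariances, and the model parameters are supplied by the caller rather than trained.

```python
import math

def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(frames, weights, means, variances):
    """Total log-likelihood of all frames under one mixture model."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + diag_gauss_logpdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        peak = max(comps)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total

def classify(frames, gmm_ph, gmm_normal):
    """Pick the class whose mixture explains the recording better.
    Each model is a (weights, means, variances) tuple."""
    if gmm_loglik(frames, *gmm_ph) > gmm_loglik(frames, *gmm_normal):
        return "PH"
    return "normal"
```

    Summing frame log-likelihoods treats frames as independent, which is a standard simplification for this kind of classifier.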

  2. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians

    PubMed Central

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

    2016-01-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% versus 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and could be used to screen for PH and encourage earlier specialist referral. PMID:27609672

  3. A Frame-Based Context-Dependent Acoustic Modeling for Speech Recognition

    NASA Astrophysics Data System (ADS)

    Terashima, Ryuta; Zen, Heiga; Nankaku, Yoshihiko; Tokuda, Keiichi

    We propose a novel acoustic model for speech recognition, named the FCD (Frame-based Context-Dependent) model. It obtains a probability distribution by using a top-down clustering technique that simultaneously considers the local frame position in the phoneme, the phoneme duration, and the phoneme context. The model topology is derived by connecting left-to-right HMMs without self-loop transitions for each phoneme duration. Because the FCD model can change the probability distribution into a sequence corresponding to one phoneme duration, it is able to generate a smooth trajectory of speech feature vectors. We also performed an experiment to evaluate the speech recognition performance of the model. In the experiment, 132 questions for frame position, 66 questions for phoneme duration and 134 questions for phoneme context were used to train the sub-phoneme FCD model. For comparison, a left-to-right HMM and two types of HSMM with almost the same number of states were also trained. As a result, the FCD model achieved a relative improvement of 18% in triphone accuracy.

  4. Feature extraction and models for speech: An overview

    NASA Astrophysics Data System (ADS)

    Schroeder, Manfred

    2002-11-01

    Modeling of speech has a long history, beginning with Count von Kempelen's 1770 mechanical speaking machine. Even then human vowel production was seen as resulting from a source (the vocal cords) driving a physically separate resonator (the vocal tract). Homer Dudley's 1928 frequency-channel vocoder and many of its descendants are based on the same successful source-filter paradigm. For linguistic studies as well as practical applications in speech recognition, compression, and synthesis (see M. R. Schroeder, Computer Speech), the extant models require the (often difficult) extraction of numerous parameters such as the fundamental and formant frequencies and various linguistic distinctive features. Some of these difficulties were obviated by the introduction of linear predictive coding (LPC) in 1967 in which the filter part is an all-pole filter, reflecting the fact that for non-nasalized vowels the vocal tract is well approximated by an all-pole transfer function. In the now ubiquitous code-excited linear prediction (CELP), the source-part is replaced by a code book which (together with a perceptual error criterion) permits speech compression to very low bit rates at high speech quality for the Internet and cell phones.
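
    The all-pole filter mentioned above can be written out in the standard textbook form. The symbols below (predictor order p, coefficients a_k, gain G, residual e[n]) are the usual notation and are not taken from the abstract itself.

```latex
% Linear prediction: each sample is approximated from the previous p samples.
\hat{s}[n] = \sum_{k=1}^{p} a_k \, s[n-k], \qquad e[n] = s[n] - \hat{s}[n]

% Equivalent all-pole synthesis filter driven by the source (excitation) signal.
H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k \, z^{-k}}
```

    In this picture the residual e[n] plays the role of the source, and CELP replaces it with an entry selected from a code book.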

  5. The Use of Artificial Neural Networks to Estimate Speech Intelligibility from Acoustic Variables: A Preliminary Analysis.

    ERIC Educational Resources Information Center

    Metz, Dale Evan; And Others

    1992-01-01

    A preliminary scheme for estimating the speech intelligibility of hearing-impaired speakers from acoustic parameters, using a computerized artificial neural network to process mathematically the acoustic input variables, is outlined. Tests with 60 hearing-impaired speakers found the scheme to be highly accurate in identifying speakers separated by…

  6. A maximum likelihood approach to estimating articulator positions from speech acoustics

    SciTech Connect

    Hogden, J.

    1996-09-23

    This proposal presents an algorithm called maximum likelihood continuity mapping (MALCOM) which recovers the positions of the tongue, jaw, lips, and other speech articulators from measurements of the sound-pressure waveform of speech. MALCOM differs from other techniques for recovering articulator positions from speech in three critical respects: it does not require training on measured or modeled articulator positions, it does not rely on any particular model of sound propagation through the vocal tract, and it recovers a mapping from acoustics to articulator positions that is linearly, not topographically, related to the actual mapping from acoustics to articulation. The approach categorizes short-time windows of speech into a finite number of sound types, and assumes the probability of using any articulator position to produce a given sound type can be described by a parameterized probability density function. MALCOM then uses maximum likelihood estimation techniques to: (1) find the most likely smooth articulator path given a speech sample and a set of distribution functions (one distribution function for each sound type), and (2) change the parameters of the distribution functions to better account for the data. Using this technique improves the accuracy of articulator position estimates compared to continuity mapping -- the only other technique that learns the relationship between acoustics and articulation solely from acoustics. The technique has potential application to computer speech recognition, speech synthesis and coding, teaching the hearing impaired to speak, improving foreign language instruction, and teaching dyslexics to read. 34 refs., 7 figs.

  7. Language Comprehension in Language-Learning Impaired Children Improved with Acoustically Modified Speech

    NASA Astrophysics Data System (ADS)

    Tallal, Paula; Miller, Steve L.; Bedi, Gail; Byma, Gary; Wang, Xiaoqin; Nagarajan, Srikantan S.; Schreiner, Christoph; Jenkins, William M.; Merzenich, Michael M.

    1996-01-01

    A speech processing algorithm was developed to create more salient versions of the rapidly changing elements in the acoustic waveform of speech that have been shown to be deficiently processed by language-learning impaired (LLI) children. LLI children received extensive daily training, over a 4-week period, with listening exercises in which all speech was translated into this synthetic form. They also received daily training with computer "games" designed to adaptively drive improvements in temporal processing thresholds. Significant improvements in speech discrimination and language comprehension abilities were demonstrated in two independent groups of LLI children.

  8. Discrimination of environmental background noise in presence of speech using sample-pairs statistics based features

    NASA Astrophysics Data System (ADS)

    Jhanwar, D.; Sharma, Kamlesh K.; Modani, S. G.

    2015-09-01

    A methodology to discriminate the different classes of background noise using new features based on samples of the signal is presented here. Two consecutive samples of different amplitude of the discrete-time signal are termed a sample-pair, and 14 types of sample-pairs are considered here as fundamental features. Simulation results show that the counts of some of these types of sample-pairs, as well as the counts of a few combinations of two, three and four such sample-pairs, are useful to detect and discriminate the different acoustic noises mixed with speech signals. On the basis of the simulation results, the proposed features performed better than spectral features like Mel Frequency Cepstral Coefficients (MFCC), Spectral Centroid, Spectral Flux and Spectral Roll-off regarding discrimination capability, simplicity of the extraction process and lower dependency on the speech utterances mixed with the noise. These sample-pair-based features have the advantage of not requiring frame decomposition and silence period removal. Their discrimination capabilities are shown using Fisher's F-ratio as a performance index. A multiclass Support Vector Machine (SVM) is used as the classifier.

  9. A magnetic resonance imaging study on the articulatory and acoustic speech parameters of Malay vowels

    PubMed Central

    2014-01-01

    The phonetic properties of six Malay vowels are investigated using magnetic resonance imaging (MRI) to visualize the vocal tract in order to obtain dynamic articulatory parameters during speech production. To resolve image blurring due to the tongue movement during the scanning process, a method based on active contour extraction is used to track tongue contours. The proposed method efficiently tracks tongue contours despite the partial blurring of MRI images. Consequently, the articulatory parameters are effectively measured as tongue movement is observed, and the specific shape of the tongue and its position for all six uttered Malay vowels are determined. Speech rehabilitation procedures demand some kind of visually perceivable prototype of speech articulation. To investigate the validity of the measured articulatory parameters based on the acoustic theory of speech production, an acoustic analysis of the vowels uttered by the subjects was performed. When the acoustic and articulatory parameters of the uttered speech were examined, a correlation between formant frequencies and articulatory parameters was observed. The experiments showed a positive correlation between the constriction location of the tongue body and the first formant frequency, as well as a negative correlation between the constriction location of the tongue tip and the second formant frequency. The results demonstrate that the proposed method is an effective tool for the dynamic study of speech production. PMID:25060583

  10. The acoustics for speech of eight auditoriums in the city of Sao Paulo

    NASA Astrophysics Data System (ADS)

    Bistafa, Sylvio R.

    2002-11-01

    Eight auditoriums with a proscenium type of stage, which usually operate as dramatic theaters in the city of Sao Paulo, were acoustically surveyed in terms of their adequacy for unassisted speech. Reverberation times, early decay times, and speech levels were measured in different positions, together with objective measures of speech intelligibility. The measurements revealed reverberation time values rather uniform throughout the rooms, whereas significant variations were found in the values of the other acoustical measures with position. The early decay time was found to be better correlated with the objective measures of speech intelligibility than the reverberation time. The results from the objective measurements of speech intelligibility revealed that the speech transmission index STI, and its simplified version RaSTI, are strongly correlated with the early-to-late sound ratio C50 (1 kHz). However, it was found that the criterion value of acceptability of the latter is more easily met than the former. The results from these measurements make it possible to understand how the characteristics of the architectural design determine the acoustical quality for speech. Measurements of ST1-Gade were made as an attempt to validate it as an objective measure of "support" for the actor. Preliminary diagnostic results from ray tracing simulations will also be presented.
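
    The early-to-late ratio C50 referred to here compares the energy arriving in the first 50 ms of a room impulse response with the energy arriving afterwards, expressed in decibels. The sketch below is a minimal broadband version under stated assumptions: the impulse response is a list of samples starting at the arrival of the direct sound, and no 1 kHz octave-band filtering is applied (the survey reports the 1 kHz band). The function name is illustrative.

```python
import math

def clarity_c50(impulse_response, sample_rate_hz):
    """Early-to-late energy ratio in dB with a 50 ms split point.

    Assumes the response starts at the direct sound and that some
    energy remains after 50 ms (otherwise the ratio is undefined).
    """
    split = int(round(0.050 * sample_rate_hz))
    early = sum(h * h for h in impulse_response[:split])
    late = sum(h * h for h in impulse_response[split:])
    return 10.0 * math.log10(early / late)
```

    A positive value means more energy arrives early than late, which generally favors speech intelligibility.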

  11. Audio-video feature correlation: faces and speech

    NASA Astrophysics Data System (ADS)

    Durand, Gwenael; Montacie, Claude; Caraty, Marie-Jose; Faudemay, Pascal

    1999-08-01

    This paper presents a study of the correlation of features automatically extracted from the audio stream and the video stream of audiovisual documents. In particular, we were interested in finding out whether speech analysis tools could be combined with face detection methods, and to what extent they should be combined. A generic audio signal partitioning algorithm was first used to detect Silence/Noise/Music/Speech segments in a full length movie. A generic object detection method was applied to the keyframes extracted from the movie in order to detect the presence or absence of faces. The correlation between the presence of a face in the keyframes and of the corresponding voice in the audio stream was studied. A third stream, which is the script of the movie, is warped on the speech channel in order to automatically label faces appearing in the keyframes with the name of the corresponding character. We naturally found that extracted audio and video features were related in many cases, and that significant benefits can be obtained from the joint use of audio and video analysis methods.

  12. Speech and melody recognition in binaurally combined acoustic and electric hearing

    NASA Astrophysics Data System (ADS)

    Kong, Ying-Yee; Stickney, Ginger S.; Zeng, Fan-Gang

    2005-03-01

    Speech recognition in noise and music perception are especially challenging for current cochlear implant users. The present study utilizes the residual acoustic hearing in the nonimplanted ear in five cochlear implant users to elucidate the role of temporal fine structure at low frequencies in auditory perception and to test the hypothesis that combined acoustic and electric hearing produces better performance than either mode alone. The first experiment measured speech recognition in the presence of competing noise. It was found that, although the residual low-frequency (<1000 Hz) acoustic hearing produced essentially no speech recognition in noise, it significantly enhanced performance when combined with the electric hearing. The second experiment measured melody recognition in the same group of subjects and found that, contrary to the speech recognition result, the low-frequency acoustic hearing produced significantly better performance than the electric hearing. It is hypothesized that listeners with combined acoustic and electric hearing might use the correlation between the salient pitch in low-frequency acoustic hearing and the weak pitch in the envelope to enhance segregation between signal and noise. The present study suggests the importance and urgency of accurately encoding the fine-structure cue in cochlear implants.

  13. Acoustic Features of Palilalia: A Case Study

    ERIC Educational Resources Information Center

    Borsel, John Van; Bontinck, Charlotte; Coryn, Marleen; Paemeleire, Frank; Vandemaele, Pieter

    2007-01-01

    While a number of authors have suggested that patients with palilalia typically show a tendency to repeat words or phrases with an increasing rate, others maintain that an accelerating speech rate is not essential. The present paper reports the results of an instrumental analysis of the reiterations in a 60-year-old man with palilalia. Results…

  14. Mathematical model of acoustic speech production with mobile walls of the vocal tract

    NASA Astrophysics Data System (ADS)

    Lyubimov, N. A.; Zakharov, E. V.

    2016-03-01

    A mathematical speech production model is considered that describes acoustic oscillation propagation in a vocal tract with mobile walls. The wave field function satisfies the Helmholtz equation with boundary conditions of the third kind (impedance type). The impedance mode corresponds to a three-parameter pendulum oscillation model. The experimental research demonstrates the nonlinear character of how the mobility of the vocal tract walls influences the spectral envelope of a speech signal.
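
    For readers unfamiliar with the terminology, a generic form of such a boundary value problem is sketched below. It is not taken from the paper: the time convention e^{-i omega t}, the outward normal n, and the reading of the three pendulum parameters as wall mass m, damping r and stiffness kappa per unit area are all assumptions made for illustration.

```latex
% Helmholtz equation for the pressure amplitude p in the tract volume \Omega
\Delta p + k^{2} p = 0 \quad \text{in } \Omega, \qquad k = \omega / c

% Third-kind (impedance) condition on the yielding wall \Gamma,
% with air density \rho and wall impedance Z(\omega)
\frac{\partial p}{\partial n} - \frac{i \omega \rho}{Z(\omega)} \, p = 0 \quad \text{on } \Gamma

% Impedance of a mass-damper-spring wall element under the e^{-i\omega t} convention
Z(\omega) = r - i \left( \omega m - \frac{\kappa}{\omega} \right)
```

    A rigid wall corresponds to the limit of infinite impedance, where the condition reduces to a vanishing normal derivative.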

  15. Precategorical Acoustic Storage and the Perception of Speech

    ERIC Educational Resources Information Center

    Frankish, Clive

    2008-01-01

    Theoretical accounts of both speech perception and of short term memory must consider the extent to which perceptual representations of speech sounds might survive in relatively unprocessed form. This paper describes a novel version of the serial recall task that can be used to explore this area of shared interest. In immediate recall of digit…

  16. Vowel Acoustics in Adults with Apraxia of Speech

    ERIC Educational Resources Information Center

    Jacks, Adam; Mathes, Katey A.; Marquardt, Thomas P.

    2010-01-01

    Purpose: To investigate the hypothesis that vowel production is more variable in adults with acquired apraxia of speech (AOS) relative to healthy individuals with unimpaired speech. Vowel formant frequency measures were selected as the specific target of focus. Method: Seven adults with AOS and aphasia produced 15 repetitions of 6 American English…

  17. Acoustic properties of vowels in clear and conversational speech by female non-native English speakers

    NASA Astrophysics Data System (ADS)

    Li, Chi-Nin; So, Connie K.

    2005-04-01

    Studies have shown that talkers can improve the intelligibility of their speech when instructed to speak as if talking to a hearing-impaired person. The improvement of speech intelligibility is associated with specific acoustic-phonetic changes: increases in vowel duration and fundamental frequency (F0), a wider pitch range, and a shift in formant frequencies for F1 and F2. Most previous studies of clear speech production have been conducted with native speakers; research with second language speakers is much less common. The present study examined the acoustic properties of non-native English vowels produced in a clear speaking style. Five female Cantonese speakers and a comparison group of English speakers were recorded producing four vowels (/i u ae a/) in /bVt/ context in conversational and clear speech. Vowel durations, F0, pitch range, and the first two formants for each of the four vowels were measured. Analyses revealed that for both groups of speakers, vowel durations, F0, pitch range, and F1 spoken clearly were greater than those produced conversationally. However, F2 was higher in conversational speech than in clear speech. The findings suggest that female non-native English speakers exhibit acoustic-phonetic patterns similar to those of native speakers when asked to produce English vowels clearly.

  18. Speech privacy and annoyance considerations in the acoustic environment of passenger cars of high-speed trains.

    PubMed

    Jeon, Jin Yong; Hong, Joo Young; Jang, Hyung Suk; Kim, Jae Hyeon

    2015-12-01

    It is necessary to consider not only annoyance of interior noises but also speech privacy to achieve acoustic comfort in a passenger car of a high-speed train because speech from other passengers can be annoying. This study aimed to explore an optimal acoustic environment to satisfy speech privacy and reduce annoyance in a passenger car. Two experiments were conducted using speech sources and compartment noise of a high speed train with varying speech-to-noise ratios (SNRA) and background noise levels (BNL). Speech intelligibility was tested in experiment I, and in experiment II, perceived speech privacy, annoyance, and acoustic comfort of combined sounds with speech and background noise were assessed. The results show that speech privacy and annoyance were significantly influenced by the SNRA. In particular, the acoustic comfort was evaluated as acceptable when the SNRA was less than -6 dB for both speech privacy and noise annoyance. In addition, annoyance increased significantly as the BNL exceeded 63 dBA, whereas the effect of the background-noise level on the speech privacy was not significant. These findings suggest that an optimal level of interior noise in a passenger car might exist between 59 and 63 dBA, taking normal speech levels into account. PMID:26723351

  19. Perceiving speech in context: Compensation for contextual variability during acoustic cue encoding and categorization

    NASA Astrophysics Data System (ADS)

    Toscano, Joseph Christopher

    Several fundamental questions about speech perception concern how listeners understand spoken language despite considerable variability in speech sounds across different contexts (the problem of lack of invariance in speech). This contextual variability is caused by several factors, including differences between individual talkers' voices, variation in speaking rate, and effects of coarticulatory context. A number of models have been proposed to describe how the speech system handles differences across contexts. Critically, these models make different predictions about (1) whether contextual variability is handled at the level of acoustic cue encoding or categorization, (2) whether it is driven by feedback from category-level processes or interactions between cues, and (3) whether listeners discard fine-grained acoustic information to compensate for contextual variability. Separating the effects of cue- and category-level processing has been difficult because behavioral measures tap processes that occur well after initial cue encoding and are influenced by task demands and linguistic information. Recently, we have used the event-related brain potential (ERP) technique to examine cue encoding and online categorization. Specifically, we have looked at differences in the auditory N1 as a measure of acoustic cue encoding and the P3 as a measure of categorization. This allows us to examine multiple levels of processing during speech perception and can provide a useful tool for studying effects of contextual variability. Here, I apply this approach to determine the point in processing at which context has an effect on speech perception and to examine whether acoustic cues are encoded continuously. Several types of contextual variability (talker gender, speaking rate, and coarticulation), as well as several acoustic cues (voice onset time, formant frequencies, and bandwidths), are examined in a series of experiments. 
    The results suggest that (1) at early stages of speech…

  20. Acoustical features of two Mayan monuments at Chichen Itza: Accident or design?

    NASA Astrophysics Data System (ADS)

    Lubman, David

    2002-11-01

    Chichen Itza dominated the early postclassic Maya world, ca. 900-1200 C.E. Two of its colossal monuments, the Great Ball Court and the temple of Kukulkan, reflect the sophisticated, hybrid culture of a Mexicanized Maya civilization. The architecture seems intended for ceremony and ritual drama. Deducing ritual practices will advance the understanding of a lost civilization, but what took place there is largely unknown. Perhaps acoustical science can add value. Unexpected and unusual acoustical features can be interpreted as intriguing clues or irrelevant accidents. Acoustical advocates believe that, when combined with an understanding of the Maya worldview, acoustical features can provide unique insights into how the Maya designed and used theater spaces. At Chichen Itza's monuments, sound reinforcement features improve rulers' and priests' ability to address large crowds, and Ball Court whispering galleries permit speech communication over unexpectedly large distances. Handclaps at Kukulkan stimulate chirps that mimic a revered bird ("Kukul"), thus reinforcing cultic beliefs. A ball striking a playing field wall stimulates flutter echoes at the Great Ball Court; their strength and duration arguably had dramatic, mythic, and practical significance. Interpretations of the possible mythic, magic, and political significance of sound phenomena at these Maya monuments strongly suggest intentional design.

  1. Study of Harmonics-to-Noise Ratio and Critical-Band Energy Spectrum of Speech as Acoustic Indicators of Laryngeal and Voice Pathology

    NASA Astrophysics Data System (ADS)

    Shama, Kumara; Krishna, Anantha; Cholayya, Niranjan U.

    2006-12-01

    Acoustic analysis of speech signals is a noninvasive technique that has been proved to be an effective tool for the objective support of vocal and voice disease screening. In the present study acoustic analysis of sustained vowels is considered. A simple k-means nearest neighbor classifier is designed to test the efficacy of a harmonics-to-noise ratio (HNR) measure and the critical-band energy spectrum of the voiced speech signal as tools for the detection of laryngeal pathologies. It groups the given voice signal sample into pathologic and normal. The voiced speech signal is decomposed into harmonic and noise components using an iterative signal extrapolation algorithm. The HNRs at four different frequency bands are estimated and used as features. Voiced speech is also filtered with 21 critical-bandpass filters that mimic the human auditory neurons. Normalized energies of these filter outputs are used as another set of features. The results obtained have shown that the HNR and the critical-band energy spectrum can be used to correlate laryngeal pathology and voice alteration, using previously classified voice samples. This method could be an additional acoustic indicator that supplements the clinical diagnostic features for voice evaluation.
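
    As a rough illustration of the HNR feature described above, the sketch below computes a full-band harmonics-to-noise ratio in dB from components that have already been separated. The decomposition itself (the iterative signal extrapolation algorithm) and the four-band split are not reproduced here; the function name `hnr_db` and the toy signals are assumptions made for illustration only.

```python
import math

def hnr_db(harmonic, noise):
    """Harmonics-to-noise ratio in dB from two already-separated components.

    Both arguments are sequences of samples; the ratio is taken between
    their energies (sums of squared samples).
    """
    e_h = sum(x * x for x in harmonic)
    e_n = sum(x * x for x in noise)
    if e_n == 0:
        return math.inf       # no noise energy at all
    if e_h == 0:
        return -math.inf      # no harmonic energy at all
    return 10.0 * math.log10(e_h / e_n)

# Toy data (hypothetical): one second of a 100 Hz sinusoid at 8 kHz as the
# harmonic part, and a small alternating sequence standing in for the noise.
fs = 8000
harmonic = [math.sin(2 * math.pi * 100 * n / fs) for n in range(fs)]
noise = [0.1 * (-1) ** n for n in range(fs)]
print(round(hnr_db(harmonic, noise), 2))
```

    A per-band version would apply the same ratio to band-limited copies of the two components.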

  2. Speech in ALS: Longitudinal Changes in Lips and Jaw Movements and Vowel Acoustics

    PubMed Central

    Yunusova, Yana; Green, Jordan R.; Lindstrom, Mary J.; Pattee, Gary L.; Zinman, Lorne

    2015-01-01

    Purpose The goal of this exploratory study was to investigate longitudinally the changes in facial kinematics, vowel formant frequencies, and speech intelligibility in individuals diagnosed with bulbar amyotrophic lateral sclerosis (ALS). This study was motivated by the need to understand articulatory and acoustic changes with disease progression and their subsequent effect on deterioration of speech in ALS. Method Lip and jaw movements and vowel acoustics were obtained for four individuals with bulbar ALS during four consecutive recording sessions with an average interval of three months between recordings. Participants read target words embedded into sentences at a comfortable speaking rate. Maximum vertical and horizontal mouth opening and maximum jaw displacements were obtained during corner vowels. First and second formant frequencies were measured for each vowel. Speech intelligibility and speaking rate score were obtained for each session as well. Results Transient, non-vowel-specific changes in kinematics of the jaw and lips were observed. Kinematic changes often preceded changes in vowel acoustics and speech intelligibility. Conclusions Nonlinear changes in speech kinematics should be considered in evaluation of the disease effects on jaw and lip musculature. Kinematic measures might be most suitable for early detection of changes associated with bulbar ALS.

  3. Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation.

    PubMed

    Kong, Ying-Yee; Donaldson, Gail; Somarowthu, Ala

    2015-05-01

    Low-frequency acoustic cues have been shown to improve speech perception in cochlear-implant listeners. However, the mechanisms underlying this benefit are still not well understood. This study investigated the extent to which low-frequency cues can facilitate listeners' use of linguistic knowledge in simulated electric-acoustic stimulation (EAS). Experiment 1 examined differences in the magnitude of EAS benefit at the phoneme, word, and sentence levels. Speech materials were processed via noise-channel vocoding and lowpass (LP) filtering. The amount of spectral degradation in the vocoder speech was varied by applying different numbers of vocoder channels. Normal-hearing listeners were tested on vocoder-alone, LP-alone, and vocoder + LP conditions. Experiment 2 further examined factors that underlie the context effect on EAS benefit at the sentence level by limiting the low-frequency cues to temporal envelope and periodicity (AM + FM). Results showed that EAS benefit was greater for higher-context than for lower-context speech materials even when the LP ear received only low-frequency AM + FM cues. Possible explanations for the greater EAS benefit observed with higher-context materials may lie in the interplay between perceptual and expectation-driven processes for EAS speech recognition, and/or the band-importance functions for different types of speech materials. PMID:25994712

  4. Contributions of electric and acoustic hearing to bimodal speech and music perception.

    PubMed

    Crew, Joseph D; Galvin, John J; Landsberger, David M; Fu, Qian-Jie

    2015-01-01

    Cochlear implant (CI) users have difficulty understanding speech in noisy listening conditions and perceiving music. Aided residual acoustic hearing in the contralateral ear can mitigate these limitations. The present study examined contributions of electric and acoustic hearing to speech understanding in noise and melodic pitch perception. Data was collected with the CI only, the hearing aid (HA) only, and both devices together (CI+HA). Speech reception thresholds (SRTs) were adaptively measured for simple sentences in speech babble. Melodic contour identification (MCI) was measured with and without a masker instrument; the fundamental frequency of the masker was varied to be overlapping or non-overlapping with the target contour. Results showed that the CI contributes primarily to bimodal speech perception and that the HA contributes primarily to bimodal melodic pitch perception. In general, CI+HA performance was slightly improved relative to the better ear alone (CI-only) for SRTs but not for MCI, with some subjects experiencing a decrease in bimodal MCI performance relative to the better ear alone (HA-only). Individual performance was highly variable, and the contribution of either device to bimodal perception was both subject- and task-dependent. The results suggest that individualized mapping of CIs and HAs may further improve bimodal speech and music perception. PMID:25790349
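
    The abstract does not say which adaptive rule was used for the SRTs. As a minimal sketch, assuming a simple 1-down/1-up staircase (which tracks the 50%-correct point), the SNR is lowered after a correct response and raised after an incorrect one, and the SRT is estimated as the mean SNR at the reversal points. The function name, step size, and the deterministic toy listener are all hypothetical.

```python
def staircase_srt(respond, start_snr=10.0, step=2.0, n_reversals=6, max_trials=200):
    """Estimate a speech reception threshold with a 1-down/1-up staircase.

    respond(snr) -> True if the listener repeats the sentence correctly.
    Returns the mean SNR (dB) over the recorded reversal points.
    """
    snr = start_snr
    last_direction = 0          # -1 = moving down, +1 = moving up, 0 = no move yet
    reversals = []
    for _ in range(max_trials):
        direction = -1 if respond(snr) else 1
        if last_direction != 0 and direction != last_direction:
            reversals.append(snr)           # the track changed direction here
            if len(reversals) == n_reversals:
                break
        last_direction = direction
        snr += direction * step
    if not reversals:
        raise ValueError("no reversals recorded; check start_snr and step")
    return sum(reversals) / len(reversals)

# Toy listener (assumption): always correct at or above -3 dB SNR, never below.
srt = staircase_srt(lambda snr: snr >= -3.0)
print(srt)
```

    A real listener responds probabilistically, so the track wanders around the threshold rather than alternating cleanly as it does with this toy listener.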

  5. Contributions of Electric and Acoustic Hearing to Bimodal Speech and Music Perception

    PubMed Central

    Crew, Joseph D.; Galvin III, John J.; Landsberger, David M.; Fu, Qian-Jie

    2015-01-01

    Cochlear implant (CI) users have difficulty understanding speech in noisy listening conditions and perceiving music. Aided residual acoustic hearing in the contralateral ear can mitigate these limitations. The present study examined contributions of electric and acoustic hearing to speech understanding in noise and melodic pitch perception. Data was collected with the CI only, the hearing aid (HA) only, and both devices together (CI+HA). Speech reception thresholds (SRTs) were adaptively measured for simple sentences in speech babble. Melodic contour identification (MCI) was measured with and without a masker instrument; the fundamental frequency of the masker was varied to be overlapping or non-overlapping with the target contour. Results showed that the CI contributes primarily to bimodal speech perception and that the HA contributes primarily to bimodal melodic pitch perception. In general, CI+HA performance was slightly improved relative to the better ear alone (CI-only) for SRTs but not for MCI, with some subjects experiencing a decrease in bimodal MCI performance relative to the better ear alone (HA-only). Individual performance was highly variable, and the contribution of either device to bimodal perception was both subject- and task-dependent. The results suggest that individualized mapping of CIs and HAs may further improve bimodal speech and music perception. PMID:25790349

  6. Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation

    PubMed Central

    Kong, Ying-Yee; Donaldson, Gail; Somarowthu, Ala

    2015-01-01

    Low-frequency acoustic cues have been shown to improve speech perception in cochlear-implant listeners. However, the mechanisms underlying this benefit are still not well understood. This study investigated the extent to which low-frequency cues can facilitate listeners' use of linguistic knowledge in simulated electric-acoustic stimulation (EAS). Experiment 1 examined differences in the magnitude of EAS benefit at the phoneme, word, and sentence levels. Speech materials were processed via noise-channel vocoding and lowpass (LP) filtering. The amount of spectral degradation in the vocoder speech was varied by applying different numbers of vocoder channels. Normal-hearing listeners were tested on vocoder-alone, LP-alone, and vocoder + LP conditions. Experiment 2 further examined factors that underlie the context effect on EAS benefit at the sentence level by limiting the low-frequency cues to temporal envelope and periodicity (AM + FM). Results showed that EAS benefit was greater for higher-context than for lower-context speech materials even when the LP ear received only low-frequency AM + FM cues. Possible explanations for the greater EAS benefit observed with higher-context materials may lie in the interplay between perceptual and expectation-driven processes for EAS speech recognition, and/or the band-importance functions for different types of speech materials. PMID:25994712

  7. Acoustic and perceptual evaluation of category goodness of /t/ and /k/ in typical and misarticulated children's speech.

    PubMed

    Strömbergsson, Sofia; Salvi, Giampiero; House, David

    2015-06-01

    This investigation explores perceptual and acoustic characteristics of children's successful and unsuccessful productions of /t/ and /k/, with a specific aim of exploring perceptual sensitivity to phonetic detail, and the extent to which this sensitivity is reflected in the acoustic domain. Recordings were collected from 4- to 8-year-old children with a speech sound disorder (SSD) who misarticulated one of the target plosives, and compared to productions recorded from peers with typical speech development (TD). Perceptual responses were registered on a visual-analog scale, ranging from "clear [t]" to "clear [k]." Statistical models of prototypical productions were built, based on spectral moments and discrete cosine transform features, and used in the scoring of SSD productions. In the perceptual evaluation, "clear substitutions" were rated as less prototypical than correct productions. Moreover, target-appropriate productions of /t/ and /k/ produced by children with SSD were rated as less prototypical than those produced by TD peers. The acoustical modeling could to a large extent discriminate between the gross categories /t/ and /k/, and scored the SSD utterances on a continuous scale that was largely consistent with the category of production. However, none of the methods exhibited the same sensitivity to phonetic detail as the human listeners. PMID:26093431

  8. Disordered speech disrupts conversational entrainment: a study of acoustic-prosodic entrainment and communicative success in populations with communication challenges.

    PubMed

    Borrie, Stephanie A; Lubold, Nichola; Pon-Barry, Heather

    2015-01-01

    Conversational entrainment, a pervasive communication phenomenon in which dialogue partners adapt their behaviors to align more closely with one another, is considered essential for successful spoken interaction. While well-established in other disciplines, this phenomenon has received limited attention in the field of speech pathology and the study of communication breakdowns in clinical populations. The current study examined acoustic-prosodic entrainment, as well as a measure of communicative success, in three distinctly different dialogue groups: (i) healthy native vs. healthy native speakers (Control), (ii) healthy native vs. foreign-accented speakers (Accented), and (iii) healthy native vs. dysarthric speakers (Disordered). Dialogue group comparisons revealed significant differences in how the groups entrain on particular acoustic-prosodic features, including pitch, intensity, and jitter. Most notably, the Disordered dialogues were characterized by significantly less acoustic-prosodic entrainment than the Control dialogues. Further, a positive relationship between entrainment indices and communicative success was identified. These results suggest that the study of conversational entrainment in speech pathology will have essential implications for both scientific theory and clinical application in this domain. PMID:26321996

  9. Classifying acoustic signals into phoneme categories: average and dyslexic readers make use of complex dynamical patterns and multifractal scaling properties of the speech signal

    PubMed Central

    2015-01-01

    Several competing aetiologies of developmental dyslexia suggest that the problems with acquiring literacy skills are causally entailed by low-level auditory and/or speech perception processes. The purpose of this study is to evaluate the diverging claims about the specific deficient perceptual processes under conditions of strong inference. Theoretically relevant acoustic features were extracted from a set of artificial speech stimuli that lie on a /bAk/-/dAk/ continuum. The features were tested on their ability to enable a simple classifier (Quadratic Discriminant Analysis) to reproduce the observed classification performance of average and dyslexic readers in a speech perception experiment. The ‘classical’ features examined were based on component process accounts of developmental dyslexia such as the supposed deficit in Envelope Rise Time detection and the deficit in the detection of rapid changes in the distribution of energy in the frequency spectrum (formant transitions). Studies examining these temporal processing deficit hypotheses do not employ measures that quantify the temporal dynamics of stimuli. It is shown that measures based on quantification of the dynamics of complex, interaction-dominant systems (Recurrence Quantification Analysis and the multifractal spectrum) enable QDA to classify the stimuli almost identically as observed in dyslexic and average reading participants. It seems unlikely that participants used any of the features that are traditionally associated with accounts of (impaired) speech perception. The nature of the variables quantifying the temporal dynamics of the speech stimuli implies that the classification of speech stimuli cannot be regarded as a linear aggregate of component processes that each parse the acoustic signal independent of one another, as is assumed by the ‘classical’ aetiologies of developmental dyslexia. It is suggested that the results imply that the differences in speech perception performance between

  10. Classifying acoustic signals into phoneme categories: average and dyslexic readers make use of complex dynamical patterns and multifractal scaling properties of the speech signal.

    PubMed

    Hasselman, Fred

    2015-01-01

    Several competing aetiologies of developmental dyslexia suggest that the problems with acquiring literacy skills are causally entailed by low-level auditory and/or speech perception processes. The purpose of this study is to evaluate the diverging claims about the specific deficient perceptual processes under conditions of strong inference. Theoretically relevant acoustic features were extracted from a set of artificial speech stimuli that lie on a /bAk/-/dAk/ continuum. The features were tested on their ability to enable a simple classifier (Quadratic Discriminant Analysis) to reproduce the observed classification performance of average and dyslexic readers in a speech perception experiment. The 'classical' features examined were based on component process accounts of developmental dyslexia such as the supposed deficit in Envelope Rise Time detection and the deficit in the detection of rapid changes in the distribution of energy in the frequency spectrum (formant transitions). Studies examining these temporal processing deficit hypotheses do not employ measures that quantify the temporal dynamics of stimuli. It is shown that measures based on quantification of the dynamics of complex, interaction-dominant systems (Recurrence Quantification Analysis and the multifractal spectrum) enable QDA to classify the stimuli almost identically as observed in dyslexic and average reading participants. It seems unlikely that participants used any of the features that are traditionally associated with accounts of (impaired) speech perception. The nature of the variables quantifying the temporal dynamics of the speech stimuli implies that the classification of speech stimuli cannot be regarded as a linear aggregate of component processes that each parse the acoustic signal independent of one another, as is assumed by the 'classical' aetiologies of developmental dyslexia. It is suggested that the results imply that the differences in speech perception performance between average and
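
    Quadratic Discriminant Analysis fits one Gaussian per class, each with its own variance, and assigns a stimulus to the class with the highest posterior score. The following is a minimal one-dimensional sketch, assuming a single scalar feature per stimulus; the feature values, labels, and function names are made up for illustration and do not reproduce the study's features or data.

```python
import math

def fit_qda_1d(samples_by_class):
    """Fit a per-class mean, variance and prior for a single scalar feature."""
    total = sum(len(values) for values in samples_by_class.values())
    model = {}
    for label, values in samples_by_class.items():
        mean = sum(values) / len(values)
        var = sum((x - mean) ** 2 for x in values) / (len(values) - 1)
        model[label] = (mean, var, len(values) / total)
    return model

def predict_qda_1d(model, x):
    """Return the label with the largest quadratic discriminant score."""
    def score(params):
        mean, var, prior = params
        # log N(x | mean, var) + log prior, dropping the constant -0.5*log(2*pi)
        return -0.5 * math.log(var) - (x - mean) ** 2 / (2 * var) + math.log(prior)
    return max(model, key=lambda label: score(model[label]))

# Hypothetical feature values for two response categories.
train = {
    "bAk": [0.9, 1.1, 1.0, 1.3, 0.7],
    "dAk": [2.5, 3.5, 3.0, 4.0, 2.0],
}
model = fit_qda_1d(train)
print(predict_qda_1d(model, 1.2), predict_qda_1d(model, 3.2))
```

    Because each class keeps its own variance, the decision boundary is quadratic in the feature rather than linear, which is what separates QDA from linear discriminant analysis.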

  11. Noise Robust Feature Scheme for Automatic Speech Recognition Based on Auditory Perceptual Mechanisms

    NASA Astrophysics Data System (ADS)

    Cai, Shang; Xiao, Yeming; Pan, Jielin; Zhao, Qingwei; Yan, Yonghong

    Mel Frequency Cepstral Coefficients (MFCC) are the most popular acoustic features used in automatic speech recognition (ASR), mainly because the coefficients capture the most useful information of the speech and fit well with the assumptions used in hidden Markov models. As is well known, MFCCs already employ several principles which have known counterparts in the peripheral properties of human hearing: decoupling across frequency, mel-warping of the frequency axis, log-compression of energy, etc. It is natural to introduce more mechanisms in the auditory periphery to improve the noise robustness of MFCC. In this paper, a k-nearest neighbors based frequency masking filter is proposed to reduce the audibility of spectral valleys which are sensitive to noise. In addition, Moore and Glasberg's critical band equivalent rectangular bandwidth (ERB) expression is utilized to determine the filter bandwidth. Furthermore, a new bandpass infinite impulse response (IIR) filter is proposed to imitate the temporal masking phenomenon of the human auditory system. These three auditory perceptual mechanisms are combined with the standard MFCC algorithm in order to investigate their effects on ASR performance, and a revised MFCC extraction scheme is presented. Recognition performances with the standard MFCC, RASTA perceptual linear prediction (RASTA-PLP) and the proposed feature extraction scheme are evaluated on a medium-vocabulary isolated-word recognition task and a more complex large vocabulary continuous speech recognition (LVCSR) task. Experimental results show that consistent robustness against background noise is achieved on these two tasks, and the proposed method outperforms both the standard MFCC and RASTA-PLP.
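
    Two of the frequency-axis quantities mentioned above can be written down compactly. The abstract does not state which variants the authors used, so the sketch below assumes two widely used forms: the 2595·log10(1 + f/700) mel mapping and the Glasberg and Moore (1990) ERB expression, 24.7·(4.37·f/1000 + 1) with f in Hz. The function names are illustrative.

```python
import math

def hz_to_mel(f_hz):
    """Mel-scale warping, using the common 2595*log10(1 + f/700) form (assumed)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz, Glasberg & Moore (1990) form (assumed)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

# Print the warped frequency and auditory filter bandwidth at a few centre frequencies.
for f in (250, 1000, 4000):
    print(f, round(hz_to_mel(f), 1), round(erb_hz(f), 1))
```

    Both mappings grow more slowly than linearly in frequency resolution terms: filters become wider at high frequencies, which is the property a filter bank built on either scale inherits.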

  12. Prosodic Features and Speech Naturalness in Individuals with Dysarthria

    ERIC Educational Resources Information Center

    Klopfenstein, Marie I.

    2012-01-01

    Despite the importance of speech naturalness to treatment outcomes, little research has been done on what constitutes speech naturalness and how to best maximize naturalness in relationship to other treatment goals like intelligibility. In addition, previous literature alludes to the relationship between prosodic aspects of speech and speech…

  13. Influence of Architectural Features and Styles on Various Acoustical Measures in Churches

    NASA Astrophysics Data System (ADS)

    Carvalho, Antonio Pedro Oliveira De.

    This work reports on acoustical field measurements made in a major survey of 41 Catholic churches in Portugal that were built in the last 14 centuries. A series of monaural and binaural acoustical measurements was taken at multiple source/receiver positions in each church using the impulse response with noise burst method. The acoustical measures were Reverberation Time (RT), Early Decay Time (EDT), Clarity (C80), Definition (D), Center Time (TS), Loudness (L), Bass Ratios based on the Reverberation Time and Loudness (BR_RT and BR_L), Rapid Speech Transmission Index (RASTI), and the binaural Coherence (COH). The scope of this research is to investigate how the acoustical performance of Catholic churches relates to their architectural features and to determine simple formulas to predict acoustical measures by the use of elementary architectural parameters. Prediction equations were defined among the acoustical measures to estimate values at individual locations within each room as well as the mean values in each church. Best fits with R^2 ≈ 0.9 were not uncommon among many of the measures. Within and interchurch differences in the data for the acoustical measures were also analyzed. The variations of RT and EDT were identified as much smaller than the variations of the other measures. The churches tested were grouped in eight architectural styles, and the effect of their evolution through time on these acoustical measures was investigated. Statistically significant differences were found regarding some architectural styles that can be traced to historical changes in Church history, especially to the Reformation period. Prediction equations were defined to estimate mean acoustical measures by the use of fifteen simple architectural parameters. The use of the Sabine and Eyring reverberation time equations was tested. The effect of coupled spaces was analyzed, and a new algorithm for the application of the Sabine equation was developed, achieving an average of
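
    For reference, the two classical reverberation time equations named above can be sketched as follows, in metric units with the usual 0.161 s/m constant. The room dimensions and absorption coefficients are hypothetical, and the thesis's new algorithm for coupled spaces is not reproduced.

```python
import math

def sabine_rt60(volume_m3, surfaces):
    """Sabine: RT60 = 0.161 * V / A, where A = sum(S_i * alpha_i)."""
    absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / absorption

def eyring_rt60(volume_m3, surfaces):
    """Eyring: RT60 = 0.161 * V / (-S * ln(1 - mean_alpha))."""
    total_area = sum(area for area, _ in surfaces)
    mean_alpha = sum(area * alpha for area, alpha in surfaces) / total_area
    return 0.161 * volume_m3 / (-total_area * math.log(1.0 - mean_alpha))

# Hypothetical room: 1000 m^3 with 600 m^2 of hard surface (alpha = 0.05)
# and 100 m^2 of absorptive seating (alpha = 0.40).
room = [(600.0, 0.05), (100.0, 0.40)]
print(round(sabine_rt60(1000.0, room), 2), round(eyring_rt60(1000.0, room), 2))
```

    The Eyring estimate is always the shorter of the two for the same room, and the gap widens as the mean absorption coefficient increases.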

  14. Vowel Acoustics in Dysarthria: Speech Disorder Diagnosis and Classification

    ERIC Educational Resources Information Center

    Lansford, Kaitlin L.; Liss, Julie M.

    2014-01-01

    Purpose: The purpose of this study was to determine the extent to which vowel metrics are capable of distinguishing healthy from dysarthric speech and among different forms of dysarthria. Method: A variety of vowel metrics were derived from spectral and temporal measurements of vowel tokens embedded in phrases produced by 45 speakers with…

  15. Fundamental frequency is critical to speech perception in noise in combined acoustic and electric hearing

    PubMed Central

    Carroll, Jeff; Tiaden, Stephanie; Zeng, Fan-Gang

    2011-01-01

    Cochlear implant (CI) users have been shown to benefit from residual low-frequency hearing, specifically in pitch related tasks. It remains unclear whether this benefit is dependent on fundamental frequency (F0) or other acoustic cues. Three experiments were conducted to determine the role of F0, as well as its frequency modulated (FM) and amplitude modulated (AM) components, in speech recognition with a competing voice. In simulated CI listeners, the signal-to-noise ratio was varied to estimate the 50% correct response. Simulation results showed that the F0 cue contributes to a significant proportion of the benefit seen with combined acoustic and electric hearing, and additionally that this benefit is due to the FM rather than the AM component. In actual CI users, sentence recognition scores were collected with either the full F0 cue containing both the FM and AM components or the 500-Hz low-pass speech cue containing the F0 and additional harmonics. The F0 cue provided a benefit similar to the low-pass cue for speech in noise, but not in quiet. Poorer CI users benefited more from the F0 cue than better users. These findings suggest that F0 is critical to improving speech perception in noise in combined acoustic and electric hearing. PMID:21973360

  16. A Chimpanzee Recognizes Synthetic Speech With Significantly Reduced Acoustic Cues to Phonetic Content

    PubMed Central

    Heimbauer, Lisa A.; Beran, Michael J.; Owren, Michael J.

    2011-01-01

    Summary A long-standing debate concerns whether humans are specialized for speech perception [1–7], which some researchers argue is demonstrated by the ability to understand synthetic speech with significantly reduced acoustic cues to phonetic content [2–4,7]. We tested a chimpanzee (Pan troglodytes) that recognizes 128 spoken words [8,9], asking whether she could understand such speech. Three experiments presented 48 individual words, with the animal selecting a corresponding visuo-graphic symbol from among four alternatives. Experiment 1 tested spectrally reduced, noise-vocoded (NV) synthesis, originally developed to simulate input received by human cochlear-implant users [10]. Experiment 2 tested “impossibly unspeechlike” [3] sine-wave (SW) synthesis, which reduces speech to just three moving tones [11]. Although receiving only intermittent and non-contingent reward, the chimpanzee performed well above chance level, including when hearing synthetic versions for the first time. Recognition of SW words was least accurate, but improved in Experiment 3 when natural words in the same session were rewarded. The chimpanzee was more accurate with NV than SW versions, as were 32 human participants hearing these items. The chimpanzee's ability to spontaneously recognize acoustically reduced synthetic words suggests that experience rather than specialization is critical for speech-perception capabilities that some have suggested are uniquely human [12–14]. PMID:21723125

  17. From prosodic structure to acoustic saliency: A fMRI investigation of speech rate, clarity, and emphasis

    NASA Astrophysics Data System (ADS)

    Golfinopoulos, Elisa

    Acoustic variability in fluent speech can arise at many stages in speech production planning and execution. For example, at the phonological encoding stage, the grouping of phonemes into syllables determines which segments are coarticulated and, by consequence, segment-level acoustic variation. Likewise phonetic encoding, which determines the spatiotemporal extent of articulatory gestures, will affect the acoustic detail of segments. Functional magnetic resonance imaging (fMRI) was used to measure brain activity of fluent adult speakers in four speaking conditions: fast, normal, clear, and emphatic (or stressed) speech. These speech manner changes typically result in acoustic variations that do not change the lexical or semantic identity of productions but do affect the acoustic saliency of phonemes, syllables and/or words. Acoustic responses recorded inside the scanner were assessed quantitatively using eight acoustic measures and sentence duration was used as a covariate of non-interest in the neuroimaging analysis. Compared to normal speech, emphatic speech was characterized acoustically by a greater difference between stressed and unstressed vowels in intensity, duration, and fundamental frequency, and neurally by increased activity in right middle premotor cortex and supplementary motor area, and bilateral primary sensorimotor cortex. These findings are consistent with right-lateralized motor planning of prosodic variation in emphatic speech. Clear speech involved an increase in average vowel and sentence durations and average vowel spacing, along with increased activity in left middle premotor cortex and bilateral primary sensorimotor cortex. These findings are consistent with an increased reliance on feedforward control, resulting in hyper-articulation, under clear as compared to normal speech. Fast speech was characterized acoustically by reduced sentence duration and average vowel spacing, and neurally by increased activity in left anterior frontal

  18. Monaural speech intelligibility and detection in maskers with varying amounts of spectro-temporal speech features.

    PubMed

    Schubotz, Wiebke; Brand, Thomas; Kollmeier, Birger; Ewert, Stephan D

    2016-07-01

    Speech intelligibility is strongly affected by the presence of maskers. Depending on the spectro-temporal structure of the masker and its similarity to the target speech, different masking aspects can occur which are typically referred to as energetic, amplitude modulation, and informational masking. In this study speech intelligibility and speech detection were measured in maskers that vary systematically in the time-frequency domain from steady-state noise to a single interfering talker. Male and female target speech was used in combination with maskers based on speech for the same or different gender. Observed data were compared to predictions of the speech intelligibility index, extended speech intelligibility index, multi-resolution speech-based envelope-power-spectrum model, and the short-time objective intelligibility measure. The different models served as analysis tools to help distinguish between the different masking aspects. Comparison shows that overall masking can to a large extent be explained by short-term energetic masking. However, the other masking aspects (amplitude modulation and informational masking) influence speech intelligibility as well. Additionally, it was obvious that all models showed considerable deviations from the data. Therefore, the current study provides a benchmark for further evaluation of speech prediction models. PMID:27475175

  19. Time-expanded speech and speech recognition in older adults.

    PubMed

    Vaughan, Nancy E; Furukawa, Izumi; Balasingam, Nirmala; Mortz, Margaret; Fausti, Stephen A

    2002-01-01

    Speech understanding deficits are common in older adults. In addition to hearing sensitivity, changes in certain cognitive functions may affect speech recognition. One such change that may impact the ability to follow a rapidly changing speech signal is processing speed. When speakers slow the rate of their speech naturally in order to speak clearly, speech recognition is improved. The acoustic characteristics of naturally slowed speech are of interest in developing time-expansion algorithms to improve speech recognition for older listeners. In this study, we tested younger normally hearing, older normally hearing, and older hearing-impaired listeners on time-expanded speech using increased duration and increased intensity of unvoiced consonants. Although all groups performed best on unprocessed speech, performance with processed speech was better with the consonant gain feature without time expansion in the noise condition and better at the slowest time-expanded rate in the quiet condition. The effects of signal processing on speech recognition are discussed. PMID:17642020

  20. Acoustic-Phonetic Differences between Infant- and Adult-Directed Speech: The Role of Stress and Utterance Position

    ERIC Educational Resources Information Center

    Wang, Yuanyuan; Seidl, Amanda; Cristia, Alejandrina

    2015-01-01

    Previous studies have shown that infant-directed speech (IDS) differs from adult-directed speech (ADS) on a variety of dimensions. The aim of the current study was to investigate whether acoustic differences between IDS and ADS in English are modulated by prosodic structure. We compared vowels across the two registers (IDS, ADS) in both stressed…

  1. Speech after Radial Forearm Free Flap Reconstruction of the Tongue: A Longitudinal Acoustic Study of Vowel and Diphthong Sounds

    ERIC Educational Resources Information Center

    Laaksonen, Juha-Pertti; Rieger, Jana; Happonen, Risto-Pekka; Harris, Jeffrey; Seikaly, Hadi

    2010-01-01

    The purpose of this study was to use acoustic analyses to describe speech outcomes over the course of 1 year after radial forearm free flap (RFFF) reconstruction of the tongue. Eighteen Canadian English-speaking females and males with reconstruction for oral cancer had speech samples recorded (pre-operative, and 1 month, 6 months, and 1 year…

  2. An eighth-scale speech source for subjective assessments in acoustic models

    NASA Astrophysics Data System (ADS)

    Orlowski, R. J.

    1981-08-01

    The design of a source is described which is suitable for making speech recordings in eighth-scale acoustic models of auditoria. An attempt was made to match the directionality of the source with the directionality of the human voice using data reported in the literature. A narrow aperture was required for the design which was provided by mounting an inverted conical horn over the diaphragm of a high frequency loudspeaker. Resonance problems were encountered with the use of a horn and a description is given of the electronic techniques adopted to minimize the effect of these resonances. Subjective and objective assessments on the completed speech source have proved satisfactory. It has been used in a modelling exercise concerned with the acoustic design of a theatre with a thrust-type stage.

  3. Decoding spectrotemporal features of overt and covert speech from the human cortex

    PubMed Central

    Martin, Stéphanie; Brunner, Peter; Holdgraf, Chris; Heinze, Hans-Jochen; Crone, Nathan E.; Rieger, Jochem; Schalk, Gerwin; Knight, Robert T.; Pasley, Brian N.

    2014-01-01

    Auditory perception and auditory imagery have been shown to activate overlapping brain regions. We hypothesized that these phenomena also share a common underlying neural representation. To assess this, we used electrocorticography intracranial recordings from epileptic patients performing an out loud or a silent reading task. In these tasks, short stories scrolled across a video screen in two conditions: subjects read the same stories both aloud (overt) and silently (covert). In a control condition the subject remained in a resting state. We first built a high gamma (70–150 Hz) neural decoding model to reconstruct spectrotemporal auditory features of self-generated overt speech. We then evaluated whether this same model could reconstruct auditory speech features in the covert speech condition. Two speech models were tested: a spectrogram and a modulation-based feature space. For the overt condition, reconstruction accuracy was evaluated as the correlation between original and predicted speech features, and was significant in each subject (p < 10^-5; paired two-sample t-test). For the covert speech condition, dynamic time warping was first used to realign the covert speech reconstruction with the corresponding original speech from the overt condition. Reconstruction accuracy was then evaluated as the correlation between original and reconstructed speech features. Covert reconstruction accuracy was compared to the accuracy obtained from reconstructions in the baseline control condition. Reconstruction accuracy for the covert condition was significantly better than for the control condition (p < 0.005; paired two-sample t-test). The superior temporal gyrus, pre- and post-central gyrus provided the highest reconstruction information. The relationship between overt and covert speech reconstruction depended on anatomy. These results provide evidence that auditory representations of covert speech can be reconstructed from models that are built from an overt speech
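
    The realign-then-correlate evaluation described in this record can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-dimensional feature sequences, an absolute-difference local cost, and the classic three-step dynamic time warping recursion, and the function names (dtw_path, pearson, aligned_correlation) are hypothetical.

```python
import math

def dtw_path(a, b):
    """Classic DTW between two 1-D sequences; returns (total cost, alignment path)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            D[i][j] = local + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from the end to recover which samples were paired.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    path.reverse()
    return D[n][m], path

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

def aligned_correlation(original, reconstruction):
    """Warp the reconstruction onto the original, then correlate the paired samples."""
    _, path = dtw_path(original, reconstruction)
    xs = [original[i] for i, _ in path]
    ys = [reconstruction[j] for _, j in path]
    return pearson(xs, ys)

# Toy data: the "reconstruction" is a time-stretched copy of the original.
original = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
stretched = [0.0, 0.0, 1.0, 2.0, 2.0, 3.0, 2.0, 1.0, 0.0]
cost, _ = dtw_path(original, stretched)
r = aligned_correlation(original, stretched)
```

    In this toy case the stretched sequence differs from the original only in timing, so the warping cost is zero and the correlation after alignment is 1; real feature sequences would be multidimensional and noisy.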

  4. Disordered speech disrupts conversational entrainment: a study of acoustic-prosodic entrainment and communicative success in populations with communication challenges

    PubMed Central

    Borrie, Stephanie A.; Lubold, Nichola; Pon-Barry, Heather

    2015-01-01

    Conversational entrainment, a pervasive communication phenomenon in which dialogue partners adapt their behaviors to align more closely with one another, is considered essential for successful spoken interaction. While well-established in other disciplines, this phenomenon has received limited attention in the field of speech pathology and the study of communication breakdowns in clinical populations. The current study examined acoustic-prosodic entrainment, as well as a measure of communicative success, in three distinctly different dialogue groups: (i) healthy native vs. healthy native speakers (Control), (ii) healthy native vs. foreign-accented speakers (Accented), and (iii) healthy native vs. dysarthric speakers (Disordered). Dialogue group comparisons revealed significant differences in how the groups entrain on particular acoustic–prosodic features, including pitch, intensity, and jitter. Most notably, the Disordered dialogues were characterized by significantly less acoustic-prosodic entrainment than the Control dialogues. Further, a positive relationship between entrainment indices and communicative success was identified. These results suggest that the study of conversational entrainment in speech pathology will have essential implications for both scientific theory and clinical application in this domain. PMID:26321996

  5. Reverberant speech recognition combining deep neural networks and deep autoencoders augmented with a phone-class feature

    NASA Astrophysics Data System (ADS)

    Mimura, Masato; Sakai, Shinsuke; Kawahara, Tatsuya

    2015-12-01

    We propose an approach to reverberant speech recognition adopting deep learning in the front-end as well as back-end of a reverberant speech recognition system, and a novel method to improve the dereverberation performance of the front-end network using phone-class information. At the front-end, we adopt a deep autoencoder (DAE) for enhancing the speech feature parameters, and speech recognition is performed in the back-end using DNN-HMM acoustic models trained on multi-condition data. The system was evaluated through the ASR task in the Reverb Challenge 2014. The DNN-HMM system trained on the multi-condition training set achieved a conspicuously higher word accuracy compared to the MLLR-adapted GMM-HMM system trained on the same data. Furthermore, feature enhancement with the deep autoencoder contributed to the improvement of recognition accuracy especially in the more adverse conditions. While the mapping between reverberant and clean speech in DAE-based dereverberation is conventionally conducted only with the acoustic information, we presume the mapping is also dependent on the phone information. Therefore, we propose a new scheme (pDAE), which augments a phone-class feature to the standard acoustic features as input. Two types of the phone-class feature are investigated. One is the hard recognition result of monophones, and the other is a soft representation derived from the posterior outputs of monophone DNN. The augmented feature in either type results in a significant improvement (7-8% relative) from the standard DAE.

  6. Feature enhancement of reverberant speech by distribution matching and non-negative matrix factorization

    NASA Astrophysics Data System (ADS)

    Keronen, Sami; Kallasjoki, Heikki; Palomäki, Kalle J.; Brown, Guy J.; Gemmeke, Jort F.

    2015-12-01

    This paper describes a novel two-stage dereverberation feature enhancement method for noise-robust automatic speech recognition. In the first stage, an estimate of the dereverberated speech is generated by matching the distribution of the observed reverberant speech to that of clean speech, in a decorrelated transformation domain that has a long temporal context in order to address the effects of reverberation. The second stage uses this dereverberated signal as an initial estimate within a non-negative matrix factorization framework, which jointly estimates a sparse representation of the clean speech signal and an estimate of the convolutional distortion. The proposed feature enhancement method, when used in conjunction with automatic speech recognizer back-end processing, is shown to improve the recognition performance compared to three other state-of-the-art techniques.

  7. Are you a good mimic? Neuro-acoustic signatures for speech imitation ability

    PubMed Central

    Reiterer, Susanne M.; Hu, Xiaochen; Sumathi, T. A.; Singh, Nandini C.

    2013-01-01

    We investigated individual differences in speech imitation ability in late bilinguals using a neuro-acoustic approach. One hundred and thirty-eight German-English bilinguals matched on various behavioral measures were tested for “speech imitation ability” in a foreign language, Hindi, and categorized into “high” and “low ability” groups. Brain activations and speech recordings were obtained from 26 participants from the two extreme groups as they performed a functional neuroimaging experiment which required them to “imitate” sentences in three conditions: (A) German, (B) English, and (C) German with fake English accent. We used a recently developed novel acoustic analysis, namely the “articulation space”, as a metric to compare speech imitation abilities of the two groups. Across all three conditions, direct comparisons between the two groups revealed brain activations (FWE corrected, p < 0.05) that were more widespread with significantly higher peak activity in the left supramarginal gyrus and postcentral areas for the low ability group. The high ability group, on the other hand, showed significantly larger articulation space in all three conditions. In addition, articulation space also correlated positively with imitation ability (Pearson's r = 0.7, p < 0.01). Our results suggest that an expanded articulation space for high ability individuals allows access to a larger repertoire of sounds, thereby providing skilled imitators greater flexibility in pronunciation and language learning. PMID:24155739

  8. Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech

    PubMed Central

    Leong, Victoria; Goswami, Usha

    2015-01-01

    When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72–82% (freely-read CDS) and 90–98% (rhythmically-regular CDS) stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across

  9. Particle Swarm Optimization Based Feature Enhancement and Feature Selection for Improved Emotion Recognition in Speech and Glottal Signals

    PubMed Central

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features respectively. Three different emotional speech databases were utilized to gauge the proposed method. Extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature. PMID:25799141
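
    For readers unfamiliar with particle swarm optimization, the generic update rule can be sketched as follows. This is a textbook continuous PSO minimizing a toy objective, not the PSOC or WPSO procedures of the paper; the function name pso_minimize, the inertia and acceleration values, the search bounds, and the sphere objective are all illustrative assumptions.

```python
import random

def pso_minimize(objective, dim, n_particles=20, n_iters=100,
                 bounds=(-5.0, 5.0), inertia=0.7, c_cog=1.5, c_soc=1.5, seed=0):
    """Generic particle swarm minimizer (illustrative parameter values)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position; the swarm shares a global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cog * r1 * (pbest[i][d] - pos[i][d])
                             + c_soc * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Toy objective with its minimum (zero) at the origin."""
    return sum(xi * xi for xi in x)

best, best_val = pso_minimize(sphere, dim=2)
```

    A wrapper-style feature selector would replace the toy objective with a classifier's validation error evaluated on the feature subset that each particle encodes.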

  10. Particle swarm optimization based feature enhancement and feature selection for improved emotion recognition in speech and glottal signals.

    PubMed

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features respectively. Three different emotional speech databases were utilized to gauge the proposed method. Extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature. PMID:25799141

  11. The Effectiveness of Clear Speech as a Masker

    ERIC Educational Resources Information Center

    Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.

    2010-01-01

    Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…

  12. Segment-based acoustic models for continuous speech recognition

    NASA Astrophysics Data System (ADS)

    Ostendorf, Mari; Rohlicek, J. R.

    1993-07-01

    This research aims to develop new and more accurate stochastic models for speaker-independent continuous speech recognition, by extending previous work in segment-based modeling and by introducing a new hierarchical approach to representing intra-utterance statistical dependencies. These techniques, which are more costly than traditional approaches because of the large search space associated with higher order models, are made feasible through rescoring a set of HMM-generated N-best sentence hypotheses. We expect these different modeling techniques to result in improved recognition performance over that achieved by current systems, which handle only frame-based observations and assume that these observations are independent given an underlying state sequence. In the fourth quarter of the project, we have completed the following: (1) ported our recognition system to the Wall Street Journal task, a standard task in the ARPA community; (2) developed an initial dependency-tree model of intra-utterance observation correlation; and (3) implemented baseline language model estimation software. Our initial results on the Wall Street Journal task are quite good and represent significantly improved performance over most HMM systems reporting on the Nov. 1992 5k vocabulary test set.

  13. Voice-over: perceptual and acoustic analysis of vocal features.

    PubMed

    Medrado, Reny; Ferreira, Leslie Piccolotto; Behlau, Mara

    2005-09-01

    Voice-overs are professional voice users who use their voices to market products in the electronic media. The purposes of this study were to (1) analyze voice-overed and non-overed productions of an advertising text in two groups consisting of 10 male professional voice-overs and 10 male non-voice-overs; and (2) determine specific acoustic features of voice-over productions in both groups. A naïve group of listeners were engaged for the perceptual analysis of the recorded advertising text. The voice-overed production samples from both groups were submitted for analysis of acoustic and temporal features. The following parameters were analyzed: (1) the total text length, (2) the length of the three emphatic pauses, (3) values of the mean, (4) minimum, (5) maximum fundamental frequency, and (6) the semitone range. The majority of voice-overs and non-voice-overs were correctly identified by the listeners in both productions. However voice-overs were more consistently correctly identified than non-voice-overs. The total text length was greater for voice-overs. The pause time distribution was statistically more homogeneous for the voice-overs. The acoustic analysis indicated that the voice-overs had lower values of mean, minimum, and maximum fundamental frequency and a greater range of semitones. The voice-overs carry the voice-overed production features to their non-voice-overed production. PMID:16102662
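
    The semitone range listed among the analyzed parameters is conventionally derived from the minimum and maximum fundamental frequency as 12 times the base-2 logarithm of their ratio. The sketch below assumes that standard definition; the function name and the example frequencies are hypothetical and are not values from the study.

```python
import math

def semitone_range(f0_min_hz, f0_max_hz):
    """Pitch range in semitones between a minimum and a maximum F0 (both in Hz)."""
    if f0_min_hz <= 0 or f0_max_hz <= 0:
        raise ValueError("fundamental frequencies must be positive")
    # One octave (a doubling of frequency) spans 12 semitones.
    return 12.0 * math.log2(f0_max_hz / f0_min_hz)

# Illustrative values only: a speaker ranging from 100 Hz to 200 Hz spans one octave.
octave = semitone_range(100.0, 200.0)
```

    Because the measure depends only on the frequency ratio, it allows speakers with different mean fundamental frequencies to be compared on the same scale.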

  14. Is there a correlation between Japanese L2 learner's perception of English stressed words and acoustic features?

    NASA Astrophysics Data System (ADS)

    Asano, Keiko; Isei-Jakkola, Toshiko

    2003-10-01

    Is there a correlation between Japanese L2 learner's perception of English stressed words and acoustic features? [Keiko Asano (Yokohama National University, ll-ed@ynu.ac.jp) and Toshiko Isei-jaakkola (University of Helsinki)]. It is well known that the Japanese have weakness in listening to unstressed words in English, but there are fewer data on their perception of stressed words. Thus, the listening tests and the acoustic experiments were conducted in terms of (1) relevancy of difficulties depending on part of speech and their English proficiency, (2) the relationship between pitch and intensity of stressed words, and (3) if there is a correlation between their perception and experimental data. In the listening test, an English prose read by an American male speaker was used. The 150 Japanese L2 learners were assigned to mark the primary stressed words. The statistical results showed that there was a variance depending on part of speech and more markedly the comparative rating scores of correct words were highly correlated to the learner's English proficiency in any part of speech. In the acoustic experiments, pitch and intensity were measured. It was confirmed that (1) both F0 and dB carried the cue to perceive a stressed-word but they were not necessarily correlated, and (2) the relationship between F0 and dB might be compared only by relative movement. By further analyzing these acoustic data, prosodic combination of F0 and dB might be relevant to the correct ratios of part of speech.

  15. Dimensional feature weighting utilizing multiple kernel learning for single-channel talker location discrimination using the acoustic transfer function.

    PubMed

    Takashima, Ryoichi; Takiguchi, Tetsuya; Ariki, Yasuo

    2013-02-01

    This paper presents a method for discriminating the location of the sound source (talker) using only a single microphone. In a previous work, the single-channel approach for discriminating the location of the sound source was discussed, where the acoustic transfer function from a user's position is estimated by using a hidden Markov model of clean speech in the cepstral domain. In this paper, each cepstral dimension of the acoustic transfer function is newly weighted, in order to obtain the cepstral dimensions having information that is useful for classifying the user's position. Then, this paper proposes a feature-weighting method for the cepstral parameter using multiple kernel learning, defining the base kernels for each cepstral dimension of the acoustic transfer function. The user's position is trained and classified by support vector machine. The effectiveness of this method has been confirmed by sound source (talker) localization experiments performed in different room environments. PMID:23363107

  16. Effect of several acoustic cues on perceiving Mandarin retroflex affricates and fricatives in continuous speech.

    PubMed

    Zhu, Jian; Chen, Yaping

    2016-07-01

    Relatively little attention has been paid to the perception of the three-way contrast between unaspirated affricates, aspirated affricates and fricatives in Mandarin Chinese. This study reports two experiments that explore the acoustic cues relevant to the contrast between the Mandarin retroflex series /tʂ/, /tʂ(h)/ and /ʂ/ in continuous speech. Twenty participants performed two three-alternative forced-choice tasks, in which acoustic cues including closure, frication duration (FD), aspiration, and vocalic contexts (VCs) were systematically manipulated and presented in a carrier phrase. A subsequent classification tree analysis shows that FD distinguishes /tʂ/ from /tʂ(h)/ and /ʂ/, and that closure cues the affricate manner. Interactions between VC and individual cues are also found. The FD threshold for separating /ʂ/ and /tʂ/ is susceptible to the influence of the following vocalic segments, shifting to lower values if frication is followed by the low vowel /a/. On the other hand, while aspiration cues /tʂ(h)/ before /a/ and //, this acoustic cue is obscured by gesture continuation when /tʂ(h)/ precedes its homorganic approximant /ɻ/ in natural speech, which might cause potential confusion between /tʂ(h)/ and /ʂ/. PMID:27475170

  17. The role of metrical information in apraxia of speech. Perceptual and acoustic analyses of word stress.

    PubMed

    Aichert, Ingrid; Späth, Mona; Ziegler, Wolfram

    2016-02-01

    Several factors are known to influence speech accuracy in patients with apraxia of speech (AOS), e.g., syllable structure or word length. However, the impact of word stress has largely been neglected so far. More generally, the role of prosodic information at the phonetic encoding stage of speech production often remains unconsidered in models of speech production. This study aimed to investigate the influence of word stress on error production in AOS. Two-syllabic words with stress on the first (trochees) vs. the second syllable (iambs) were compared in 14 patients with AOS, three of them exhibiting pure AOS, and in a control group of six normal speakers. The patients produced significantly more errors on iambic than on trochaic words. A most prominent metrical effect was obtained for segmental errors. Acoustic analyses of word durations revealed a disproportionate advantage of the trochaic meter in the patients relative to the healthy controls. The results indicate that German apraxic speakers are sensitive to metrical information. It is assumed that metrical patterns function as prosodic frames for articulation planning, and that the regular metrical pattern in German, the trochaic form, has a facilitating effect on word production in patients with AOS. PMID:26792367

  18. Emphasis of short-duration acoustic speech cues for cochlear implant users.

    PubMed

    Vandali, A E

    2001-05-01

    A new speech-coding strategy for cochlear implant users, called the transient emphasis spectral maxima (TESM), was developed to aid perception of short-duration transient cues in speech. Speech-perception scores using the TESM strategy were compared to scores using the spectral maxima sound processor (SMSP) strategy in a group of eight adult users of the Nucleus 22 cochlear implant system. Significant improvements in mean speech-perception scores for the group were obtained on CNC open-set monosyllabic word tests in quiet (SMSP: 53.6% TESM: 61.3%, p<0.001), and on MUSL open-set sentence tests in multitalker noise (SMSP: 64.9% TESM: 70.6%, p<0.001). Significant increases were also shown for consonant scores in the word test (SMSP: 75.1% TESM: 80.6%, p<0.001) and for vowel scores in the word test (SMSP: 83.1% TESM: 85.7%, p<0.05). Analysis of consonant perception results from the CNC word tests showed that perception of nasal, stop, and fricative consonant discrimination was most improved. Information transmission analysis indicated that place of articulation was most improved, although improvements were also evident for manner of articulation. The increases in discrimination were shown to be related to improved coding of short-duration acoustic cues, particularly those of low intensity. PMID:11386557

  19. Modification of computational auditory scene analysis (CASA) for noise-robust acoustic feature

    NASA Astrophysics Data System (ADS)

    Kwon, Minseok

    While there have been many attempts to mitigate interference from background noise, the performance of automatic speech recognition (ASR) can still easily be degraded by various factors. However, normal hearing listeners can accurately perceive sounds of interest, which is believed to be a result of Auditory Scene Analysis (ASA). As a first attempt, the simulation of the human auditory processing, called computational auditory scene analysis (CASA), was fulfilled through physiological and psychological investigations of ASA. CASA comprised the Zilany-Bruce auditory model, followed by tracking fundamental frequency for voice segmentation and detecting pairs of onset/offset at each characteristic frequency (CF) for unvoiced segmentation. The resulting Time-Frequency (T-F) representation of acoustic stimulation was converted into an acoustic feature, gammachirp-tone frequency cepstral coefficients (GFCC). Eleven keywords with various environmental conditions were used and the robustness of GFCC was evaluated by spectral distance (SD) and dynamic time warping distance (DTW). In "clean" and "noisy" conditions, the application of CASA generally improved noise robustness of the acoustic feature compared to a conventional method with or without noise suppression using MMSE estimator. The initial study, however, not only showed the noise-type dependency at low SNR, but also called the evaluation methods into question. Some modifications were made to capture better spectral continuity from an acoustic feature matrix, to obtain faster processing speed, and to describe the human auditory system more precisely. The proposed framework includes: 1) multi-scale integration to capture more accurate continuity in feature extraction, 2) contrast enhancement (CE) of each CF by competition with neighboring frequency bands, and 3) auditory model modifications. The model modifications contain the introduction of higher Q factor, middle ear filter more analogous to human auditory system

  20. Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network

    SciTech Connect

    Wang Xiaojia; Mao Qirong; Zhan Yongzhao

    2008-11-06

    There are many emotion features. If all these features are employed to recognize emotions, redundant features may exist. Furthermore, the recognition result is unsatisfying and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on contribution analysis algorithm of NN is presented. The emotion features are selected by using contribution analysis algorithm of NN from the 95 extracted features. Cluster analysis is applied to analyze the effectiveness of the selected features, and the time of feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and the time of feature extraction.

  1. Feature Characteristics of Spontaneous Speech Production in Young Deaf Children.

    ERIC Educational Resources Information Center

    Geffner, Donna

    1980-01-01

    Sixty-five six-year-old deaf children from state supported schools were given an adaptation of the Goldman Fristoe test of articulation to assess their spontaneous speech production. Journal availability: Elsevier North Holland, Inc., 52 Vanderbilt Avenue, New York, NY 10017. (Author)

  2. Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System

    PubMed Central

    Partila, Pavol; Voznak, Miroslav; Tovarek, Jaromir

    2015-01-01

    The impact of the classification method and feature selection on speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the complexity of system computing. This step is necessary especially for systems that will be deployed in real-time applications. The reason for the development and improvement of speech emotion recognition systems is their wide usability in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. Classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture model is measured considering the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of the speech emotion recognition system due to its accuracy and efficiency. PMID:26346654

  3. The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene

    PubMed Central

    Rimmele, Johanna M.; Golumbic, Elana Zion; Schröger, Erich; Poeppel, David

    2015-01-01

    Attending to one speaker in multi-speaker situations is challenging. One neural mechanism proposed to underlie the ability to attend to a particular speaker is phase-locking of low-frequency activity in auditory cortex to speech’s temporal envelope (“speech-tracking”), which is more precise for attended speech. However, it is not known what brings about this attentional effect, and specifically if it reflects enhanced processing of the fine structure of attended speech. To investigate this question we compared attentional effects on speech-tracking of natural vs. vocoded speech which preserves the temporal envelope but removes the fine-structure of speech. Pairs of natural and vocoded speech stimuli were presented concurrently and participants attended to one stimulus and performed a detection task while ignoring the other stimulus. We recorded magnetoencephalography (MEG) and compared attentional effects on the speech-tracking response in auditory cortex. Speech-tracking of natural, but not vocoded, speech was enhanced by attention, whereas neural tracking of ignored speech was similar for natural and vocoded speech. These findings suggest that the more precise speech tracking of attended natural speech is related to processing its fine structure, possibly reflecting the application of higher-order linguistic processes. In contrast, when speech is unattended its fine structure is not processed to the same degree and thus elicits less precise speech tracking more similar to vocoded speech. PMID:25650107

  4. Advantages from bilateral hearing in speech perception in noise with simulated cochlear implants and residual acoustic hearing.

    PubMed

    Schoof, Tim; Green, Tim; Faulkner, Andrew; Rosen, Stuart

    2013-02-01

    Acoustic simulations were used to study the contributions of spatial hearing that may arise from combining a cochlear implant with either a second implant or contralateral residual low-frequency acoustic hearing. Speech reception thresholds (SRTs) were measured in twenty-talker babble. Spatial separation of speech and noise was simulated using a spherical head model. While low-frequency acoustic information contralateral to the implant simulation produced substantially better SRTs, there was no effect of spatial cues on SRT, even when interaural differences were artificially enhanced. Simulated bilateral implants showed a significant head shadow effect, but no binaural unmasking based on interaural time differences, and weak, inconsistent overall spatial release from masking. There was also a small but significant non-spatial summation effect. It appears that typical cochlear implant speech processing strategies may substantially reduce the utility of spatial cues, even in the absence of degraded neural processing arising from auditory deprivation. PMID:23363118

  5. Revisiting Neil Armstrong's Moon-Landing Quote: Implications for Speech Perception, Function Word Reduction, and Acoustic Ambiguity.

    PubMed

    Baese-Berk, Melissa M; Dilley, Laura C; Schmidt, Stephanie; Morrill, Tuuli H; Pitt, Mark A

    2016-01-01

    Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said "one small step for a man," instead of "one small step for man." What he said is unclear in part because function words like "a" can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of "for" and "for a" in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of "for" and "for a". The results suggest that the distributions of "for" and "for a" overlap substantially, both in terms of temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word "a" varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate. PMID:27603209

  6. Identifying fatigue crack geometric features from acoustic emission signals

    NASA Astrophysics Data System (ADS)

    Bao, Jingjing; Poddar, Banibrata; Giurgiutiu, Victor

    2016-04-01

    Acoustic emissions (AE) caused by the growth of fatigue cracks have been well studied by researchers. Conventional approaches are predominantly based on statistical analysis. In this study we focus on identifying geometric features of the crack from the AE signals using a physics-based approach. One of the main challenges of this approach is to develop a physics-of-materials-based understanding of the generation and propagation of acoustic emissions due to the growth of a fatigue crack. As the geometry changes due to the crack growth, so do the local vibration modes around the crack. Our aim is to understand these changing local vibration modes and find a possible relation between the AE signal features and the geometric features of the crack. Finite element (FE) analysis was used to model AE events due to fatigue crack growth. This was done using dipole excitation at the crack tips. Harmonic analysis was also performed on these FE models to understand the local vibration modes. An experimental study was carried out to verify these results. Piezoelectric wafer active sensors (PWAS) were used to excite a cracked specimen, and the local vibration modes were captured using laser Doppler vibrometry. The preliminary results show that the AE signals do carry information related to the crack geometry.

  7. Improved perception of speech in noise and Mandarin tones with acoustic simulations of harmonic coding for cochlear implants.

    PubMed

    Li, Xing; Nie, Kaibao; Imennov, Nikita S; Won, Jong Ho; Drennan, Ward R; Rubinstein, Jay T; Atlas, Les E

    2012-11-01

    Harmonic and temporal fine structure (TFS) information are important cues for speech perception in noise and music perception. However, due to the inherently coarse spectral and temporal resolution in electric hearing, the question of how to deliver harmonic and TFS information to cochlear implant (CI) users remains unresolved. A harmonic-single-sideband-encoder [(HSSE); Nie et al. (2008). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing; Li et al. (2010). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing] strategy has been proposed that explicitly tracks the harmonics in speech and transforms them into modulators conveying both amplitude modulation and fundamental frequency information. For unvoiced speech, HSSE transforms the TFS into a slowly varying yet still noise-like signal. To investigate its potential, four- and eight-channel vocoder simulations of HSSE and the continuous-interleaved-sampling (CIS) strategy were implemented, respectively. Using these vocoders, five normal-hearing subjects' speech recognition performance was evaluated under different masking conditions; another five normal-hearing subjects' Mandarin tone identification performance was also evaluated. Additionally, the neural discharge patterns evoked by HSSE- and CIS-encoded Mandarin tone stimuli were simulated using an auditory nerve model. All subjects scored significantly higher with HSSE than with CIS vocoders. The modeling analysis demonstrated that HSSE can convey temporal pitch cues better than CIS. Overall, the results suggest that HSSE is a promising strategy to enhance speech perception with CIs. PMID:23145619

  8. Speech intelligibility in rooms: Effect of prior listening exposure interacts with room acoustics.

    PubMed

    Zahorik, Pavel; Brandewie, Eugene J

    2016-07-01

    There is now converging evidence that a brief period of prior listening exposure to a reverberant room can influence speech understanding in that environment. Although the effect appears to depend critically on the amplitude modulation characteristic of the speech signal reaching the ear, the extent to which the effect may be influenced by room acoustics has not been thoroughly evaluated. This study seeks to fill this gap in knowledge by testing the effect of prior listening exposure or listening context on speech understanding in five different simulated sound fields, ranging from anechoic space to a room with broadband reverberation time (T60) of approximately 3 s. Although substantial individual variability in the effect was observed and quantified, the context effect was, on average, strongly room dependent. At threshold, the effect was minimal in anechoic space, increased to a maximum of 3 dB on average in moderate reverberation (T60 = 1 s), and returned to minimal levels again in high reverberation. This interaction suggests that the functional effects of prior listening exposure may be limited to sound fields with moderate reverberation (0.4 ≤ T60 ≤ 1 s). PMID:27475133

  9. Discrimination of Speech Stimuli Based on Neuronal Response Phase Patterns Depends on Acoustics But Not Comprehension

    PubMed Central

    Poeppel, David

    2010-01-01

    Speech stimuli give rise to neural activity in the listener that can be observed as waveforms using magnetoencephalography. Although waveforms vary greatly from trial to trial due to activity unrelated to the stimulus, it has been demonstrated that spoken sentences can be discriminated based on theta-band (3–7 Hz) phase patterns in single-trial response waveforms. Furthermore, manipulations of the speech signal envelope and fine structure that reduced intelligibility were found to produce correlated reductions in discrimination performance, suggesting a relationship between theta-band phase patterns and speech comprehension. This study investigates the nature of this relationship, hypothesizing that theta-band phase patterns primarily reflect cortical processing of low-frequency (<40 Hz) modulations present in the acoustic signal and required for intelligibility, rather than processing exclusively related to comprehension (e.g., lexical, syntactic, semantic). Using stimuli that are quite similar to normal spoken sentences in terms of low-frequency modulation characteristics but are unintelligible (i.e., their time-inverted counterparts), we find that discrimination performance based on theta-band phase patterns is equal for both types of stimuli. Consistent with earlier findings, we also observe that whereas theta-band phase patterns differ across stimuli, power patterns do not. We use a simulation model of the single-trial response to spoken sentence stimuli to demonstrate that phase-locked responses to low-frequency modulations of the acoustic signal can account not only for the phase but also for the power results. The simulation offers insight into the interpretation of the empirical results with respect to phase-resetting and power-enhancement models of the evoked response. PMID:20484530

  10. Features vs. Feelings: Dissociable representations of the acoustic features and valence of aversive sounds

    PubMed Central

    Kumar, Sukhbinder; von Kriegstein, Katharina; Friston, Karl; Griffiths, Timothy D

    2012-01-01

    This study addresses the neuronal representation of aversive sounds that are perceived as unpleasant. Functional magnetic resonance imaging (fMRI) in humans demonstrated responses in the amygdala and auditory cortex to aversive sounds. We show that the amygdala encodes both the acoustic features of a stimulus and its valence (perceived unpleasantness). Dynamic Causal Modelling (DCM) of this system revealed that evoked responses to sounds are relayed to the amygdala via auditory cortex. While acoustic features modulate effective connectivity from auditory cortex to the amygdala, the valence modulates the effective connectivity from amygdala to the auditory cortex. These results support a complex (recurrent) interaction between the auditory cortex and amygdala based on object-level analysis in the auditory cortex that portends the assignment of emotional valence in amygdala that in turn influences the representation of salient information in auditory cortex. PMID:23055488

  11. An acoustical assessment of pitch-matching accuracy in relation to speech frequency, speech frequency range, age and gender in preschool children

    NASA Astrophysics Data System (ADS)

    Trollinger, Valerie L.

    This study investigated the relationship between acoustical measurement of singing accuracy in relationship to speech fundamental frequency, speech fundamental frequency range, age and gender in preschool-aged children. Seventy subjects from Southeastern Pennsylvania; the San Francisco Bay Area, California; and Terre Haute, Indiana, participated in the study. Speech frequency was measured by having the subjects participate in spontaneous and guided speech activities with the researcher, with 18 diverse samples extracted from each subject's recording for acoustical analysis for fundamental frequency in Hz with the CSpeech computer program. The fundamental frequencies were averaged together to derive a mean speech frequency score for each subject. Speech range was calculated by subtracting the lowest fundamental frequency produced from the highest fundamental frequency produced, resulting in a speech range measured in increments of Hz. Singing accuracy was measured by having the subjects each echo-sing six randomized patterns using the pitches Middle C, D, E, F♯, G and A (440), using the solfege syllables of Do and Re, which were recorded by a 5-year-old female model. For each subject, 18 samples of singing were recorded. All samples were analyzed by the CSpeech for fundamental frequency. For each subject, deviation scores in Hz were derived by calculating the difference between what the model sang in Hz and what the subject sang in response in Hz. Individual scores for each child consisted of an overall mean total deviation frequency, mean frequency deviations for each pattern, and mean frequency deviation for each pitch. Pearson correlations, MANOVA and ANOVA analyses, Multiple Regressions and Discriminant Analysis revealed the following findings: (1) moderate but significant (p < .001) relationships emerged between mean speech frequency and the ability to sing the pitches E, F♯, G and A in the study; (2) mean speech frequency also emerged as the strongest

  12. Speech recognition in reverberant and noisy environments employing multiple feature extractors and i-vector speaker adaptation

    NASA Astrophysics Data System (ADS)

    Alam, Md Jahangir; Gupta, Vishwa; Kenny, Patrick; Dumouchel, Pierre

    2015-12-01

    The REVERB challenge provides a common framework for the evaluation of feature extraction techniques in the presence of both reverberation and additive background noise. State-of-the-art speech recognition systems perform well in controlled environments, but their performance degrades in realistic acoustical conditions, especially in real as well as simulated reverberant environments. In this contribution, we utilize multiple feature extractors including the conventional mel-filterbank, multi-taper spectrum estimation-based mel-filterbank, robust mel and compressive gammachirp filterbank, iterative deconvolution-based dereverberated mel-filterbank, and maximum likelihood inverse filtering-based dereverberated mel-frequency cepstral coefficient features for speech recognition with multi-condition training data. In order to improve speech recognition performance, we combine their results using ROVER (Recognizer Output Voting Error Reduction). For two- and eight-channel tasks, to benefit from the multi-channel data, we also use ROVER, instead of the multi-microphone signal processing method, to reduce word error rate by selecting the best scoring word at each channel. As in a previous work, we also apply i-vector-based speaker adaptation, which was found effective. In a speech recognition task, speaker adaptation tries to reduce mismatch between the training and test speakers. Speech recognition experiments are conducted on the REVERB challenge 2014 corpora using the Kaldi recognizer. In our experiments, we use both utterance-based batch processing and full batch processing. In the single-channel task, full batch processing reduced word error rate (WER) from 10.0 to 9.3 % on SimData as compared to utterance-based batch processing. Using full batch processing, we obtained an average WER of 9.0 and 23.4 % on the SimData and RealData, respectively, for the two-channel task, whereas for the eight-channel task on the SimData and RealData, the average WERs found were 8

  13. Joint Spatial-Spectral Feature Space Clustering for Speech Activity Detection from ECoG Signals

    PubMed Central

    Kanas, Vasileios G.; Mporas, Iosif; Benz, Heather L.; Sgarbas, Kyriakos N.; Bezerianos, Anastasios; Crone, Nathan E.

    2014-01-01

    Brain machine interfaces for speech restoration have been extensively studied for more than two decades. The success of such a system will depend in part on selecting the best brain recording sites and signal features corresponding to speech production. The purpose of this study was to detect speech activity automatically from electrocorticographic signals based on joint spatial-frequency clustering of the ECoG feature space. For this study, the ECoG signals were recorded while a subject performed two different syllable repetition tasks. We found that the optimal frequency resolution to detect speech activity from ECoG signals was 8 Hz, achieving 98.8% accuracy by employing support vector machines (SVM) as a classifier. We also defined the cortical areas that held the most information about the discrimination of speech and non-speech time intervals. Additionally, the results shed light on the distinct cortical areas associated with the two syllable repetition tasks and may contribute to the development of portable ECoG-based communication. PMID:24658248

  14. Discrimination and Comprehension of Synthetic Speech by Students with Visual Impairments: The Case of Similar Acoustic Patterns

    ERIC Educational Resources Information Center

    Papadopoulos, Konstantinos; Argyropoulos, Vassilios S.; Kouroupetroglou, Georgios

    2008-01-01

    This study examined the perceptions held by sighted students and students with visual impairments of the intelligibility and comprehensibility of similar acoustic patterns produced by synthetic speech. It determined the types of errors the students made and compared the performance of the two groups on auditory discrimination and comprehension.

  15. Transient Auditory Storage of Acoustic Details Is Associated with Release of Speech from Informational Masking in Reverberant Conditions

    ERIC Educational Resources Information Center

    Huang, Ying; Huang, Qiang; Chen, Xun; Wu, Xihong; Li, Liang

    2009-01-01

    Perceptual integration of the sound directly emanating from the source with reflections needs both temporal storage and correlation computation of acoustic details. We examined whether the temporal storage is frequency dependent and associated with speech unmasking. In Experiment 1, a break in correlation (BIC) between interaurally correlated…

  16. The Effect of Dynamic Acoustical Features on Musical Timbre

    NASA Astrophysics Data System (ADS)

    Hajda, John M.

    Timbre has been an important concept for scientific exploration of music at least since the time of Helmholtz ([1877] 1954). Since Helmholtz's time, a number of studies have defined and investigated acoustical features of musical instrument tones to determine their perceptual importance, or salience (e.g., Grey, 1975, 1977; Kendall, 1986; Kendall et al., 1999; Luce and Clark, 1965; McAdams et al., 1995, 1999; Saldanha and Corso, 1964; Wedin and Goude, 1972). Most of these studies have considered only nonpercussive, or continuant, tones of Western orchestral instruments (or emulations thereof). In the past few years, advances in computing power and programming have made possible and affordable the definition and control of new acoustical variables. This chapter gives an overview of past and current research, with a special emphasis on the time-variant aspects of musical timbre. According to common observation, "music is made of tones in time" (Spaeth, 1933). We will also consider the fact that music is made of "time in tones."

  17. Features of developmental dyspraxia in the general speech-impaired population?

    PubMed

    McCabe, P; Rosenthal, J B; McLeod, S

    1998-01-01

    A typical clinical population with speech impairment was investigated to determine the extent of the presence of features of developmental dyspraxia and its interaction with the severity of impairment. Thirty diagnostic features of developmental dyspraxia were identified from the post-1981 literature and two scales of severity were devised. First, the severity of these 30 features was measured (feature severity rating, FSR), and second, severity of speech impairment was based on percentage of consonants correct (PCC). Using these features and severity ratings, a retrospective file audit was conducted of 50 paediatric clients aged 2-8 years with impaired articulation or phonology. It was found that many characteristics regarded as diagnostic for developmental dyspraxia occur in the general speech-impaired population. The relationship between the variables was analysed, and support was found for the hypotheses that: (a) there is a relationship between the number of dyspraxic features expressed and the severity of impairment of speech production and (b) developmental dyspraxia is not characterized by severe impairment, but may occur in a range of severities from mild to severe. PMID:21434785

  18. Comments on "Effects of Noise on Speech Production: Acoustic and Perceptual Analyses" [J. Acoust. Soc. Am. 84, 917-928 (1988)].

    PubMed

    Fitch, H

    1989-11-01

    The effect of background noise on speech production is an important issue, both from the practical standpoint of developing speech recognition algorithms and from the theoretical standpoint of understanding how speech is tuned to the environment in which it is spoken. Summers et al. [J. Acoust. Soc. Am. 84, 917-928 (1988)] address this issue by experimentally manipulating the level of noise delivered through headphones to two talkers and making several kinds of acoustic measurements on the resulting speech. They indicate that they have replicated effects on amplitude, duration, and pitch and have found effects on spectral tilt and first-formant frequency (F1). The authors regard these acoustic changes as effects in themselves rather than as consequences of a change in vocal effort, and thus treat equally the change in spectral tilt and the change in F1. In fact, the change in spectral tilt is a well-documented and understood consequence of the change in the glottal waveform, which is known to occur with increased effort. The situation with F1 is less clear and is made difficult by measurement problems. The bias in linear predictive coding (LPC) techniques related to two of the other changes, fundamental frequency and spectral tilt, is discussed. PMID:2808931

  19. Comparing the effects of reverberation and of noise on speech recognition in simulated electric-acoustic listening

    PubMed Central

    Helms Tillery, Kate; Brown, Christopher A.; Bacon, Sid P.

    2012-01-01

    Cochlear implant users report difficulty understanding speech in both noisy and reverberant environments. Electric-acoustic stimulation (EAS) is known to improve speech intelligibility in noise. However, little is known about the potential benefits of EAS in reverberation, or about how such benefits relate to those observed in noise. The present study used EAS simulations to examine these questions. Sentences were convolved with impulse responses from a model of a room whose estimated reverberation times were varied from 0 to 1 sec. These reverberated stimuli were then vocoded to simulate electric stimulation, or presented as a combination of vocoder plus low-pass filtered speech to simulate EAS. Monaural sentence recognition scores were measured in two conditions: reverberated speech and speech in a reverberated noise. The long-term spectrum and amplitude modulations of the noise were equated to the reverberant energy, allowing a comparison of the effects of the interferer (speech vs noise). Results indicate that, at least in simulation, (1) EAS provides significant benefit in reverberation; (2) the benefits of EAS in reverberation may be underestimated by those in a comparable noise; and (3) the EAS benefit in reverberation likely arises from partially preserved cues in this background accessible via the low-frequency acoustic component. PMID:22280603
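
    The reverberation step described in this record (sentences convolved with room impulse responses) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and the geometric-decay impulse response is a toy assumption standing in for the room model, which the abstract does not specify.

```python
def convolve(signal, impulse_response):
    """Discrete linear convolution of two sample sequences (full length)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            # Each input sample contributes a scaled, delayed copy of the response.
            out[i + j] += s * h
    return out


def toy_impulse_response(n_taps, decay):
    """Hypothetical stand-in for a room response: taps decaying geometrically.

    A real room impulse response would come from a room model or a measurement.
    """
    return [decay ** k for k in range(n_taps)]


# A single click followed by silence acquires a decaying "tail" after convolution.
dry = [1.0, 0.0, 0.0, 0.0]
wet = convolve(dry, toy_impulse_response(3, 0.5))
```

    A longer or more slowly decaying response in this sketch corresponds loosely to a longer reverberation time, the quantity the study varied from 0 to 1 sec.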

  20. Acoustic Differences between Humorous and Sincere Communicative Intentions

    ERIC Educational Resources Information Center

    Hoicka, Elena; Gattis, Merideth

    2012-01-01

    Previous studies indicate that the acoustic features of speech discriminate between positive and negative communicative intentions, such as approval and prohibition. Two studies investigated whether acoustic features of speech can discriminate between two positive communicative intentions: humour and sweet-sincerity, where sweet-sincerity involved…

  1. Continuous perception and graded categorization: Electrophysiological evidence for a linear relationship between the acoustic signal and perceptual encoding of speech

    PubMed Central

    Toscano, Joseph C.; McMurray, Bob; Dennhardt, Joel; Luck, Steven J.

    2012-01-01

    Speech sounds are highly variable, yet listeners readily extract information from them and transform continuous acoustic signals into meaningful categories during language comprehension. A central question is whether perceptual encoding captures continuous acoustic detail in a one-to-one fashion or whether it is affected by categories. We addressed this in an event-related potential (ERP) experiment in which listeners categorized spoken words that varied along a continuous acoustic dimension (voice onset time; VOT) in an auditory oddball task. We found that VOT effects were present through a late stage of perceptual processing (N1 component, ca. 100 ms poststimulus) and were independent of categories. In addition, effects of within-category differences in VOT were present at a post-perceptual categorization stage (P3 component, ca. 450 ms poststimulus). Thus, at perceptual levels, acoustic information is encoded continuously, independent of phonological information. Further, at phonological levels, fine-grained acoustic differences are preserved along with category information. PMID:20935168

  2. Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels

    PubMed Central

    Caballero-Morales, Santiago-Omar

    2013-01-01

    An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). Then, estimation of the emotional state from a spoken sentence is performed by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, accuracy of 87–100% was achieved for the recognition of emotional state of Mexican Spanish speech. PMID:23935410
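
    The decision rule described in this record (counting emotion-specific vowels in the recognizer's output) can be sketched as follows. This is a minimal illustration, not the author's code: the label notation "a_anger" and the function name are hypothetical, and breaking ties by the order of the emotion list is an assumption, since the abstract does not say how ties are handled.

```python
from collections import Counter

# The four emotional states named in the abstract.
EMOTIONS = ("anger", "happiness", "neutral", "sadness")


def estimate_emotion(phonemes):
    """Return the emotion whose tagged vowels occur most often, or None.

    `phonemes` is a sequence of recognizer output labels. Vowels are assumed
    to carry an emotion suffix (hypothetical notation, e.g. "a_anger");
    consonants carry no suffix and are ignored.
    """
    counts = Counter()
    for label in phonemes:
        _, sep, tag = label.partition("_")
        if sep and tag in EMOTIONS:
            counts[tag] += 1
    if not counts:
        return None  # no emotion-specific vowel was recognized
    # Ties are resolved by the order of EMOTIONS (an assumption).
    return max(EMOTIONS, key=lambda emotion: counts[emotion])


# Two anger-tagged vowels outvote one sadness-tagged vowel.
decoded = ["k", "a_anger", "s", "a_anger", "e_sadness"]
result = estimate_emotion(decoded)
```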

  3. The influence of phonemic awareness development on acoustic cue weighting strategies in children's speech perception.

    PubMed

    Mayo, Catherine; Scobbie, James M; Hewlett, Nigel; Waters, Daphne

    2003-10-01

    In speech perception, children give particular patterns of weight to different acoustic cues (their cue weighting). These patterns appear to change with increased linguistic experience. Previous speech perception research has found a positive correlation between more analytical cue weighting strategies and the ability to consciously think about and manipulate segment-sized units (phonemic awareness). That research did not, however, aim to address whether the relation is in any way causal or, if so, then in which direction possible causality might move. Causality in this relation could move in 1 of 2 ways: Either phonemic awareness development could impact on cue weighting strategies or changes in cue weighting could allow for the later development of phonemic awareness. The aim of this study was to follow the development of these 2 processes longitudinally to determine which of the above 2 possibilities was more likely. Five-year-old children were tested 3 times in 7 months on their cue weighting strategies for a /so/-/ʃo/ contrast, in which the 2 cues manipulated were the frequency of fricative spectrum and the frequency of vowel-onset formant transitions. The children were also tested at the same time on their phoneme segmentation and phoneme blending skills. Results showed that phonemic awareness skills tended to improve before cue weighting changed and that early phonemic awareness ability predicted later cue weighting strategies. These results suggest that the development of metaphonemic awareness may play some role in changes in cue weighting. PMID:14575351

  4. Acoustic markers of prominence influence infants' and adults' segmentation of speech sequences.

    PubMed

    Bion, Ricardo A H; Benavides-Varela, Silvia; Nespor, Marina

    2011-03-01

    Two experiments investigated the way acoustic markers of prominence influence the grouping of speech sequences by adults and 7-month-old infants. In the first experiment, adults were familiarized with and asked to memorize sequences of adjacent syllables that alternated in either pitch or duration. During the test phase, participants heard pairs of syllables with constant pitch and duration and were asked whether the syllables had appeared adjacently during familiarization. Adults were better at remembering pairs of syllables that during familiarization had short syllables preceding long syllables, or high-pitched syllables preceding low-pitched syllables. In the second experiment, infants were familiarized and tested with stimuli similar to those in the first experiment, and their preference for pairs of syllables was assessed using the head-turn preference paradigm. When familiarized with syllables alternating in pitch, infants showed a preference to listen to pairs of syllables that had high pitch in the first syllable. However, no preference was found when the familiarization stream alternated in duration. It is proposed that these perceptual biases help infants and adults find linguistic units in the continuous speech stream. While the bias for grouping based on pitch appears early in development, biases for durational grouping might rely on more extensive linguistic experience. PMID:21524015

  5. Statistical evidence that musical universals derive from the acoustic characteristics of human speech

    NASA Astrophysics Data System (ADS)

    Schwartz, David; Howe, Catharine; Purves, Dale

    2003-04-01

    Listeners of all ages and societies produce a similar consonance ordering of chromatic scale tone combinations. Despite intense interest in this perceptual phenomenon over several millennia, it has no generally accepted explanation in physical, psychological, or physiological terms. Here we show that the musical universal of consonance ordering can be understood in terms of the statistical relationship between a pattern of sound pressure at the ear and the possible generative sources of the acoustic energy pattern. Since human speech is the principal naturally occurring source of tone-evoking (i.e., periodic) sound energy for human listeners, we obtained normalized spectra from more than 100000 recorded speech segments. The probability distribution of amplitude/frequency combinations derived from these spectra predicts both the fundamental frequency ratios that define the chromatic scale intervals and the consonance ordering of chromatic scale tone combinations. We suggest that these observations reveal the statistical character of the perceptual process by which the auditory system guides biologically successful behavior in response to inherently ambiguous sound stimuli.

  6. Speech Prosody Abnormalities and Specific Dimensional Schizotypy Features: Are Relationships Limited to Males?

    PubMed Central

    Bedwell, Jeffrey S.; Cohen, Alex S.; Trachik, Benjamin J.; Deptula, Andrew E.; Mitchell, Jonathan C.

    2014-01-01

    In schizophrenia, diminished vocal expressivity is associated with lower quality of life. Studies using computerized acoustic analysis of speech have found no evidence of diminished vocal prosody related to categorically-defined schizotypy, a subclinical analog of schizophrenia. However, existing studies have not examined the interaction between schizotypy and sex with vocal prosody measures. The current study examined 44 young adults (50% male) who were recruited to represent a continuous range of schizotypy. Speech samples were digitally recorded during autobiographical narratives and analyzed for prosody. In male participants, variability of fundamental frequency and variability of intensity were each negatively related to Schizotypal Personality Questionnaire (SPQ) Ideas of Reference subscale, while SPQ Suspiciousness was related to a greater number of utterances, and SPQ Odd Behavior was related to a greater number of pauses. As the relationships were restricted to males, and not significant in females, results may explain earlier negative findings with schizotypy. PMID:25198702

  7. Modeling of speech localization in a multi-talker mixture using periodicity and energy-based auditory features.

    PubMed

    Josupeit, Angela; Kopčo, Norbert; Hohmann, Volker

    2016-05-01

    A recent study showed that human listeners are able to localize a short speech target simultaneously masked by four speech tokens in reverberation [Kopčo, Best, and Carlile (2010). J. Acoust. Soc. Am. 127, 1450-1457]. Here, an auditory model for solving this task is introduced. The model has three processing stages: (1) extraction of the instantaneous interaural time difference (ITD) information, (2) selection of target-related ITD information ("glimpses") using a template-matching procedure based on periodicity, spectral energy, or both, and (3) target location estimation. The model performance was compared to the human data, and to the performance of a modified model using an ideal binary mask (IBM) at stage (2). The IBM-based model performed similarly to the subjects, indicating that the binaural model is able to accurately estimate source locations. Template matching using spectral energy and using a combination of spectral energy and periodicity achieved good results, while using periodicity alone led to poor results. Particularly, the glimpses extracted from the initial portion of the signal were critical for good performance. Simulation data show that the auditory features investigated here are sufficient to explain human performance in this challenging listening condition and thus may be used in models of auditory scene analysis. PMID:27250183

  8. Perception of Suprasegmental Features of Speech by Children with Cochlear Implants and Children with Hearing Aids

    ERIC Educational Resources Information Center

    Most, Tova; Peled, Miriam

    2007-01-01

    This study assessed perception of suprasegmental features of speech by 30 prelingual children with sensorineural hearing loss. Ten children had cochlear implants (CIs), and 20 children wore hearing aids (HA): 10 with severe hearing loss and 10 with profound hearing loss. Perception of intonation, syllable stress, word emphasis, and word pattern…

  9. Elicitation of the Acoustic Change Complex to Long-Duration Speech Stimuli in Four-Month-Old Infants.

    PubMed

    Chen, Ke Heng; Small, Susan A

    2015-01-01

    The acoustic change complex (ACC) is an auditory-evoked potential elicited to changes within an ongoing stimulus that indicates discrimination at the level of the auditory cortex. Only a few studies to date have attempted to record ACCs in young infants. The purpose of the present study was to investigate the elicitation of ACCs to long-duration speech stimuli in English-learning 4-month-old infants. ACCs were elicited to consonant contrasts made up of two concatenated speech tokens. The stimuli included native dental-dental /dada/ and dental-labial /daba/ contrasts and a nonnative Hindi dental-retroflex /daDa/ contrast. Each consonant-vowel speech token was 410 ms in duration. Slow cortical responses were recorded to the onset of the stimulus and to the acoustic change from /da/ to either /ba/ or /Da/ within the stimulus with significantly prolonged latencies compared with adults. ACCs were reliably elicited for all stimulus conditions with more robust morphology compared with our previous findings using stimuli that were shorter in duration. The P1 amplitudes elicited to the acoustic change in /daba/ and /daDa/ were significantly larger compared to /dada/ supporting that the brain discriminated between the speech tokens. These findings provide further evidence for the use of ACCs as an index of discrimination ability. PMID:26798343

  10. Elicitation of the Acoustic Change Complex to Long-Duration Speech Stimuli in Four-Month-Old Infants

    PubMed Central

    Chen, Ke Heng; Small, Susan A.

    2015-01-01

    The acoustic change complex (ACC) is an auditory-evoked potential elicited to changes within an ongoing stimulus that indicates discrimination at the level of the auditory cortex. Only a few studies to date have attempted to record ACCs in young infants. The purpose of the present study was to investigate the elicitation of ACCs to long-duration speech stimuli in English-learning 4-month-old infants. ACCs were elicited to consonant contrasts made up of two concatenated speech tokens. The stimuli included native dental-dental /dada/ and dental-labial /daba/ contrasts and a nonnative Hindi dental-retroflex /daDa/ contrast. Each consonant-vowel speech token was 410 ms in duration. Slow cortical responses were recorded to the onset of the stimulus and to the acoustic change from /da/ to either /ba/ or /Da/ within the stimulus with significantly prolonged latencies compared with adults. ACCs were reliably elicited for all stimulus conditions with more robust morphology compared with our previous findings using stimuli that were shorter in duration. The P1 amplitudes elicited to the acoustic change in /daba/ and /daDa/ were significantly larger compared to /dada/ supporting that the brain discriminated between the speech tokens. These findings provide further evidence for the use of ACCs as an index of discrimination ability. PMID:26798343

  11. The effect of different open plan and enclosed classroom acoustic conditions on speech perception in Kindergarten children.

    PubMed

    Mealings, Kiri T; Demuth, Katherine; Buchholz, Jörg M; Dillon, Harvey

    2015-10-01

    Open plan classrooms, where several classes are in the same room, have recently re-emerged in Australian primary schools. This paper explores how the acoustics of four Kindergarten classrooms [an enclosed classroom (25 children), double classroom (44 children), fully open plan triple classroom (91 children), and a semi-open plan K-6 "21st century learning space" (205 children)] affect speech perception. Twenty-two to 23 5-6-year-old children in each classroom participated in an online four-picture choice speech perception test while adjacent classes engaged in quiet versus noisy activities. The noise levels recorded during the test were higher the larger the classroom, except in the noisy condition for the K-6 classroom, possibly due to acoustic treatments. Linear mixed effects models revealed children's performance accuracy and speed decreased as noise level increased. Additionally, children's speech perception abilities decreased the further away they were seated from the loudspeaker in noise levels above 50 dBA. These results suggest that fully open plan classrooms are not appropriate learning environments for critical listening activities with young children due to their high intrusive noise levels which negatively affect speech perception. If open plan classrooms are desired, they need to be acoustically designed to be appropriate for critical listening activities. PMID:26520328

  12. Can acoustic vowel space predict the habitual speech rate of the speaker?

    PubMed

    Tsao, Y-C; Iqbal, K

    2005-01-01

    This study aims to find whether the acoustic vowel space reflects the habitual speaking rate of the speaker. The vowel space is defined as the area of the quadrilateral formed by the four corner vowels (i.e., /i/, /æ/, /u/, /ɑ/) in the F1-F2 plane. The study compares the acoustic vowel space in the speech of habitually slow and fast talkers and further analyzes them by gender. In addition to the measurement of vowel duration and midpoint frequencies of F1 and F2, the F1/F2 vowel space areas were measured and compared across speakers. The results indicate substantial overlap in vowel space area functions between slow and fast talkers, though the slow speakers were found to have larger vowel spaces. Furthermore, large inter-speaker variability in vowel space area functions was noted within each group. Both F1 and F2 formant frequencies were found to be gender sensitive, consistent with the existing data. No predictive relation between vowel duration and formant frequencies was observed among speakers. PMID:17282413
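
    The vowel space area described in this abstract can be illustrated with a minimal sketch. It assumes the four corner vowels are given as (F1, F2) points in Hz, ordered around the perimeter of the quadrilateral, and it uses the shoelace formula for the area. The function name and the formant values are hypothetical and are not taken from the study.

```python
def vowel_space_area(corners):
    """Area of the polygon whose vertices are (F1, F2) pairs in Hz.

    The vertices must be listed in order around the perimeter
    (for example /i/, /ae/, /a/, /u/); the shoelace formula is used.
    """
    n = len(corners)
    twice_area = 0.0
    for i in range(n):
        f1_a, f2_a = corners[i]
        f1_b, f2_b = corners[(i + 1) % n]  # wrap around to close the polygon
        twice_area += f1_a * f2_b - f1_b * f2_a
    return abs(twice_area) / 2.0


# Hypothetical midpoint formant values (Hz), for illustration only.
example_corners = [
    (300, 2300),  # /i/
    (700, 1800),  # /ae/
    (750, 1100),  # /a/
    (350, 900),   # /u/
]
example_area = vowel_space_area(example_corners)  # area in Hz^2
```

    Because the area is in Hz squared, comparisons across talkers only make sense when the formants are measured the same way for every speaker.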

  13. Speaker height estimation from speech: Fusing spectral regression and statistical acoustic models.

    PubMed

    Hansen, John H L; Williams, Keri; Bořil, Hynek

    2015-08-01

    Estimating speaker height can assist in voice forensic analysis and provide additional side knowledge to benefit automatic speaker identification or acoustic model selection for automatic speech recognition. In this study, a statistical approach to height estimation that incorporates acoustic models within a non-uniform height bin width Gaussian mixture model structure as well as a formant analysis approach that employs linear regression on selected phones are presented. The accuracy and trade-offs of these systems are explored by examining the consistency of the results, location, and causes of error as well as a combined fusion of the two systems using data from the TIMIT corpus. Open set testing is also presented using the Multi-session Audio Research Project corpus and publicly available YouTube audio to examine the effect of channel mismatch between training and testing data and provide a realistic open domain testing scenario. The proposed algorithms achieve a highly competitive performance to previously published literature. Although the different data partitioning in the literature and this study may prevent performance comparisons in absolute terms, the mean average error of 4.89 cm for males and 4.55 cm for females provided by the proposed algorithm on TIMIT utterances containing selected phones suggests a considerable estimation error decrease compared to past efforts. PMID:26328721

  14. Effects of acoustic variability in the perceptual learning of non-native-accented speech sounds.

    PubMed

    Wade, Travis; Jongman, Allard; Sereno, Joan

    2007-01-01

    This study addressed whether acoustic variability and category overlap in non-native speech contribute to difficulty in its recognition, and more generally whether the benefits of exposure to acoustic variability during categorization training are stable across differences in category confusability. Three experiments considered a set of Spanish-accented English productions. The set was seen to pose learning and recognition difficulty (experiment 1) and was more variable and confusable than a parallel set of native productions (experiment 2). A training study (experiment 3) probed the relative contributions of category central tendency and variability to difficulty in vowel identification using derived inventories in which these dimensions were manipulated based on the results of experiments 1 and 2. Training and test difficulty related straightforwardly to category confusability but not to location in the vowel space. Benefits of high-variability exposure also varied across vowel categories, and seemed to be diminished for highly confusable vowels. Overall, variability was implicated in perception and learning difficulty in ways that warrant further investigation. PMID:17914280

  15. MOOD STATE PREDICTION FROM SPEECH OF VARYING ACOUSTIC QUALITY FOR INDIVIDUALS WITH BIPOLAR DISORDER

    PubMed Central

    Gideon, John; Provost, Emily Mower; McInnis, Melvin

    2016-01-01

    Speech contains patterns that can be altered by the mood of an individual. There is an increasing focus on automated and distributed methods to collect and monitor speech from large groups of patients suffering from mental health disorders. However, as the scope of these collections increases, the variability in the data also increases. This variability is due in part to the range in the quality of the devices, which in turn affects the quality of the recorded data, negatively impacting the accuracy of automatic assessment. It is necessary to mitigate variability effects in order to expand the impact of these technologies. This paper explores speech collected from phone recordings for analysis of mood in individuals with bipolar disorder. Two different phones with varying amounts of clipping, loudness, and noise are employed. We describe methodologies for use during preprocessing, feature extraction, and data modeling to correct these differences and make the devices more comparable. The results demonstrate that these pipeline modifications result in statistically significantly higher performance, which highlights the potential of distributed mental health systems. PMID:27570493

  16. Flow and acoustic features of a supersonic tapered nozzle

    NASA Astrophysics Data System (ADS)

    Gutmark, E.; Bowman, H. L.; Schadow, K. C.

    1992-05-01

    The acoustic and flow characteristics of a supersonic tapered jet were measured for free and shrouded flow configurations. Measurements were performed for a full range of pressure ratios including over- and underexpanded and design conditions. The supersonic tapered jet is issued from a converging-diverging nozzle with a 3∶1 rectangular slotted throat and a conical diverging section leading to a circular exit. The jet was compared to circular and rectangular supersonic jets operating at identical conditions. The distinct feature of the jet is the absence of screech tones in the entire range of operation. Its near-field pressure fluctuations have a wide band spectrum in the entire range of measurements, for Mach numbers of 1 to 2.5, for over- and underexpanded conditions. The free jet's spreading rate is nearly constant and similar to the rectangular jet, and in a shroud, the pressure drop it is inducing is linearly proportional to the primary jet Mach number. This behavior persisted in high adverse pressure gradients at overexpanded conditions, and with nozzle divergence angles of up to 35°, no inside flow separation was observed.

  17. Features of underwater acoustics from Aristotle to our time

    NASA Astrophysics Data System (ADS)

    Bjørnø, Leif

    2003-01-01

    Underwater acoustics has been one of the fastest growing fields of research in acoustics. In particular, the 20th Century has taken our understanding of underwater acoustics phenomena a great step forward. The two World Wars contributed to the recognition of the importance of research in underwater acoustics, and the momentum in research and development gained during World War II did not reduce in the years after the war. The so-called cold war and the development in computer technology both contributed substantially to the development in underwater acoustics over the second half of the 20th Century. However, the very widespread field of underwater acoustic activities started nearly 2300 years ago with human curiosity about the fundamental nature of sound in the sea. From primitive philosophical and experimental studies of the velocity of sound in the sea and through centuries of successes and failures, the knowledge about underwater acoustics has developed into its high-technological status of today. In particular the development through the period from Aristotle (384-322 BC) to 1960 formed the basis for the tremendous research and development efforts we have witnessed in our time. In this paper most emphasis will be put on the development in underwater acoustics through this period of nearly 2300 years duration, and only the main trends in later research will be mentioned.

  18. Speech transmission index from running speech: A neural network approach

    NASA Astrophysics Data System (ADS)

    Li, F. F.; Cox, T. J.

    2003-04-01

    Speech transmission index (STI) is an important objective parameter concerning speech intelligibility for sound transmission channels. It is normally measured with specific test signals to ensure high accuracy and good repeatability. Measurement with running speech was previously proposed, but accuracy is compromised and hence applications limited. A new approach that uses artificial neural networks to accurately extract the STI from received running speech is developed in this paper. Neural networks are trained on a large set of transmitted speech examples with prior knowledge of the transmission channels' STIs. The networks perform complicated nonlinear function mappings and spectral feature memorization to enable accurate objective parameter extraction from transmitted speech. Validations via simulations demonstrate the feasibility of this new method on a one-net-one-speech extract basis. In this case, accuracy is comparable with normal measurement methods. This provides an alternative to standard measurement techniques, and it is intended that the neural network method can facilitate occupied room acoustic measurements.

  19. Acoustic Source Characteristics, Across-Formant Integration, and Speech Intelligibility Under Competitive Conditions

    PubMed Central

    2015-01-01

    An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics—for example, fundamental frequency (F0) differences between formants promote their segregation. This study explored the role of more radical differences in source characteristics. Three-formant (F1+F2+F3) synthetic speech analogues were derived from natural sentences. In Experiment 1, F1+F3 were generated by passing a harmonic glottal source (F0 = 140 Hz) through second-order resonators (H1+H3); in Experiment 2, F1+F3 were tonal (sine-wave) analogues (T1+T3). F2 could take either form (H2 or T2). In some conditions, the target formants were presented alone, either monaurally or dichotically (left ear = F1+F3; right ear = F2). In others, they were accompanied by a competitor for F2 (F1+F2C+F3; F2), which listeners must reject to optimize recognition. Competitors (H2C or T2C) were created using the time-reversed frequency and amplitude contours of F2. Dichotic presentation of F2 and F2C ensured that the impact of the competitor arose primarily through informational masking. In the absence of F2C, the effect of a source mismatch between F1+F3 and F2 was relatively modest. When F2C was present, intelligibility was lowest when F2 was tonal and F2C was harmonic, irrespective of which type matched F1+F3. This finding suggests that source type and context, rather than similarity, govern the phonetic contribution of a formant. It is proposed that wideband harmonic analogues are more effective informational maskers than narrowband tonal analogues, and so become dominant in across-frequency integration of phonetic information when placed in competition. PMID:25751040

  20. Acoustic source characteristics, across-formant integration, and speech intelligibility under competitive conditions.

    PubMed

    Roberts, Brian; Summers, Robert J; Bailey, Peter J

    2015-06-01

    An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics--for example, fundamental frequency (F0) differences between formants promote their segregation. This study explored the role of more radical differences in source characteristics. Three-formant (F1+F2+F3) synthetic speech analogues were derived from natural sentences. In Experiment 1, F1+F3 were generated by passing a harmonic glottal source (F0 = 140 Hz) through second-order resonators (H1+H3); in Experiment 2, F1+F3 were tonal (sine-wave) analogues (T1+T3). F2 could take either form (H2 or T2). In some conditions, the target formants were presented alone, either monaurally or dichotically (left ear = F1+F3; right ear = F2). In others, they were accompanied by a competitor for F2 (F1+F2C+F3; F2), which listeners must reject to optimize recognition. Competitors (H2C or T2C) were created using the time-reversed frequency and amplitude contours of F2. Dichotic presentation of F2 and F2C ensured that the impact of the competitor arose primarily through informational masking. In the absence of F2C, the effect of a source mismatch between F1+F3 and F2 was relatively modest. When F2C was present, intelligibility was lowest when F2 was tonal and F2C was harmonic, irrespective of which type matched F1+F3. This finding suggests that source type and context, rather than similarity, govern the phonetic contribution of a formant. It is proposed that wideband harmonic analogues are more effective informational maskers than narrowband tonal analogues, and so become dominant in across-frequency integration of phonetic information when placed in competition. PMID:25751040

  1. Voice Modulations in German Ironic Speech

    ERIC Educational Resources Information Center

    Scharrer, Lisa; Christmann, Ursula; Knoll, Monja

    2011-01-01

    Previous research has shown that in different languages ironic speech is acoustically modulated compared to literal speech, and these modulations are assumed to aid the listener in the comprehension process by acting as cues that mark utterances as ironic. The present study was conducted to identify paraverbal features of German "ironic criticism"…

  2. The Feature Extraction Based on Texture Image Information for Emotion Sensing in Speech

    PubMed Central

    Wang, Kun-Ching

    2014-01-01

    In this paper, we present a novel texture image feature for Emotion Sensing in Speech (ESS). This idea is based on the fact that the texture images carry emotion-related information. The feature extraction is derived from time-frequency representation of spectrogram images. First, we transform the spectrogram as a recognizable image. Next, we use a cubic curve to enhance the image contrast. Then, the texture image information (TII) derived from the spectrogram image can be extracted by using Laws' masks to characterize emotional state. In order to evaluate the effectiveness of the proposed emotion recognition in different languages, we use two open emotional databases including the Berlin Emotional Speech Database (EMO-DB) and eNTERFACE corpus and one self-recorded database (KHUSC-EmoDB), to evaluate the performance cross-corpora. The results of the proposed ESS system are presented using support vector machine (SVM) as a classifier. Experimental results show that the proposed TII-based feature extraction inspired by visual perception can provide significant classification for ESS systems. The two-dimensional (2-D) TII feature can provide the discrimination between different emotions in visual expressions except for the conveyance pitch and formant tracks. In addition, the de-noising in 2-D images can be more easily completed than de-noising in 1-D speech. PMID:25207869
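
    As an illustration of the Laws' mask step mentioned in this abstract, the following is a minimal sketch, not the author's implementation. It builds 2-D masks as outer products of the standard 1-D Laws kernels (L5, E5, S5) and summarizes the filter response of a small image as a mean absolute "texture energy". The helper names, the choice of summary statistic, and the toy image are assumptions.

```python
# Standard 1-D Laws kernels: Level, Edge, Spot.
L5 = [1, 4, 6, 4, 1]
E5 = [-1, -2, 0, 2, 1]
S5 = [-1, 0, 2, 0, -1]


def outer(u, v):
    """2-D mask formed as the outer product of two 1-D kernels."""
    return [[a * b for b in v] for a in u]


def filter_valid(image, mask):
    """Slide the mask over the image (no padding) and return the responses."""
    m, n = len(mask), len(mask[0])
    rows = len(image) - m + 1
    cols = len(image[0]) - n + 1
    return [
        [
            sum(mask[i][j] * image[r + i][c + j]
                for i in range(m) for j in range(n))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def texture_energy(image, mask):
    """Mean absolute filter response (one assumed summary statistic)."""
    response = filter_valid(image, mask)
    values = [abs(v) for row in response for v in row]
    return sum(values) / len(values)


# Toy stand-in for a spectrogram image: a flat 6x6 patch.
flat_patch = [[1.0] * 6 for _ in range(6)]
level_energy = texture_energy(flat_patch, outer(L5, L5))
edge_spot_energy = texture_energy(flat_patch, outer(E5, S5))
```

    On a flat patch only the level mask responds, since the edge and spot kernels sum to zero; on a real spectrogram image the energies from several masks would form the feature vector passed to a classifier.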

  3. The feature extraction based on texture image information for emotion sensing in speech.

    PubMed

    Wang, Kun-Ching

    2014-01-01

    In this paper, we present a novel texture image feature for Emotion Sensing in Speech (ESS). This idea is based on the fact that the texture images carry emotion-related information. The feature extraction is derived from time-frequency representation of spectrogram images. First, we transform the spectrogram as a recognizable image. Next, we use a cubic curve to enhance the image contrast. Then, the texture image information (TII) derived from the spectrogram image can be extracted by using Laws' masks to characterize emotional state. In order to evaluate the effectiveness of the proposed emotion recognition in different languages, we use two open emotional databases including the Berlin Emotional Speech Database (EMO-DB) and eNTERFACE corpus and one self-recorded database (KHUSC-EmoDB), to evaluate the performance cross-corpora. The results of the proposed ESS system are presented using support vector machine (SVM) as a classifier. Experimental results show that the proposed TII-based feature extraction inspired by visual perception can provide significant classification for ESS systems. The two-dimensional (2-D) TII feature can provide the discrimination between different emotions in visual expressions except for the conveyance pitch and formant tracks. In addition, the de-noising in 2-D images can be more easily completed than de-noising in 1-D speech. PMID:25207869

  4. Sine-wave and noise-vocoded sine-wave speech in a tone language: Acoustic details matter.

    PubMed

    Rosen, Stuart; Hui, Sze Ngar Catherine

    2015-12-01

    Sine-wave speech (SWS) is a highly simplified version of speech consisting only of frequency- and amplitude-modulated sinusoids representing the formants. That listeners can successfully understand SWS has led to claims that speech perception must be based on abstract properties of the stimuli far removed from their specific acoustic form. Here it is shown, in bilingual Cantonese/English listeners, that performance with Cantonese SWS is improved by noise vocoding, with no effect on English SWS utterances. This manipulation preserves the abstract informational structure in the signals but changes its surface form. The differential effects of noise vocoding likely arise from the fact that Cantonese is a tonal language and hence more reliant on fundamental frequency (F0) contours for its intelligibility. SWS does not preserve tonal information from the original speech but does have false tonal information signalled by the lowest frequency sinusoid. Noise vocoding SWS appears to minimise the tonal percept, which thus interferes less in the perception of Cantonese. It has no effect in English, which is minimally reliant on F0 variations for intelligibility. Therefore it is not only the informational structure of a sound that is important but also how its acoustic detail interacts with the phonological structure of a given language. PMID:26723325

  5. Effects of a music therapy voice protocol on speech intelligibility, vocal acoustic measures, and mood of individuals with Parkinson's disease.

    PubMed

    Haneishi, E

    2001-01-01

    This study examined the effects of a Music Therapy Voice Protocol (MTVP) on speech intelligibility, vocal intensity, maximum vocal range, maximum duration of sustained vowel phonation, vocal fundamental frequency, vocal fundamental frequency variability, and mood of individuals with Parkinson's disease. Four female patients, who demonstrated voice and speech problems, served as their own controls and participated in baseline assessment (study pretest), a series of MTVP sessions involving vocal and singing exercises, and final evaluation (study posttest). In study pre and posttests, data for speech intelligibility and all acoustic variables were collected. Statistically significant increases were found in speech intelligibility, as rated by caregivers, and in vocal intensity from study pretest to posttest as the results of paired samples t-tests. In addition, before and after each MTVP session (session pre and posttests), self-rated mood scores and selected acoustic variables were collected. No significant differences were found in any of the variables from the session pretests to posttests, across the entire treatment period, or their interactions as the results of two-way ANOVAs with repeated measures. Although not significant, the mean of mood scores in session posttests (M = 8.69) was higher than that in session pretests (M = 7.93). PMID:11796078

  6. Intensity Accents in French 2 Year Olds' Speech.

    ERIC Educational Resources Information Center

    Allen, George D.

    The acoustic features and functions of accentuation in French are discussed, and features of accentuation in the speech of French 2-year-olds are explored. The four major acoustic features used to signal accentual distinctions are fundamental frequency of voicing, duration of segments and syllables, intensity of segments and syllables, and…

  7. A Speech Endpoint Detection Algorithm Based on BP Neural Network and Multiple Features

    NASA Astrophysics Data System (ADS)

    Shi, Yong-Qiang; Li, Ru-Wei; Zhang, Shuang; Wang, Shuai; Yi, Xiao-Qun

    Focusing on a sharp decline in the performance of endpoint detection algorithm in a complicated noise environment, a new speech endpoint detection method based on BPNN (back propagation neural network) and multiple features is presented. Firstly, maximum of short-time autocorrelation function and spectrum variance of speech signals are extracted respectively. Secondly, these feature vectors as the input of BP neural network are trained and modeled and then the Genetic Algorithm is used to optimize the BP Neural Network. Finally, the signal's type is determined according to the output of Neural Network. The experiments show that the correct rate of this proposed algorithm is improved, because this method has better robustness and adaptability than algorithm based on maximum of short-time autocorrelation function or spectrum variance.
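
    The two frame-level features named in this abstract can be sketched as follows. This is an illustrative assumption about how such features might be computed, not the authors' definitions: the autocorrelation is normalized by frame energy, the spectrum variance is taken over the DFT magnitude bins, and the function names, lag range, and test tone are hypothetical. The neural network and genetic algorithm stages are not shown.

```python
import cmath
import math


def max_autocorr(frame, min_lag, max_lag):
    """Largest energy-normalized autocorrelation over the given lag range."""
    energy = sum(x * x for x in frame)
    if energy == 0:
        return 0.0  # silent frame: no periodicity evidence
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        best = max(best, r / energy)
    return best


def spectrum_variance(frame):
    """Variance of the DFT magnitude across the non-negative frequency bins."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        mags.append(abs(s))
    mean = sum(mags) / len(mags)
    return sum((m - mean) ** 2 for m in mags) / len(mags)


# Hypothetical frames: a periodic tone (period 20 samples) and silence.
tone = [math.sin(2 * math.pi * t / 20) for t in range(200)]
silence = [0.0] * 200
features_tone = (max_autocorr(tone, 10, 50), spectrum_variance(tone))
features_silence = (max_autocorr(silence, 10, 50), spectrum_variance(silence))
```

    A periodic frame yields a high autocorrelation peak and a peaky spectrum, while a silent frame yields zero for both; feature pairs of this kind are what a classifier would take as input.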

  8. Speech spectrogram expert

    SciTech Connect

    Johannsen, J.; Macallister, J.; Michalek, T.; Ross, S.

    1983-01-01

    Various authors have pointed out that humans can become quite adept at deriving phonetic transcriptions from speech spectrograms (as good as 90 percent accuracy at the phoneme level). The authors describe an expert system which attempts to simulate this performance. The speech spectrogram expert (spex) is actually a society made up of three experts: a 2-dimensional vision expert, an acoustic-phonetic expert, and a phonetics expert. The visual reasoning expert finds important visual features of the spectrogram. The acoustic-phonetic expert reasons about how visual features relate to phonemes, and about how phonemes change visually in different contexts. The phonetics expert reasons about allowable phoneme sequences and transformations, and deduces an English spelling for phoneme strings. The speech spectrogram expert is highly interactive, allowing users to investigate hypotheses and edit rules. 10 references.

  9. Acoustic changes in the production of lexical stress during Lombard speech.

    PubMed

    Arciuli, Joanne; Simpson, Briony S; Vogel, Adam P; Ballard, Kirrie J

    2014-06-01

    The Lombard effect describes the phenomenon of individuals increasing their vocal intensity when speaking in the presence of background noise. Here, we conducted an investigation of the production of lexical stress during Lombard speech. Participants (N = 27) produced the same sentences in three conditions: one quiet condition and two noise conditions at 70 dB (white noise; multi-talker babble). Manual acoustic analyses (syllable duration, vowel intensity, and vowel fundamental frequency) were completed for repeated productions of two trisyllabic words with opposing patterns of lexical stress (weak-strong; strong-weak) in each of the three conditions. In total, 324 productions were analysed (12 utterances per participant). Results revealed that, rather than increasing vocal intensity equally across syllables, participants alter the degree of stress contrastivity when speaking in noise. This was especially evident in the production of strong-weak lexical stress where there was an increase in contrastivity across syllables in terms of intensity and fundamental frequency. This preliminary study paves the way for further research that is needed to establish these findings using a larger set of multisyllabic stimuli. PMID:25102603

  10. Production and perception of clear speech

    NASA Astrophysics Data System (ADS)

    Bradlow, Ann R.

    2003-04-01

    When a talker believes that the listener is likely to have speech perception difficulties due to a hearing loss, background noise, or a different native language, she or he will typically adopt a clear speaking style. Previous research has established that, with a simple set of instructions to the talker, "clear speech" can be produced by most talkers under laboratory recording conditions. Furthermore, there is reliable evidence that adult listeners with either impaired or normal hearing typically find clear speech more intelligible than conversational speech. Since clear speech production involves listener-oriented articulatory adjustments, a careful examination of the acoustic-phonetic and perceptual consequences of the conversational-to-clear speech transformation can serve as an effective window into talker- and listener-related forces in speech communication. Furthermore, clear speech research has considerable potential for the development of speech enhancement techniques. After reviewing previous and current work on the acoustic properties of clear versus conversational speech, this talk will present recent data from a cross-linguistic study of vowel production in clear speech and a cross-population study of clear speech perception. Findings from these studies contribute to an evolving view of clear speech production and perception as reflecting both universal, auditory and language-specific, phonological contrast enhancement features.

  11. Language-Specific Developmental Differences in Speech Production: A Cross-Language Acoustic Study

    ERIC Educational Resources Information Center

    Li, Fangfang

    2012-01-01

    Speech productions of 40 English- and 40 Japanese-speaking children (aged 2-5) were examined and compared with the speech produced by 20 adult speakers (10 speakers per language). Participants were recorded while repeating words that began with "s" and "sh" sounds. Clear language-specific patterns in adults' speech were found, with English…

  12. Reconstructing speech from human auditory cortex.

    PubMed

    Pasley, Brian N; David, Stephen V; Mesgarani, Nima; Flinker, Adeen; Shamma, Shihab A; Crone, Nathan E; Knight, Robert T; Chang, Edward F

    2012-01-01

    How the human auditory system extracts perceptually relevant acoustic features of speech is unknown. To address this question, we used intracranial recordings from nonprimary auditory cortex in the human superior temporal gyrus to determine what acoustic information in speech sounds can be reconstructed from population neural activity. We found that slow and intermediate temporal fluctuations, such as those corresponding to syllable rate, were accurately reconstructed using a linear model based on the auditory spectrogram. However, reconstruction of fast temporal fluctuations, such as syllable onsets and offsets, required a nonlinear sound representation based on temporal modulation energy. Reconstruction accuracy was highest within the range of spectro-temporal fluctuations that have been found to be critical for speech intelligibility. The decoded speech representations allowed readout and identification of individual words directly from brain activity during single trial sound presentations. These findings reveal neural encoding mechanisms of speech acoustic parameters in higher order human auditory cortex. PMID:22303281

  13. Reconstructing Speech from Human Auditory Cortex

    PubMed Central

    Pasley, Brian N.; David, Stephen V.; Mesgarani, Nima; Flinker, Adeen; Shamma, Shihab A.; Crone, Nathan E.; Knight, Robert T.; Chang, Edward F.

    2012-01-01

    How the human auditory system extracts perceptually relevant acoustic features of speech is unknown. To address this question, we used intracranial recordings from nonprimary auditory cortex in the human superior temporal gyrus to determine what acoustic information in speech sounds can be reconstructed from population neural activity. We found that slow and intermediate temporal fluctuations, such as those corresponding to syllable rate, were accurately reconstructed using a linear model based on the auditory spectrogram. However, reconstruction of fast temporal fluctuations, such as syllable onsets and offsets, required a nonlinear sound representation based on temporal modulation energy. Reconstruction accuracy was highest within the range of spectro-temporal fluctuations that have been found to be critical for speech intelligibility. The decoded speech representations allowed readout and identification of individual words directly from brain activity during single trial sound presentations. These findings reveal neural encoding mechanisms of speech acoustic parameters in higher order human auditory cortex. PMID:22303281

  14. Modeling words with subword units in an articulatorily constrained speech recognition algorithm

    SciTech Connect

    Hogden, J.

    1997-11-20

    The goal of speech recognition is to find the most probable word given the acoustic evidence, i.e. a string of VQ codes or acoustic features. Speech recognition algorithms typically take advantage of the fact that the probability of a word, given a sequence of VQ codes, can be calculated.

  15. Clinical and acoustical variability in hypokinetic dysarthria

    SciTech Connect

    Metter, E.J.; Hanson, W.R.

    1986-10-01

    Ten male patients with parkinsonism secondary to Parkinson's disease or progressive supranuclear palsy had clinical neurological, speech, and acoustical speech evaluations. In addition, seven of the patients were evaluated by x-ray computed tomography (CT) and (F-18)-fluorodeoxyglucose (FDG) positron emission tomography (PET). Extensive variability of speech features, both clinical and acoustical, was found and seemed to be independent of the severity of any parkinsonian sign, CT, or FDG PET. In addition, little relationship existed between the variability across each measured speech feature. What appeared to be important for the appearance of abnormal acoustic measures was the degree of overall severity of the dysarthria. These observations suggest that a better understanding of hypokinetic dysarthria may result from more extensive examination of the variability between patients. Emphasizing a specific feature such as rapid speaking rate in characterizing hypokinetic dysarthria focuses on a single and inconstant finding in a complex speech pattern.

  16. Acoustic and perceptual correlates of faster-than-habitual speech produced by speakers with Parkinson's disease and Multiple Sclerosis

    PubMed Central

    Kuo, Christina; Tjaden, Kris; Sussman, Joan E.

    2014-01-01

    Acoustic-perceptual characteristics of a faster-than-habitual rate (Fast condition) were examined for speakers with Parkinson's disease (PD) and Multiple Sclerosis (MS). Judgments of intelligibility for sentences produced at a habitual rate (Habitual condition) and at a faster-than-habitual rate (Fast condition) by 46 speakers with PD or MS as well as a group of 32 healthy speakers revealed that the Fast condition was, on average, associated with decreased intelligibility. However, some speakers' intelligibility did not decline. To further understand the acoustic characteristics of varied intelligibility in the Fast condition for speakers with dysarthria, a subgroup of speakers with PD or MS whose intelligibility did not decline in the Fast condition (No Decline group, n = 8) and a subgroup of speakers with significantly declined intelligibility (Decline group, n = 8) were compared. Acoustic measures of global speech timing, suprasegmental characteristics, and utterance-level segmental characteristics for vocalics were examined for the two subgroups. Results suggest acoustic contributions to intelligibility under rate modulation are complex. Potential clinical relevance and implications for the acoustic bases of intelligibility are discussed. PMID:25287378

  17. Determinants of dominance: is language laterality explained by physical or linguistic features of speech?

    PubMed

    Shtyrov, Yury; Pihko, Elina; Pulvermüller, Friedemann

    2005-08-01

    The nature of cerebral asymmetry of the language function is still not fully understood. Two main views are that laterality is best explained (1) by left cortical specialization for the processing of spectrally rich and rapidly changing sounds, and (2) by a predisposition of one hemisphere to develop a module for phonemes. We tested both of these views by investigating magnetic brain responses to the same brief acoustic stimulus, placed in contexts where it was perceived either as a noise burst with no resemblance to speech, or as a native language sound being part of a meaningless pseudoword. In further experiments, the same acoustic element was placed in the context of words. We found reliable left hemispheric dominance only when the sound was placed in word context. These results, obtained in a passive odd-ball paradigm, suggest that neither physical properties nor phoneme status of a sound are sufficient for laterality. In order to elicit left lateralized cortical activation in normal right-handed individuals, a rapidly changing spectrally rich sound with phoneme status needs to be placed in the context of frequently encountered larger language elements, such as words. This demonstrates that language laterality is bound to the processing of sounds as units of frequently occurring meaningful items and can thus be linked to the processes of learning and memory trace formation for such items rather than to their physical or phonological properties. PMID:16023039

  18. Semantic and acoustic analysis of speech by functional networks with distinct time scales.

    PubMed

    Deng, Siyi; Srinivasan, Ramesh

    2010-07-30

    Speech perception requires the successful interpretation of both phonetic and syllabic information in the auditory signal. It has been suggested by Poeppel (2003) that phonetic processing requires an optimal time scale of 25 ms while the time scale of syllabic processing is much slower (150-250 ms). To better understand the operation of brain networks at these characteristic time scales during speech perception, we studied the spatial and dynamic properties of EEG responses to five different stimuli: (1) amplitude modulated (AM) speech, (2) AM speech with added broadband noise, (3) AM reversed speech, (4) AM broadband noise, and (5) AM pure tone. Amplitude modulation at gamma band frequencies (40 Hz) elicited steady-state auditory evoked responses (SSAERs) bilaterally over primary auditory cortices. Reduced SSAERs were observed over the left auditory cortex only for stimuli containing speech. In addition, we found over the left hemisphere, anterior to primary auditory cortex, a network whose instantaneous frequencies in the theta to alpha band (4-16 Hz) are correlated with the amplitude envelope of the speech signal. This correlation was not observed for reversed speech. The presence of speech in the sound input activates a 4-16 Hz envelope tracking network and suppresses the 40-Hz gamma band network which generates the steady-state responses over the left auditory cortex. We believe these findings to be consistent with the idea that processing of the speech signals involves preferentially processing at syllabic time scales rather than phonetic time scales. PMID:20580635
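
The amplitude-modulated stimuli described in this abstract can be illustrated with a minimal sketch. It assumes a sinusoidal envelope at the 40 Hz rate mentioned above; the function name `amplitude_modulate`, the `depth` parameter, and the sample rate are hypothetical choices for illustration, since the abstract does not state the modulation depth or how the stimuli were synthesized.

```python
import math


def amplitude_modulate(samples, sample_rate, mod_freq=40.0, depth=1.0):
    """Apply a sinusoidal amplitude envelope at mod_freq Hz to a signal.

    Assumption: the envelope is 1 + depth * cos(2*pi*mod_freq*t); the
    abstract does not specify the envelope shape or modulation depth.
    """
    out = []
    for n, x in enumerate(samples):
        t = n / sample_rate  # time of sample n in seconds
        envelope = 1.0 + depth * math.cos(2.0 * math.pi * mod_freq * t)
        out.append(x * envelope)
    return out
```

Applied to a constant carrier sampled at 8000 Hz, the output swings between twice the carrier level (envelope maximum) and zero (envelope minimum) once every 25 ms, which is the period of a 40 Hz modulation.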

  19. Semantic and acoustic analysis of speech by functional networks with distinct time scales

    PubMed Central

    Deng, Siyi; Srinivasan, Ramesh

    2014-01-01

    Speech perception requires the successful interpretation of both phonetic and syllabic information in the auditory signal. It has been suggested by Poeppel (2003) that phonetic processing requires an optimal time scale of 25 ms while the time scale of syllabic processing is much slower (150–250 ms). To better understand the operation of brain networks at these characteristic time scales during speech perception, we studied the spatial and dynamic properties of EEG responses to five different stimuli: (1) amplitude modulated (AM) speech, (2) AM speech with added broadband noise, (3) AM reversed speech, (4) AM broadband noise, and (5) AM pure tone. Amplitude modulation at gamma band frequencies (40 Hz) elicited steady-state auditory evoked responses (SSAERs) bilaterally over primary auditory cortices. Reduced SSAERs were observed over the left auditory cortex only for stimuli containing speech. In addition, we found over the left hemisphere, anterior to primary auditory cortex, a network whose instantaneous frequencies in the theta to alpha band (4–16 Hz) are correlated with the amplitude envelope of the speech signal. This correlation was not observed for reversed speech. The presence of speech in the sound input activates a 4–16 Hz envelope tracking network and suppresses the 40-Hz gamma band network which generates the steady-state responses over the left auditory cortex. We believe these findings to be consistent with the idea that processing of the speech signals involves preferentially processing at syllabic time scales rather than phonetic time scales. PMID:20580635

  20. Speech-clarity judgments of hearing-aid-processed speech in noise: differing polar patterns and acoustic environments.

    PubMed

    Amlani, Amyn M; Rakerd, Brad; Punch, Jerry L

    2006-06-01

    This investigation assessed the extent to which listeners' preferences for hearing aid microphone polar patterns vary across listening environments, and whether normal-hearing and inexperienced and experienced hearing-impaired listeners differ in such preferences. Paired-comparison judgments of speech clarity (i.e. subjective speech intelligibility) were made monaurally for recordings of speech in noise processed by a commercially available hearing aid programmed with an omnidirectional and two directional polar patterns (cardioid and hypercardioid). Testing environments included a sound-treated room, a living room, and a classroom. Polar-pattern preferences were highly reliable and agreed closely across all three groups of listeners. All groups preferred listening in the sound-treated room over listening in the living room, and preferred listening in the living room over listening in the classroom. Each group preferred the directional patterns to the omnidirectional pattern in all room conditions. We observed no differences in preference judgments between the two directional patterns or between hearing-impaired listeners' extent of amplification experience. Overall, findings indicate that listeners perceived qualitative benefits from microphones having directional polar patterns. PMID:16777778

  1. The cortical representation of the speech envelope is earlier for audiovisual speech than audio speech.

    PubMed

    Crosse, Michael J; Lalor, Edmund C

    2014-04-01

    Visual speech can greatly enhance a listener's comprehension of auditory speech when they are presented simultaneously. Efforts to determine the neural underpinnings of this phenomenon have been hampered by the limited temporal resolution of hemodynamic imaging and the fact that EEG and magnetoencephalographic data are usually analyzed in response to simple, discrete stimuli. Recent research has shown that neuronal activity in human auditory cortex tracks the envelope of natural speech. Here, we exploit this finding by estimating a linear forward-mapping between the speech envelope and EEG data and show that the latency at which the envelope of natural speech is represented in cortex is shortened by >10 ms when continuous audiovisual speech is presented compared with audio-only speech. In addition, we use a reverse-mapping approach to reconstruct an estimate of the speech stimulus from the EEG data and, by comparing the bimodal estimate with the sum of the unimodal estimates, find no evidence of any nonlinear additive effects in the audiovisual speech condition. These findings point to an underlying mechanism that could account for enhanced comprehension during audiovisual speech. Specifically, we hypothesize that low-level acoustic features that are temporally coherent with the preceding visual stream may be synthesized into a speech object at an earlier latency, which may provide an extended period of low-level processing before extraction of semantic information. PMID:24401714

  2. Feature extraction from time domain acoustic signatures of weapons systems fire

    NASA Astrophysics Data System (ADS)

    Yang, Christine; Goldman, Geoffrey H.

    2014-06-01

    The U.S. Army is interested in developing algorithms to classify weapons systems fire based on their acoustic signatures. To support this effort, an algorithm was developed to extract features from acoustic signatures of weapons systems fire and applied to over 1300 signatures. The algorithm filtered the data using standard techniques then estimated the amplitude and time of the first five peaks and troughs and the location of the zero crossing in the waveform. The results were stored in Excel spreadsheets. The results are being used to develop and test acoustic classifier algorithms.
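
The kind of time-domain feature extraction described above (amplitude and time of the first five peaks and troughs, plus a zero-crossing location) might look roughly like the sketch below. It is not the authors' algorithm: the function names are hypothetical, the filtering step mentioned in the abstract is omitted, and extrema are taken as simple three-point local maxima and minima.

```python
def first_extrema(signal, count=5):
    """Return (peaks, troughs) as lists of (sample_index, amplitude).

    Only the first `count` local maxima and the first `count` local
    minima are kept. The input is assumed to be filtered already.
    """
    peaks, troughs = [], []
    for i in range(1, len(signal) - 1):
        prev, cur, nxt = signal[i - 1], signal[i], signal[i + 1]
        if cur > prev and cur >= nxt and len(peaks) < count:
            peaks.append((i, cur))
        elif cur < prev and cur <= nxt and len(troughs) < count:
            troughs.append((i, cur))
    return peaks, troughs


def first_zero_crossing(signal, sample_rate):
    """Return the time in seconds of the first sign change, or None.

    The crossing is located by linear interpolation between the two
    samples on either side of the sign change.
    """
    for i in range(1, len(signal)):
        a, b = signal[i - 1], signal[i]
        if a * b < 0:
            frac = a / (a - b)  # fraction of a sample period past index i-1
            return (i - 1 + frac) / sample_rate
    return None
```

Dividing each returned sample index by the sample rate gives the peak or trough time in seconds, so each signature reduces to a short fixed-length feature vector that a classifier could consume.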

  3. Acoustic Analysis of Clear Versus Conversational Speech in Individuals with Parkinson Disease

    ERIC Educational Resources Information Center

    Goberman, A.M.; Elmer, L.W.

    2005-01-01

    A number of studies have been devoted to the examination of clear versus conversational speech in non-impaired speakers. The purpose of these previous studies has been primarily to help increase speech intelligibility for the benefit of hearing-impaired listeners. The goal of the present study was to examine differences between conversational and…

  4. A Computational Model of Word Segmentation from Continuous Speech Using Transitional Probabilities of Atomic Acoustic Events

    ERIC Educational Resources Information Center

    Rasanen, Okko

    2011-01-01

    Word segmentation from continuous speech is a difficult task that is faced by human infants when they start to learn their native language. Several studies indicate that infants might use several different cues to solve this problem, including intonation, linguistic stress, and transitional probabilities between subsequent speech sounds. In this…

  5. Acoustic Analysis of the Speech of Children with Cochlear Implants: A Longitudinal Study

    ERIC Educational Resources Information Center

    Liker, Marko; Mildner, Vesna; Sindija, Branka

    2007-01-01

    The aim of the study was to analyse the speech of the children with cochlear implants, and compare it with the speech of hearing controls. We focused on three categories of Croatian sounds: vowels (F1 and F2 frequencies), fricatives (noise frequencies of /s/ and /[esh]/ ), and affricates (total duration and the pattern of stop-fricative components…

  6. Designing acoustics for linguistically diverse classrooms: Effects of background noise, reverberation and talker foreign accent on speech comprehension by native and non-native English-speaking listeners

    NASA Astrophysics Data System (ADS)

    Peng, Zhao Ellen

    The current classroom acoustics standard (ANSI S12.60-2010) recommends core learning spaces not to exceed background noise level (BNL) of 35 dBA and reverberation time (RT) of 0.6 second, based on speech intelligibility performance mainly by the native English-speaking population. Existing literature has not correlated these recommended values well with student learning outcomes. With a growing population of non-native English speakers in American classrooms, the special needs for perceiving degraded speech among non-native listeners, either due to realistic room acoustics or talker foreign accent, have not been addressed in the current standard. This research seeks to investigate the effects of BNL and RT on the comprehension of English speech from native English and native Mandarin Chinese talkers as perceived by native and non-native English listeners, and to provide acoustic design guidelines to supplement the existing standard. This dissertation presents two studies on the effects of RT and BNL on more realistic classroom learning experiences. How do native and non-native English-speaking listeners perform on speech comprehension tasks under adverse acoustic conditions, if the English speech is produced by talkers of native English (Study 1) versus native Mandarin Chinese (Study 2)? Speech comprehension materials were played back in a listening chamber to individual listeners: native and non-native English-speaking in Study 1; native English, native Mandarin Chinese, and other non-native English-speaking in Study 2. Each listener was screened for baseline English proficiency level, and completed dual tasks simultaneously involving speech comprehension and adaptive dot-tracing under 15 acoustic conditions, comprised of three BNL conditions (RC-30, 40, and 50) and five RT scenarios (0.4 to 1.2 seconds). The results show that BNL and RT negatively affect both objective performance and subjective perception of speech comprehension, more severely for non

  7. Associations between voice ergonomic risk factors and acoustic features of the voice.

    PubMed

    Rantala, Leena M; Hakala, Suvi; Holmqvist, Sofia; Sala, Eeva

    2015-10-01

    The associations between voice ergonomic risk factors in 40 classrooms and the acoustic parameters of 40 schoolteachers' voices were investigated. The risk factors assessed were connected to participants' working practices, working postures, and the indoor air quality in their workplaces. The teachers recorded spontaneous speech and sustained /a/ before and after a working day. Fundamental frequency, sound pressure level, the slope of the spectrum, perturbation, and harmonic-to-noise ratio were analysed. The results showed that the more the voice ergonomic risk factors were involved, the louder the teachers' voices became. Working practices correlated most often with the acoustic parameters; associations were found especially before a working day. The results suggest that a risky voice ergonomic environment affects voice production. PMID:24007529

  8. Vibrotactile perception of suprasegmental features of speech: a comparison of single-channel and multichannel instruments.

    PubMed

    Carney, A E; Beachler, C R

    1986-01-01

    The recognition of three suprasegmental aspects of speech--the number of syllables in a word, the stress pattern of a word, and rising or falling intonation patterns--through a single-channel tactile device and through a 24-channel tactile vocoder, using two groups of normal-hearing subjects, was compared. All subjects received an initial pretest on three recognition tasks, one for each prosodic feature. Half the subjects from each group then received 12 h of training with feedback on the tasks and stimuli used in the pretest. All subjects received a post-test which contained physically different stimuli from those previously tested. Performance was significantly better on the syllable-number and syllabic stress tasks with the single-channel than with the multichannel device on both the pre- and post-tests; no difference was found for the intonation task. Performance on the post-test was poorer for all trained subjects compared to their final training results, suggesting that cues learned in training were not readily transferable to new stimuli, even those with similar prosodic characteristics. Overall, the results provide support for the notion that certain prosodic features of speech may be conveyed more readily when the waveform envelope is preserved. PMID:3944340

  9. Prosodic Contrasts in Ironic Speech

    ERIC Educational Resources Information Center

    Bryant, Gregory A.

    2010-01-01

    Prosodic features in spontaneous speech help disambiguate implied meaning not explicit in linguistic surface structure, but little research has examined how these signals manifest themselves in real conversations. Spontaneously produced verbal irony utterances generated between familiar speakers in conversational dyads were acoustically analyzed…

  10. Evaluation of Spectral and Prosodic Features of Speech Affected by Orthodontic Appliances Using the GMM Classifier

    NASA Astrophysics Data System (ADS)

    Přibil, Jiří; Přibilová, Anna; Ďuračková, Daniela

    2014-01-01

    The paper describes our experiment using Gaussian mixture models (GMMs) to classify speech uttered by a person wearing orthodontic appliances. For the GMM classification, the input feature vectors comprise the basic and the complementary spectral properties as well as the supra-segmental parameters. The dependence of classification correctness on the number of parameters in the input feature vector and on the computational complexity is also evaluated. In addition, the influence of the initial parameter settings for the GMM training process was analyzed. The recognition results obtained are compared visually in the form of graphs and numerically in the form of tables and confusion matrices for tested sentences uttered using three configurations of orthodontic appliances.
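
The scoring step of a GMM classifier of this general kind can be sketched as follows. This is an illustration only: it assumes diagonal covariances and already-trained model parameters (training by expectation-maximization is not shown), and the function names and the dictionary layout of `models` are hypothetical rather than taken from the paper.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a mixture given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(frames, models):
    """Return the label whose GMM gives the highest total log-likelihood.

    `frames` is a list of feature vectors; `models` maps label -> GMM.
    """
    scores = {
        label: sum(gmm_log_likelihood(x, gmm) for x in frames)
        for label, gmm in models.items()
    }
    return max(scores, key=scores.get)
```

Summing per-frame log-likelihoods treats the frames of an utterance as independent, which is the usual simplification when one GMM is trained per class and the class with the best total score is chosen.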

  11. A Combination of Vocal Frequency Dynamic and Summary Features Discriminates between Three Pragmatic Categories of Infant-Directed Speech.

    ERIC Educational Resources Information Center

    Katz, Gary S.; And Others

    1996-01-01

    Assessed the relative contribution of dynamic and summary features of vocal frequency to the discrimination of pragmatic categories in infant-directed speech. Forty-nine mothers were instructed to use their voice to get their infant's attention, show approval, and provide comfort. Findings suggest that both dynamic and summary features are…

  12. Acoustic features to arousal and identity in disturbance calls of tree shrews (Tupaia belangeri).

    PubMed

    Schehka, Simone; Zimmermann, Elke

    2009-11-01

    Across mammalian species, comparable morphological and physiological constraints in the production of airborne vocalisations are suggested to lead to commonalities in the vocal conveyance of acoustic features to specific attributes of callers, such as arousal and individual identity. To explore this hypothesis we examined intra- and interindividual acoustic variation in chatter calls of tree shrews (Tupaia belangeri). The calls were induced experimentally by a disturbance paradigm and related to two defined arousal states of a subject. The arousal state of an animal was primarily operationalised by the habituation of the subject to a new environment and additionally determined by behavioural indicators of stress in tree shrews (tail-position and piloerection). We investigated whether the arousal state and indexical features of the caller, namely individual identity and sex, are conveyed acoustically. Frame-by-frame videographic and multiparametric sound analyses revealed that arousal and identity, but not sex of a caller reliably predicted spectral-temporal variation in sound structure. Furthermore, there was no effect of age or body weight on individual-specific acoustic features. Similar results in another call type of tree shrews and comparable findings in other mammalian lineages provide evidence that comparable physiological and morphological constraints in the production of airborne vocalisations across mammals lead to commonalities in acoustic features conveying arousal and identity, respectively. PMID:19445967

  13. Effect of Digital Frequency Compression (DFC) on Speech Recognition in Candidates for Combined Electric and Acoustic Stimulation (EAS)

    PubMed Central

    Gifford, René H.; Dorman, Michael F.; Spahr, Anthony J.; McKarns, Sharon A.

    2008-01-01

    Purpose: To compare the effects of conventional amplification (CA) and digital frequency compression (DFC) amplification on the speech recognition abilities of candidates for a partial-insertion cochlear implant, that is, candidates for combined electric and acoustic stimulation (EAS). Method: The participants were 6 patients whose audiometric thresholds at 500 Hz and below were ≤60 dB HL and whose thresholds at 2000 Hz and above were ≥80 dB HL. Six tests of speech understanding were administered with CA and DFC. The Abbreviated Profile of Hearing Aid Benefit (APHAB) was also administered following use of CA and DFC. Results: Group mean scores were not statistically different in the CA and DFC conditions. However, 2 patients received substantial benefit in DFC conditions. APHAB scores suggested increased ease of communication, but also increased aversive sound quality. Conclusion: Results suggest that a relatively small proportion of individuals who meet EAS candidacy will receive substantial benefit from a DFC hearing aid and that a larger proportion will receive at least a small benefit when speech is presented against a background of noise. This benefit, however, comes at a cost: aversive sound quality. PMID:17905905

  14. Fifty years of progress in acoustic phonetics

    NASA Astrophysics Data System (ADS)

    Stevens, Kenneth N.

    2004-10-01

    Three events that occurred 50 or 60 years ago shaped the study of acoustic phonetics, and in the following few decades these events influenced research and applications in speech disorders, speech development, speech synthesis, speech recognition, and other subareas in speech communication. These events were: (1) the source-filter theory of speech production (Chiba and Kajiyama; Fant); (2) the development of the sound spectrograph and its interpretation (Potter, Kopp, and Green; Joos); and (3) the birth of research that related distinctive features to acoustic patterns (Jakobson, Fant, and Halle). Following these events there has been systematic exploration of the articulatory, acoustic, and perceptual bases of phonological categories, and some quantification of the sources of variability in the transformation of this phonological representation of speech into its acoustic manifestations. This effort has been enhanced by studies of how children acquire language in spite of this variability and by research on speech disorders. Gaps in our knowledge of this inherent variability in speech have limited the directions of applications such as synthesis and recognition of speech, and have led to the implementation of data-driven techniques rather than theoretical principles. Some examples of advances in our knowledge, and limitations of this knowledge, are reviewed.

  15. Spectral Features for Perceptually Natural Phoneme Replacement by Another Speaker's Speech

    NASA Astrophysics Data System (ADS)

    Takou, Reiko; Segi, Hiroyuki; Takagi, Tohru; Seiyama, Nobumasa

    The frequency regions and spectral features that can be used to measure the perceived similarity and continuity of voice quality are reported here. A perceptual evaluation test was conducted to assess the naturalness of spoken sentences in which either a vowel or a long vowel of the original speaker was replaced by that of another. Correlation analysis between the evaluation score and the spectral feature distance was conducted to select the spectral features that were expected to be effective in measuring the voice quality and to identify the appropriate speech segment of another speaker. The mel-frequency cepstrum coefficient (MFCC) and the spectral center of gravity (COG) in the low-, middle-, and high-frequency regions were selected. A perceptual paired comparison test was carried out to confirm the effectiveness of the spectral features. The results showed that the MFCC was effective for spectra across a wide range of frequency regions, the COG was effective in the low- and high-frequency regions, and the effective spectral features differed among the original speakers.
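
A band-limited spectral center of gravity (COG) of the kind mentioned above can be sketched as a power-weighted mean frequency. The weighting is an assumption made here: COG definitions differ (magnitude versus power weighting), and the abstract does not say which was used or where the low-, middle-, and high-frequency band edges lie. The function names are hypothetical, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def power_spectrum(frame, sample_rate):
    """Return a list of (frequency_hz, power) pairs from a naive DFT.

    Covers bins 0 .. N/2. Adequate for short frames; a real system
    would use an FFT and a window function.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append((k * sample_rate / n, abs(acc) ** 2))
    return spectrum


def center_of_gravity(spectrum, f_lo, f_hi):
    """Power-weighted mean frequency within [f_lo, f_hi]; None if no energy."""
    band = [(f, p) for f, p in spectrum if f_lo <= f <= f_hi]
    total = sum(p for _, p in band)
    if total == 0:
        return None
    return sum(f * p for f, p in band) / total
```

Calling `center_of_gravity` with different band edges on the same spectrum yields one COG value per frequency region, and the distance between two speakers' values in a region is one way to form the spectral feature distance that the abstract correlates with perceptual scores.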

  16. Cross-Channel Amplitude Sweeps Are Crucial to Speech Intelligibility

    ERIC Educational Resources Information Center

    Prendergast, Garreth; Green, Gary G. R.

    2012-01-01

    Classical views of speech perception argue that the static and dynamic characteristics of spectral energy peaks (formants) are the acoustic features that underpin phoneme recognition. Here we use representations where the amplitude modulations of sub-band filtered speech are described, precisely, in terms of co-sinusoidal pulses. These pulses are…

  17. A Model for Speech Processing in Second Language Listening Activities

    ERIC Educational Resources Information Center

    Zoghbor, Wafa Shahada

    2016-01-01

    Teachers' understanding of the process of speech perception could inform practice in listening classrooms. Catford (1950) developed a model for speech perception taking into account the influence of the acoustic features of the linguistic forms used by the speaker, whereby the listener "identifies" and "interprets" these…

  18. Coding of acoustic features for a single-channel tactile aid.

    PubMed

    Summers, I R; Milnes, P; Stevens, J C; Cooper, P G

    1996-08-01

    Measurements have been made on the discrimination of speech contrasts on the basis of single-channel vibrotactile presentation of a variety of speech-derived signals, coded as amplitude- and frequency-modulated pulse trains. Stimulation was at the index fingertip. The signals chosen for tactile presentation were the speech amplitude envelope, the voice fundamental frequency F0 and the zero-crossing frequency in the 1.3-6.6 kHz band. "Two-feature" codings, which present two of these signals simultaneously (one coded as stimulus frequency and one coded as stimulus amplitude), were found to be no more effective than "single-feature" codings which present only one signal (coded as both amplitude and frequency). Scores for consonant discrimination were highest for the single-feature coding of zero-crossing frequency, although differences between the codings were not, in general, significant. Scores for emphatic-stress discrimination were highest for the single-feature coding of F0, and this coding produced best results overall. A practical wrist-worn device, whose design is influenced by these experimental results, is briefly described. PMID:8879689

  19. Acoustic Variations in Adductor Spasmodic Dysphonia as a Function of Speech Task.

    ERIC Educational Resources Information Center

    Sapienza, Christine M.; Walton, Suzanne; Murry, Thomas

    1999-01-01

    Acoustic phonatory events were identified in 14 women diagnosed with adductor spasmodic dysphonia (ADSD), a focal laryngeal dystonia that disturbs phonatory function, and compared with those of 14 age-matched women with no vocal dysfunction. Findings indicated ADSD subjects produced more aberrant acoustic events than controls during tasks of…

  20. Comparison of character-level and part of speech features for name recognition in biomedical texts.

    PubMed

    Collier, Nigel; Takeuchi, Koichi

    2004-12-01

    The immense volume of data which is now available from experiments in molecular biology has led to an explosion in reported results, most of which are available only in unstructured text format. For this reason there has been great interest in the task of text mining to aid in fact extraction, document screening, citation analysis, and linkage with large gene and gene-product databases. In particular there has been an intensive investigation into the named entity (NE) task as a core technology in all of these tasks, which has been driven by the availability of high-volume training sets such as the GENIA v3.02 corpus. Despite such large training sets, accuracy for biology NE has proven to be consistently far below the high levels of performance in the news domain, where F scores above 90 are commonly reported, which can be considered near to human performance. We argue that it is crucial that more rigorous analysis of the factors that contribute to the model's performance be applied to discover where the underlying limitations are and what our future research direction should be. Our investigation in this paper reports on variations of two widely used feature types, part of speech (POS) tags and character-level orthographic features, and makes a comparison of how these variations influence performance. We base our experiments on a proven state-of-the-art model, support vector machines, using a high-quality subset of 100 annotated MEDLINE abstracts. Experiments reveal that the best performing features are orthographic features, with an F score of 72.6. Although the Brill tagger trained in-domain on the GENIA v3.02p POS corpus gives the best overall performance of any POS tagger, at an F score of 68.6, this is still significantly below the orthographic features. In combination these two feature types appear to interfere with each other and degrade performance slightly to an F score of 72.3. PMID:15542016

  1. Acoustic features of infant vocalic utterances at 3, 6, and 9 months.

    PubMed

    Kent, R D; Murray, A D

    1982-08-01

    Recordings were obtained of the comfort-state vocalizations of infants at 3, 6, and 9 months of age during a session of play and vocal interaction with the infant's mother and the experimenter. Acoustic analysis, primarily spectrography, was used to determine utterance durations, formant frequencies of vocalic utterances, patterns of f0 frequency change during vocalizations, variations in source excitation of the vocal tract, and general properties of the utterances. Most utterances had durations of less than 400 ms although occasional sounds lasted 2 s or more. An increase in the ranges of both the F1 and F2 frequencies was observed across both periods of age increase, but the center of the F1-F2 plot for the group vowels appeared to change very little. Phonatory characteristics were at least generally compatible with published descriptions of infant cry. The f0 frequency averaged 445 Hz for 3-month-olds, 450 Hz for 6-month-olds, and 415 Hz for 9-month-olds. As has been previously reported for infant cry, the vocalizations frequently were associated with tremor (vibrato), harmonic doubling, abrupt f0 shift, vocal fry (or roll), and noise segments. Thus, from a strictly acoustic perspective, early cry and the later vocalizations of cooing and babbling appear to be vocal performances in continuity. Implications of the acoustic analyses are discussed for phonetic development and speech acquisition. PMID:7119278

  2. A human vocal utterance corpus for perceptual and acoustic analysis of speech, singing, and intermediate vocalizations

    NASA Astrophysics Data System (ADS)

    Gerhard, David

    2002-11-01

    In this paper we present the collection and annotation process of a corpus of human utterance vocalizations used for speech and song research. The corpus was collected to fill a void in current research tools, since no corpus currently exists which is useful for the classification of intermediate utterances between speech and monophonic singing. Much work has been done in the domain of speech versus music discrimination, and several corpora exist which can be used for this research. A specific example is the work done by Eric Scheirer and Malcolm Slaney [IEEE ICASSP, 1997, pp. 1331-1334]. The collection of the corpus is described including questionnaire design and intended and actual response characteristics, as well as the collection and annotation of pre-existing samples. The annotation of the corpus consisted of a survey tool for a subset of the corpus samples, including ratings of the clips based on a speech-song continuum, and questions on the perceptual qualities of speech and song, both generally and corresponding to particular clips in the corpus.

  3. Perception of Suprasegmental Speech Features via Bimodal Stimulation: Cochlear Implant on One Ear and Hearing Aid on the Other

    ERIC Educational Resources Information Center

    Most, Tova; Harel, Tamar; Shpak, Talma; Luntz, Michal

    2011-01-01

    Purpose: The purpose of the study was to evaluate the contribution of acoustic hearing to the perception of suprasegmental features by adults who use a cochlear implant (CI) and a hearing aid (HA) in opposite ears. Method: 23 adults participated in this study. Perception of suprasegmental features--intonation, syllable stress, and word…

  4. Comparison of Acoustic and Kinematic Approaches to Measuring Utterance-Level Speech Variability

    ERIC Educational Resources Information Center

    Howell, Peter; Anderson, Andrew J.; Bartrip, Jon; Bailey, Eleanor

    2009-01-01

    Purpose: The spatiotemporal index (STI) is one measure of variability. As currently implemented, kinematic data are used, requiring equipment that cannot be used with some patient groups or in scanners. An experiment is reported that addressed whether STI can be extended to an audio measure of sound pressure of the speech envelope over time that…

  5. The Exploitation of Subphonemic Acoustic Detail in L2 Speech Segmentation

    ERIC Educational Resources Information Center

    Shoemaker, Ellenor

    2014-01-01

    The current study addresses an aspect of second language (L2) phonological acquisition that has received little attention to date--namely, the acquisition of allophonic variation as a word boundary cue. The role of subphonemic variation in the segmentation of speech by native speakers has been indisputably demonstrated; however, the acquisition of…

  6. Impact of Aberrant Acoustic Properties on the Perception of Sound Quality in Electrolarynx Speech

    ERIC Educational Resources Information Center

    Meltzner, Geoffrey S.; Hillman, Robert E.

    2005-01-01

    A large percentage of patients who have undergone laryngectomy to treat advanced laryngeal cancer rely on an electrolarynx (EL) to communicate verbally. Although serviceable, EL speech is plagued by shortcomings in both sound quality and intelligibility. This study sought to better quantify the relative contributions of previously identified…

  7. Neuromagnetic Evidence for a Featural Distinction of English Consonants: Sensor- and Source-Space Data

    ERIC Educational Resources Information Center

    Scharinger, Mathias; Merickel, Jennifer; Riley, Joshua; Idsardi, William J.

    2011-01-01

    Speech sounds can be classified on the basis of their underlying articulators or on the basis of the acoustic characteristics resulting from particular articulatory positions. Research in speech perception suggests that distinctive features are based on both articulatory and acoustic information. In recent years, neuroelectric and neuromagnetic…

  8. New Features Using Robust MVDR Spectrum of Filtered Autocorrelation Sequence for Robust Speech Recognition

    PubMed Central

    Gazor, Saeed

    2013-01-01

    This paper presents a novel noise-robust feature extraction method for speech recognition using the robust perceptual minimum variance distortionless response (MVDR) spectrum of a temporally filtered autocorrelation sequence. The perceptual MVDR spectrum of the filtered short-time autocorrelation sequence can reduce the effects of the residue of nonstationary additive noise which remains after filtering the autocorrelation. To achieve a more robust front-end, we also modify the robust distortionless constraint of the MVDR spectral estimation method via revised weighting of the sub-band power spectrum values based on the sub-band signal-to-noise ratios (SNRs), which adjusts it to the newly proposed approach. This new function allows the components of the input signal at the frequencies least affected by noise to pass with larger weights and attenuates the noisy and undesired components more effectively. This modification reduces the noise residuals of the spectrum estimated from the filtered autocorrelation sequence, thereby leading to a more robust algorithm. Our proposed method, when evaluated on the Aurora 2 task for recognition purposes, outperformed the Mel frequency cepstral coefficient (MFCC) baseline, relative autocorrelation sequence MFCC (RAS-MFCC), and the MVDR-based features in several different noisy conditions. PMID:24501584

  9. Alarming features: birds use specific acoustic properties to identify heterospecific alarm calls

    PubMed Central

    Fallow, Pamela M.; Pitcher, Benjamin J.; Magrath, Robert D.

    2013-01-01

    Vertebrates that eavesdrop on heterospecific alarm calls must distinguish alarms from sounds that can safely be ignored, but the mechanisms for identifying heterospecific alarm calls are poorly understood. While vertebrates learn to identify heterospecific alarms through experience, some can also respond to unfamiliar alarm calls that are acoustically similar to conspecific alarm calls. We used synthetic calls to test the role of specific acoustic properties in alarm call identification by superb fairy-wrens, Malurus cyaneus. Individuals fled more often in response to synthetic calls with peak frequencies closer to those of conspecific calls, even if other acoustic features were dissimilar to those of fairy-wren calls. Further, they then spent more time in cover following calls that had both peak frequencies and frequency modulation rates closer to natural fairy-wren means. Thus, fairy-wrens use similarity in specific acoustic properties to identify alarms and adjust a two-stage antipredator response. Our study reveals how birds respond to heterospecific alarm calls without experience, and, together with previous work using playback of natural calls, shows that both acoustic similarity and learning are important for interspecific eavesdropping. More generally, this study reconciles contrasting views on the importance of alarm signal structure and learning in recognition of heterospecific alarms. PMID:23303539

  10. Phonetic Feature Encoding in Human Superior Temporal Gyrus

    PubMed Central

    Mesgarani, Nima; Cheung, Connie; Johnson, Keith; Chang, Edward F.

    2015-01-01

    During speech perception, linguistic elements such as consonants and vowels are extracted from a complex acoustic speech signal. The superior temporal gyrus (STG) participates in high-order auditory processing of speech, but how it encodes phonetic information is poorly understood. We used high-density direct cortical surface recordings in humans while they listened to natural, continuous speech to reveal the STG representation of the entire English phonetic inventory. At single electrodes, we found response selectivity to distinct phonetic features. Encoding of acoustic properties was mediated by a distributed population response. Phonetic features could be directly related to tuning for spectrotemporal acoustic cues, some of which were encoded in a nonlinear fashion or by integration of multiple cues. These findings demonstrate the acoustic-phonetic representation of speech in human STG. PMID:24482117

  11. Acoustic Analyses of Speech Sounds and Rhythms in Japanese- and English-Learning Infants

    PubMed Central

    Yamashita, Yuko; Nakajima, Yoshitaka; Ueda, Kazuo; Shimada, Yohko; Hirsh, David; Seno, Takeharu; Smith, Benjamin Alexander

    2013-01-01

    The purpose of this study was to explore developmental changes in spectral fluctuations and temporal periodicity in Japanese- and English-learning infants. Three age groups (15, 20, and 24 months) were selected, because infants diversify phonetic inventories with age. Natural speech of the infants was recorded. We utilized a critical-band-filter bank, which simulated the frequency resolution in adults’ auditory periphery. First, the correlations between the power fluctuations of the critical-band outputs represented by factor analysis were observed in order to see how the critical bands should be connected to each other, if a listener is to differentiate sounds in infants’ speech. In the following analysis, we analyzed the temporal fluctuations of factor scores by calculating autocorrelations. The present analysis identified three factors, as had been observed in adult speech, at 24 months of age in both linguistic environments. These three factors were shifted to a higher frequency range corresponding to the smaller vocal tract size of the infants. The results suggest that the vocal tract structures of the infants had developed an adult-like configuration by 24 months of age in both language environments. The number of utterances with a periodic nature over shorter time spans increased with age in both environments. This trend was clearer in the Japanese environment. PMID:23450824

  12. Is the Linguistic Content of Speech Less Salient than Its Perceptual Features in Autism?

    ERIC Educational Resources Information Center

    Jarvinen-Pasley, Anna; Pasley, John; Heaton, Pamela

    2008-01-01

    Open-ended tasks are rarely used to investigate cognition in autism. No known studies have directly examined whether increased attention to the perceptual level of speech in autism might contribute to a reduced tendency to process language meaningfully. The present study investigated linguistic versus perceptual speech processing preferences.…

  13. Transitional Speech Features in the College Lecture. CATESOL Occasional Papers, No. 1.

    ERIC Educational Resources Information Center

    Cook, Margaret

    This paper examines the speech performance characteristic of the college lecturer. One of the most organized forms of speech performance, the lecture functions as a referential monologue and has a necessarily topical focus. Specifically dealt with are the ways in which lecturers introduce new topics, link together topical utterances, and close out…

  14. Changes in Speech Production in a Child with a Cochlear Implant: Acoustic and Kinematic Evidence.

    ERIC Educational Resources Information Center

    Goffman, Lisa; Ertmer, David J.; Erdle, Christa

    2002-01-01

    A method is presented for examining change in motor patterns used to produce linguistic contrasts. In this case study, the method is applied to a child who experienced hearing loss at age 3 and received a multi-channel cochlear implant at 7. Post-implant, acoustic durations showed a maturational change. (Contains references.) (Author/CR)

  15. Intelligibility of Telephone Speech for the Hearing Impaired When Various Microphones Are Used for Acoustic Coupling.

    ERIC Educational Resources Information Center

    Janota, Claus P.; Janota, Jeanette Olach

    1991-01-01

    Various candidate microphones were evaluated for acoustic coupling of hearing aids to a telephone receiver. Results from testing by 9 hearing-impaired adults found comparable listening performance with a pressure gradient microphone at a 10 decibel higher level of interfering noise than with a normal pressure-sensitive microphone. (Author/PB)

  16. Depression Diagnoses and Fundamental Frequency-Based Acoustic Cues in Maternal Infant-Directed Speech

    ERIC Educational Resources Information Center

    Porritt, Laura L.; Zinser, Michael C.; Bachorowski, Jo-Anne; Kaplan, Peter S.

    2014-01-01

    F0-based acoustic measures were extracted from a brief, sentence-final target word spoken during structured play interactions between mothers and their 3- to 14-month-old infants and were analyzed based on demographic variables and DSM-IV Axis-I clinical diagnoses and their common modifiers. F0 range (ΔF0) was…

  17. Acoustic Correlates of Emphatic Stress in Central Catalan

    ERIC Educational Resources Information Center

    Nadeu, Marianna; Hualde, Jose Ignacio

    2012-01-01

    A common feature of public speech in Catalan is the placement of prominence on lexically unstressed syllables ("emphatic stress"). This paper presents an acoustic study of radio speech data. Instances of emphatic stress were perceptually identified. Within-word comparison between vowels with emphatic stress and vowels with primary lexical stress…

  18. Acoustic features of male baboon loud calls: Influences of context, age, and individuality

    NASA Astrophysics Data System (ADS)

    Fischer, Julia; Hammerschmidt, Kurt; Cheney, Dorothy L.; Seyfarth, Robert M.

    2002-03-01

    The acoustic structure of loud calls ("wahoos") recorded from free-ranging male baboons (Papio cynocephalus ursinus) in the Moremi Game Reserve, Botswana, was examined for differences between and within contexts, using calls given in response to predators (alarm wahoos), during male contests (contest wahoos), and when a male had become separated from the group (contact wahoos). Calls were recorded from adolescent, subadult, and adult males. In addition, male alarm calls were compared with those recorded from females. Despite their superficial acoustic similarity, the analysis revealed a number of significant differences between alarm, contest, and contact wahoos. Contest wahoos are given at a much higher rate, exhibit lower frequency characteristics, have a longer "hoo" duration, and a relatively louder "hoo" portion than alarm wahoos. Contact wahoos are acoustically similar to contest wahoos, but are given at a much lower rate. Both alarm and contest wahoos also exhibit significant differences among individuals. Some of the acoustic features that vary in relation to age and sex presumably reflect differences in body size, whereas others are possibly related to male stamina and endurance. The finding that calls serving markedly different functions constitute variants of the same general call type suggests that the vocal production in nonhuman primates is evolutionarily constrained.

  19. Cortical speech and non-speech discrimination in relation to cognitive measures in preschool children.

    PubMed

    Kuuluvainen, Soila; Alku, Paavo; Makkonen, Tommi; Lipsanen, Jari; Kujala, Teija

    2016-03-01

    Effective speech sound discrimination at preschool age is known to be a prerequisite for the development of language skills and later literacy acquisition. However, the speech specificity of cortical discrimination skills in small children is currently not known, as previous research has either studied speech functions without comparison with non-speech sounds, or used much simpler sounds such as harmonic or sinusoidal tones as control stimuli. We investigated the cortical discrimination of five syllable features (consonant, vowel, vowel duration, fundamental frequency, and intensity), covering both segmental and prosodic phonetic changes, and their acoustically matched non-speech counterparts in 63 6-year-old typically developed children, by using a multi-feature mismatch negativity (MMN) paradigm. Each of the five investigated features elicited a unique pattern of differentiating negativities: an early differentiating negativity, MMN, and a late differentiating negativity. All five studied features showed speech-related enhancement of at least one of these responses, suggesting experience-related neural commitment in both phonetic and prosodic speech processing. In addition, the cognitive performance and language skills of the children were tested extensively. The speech-related neural enhancement was positively associated with the level of performance in several neurocognitive tasks, indicating a relationship between successful establishment of cortical memory traces for speech and enhanced cognitive functioning. The results contribute to the understanding of typical developmental trajectories of linguistic vs. non-linguistic auditory skills, and provide a reference for future studies investigating deficits in language-related disorders at preschool age. PMID:26647120

  20. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    ERIC Educational Resources Information Center

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  1. A Robust Approach For Acoustic Noise Suppression In Speech Using ANFIS

    NASA Astrophysics Data System (ADS)

    Martinek, Radek; Kelnar, Michal; Vanus, Jan; Bilik, Petr; Zidek, Jan

    2015-11-01

    This article deals with the implementation of a combination of fuzzy-system and artificial-intelligence techniques in the application area of non-linear noise and interference suppression. The structure used is called an Adaptive Neuro Fuzzy Inference System (ANFIS). This system finds practical use mainly in audio telephone (mobile) communication in a noisy environment (transport, production halls, sports matches, etc.). Experimental methods based on the two-input adaptive noise cancellation concept are clearly outlined. Within the experiments carried out, the authors created, based on the ANFIS structure, a comprehensive system for adaptive suppression of unwanted background interference that occurs in audio communication and degrades the audio signal. The system designed has been tested on real voice signals. This article presents an investigation and comparison of three distinct approaches to noise cancellation in speech: LMS (least mean squares) and RLS (recursive least squares) adaptive filtering, and ANFIS. A careful review of the literature indicated the importance of non-linear adaptive algorithms over linear ones in noise cancellation. It was concluded that the ANFIS approach had the best overall performance, as it efficiently cancelled noise even in highly noise-degraded speech. Results were drawn from the successful experimentation; subjective tests were used to analyse comparative performance, while objective tests were used to validate the results. The algorithms were implemented experimentally in Matlab to justify the claims and determine their relative performance.
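
    The two-input adaptive noise cancellation concept mentioned in this abstract can be sketched with the LMS baseline, the simplest of the three approaches compared. The code below is only an illustrative sketch, not the authors' Matlab implementation and not ANFIS: the function name, tap count, step size, synthetic "speech" tone, and noise path are all assumptions chosen for the example.

```python
import math
import random


def lms_noise_canceller(primary, reference, num_taps=4, mu=0.005):
    """Two-input LMS canceller (hypothetical helper).

    primary   -- speech plus noise that has passed through an unknown path
    reference -- the noise source alone
    Returns the cleaned signal (the error output) and the final weights.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate  # what remains is the speech estimate
        # Standard LMS update: step along the instantaneous gradient.
        weights = [w + 2.0 * mu * error * h
                   for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned, weights


# Assumed synthetic data: a slow tone stands in for speech, and the noise
# reaches the primary input through a made-up path 0.8*x[k] + 0.3*x[k-1].
random.seed(0)
n = 5000
speech = [0.5 * math.sin(2 * math.pi * 0.01 * k) for k in range(n)]
noise = [random.uniform(-1.0, 1.0) for _ in range(n)]
primary = [speech[k] + 0.8 * noise[k] + (0.3 * noise[k - 1] if k > 0 else 0.0)
           for k in range(n)]

cleaned, weights = lms_noise_canceller(primary, noise)
```

    After adaptation the first two weights should approach the assumed path coefficients, so subtracting the filtered reference leaves mostly the speech-like component. A linear filter of this kind cannot model a non-linear noise path, which is the motivation the abstract gives for considering ANFIS.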

  2. Nonlinear spectro-temporal features based on a cochlear model for automatic speech recognition in a noisy situation.

    PubMed

    Choi, Yong-Sun; Lee, Soo-Young

    2013-09-01

    A nonlinear speech feature extraction algorithm was developed by modeling human cochlear functions, and demonstrated as a noise-robust front-end for speech recognition systems. The algorithm was based on a model of the Organ of Corti in the human cochlea with such features as the basilar membrane (BM), outer hair cells (OHCs), and inner hair cells (IHCs). Frequency-dependent nonlinear compression and amplification of OHCs were modeled by lateral inhibition to enhance spectral contrasts. In particular, the compression coefficients had frequency dependency based on the psychoacoustic evidence. Spectral subtraction and temporal adaptation were applied in the time-frame domain. With long-term and short-term adaptation characteristics, these factors remove stationary or slowly varying components and amplify the temporal changes such as onset or offset. The proposed features were evaluated with a noisy speech database and showed better performance than the baseline methods such as mel-frequency cepstral coefficients (MFCCs) and RASTA-PLP in unknown noisy conditions. PMID:23558292

  3. Research in continuous speech recognition

    NASA Astrophysics Data System (ADS)

    Schwartz, R. M.; Chow, Y. L.; Makhoul, J.

    1983-12-01

    This annual report describes the work performed during the past year in an ongoing effort to design and implement a system that performs phonetic recognition of continuous speech. The general approach used is to develop a Hidden Markov Model (HMM) of speech parameter movements, which can be used to distinguish among the different phonemes. The resulting phoneme models incorporate the contextual effects of neighboring phonemes. One main aspect of this research is to incorporate both spectral parameters and acoustic-phonetic features into the HMM formalism.

  4. Prenatal features of Pena-Shokeir sequence with atypical response to acoustic stimulation.

    PubMed

    Pittyanont, Sirida; Jatavan, Phudit; Suwansirikul, Songkiat; Tongsong, Theera

    2016-09-01

    A fetal sonographic screening examination performed at 23 weeks showed polyhydramnios, micrognathia, fixed postures of all long bones, but no movement and no breathing. The fetus showed fetal heart rate acceleration but no movement when acoustic stimulation was applied with an artificial larynx. All these findings persisted on serial examinations. The neonate was stillborn at 37 weeks and a final diagnosis of Pena-Shokeir sequence was made. In addition to typical sonographic features of Pena-Shokeir sequence, fetal heart rate accelerations with no movement in response to acoustic stimulation suggest that peripheral myopathy may possibly play an important role in the pathogenesis of the disease. © 2016 Wiley Periodicals, Inc. J Clin Ultrasound 44:459-462, 2016. PMID:27312123

  5. Scanning Acoustic Microscopy for Characterization of Coatings and Near-Surface Features of Ceramics

    SciTech Connect

    Qu, Jun; Blau, Peter Julian

    2006-01-01

    Scanning Acoustic Microscopy (SAcM) has been widely used for non-destructive evaluation (NDE) in various fields such as material characterization, electronics, and biomedicine. SAcM uses high-frequency acoustic waves (60 MHz to 2.0 GHz) providing much higher resolution (up to 0.5 µm) compared to conventional ultrasonic NDE, which is typically about 500 µm. SAcM offers the ability to non-destructively image subsurface features and visualize the variations in elastic properties. These attributes make SAcM a valuable tool for characterizing near-surface material properties and detecting fine-scale flaws. This paper presents some recent applications of SAcM in detecting subsurface damage, assessing coatings, and visualizing residual stress for ceramic and semiconductor materials.

  6. Speech discrimination after early exposure to pulsed-noise or speech

    PubMed Central

    Ranasinghe, Kamalini G.; Carraway, Ryan S.; Borland, Michael S.; Moreno, Nicole A.; Hanacik, Elizabeth A.; Miller, Robert S.; Kilgard, Michael P.

    2012-01-01

    Early experience of structured inputs and complex sound features generates lasting changes in tonotopy and receptive field properties of primary auditory cortex (A1). In this study we tested whether these changes are severe enough to alter neural representations and behavioral discrimination of speech. We exposed two groups of rat pups during the critical period of auditory development to pulsed noise or speech. Both groups of rats were trained to discriminate speech sounds when they were young adults, and anesthetized neural responses were recorded from A1. The representation of speech in A1 and behavioral discrimination of speech remained robust to altered spectral and temporal characteristics of A1 neurons after pulsed-noise exposure. Exposure to passive speech during early development provided no added advantage in speech sound processing. Speech training increased A1 neuronal firing rate for speech stimuli in naïve rats, but did not increase responses in rats that experienced early exposure to pulsed noise or speech. Our results suggest that speech sound processing is resistant to changes in simple neural response properties caused by manipulating the early acoustic environment. PMID:22575207

  7. Intelligibility Assessment of Ideal Binary-Masked Noisy Speech with Acceptance of Room Acoustic

    NASA Astrophysics Data System (ADS)

    Vladimír, Sedlak; Daniela, Durackova; Roman, Zalusky; Tomas, Kovacik

    2015-01-01

    In this paper the intelligibility of ideal binary-masked noisy signals is evaluated for different signal-to-noise ratios (SNRs), mask errors, masker types, distances between source and receiver, reverberation times, and local criteria for forming the binary mask. The ideal binary mask (IBM) is computed from time-frequency decompositions of target and masker signals by thresholding the local SNR within time-frequency units. The intelligibility of the separated signal is measured using different objective measures computed in the frequency and perceptual domains. The present study replicates and extends findings that have already been presented, but mainly shows the impact of room acoustics on the intelligibility performance of the IBM technique.
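
    The ideal binary mask construction described in this abstract, thresholding the local SNR in each time-frequency unit against a local criterion, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name, the toy power values, and the handling of zero-power units are hypothetical, and the time-frequency decomposition itself is assumed to have been done already.

```python
import math


def ideal_binary_mask(target_power, masker_power, lc_db=0.0):
    """Return a 0/1 mask per time-frequency unit (hypothetical helper).

    target_power, masker_power -- equally shaped nested lists of power values
    lc_db -- local criterion in dB; a unit is kept when local SNR exceeds it
    """
    mask = []
    for t_row, m_row in zip(target_power, masker_power):
        row = []
        for t, m in zip(t_row, m_row):
            if m <= 0.0:
                # Assumption: no masker energy means the unit is kept
                # whenever the target has any energy.
                row.append(1 if t > 0.0 else 0)
            elif t <= 0.0:
                row.append(0)
            else:
                local_snr_db = 10.0 * math.log10(t / m)
                row.append(1 if local_snr_db > lc_db else 0)
        mask.append(row)
    return mask


# Assumed toy spectrogram powers: 2 frames x 2 frequency channels.
target = [[4.0, 0.1], [1.0, 9.0]]
masker = [[1.0, 1.0], [2.0, 3.0]]
mask_0db = ideal_binary_mask(target, masker, lc_db=0.0)
mask_5db = ideal_binary_mask(target, masker, lc_db=5.0)
```

    Raising the local criterion keeps fewer units, which is one of the parameters the study varies. The mask is "ideal" because it requires the target and masker separately, so it serves as an upper bound rather than a deployable separator.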

  8. Speech Modification by a Deaf Child through Dynamic Orometric Modeling and Feedback.

    ERIC Educational Resources Information Center

    Fletcher, Samuel G.; Hasegawa, Akira

    1983-01-01

    A three-and-one-half-year-old profoundly deaf girl, whose physiologic, acoustic, and phonetic data indicated poor speech production, rapidly learned goal articulation gestures (positional and timing features of speech) after visual articulatory modeling and feedback on tongue position with a microprocessor-based instrument and video display.…

  9. Effects of speech style, room acoustics, and vocal fatigue on vocal effort.

    PubMed

    Bottalico, Pasquale; Graetzer, Simone; Hunter, Eric J

    2016-05-01

    Vocal effort is a physiological measure that accounts for changes in voice production as vocal loading increases. It has been quantified in terms of sound pressure level (SPL). This study investigates how vocal effort is affected by speaking style, room acoustics, and short-term vocal fatigue. Twenty subjects were recorded while reading a text at normal and loud volumes in anechoic, semi-reverberant, and reverberant rooms in the presence of classroom babble noise. The acoustics in each environment were modified by creating a strong first reflection in the talker position. After each task, the subjects answered questions addressing their perception of the vocal effort, comfort, control, and clarity of their own voice. Variation in SPL for each subject was measured per task. It was found that SPL and self-reported effort increased in the loud style and decreased when the reflective panels were present and when reverberation time increased. Self-reported comfort and control decreased in the loud style, while self-reported clarity increased when panels were present. The lowest magnitude of vocal fatigue was experienced in the semi-reverberant room. The results indicate that early reflections may be used to reduce vocal effort without modifying reverberation time. PMID:27250179
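
    Since this abstract quantifies vocal effort through sound pressure level, a short sketch of the standard SPL definition may help: SPL in dB is 20·log10 of the RMS pressure over the 20 µPa reference for airborne sound. The function name and the synthetic test tone are assumptions for illustration; the study's calibration and measurement chain are not described in the abstract and are not reproduced here.

```python
import math

P_REF = 20e-6  # standard reference pressure for airborne sound, in pascals


def spl_db(pressures_pa, p_ref=P_REF):
    """Sound pressure level in dB re 20 µPa from pressure samples in pascals.

    Hypothetical helper: assumes calibrated, non-silent input.
    """
    mean_square = sum(p * p for p in pressures_pa) / len(pressures_pa)
    return 20.0 * math.log10(math.sqrt(mean_square) / p_ref)


# Assumed test signal: a 1 kHz tone with 1 Pa RMS, sampled at 48 kHz.
sample_rate = 48000
tone = [math.sqrt(2.0) * math.sin(2 * math.pi * 1000 * k / sample_rate)
        for k in range(sample_rate)]
level = spl_db(tone)
louder = spl_db([2.0 * p for p in tone])
```

    Because the scale is logarithmic, doubling the pressure raises the level by about 6 dB regardless of the starting level, which is why per-subject differences in SPL between tasks are a convenient measure of change.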

  10. Acoustic evidence for the development of gestural coordination in the speech of 2-year-olds: a longitudinal study.

    PubMed

    Goodell, E W; Studdert-Kennedy, M

    1993-08-01

    Studies of child phonology have often assumed that young children first master a repertoire of phonemes and then build their lexicon by forming combinations of these abstract, contrastive units. However, evidence from children's systematic errors suggests that children first build a repertoire of words as integral sequences of gestures and then gradually differentiate these sequences into their gestural and segmental components. Recently, experimental support for this position has been found in the acoustic records of the speech of 3-, 5-, and 7-year-old children, suggesting that even in older children some phonemes have not yet fully segregated as units of gestural organization and control. The present longitudinal study extends this work to younger children (22- and 32-month-olds). Results demonstrate clear differences in the duration and coordination of gestures between children and adults, and a clear shift toward the patterns of adult speakers during roughly the third year of life. Details of the child-adult differences and developmental changes vary from one aspect of an utterance to another. PMID:8377484

  11. Improving robustness of speech recognition systems

    NASA Astrophysics Data System (ADS)

    Mitra, Vikramjit

    2010-11-01

    speech databases: X-ray microbeam and Aurora-2 were annotated, where the former was used to train a TV-estimator and the latter was used to train a Dynamic Bayesian Network (DBN) based ASR architecture. The DBN architecture used two sets of observations: (a) acoustic features in the form of mel-frequency cepstral coefficients (MFCCs) and (b) TVs (estimated from the acoustic speech signal). In this setup the articulatory gestures were modeled as hidden random variables, hence eliminating the necessity for explicit gesture recognition. Word recognition results using the DBN architecture indicate that articulatory representations not only can help to account for coarticulatory variations but can also significantly improve the noise robustness of ASR systems.

  12. Subjective evaluation of speech and noise in learning environments in the realm of classroom acoustics: Results from laboratory and field experiments

    NASA Astrophysics Data System (ADS)

    Meis, Markus; Nocke, Christian; Hofmann, Simone; Becker, Bernhard

    2005-04-01

    The impact of different acoustical conditions in learning environments on noise annoyance and the evaluation of speech quality was tested in a series of three experiments. In Experiment 1 (n=79) the auralization of seven classrooms with reverberation times from 0.55 to 3.21 s [average from 250 Hz to 2 kHz] served to develop a Semantic Differential, evaluating a simulated teacher's voice. Four factors were found: acoustical comfort, roughness, sharpness, and loudness. In Experiment 2, the effects of two classroom renovations were examined from a holistic perspective. The rooms were treated acoustically with acoustic ceilings (RT=0.5 s [250 Hz-2 kHz]) and muffling floor materials as well as non-acoustically with a new lighting system and color design. The results indicate that pupils (n=61) in renovated classrooms judged the simulated voice more positively, were less annoyed by the noise in classrooms, and were more motivated to participate in the lessons. In Experiment 3 the sound environments from six different lecture rooms (RT=0.8 to 1.39 s [250 Hz-2 kHz]) in two Universities of Oldenburg were evaluated by 321 students during the lectures. Evidence found supports the assumption that acoustical comfort in rooms is dependent on frequency for rooms with higher reverberation times.

  13. Music-induced emotions can be predicted from a combination of brain activity and acoustic features.

    PubMed

    Daly, Ian; Williams, Duncan; Hallowell, James; Hwang, Faustina; Kirke, Alexis; Malik, Asad; Weaver, James; Miranda, Eduardo; Nasuto, Slawomir J

    2015-12-01

    It is widely acknowledged that music can communicate and induce a wide range of emotions in the listener. However, music is a highly complex audio signal composed of a wide range of complex time- and frequency-varying components. Additionally, music-induced emotions are known to differ greatly between listeners. Therefore, it is not immediately clear what emotions will be induced in a given individual by a piece of music. We attempt to predict the music-induced emotional response in a listener by measuring the activity in the listener's electroencephalogram (EEG). We combine these measures with acoustic descriptors of the music, an approach that allows us to consider music as a complex set of time-varying acoustic features, independently of any specific music theory. Regression models are found which allow us to predict the music-induced emotions of our participants with a correlation between the actual and predicted responses of up to r=0.234, p<0.001. This regression fit suggests that over 20% of the variance of the participants' music-induced emotions can be predicted by their neural activity and the properties of the music. Given the large amount of noise, non-stationarity, and non-linearity in both EEG and music, this is an encouraging result. Additionally, the combination of measures of brain activity and acoustic features describing the music played to our participants allows us to predict music-induced emotions with significantly higher accuracies than either feature type alone (p<0.01). PMID:26544602

  14. Fatigue level estimation of monetary bills based on frequency band acoustic signals with feature selection by supervised SOM

    NASA Astrophysics Data System (ADS)

    Teranishi, Masaru; Omatu, Sigeru; Kosaka, Toshihisa

    Fatigued monetary bills adversely affect the daily operation of automated teller machines (ATMs). In order to make the classification of fatigued bills more efficient, the development of an automatic fatigued monetary bill classification method is desirable. We propose a new method by which to estimate the fatigue level of monetary bills from the feature-selected frequency band acoustic energy pattern of banking machines. By using a supervised self-organizing map (SOM), we effectively estimate the fatigue level using only the feature-selected frequency band acoustic energy pattern. Furthermore, the feature-selected frequency band acoustic energy pattern improves the estimation accuracy of the fatigue level of monetary bills by adding frequency domain information to the acoustic energy pattern. The experimental results with real monetary bill samples reveal the effectiveness of the proposed method.

  15. Speech input and output

    NASA Astrophysics Data System (ADS)

    Class, F.; Mangold, H.; Stall, D.; Zelinski, R.

    1981-12-01

    Possibilities for acoustical dialogs with electronic data processing equipment were investigated. Speech recognition is posed as recognizing word groups. An economical, multistage classifier for word string segmentation is presented and its reliability in dealing with continuous speech (problems of temporal normalization and context) is discussed. Speech synthesis is considered in terms of German linguistics and phonetics. Preprocessing algorithms for total synthesis of written texts were developed. A macrolanguage, MUSTER, is used to implement this processing in an acoustic data information system (ADES).

  16. Hemispheric Asymmetries in Speech Perception: Sense, Nonsense and Modulations

    PubMed Central

    Rosen, Stuart; Wise, Richard J. S.; Chadha, Shabneet; Conway, Eleanor-Jayne; Scott, Sophie K.

    2011-01-01

    Background: The well-established left hemisphere specialisation for language processing has long been claimed to be based on a low-level auditory specialization for specific acoustic features in speech, particularly regarding ‘rapid temporal processing’. Methodology: A novel analysis/synthesis technique was used to construct a variety of sounds based on simple sentences which could be manipulated in spectro-temporal complexity, and whether they were intelligible or not. All sounds consisted of two noise-excited spectral prominences (based on the lower two formants in the original speech) which could be static or varying in frequency and/or amplitude independently. Dynamically varying both acoustic features based on the same sentence led to intelligible speech but when either or both acoustic features were static, the stimuli were not intelligible. Using the frequency dynamics from one sentence with the amplitude dynamics of another led to unintelligible sounds of comparable spectro-temporal complexity to the intelligible ones. Positron emission tomography (PET) was used to compare which brain regions were active when participants listened to the different sounds. Conclusions: Neural activity to spectral and amplitude modulations sufficient to support speech intelligibility (without actually being intelligible) was seen bilaterally, with a right temporal lobe dominance. A left dominant response was seen only to intelligible sounds. It thus appears that the left hemisphere specialisation for speech is based on the linguistic properties of utterances, not on particular acoustic features. PMID:21980349

  17. A multimodal spectral approach to characterize rhythm in natural speech.

    PubMed

    Alexandrou, Anna Maria; Saarinen, Timo; Kujala, Jan; Salmelin, Riitta

    2016-01-01

    Human utterances demonstrate temporal patterning, also referred to as rhythm. While simple oromotor behaviors (e.g., chewing) feature a salient periodical structure, conversational speech displays a time-varying quasi-rhythmic pattern. Quantification of periodicity in speech is challenging. Unimodal spectral approaches have highlighted rhythmic aspects of speech. However, speech is a complex multimodal phenomenon that arises from the interplay of articulatory, respiratory, and vocal systems. The present study addressed the question of whether a multimodal spectral approach, in the form of coherence analysis between electromyographic (EMG) and acoustic signals, would allow one to characterize rhythm in natural speech more efficiently than a unimodal analysis. The main experimental task consisted of speech production at three speaking rates; a simple oromotor task served as control. The EMG-acoustic coherence emerged as a sensitive means of tracking speech rhythm, whereas spectral analysis of either EMG or acoustic amplitude envelope alone was less informative. Coherence metrics seem to distinguish and highlight rhythmic structure in natural speech. PMID:26827019

  18. Automatic Speech Recognition Based on Electromyographic Biosignals

    NASA Astrophysics Data System (ADS)

    Jou, Szu-Chen Stan; Schultz, Tanja

    This paper presents our studies of automatic speech recognition based on electromyographic biosignals captured from the articulatory muscles in the face using surface electrodes. We develop a phone-based speech recognizer and describe how the performance of this recognizer improves when the extraction of relevant speech features is carefully designed and tailored to electromyographic signals. Our experimental design includes the collection of audibly spoken speech simultaneously recorded as acoustic data using a close-speaking microphone and as electromyographic signals using electrodes. Our experiments indicate that electromyographic signals precede the acoustic signal by about 0.05-0.06 seconds. Furthermore, we introduce articulatory feature classifiers, which have recently been shown to improve classical speech recognition significantly. We describe how the classification accuracy of articulatory features clearly benefits from the tailored feature extraction. Finally, these classifiers are integrated into the overall decoding framework applying a stream architecture. Our final system achieves a word error rate of 29.9% on a 100-word recognition task.
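
    The abstract above reports that the electromyographic signal leads the acoustic signal by roughly 0.05-0.06 seconds but does not say how that offset was measured. One common way to estimate such a delay between two simultaneously sampled signals is to pick the shift that maximizes their cross-correlation. The sketch below illustrates that generic idea only; the function name, the unnormalized score, and the 1000 Hz sample rate in the usage note are assumptions made for illustration, not details from the paper.

```python
def best_lag(lead, follow, max_lag):
    """Return the shift in samples (0..max_lag) of `follow` relative to `lead`
    that maximizes the unnormalized cross-correlation sum."""
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Only compare samples where both signals are defined after shifting.
        n = min(len(lead), len(follow) - lag)
        score = sum(lead[i] * follow[i + lag] for i in range(n))
        if score > best_score:
            best, best_score = lag, score
    return best


def lag_seconds(lead, follow, max_lag, sample_rate_hz):
    """Convert the best lag from samples to seconds."""
    return best_lag(lead, follow, max_lag) / sample_rate_hz
```

    For example, with both signals sampled at a hypothetical 1000 Hz, a burst that appears 55 samples later in the second signal yields a lag of 0.055 s. Real EMG and audio would first need rectification or envelope extraction and resampling to a common rate.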

  19. Neural Oscillations Carry Speech Rhythm through to Comprehension

    PubMed Central

    Peelle, Jonathan E.; Davis, Matthew H.

    2012-01-01

    A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners’ processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging – particularly electroencephalography (EEG) and magnetoencephalography (MEG) – point to phase locking by ongoing cortical oscillations to low-frequency information (~4–8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain. PMID:22973251

  20. The correlation dimension: A robust chaotic feature for classifying acoustic emission signals generated in construction materials

    NASA Astrophysics Data System (ADS)

    Kacimi, S.; Laurens, S.

    2009-07-01

    In the field of acoustic emission (AE) source recognition, this paper presents a classification feature based on the paradigm of nonlinear dynamical systems, often referred to as chaos theory. The approach considers signals as time series expressing an underlying dynamical phenomenon and enclosing all the information regarding the dynamics. The scientific knowledge on nonlinear dynamical systems has considerably improved for the past 40 years. The dynamical behavior is analyzed in the phase space, which is the space generated by the state variables of the system. The time evolution of a system is expressed in the phase space by trajectories, and the asymptotic behavior of trajectories defines a space area which is referred to as a system attractor. Dynamical systems may be characterized by the topological properties of attractors, such as the correlation dimension, which is a fractal dimension. According to Takens theorem, even if the system is not clearly defined, it is possible to infer topological information about the attractor from experimental observations. Such a method, which is called phase space reconstruction, was successfully applied for the classification of acoustic emission waveforms propagating in more or less complex materials such as granite and concrete. Laboratory tests were carried out in order to collect numerous AE waveforms from various controlled acoustic sources. Then, each signal was processed to extract a reconstructed attractor from which the correlation dimension was computed. The first results of this research show that the correlation dimension assessed after phase space reconstruction is very relevant and robust for classifying AE signals. These promising results may be explained by the fact that the totality of the signal is used to achieve classifying information. Moreover, due to the self-similar nature of attractors, the correlation dimension, and thus a correlation dimension-based classification approach, is theoretically
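
    The record above describes phase space reconstruction followed by computation of the correlation dimension. A minimal sketch of that general pipeline is given here, assuming a time-delay embedding in the sense of Takens' theorem and a Grassberger-Procaccia-style correlation sum, with the dimension approximated by the slope of log C(r) against log r between two radii. The function names, the embedding parameters, and the two-radius slope estimate are illustrative assumptions; the abstract does not state which estimator or parameters the authors used.

```python
import math


def delay_embed(x, dim, tau):
    """Time-delay embedding: map a scalar series to points in `dim` dimensions
    using samples spaced `tau` steps apart."""
    n = len(x) - (dim - 1) * tau
    return [tuple(x[i + k * tau] for k in range(dim)) for i in range(n)]


def correlation_sum(points, r):
    """Fraction of distinct point pairs closer than r (O(N^2) brute force)."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < r:
                close += 1
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(points, r_small, r_large):
    """Two-radius slope of log C(r) versus log r.
    Both radii must capture at least one pair, otherwise log(0) is undefined."""
    c_small = correlation_sum(points, r_small)
    c_large = correlation_sum(points, r_large)
    return (math.log(c_large) - math.log(c_small)) / (
        math.log(r_large) - math.log(r_small)
    )
```

    As a sanity check, a linearly increasing series embedded in two dimensions lies on a straight line, so the estimate should come out close to 1. In practice the slope would be fitted over a range of radii rather than taken from two points.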

  1. Acoustics

    NASA Astrophysics Data System (ADS)

    The acoustics research activities of the DLR fluid-mechanics department (Forschungsbereich Stroemungsmechanik) during 1988 are surveyed and illustrated with extensive diagrams, drawings, graphs, and photographs. Particular attention is given to studies of helicopter rotor noise (high-speed impulsive noise, blade/vortex interaction noise, and main/tail-rotor interaction noise), propeller noise (temperature, angle-of-attack, and nonuniform-flow effects), noise certification, and industrial acoustics (road-vehicle flow noise and airport noise-control installations).

  2. Neurophysiological evidence that musical training influences the recruitment of right hemispheric homologues for speech perception

    PubMed Central

    Jantzen, McNeel G.; Howe, Bradley M.; Jantzen, Kelly J.

    2014-01-01

    Musicians have a more accurate temporal and tonal representation of auditory stimuli than their non-musician counterparts (Musacchia et al., 2007; Parbery-Clark et al., 2009a; Zendel and Alain, 2009; Kraus and Chandrasekaran, 2010). Musicians who are adept at the production and perception of music are also more sensitive to key acoustic features of speech such as voice onset timing and pitch. Together, these data suggest that musical training may enhance the processing of acoustic information for speech sounds. In the current study, we sought to provide neural evidence that musicians process speech and music in a similar way. We hypothesized that for musicians, right hemisphere areas traditionally associated with music are also engaged for the processing of speech sounds. In contrast we predicted that in non-musicians processing of speech sounds would be localized to traditional left hemisphere language areas. Speech stimuli differing in voice onset time were presented using a dichotic listening paradigm. Subjects either indicated aural location for a specified speech sound or identified a specific speech sound from a directed aural location. Musical training effects and organization of acoustic features were reflected by activity in source generators of the P50. This included greater activation of right middle temporal gyrus and superior temporal gyrus in musicians. The findings demonstrate recruitment of right hemisphere in musicians for discriminating speech sounds and a putative broadening of their language network. Musicians appear to have an increased sensitivity to acoustic features and enhanced selective attention to temporal features of speech that is facilitated by musical training and supported, in part, by right hemisphere homologues of established speech processing regions of the brain. PMID:24624107

  3. Neurophysiological evidence that musical training influences the recruitment of right hemispheric homologues for speech perception.

    PubMed

    Jantzen, McNeel G; Howe, Bradley M; Jantzen, Kelly J

    2014-01-01

    Musicians have a more accurate temporal and tonal representation of auditory stimuli than their non-musician counterparts (Musacchia et al., 2007; Parbery-Clark et al., 2009a; Zendel and Alain, 2009; Kraus and Chandrasekaran, 2010). Musicians who are adept at the production and perception of music are also more sensitive to key acoustic features of speech such as voice onset timing and pitch. Together, these data suggest that musical training may enhance the processing of acoustic information for speech sounds. In the current study, we sought to provide neural evidence that musicians process speech and music in a similar way. We hypothesized that for musicians, right hemisphere areas traditionally associated with music are also engaged for the processing of speech sounds. In contrast we predicted that in non-musicians processing of speech sounds would be localized to traditional left hemisphere language areas. Speech stimuli differing in voice onset time were presented using a dichotic listening paradigm. Subjects either indicated aural location for a specified speech sound or identified a specific speech sound from a directed aural location. Musical training effects and organization of acoustic features were reflected by activity in source generators of the P50. This included greater activation of right middle temporal gyrus and superior temporal gyrus in musicians. The findings demonstrate recruitment of right hemisphere in musicians for discriminating speech sounds and a putative broadening of their language network. Musicians appear to have an increased sensitivity to acoustic features and enhanced selective attention to temporal features of speech that is facilitated by musical training and supported, in part, by right hemisphere homologues of established speech processing regions of the brain. PMID:24624107

  4. Early recognition of speech

    PubMed Central

    Remez, Robert E; Thomas, Emily F

    2013-01-01

    Classic research on the perception of speech sought to identify minimal acoustic correlates of each consonant and vowel. In explaining perception, this view designated momentary components of an acoustic spectrum as cues to the recognition of elementary phonemes. This conceptualization of speech perception is untenable given the findings of phonetic sensitivity to modulation independent of the acoustic and auditory form of the carrier. The empirical key is provided by studies of the perceptual organization of speech, a low-level integrative function that finds and follows the sensory effects of speech amid concurrent events. These projects have shown that the perceptual organization of speech is keyed to modulation; fast; unlearned; nonsymbolic; indifferent to short-term auditory properties; and organization requires attention. The ineluctably multisensory nature of speech perception also imposes conditions that distinguish language among cognitive systems. WIREs Cogn Sci 2013, 4:213–223. doi: 10.1002/wcs.1213 PMID:23926454

  5. Auditory emotion recognition impairments in Schizophrenia: Relationship to acoustic features and cognition

    PubMed Central

    Gold, Rinat; Butler, Pamela; Revheim, Nadine; Leitman, David; Hansen, John A.; Gur, Ruben; Kantrowitz, Joshua T.; Laukka, Petri; Juslin, Patrik N.; Silipo, Gail S.; Javitt, Daniel C.

    2013-01-01

    Objective: Schizophrenia is associated with deficits in ability to perceive emotion based upon tone of voice. The basis for this deficit, however, remains unclear and assessment batteries remain limited. We evaluated performance in schizophrenia on a novel voice emotion recognition battery with well characterized physical features, relative to impairments in more general emotional and cognitive function. Methods: We studied a primary sample of 92 patients relative to 73 controls. Stimuli were characterized according to both intended emotion and physical features (e.g., pitch, intensity) that contributed to the emotional percept. Parallel measures of visual emotion recognition, pitch perception, general cognition, and overall outcome were obtained. More limited measures were obtained in an independent replication sample of 36 patients, 31 age-matched controls, and 188 general comparison subjects. Results: Patients showed significant, large effect size deficits in voice emotion recognition (F=25.4, p<.00001, d=1.1), and were preferentially impaired in recognition of emotion based upon pitch-, but not intensity-features (group X feature interaction: F=7.79, p=.006). Emotion recognition deficits were significantly correlated with pitch perception impairments both across (r=.56, p<.0001) and within (r=.47, p<.0001) group. Path analysis showed both sensory-specific and general cognitive contributions to auditory emotion recognition deficits in schizophrenia. Similar patterns of results were observed in the replication sample. Conclusions: The present study demonstrates impairments in auditory emotion recognition in schizophrenia relative to acoustic features of underlying stimuli. Furthermore, it provides tools and highlights the need for greater attention to physical features of stimuli used for study of social cognition in neuropsychiatric disorders. PMID:22362394

  6. Formant-Frequency Variation and Informational Masking of Speech by Extraneous Formants: Evidence Against Dynamic and Speech-Specific Acoustical Constraints

    PubMed Central

    2014-01-01

    How speech is separated perceptually from other speech remains poorly understood. Recent research indicates that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This study explored the effects of manipulating the depth and pattern of that variation. Three formants (F1+F2+F3) constituting synthetic analogues of natural sentences were distributed across the 2 ears, together with a competitor for F2 (F2C) that listeners must reject to optimize recognition (left = F1+F2C; right = F2+F3). The frequency contours of F1-F3 were each scaled to 50% of their natural depth, with little effect on intelligibility. Competitors were created either by inverting the frequency contour of F2 about its geometric mean (a plausibly speech-like pattern) or using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Adding a competitor typically reduced intelligibility; this reduction depended on the depth of F2C variation, being greatest for 100%-depth, intermediate for 50%-depth, and least for 0%-depth (constant) F2Cs. This suggests that competitor impact depends on overall depth of frequency variation, not depth relative to that for the target formants. The absence of tuning (i.e., no minimum in intelligibility for the 50% case) suggests that the ability to reject an extraneous formant does not depend on similarity in the depth of formant-frequency variation. Furthermore, triangle-wave competitors were as effective as their more speech-like counterparts, suggesting that the selection of formants from the ensemble also does not depend on speech-specific constraints. PMID:24842068

  7. Formant-frequency variation and informational masking of speech by extraneous formants: evidence against dynamic and speech-specific acoustical constraints.

    PubMed

    Roberts, Brian; Summers, Robert J; Bailey, Peter J

    2014-08-01

    How speech is separated perceptually from other speech remains poorly understood. Recent research indicates that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This study explored the effects of manipulating the depth and pattern of that variation. Three formants (F1+F2+F3) constituting synthetic analogues of natural sentences were distributed across the 2 ears, together with a competitor for F2 (F2C) that listeners must reject to optimize recognition (left = F1+F2C; right = F2+F3). The frequency contours of F1-F3 were each scaled to 50% of their natural depth, with little effect on intelligibility. Competitors were created either by inverting the frequency contour of F2 about its geometric mean (a plausibly speech-like pattern) or using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Adding a competitor typically reduced intelligibility; this reduction depended on the depth of F2C variation, being greatest for 100%-depth, intermediate for 50%-depth, and least for 0%-depth (constant) F2Cs. This suggests that competitor impact depends on overall depth of frequency variation, not depth relative to that for the target formants. The absence of tuning (i.e., no minimum in intelligibility for the 50% case) suggests that the ability to reject an extraneous formant does not depend on similarity in the depth of formant-frequency variation. Furthermore, triangle-wave competitors were as effective as their more speech-like counterparts, suggesting that the selection of formants from the ensemble also does not depend on speech-specific constraints. PMID:24842068
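
    The abstract above mentions two manipulations of a formant-frequency contour: scaling its depth of variation (100%, 50%, 0%) and inverting it about its geometric mean. One natural reading, assumed here and not confirmed by the abstract, is that both operate on a log-frequency axis, so that a contour value f with geometric mean g becomes g * (f / g) ** depth, and inversion is the special case depth = -1 (that is, g squared divided by f). The function names and the example frequencies are illustrative only.

```python
import math


def geometric_mean(freqs):
    """Geometric mean of a sequence of positive frequencies (Hz)."""
    return math.exp(sum(math.log(f) for f in freqs) / len(freqs))


def scale_depth(freqs, depth):
    """Scale a contour's excursions about its geometric mean on a log axis.
    depth=1.0 leaves the contour unchanged; depth=0.0 flattens it to the mean."""
    g = geometric_mean(freqs)
    return [g * (f / g) ** depth for f in freqs]


def invert_contour(freqs):
    """Mirror a contour about its geometric mean on a log axis (g**2 / f)."""
    return scale_depth(freqs, -1.0)
```

    With a toy contour of 1000, 2000, and 4000 Hz, the geometric mean is 2000 Hz, inversion gives 4000, 2000, and 1000 Hz, and the 0%-depth version is constant at 2000 Hz, matching the abstract's description of a constant competitor.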

  8. Comparison of spatial frequency domain features for the detection of side attack explosive ballistics in synthetic aperture acoustics

    NASA Astrophysics Data System (ADS)

    Dowdy, Josh; Anderson, Derek T.; Luke, Robert H.; Ball, John E.; Keller, James M.; Havens, Timothy C.

    2016-05-01

    Explosive hazards in current and former conflict zones are a threat to both military and civilian personnel. As a result, much effort has been dedicated to identifying automated algorithms and systems to detect these threats. However, robust detection is complicated due to factors like the varied composition and anatomy of such hazards. In order to solve this challenge, a number of platforms (vehicle-based, handheld, etc.) and sensors (infrared, ground penetrating radar, acoustics, etc.) are being explored. In this article, we investigate the detection of side attack explosive ballistics via a vehicle-mounted acoustic sensor. In particular, we explore three acoustic features, one in the time domain and two on synthetic aperture acoustic (SAA) beamformed imagery. The idea is to exploit the varying acoustic frequency profile of a target due to its unique geometry and material composition with respect to different viewing angles. The first two features build their angle-specific frequency information using a highly constrained subset of the signal data and the last feature builds its frequency profile using all available signal data for a given region of interest (centered on the candidate target location). Performance is assessed in the context of receiver operating characteristic (ROC) curves on cross-validation experiments for data collected at a U.S. Army test site on different days with multiple target types and clutter. Our preliminary results are encouraging and indicate that the top performing feature is the unrolled two-dimensional discrete Fourier transform (DFT) of SAA beamformed imagery.
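
    The top-performing feature named above is the unrolled two-dimensional DFT of a beamformed image chip. The sketch below shows one plausible interpretation: compute the 2-D DFT of a small region of interest, take magnitudes, and flatten ("unroll") them row by row into a feature vector. Whether the authors use magnitudes, how they crop or normalize, and the chip size are not stated in the abstract, so those choices and the function names are assumptions. The direct O(N^4) summation is for clarity only; a real implementation would use an FFT.

```python
import cmath


def dft2_magnitude(image):
    """Magnitude of the 2-D discrete Fourier transform of a small image,
    given as a list of equal-length rows of real numbers."""
    rows, cols = len(image), len(image[0])
    out = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            acc = 0j
            for m in range(rows):
                for n in range(cols):
                    angle = -2.0 * cmath.pi * (u * m / rows + v * n / cols)
                    acc += image[m][n] * cmath.exp(1j * angle)
            out_row.append(abs(acc))
        out.append(out_row)
    return out


def unrolled_dft_feature(image):
    """Flatten the 2-D DFT magnitudes row by row into a 1-D feature vector."""
    return [mag for row in dft2_magnitude(image) for mag in row]
```

    The resulting vector has one entry per spatial-frequency bin and could be passed to any standard classifier; a constant image, for instance, puts all of its energy in the first (zero-frequency) entry.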

  9. Hearing speech in music.

    PubMed

    Ekström, Seth-Reino; Borg, Erik

    2011-01-01

    The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings. PMID:21768731

  10. Speech masking and cancelling and voice obscuration

    DOEpatents

    Holzrichter, John F.

    2013-09-10

    A non-acoustic sensor is used to measure a user's speech, and an obscuring acoustic signal is then broadcast that diminishes the user's vocal acoustic output intensity and/or distorts the voice sounds, making them unintelligible to persons nearby. The non-acoustic sensor is positioned proximate to or contacting a user's neck or head skin tissue for sensing speech production information.

  11. A Technique for Estimating Intensity of Emotional Expressions and Speaking Styles in Speech Based on Multiple-Regression HSMM

    NASA Astrophysics Data System (ADS)

    Nose, Takashi; Kobayashi, Takao

    In this paper, we propose a technique for estimating the degree or intensity of emotional expressions and speaking styles appearing in speech. The key idea is based on a style control technique for speech synthesis using a multiple regression hidden semi-Markov model (MRHSMM), and the proposed technique can be viewed as the inverse of the style control. In the proposed technique, the acoustic features of spectrum, power, fundamental frequency, and duration are simultaneously modeled using the MRHSMM. We derive an algorithm for estimating explanatory variables of the MRHSMM, each of which represents the degree or intensity of emotional expressions and speaking styles appearing in acoustic features of speech, based on a maximum likelihood criterion. We show experimental results to demonstrate the ability of the proposed technique using two types of speech data, simulated emotional speech and spontaneous speech with different speaking styles. It is found that the estimated values have correlation with human perception.

  12. Classification and clinicoradiologic features of primary progressive aphasia (PPA) and apraxia of speech.

    PubMed

    Botha, Hugo; Duffy, Joseph R; Whitwell, Jennifer L; Strand, Edythe A; Machulda, Mary M; Schwarz, Christopher G; Reid, Robert I; Spychalla, Anthony J; Senjem, Matthew L; Jones, David T; Lowe, Val; Jack, Clifford R; Josephs, Keith A

    2015-08-01

    The consensus criteria for the diagnosis and classification of primary progressive aphasia (PPA) have served as an important tool in studying this group of disorders. However, a large proportion of patients remain unclassifiable whilst others simultaneously meet criteria for multiple subtypes. We prospectively evaluated a large cohort of patients with degenerative aphasia and/or apraxia of speech using multidisciplinary clinical assessments and multimodal imaging. Blinded diagnoses were made using operational definitions with important differences compared to the consensus criteria. Of the 130 included patients, 40 were diagnosed with progressive apraxia of speech (PAOS), 12 with progressive agrammatic aphasia, 9 with semantic dementia, 52 with logopenic progressive aphasia, and 4 with progressive fluent aphasia, while 13 were unclassified. The PAOS and progressive fluent aphasia groups were least impaired. Performance on repetition and sentence comprehension was especially poor in the logopenic group. The semantic and progressive fluent aphasia groups had prominent anomia, but only semantic subjects had loss of word meaning and object knowledge. Distinct patterns of grey matter loss and white matter changes were found in all groups compared to controls. PAOS subjects had bilateral frontal grey matter loss, including the premotor and supplementary motor areas, and bilateral frontal white matter involvement. The agrammatic group had more widespread, predominantly left sided grey matter loss and white matter abnormalities. Semantic subjects had bitemporal grey matter loss and white matter changes, including the uncinate and inferior occipitofrontal fasciculi, whereas progressive fluent subjects only had left sided temporal involvement. Logopenic subjects had diffuse and bilateral grey matter loss and diffusion tensor abnormalities, maximal in the posterior temporal region. A diagnosis of logopenic aphasia was strongly associated with being amyloid positive (46

  13. Multilevel Analysis in Analyzing Speech Data

    ERIC Educational Resources Information Center

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by the human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  14. Production and perception of clear speech in Croatian and English

    NASA Astrophysics Data System (ADS)

    Smiljanić, Rajka; Bradlow, Ann R.

    2005-09-01

    Previous research has established that naturally produced English clear speech is more intelligible than English conversational speech. The major goal of this paper was to establish the presence of the clear speech effect in production and perception of a language other than English, namely Croatian. A systematic investigation of the conversational-to-clear speech transformations across languages with different phonological properties (e.g., large versus small vowel inventory) can provide a window into the interaction of general auditory-perceptual and phonological, structural factors that contribute to the high intelligibility of clear speech. The results of this study showed that naturally produced clear speech is a distinct, listener-oriented, intelligibility-enhancing mode of speech production in both languages. Furthermore, the acoustic-phonetic features of the conversational-to-clear speech transformation revealed cross-language similarities in clear speech production strategies. In both languages, talkers exhibited a decrease in speaking rate and an increase in pitch range, as well as an expansion of the vowel space. Notably, the findings of this study showed equivalent vowel space expansion in English and Croatian clear speech, despite the difference in vowel inventory size across the two languages, suggesting that the extent of vowel contrast enhancement in hyperarticulated clear speech is independent of vowel inventory size.

  15. Design Foundations for Content-Rich Acoustic Interfaces: Investigating Audemes as Referential Non-Speech Audio Cues

    ERIC Educational Resources Information Center

    Ferati, Mexhid Adem

    2012-01-01

    To access interactive systems, blind and visually impaired users can leverage their auditory senses by using non-speech sounds. The current structure of non-speech sounds, however, is geared toward conveying user interface operations (e.g., opening a file) rather than large theme-based information (e.g., a history passage) and, thus, is ill-suited…

  16. Robust Speech Rate Estimation for Spontaneous Speech

    PubMed Central

    Wang, Dagen; Narayanan, Shrikanth S.

    2010-01-01

    In this paper, we propose a direct method for speech rate estimation from acoustic features without requiring any automatic speech transcription. We compare various spectral and temporal signal analysis and smoothing strategies to better characterize the underlying syllable structure to derive speech rate. The proposed algorithm extends the methods of spectral subband correlation by including temporal correlation and the use of prominent spectral subbands for improving the signal correlation essential for syllable detection. Furthermore, to address some of the practical robustness issues in previously proposed methods, we introduce some novel components into the algorithm such as the use of pitch confidence for filtering spurious syllable envelope peaks, a magnifying window for tackling neighboring syllable smearing, and relative peak measure thresholds for pseudo peak rejection. We also describe an automated approach for learning algorithm parameters from data, and find the optimal settings through Monte Carlo simulations and parameter sensitivity analysis. Final experimental evaluations are conducted based on a portion of the Switchboard corpus for which manual phonetic segmentation information and published results for direct comparison are available. The results show a correlation coefficient of 0.745 with respect to the ground truth based on manual segmentation. This result is about a 17% improvement compared to the current best single estimator and an 11% improvement over the multiestimator evaluated on the same Switchboard database. PMID:20428476

  17. The Auditory-Brainstem Response to Continuous, Non-repetitive Speech Is Modulated by the Speech Envelope and Reflects Speech Processing

    PubMed Central

    Reichenbach, Chagit S.; Braiman, Chananel; Schiff, Nicholas D.; Hudspeth, A. J.; Reichenbach, Tobias

    2016-01-01

    The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the ABR is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function. PMID:27303286

  18. Acoustics

    NASA Technical Reports Server (NTRS)

    Goodman, Jerry R.; Grosveld, Ferdinand

    2007-01-01

    The acoustics environment in space operations must be maintained at manageable levels so that the crewperson can remain safe, functional, effective, and reasonably comfortable. High acoustic levels can produce temporary or permanent hearing loss, or cause other physiological symptoms such as auditory pain, headaches, discomfort, strain in the vocal cords, or fatigue. Noise is defined as undesirable sound. Excessive noise may result in psychological effects such as irritability, inability to concentrate, decrease in productivity, annoyance, errors in judgment, and distraction. A noisy environment can also result in the inability to sleep, or sleep well. Elevated noise levels can affect the ability to communicate, understand what is being said, and hear what is going on in the environment; they can also degrade crew performance and operations and create habitability concerns. Superfluous noise emissions can also make it impossible to hear alarms or other important auditory cues, such as the sound of malfunctioning equipment. Recent space flight experience, evaluations of the requirements in crew habitable areas, and lessons learned (Goodman 2003; Allen and Goodman 2003; Pilkinton 2003; Grosveld et al. 2003) show the importance of maintaining an acceptable acoustics environment. This is best accomplished by having a high-quality set of limits/requirements early in the program, the "designing in" of acoustics in the development of hardware and systems, and by monitoring, testing and verifying the levels to ensure that they are acceptable.

  19. Differences in acoustic features of vocalizations produced by killer whales cross-socialized with bottlenose dolphins.

    PubMed

    Musser, Whitney B; Bowles, Ann E; Grebner, Dawn M; Crance, Jessica L

    2014-10-01

    Limited previous evidence suggests that killer whales (Orcinus orca) are capable of vocal production learning. However, vocal contextual learning has not been studied, nor the factors promoting learning. Vocalizations were collected from three killer whales with a history of exposure to bottlenose dolphins (Tursiops truncatus) and compared with data from seven killer whales held with conspecifics and nine bottlenose dolphins. The three whales' repertoires were distinguishable by a higher proportion of click trains and whistles. Time-domain features of click trains were intermediate between those of whales held with conspecifics and dolphins. These differences provided evidence for contextual learning. One killer whale spontaneously learned to produce artificial chirps taught to dolphins; acoustic features fell within the range of inter-individual differences among the dolphins. This whale also produced whistles similar to a stereotyped whistle produced by one dolphin. Thus, results provide further support for vocal production learning and show that killer whales are capable of contextual learning. That killer whales produce similar repertoires when associated with another species suggests substantial vocal plasticity and motivation for vocal conformity with social associates. PMID:25324098

  20. Psychoacoustic cues to emotion in speech prosody and music.

    PubMed

    Coutinho, Eduardo; Dibben, Nicola

    2013-01-01

    There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music close to the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain. PMID:23057507
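
    Two of the seven features listed above, spectral centroid and spectral flux, have simple frame-based definitions. The sketch below is a generic illustration and not the model from the study; the function name, the frame length and hop size, the Hann window, and the Euclidean form of the flux are assumptions chosen for brevity.

```python
import numpy as np

def spectral_centroid_and_flux(signal, sample_rate, frame_len=512, hop=256):
    """Return per-frame spectral centroid (Hz) and frame-to-frame spectral flux."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    centroids, fluxes = [], []
    prev_mag = None
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        energy = mag.sum()
        # Centroid: magnitude-weighted mean frequency of the frame.
        centroids.append(float((freqs * mag).sum() / energy) if energy > 0 else 0.0)
        # Flux: Euclidean distance between successive magnitude spectra.
        if prev_mag is not None:
            fluxes.append(float(np.sqrt(((mag - prev_mag) ** 2).sum())))
        prev_mag = mag
    return np.array(centroids), np.array(fluxes)

# Hypothetical test signal: one second of a steady 1 kHz tone.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
centroids, fluxes = spectral_centroid_and_flux(tone, sr)
```

    For a steady tone the centroid stays near the tone frequency and the flux stays near zero; speech and music would produce time-varying contours of both.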

  1. Perception of Speech Features by French-Speaking Children with Cochlear Implants

    ERIC Educational Resources Information Center

    Bouton, Sophie; Serniclaes, Willy; Bertoncini, Josiane; Cole, Pascale

    2012-01-01

    Purpose: The present study investigates the perception of phonological features in French-speaking children with cochlear implants (CIs) compared with normal-hearing (NH) children matched for listening age. Method: Scores for discrimination and identification of minimal pairs for all features defining consonants (e.g., place, voicing, manner,…

  2. Smartphone Application for the Analysis of Prosodic Features in Running Speech with a Focus on Bipolar Disorders: System Performance Evaluation and Case Study.

    PubMed

    Guidi, Andrea; Salvi, Sergio; Ottaviano, Manuel; Gentili, Claudio; Bertschy, Gilles; de Rossi, Danilo; Scilingo, Enzo Pasquale; Vanello, Nicola

    2015-01-01

    Bipolar disorder is one of the most common mood disorders characterized by large and invalidating mood swings. Several projects focus on the development of decision support systems that monitor and advise patients, as well as clinicians. Voice monitoring and speech signal analysis can be exploited to reach this goal. In this study, an Android application was designed for analyzing running speech using a smartphone device. The application can record audio samples and estimate speech fundamental frequency, F0, and its changes. F0-related features are estimated locally on the smartphone, with some advantages with respect to remote processing approaches in terms of privacy protection and reduced upload costs. The raw features can be sent to a central server and further processed. The quality of the audio recordings, algorithm reliability and performance of the overall system were evaluated in terms of voiced segment detection and feature estimation. The results demonstrate that mean F0 from each voiced segment can be reliably estimated, thus describing prosodic features across the speech sample. In contrast, features related to F0 variability within each voiced segment performed poorly. A case study performed on a bipolar patient is presented. PMID:26561811
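
    The abstract does not state which F0 estimator the application uses. Purely as an illustration of how F0 can be estimated from a short voiced frame, the sketch below uses a generic autocorrelation method; the function name, the 75 to 400 Hz search range, and the 0.3 voicing threshold are assumptions and are not taken from the study.

```python
import numpy as np

def estimate_f0_autocorr(frame, sample_rate, f0_min=75.0, f0_max=400.0):
    """Estimate F0 (Hz) of one frame by autocorrelation; None if unvoiced."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Keep non-negative lags only: acf[k] is the correlation at lag k.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if acf[0] <= 0:
        return None  # silent frame
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(acf) - 1)
    if lag_max <= lag_min:
        return None  # frame too short for the requested F0 range
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    if acf[best_lag] / acf[0] < 0.3:  # assumption: crude voicing threshold
        return None
    return sample_rate / best_lag

# Hypothetical voiced frame: 50 ms of a 150 Hz sinusoid at 8 kHz.
sr = 8000
t = np.arange(int(0.05 * sr)) / sr
frame = np.sin(2 * np.pi * 150 * t)
f0 = estimate_f0_autocorr(frame, sr)
```

    Averaging such per-frame estimates over a voiced segment gives a segment-level mean F0 of the kind the abstract reports as reliable.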

  3. Smartphone Application for the Analysis of Prosodic Features in Running Speech with a Focus on Bipolar Disorders: System Performance Evaluation and Case Study

    PubMed Central

    Guidi, Andrea; Salvi, Sergio; Ottaviano, Manuel; Gentili, Claudio; Bertschy, Gilles; de Rossi, Danilo; Scilingo, Enzo Pasquale; Vanello, Nicola

    2015-01-01

    Bipolar disorder is one of the most common mood disorders characterized by large and invalidating mood swings. Several projects focus on the development of decision support systems that monitor and advise patients, as well as clinicians. Voice monitoring and speech signal analysis can be exploited to reach this goal. In this study, an Android application was designed for analyzing running speech using a smartphone device. The application can record audio samples and estimate speech fundamental frequency, F0, and its changes. F0-related features are estimated locally on the smartphone, with some advantages with respect to remote processing approaches in terms of privacy protection and reduced upload costs. The raw features can be sent to a central server and further processed. The quality of the audio recordings, algorithm reliability and performance of the overall system were evaluated in terms of voiced segment detection and feature estimation. The results demonstrate that mean F0 from each voiced segment can be reliably estimated, thus describing prosodic features across the speech sample. In contrast, features related to F0 variability within each voiced segment performed poorly. A case study performed on a bipolar patient is presented. PMID:26561811

  4. Experimental investigation of the effects of the acoustical conditions in a simulated classroom on speech recognition and learning in children

    PubMed Central

    Valente, Daniel L.; Plevinsky, Hallie M.; Franco, John M.; Heinrichs-Graham, Elizabeth C.; Lewis, Dawna E.

    2012-01-01

    The potential effects of acoustical environment on speech understanding are especially important as children enter school where students’ ability to hear and understand complex verbal information is critical to learning. However, this ability is compromised because of widely varied and unfavorable classroom acoustics. The extent to which unfavorable classroom acoustics affect children’s performance on longer learning tasks is largely unknown as most research has focused on testing children using words, syllables, or sentences as stimuli. In the current study, a simulated classroom environment was used to measure comprehension performance of two classroom learning activities: a discussion and lecture. Comprehension performance was measured for groups of elementary-aged students in one of four environments with varied reverberation times and background noise levels. The reverberation time was either 0.6 or 1.5 s, and the signal-to-noise level was either +10 or +7 dB. Performance is compared to adult subjects as well as to sentence-recognition in the same condition. Significant differences were seen in comprehension scores as a function of age and condition; both increasing background noise and reverberation degraded performance in comprehension tasks compared to minimal differences in measures of sentence-recognition. PMID:22280587

  5. Differential Diagnosis of Severe Speech Disorders Using Speech Gestures

    ERIC Educational Resources Information Center

    Bahr, Ruth Huntley

    2005-01-01

    The differentiation of childhood apraxia of speech from severe phonological disorder is a common clinical problem. This article reports on an attempt to describe speech errors in children with childhood apraxia of speech on the basis of gesture use and acoustic analyses of articulatory gestures. The focus was on the movement of articulators and…

  6. Prediction Method of Speech Recognition Performance Based on HMM-based Speech Synthesis Technique

    NASA Astrophysics Data System (ADS)

    Terashima, Ryuta; Yoshimura, Takayoshi; Wakita, Toshihiro; Tokuda, Keiichi; Kitamura, Tadashi

    We describe an efficient method that uses an HMM-based speech synthesis technique as a test pattern generator for evaluating the word recognition rate. With this method, the recognition rate for each word and speaker can be evaluated using synthesized speech. The parameter generation technique can be formulated as an algorithm that determines the speech parameter vector sequence O by maximizing P(O|Q,λ) given the model parameters λ and the state sequence Q, under a dynamic acoustic feature constraint. We conducted recognition experiments to illustrate the validity of the method. Approximately 100 speakers were used to train the speaker-dependent models for the speech synthesis used in these experiments, and the synthetic speech was generated as the test patterns for the target speech recognizer. The recognition rate of the HMM-based synthesized speech shows a good correlation with the recognition rate of the actual speech. Furthermore, we find that our method can predict the speaker recognition rate with approximately 2% error on average. The evaluation of the speaker recognition rate can therefore be performed automatically using the proposed method.
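
    The parameter generation step mentioned above can be written out explicitly. The following is the usual maximum-likelihood formulation from the HMM-based synthesis literature, given as a sketch: the symbols c (static feature sequence), W (the matrix that appends dynamic features), and the state-aligned Gaussian mean and covariance are not defined in the abstract and are introduced here for illustration, assuming Gaussian state output distributions.

```latex
% Maximize the output probability subject to the dynamic-feature constraint O = Wc
\hat{O} = \arg\max_{O} P(O \mid Q, \lambda), \qquad O = W c

% With Gaussian state outputs (mean \mu_Q, covariance \Sigma_Q along Q):
\log P(Wc \mid Q, \lambda)
  = -\tfrac{1}{2}\,(Wc - \mu_Q)^{\top} \Sigma_Q^{-1} (Wc - \mu_Q) + \text{const}

% Setting the gradient with respect to c to zero gives a linear system:
W^{\top} \Sigma_Q^{-1} W \, \hat{c} = W^{\top} \Sigma_Q^{-1} \mu_Q
```

    Solving the last equation for the static sequence yields a smooth trajectory, which is what makes the synthesized speech usable as a test pattern.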

  7. A gearbox fault diagnosis scheme based on near-field acoustic holography and spatial distribution features of sound field

    NASA Astrophysics Data System (ADS)

    Lu, Wenbo; Jiang, Weikang; Yuan, Guoqing; Yan, Li

    2013-05-01

    Vibration signal analysis is the main technique in machine condition monitoring and fault diagnosis, although in some cases vibration-based diagnosis is limited because it requires contact measurement. Acoustic-based diagnosis (ABD) with non-contact measurement has received little attention, although the sound field may contain abundant information related to the fault pattern. A new ABD scheme for gearboxes based on near-field acoustic holography (NAH) and spatial distribution features of the sound field is presented in this paper. It focuses on applying distribution information of the sound field to gearbox fault diagnosis. A two-stage industrial helical gearbox is experimentally studied in a semi-anechoic chamber and a lab workshop, respectively. Firstly, multi-class faults (mild pitting, moderate pitting, severe pitting and tooth breakage) are simulated. Secondly, sound fields and corresponding acoustic images in different gearbox running conditions are obtained by fast Fourier transform (FFT) based NAH. Thirdly, by introducing texture analysis to fault diagnosis, spatial distribution features are extracted from acoustic images for capturing fault patterns underlying the sound field. Finally, the features are fed into a multi-class support vector machine for fault pattern identification. The feasibility and effectiveness of the proposed scheme are demonstrated by good experimental results and by comparison with a traditional ABD method. Even with strong noise interference, spatial distribution features of the sound field can reliably reveal the fault patterns of the gearbox, so satisfactory accuracy can be obtained. The combination of histogram features and gray level gradient co-occurrence matrix features is suggested for good diagnosis accuracy and low time cost.

  8. Measurement of acoustical characteristics of mosques in Saudi Arabia.

    PubMed

    Abdou, Adel A

    2003-03-01

    The study of mosque acoustics, with regard to acoustical characteristics, sound quality for speech intelligibility, and other applicable acoustic criteria, has been largely neglected. In this study a background as to why mosques are designed as they are and how mosque design is influenced by worship considerations is given. In the study the acoustical characteristics of typically constructed contemporary mosques in Saudi Arabia have been investigated, employing a well-known impulse response. Extensive field measurements were taken in 21 representative mosques of different sizes and architectural features in order to characterize their acoustical quality and to identify the impact of air conditioning, ceiling fans, and sound reinforcement systems on their acoustics. Objective room-acoustic indicators such as reverberation time (RT) and clarity (C50) were measured. Background noise (BN) was assessed with and without the operation of air conditioning and fans. The speech transmission index (STI) was also evaluated with and without the operation of existing sound reinforcement systems. The existence of acoustical deficiencies was confirmed and quantified. The study, in addition to describing mosque acoustics, compares design goals to results obtained in practice and suggests acoustical target values for mosque design. The results show that acoustical quality in the investigated mosques deviates from optimum conditions when unoccupied, but is much better in the occupied condition. PMID:12656385
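
    Of the room-acoustic indicators named above, clarity (C50) has a compact definition: the ratio, in decibels, of the impulse-response energy arriving in the first 50 ms to the energy arriving afterwards. The sketch below illustrates that definition only and is not the measurement procedure of the study; the function name is hypothetical, the impulse response is assumed to start at the arrival of the direct sound, and the synthetic exponential decay is a stand-in for a measured response.

```python
import math

def clarity_c50(impulse_response, sample_rate):
    """Return C50 in dB: early (0-50 ms) over late (>50 ms) energy."""
    # Assumption: index 0 corresponds to the arrival of the direct sound.
    split = int(round(0.050 * sample_rate))
    early = sum(x * x for x in impulse_response[:split])
    late = sum(x * x for x in impulse_response[split:])
    if early <= 0 or late <= 0:
        raise ValueError("impulse response too short or silent")
    return 10.0 * math.log10(early / late)

# Hypothetical impulse response: amplitude falls by 60 dB per second,
# i.e. an idealized room with a reverberation time of 1.0 s.
sr = 8000
decay = [10 ** (-3.0 * n / sr) for n in range(2 * sr)]
c50 = clarity_c50(decay, sr)
```

    For this idealized decay the early and late energies are almost equal, so C50 comes out close to 0 dB; shorter reverberation times push C50 upward.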

  9. Measurement of acoustical characteristics of mosques in Saudi Arabia

    NASA Astrophysics Data System (ADS)

    Abdou, Adel A.

    2003-03-01

    The study of mosque acoustics, with regard to acoustical characteristics, sound quality for speech intelligibility, and other applicable acoustic criteria, has been largely neglected. In this study a background as to why mosques are designed as they are and how mosque design is influenced by worship considerations is given. In the study the acoustical characteristics of typically constructed contemporary mosques in Saudi Arabia have been investigated, employing a well-known impulse response. Extensive field measurements were taken in 21 representative mosques of different sizes and architectural features in order to characterize their acoustical quality and to identify the impact of air conditioning, ceiling fans, and sound reinforcement systems on their acoustics. Objective room-acoustic indicators such as reverberation time (RT) and clarity (C50) were measured. Background noise (BN) was assessed with and without the operation of air conditioning and fans. The speech transmission index (STI) was also evaluated with and without the operation of existing sound reinforcement systems. The existence of acoustical deficiencies was confirmed and quantified. The study, in addition to describing mosque acoustics, compares design goals to results obtained in practice and suggests acoustical target values for mosque design. The results show that acoustical quality in the investigated mosques deviates from optimum conditions when unoccupied, but is much better in the occupied condition.

  10. Effect of Acoustic Spectrographic Instruction on Production of English /i/ and /I/ by Spanish Pre-Service English Teachers

    ERIC Educational Resources Information Center

    Quintana-Lara, Marcela

    2014-01-01

    This study investigates the effects of Acoustic Spectrographic Instruction on the production of the English phonological contrast /i/ and / I /. Acoustic Spectrographic Instruction is based on the assumption that physical representations of speech sounds and spectrography allow learners to objectively see and modify those non-accurate features in…

  11. On the Nature of Speech Science.

    ERIC Educational Resources Information Center

    Peterson, Gordon E.

    In this article the nature of the discipline of speech science is considered and the various basic and applied areas of the discipline are discussed. The basic areas encompass the various processes of the physiology of speech production, the acoustical characteristics of speech, including the speech wave types and the information-bearing acoustic…

  12. Is Birdsong More Like Speech or Music?

    PubMed

    Shannon, Robert V

    2016-04-01

    Music and speech share many acoustic cues, but not all are equally important. For example, harmonic pitch is essential for music but not for speech. When birds communicate, is their song more like speech or music? A new study contrasting pitch and spectral patterns shows that birds perceive their song more like humans perceive speech. PMID:26944220

  13. Emotional Communication in Speech and Music: The Role of Melodic and Rhythmic Contrasts

    PubMed Central

    Quinto, Lena; Thompson, William Forde; Keating, Felicity Louise

    2013-01-01

    Many acoustic features convey emotion similarly in speech and music. Researchers have established that acoustic features such as pitch height, tempo, and intensity carry important emotional information in both domains. In this investigation, we examined the emotional significance of melodic and rhythmic contrasts between successive syllables or tones in speech and music, referred to as Melodic Interval Variability (MIV) and the normalized Pairwise Variability Index (nPVI). The spoken stimuli were 96 tokens expressing the emotions of irritation, fear, happiness, sadness, tenderness, or no emotion. The music stimuli were 96 phrases, played with or without performance expression and composed with the intention of communicating the same emotions. Results showed that nPVI, but not MIV, operates similarly in music and speech. Spoken stimuli, but not musical stimuli, were characterized by changes in MIV as a function of intended emotion. The results suggest that these measures may signal emotional intentions differently in speech and music. PMID:23630507
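
    The normalized Pairwise Variability Index used above has a short closed form: the mean, over successive pairs of durations, of the absolute difference divided by the pair's mean, multiplied by 100. The sketch below implements that standard definition; the function name and the example durations are hypothetical and are not data from the study.

```python
def npvi(durations):
    """Return the normalized Pairwise Variability Index of a duration sequence."""
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two durations")
    total = 0.0
    for current, following in zip(durations, durations[1:]):
        pair_mean = (current + following) / 2.0
        # Normalizing by the pair mean makes the index independent of tempo.
        total += abs(current - following) / pair_mean
    return 100.0 * total / (len(durations) - 1)

# Hypothetical syllable or tone durations in seconds.
even_rhythm = npvi([0.2, 0.2, 0.2, 0.2])      # no contrast between neighbours
alternating = npvi([0.4, 0.2, 0.4, 0.2])      # strong long-short contrast
```

    An isochronous sequence gives an nPVI of zero, and larger values indicate greater durational contrast between neighbouring syllables or tones.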

  14. A study of acoustic-to-articulatory inversion of speech by analysis-by-synthesis using chain matrices and the Maeda articulatory model

    PubMed Central

    Panchapagesan, Sankaran; Alwan, Abeer

    2011-01-01

    In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model is performed. For chain matrix calculation of vocal tract (VT) acoustics, the chain matrix derivatives with respect to area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between natural and synthesized first three formants, and parameter regularization and continuity terms. Calibration of the Maeda model to two speakers, one male and one female, from the University of Wisconsin x-ray microbeam (XRMB) database, using a cost function, is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. Good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs for the male speaker, with average pellet-VT outline distances around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants. PMID:21476670

  15. Maternal depression and the learning-promoting effects of infant-directed speech: Roles of maternal sensitivity, depression diagnosis, and speech acoustic cues.

    PubMed

    Kaplan, Peter S; Danko, Christina M; Cejka, Anna M; Everhart, Kevin D

    2015-11-01

    The hypothesis that the associative learning-promoting effects of infant-directed speech (IDS) depend on infants' social experience was tested in a conditioned-attention paradigm with a cumulative sample of 4- to 14-month-old infants. Following six forward pairings of a brief IDS segment and a photographic slide of a smiling female face, infants of clinically depressed mothers exhibited evidence of having acquired significantly weaker voice-face associations than infants of non-depressed mothers. Regression analyses revealed that maternal depression was significantly related to infant learning even after demographic correlates of depression, antidepressant medication use, and extent of pitch modulation in maternal IDS had been taken into account. However, after maternal depression had been accounted for, maternal emotional availability, coded by blind raters from separate play interactions, accounted for significant further increments in the proportion of variance accounted for in infant learning scores. Both maternal depression and maternal insensitivity negatively, and additively, predicted poor learning. PMID:26311468

  16. Low-Frequency Cortical Entrainment to Speech Reflects Phoneme-Level Processing.

    PubMed

    Di Liberto, Giovanni M; O'Sullivan, James A; Lalor, Edmund C

    2015-10-01

    The human ability to understand speech is underpinned by a hierarchical auditory system whose successive stages process increasingly complex attributes of the acoustic input. It has been suggested that to produce categorical speech perception, this system must elicit consistent neural responses to speech tokens (e.g., phonemes) despite variations in their acoustics. Here, using electroencephalography (EEG), we provide evidence for this categorical phoneme-level speech processing by showing that the relationship between continuous speech and neural activity is best described when that speech is represented using both low-level spectrotemporal information and categorical labeling of phonetic features. Furthermore, the mapping between phonemes and EEG becomes more discriminative for phonetic features at longer latencies, in line with what one might expect from a hierarchical system. Importantly, these effects are not seen for time-reversed speech. These findings may form the basis for future research on natural language processing in specific cohorts of interest and for broader insights into how brains transform acoustic input into meaning. PMID:26412129

  17. Room Acoustics

    NASA Astrophysics Data System (ADS)

    Kuttruff, Heinrich; Mommertz, Eckard

    The traditional task of room acoustics is to create or formulate conditions that ensure the best possible propagation of sound in a room from a sound source to a listener. Its objects are therefore, in particular, assembly halls of all kinds, such as auditoria and lecture halls, conference rooms, theaters, concert halls, and churches. It should be pointed out from the start that these conditions depend essentially on whether speech or music is to be transmitted. In the first case, the criterion for transmission quality is good speech intelligibility; in the second, the success of room-acoustical efforts depends on other factors that cannot be quantified as easily, not least the hearing habits of the listeners. In any case, there is no such thing as absolutely "good acoustics" for a room.

  18. Two-microphone spatial filtering provides speech reception benefits for cochlear implant users in difficult acoustic environments.

    PubMed

    Goldsworthy, Raymond L; Delhorne, Lorraine A; Desloge, Joseph G; Braida, Louis D

    2014-08-01

    This article introduces and provides an assessment of a spatial-filtering algorithm based on two closely-spaced (∼1 cm) microphones in a behind-the-ear shell. The evaluated spatial-filtering algorithm used fast (∼10 ms) temporal-spectral analysis to determine the location of incoming sounds and to enhance sounds arriving from straight ahead of the listener. Speech reception thresholds (SRTs) were measured for eight cochlear implant (CI) users using consonant and vowel materials under three processing conditions: An omni-directional response, a dipole-directional response, and the spatial-filtering algorithm. The background noise condition used three simultaneous time-reversed speech signals as interferers located at 90°, 180°, and 270°. Results indicated that the spatial-filtering algorithm can provide speech reception benefits of 5.8 to 10.7 dB SRT compared to an omni-directional response in a reverberant room with multiple noise sources. Given the observed SRT benefits, coupled with an efficient design, the proposed algorithm is promising as a CI noise-reduction solution. PMID:25096120

  19. The neural basis of sublexical speech and corresponding nonspeech processing: a combined EEG-MEG study.

    PubMed

    Kuuluvainen, Soila; Nevalainen, Päivi; Sorokin, Alexander; Mittag, Maria; Partanen, Eino; Putkinen, Vesa; Seppänen, Miia; Kähkönen, Seppo; Kujala, Teija

    2014-03-01

    We addressed the neural organization of speech versus nonspeech sound processing by investigating preattentive cortical auditory processing of changes in five features of a consonant-vowel syllable (consonant, vowel, sound duration, frequency, and intensity) and their acoustically matched nonspeech counterparts in a simultaneous EEG-MEG recording of mismatch negativity (MMN/MMNm). Overall, speech-sound processing was enhanced compared to nonspeech sound processing. This effect was strongest for changes which affect word meaning (consonant, vowel, and vowel duration) in the left and for the vowel identity change in the right hemisphere also. Furthermore, in the right hemisphere, speech-sound frequency and intensity changes were processed faster than their nonspeech counterparts, and there was a trend for speech-enhancement in frequency processing. In summary, the results support the proposed existence of long-term memory traces for speech sounds in the auditory cortices, and indicate at least partly distinct neural substrates for speech and nonspeech sound processing. PMID:24576806

  20. Investigations of High Pressure Acoustic Waves in Resonators with Seal-Like Features

    NASA Technical Reports Server (NTRS)

    Daniels, Christopher C.; Steinetz, Bruce M.; Finkbeiner, Joshua R.; Li, Xiao-Fan; Raman, Ganesh

    2004-01-01

    1) Standing waves with maximum pressures of 188 kPa have been produced in resonators containing ambient pressure air; 2) Addition of structures inside the resonator shifts the fundamental frequency and decreases the amplitude of the generated pressure waves; 3) Addition of holes to the resonator does reduce the magnitude of the acoustic waves produced, but their addition does not prohibit the generation of large magnitude non-linear standing waves; 4) The feasibility of reducing leakage using non-linear acoustics has been confirmed.

  1. Advances in speech processing

    NASA Astrophysics Data System (ADS)

    Ince, A. Nejat

    1992-10-01

    The field of speech processing is undergoing rapid growth in terms of both performance and applications, fueled by the advances being made in the areas of microelectronics, computation, and algorithm design. The use of voice for civil and military communications is discussed considering advantages and disadvantages including the effects of environmental factors such as acoustic and electrical noise and interference and propagation. The structure of the existing NATO communications network and the evolving Integrated Services Digital Network (ISDN) concept are briefly reviewed to show how they meet the present and future requirements. The paper then deals with the fundamental subject of speech coding and compression. Recent advances in techniques and algorithms for speech coding now permit high quality voice reproduction at remarkably low bit rates. The subject of speech synthesis is treated next, where the principal objective is to produce natural quality synthetic speech from unrestricted text input. Speech recognition, where the ultimate objective is to produce a machine which would understand conversational speech with unrestricted vocabulary from essentially any talker, is discussed. Algorithms for speech recognition can be characterized broadly as pattern recognition approaches and acoustic phonetic approaches. To date, the greatest degree of success in speech recognition has been obtained using pattern recognition paradigms. It is for this reason that the paper is concerned primarily with this technique.

  2. Isolating Neural Indices of Continuous Speech Processing at the Phonetic Level.

    PubMed

    Di Liberto, Giovanni M; Lalor, Edmund C

    2016-01-01

    The human ability to understand speech across an enormous range of listening conditions is underpinned by a hierarchical auditory processing system whose successive stages process increasingly complex attributes of the acoustic input. In order to produce a categorical perception of words and phonemes, it has been suggested that, while earlier areas of the auditory system undoubtedly respond to acoustic differences in speech tokens, later areas must exhibit consistent neural responses to those tokens. Neural indices of such hierarchical processing in the context of continuous speech have been identified using low-frequency scalp-recorded electroencephalography (EEG) data. The relationship between continuous speech and its associated neural responses has been shown to be best described when that speech is represented using both its low-level spectrotemporal information and also the categorical labelling of its phonetic features (Di Liberto et al., Curr Biol 25(19):2457-2465, 2015). While the phonetic features have been shown to carry extra information not captured by the speech spectrotemporal representation, the causes of this EEG activity remain unclear. This study aims to demonstrate a framework for examining speech-specific processing and for disentangling high-level neural activity related to intelligibility from low-level activity in response to spectrotemporal fluctuations of speech. Preliminary results suggest that neural measures of processing at the phonetic level can be isolated. PMID:27080674

  3. Tutorial on architectural acoustics

    NASA Astrophysics Data System (ADS)

    Shaw, Neil; Talaske, Rick; Bistafa, Sylvio

    2002-11-01

    This tutorial is intended to provide an overview of current knowledge and practice in architectural acoustics. Topics covered will include basic concepts and history, acoustics of small rooms (small rooms for speech such as classrooms and meeting rooms, music studios, small critical listening spaces such as home theatres) and the acoustics of large rooms (larger assembly halls, auditoria, and performance halls).

  4. Segmenting Words from Natural Speech: Subsegmental Variation in Segmental Cues

    ERIC Educational Resources Information Center

    Rytting, C. Anton; Brew, Chris; Fosler-Lussier, Eric

    2010-01-01

    Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption, and preserves acoustic variation present in speech. We…

  5. Emotional recognition from the speech signal for a virtual education agent

    NASA Astrophysics Data System (ADS)

    Tickle, A.; Raghu, S.; Elshaw, M.

    2013-06-01

    This paper explores the extraction of features from the speech wave to perform intelligent emotion recognition. A feature extraction tool (openSmile) was used to obtain a baseline set of 998 acoustic features from a set of emotional speech recordings from a microphone. The initial features were reduced to the most important ones so that recognition of emotions using a supervised neural network could be performed. Given that the future use of virtual education agents lies with making the agents more interactive, developing agents with the capability to recognise and adapt to the emotional state of humans is an important step.

  6. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exist in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It can offer a fast rate of data/text entry and a small overall size, and can be lightweight. In addition, this design will free the hands and eyes of a suited crewmember. The system components and steps include beam forming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select proper tasks when faced with constraints in computational resources.
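
    One way to read the accuracy figures quoted above is as a relative reduction in phoneme error rate. The sketch below only restates the reported percentages (33 percent in the spacesuit, 44 percent with the microphone array); treating error as 100 minus accuracy is an assumption of this illustration, and the function name is hypothetical.

```python
def relative_error_reduction(baseline_accuracy, improved_accuracy):
    """Relative reduction in error rate (percent), taking error = 100 - accuracy."""
    baseline_error = 100.0 - baseline_accuracy
    improved_error = 100.0 - improved_accuracy
    return 100.0 * (baseline_error - improved_error) / baseline_error


# In-suit accuracy without and with the microphone array processing, as reported.
print(round(relative_error_reduction(33.0, 44.0), 1))
```

    Under that assumption, the 11-point accuracy gain corresponds to removing roughly one sixth of the in-suit phoneme errors.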

  7. Improved speech inversion using general regression neural network.

    PubMed

    Najnin, Shamima; Banerjee, Bonny

    2015-09-01

    The problem of nonlinear acoustic to articulatory inversion mapping is investigated in the feature space using two models, the deep belief network (DBN) which is the state-of-the-art, and the general regression neural network (GRNN). The task is to estimate a set of articulatory features for improved speech recognition. Experiments with MOCHA-TIMIT and MNGU0 databases reveal that, for speech inversion, GRNN yields a lower root-mean-square error and a higher correlation than DBN. It is also shown that conjunction of acoustic and GRNN-estimated articulatory features yields state-of-the-art accuracy in broad class phonetic classification and phoneme recognition using less computational power. PMID:26428818
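
    A general regression neural network is, in its textbook form, a kernel-weighted average of training targets. The sketch below illustrates that form together with the root-mean-square error mentioned in the abstract. It is not the authors' implementation: the Gaussian kernel with a single smoothing width `sigma`, the function names, and the toy data are assumptions made for illustration.

```python
import math


def grnn_predict(x, train_x, train_y, sigma=1.0):
    """Predict an output vector as a Gaussian-weighted average of training targets."""
    weights = []
    for xi in train_x:
        squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
        weights.append(math.exp(-squared_distance / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase sigma")
    dims = len(train_y[0])
    return [sum(w * y[k] for w, y in zip(weights, train_y)) / total
            for k in range(dims)]


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Toy data: one acoustic feature mapped to one articulatory coordinate (hypothetical).
train_x = [[0.0], [2.0]]
train_y = [[0.0], [4.0]]
print(grnn_predict([1.0], train_x, train_y))
```

    The model has no iterative training step, since prediction reads directly from stored examples, which fits the abstract's remark about lower computational cost.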

  8. Detection of prosodics by using a speech recognition system

    NASA Astrophysics Data System (ADS)

    Hupp, N. A.

    1991-07-01

    The problem was to determine the ability of a speech recognizer to extract prosodic speech features, such as pitch and stress, and to examine these features for application to future voice recognition systems. The Speech Systems Incorporated (SSI) speech recognizer demonstrated that it could detect prosodic features and that these features do indicate the word and/or syllable that is stressed by the speaker. The research examined the effect of prosodics, such as pitch, amplitude, and duration, on word and syllable stress by using the SSI. Subjects read phrases and sentences, using a given intonation and stress. The three sections of the experiment compared questions and answers, words stressed within a sentence, and noun/verb pairs, such as object and subject. The results were analyzed both on the syllable level and the word level. In all cases, there was a significant increase in pitch, amplitude, and duration when comparing stressed words and syllables to unstressed words and syllables. When comparing unstressed words only, it was also noted that the first word in a sentence has an increase in pitch, amplitude, and duration. A threshold could be set in recognition systems for each of these parameters. Current speech recognizers do not use acoustic data above the word level. This research shows that we have the capability of developing better speech systems by incorporating prosodics with new linguistic software.

  9. Advancements in robust algorithm formulation for speaker identification of whispered speech

    NASA Astrophysics Data System (ADS)

    Fan, Xing

    Whispered speech is an alternative speech production mode to neutral speech, which is used by talkers intentionally in natural conversational scenarios to protect privacy and to avoid certain content from being overheard/made public. Due to the profound differences between whispered and neutral speech in production mechanism and the absence of whispered adaptation data, the performance of speaker identification systems trained with neutral speech degrades significantly. This dissertation therefore focuses on developing a robust closed-set speaker recognition system for whispered speech by using no or limited whispered adaptation data from non-target speakers. This dissertation proposes the concept of "High"/"Low" performance whispered data for the purpose of speaker identification. A variety of acoustic properties are identified that contribute to the quality of whispered data. An acoustic analysis is also conducted to compare the phoneme/speaker dependency of the differences between whispered and neutral data in the feature domain. The observations from these acoustic analyses are new in this area and also serve as guidance for developing robust speaker identification systems for whispered speech. This dissertation further proposes two systems for speaker identification of whispered speech. One system focuses on front-end processing. A two-dimensional feature space is proposed to search for "Low"-quality performance based whispered utterances and separate feature mapping functions are applied to vowels and consonants respectively in order to retain the speaker's information shared between whispered and neutral speech. The other system focuses on speech-mode-independent model training. The proposed method generates pseudo whispered features from neutral features by using the statistical information contained in a whispered Universal Background Model (UBM) trained from extra collected whispered data from non-target speakers. Four modeling methods are proposed

  10. Acoustic and Perceptual Measurement of Expressive Prosody in High-Functioning Autism: Increased Pitch Range and What it Means to Listeners

    ERIC Educational Resources Information Center

    Nadig, Aparna; Shaw, Holly

    2012-01-01

    Are there consistent markers of atypical prosody in speakers with high functioning autism (HFA) compared to typically-developing speakers? We examined: (1) acoustic measurements of pitch range, mean pitch and speech rate in conversation, (2) perceptual ratings of conversation for these features and overall prosody, and (3) acoustic measurements of…

  11. Processing changes when listening to foreign-accented speech.

    PubMed

    Romero-Rivas, Carlos; Martin, Clara D; Costa, Albert

    2015-01-01

    This study investigates the mechanisms responsible for fast changes in processing foreign-accented speech. Event Related brain Potentials (ERPs) were obtained while native speakers of Spanish listened to native and foreign-accented speakers of Spanish. We observed a less positive P200 component for foreign-accented speech relative to native speech comprehension. This suggests that the extraction of spectral information and other important acoustic features was hampered during foreign-accented speech comprehension. However, the amplitude of the N400 component for foreign-accented speech comprehension decreased across the experiment, suggesting the use of a higher level, lexical mechanism. Furthermore, during native speech comprehension, semantic violations in the critical words elicited an N400 effect followed by a late positivity. During foreign-accented speech comprehension, semantic violations only elicited an N400 effect. Overall, our results suggest that, despite a lack of improvement in phonetic discrimination, native listeners experience changes at lexical-semantic levels of processing after brief exposure to foreign-accented speech. Moreover, these results suggest that lexical access, semantic integration and linguistic re-analysis processes are permeable to external factors, such as the accent of the speaker. PMID:25859209

  12. Processing changes when listening to foreign-accented speech

    PubMed Central

    Romero-Rivas, Carlos; Martin, Clara D.; Costa, Albert

    2015-01-01

    This study investigates the mechanisms responsible for fast changes in processing foreign-accented speech. Event Related brain Potentials (ERPs) were obtained while native speakers of Spanish listened to native and foreign-accented speakers of Spanish. We observed a less positive P200 component for foreign-accented speech relative to native speech comprehension. This suggests that the extraction of spectral information and other important acoustic features was hampered during foreign-accented speech comprehension. However, the amplitude of the N400 component for foreign-accented speech comprehension decreased across the experiment, suggesting the use of a higher level, lexical mechanism. Furthermore, during native speech comprehension, semantic violations in the critical words elicited an N400 effect followed by a late positivity. During foreign-accented speech comprehension, semantic violations only elicited an N400 effect. Overall, our results suggest that, despite a lack of improvement in phonetic discrimination, native listeners experience changes at lexical-semantic levels of processing after brief exposure to foreign-accented speech. Moreover, these results suggest that lexical access, semantic integration and linguistic re-analysis processes are permeable to external factors, such as the accent of the speaker. PMID:25859209

  13. Benefits to Speech Perception in Noise From the Binaural Integration of Electric and Acoustic Signals in Simulated Unilateral Deafness

    PubMed Central

    Ma, Ning; Morris, Saffron; Kitterick, Pádraig Thomas

    2016-01-01

    Objectives: This study used vocoder simulations with normal-hearing (NH) listeners to (1) measure their ability to integrate speech information from an NH ear and a simulated cochlear implant (CI), and (2) investigate whether binaural integration is disrupted by a mismatch in the delivery of spectral information between the ears arising from a misalignment in the mapping of frequency to place. Design: Eight NH volunteers participated in the study and listened to sentences embedded in background noise via headphones. Stimuli presented to the left ear were unprocessed. Stimuli presented to the right ear (referred to as the CI-simulation ear) were processed using an eight-channel noise vocoder with one of the three processing strategies. An Ideal strategy simulated a frequency-to-place map across all channels that matched the delivery of spectral information between the ears. A Realistic strategy created a misalignment in the mapping of frequency to place in the CI-simulation ear where the size of the mismatch between the ears varied across channels. Finally, a Shifted strategy imposed a similar degree of misalignment in all channels, resulting in consistent mismatch between the ears across frequency. The ability to report key words in sentences was assessed under monaural and binaural listening conditions and at signal to noise ratios (SNRs) established by estimating speech-reception thresholds in each ear alone. The SNRs ensured that the monaural performance of the left ear never exceeded that of the CI-simulation ear. The advantages of binaural integration were calculated by comparing binaural performance with monaural performance using the CI-simulation ear alone. Thus, these advantages reflected the additional use of the experimentally constrained left ear and were not attributable to better-ear listening. Results: Binaural performance was as accurate as, or more accurate than, monaural performance with the CI-simulation ear alone. When both ears supported a

  14. Acoustic constituents of prosodic typology

    NASA Astrophysics Data System (ADS)

    Komatsu, Masahiko

    Different languages sound different, and a considerable part of this derives from typological differences in prosody. Although such differences are often described in terms of lexical accent types (stress accent, pitch accent, and tone; e.g. English, Japanese, and Chinese respectively) and rhythm types (stress-, syllable-, and mora-timed rhythms; e.g. English, Spanish, and Japanese respectively), it is unclear whether these types are determined in terms of acoustic properties. The thesis intends to provide a potential basis for the description of prosody in terms of acoustics. It argues for the hypothesis that the source component of the source-filter model (acoustic features) approximately corresponds to prosody (linguistic features) through several experimental-phonetic studies. The study consists of four parts. (1) Preliminary experiment: Perceptual language identification tests were performed using English and Japanese speech samples whose frequency spectral information (i.e. non-source component) is heavily reduced. The results indicated that humans can discriminate languages with such signals. (2) Discussion on the linguistic information that the source component contains: This part constitutes the foundation of the argument of the thesis. Perception tests of consonants with the source signal indicated that the source component carries the information on broad categories of phonemes that contributes to the creation of rhythm. (3) Acoustic analysis: The speech samples of Chinese, English, Japanese, and Spanish, differing in prosodic types, were analyzed. These languages showed differences in acoustic characteristics of the source component. (4) Perceptual experiment: A language identification test for the above four languages was performed using the source signal with its acoustic features parameterized. It revealed that humans can discriminate prosodic types solely with the source features and that the discrimination is easier as acoustic information increases. The

  15. Speech perception and production in severe environments

    NASA Astrophysics Data System (ADS)

    Pisoni, David B.

    1990-09-01

    The goal was to acquire new knowledge about speech perception and production in severe environments such as high masking noise, increased cognitive load or sustained attentional demands. Changes were examined in speech production under these adverse conditions through acoustic analysis techniques. One set of studies focused on the effects of noise on speech production. The experiments in this group were designed to generate a database of speech obtained in noise and in quiet. A second set of experiments was designed to examine the effects of cognitive load on the acoustic-phonetic properties of speech. Talkers were required to carry out a demanding perceptual motor task while they read lists of test words. A final set of experiments explored the effects of vocal fatigue on the acoustic-phonetic properties of speech. Both cognitive load and vocal fatigue are present in many applications where speech recognition technology is used, yet their influence on speech production is poorly understood.

  16. Somatosensory basis of speech production.

    PubMed

    Tremblay, Stéphanie; Shiller, Douglas M; Ostry, David J

    2003-06-19

    The hypothesis that speech goals are defined acoustically and maintained by auditory feedback is a central idea in speech production research. An alternative proposal is that speech production is organized in terms of control signals that subserve movements and associated vocal-tract configurations. Indeed, the capacity for intelligible speech by deaf speakers suggests that somatosensory inputs related to movement play a role in speech production-but studies that might have documented a somatosensory component have been equivocal. For example, mechanical perturbations that have altered somatosensory feedback have simultaneously altered acoustics. Hence, any adaptation observed under these conditions may have been a consequence of acoustic change. Here we show that somatosensory information on its own is fundamental to the achievement of speech movements. This demonstration involves a dissociation of somatosensory and auditory feedback during speech production. Over time, subjects correct for the effects of a complex mechanical load that alters jaw movements (and hence somatosensory feedback), but which has no measurable or perceptible effect on acoustic output. The findings indicate that the positions of speech articulators and associated somatosensory inputs constitute a goal of speech movements that is wholly separate from the sounds produced. PMID:12815431

  17. Speech Research

    NASA Astrophysics Data System (ADS)

    Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: a biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; phonetic factors in letter detection; categorical perception; short-term recall by deaf signers of American Sign Language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaries; and vowel information in postvocalic frictions.

  18. Age-related changes in prosodic features of maternal speech to prelingually deaf infants with cochlear implants

    PubMed Central

    Kondaurova, Maria V.; Bergeson, Tonya R.; Xu, Huipuing

    2012-01-01

    This study investigated prosodic and structural characteristics of infant-directed speech to hearing-impaired (HI) infants as they gain hearing experience with a cochlear implant over a 12-month period. Mothers were recorded during a play interaction with their HI infants (N = 27, mean age 18.4 months) at 3, 6, and 12 months post-implantation. Two separate control groups of mothers with age-matched normal-hearing infants (NH-AM) (N = 21, mean age 18.1 months) and hearing experience-matched normal-hearing infants (NH-EM) (N = 24, mean age 3.1 months) were recorded at three testing sessions. Mothers produced less exaggerated pitch characteristics, a larger number of syllables per utterance, and faster speaking rate when interacting with NH-AM as compared to HI infants. Mothers also produced more syllables and demonstrated a trend suggesting faster speaking rate in speech to NH-EM relative to HI infants. Age-related modifications included decreased pitch standard deviation and increased number of syllables in speech to NH-AM infants and increased number of syllables in speech to HI and NH-EM infants across the 12-month period. These results suggest that mothers are sensitive to the hearing status of their infants and modify characteristics of infant-directed speech over time. PMID:24244108

  19. General American Speech and Phonic Symbols.

    ERIC Educational Resources Information Center

    Calvert, Donald R.

    1982-01-01

    General American Symbols, speech and phonic symbols adapted from the Northampton symbols, are presented as a simplified system for teaching reading and speech to deaf children. Ways to use symbols for indicating features of speech production are suggested. (Author)

  20. The vocal repertoire of the domesticated zebra finch: a data-driven approach to decipher the information-bearing acoustic features of communication signals.

    PubMed

    Elie, Julie E; Theunissen, Frédéric E

    2016-03-01

    Although a universal code for the acoustic features of animal vocal communication calls may not exist, the thorough analysis of the distinctive acoustical features of vocalization categories is important not only to decipher the acoustical code for a specific species but also to understand the evolution of communication signals and the mechanisms used to produce and understand them. Here, we recorded more than 8000 examples of almost all the vocalizations of the domesticated zebra finch, Taeniopygia guttata: vocalizations produced to establish contact, to form and maintain pair bonds, to sound an alarm, to communicate distress or to advertise hunger or aggressive intents. We characterized each vocalization type using complete representations that avoided any a priori assumptions on the acoustic code, as well as classical bioacoustics measures that could provide more intuitive interpretations. We then used these acoustical features to rigorously determine the potential information-bearing acoustical features for each vocalization type using both a novel regularized classifier and an unsupervised clustering algorithm. Vocalization categories are discriminated by the shape of their frequency spectrum and by their pitch saliency (noisy to tonal vocalizations) but not particularly by their fundamental frequency. Notably, the spectral shape of zebra finch vocalizations contains peaks or formants that vary systematically across categories and that would be generated by active control of both the vocal organ (source) and the upper vocal tract (filter). PMID:26581377

  1. Simulation study and guidelines to generate Laser-induced Surface Acoustic Waves for human skin feature detection

    NASA Astrophysics Data System (ADS)

    Li, Tingting; Fu, Xing; Chen, Kun; Dorantes-Gonzalez, Dante J.; Li, Yanning; Wu, Sen; Hu, Xiaotang

    2015-12-01

    Despite the sharply increasing number of people contracting skin cancer every year, limited attention has been given to the investigation of human skin tissues. In this regard, Laser-induced Surface Acoustic Wave (LSAW) technology, with its accurate, non-invasive and rapid testing characteristics, has recently shown promising results in biological and biomedical tissues. In order to improve the measurement accuracy and efficiency of detecting important features in highly opaque and soft surfaces such as human skin, this paper identifies the most important parameters of a pulse laser source and provides practical guidelines recommending proper ranges for generating Surface Acoustic Waves (SAWs) for characterization purposes. Considering that melanoma is a serious type of skin cancer, we conducted finite element simulation-based research on the generation and propagation of surface waves in human skin containing a melanoma-like feature, in order to determine the best ranges of variation of the pulse laser parameters, the simulation mesh size and time step, the working bandwidth, and the minimal size of detectable melanoma.

  2. Features of Propagation of the Acoustic-Gravity Waves Generated by High-Power Periodic Radiation

    NASA Astrophysics Data System (ADS)

    Chernogor, L. F.; Frolov, V. L.

    2013-09-01

    We present the results of the bandpass filtering of temporal variations of the Doppler frequency shift of radio signals from a vertical-sounding Doppler radar located near the city of Kharkov when the ionosphere was heated by high-power periodic (with 10 and 15-min periods) radiation from the Sura facility. The filtering was done in the ranges of periods that are close to the acoustic cutoff period and the Brunt-Väisälä period (4-6, 8-12, and 13-17 min). Oscillations with periods of 4-6 min and amplitudes of 50-100 mHz were in fact not recorded. Oscillations with periods of 8-12 and 13-17 min and amplitudes of 60-100 mHz were detected in almost all the sessions. In the former and the latter oscillations, the time of delay with respect to the heater switch-on was close to 100 min and about 40-50 min, respectively. These values correspond to group propagation velocities of about 160 and 320-400 m/s. The Doppler shift oscillations were caused by the acoustic-gravity waves which led to periodic variations in the electron number density with a relative amplitude of about 0.1-1.0%. It was demonstrated that the acoustic-gravity waves were not recorded when the effective power of the Sura facility was equal to 50 MW and they were confidently observed when the effective power was increased up to 130 MW. It is shown that the period of the wave processes was determined by the period of the heating-pause cycles, and the duration of the wave trains did not depend on the duration of the series of heating-pause cycles. The data suggest that the generation mechanism of recorded wave disturbances is different from the mechanism proposed in 1970-1990.
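
    The correspondence between the quoted delays and group velocities can be checked with simple arithmetic. The sketch below is an illustration only: the abstract does not state the heater-to-radar path length, so it is inferred here from the first delay/velocity pair (an assumption), and the names path_km and group_speed are hypothetical.

```python
# Infer the propagation path from the first pair quoted in the abstract
# (delay of about 100 min, group velocity of about 160 m/s). The path length
# itself is NOT given in the abstract; it is derived here as an assumption.
delay_slow_min = 100.0
speed_slow_m_s = 160.0
path_km = speed_slow_m_s * delay_slow_min * 60.0 / 1000.0


def group_speed(path_km, delay_min):
    """Group velocity in m/s for a given path (km) and travel time (min)."""
    return path_km * 1000.0 / (delay_min * 60.0)


# Delays of 40-50 min over the same inferred path.
speed_fast_upper = group_speed(path_km, 40.0)
speed_fast_lower = group_speed(path_km, 50.0)

print(path_km, speed_fast_lower, speed_fast_upper)
```

    Under this assumption the second delay range reproduces the 320-400 m/s interval quoted in the abstract, which suggests that both velocity estimates refer to the same propagation path.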

  3. Interaction of dust-ion acoustic solitary waves in nonplanar geometry with electrons featuring Tsallis distribution

    SciTech Connect

    Narayan Ghosh, Uday; Chatterjee, Prasanta; Tribeche, Mouloud

    2012-11-15

    The head-on collisions between nonplanar dust-ion acoustic solitary waves are dealt with by an extended version of Poincare-Lighthill-Kuo perturbation method, for a plasma having stationary dust grains, inertial ions, and nonextensive electrons. The nonplanar geometry modified analytical phase-shift after a head-on collision is derived. It is found that as the nonextensive character of the electrons becomes important, the phase-shift decreases monotonically before levelling-off at a constant value. This leads us to think that nonextensivity may have a stabilizing effect on the phase-shift.

  4. Acoustic analysis in Mudejar-Gothic churches: experimental results.

    PubMed

    Galindo, Miguel; Zamarreño, Teófilo; Girón, Sara

    2005-05-01

    This paper describes the preliminary results of research work in acoustics, conducted in a set of 12 Mudejar-Gothic churches in the city of Seville in the south of Spain. Despite common architectural style, the churches feature individual characteristics and have volumes ranging from 3947 to 10 708 m³. Acoustic parameters were measured in unoccupied churches according to the ISO-3382 standard. An extensive experimental study was carried out using impulse response analysis through a maximum length sequence measurement system in each church. It covered aspects such as reverberation (reverberation times, early decay times), distribution of sound levels (sound strength); early to late sound energy parameters derived from the impulse responses (center time, clarity for speech, clarity, definition, lateral energy fraction), and speech intelligibility (rapid speech transmission index), which all take both spectral and spatial distribution into account. Background noise was also measured to obtain the NR indices. The study describes the acoustic field inside each temple and establishes a discussion for each one of the acoustic descriptors mentioned by using the theoretical models available and the principles of architectural acoustics. Analysis of the quality of the spaces for music and speech is carried out according to the most widespread criteria for auditoria. PMID:15957758

  5. Acoustic analysis in Mudejar-Gothic churches: Experimental results

    NASA Astrophysics Data System (ADS)

    Galindo, Miguel; Zamarreño, Teófilo; Girón, Sara

    2005-05-01

    This paper describes the preliminary results of research work in acoustics, conducted in a set of 12 Mudejar-Gothic churches in the city of Seville in the south of Spain. Despite common architectural style, the churches feature individual characteristics and have volumes ranging from 3947 to 10 708 m³. Acoustic parameters were measured in unoccupied churches according to the ISO-3382 standard. An extensive experimental study was carried out using impulse response analysis through a maximum length sequence measurement system in each church. It covered aspects such as reverberation (reverberation times, early decay times), distribution of sound levels (sound strength); early to late sound energy parameters derived from the impulse responses (center time, clarity for speech, clarity, definition, lateral energy fraction), and speech intelligibility (rapid speech transmission index), which all take both spectral and spatial distribution into account. Background noise was also measured to obtain the NR indices. The study describes the acoustic field inside each temple and establishes a discussion for each one of the acoustic descriptors mentioned by using the theoretical models available and the principles of architectural acoustics. Analysis of the quality of the spaces for music and speech is carried out according to the most widespread criteria for auditoria.

  6. Detection of Clinical Depression in Adolescents’ Speech During Family Interactions

    PubMed Central

    Low, Lu-Shih Alex; Maddage, Namunu C.; Lech, Margaret; Sheeber, Lisa B.; Allen, Nicholas B.

    2013-01-01

    The properties of acoustic speech have previously been investigated as possible cues for depression in adults. However, these studies were restricted to small populations of patients and the speech recordings were made during patients’ clinical interviews or fixed-text reading sessions. Symptoms of depression often first appear during adolescence at a time when the voice is changing, in both males and females, suggesting that specific studies of these phenomena in adolescent populations are warranted. This study investigated acoustic correlates of depression in a large sample of 139 adolescents (68 clinically depressed and 71 controls). Speech recordings were made during naturalistic interactions between adolescents and their parents. Prosodic, cepstral, spectral, and glottal features, as well as features derived from the Teager energy operator (TEO), were tested within a binary classification framework. Strong gender differences in classification accuracy were observed. The TEO-based features clearly outperformed all other features and feature combinations, providing classification accuracy ranging between 81%–87% for males and 72%–79% for females. Close, but slightly less accurate, results were obtained by combining glottal features with prosodic and spectral features (67%–69% for males and 70%–75% for females). These findings indicate the importance of nonlinear mechanisms associated with the glottal flow formation as cues for clinical depression. PMID:21075715
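
    For readers unfamiliar with the Teager energy operator, the following is a minimal sketch of its standard discrete-time form, psi[n] = x[n]^2 - x[n-1]*x[n+1]. The abstract does not say which TEO-derived features were used, so this shows only the basic operator; the function name teager_energy and the test tone are illustrative assumptions.

```python
import math


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output covers interior samples only, so it is two samples shorter
    than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# For a pure tone A*cos(omega*n) the operator returns the constant
# A^2 * sin(omega)^2, which ties the output to both amplitude and frequency.
amplitude, omega = 2.0, 0.25  # hypothetical tone parameters
tone = [amplitude * math.cos(omega * n) for n in range(200)]
psi = teager_energy(tone)
expected = (amplitude * math.sin(omega)) ** 2

print(max(abs(v - expected) for v in psi))  # deviation from the constant value
```

    Because the operator responds to both the amplitude and the frequency of an oscillation using only three neighbouring samples, it is often used to track rapid changes in a signal that frame-based spectral measures smooth over.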

  7. Automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Espy-Wilson, Carol

    2005-04-01

    Great strides have been made in the development of automatic speech recognition (ASR) technology over the past thirty years. Most of this effort has been centered around the extension and improvement of Hidden Markov Model (HMM) approaches to ASR. Current commercially-available and industry systems based on HMMs can perform well for certain situational tasks that restrict variability such as phone dialing or limited voice commands. However, the holy grail of ASR systems is performance comparable to humans; in other words, the ability to automatically transcribe unrestricted conversational speech spoken by an infinite number of speakers under varying acoustic environments. This goal is far from being reached. Key to the success of ASR is effective modeling of variability in the speech signal. This tutorial will review the basics of ASR and the various ways in which our current knowledge of speech production, speech perception and prosody can be exploited to improve robustness at every level of the system.

  8. Effective acoustic modeling for robust speaker recognition

    NASA Astrophysics Data System (ADS)

    Hasan Al Banna, Taufiq

    Robustness to mismatched train/test conditions is the biggest challenge facing the speaker recognition community today, with transmission channel and environmental noise degradation being the prominent factors. State-of-the-art speaker recognition methods aim at mitigating these factors by effectively modeling speech in multiple recording conditions, so that they can learn to distinguish between inter-speaker and intra-speaker variability. The increasing demand and availability of large development corpora introduces difficulties in effective data utilization and computationally efficient modeling. Traditional compensation strategies operate on higher dimensional utterance features, known as supervectors, which are obtained from the acoustic modeling of short-time features. Feature compensation is performed during front-end processing. Motivated by the covariance structure of conventional acoustic features, we envision that feature normalization and compensation can be integrated into the acoustic modeling. In this dissertation, we investigate the following fundamental research challenges: (i) analysis of data requirements for effective and efficient background model training, (ii) introducing latent factor analysis modeling of acoustic features, (iii) integration of channel compensation strategies in mixture-models, and (iv) development of noise robust background models using factor analysis. The effectiveness of the proposed solutions is demonstrated in various noisy and channel degraded conditions using the recent evaluation datasets released by the National Institute of Standards and Technology (NIST). These research accomplishments make an important step towards improving speaker recognition robustness in diverse acoustic conditions.

  9. Nonlinear features of ion acoustic shock waves in dissipative magnetized dusty plasma

    SciTech Connect

    Sahu, Biswajit; Sinha, Anjana; Roychoudhury, Rajkumar

    2014-10-15

    The nonlinear propagation of small as well as arbitrary amplitude shocks is investigated in a magnetized dusty plasma consisting of inertia-less Boltzmann distributed electrons, inertial viscous cold ions, and stationary dust grains without dust-charge fluctuations. The effects of dissipation due to viscosity of ions and external magnetic field, on the properties of ion acoustic shock structure, are investigated. It is found that for small amplitude waves, the Korteweg-de Vries-Burgers (KdVB) equation, derived using the Reductive Perturbation Method, gives the qualitative behaviour of the transition from oscillatory wave to shock structure. The exact numerical solution for an arbitrary amplitude wave differs somewhat in the details from the results obtained from the KdVB equation. However, the qualitative nature of the two solutions is similar in the sense that a gradual transition from KdV oscillation to shock structure is observed with the increase of the dissipative parameter.

  10. Nonlinear ion-acoustic double-layers in electronegative plasmas with electrons featuring Tsallis distribution

    NASA Astrophysics Data System (ADS)

    Ghebache, Siham; Tribeche, Mouloud

    2016-04-01

    Weakly nonlinear ion-acoustic (IA) double-layers (DLs), which accompany electronegative plasmas composed of positive ions, negative ions, and nonextensive electrons are investigated. A generalized Korteweg-de Vries equation with a cubic nonlinearity is derived using a reductive perturbation method. Different types of electronegative plasmas inspired from the experimental studies of Ichiki et al. (2001) are discussed. It is shown that the IA wave phase velocity, in different mixtures of negative and positive ions, decreases as the nonextensive parameter q increases, before levelling-off at a constant value for larger q. Moreover, a relative increase of Q involves an enhancement of the IA phase velocity. Existence domains of either solitary waves or double-layers are then presented and their parametric dependence is determined. Owing to the electron nonextensivity, our present plasma model can admit compressive as well as rarefactive IA-DLs.

  11. Nonlinear features of ion acoustic shock waves in dissipative magnetized dusty plasma

    NASA Astrophysics Data System (ADS)

    Sahu, Biswajit; Sinha, Anjana; Roychoudhury, Rajkumar

    2014-10-01

    The nonlinear propagation of small as well as arbitrary amplitude shocks is investigated in a magnetized dusty plasma consisting of inertia-less Boltzmann distributed electrons, inertial viscous cold ions, and stationary dust grains without dust-charge fluctuations. The effects of dissipation due to viscosity of ions and external magnetic field, on the properties of ion acoustic shock structure, are investigated. It is found that for small amplitude waves, the Korteweg-de Vries-Burgers (KdVB) equation, derived using the Reductive Perturbation Method, gives the qualitative behaviour of the transition from oscillatory wave to shock structure. The exact numerical solution for an arbitrary amplitude wave differs somewhat in the details from the results obtained from the KdVB equation. However, the qualitative nature of the two solutions is similar in the sense that a gradual transition from KdV oscillation to shock structure is observed with the increase of the dissipative parameter.

  12. The WiggleZ Dark Energy Survey: improved distance measurements to z = 1 with reconstruction of the baryonic acoustic feature

    NASA Astrophysics Data System (ADS)

    Kazin, Eyal A.; Koda, Jun; Blake, Chris; Padmanabhan, Nikhil; Brough, Sarah; Colless, Matthew; Contreras, Carlos; Couch, Warrick; Croom, Scott; Croton, Darren J.; Davis, Tamara M.; Drinkwater, Michael J.; Forster, Karl; Gilbank, David; Gladders, Mike; Glazebrook, Karl; Jelliffe, Ben; Jurek, Russell J.; Li, I.-hui; Madore, Barry; Martin, D. Christopher; Pimbblet, Kevin; Poole, Gregory B.; Pracy, Michael; Sharp, Rob; Wisnioski, Emily; Woods, David; Wyder, Ted K.; Yee, H. K. C.

    2014-07-01

    We present significant improvements in cosmic distance measurements from the WiggleZ Dark Energy Survey, achieved by applying the reconstruction of the baryonic acoustic feature technique. We show using both data and simulations that the reconstruction technique can often be effective despite patchiness of the survey, significant edge effects and shot-noise. We investigate three redshift bins in the redshift range 0.2 < z < 1, and in all three find improvement after reconstruction in the detection of the baryonic acoustic feature and its usage as a standard ruler. We measure model-independent distance measures D_V(r_s^fid/r_s) of 1716 ± 83, 2221 ± 101, 2516 ± 86 Mpc (68 per cent CL) at effective redshifts z = 0.44, 0.6, 0.73, respectively, where D_V is the volume-averaged distance, and r_s is the sound horizon at the end of the baryon drag epoch. These significantly improved 4.8, 4.5 and 3.4 per cent accuracy measurements are equivalent to those expected from surveys with up to 2.5 times the volume of WiggleZ without reconstruction applied. These measurements are fully consistent with cosmologies allowed by the analyses of the Planck Collaboration and the Sloan Digital Sky Survey. We provide the D_V(r_s^fid/r_s) posterior probability distributions and their covariances. When combining these measurements with temperature fluctuations measurements of Planck, the polarization of Wilkinson Microwave Anisotropy Probe 9, and the 6dF Galaxy Survey baryonic acoustic feature, we do not detect deviations from a flat Λ cold dark matter (ΛCDM) model. Assuming this model, we constrain the current expansion rate to H_0 = 67.15 ± 0.98 km s^-1 Mpc^-1. Allowing the equation of state of dark energy to vary, we obtain w_DE = -1.080 ± 0.135. When assuming a curved ΛCDM model we obtain a curvature value of Ω_K = -0.0043 ± 0.0047.
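
    The quoted 4.8, 4.5 and 3.4 per cent accuracies follow directly from the distance values and their 68 per cent uncertainties. The short check below uses only numbers from the abstract; the variable and function names are illustrative.

```python
# (effective redshift, distance measure in Mpc, 68 per cent uncertainty in Mpc),
# copied from the abstract.
measurements = [
    (0.44, 1716.0, 83.0),
    (0.60, 2221.0, 101.0),
    (0.73, 2516.0, 86.0),
]


def percent_uncertainty(value, sigma):
    """Relative uncertainty expressed in per cent."""
    return 100.0 * sigma / value


percents = [round(percent_uncertainty(v, s), 1) for _, v, s in measurements]
print(percents)
```

    The same ratio applied to the expansion-rate constraint (0.98 on 67.15) gives a relative uncertainty of roughly 1.5 per cent.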

  13. Normal Aspects of Speech, Hearing, and Language.

    ERIC Educational Resources Information Center

    Minifie, Fred. D., Ed.; And Others

    This book is written as a guide to the understanding of the processes involved in human speech communication. Ten authorities contributed material to provide an introduction to the physiological aspects of speech production and reception, the acoustical aspects of speech production and transmission, the psychophysics of sound reception, the nature…

  14. A Screening Approach for Classroom Acoustics Using Web-Based Listening Tests and Subjective Ratings

    PubMed Central

    Persson Waye, Kerstin; Magnusson, Lennart; Fredriksson, Sofie; Croy, Ilona

    2015-01-01

    Background: Perception of speech is crucial in school where speech is the main mode of communication. The aim of the study was to evaluate whether a web-based approach including listening tests and questionnaires could be used as a screening tool for poor classroom acoustics. The prime focus was the relation between pupils’ comprehension of speech, the classroom acoustics and their description of the acoustic qualities of the classroom. Methodology/Principal Findings: In total, 1106 pupils aged 13-19, from 59 classes and 38 schools in Sweden participated in a listening study using Hagerman’s sentences administered via Internet. Four listening conditions were applied: high and low background noise level and positions close and far away from the loudspeaker. The pupils described the acoustic quality of the classroom and teachers provided information on the physical features of the classroom using questionnaires. Conclusions/Significance: In 69% of the classes, at least three pupils described the sound environment as adverse and in 88% of the classes one or more pupils reported often having difficulties concentrating due to noise. The pupils’ comprehension of speech was strongly influenced by the background noise level (p<0.001) and distance to the loudspeakers (p<0.001). Of the physical classroom features, presence of suspended acoustic panels (p<0.05) and length of the classroom (p<0.01) predicted speech comprehension. Of the pupils’ descriptions of acoustic qualities, clattery significantly (p<0.05) predicted speech comprehension. Clattery was furthermore associated with difficulties understanding each other, while the description noisy was associated with concentration difficulties. The majority of classrooms do not seem to have an optimal sound environment. The pupils’ descriptions of acoustic qualities and listening tests can be one way of predicting sound conditions in the classroom. PMID:25615692

  15. Automatic detection of wheezes by evaluation of multiple acoustic feature extraction methods and C-weighted SVM

    NASA Astrophysics Data System (ADS)

    Sosa, Germán. D.; Cruz-Roa, Angel; González, Fabio A.

    2015-01-01

    This work addresses the problem of lung sound classification, in particular, the problem of distinguishing between wheeze and normal sounds. Wheezing sound detection is an important step to associate lung sounds with an abnormal state of the respiratory system, usually associated with tuberculosis or other chronic obstructive pulmonary diseases (COPD). The paper presents an approach for automatic lung sound classification, which uses different state-of-the-art sound features in combination with a C-weighted support vector machine (SVM) classifier that works better for unbalanced data. Feature extraction methods used here are commonly applied in speech recognition and related problems thanks to the fact that they capture the most informative spectral content from the original signals. The evaluated methods were: Fourier transform (FT), wavelet decomposition using Wavelet Packet Transform bank of filters (WPT) and Mel Frequency Cepstral Coefficients (MFCC). For comparison, we evaluated and contrasted the proposed approach against previous works using different combinations of features and/or classifiers. The different methods were evaluated on a set of lung sounds including normal and wheezing sounds. A leave-two-out per-case cross-validation approach was used, which, in each fold, chooses as validation set a couple of cases, one including normal sounds and the other including wheezing sounds. Experimental results were reported in terms of traditional classification performance measures: sensitivity, specificity and balanced accuracy. Our best results using the suggested approach, C-weighted SVM and MFCC, achieve a balanced accuracy of 82.1%, the best result reported for this problem to date. These results suggest that supervised classifiers based on kernel methods are able to learn better models for this challenging classification problem even using the same feature extraction methods.
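
    The three performance measures named in the abstract are related in a simple way: balanced accuracy is the mean of sensitivity and specificity, which keeps a majority class from dominating the score on unbalanced data. The sketch below illustrates the definitions only; the confusion-matrix counts are hypothetical and are not results from the paper.

```python
def sensitivity(tp, fn):
    """Fraction of positive (wheeze) cases that are detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of negative (normal) cases that are correctly rejected."""
    return tn / (tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity."""
    return 0.5 * (sensitivity(tp, fn) + specificity(tn, fp))


# Hypothetical counts: 40 wheeze recordings and 400 normal recordings.
tp, fn, tn, fp = 30, 10, 200, 200

plain_accuracy = (tp + tn) / (tp + fn + tn + fp)
print(sensitivity(tp, fn), specificity(tn, fp), balanced_accuracy(tp, fn, tn, fp))
print(plain_accuracy)
```

    With these made-up counts, plain accuracy is pulled toward the specificity of the much larger normal class, while balanced accuracy weights both classes equally.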

  16. A keyword spotting model using perceptually significant energy features

    NASA Astrophysics Data System (ADS)

    Umakanthan, Padmalochini

    The task of a keyword recognition system is to detect the presence of certain words in a conversation based on the linguistic information present in human speech. Such keyword spotting systems have applications in homeland security, telephone surveillance and human-computer interfacing. The general procedure of a keyword spotting system involves feature generation and matching. In this work, a new set of features based on the psycho-acoustic masking nature of human speech is proposed. After developing these features, a time-aligned pattern matching process was implemented to locate the words in a set of unknown words. A word boundary detection technique based on frame classification using the nonlinear characteristics of speech is also addressed in this work. Validation of this keyword spotting model was done using widely acclaimed Cepstral features. The experimental results indicate the viability of using these perceptually significant features as an augmented feature set in keyword spotting.

  17. [ACOUSTIC FEATURES OF VOCALIZATIONS, REFLECTING THE DISCOMFORT AND COMFORT STATE OF INFANTS AGED THREE AND SIX MONTHS].

    PubMed

    Pavlikova, M I; Makarov, A K; Lyakso, E E

    2015-08-01

    The paper examines whether adults can recognize the comfort and discomfort states of 3- and 6-month-old infants on the basis of their vocalizations. The acoustic features of the vocalizations that are important for recognizing the infant's state from voice characteristics are described. It is shown that discomfort vocalizations differ from comfort ones in the average and maximum values of pitch and in the pitch values in the central and final parts of the vocalization. A mathematical model is proposed, and a classification function for discomfort and comfort signals is described. It was found that the infant vocalizations that adults attributed to the comfort and discomfort categories with a probability of 0.75 or above are recognized with high reliability by the mathematical model based on the classification function. PMID:26591591

  18. Contributions of speech science to the technology of man-machine voice interactions

    NASA Technical Reports Server (NTRS)

    Lea, Wayne A.

    1977-01-01

    Research in speech understanding was reviewed. Plans which include prosodics research, phonological rules for speech understanding systems, and continued interdisciplinary phonetics research are discussed. Improved acoustic phonetic analysis capabilities in speech recognizers are suggested.

  19. Extraction of features from ultrasound acoustic emissions: a tool to assess the hydraulic vulnerability of Norway spruce trunkwood?

    PubMed Central

    Rosner, Sabine; Klein, Andrea; Wimmer, Rupert; Karlsson, Bo

    2011-01-01

    Summary • The aim of this study was to assess the hydraulic vulnerability of Norway spruce (Picea abies) trunkwood by extraction of selected features of acoustic emissions (AEs) detected during dehydration of standard size samples. • The hydraulic method was used as the reference method to assess the hydraulic vulnerability of trunkwood of different cambial ages. Vulnerability curves were constructed by plotting the percentage loss of conductivity vs an overpressure of compressed air. • Differences in hydraulic vulnerability were very pronounced between juvenile and mature wood samples; therefore, useful AE features, such as peak amplitude, duration and relative energy, could be filtered out. The AE rates of signals clustered by amplitude and duration ranges and the AE energies differed greatly between juvenile and mature wood at identical relative water losses. • Vulnerability curves could be constructed by relating the cumulated amount of relative AE energy to the relative loss of water and to xylem tension. AE testing in combination with feature extraction offers a readily automated and easy to use alternative to the hydraulic method. PMID:16771986

  20. Effect of body position on vocal tract acoustics: Acoustic pharyngometry and vowel formants

    PubMed Central

    Vorperian, Houri K.; Kurtzweil, Sara L.; Fourakis, Marios; Kent, Ray D.; Tillman, Katelyn K.; Austin, Diane

    2015-01-01

    The anatomic basis and articulatory features of speech production are often studied with imaging studies that are typically acquired in the supine body position. It is important to determine if changes in body orientation to the gravitational field alter vocal tract dimensions and speech acoustics. The purpose of this study was to assess the effect of body position (upright versus supine) on (1) oral and pharyngeal measurements derived from acoustic pharyngometry and (2) acoustic measurements of fundamental frequency (F0) and the first four formant frequencies (F1–F4) for the quadrilateral point vowels. Data were obtained for 27 male and female participants, aged 17 to 35 yrs. Acoustic pharyngometry showed a statistically significant effect of body position on volumetric measurements, with smaller values in the supine than upright position, but no changes in length measurements. Acoustic analyses of vowels showed significantly larger values in the supine than upright position for the variables of F0, F3, and the Euclidean distance from the centroid to each corner vowel in the F1-F2-F3 space. Changes in body position affected measurements of vocal tract volume but not length. Body position also affected the aforementioned acoustic variables, but the main vowel formants were preserved. PMID:26328699
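
    One of the acoustic variables above, the Euclidean distance from the centroid to each corner vowel in F1-F2-F3 space, can be sketched as follows. The formant values are hypothetical placeholders chosen for easy arithmetic, not data from the study, and the abstract does not specify how the centroid was defined, so the plain mean of the four corner vowels is assumed here.

```python
import math

# Hypothetical (F1, F2, F3) values in Hz for four corner vowels;
# illustrative only, not measurements from the study.
corner_vowels = {
    "i": (300.0, 2300.0, 3000.0),
    "ae": (700.0, 1800.0, 2500.0),
    "a": (750.0, 1100.0, 2600.0),
    "u": (350.0, 900.0, 2300.0),
}


def centroid(points):
    """Mean of each formant dimension across the given points (assumed definition)."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def distances_from_centroid(vowels):
    """Euclidean distance in F1-F2-F3 space from the centroid to each vowel."""
    c = centroid(list(vowels.values()))
    return {name: math.dist(formants, c) for name, formants in vowels.items()}


center = centroid(list(corner_vowels.values()))
distances = distances_from_centroid(corner_vowels)
print(center)
print(distances)
```

    Larger distances correspond to corner vowels lying farther from the centre of the speaker's vowel space, so a uniform increase in these distances indicates an expanded vowel space.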

  1. Experiment in Learning to Discriminate Frequency Transposed Speech.

    ERIC Educational Resources Information Center

    Ahlstrom, K.G.; And Others

    In order to improve speech perception by transposing the speech signals to lower frequencies, to determine which aspects of the information in the acoustic speech signals were influenced by transposition, and to compare two different methods of training speech perception, 44 subjects were trained to discriminate between transposed words or…

  2. Musical ability and non-native speech-sound processing are linked through sensitivity to pitch and spectral information.

    PubMed

    Kempe, Vera; Bublitz, Dennis; Brooks, Patricia J

    2015-05-01

    Is the observed link between musical ability and non-native speech-sound processing due to enhanced sensitivity to acoustic features underlying both musical and linguistic processing? To address this question, native English speakers (N = 118) discriminated Norwegian tonal contrasts and Norwegian vowels. Short tones differing in temporal, pitch, and spectral characteristics were used to measure sensitivity to the various acoustic features implicated in musical and speech processing. Musical ability was measured using Gordon's Advanced Measures of Musical Audiation. Results showed that sensitivity to specific acoustic features played a role in non-native speech-sound processing: Controlling for non-verbal intelligence, prior foreign language-learning experience, and sex, sensitivity to pitch and spectral information partially mediated the link between musical ability and discrimination of non-native vowels and lexical tones. The findings suggest that while sensitivity to certain acoustic features partially mediates the relationship between musical ability and non-native speech-sound processing, complex tests of musical ability also tap into other shared mechanisms. PMID:25220831

  3. Optimal selection of wavelet-packet-based features using genetic algorithm in pathological assessment of patients' speech signal with unilateral vocal fold paralysis.

    PubMed

    Behroozmand, Roozbeh; Almasganj, Farshad

    2007-04-01

    Unilateral vocal fold paralysis (UVFP) is one of the most severe types of neurogenic laryngeal disorder in which the patients, due to the malfunction of their vocal cords, are confronted by some serious problems. As the effect of such pathologies would be significantly evident in the reduced quality and feature variation of dysphonic voices, this study is designed to scrutinize the piecewise variation of some specific types of these features, known as energy and entropy, over the whole frequency range of pathological speech signals. In order to do so, the wavelet-packet coefficients, in five consecutive levels of decomposition, are used to extract the energy and entropy measures at different spectral sub-bands. As the decomposition procedure leads to a set of high-dimensional feature vectors, a genetic algorithm is invoked to search for a group of optimal sub-band indexes for which the extracted features result in the highest recognition rate for pathological and normal subjects' classification. The results of our simulations, using a support vector machine classifier, show that the highest recognition rate, for both optimized energy and entropy measures, is achieved at the fifth level of wavelet-packet decomposition. It is also found that the entropy feature, with the highest recognition rate of 100% vs. 93.62% for energy, is more prominent in discriminating patients with UVFP from normal subjects. Therefore, the entropy feature, in comparison with energy, demonstrates a more efficient description of such pathological voices and provides a valuable tool for clinical diagnosis of unilateral laryngeal paralysis. PMID:17034780
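
    The sub-band energy and entropy features described above can be sketched as follows. This is only an illustration under stated assumptions: it uses a Haar wavelet packet and a normalized Shannon entropy, neither of which is confirmed by the abstract, and it omits the genetic-algorithm selection and the SVM classifier. All function names are hypothetical.

```python
import math

def haar_step(x):
    """One orthonormal Haar analysis step: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def wavelet_packet(x, level):
    """Full wavelet-packet tree; returns the 2**level leaf nodes.

    Leaves are in natural (filter-bank) order, not sorted by frequency.
    len(x) should be a multiple of 2**level.
    """
    nodes = [list(x)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes

def subband_energy(coeffs):
    """Sum of squared coefficients in one sub-band."""
    return sum(c * c for c in coeffs)

def subband_entropy(coeffs):
    """Shannon entropy of the normalized squared coefficients (assumed form)."""
    total = subband_energy(coeffs)
    if total == 0.0:
        return 0.0
    entropy = 0.0
    for c in coeffs:
        p = c * c / total
        if p > 0.0:
            entropy -= p * math.log(p)
    return entropy

# Synthetic test signal: 256 samples, five decomposition levels -> 32 sub-bands.
signal = [math.sin(0.3 * n) + 0.5 * math.sin(1.7 * n) for n in range(256)]
leaves = wavelet_packet(signal, 5)
features = [(subband_energy(b), subband_entropy(b)) for b in leaves]
print(len(features))
```

    In a pipeline like the one described, a feature-selection step would then choose which of the sub-band indexes to pass to the classifier.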

  4. Automatic measurement and representation of prosodic features

    NASA Astrophysics Data System (ADS)

    Ying, Goangshiuan Shawn

    Effective measurement and representation of prosodic features of the acoustic signal for use in automatic speech recognition and understanding systems is the goal of this work. Prosodic features (stress, duration, and intonation) are variations of the acoustic signal whose domains are beyond the boundaries of each individual phonetic segment. Listeners perceive prosodic features through a complex combination of acoustic correlates such as intensity, duration, and fundamental frequency (F0). We have developed new tools to measure F0 and intensity features. We apply a probabilistic global error correction routine to an Average Magnitude Difference Function (AMDF) pitch detector. A new short-term frequency-domain Teager energy algorithm is used to measure the energy of a speech signal. We have conducted a series of experiments performing lexical stress detection on words in continuous English speech from two speech corpora. We have experimented with two different approaches, a segment-based approach and a rhythm unit-based approach, in lexical stress detection. The first approach uses pattern recognition with energy- and duration-based measurements as features to build Bayesian classifiers to detect the stress level of a vowel segment. In the second approach we define a rhythm unit and use only the F0-based measurement and a scoring system to determine the stressed segment in the rhythm unit. A duration-based segmentation routine was developed to break polysyllabic words into rhythm units. The long-term goal of this work is to develop a system that can effectively detect the stress pattern for each word in continuous speech utterances. Stress information will be integrated as a constraint for pruning the word hypotheses in a word recognition system based on hidden Markov models.
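
    A basic AMDF pitch detector of the kind named above can be sketched as follows. This is a minimal single-frame version; it does not include the probabilistic global error correction described in the abstract, and the function names and the 60-400 Hz search range are assumptions made for illustration.

```python
import math

def amdf(frame, lag):
    """Average magnitude difference between the frame and itself shifted by `lag`."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n

def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Pick the lag with the deepest AMDF valley and convert it to Hz.

    The frame must be longer than sample_rate / f_min samples.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag = min(range(min_lag, max_lag + 1), key=lambda lag: amdf(frame, lag))
    return sample_rate / best_lag

# Synthetic voiced frame: a 100 Hz sine sampled at 8 kHz (period = 80 samples).
sample_rate = 8000
frame = [math.sin(2.0 * math.pi * 100.0 * n / sample_rate) for n in range(400)]
print(estimate_f0(frame, sample_rate))
```

    On real speech the raw AMDF minimum is prone to octave errors, which is the kind of problem a global correction pass is meant to address.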

  5. Linking Speech Perception and Neurophysiology: Speech Decoding Guided by Cascaded Oscillators Locked to the Input Rhythm

    PubMed Central

    Ghitza, Oded

    2011-01-01

    The premise of this study is that current models of speech perception, which are driven by acoustic features alone, are incomplete, and that the role of decoding time during memory access must be incorporated to account for the patterns of observed recognition phenomena. It is postulated that decoding time is governed by a cascade of neuronal oscillators, which guide template-matching operations at a hierarchy of temporal scales. Cascaded cortical oscillations in the theta, beta, and gamma frequency bands are argued to be crucial for speech intelligibility. Intelligibility is high so long as these oscillations remain phase locked to the auditory input rhythm. A model (Tempo) is presented which is capable of emulating recent psychophysical data on the intelligibility of speech sentences as a function of “packaging” rate (Ghitza and Greenberg, 2009). The data show that intelligibility of speech that is time-compressed by a factor of 3 (i.e., a high syllabic rate) is poor (above 50% word error rate), but is substantially restored when the information stream is re-packaged by the insertion of silent gaps in between successive compressed-signal intervals – a counterintuitive finding, difficult to explain using classical models of speech perception, but emerging naturally from the Tempo architecture. PMID:21743809

  6. Inconsistency of speech in children with childhood apraxia of speech, phonological disorders, and typical speech

    NASA Astrophysics Data System (ADS)

    Iuzzini, Jenya

    There is a lack of agreement on the features used to differentiate Childhood Apraxia of Speech (CAS) from Phonological Disorders (PD). One criterion which has gained consensus is lexical inconsistency of speech (ASHA, 2007); however, no accepted measure of this feature has been defined. Although lexical assessment provides information about consistency of an item across repeated trials, it may not capture the magnitude of inconsistency within an item. In contrast, segmental analysis provides more extensive information about consistency of phoneme usage across multiple contexts and word-positions. The current research compared segmental and lexical inconsistency metrics in preschool-aged children with PD, CAS, and typical development (TD) to determine how inconsistency varies with age in typical and disordered speakers, and whether CAS and PD were differentiated equally well by both assessment levels. Whereas lexical and segmental analyses may be influenced by listener characteristics or speaker intelligibility, the acoustic signal is less vulnerable to these factors. In addition, the acoustic signal may reveal information which is not evident in the perceptual signal. A second focus of the current research was motivated by Blumstein et al.'s (1980) classic study on voice onset time (VOT) in adults with acquired apraxia of speech (AOS) which demonstrated a motor impairment underlying AOS. In the current study, VOT analyses were conducted to determine the relationship between age and group with the voicing distribution for bilabial and alveolar plosives. Findings revealed that 3-year-olds evidenced significantly higher inconsistency than 5-year-olds; segmental inconsistency approached 0% in 5-year-olds with TD, whereas it persisted in children with PD and CAS, suggesting that for children in this age range, inconsistency is a feature of speech disorder rather than typical development (Holm et al., 2007). Likewise, whereas segmental and lexical inconsistency were

  7. Lexical and Acoustic Features of Maternal Utterances Addressing Preverbal Infants in Picture Book Reading Link to 5-Year-Old Children's Language Development

    ERIC Educational Resources Information Center

    Liu, Huei-Mei

    2014-01-01

    Research Findings: I examined the long-term association between the lexical and acoustic features of maternal utterances during book reading and the language skills of infants and children. Maternal utterances were collected from 22 mother-child dyads in picture book-reading episodes when children were ages 6-12 months and 5 years. Two aspects of…

  8. Speech perception as categorization

    PubMed Central

    Holt, Lori L.; Lotto, Andrew J.

    2010-01-01

    Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has resulted from these challenges. We focus here on issues and experiments that define open research questions relevant to phoneme categorization, arguing that SP is best understood as perceptual categorization, a position that places SP in direct contact with research from other areas of perception and cognition. PMID:20601702

  9. Acoustic Longitudinal Field NIF Optic Feature Detection Map Using Time-Reversal & MUSIC

    SciTech Connect

    Lehman, S K

    2006-02-09

    We developed an ultrasonic longitudinal field time-reversal and MUltiple SIgnal Classification (MUSIC) based detection algorithm for identifying and mapping flaws in fused silica NIF optics. The algorithm requires a fully multistatic data set, that is, one with multiple, independently operated, spatially diverse transducers, each transmitter of which, in succession, launches a pulse into the optic, with the scattered signal measured and recorded at every receiver. We have successfully localized engineered "defects" larger than 1 mm in an optic. We confirmed detection and localization of 3 mm and 5 mm features in experimental data, and a 0.5 mm feature in simulated data with sufficiently high signal-to-noise ratio. We present the theory, experimental results, and simulated results.

  10. Deep Bottleneck Features for Spoken Language Identification

    PubMed Central

    Jiang, Bing; Song, Yan; Wei, Si; Liu, Jun-Hua; McLoughlin, Ian Vince; Dai, Li-Rong

    2014-01-01

    A key problem in spoken language identification (LID) is to design effective representations which are specific to language information. For example, in recent years, representations based on both phonotactic and acoustic features have proven their effectiveness for LID. Although advances in machine learning have led to significant improvements, LID performance is still lacking, especially for short duration speech utterances. With the hypothesis that language information is weak and represented only latently in speech, and is largely dependent on the statistical properties of the speech content, existing representations may be insufficient. Furthermore they may be susceptible to the variations caused by different speakers, specific content of the speech segments, and background noise. To address this, we propose using Deep Bottleneck Features (DBF) for spoken LID, motivated by the success of Deep Neural Networks (DNN) in speech recognition. We show that DBFs can form a low-dimensional compact representation of the original inputs with a powerful descriptive and discriminative capability. To evaluate the effectiveness of this, we design two acoustic models, termed DBF-TV and parallel DBF-TV (PDBF-TV), using a DBF based i-vector representation for each speech utterance. Results on NIST language recognition evaluation 2009 (LRE09) show significant improvements over state-of-the-art systems. By fusing the output of phonotactic and acoustic approaches, we achieve an EER of 1.08%, 1.89% and 7.01% for 30 s, 10 s and 3 s test utterances respectively. Furthermore, various DBF configurations have been extensively evaluated, and an optimal system proposed. PMID:24983963
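
    The equal error rate (EER) reported above is the operating point at which the false-rejection and false-acceptance rates coincide. A minimal sketch of how such a figure can be computed from detection scores is given below; the function name and the simple threshold sweep are assumptions for illustration, not the evaluation code used for LRE09.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. Returns the mean
    of the false-rejection and false-acceptance rates at the threshold where
    the two rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = None
    eer = None
    for t in thresholds:
        false_reject = sum(s < t for s in target_scores) / len(target_scores)
        false_accept = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(false_reject - false_accept)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (false_reject + false_accept) / 2.0
    return eer

# Toy scores, not results from the paper.
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))
```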

  11. Fluid Dynamics of Human Phonation and Speech

    NASA Astrophysics Data System (ADS)

    Mittal, Rajat; Erath, Byron D.; Plesniak, Michael W.

    2013-01-01

    This article presents a review of the fluid dynamics, flow-structure interactions, and acoustics associated with human phonation and speech. Our voice is produced through the process of phonation in the larynx, and an improved understanding of the underlying physics of this process is essential to advancing the treatment of voice disorders. Insights into the physics of phonation and speech can also contribute to improved vocal training and the development of new speech compression and synthesis schemes. This article introduces the key biomechanical features of the laryngeal physiology, reviews the basic principles of voice production, and summarizes the progress made over the past half-century in understanding the flow physics of phonation and speech. Laryngeal pathologies, which significantly enhance the complexity of phonatory dynamics, are discussed. After a thorough examination of the state of the art in computational modeling and experimental investigations of phonatory biomechanics, we present a synopsis of the pacing issues in this arena and an outlook for research in this fascinating subject.

  12. Reverberant speech recognition exploiting clarity index estimation

    NASA Astrophysics Data System (ADS)

    Parada, Pablo Peso; Sharma, Dushyant; Naylor, Patrick A.; Waterschoot, Toon van

    2015-12-01

    We present single-channel approaches to robust automatic speech recognition (ASR) in reverberant environments based on non-intrusive estimation of the clarity index (C50). Our best performing method includes the estimated value of C50 in the ASR feature vector and also uses C50 to select the most suitable ASR acoustic model according to the reverberation level. We evaluate our method on the REVERB Challenge database employing two different C50 estimators and show that our method outperforms the best baseline of the challenge achieved without unsupervised acoustic model adaptation, i.e. using multi-condition hidden Markov models (HMMs). Our approach achieves a 22.4% relative word error rate reduction in comparison to the best baseline of the challenge.
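
    For reference, the clarity index is defined as the ratio, in dB, of the energy in the first 50 ms of a room impulse response to the energy arriving afterwards. The sketch below computes it directly from a known impulse response; the paper instead estimates it non-intrusively from the reverberant speech itself, which is not shown here. The function name is hypothetical, and the impulse response is assumed to start at the direct-path arrival.

```python
import math

def clarity_index_c50(impulse_response, sample_rate):
    """C50 in dB: early (0-50 ms) energy over late (>50 ms) energy."""
    split = int(round(0.050 * sample_rate))
    early = sum(h * h for h in impulse_response[:split])
    late = sum(h * h for h in impulse_response[split:])
    if early <= 0.0 or late <= 0.0:
        raise ValueError("both early and late energy must be positive")
    return 10.0 * math.log10(early / late)

# Toy impulse response at 1 kHz: direct path plus one weak late reflection.
h = [0.0] * 100
h[0] = 1.0
h[60] = 0.1
print(clarity_index_c50(h, 1000))
```

    Higher values correspond to less reverberant conditions, which is why the estimate can be used to choose among acoustic models trained for different reverberation levels.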

  13. Specific Features of Destabilization of the Wave Profile During Reflection of an Intense Acoustic Beam from a Soft Boundary

    NASA Astrophysics Data System (ADS)

    Deryabin, M. S.; Kasyanov, D. A.; Kurin, V. V.; Garasyov, M. A.

    2016-05-01

    We show that a significant energy redistribution occurs in the spectrum of reflected nonlinear waves, when an intense acoustic beam is reflected from an acoustically soft boundary, which manifests itself at short wave distances from a reflecting boundary. This effect leads to the appearance of extrema in the distributions of the amplitude and intensity of the field of the reflected acoustic beam near the reflecting boundary. The results of physical experiments are confirmed by numerical modeling of the process of transformation of nonlinear waves reflected from an acoustically soft boundary. Numerical modeling was performed by means of the Khokhlov–Zabolotskaya–Kuznetsov (KZK) equation.

  14. Specific Features of Destabilization of the Wave Profile During Reflection of an Intense Acoustic Beam from a Soft Boundary

    NASA Astrophysics Data System (ADS)

    Deryabin, M. S.; Kasyanov, D. A.; Kurin, V. V.; Garasyov, M. A.

    2016-06-01

    We show that a significant energy redistribution occurs in the spectrum of reflected nonlinear waves, when an intense acoustic beam is reflected from an acoustically soft boundary, which manifests itself at short wave distances from a reflecting boundary. This effect leads to the appearance of extrema in the distributions of the amplitude and intensity of the field of the reflected acoustic beam near the reflecting boundary. The results of physical experiments are confirmed by numerical modeling of the process of transformation of nonlinear waves reflected from an acoustically soft boundary. Numerical modeling was performed by means of the Khokhlov–Zabolotskaya–Kuznetsov (KZK) equation.

  15. Speech prosody in cerebellar ataxia

    NASA Astrophysics Data System (ADS)

    Casper, Maureen

    The present study sought an acoustic signature for the speech disturbance recognized in cerebellar degeneration. Magnetic resonance imaging was used for a radiological rating of cerebellar involvement in six cerebellar ataxic dysarthric speakers. Acoustic measures of the [pap] syllables in contrastive prosodic conditions and of normal vs. brain-damaged patients were used to further our understanding both of the speech degeneration that accompanies cerebellar pathology and of speech motor control and movement in general. Pair-wise comparisons of the prosodic conditions within the normal group showed statistically significant differences for four prosodic contrasts. For three of the four contrasts analyzed, the normal speakers showed both longer durations and higher formant and fundamental frequency values in the more prominent first condition of the contrast. The acoustic measures of the normal prosodic contrast values were then used as a model to measure the degree of speech deterioration for individual cerebellar subjects. This estimate of speech deterioration as determined by individual differences between cerebellar and normal subjects' acoustic values of the four prosodic contrasts was used in correlation analyses with MRI ratings. Moderate correlations between speech deterioration and cerebellar atrophy were found in the measures of syllable duration and f0. A strong negative correlation was found for F1. Moreover, the normal model presented by these acoustic data allows for a description of the flexibility of task-oriented behavior in normal speech motor control. These data challenge spatio-temporal theory which explains movement as an artifact of time wherein longer durations predict more extreme movements and give further evidence for gestural internal dynamics of movement in which time emerges from articulatory events rather than dictating those events. This model provides a sensitive index of cerebellar pathology with quantitative acoustic

  16. Modeling Pathological Speech Perception From Data With Similarity Labels.

    PubMed

    Berisha, Visar; Liss, Julie; Sandoval, Steven; Utianski, Rene; Spanias, Andreas

    2014-05-01

    The current state of the art in judging pathological speech intelligibility is subjective assessment performed by trained speech pathologists (SLPs). These tests, however, are inconsistent, costly, and oftentimes suffer from poor intra- and inter-judge reliability. As such, consistent, reliable, and perceptually-relevant objective evaluations of pathological speech are critical. Here, we propose a data-driven approach to this problem. We propose new cost functions for examining data from a series of experiments, whereby we ask certified SLPs to rate pathological speech along the perceptual dimensions that contribute to decreased intelligibility. We consider qualitative feedback from SLPs in the form of comparisons similar to statements "Is Speaker A's rhythm more similar to Speaker B or Speaker C?" Data of this form is common in behavioral research, but is different from the traditional data structures expected in supervised (data matrix + class labels) or unsupervised (data matrix) machine learning. The proposed method identifies relevant acoustic features that correlate with the ordinal data collected during the experiment. Using these features, we show that we are able to develop objective measures of the speech signal degradation that correlate well with SLP responses. PMID:25435817

  17. The auditory representation of speech sounds in human motor cortex

    PubMed Central

    Cheung, Connie; Hamilton, Liberty S; Johnson, Keith; Chang, Edward F

    2016-01-01

    In humans, listening to speech evokes neural responses in the motor cortex. This has been controversially interpreted as evidence that speech sounds are processed as articulatory gestures. However, it is unclear what information is actually encoded by such neural activity. We used high-density direct human cortical recordings while participants spoke and listened to speech sounds. Motor cortex neural patterns during listening were substantially different than during articulation of the same sounds. During listening, we observed neural activity in the superior and inferior regions of ventral motor cortex. During speaking, responses were distributed throughout somatotopic representations of speech articulators in motor cortex. The structure of responses in motor cortex during listening was organized along acoustic features similar to auditory cortex, rather than along articulatory features as during speaking. Motor cortex does not contain articulatory representations of perceived actions in speech, but rather, represents auditory vocal information. DOI: http://dx.doi.org/10.7554/eLife.12577.001 PMID:26943778

  18. Modeling Pathological Speech Perception From Data With Similarity Labels

    PubMed Central

    Berisha, Visar; Liss, Julie; Sandoval, Steven; Utianski, Rene; Spanias, Andreas

    2014-01-01

    The current state of the art in judging pathological speech intelligibility is subjective assessment performed by trained speech pathologists (SLPs). These tests, however, are inconsistent, costly, and oftentimes suffer from poor intra- and inter-judge reliability. As such, consistent, reliable, and perceptually-relevant objective evaluations of pathological speech are critical. Here, we propose a data-driven approach to this problem. We propose new cost functions for examining data from a series of experiments, whereby we ask certified SLPs to rate pathological speech along the perceptual dimensions that contribute to decreased intelligibility. We consider qualitative feedback from SLPs in the form of comparisons similar to statements “Is Speaker A's rhythm more similar to Speaker B or Speaker C?” Data of this form is common in behavioral research, but is different from the traditional data structures expected in supervised (data matrix + class labels) or unsupervised (data matrix) machine learning. The proposed method identifies relevant acoustic features that correlate with the ordinal data collected during the experiment. Using these features, we show that we are able to develop objective measures of the speech signal degradation that correlate well with SLP responses. PMID:25435817

  19. Common cues to emotion in the dynamic facial expressions of speech and song

    PubMed Central

    Livingstone, Steven R.; Thompson, William F.; Wanderley, Marcelo M.; Palmer, Caroline

    2015-01-01

    Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech–song differences. Vocalists’ jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech–song. Vocalists’ emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists’ facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotion judgements for voice-only singing were poorly identified, yet were accurate for all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song, yet were equivalent in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, yet highlight differences in perception and acoustic-motor production. PMID:25424388

  20. System and method for investigating sub-surface features of a rock formation with acoustic sources generating coded signals

    SciTech Connect

    Vu, Cung Khac; Nihei, Kurt; Johnson, Paul A; Guyer, Robert; Ten Cate, James A; Le Bas, Pierre-Yves; Larmat, Carene S

    2014-12-30

    A system and a method for investigating rock formations includes generating, by a first acoustic source, a first acoustic signal comprising a first plurality of pulses, each pulse including a first modulated signal at a central frequency; and generating, by a second acoustic source, a second acoustic signal comprising a second plurality of pulses. A receiver arranged within the borehole receives a detected signal including a signal being generated by a non-linear mixing process from the first and second acoustic signals in a non-linear mixing zone within the intersection volume. The method also includes processing the received signal to extract the signal generated by the non-linear mixing process over noise or over signals generated by a linear interaction process, or both.

  1. Using the Speech Transmission Index for predicting non-native speech intelligibility

    NASA Astrophysics Data System (ADS)

    van Wijngaarden, Sander J.; Bronkhorst, Adelbert W.; Houtgast, Tammo; Steeneken, Herman J. M.

    2004-03-01

    While the Speech Transmission Index (STI) is widely applied for prediction of speech intelligibility in room acoustics and telecommunication engineering, it is unclear how to interpret STI values when non-native talkers or listeners are involved. Based on subjectively measured psychometric functions for sentence intelligibility in noise, for populations of native and non-native communicators, a correction function for the interpretation of the STI is derived. This function is applied to determine the appropriate STI ranges with qualification labels ("bad" to "excellent"), for specific populations of non-natives. The correction function is derived by relating the non-native psychometric function to the native psychometric function by a single parameter (ν). For listeners, the ν parameter is found to be highly correlated with linguistic entropy. It is shown that the proposed correction function is also valid for conditions featuring bandwidth limiting and reverberation.

  2. Multiple levels of linguistic and paralinguistic features contribute to voice recognition.

    PubMed

    Zarate, Jean Mary; Tian, Xing; Woods, Kevin J P; Poeppel, David

    2015-01-01

    Voice or speaker recognition is critical in a wide variety of social contexts. In this study, we investigated the contributions of acoustic, phonological, lexical, and semantic information toward voice recognition. Native English speaking participants were trained to recognize five speakers in five conditions: non-speech, Mandarin, German, pseudo-English, and English. We showed that voice recognition significantly improved as more information became available, from purely acoustic features in non-speech to additional phonological information varying in familiarity. Moreover, we found that the recognition performance is transferable between training and testing in phonologically familiar conditions (German, pseudo-English, and English), but not in unfamiliar (Mandarin) or non-speech conditions. These results provide evidence suggesting that bottom-up acoustic analysis and top-down influence from phonological processing collaboratively govern voice recognition. PMID:26088739

  3. Multiple levels of linguistic and paralinguistic features contribute to voice recognition

    PubMed Central

    Mary Zarate, Jean; Tian, Xing; Woods, Kevin J. P.; Poeppel, David

    2015-01-01

    Voice or speaker recognition is critical in a wide variety of social contexts. In this study, we investigated the contributions of acoustic, phonological, lexical, and semantic information toward voice recognition. Native English speaking participants were trained to recognize five speakers in five conditions: non-speech, Mandarin, German, pseudo-English, and English. We showed that voice recognition significantly improved as more information became available, from purely acoustic features in non-speech to additional phonological information varying in familiarity. Moreover, we found that the recognition performance is transferable between training and testing in phonologically familiar conditions (German, pseudo-English, and English), but not in unfamiliar (Mandarin) or non-speech conditions. These results provide evidence suggesting that bottom-up acoustic analysis and top-down influence from phonological processing collaboratively govern voice recognition. PMID:26088739

  4. Phonetic recognition of natural speech by nonstationary Markov models

    NASA Astrophysics Data System (ADS)

    Falaschi, Alessandro

    1988-04-01

    A speech recognition system based on statistical decision theory, viewing the problem as the classical design of a decoder in a communication system framework, is outlined. Statistical properties of the language are used to characterize the allowable phonetic sequence inside the words, while trying to capture allophonic phoneme features into functional-dependent acoustical models with the aim of utilizing them as word segmentation cues. Experiments prove the utility of an explicit modeling of the intrinsic speech nonstationarity in a statistically based speech recognition system. The nonstationarity of phonetic chain statistics and acoustical transition probabilities can be easily taken into account, yielding recognition improvements. The use of inside syllable position dependent phonetic models does not improve recognition performance, and the iterative Viterbi training algorithm seems unable to adequately valorize this kind of acoustical modeling. As a direct consequence of the system design, the recognized phonetic sequence exhibits word boundary marks even in the absence of pauses between words, thus giving anchor points to the higher level parsing algorithms needed in a complete recognition system.

  5. Parameterization of the Voice Source by Combining Spectral Decay and Amplitude Features of the Glottal Flow.

    ERIC Educational Resources Information Center

    Alku, Paavo; Vilkman, Erkki; Laukkanen, Anne-Maria

    1998-01-01

    A new method is presented for the parameterization of glottal volume velocity waveforms that have been estimated by inverse filtering acoustic speech pressure signals. The new technique combines two features of voice production: the AC value and the spectral decay of the glottal flow. Testing found the new parameter correlates strongly with the…

  6. Intuitive visualizations of pitch and loudness in speech.

    PubMed

    Schaefer, Rebecca S; Beijer, Lilian J; Seuskens, Wiel; Rietveld, Toni C M; Sadakata, Makiko

    2016-04-01

    Visualizing acoustic features of speech has proven helpful in speech therapy; however, it is as yet unclear how to create intuitive and fitting visualizations. To better understand the mappings from speech sound aspects to visual space, a large web-based experiment (n = 249) was performed to evaluate spatial parameters that may optimally represent pitch and loudness of speech. To this end, five novel animated visualizations were developed and presented in pairwise comparisons, together with a static visualization. Pitch and loudness of speech were each mapped onto either the vertical (y-axis) or the size (z-axis) dimension, or combined (with size indicating loudness and vertical position indicating pitch height) and visualized as an animation along the horizontal dimension (x-axis) over time. The results indicated that firstly, there is a general preference towards the use of the y-axis for both pitch and loudness, with pitch ranking higher than loudness in terms of fit. Secondly, the data suggest that representing both pitch and loudness combined in a single visualization is preferred over visualization in only one dimension. Finally, the z-axis, although not preferred, was evaluated as corresponding better to loudness than to pitch. This relation between sound and visual space has not been reported previously for speech sounds, and elaborates earlier findings on musical material. In addition to elucidating more general mappings between auditory and visual modalities, the findings provide us with a method of visualizing speech that may be helpful in clinical applications such as computerized speech therapy, or other feedback-based learning paradigms. PMID:26370217

  7. Perception of Words and Pitch Patterns in Song and Speech

    PubMed Central

    Merrill, Julia; Sammler, Daniela; Bangert, Marc; Goldhahn, Dirk; Lohmann, Gabriele; Turner, Robert; Friederici, Angela D.

    2012-01-01

    This functional magnetic resonance imaging study examines shared and distinct cortical areas involved in the auditory perception of song and speech at the level of their underlying constituents: words and pitch patterns. Univariate and multivariate analyses were performed to isolate the neural correlates of the word- and pitch-based discrimination between song and speech, corrected for rhythmic differences in both. Therefore, six conditions, arranged in a subtractive hierarchy, were created: sung sentences including words, pitch and rhythm; hummed speech prosody and song melody containing only pitch patterns and rhythm; and as a control the pure musical or speech rhythm. Systematic contrasts between these balanced conditions following their hierarchical organization showed a great overlap between song and speech at all levels in the bilateral temporal lobe, but suggested a differential role of the inferior frontal gyrus (IFG) and intraparietal sulcus (IPS) in processing song and speech. While the left IFG coded for spoken words and showed predominance over the right IFG in prosodic pitch processing, an opposite lateralization was found for pitch in song. The IPS showed sensitivity to discrete pitch relations in song as opposed to the gliding pitch in speech. Finally, the superior temporal gyrus and premotor cortex coded for general differences between words and pitch patterns, irrespective of whether they were sung or spoken. Thus, song and speech share many features which are reflected in a fundamental similarity of brain areas involved in their perception. However, fine-grained acoustic differences on word and pitch level are reflected in the IPS and the lateralized activity of the IFG. PMID:22457659

  8. Predicting Speech Intelligibility with A Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    PubMed Central

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystem approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and speech intelligibility, were measured in 22 children with CP. These children included 13 with a clinical diagnosis of dysarthria (SMI), and nine judged to be free of dysarthria (NSMI). Data from children with CP were compared to data from age-matched typically developing children (TD). Results: Multiple acoustic variables reflecting the articulatory subsystem were different in the SMI group, compared to the NSMI and TD groups. A significant speech intelligibility prediction model was obtained with all variables entered into the model (Adjusted R-squared = .801). The articulatory subsystem showed the most substantial independent contribution (58%) to speech intelligibility. Incremental R-squared analyses revealed that any single variable explained less than 9% of speech intelligibility variability. Conclusions: Children in the SMI group have articulatory subsystem problems as indexed by acoustic measures. As in the adult literature, the articulatory subsystem makes the primary contribution to speech intelligibility variance in dysarthria, with minimal or no contribution from other systems. PMID:24824584

  9. Externality, Internality, and (In)Dispensability of Grammatical Features in Speech Production: Evidence from Czech Declension and Conjugation

    ERIC Educational Resources Information Center

    Bordag, Denisa; Pechmann, Thomas

    2009-01-01

    In 3 picture-word experiments, the authors explored the activation of 2 grammatical features in Czech during lexical access: declensional class of nouns and conjugational class of verbs. Experiments 1 and 2 demonstrated congruency effects of declensional and conjugational class, respectively. Picture naming times were reliably longer if the…

  10. Ion acoustic solitary waves and double layers in a plasma with two temperature electrons featuring Tsallis distribution

    SciTech Connect

    Shalini; Saini, N. S.

    2014-10-15

    The propagation properties of large amplitude ion acoustic solitary waves (IASWs) are studied in a plasma containing cold fluid ions and multi-temperature electrons (cool and hot electrons) with nonextensive distribution. Employing the Sagdeev pseudopotential method, an energy balance equation has been derived, and from the expression for the Sagdeev potential function, ion acoustic solitary waves and double layers are investigated numerically. The Mach number (lower and upper limits) for the existence of solitary structures is determined. Positive as well as negative polarity solitary structures are observed. Further, conditions for the existence of ion acoustic double layers (IADLs) are also determined numerically in the form of the critical values of q_c, f and the Mach number (M). It is observed that the nonextensivity of electrons (via q_{c,h}), concentration of electrons (via f) and temperature ratio of cold to hot electrons (via β) significantly influence the characteristics of ion acoustic solitary waves as well as double layers.

  11. Time-forward speech intelligibility in time-reversed rooms

    PubMed Central

    Longworth-Reed, Laricia; Brandewie, Eugene; Zahorik, Pavel

    2009-01-01

    The effects of time-reversed room acoustics on word recognition abilities were examined using virtual auditory space techniques, which allowed for temporal manipulation of the room acoustics independent of the speech source signals. Two acoustical conditions were tested: one in which room acoustics were simulated in a realistic time-forward fashion and one in which the room acoustics were reversed in time, causing reverberation and acoustic reflections to precede the direct-path energy. Significant decreases in speech intelligibility—from 89% on average to less than 25%—were observed between the time-forward and time-reversed rooms. This result is not predictable using standard methods for estimating speech intelligibility based on the modulation transfer function of the room. It may instead be due to increased degradation of onset information in the speech signals when room acoustics are time-reversed. PMID:19173377
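    The time-reversal manipulation described above can be illustrated with a toy sketch. It assumes the room is modelled as a simple impulse response convolved with the source signal; the impulse response values, the single-click "speech" signal, and the helper name `convolve` are all hypothetical and are not taken from the study.

```python
def convolve(signal, kernel):
    """Full discrete convolution of two sequences (pure Python)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

# Hypothetical toy room impulse response: direct path first,
# followed by weaker, decaying reflections.
rir_forward = [1.0, 0.0, 0.5, 0.25, 0.125]

# Time-reversed room: the same reflections now precede the direct path.
rir_reversed = rir_forward[::-1]

# A single click stands in for a speech onset (assumption for illustration).
click = [1.0, 0.0, 0.0]

forward = convolve(click, rir_forward)
reversed_room = convolve(click, rir_reversed)

# Position of the strongest (direct-path) sample in each condition.
peak_forward = forward.index(max(forward))
peak_reversed = reversed_room.index(max(reversed_room))
```

    In this sketch the direct-path peak comes first in the time-forward room and last in the time-reversed room, while the summed amplitude of the response is unchanged. Only the temporal order of the energy differs, which is the property the abstract links to degraded onset information.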

  12. System and method for investigating sub-surface features of a rock formation with acoustic sources generating conical broadcast signals

    DOEpatents

    Vu, Cung Khac; Skelt, Christopher; Nihei, Kurt; Johnson, Paul A.; Guyer, Robert; Ten Cate, James A.; Le Bas, Pierre -Yves; Larmat, Carene S.

    2015-08-18

    A method of interrogating a formation includes generating a conical acoustic signal at a first frequency and a second conical acoustic signal at a second frequency, each in the range between approximately 500 Hz and 500 kHz, such that the signals intersect in a desired intersection volume outside the borehole. The method further includes receiving a difference signal returning to the borehole resulting from a non-linear mixing of the signals in a mixing zone within the intersection volume.
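    As a rough illustration of why non-linear mixing yields a difference signal, the sketch below assumes the simplest case, a product (quadratic) interaction of two sinusoids, and checks the identity cos(a)cos(b) = ½[cos(a − b) + cos(a + b)] numerically. The two frequencies and the function names are hypothetical; the record does not state the mixing model or the frequencies actually used.

```python
import math

def mixed_product(f1, f2, t):
    """Product of two unit-amplitude sinusoids at time t (quadratic mixing term)."""
    return math.cos(2 * math.pi * f1 * t) * math.cos(2 * math.pi * f2 * t)

def sum_and_difference(f1, f2, t):
    """Same quantity rewritten as components at f1 - f2 and f1 + f2."""
    diff_term = math.cos(2 * math.pi * (f1 - f2) * t)
    sum_term = math.cos(2 * math.pi * (f1 + f2) * t)
    return 0.5 * (diff_term + sum_term)

# Hypothetical source frequencies, both inside the 500 Hz to 500 kHz range.
f1, f2 = 100_000.0, 90_000.0
difference_frequency = abs(f1 - f2)  # frequency of the returning difference signal

# Largest mismatch between the two forms over a few sample times (seconds).
sample_times = [n * 1e-6 for n in range(1000)]
max_error = max(
    abs(mixed_product(f1, f2, t) - sum_and_difference(f1, f2, t))
    for t in sample_times
)
```

    Under this assumption the mixed signal contains a component at the difference of the two source frequencies, which is the kind of signal the method describes receiving back at the borehole.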

  13. Inherited and de novo 22q11.2 distal duplications in two patients with autistic features, speech delay and no dysmorphology

    PubMed Central

    Hantash, Feras M.; Wang, Boris T.; Owen, Renius; Ross, Leslie P.; Mahon, Loretta W.; Boyar, Fatih Z.; Anguiano, Arturo; Strom, Charles M.

    2012-01-01

    In a screen of patients by fluorescence in-situ hybridization and array comparative genomic hybridization in the past two years (July 2007--July 2009), we identified two patients with duplications in the 22q11.22-23 region, occurring outside the common DiGeorge syndrome/velocardiofacial syndrome region. Fluorescent in-situ hybridization, multiplex ligation-dependent probe amplification and high density bacterial artificial chromosomes and oligo arrays were used to identify the extent of the duplications. In one patient the duplication extended from LCR22-E/5 to LCR22-H/8, which is similar to recently described 22q11.2 distal duplications, while in the second patient, a de novo duplication was identified extending from LCR22-E/5 to LCR22-F/6. The second proband also harbored a de novo 15q14 duplication, complicating phenotype interpretation. The patients were affected with speech delay and autistic features, but neither reported cardiac concern or dysmorphic features.

  14. Irregular Speech Rate Dissociates Auditory Cortical Entrainment, Evoked Responses, and Frontal Alpha

    PubMed Central

    Kayser, Stephanie J.; Ince, Robin A.A.; Gross, Joachim

    2015-01-01

    The entrainment of slow rhythmic auditory cortical activity to the temporal regularities in speech is considered to be a central mechanism underlying auditory perception. Previous work has shown that entrainment is reduced when the quality of the acoustic input is degraded, but has also linked rhythmic activity at similar time scales to the encoding of temporal expectations. To understand these bottom-up and top-down contributions to rhythmic entrainment, we manipulated the temporal predictive structure of speech by parametrically altering the distribution of pauses between syllables or words, thereby rendering the local speech rate irregular while preserving intelligibility and the envelope fluctuations of the acoustic signal. Recording EEG activity in human participants, we found that this manipulation did not alter neural processes reflecting the encoding of individual sound transients, such as evoked potentials. However, the manipulation significantly reduced the fidelity of auditory delta (but not theta) band entrainment to the speech envelope. It also reduced left frontal alpha power and this alpha reduction was predictive of the reduced delta entrainment across participants. Our results show that rhythmic auditory entrainment in delta and theta bands reflect functionally distinct processes. Furthermore, they reveal that delta entrainment is under top-down control and likely reflects prefrontal processes that are sensitive to acoustical regularities rather than the bottom-up encoding of acoustic features. SIGNIFICANCE STATEMENT The entrainment of rhythmic auditory cortical activity to the speech envelope is considered to be critical for hearing. Previous work has proposed divergent views in which entrainment reflects either early evoked responses related to sound encoding or high-level processes related to expectation or cognitive selection. Using a manipulation of speech rate, we dissociated auditory entrainment at different time scales. Specifically, our

  15. Interactions between distal speech rate, linguistic knowledge, and speech environment.

    PubMed

    Morrill, Tuuli; Baese-Berk, Melissa; Heffner, Christopher; Dilley, Laura

    2015-10-01

    During lexical access, listeners use both signal-based and knowledge-based cues, and information from the linguistic context can affect the perception of acoustic speech information. Recent findings suggest that the various cues used in lexical access are implemented with flexibility and may be affected by information from the larger speech context. We conducted 2 experiments to examine effects of a signal-based cue (distal speech rate) and a knowledge-based cue (linguistic structure) on lexical perception. In Experiment 1, we manipulated distal speech rate in utterances where an acoustically ambiguous critical word was either obligatory for the utterance to be syntactically well formed (e.g., Conner knew that bread and butter (are) both in the pantry) or optional (e.g., Don must see the harbor (or) boats). In Experiment 2, we examined identical target utterances as in Experiment 1 but changed the distribution of linguistic structures in the fillers. The results of the 2 experiments demonstrate that speech rate and linguistic knowledge about critical word obligatoriness can both influence speech perception. In addition, it is possible to alter the strength of a signal-based cue by changing information in the speech environment. These results provide support for models of word segmentation that include flexible weighting of signal-based and knowledge-based cues. PMID:25794478

  16. Dynamic Encoding of Speech Sequence Probability in Human Temporal Cortex

    PubMed Central

    Leonard, Matthew K.; Bouchard, Kristofer E.; Tang, Claire

    2015-01-01

    Sensory processing involves identification of stimulus features, but also integration with the surrounding sensory and cognitive context. Previous work in animals and humans has shown fine-scale sensitivity to context in the form of learned knowledge about the statistics of the sensory environment, including relative probabilities of discrete units in a stream of sequential auditory input. These statistics are a defining characteristic of one of the most important sequential signals humans encounter: speech. For speech, extensive exposure to a language tunes listeners to the statistics of sound sequences. To address how speech sequence statistics are neurally encoded, we used high-resolution direct cortical recordings from human lateral superior temporal cortex as subjects listened to words and nonwords with varying transition probabilities between sound segments. In addition to their sensitivity to acoustic features (including contextual features, such as coarticulation), we found that neural responses dynamically encoded the language-level probability of both preceding and upcoming speech sounds. Transition probability first negatively modulated neural responses, followed by positive modulation of neural responses, consistent with coordinated predictive and retrospective recognition processes, respectively. Furthermore, transition probability encoding was different for real English words compared with nonwords, providing evidence for online interactions with high-order linguistic knowledge. These results demonstrate that sensory processing of deeply learned stimuli involves integrating physical stimulus features with their contextual sequential structure. Despite not being consciously aware of phoneme sequence statistics, listeners use this information to process spoken input and to link low-level acoustic representations with linguistic information about word identity and meaning. PMID:25948269
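    The transition probabilities between sound segments mentioned above can be sketched as conditional bigram frequencies. This is a minimal illustration under the assumption that the probability of a segment given its predecessor is estimated by counting adjacent pairs; the toy lexicon, the segment labels, and the function name are hypothetical, and the study's own estimation procedure is not described in this record.

```python
from collections import Counter

def transition_probabilities(sequences):
    """Estimate P(next segment | current segment) from adjacent-pair counts."""
    pair_counts = Counter()
    first_counts = Counter()
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            pair_counts[(current, following)] += 1
            first_counts[current] += 1
    return {
        pair: count / first_counts[pair[0]]
        for pair, count in pair_counts.items()
    }

# Hypothetical toy lexicon of segment sequences (not from the study).
lexicon = [
    ["k", "ae", "t"],
    ["k", "ae", "p"],
    ["k", "ih", "t"],
    ["b", "ae", "t"],
]

probs = transition_probabilities(lexicon)
```

    In this toy lexicon, "ae" follows "k" in two of the three sequences that contain "k", so `probs[("k", "ae")]` is 2/3, whereas "t" always follows "ih", giving a probability of 1.0.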

  17. Why Impromptu Speech Is Easy To Understand.

    ERIC Educational Resources Information Center

    Le Feal, K. Dejean

    Impromptu speech is characterized by the simultaneous processes of ideation (the elaboration and structuring of reasoning by the speaker as he improvises) and expression in the speaker. Other elements accompany this characteristic: division of speech flow into short segments, acoustic relief in the form of word stress following a pause, and both…

  18. Integrated speech enhancement for functional MRI environment.

    PubMed

    Pathak, Nishank; Milani, Ali A; Panahi, Issa; Briggs, Richard

    2009-01-01

    This paper presents an integrated speech enhancement (SE) method for the noisy MRI environment. We show that the performance of an SE system improves considerably when the speech signal dominated by MRI acoustic noise at very low SNR is enhanced in two successive stages using two-channel SE methods followed by a single-channel post-processing SE algorithm. Actual MRI noisy speech data are used in our experiments showing the improved performance of the proposed SE method. PMID:19964964

  19. The diagnosis value of acoustic radiation force impulse (ARFI) elastography for thyroid malignancy without highly suspicious features on conventional ultrasound

    PubMed Central

    Liu, Bo-Ji; Lu, Feng; Xu, Hui-Xiong; Guo, Le-Hang; Li, Dan-Dan; Bo, Xiao-Wan; Li, Xiao-Long; Zhang, Yi-Feng; Xu, Jun-Mei; Xu, Xiao-Hong; Qu, Shen

    2015-01-01

    Objective: The aim of this study was to evaluate the potential diagnostic performance of acoustic radiation force impulse (ARFI) elastography in identifying malignancy in nodules that do not appear highly suspicious on conventional ultrasound (US). Methods: 330 pathologically confirmed thyroid nodules (40 malignant and 290 benign; mean size, 22.0±11.6 mm) not suspicious of malignancy on conventional US in 330 patients (mean age 52.8±11.7 years) underwent ARFI elastography before surgery. ARFI elastography included qualitative ARFI-induced strain elastography (SE) and quantitative point shear wave elastography (p-SWE). ARFI-induced SE image was assessed by SE score, while p-SWE was denoted with shear wave velocity (SWV, m/s). The diagnostic performance of four criteria sets was evaluated: criteria set 1 (ARFI-induced SE), criteria set 2 (p-SWE), criteria set 3 (either set 1 or 2), criteria set 4 (both set 1 and 2). Receiver operating characteristic curve (ROC) analyses were performed to assess the diagnostic performance. Results: SE score ≥4 was more frequently found in malignant nodules (32/40) than in benign nodules (30/290, P<0.001). The mean SWV of malignant nodules (3.64±2.23 m/s) was significantly higher than that of benign nodules (2.02±0.69 m/s) (P<0.001). ARFI-induced SE (set 1) had a sensitivity of 80.0% (32/40) and a specificity of 89.7% (260/290) with a cut-off point of SE score ≥4; p-SWE (set 2) had a sensitivity of 80.0% (32/40) and a specificity of 57.9% (168/290) with a cut-off point of SWV ≥2.15 m/s. When ARFI-induced SE and p-SWE were combined, set 3 had the highest sensitivity (92.5%, 37/40) while set 4 had the highest specificity (95.2%, 276/290). Conclusion: ARFI elastography can be used for differential diagnosis of malignant thyroid nodules without highly suspicious features on US. The combination of ARFI-induced SE and p-SWE leads to improved sensitivity and specificity. PMID:26629025
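    The reported percentages follow directly from the counts given in the abstract (40 malignant and 290 benign nodules). The short sketch below recomputes them; the function and variable names are illustrative only, and only the figures stated in the abstract are used.

```python
def percentage(hits, total):
    """Proportion of correctly classified nodules, as a percentage."""
    return 100.0 * hits / total

MALIGNANT = 40   # pathologically confirmed malignant nodules
BENIGN = 290     # pathologically confirmed benign nodules

# Sensitivity = correctly identified malignant nodules / all malignant nodules.
sens_set1 = percentage(32, MALIGNANT)   # ARFI-induced SE, SE score >= 4
sens_set2 = percentage(32, MALIGNANT)   # p-SWE, SWV >= 2.15 m/s
sens_set3 = percentage(37, MALIGNANT)   # either criterion positive

# Specificity = correctly identified benign nodules / all benign nodules.
spec_set1 = percentage(260, BENIGN)
spec_set2 = percentage(168, BENIGN)
spec_set4 = percentage(276, BENIGN)     # both criteria positive

for name, value in [
    ("set 1 sensitivity", sens_set1),
    ("set 1 specificity", spec_set1),
    ("set 2 sensitivity", sens_set2),
    ("set 2 specificity", spec_set2),
    ("set 3 sensitivity", sens_set3),
    ("set 4 specificity", spec_set4),
]:
    print(f"{name}: {value:.1f}%")
```

    Requiring either criterion (set 3) raises sensitivity and requiring both (set 4) raises specificity, which matches the usual trade-off when two tests are combined with OR versus AND rules.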

  20. Breathing-Impaired Speech after Brain Haemorrhage: A Case Study

    ERIC Educational Resources Information Center

    Heselwood, Barry

    2007-01-01

    Results are presented from an auditory and acoustic analysis of the speech of an adult male with impaired prosody and articulation due to brain haemorrhage. They show marked effects on phonation, speech rate and articulator velocity, and a speech rhythm disrupted by "intrusive" stresses. These effects are discussed in relation to the speaker's…

  1. Speech for the Deaf Child: Knowledge and Use.

    ERIC Educational Resources Information Center

    Connor, Leo E., Ed.

    Presented is a collection of 16 papers on speech development, handicaps, teaching methods, and educational trends for the aurally handicapped child. Arthur Boothroyd relates acoustic phonetics to speech teaching, and Jean Utley Lehman investigates a scheme of linguistic organization. Differences in speech production by deaf and normal hearing…

  2. Brainstem transcription of speech is disrupted in children with autism spectrum disorders

    PubMed Central

    Russo, Nicole; Nicol, Trent; Trommer, Barbara; Zecker, Steve; Kraus, Nina

    2009-01-01

    Language impairment is a hallmark of autism spectrum disorders (ASD). The origin of the deficit is poorly understood although deficiencies in auditory processing have been detected in both perception and cortical encoding of speech sounds. Little is known about the processing and transcription of speech sounds at earlier (brainstem) levels or about how background noise may impact this transcription process. Unlike cortical encoding of sounds, brainstem representation preserves stimulus features with a degree of fidelity that enables a direct link between acoustic components of the speech syllable (e.g., onsets) to specific aspects of neural encoding (e.g., waves V and A). We measured brainstem responses to the syllable /da/, in quiet and background noise, in children with and without ASD. Children with ASD exhibited deficits in both the neural synchrony (timing) and phase locking (frequency encoding) of speech sounds, despite normal click-evoked brainstem responses. They also exhibited reduced magnitude and fidelity of speech-evoked responses and inordinate degradation of responses by background noise in comparison to typically developing controls. Neural synchrony in noise was significantly related to measures of core and receptive language ability. These data support the idea that abnormalities in the brainstem processing of speech contribute to the language impairment in ASD. Because it is both passively-elicited and malleable, the speech-evoked brainstem response may serve as a clinical tool to assess auditory processing as well as the effects of auditory training in the ASD population. PMID:19635083

  3. Proposal for Classifying the Severity of Speech Disorder Using a Fuzzy Model in Accordance with the Implicational Model of Feature Complexity

    ERIC Educational Resources Information Center

    Brancalioni, Ana Rita; Magnago, Karine Faverzani; Keske-Soares, Marcia

    2012-01-01

    The objective of this study is to create a new proposal for classifying the severity of speech disorders using a fuzzy model in accordance with a linguistic model that represents the speech acquisition of Brazilian Portuguese. The fuzzy linguistic model was run in the MATLAB software fuzzy toolbox from a set of fuzzy rules, and it encompassed…

  4. Speech Development

    MedlinePlus

    ... Lip and Palate. Bzoch (1997). Cleft Palate Speech Management: A Multidisciplinary Approach. Shprintzen, Bardach (1995). Cleft Palate: ...

  5. Speech Problems

    MedlinePlus

    ... a person's ability to speak clearly. Some Common Speech Disorders Stuttering is a problem that interferes with fluent ... is a language disorder, while stuttering is a speech disorder. A person who stutters has trouble getting out ...

  6. REGARDING THE LINE-OF-SIGHT BARYONIC ACOUSTIC FEATURE IN THE SLOAN DIGITAL SKY SURVEY AND BARYON OSCILLATION SPECTROSCOPIC SURVEY LUMINOUS RED GALAXY SAMPLES

    SciTech Connect

    Kazin, Eyal A.; Blanton, Michael R.; Scoccimarro, Roman; McBride, Cameron K.; Berlind, Andreas A.

    2010-08-20

    We analyze the line-of-sight baryonic acoustic feature in the two-point correlation function ξ of the Sloan Digital Sky Survey luminous red galaxy (LRG) sample (0.16 < z < 0.47). By defining a narrow line-of-sight region, r_p < 5.5 h^-1 Mpc, where r_p is the transverse separation component, we measure a strong excess of clustering at ~110 h^-1 Mpc, as previously reported in the literature. We also test these results in an alternative coordinate system, by defining the line of sight as θ < 3°, where θ is the opening angle. This clustering excess appears much stronger than the feature in the better-measured monopole. A fiducial ΛCDM nonlinear model in redshift space predicts a much weaker signature. We use realistic mock catalogs to model the expected signal and noise. We find that the line-of-sight measurements can be explained well by our mocks as well as by a featureless ξ = 0. We conclude that there is no convincing evidence that the strong clustering measurement is the line-of-sight baryonic acoustic feature. We also evaluate how detectable such a signal would be in the upcoming Baryon Oscillation Spectroscopic Survey (BOSS) LRG volume. Mock LRG catalogs (z < 0.6) suggest that (1) the narrow line-of-sight cylinder and cone defined above probably will not reveal a detectable acoustic feature in BOSS; (2) a clustering measurement as high as that in the current sample can be ruled out (or confirmed) at a high confidence level using a BOSS-sized data set; (3) an analysis with wider angular cuts, which provide better signal-to-noise ratios, can nevertheless be used to compare line-of-sight and transverse distances, and thereby constrain the expansion rate H(z) and diameter distance D_A(z).

  7. VISIBLE SPEECH.

    ERIC Educational Resources Information Center

    POTTER, RALPH K.; AND OTHERS

    A CORRECTED REPUBLICATION OF THE 1947 EDITION, THE BOOK DESCRIBES A FORM OF VISIBLE SPEECH OBTAINED BY THE RECORDING OF AN ANALYSIS OF SPEECH SOMEWHAT SIMILAR TO THE ANALYSIS PERFORMED BY THE EAR. ORIGINALLY INTENDED TO PRESENT AN EXPERIMENTAL TRAINING PROGRAM IN THE READING OF VISIBLE SPEECH AND EXPANDED TO INCLUDE MATERIAL OF INTEREST TO VARIOUS…

  8. Environment-dependent denoising autoencoder for distant-talking speech recognition

    NASA Astrophysics Data System (ADS)

    Ueda, Yuma; Wang, Longbiao; Kai, Atsuhiko; Ren, Bo

    2015-12-01

    In this paper, we propose an environment-dependent denoising autoencoder (DAE) and automatic environment identification based on a deep neural network (DNN) with blind reverberation estimation for robust distant-talking speech recognition. Recently, DAEs have been shown to be effective in many noise reduction and reverberation suppression applications because higher-level representations and increased flexibility of the feature mapping function can be learned. However, a DAE is not adequate in mismatched training and test environments. In a conventional DAE, parameters are trained using pairs of reverberant speech and clean speech under various acoustic conditions (that is, an environment-independent DAE). To address the above problem, we propose two environment-dependent DAEs to reduce the influence of mismatches between training and test environments. In the first approach, we train various DAEs using speech from different acoustic environments, and the DAE for the condition that best matches the test condition is automatically selected (that is, a two-step environment-dependent DAE). To improve environment identification performance, we propose a DNN that uses both reverberant speech and estimated reverberation. In the second approach, we add estimated reverberation features to the input of the DAE (that is, a one-step environment-dependent DAE or a reverberation-aware DAE). The proposed method is evaluated using speech in simulated and real reverberant environments. Experimental results show that the environment-dependent DAE outperforms the environment-independent one in both simulated and real reverberant environments. For two-step environment-dependent DAE, the performance of environment identification based on the proposed DNN approach is also better than that of the conventional DNN approach, in which only reverberant speech is used and reverberation is not blindly estimated. And, the one-step environment-dependent DAE significantly outperforms the two

  9. Perception of Speech Reflects Optimal Use of Probabilistic Speech Cues

    ERIC Educational Resources Information Center

    Clayards, Meghan; Tanenhaus, Michael K.; Aslin, Richard N.; Jacobs, Robert A.

    2008-01-01

    Listeners are exquisitely sensitive to fine-grained acoustic detail within phonetic categories for sounds and words. Here we show that this sensitivity is optimal given the probabilistic nature of speech cues. We manipulated the probability distribution of one probabilistic cue, voice onset time (VOT), which differentiates word initial labial…

  10. Multivoxel patterns reveal functionally differentiated networks underlying auditory feedback processing of speech.

    PubMed

    Zheng, Zane Z; Vicente-Grabovetsky, Alejandro; MacDonald, Ewen N; Munhall, Kevin G; Cusack, Rhodri; Johnsrude, Ingrid S

    2013-03-01

    The everyday act of speaking involves the complex processes of speech motor control. An important component of control is monitoring, detection, and processing of errors when auditory feedback does not correspond to the intended motor gesture. Here we show, using fMRI and converging operations within a multivoxel pattern analysis framework, that this sensorimotor process is supported by functionally differentiated brain networks. During scanning, a real-time speech-tracking system was used to deliver two acoustically different types of distorted auditory feedback or unaltered feedback while human participants were vocalizing monosyllabic words, and to present the same auditory stimuli while participants were passively listening. Whole-brain analysis of neural-pattern similarity revealed three functional networks that were differentially sensitive to distorted auditory feedback during vocalization, compared with during passive listening. One network of regions appears to encode an "error signal" regardless of acoustic features of the error: this network, including right angular gyrus, right supplementary motor area, and bilateral cerebellum, yielded consistent neural patterns across acoustically different, distorted feedback types, only during articulation (not during passive listening). In contrast, a frontotemporal network appears sensitive to the speech features of auditory stimuli during passive listening; this preference for speech features was diminished when the same stimuli were presented as auditory concomitants of vocalization. A third network, showing a distinct functional pattern from the other two, appears to capture aspects of both neural response profiles. Together, our findings suggest that auditory feedback processing during speech motor control may rely on multiple, interactive, functionally differentiated neural systems. PMID:23467350

  11. Detection and Classification of Whale Acoustic Signals

    NASA Astrophysics Data System (ADS)

    Xian, Yin

    vocalization data set. The word error rate of the DCTNet feature is similar to the MFSC in speech recognition tasks, suggesting that the convolutional network is able to reveal acoustic content of speech signals.

  12. Automatic audiovisual integration in speech perception.

    PubMed

    Gentilucci, Maurizio; Cattaneo, Luigi

    2005-11-01

    Two experiments aimed to determine whether features of both the visual and acoustical inputs are always merged into the perceived representation of speech and whether this audiovisual integration is based on either cross-modal binding functions or on imitation. In a McGurk paradigm, observers were required to repeat aloud a string of phonemes uttered by an actor (acoustical presentation of phonemic string) whose mouth, in contrast, mimicked pronunciation of a different string (visual presentation). In a control experiment participants read the same printed strings of letters. This condition aimed to analyze the pattern of voice and the lip kinematics controlling for imitation. In the control experiment and in the congruent audiovisual presentation, i.e. when the articulation mouth gestures were congruent with the emission of the string of phones, the voice spectrum and the lip kinematics varied according to the pronounced strings of phonemes. In the McGurk paradigm the participants were unaware of the incongruence between visual and acoustical stimuli. The acoustical analysis of the participants' spoken responses showed three distinct patterns: the fusion of the two stimuli (the McGurk effect), repetition of the acoustically presented string of phonemes, and, less frequently, of the string of phonemes corresponding to the mouth gestures mimicked by the actor. However, the analysis of the latter two responses showed that the formant 2 of the participants' voice spectra always differed from the value recorded in the congruent audiovisual presentation. It approached the value of the formant 2 of the string of phonemes presented in the other modality, which was apparently ignored. The lip kinematics of the participants repeating the string of phonemes acoustically presented were influenced by the observation of the lip movements mimicked by the actor, but only when pronouncing a labial consonant. The data are discussed in favor of the hypothesis that features of both

  13. Proposal for classifying the severity of speech disorder using a fuzzy model in accordance with the implicational model of feature complexity.

    PubMed

    Brancalioni, Ana Rita; Magnago, Karine Faverzani; Keske-Soares, Marcia

    2012-09-01

    The objective of this study is to create a new proposal for classifying the severity of speech disorders using a fuzzy model in accordance with a linguistic model that represents the speech acquisition of Brazilian Portuguese. The fuzzy linguistic model was run in the MATLAB software fuzzy toolbox from a set of fuzzy rules, and it encompassed three input variables: path routing, level of complexity and phoneme acquisition. The output was the Speech Disorder Severity Index, and it used the following fuzzy subsets: severe, moderate severe, mild moderate and mild. The proposal was used for 204 children with speech disorders who were monolingual speakers of Brazilian Portuguese. The fuzzy linguistic model provided the Speech Disorder Severity Index for all of the evaluated phonological systems in a fast and practical manner. It was then possible to classify the systems according to the severity of the speech disorder as severe, moderate severe, mild moderate and mild; the speech disorders could also be differentiated according to the severity index. PMID:22876768

  14. Nonsensory factors in speech perception

    NASA Astrophysics Data System (ADS)

    Holt, Rachael F.; Carney, Arlene E.

    2001-05-01

    The nature of developmental differences was examined in a speech discrimination task, the change/no-change procedure, in which a varying number of speech stimuli are presented during a trial. Standard stimuli are followed by comparison stimuli that are identical to or acoustically different from the standard. Fourteen adults and 30 4- and 5-year-old children were tested with three speech contrast pairs at a variety of signal-to-noise ratios using various numbers of standard and comparison stimulus presentations. Adult speech discrimination performance followed the predictions of the multiple looks hypothesis [N. F. Viemeister and G. H. Wakefield, J. Acoust. Soc. Am. 90, 858-865 (1991)]: there was an increase in d' by a factor of 1.4 for a doubling in the number of standard and comparison stimulus presentations near d' values of 1.0. For children, increasing the number of standard stimuli improved discrimination performance, whereas increasing the number of comparisons did not. The multiple looks hypothesis did not explain the children's data. They are explained more parsimoniously by the developmental weighting shift [Nittrouer et al., J. Acoust. Soc. Am. 101, 2253-2266 (1993)], which proposes that children attend to different aspects of speech stimuli from adults. [Work supported by NIDCD and ASHF.]
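
    The factor of 1.4 reported for adults is close to the square root of 2, which is what an ideal observer combining independent, equally informative looks would show. The sketch below illustrates only that textbook prediction. The function name and the independence assumption are illustrative and are not taken from the study.

```python
import math


def multiple_looks_dprime(d_single: float, n_looks: int) -> float:
    """Predicted sensitivity after n independent, equally informative looks.

    Assumes ideal combination of looks, so d' grows with sqrt(n).
    """
    if n_looks < 1:
        raise ValueError("n_looks must be at least 1")
    return d_single * math.sqrt(n_looks)


# Doubling the number of looks scales d' by sqrt(2), i.e. roughly 1.4.
ratio = multiple_looks_dprime(1.0, 2) / multiple_looks_dprime(1.0, 1)
print(round(ratio, 2))
```

    A listener whose sensitivity grows more slowly than this, as the children did for comparison stimuli, is not combining looks in this ideal way.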

  15. Embedding speech into virtual realities

    NASA Astrophysics Data System (ADS)

    Bohn, Christian-Arved; Krueger, Wolfgang

    1993-05-01

    In this work a speaker-independent speech recognition system is presented that is suitable for implementation in Virtual Reality applications. The use of an artificial neural network in connection with a special compression of the acoustic input leads to a system that is robust, fast, and easy to use and that needs no additional hardware besides common VR equipment.

  16. Embedding speech into virtual realities

    NASA Technical Reports Server (NTRS)

    Bohn, Christian-Arved; Krueger, Wolfgang

    1993-01-01

    In this work a speaker-independent speech recognition system is presented that is suitable for implementation in Virtual Reality applications. The use of an artificial neural network in connection with a special compression of the acoustic input leads to a system that is robust, fast, and easy to use and that needs no additional hardware besides common VR equipment.

  17. Classroom Acoustics: Understanding Barriers to Learning.

    ERIC Educational Resources Information Center

    Crandell, Carl C., Ed.; Smaldino, Joseph J., Ed.

    2001-01-01

    This booklet explores classroom acoustics and their importance on the learning potential of children with hearing loss and related disabilities. The booklet also reviews research on classroom acoustics and the need for the development of classroom acoustics standards. Chapters examine: 1) a speech-perception model demonstrating the linkage between…

  18. Neural pathways for visual speech perception

    PubMed Central

    Bernstein, Lynne E.; Liebenthal, Einat

    2014-01-01

    This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA) has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA. PMID:25520611

  19. High-speed imaging, acoustic features, and aeroacoustic computations of jet noise from Strombolian (and Vulcanian) explosions

    NASA Astrophysics Data System (ADS)

    Taddeucci, J.; Sesterhenn, J.; Scarlato, P.; Stampka, K.; Del Bello, E.; Pena Fernandez, J. J.; Gaudin, D.

    2014-05-01

    High-speed imaging of explosive eruptions at Stromboli (Italy), Fuego (Guatemala), and Yasur (Vanuatu) volcanoes allowed visualization of pressure waves from seconds-long explosions. From the explosion jets, waves radiate with variable geometry, timing, and apparent direction and velocity. Both the explosion jets and their wave fields are replicated well by numerical simulations of supersonic jets impulsively released from a pressurized vessel. The scaled acoustic signal from one explosion at Stromboli displays a frequency pattern with an excellent match to those from the simulated jets. We conclude that both the observed waves and the audible sound from the explosions are jet noise, i.e., the typical acoustic field radiating from high-velocity jets. Volcanic jet noise was previously quantified only in the infrasonic emissions from large, sub-Plinian to Plinian eruptions. Our combined approach allows us to define the spatial and temporal evolution of audible jet noise from supersonic jets in small-scale volcanic eruptions.

  20. A fast algorithm for the phonemic segmentation of continuous speech

    NASA Astrophysics Data System (ADS)

    Smidt, D.

    1986-04-01

    The method of differential learning (DL method) was applied to the fast phonemic classification of acoustic speech spectra. The method was also tested with a simple algorithm for continuous speech recognition. In every learning step of the DL method, only the single pattern component that deviates most from the reference value is used for a new rule. Several rules of this type were connected in a conjunctive or disjunctive way. Tests with a single speaker demonstrate good classification capability and a very high speed. The inclusion of additional features, selected automatically according to their relevance, is discussed. It is shown that there exists a correspondence between processes related to the DL method and pattern recognition in living beings with their ability for generalization and differentiation.

  1. Sparse representation in speech signal processing

    NASA Astrophysics Data System (ADS)

    Lee, Te-Won; Jang, Gil-Jin; Kwon, Oh-Wook

    2003-11-01

    We review the sparse representation principle for processing speech signals. A transformation for encoding the speech signals is learned such that the resulting coefficients are as independent as possible. We use independent component analysis with an exponential prior to learn a statistical representation for speech signals. This representation leads to extremely sparse priors that can be used for encoding speech signals for a variety of purposes. We review applications of this method for speech feature extraction, automatic speech recognition and speaker identification. Furthermore, this method is also suited for tackling the difficult problem of separating two sounds given only a single microphone.

  2. Applications for Subvocal Speech

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles; Betts, Bradley

    2007-01-01

    A research and development effort now underway is directed toward the use of subvocal speech for communication in settings in which (1) acoustic noise could interfere excessively with ordinary vocal communication and/or (2) acoustic silence or secrecy of communication is required. By "subvocal speech" is meant sub-audible electromyographic (EMG) signals, associated with speech, that are acquired from the surface of the larynx and lingual areas of the throat. Topics addressed in this effort include recognition of the sub-vocal EMG signals that represent specific original words or phrases; transformation (including encoding and/or enciphering) of the signals into forms that are less vulnerable to distortion, degradation, and/or interception; and reconstruction of the original words or phrases at the receiving end of a communication link. Potential applications include ordinary verbal communications among hazardous-material-cleanup workers in protective suits, workers in noisy environments, divers, and firefighters, and secret communications among law-enforcement officers and military personnel in combat and other confrontational situations.

  3. 40Hz-Transcranial alternating current stimulation (tACS) selectively modulates speech perception.

    PubMed

    Rufener, Katharina S; Zaehle, Tino; Oechslin, Mathias S; Meyer, Martin

    2016-03-01

    The present study investigated the functional relevance of gamma oscillations for the processing of rapidly changing acoustic features in speech signals. For this purpose we analyzed repetition-induced perceptual learning effects in 18 healthy adult participants. The participants received either 6Hz or 40Hz tACS over the bilateral auditory cortex, while repeatedly performing a phoneme categorization task. As a result, we found that 40Hz tACS led to a specific alteration in repetition-induced perceptual learning. While participants in the non-stimulated control group as well as those in the experimental group receiving 6Hz tACS considerably improved their perceptual performance, the application of 40Hz tACS selectively attenuated the repetition-induced improvement in phoneme categorization abilities. Our data provide causal evidence for a functional relevance of gamma oscillations during the perceptual learning of acoustic speech features. Moreover, we demonstrate that even less than twenty minutes of alternating current stimulation below the individual perceptual threshold is sufficient to affect speech perception. This finding is relevant in that this novel approach might have implications with respect to impaired speech processing in dyslexics and older adults. PMID:26779822

  4. Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

    PubMed

    Larm, Petra; Hongisto, Valtteri

    2006-02-01

    During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse. PMID:16521772
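
    The shared idea behind these indices, a per-band apparent speech-to-noise ratio that is clipped, normalized and combined by a weighted sum, can be sketched as follows. This is a minimal illustration, not the calculation scheme used in the study: the clipping range of -15 dB to +15 dB and the band weights in the example are assumptions, and the function names are hypothetical.

```python
def band_index(snr_db: float, lo: float = -15.0, hi: float = 15.0) -> float:
    """Map an apparent speech-to-noise ratio (dB) to a 0..1 band index.

    The ratio is clipped to [lo, hi] and rescaled linearly.
    The default limits are an assumption for this sketch.
    """
    clipped = max(lo, min(hi, snr_db))
    return (clipped - lo) / (hi - lo)


def weighted_intelligibility_index(snr_by_band, weights):
    """Combine per-band indices using a weighted average."""
    if len(snr_by_band) != len(weights):
        raise ValueError("need one weight per frequency band")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * band_index(s) for s, w in zip(snr_by_band, weights))
    return weighted / total_weight


# Three hypothetical bands with equal weights: poor, middling and good SNR.
print(weighted_intelligibility_index([-15.0, 0.0, 15.0], [1.0, 1.0, 1.0]))
```

    Because each band is clipped before weighting, a very high ratio in one band cannot compensate for a band in which speech is fully masked.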

  5. Subjective and objective assessments in classrooms following acoustical renovation

    NASA Astrophysics Data System (ADS)

    Astolfi, Arianna

    2005-04-01

    The effectiveness of an expensive acoustical intervention in an old Italian high school building has been assessed in this work. The school building has fifty classrooms, the majority of which were acoustically renovated. A subjective survey and measurements were performed in both the renovated and non-renovated classrooms. With the help of psychologists from Turin University, a questionnaire was set up for the subjective analysis. The questionnaire, validated after numerous pilot tests, was submitted to all the students and the teachers in two periods of the year. The questions on acoustical features included questions on annoyance from room noise, reverberation, speech comprehension, overall acoustical satisfaction and the consequences of bad acoustical conditions. Apart from the acoustics, other aspects of environmental quality, such as the thermal and visual environmental features and IAQ, were investigated. The statistical analysis of the subjective answers allowed aggregated information to be obtained on the users and different data to be correlated. The aim of the statistical correlation was to determine any significant relationships between the objective and subjective data, and between the overall satisfaction scores and the different environmental factors. The effects of bad environmental conditions and their influence on learning capacity were also examined.

  6. Watch what you say, your computer might be listening: A review of automated speech recognition

    NASA Technical Reports Server (NTRS)

    Degennaro, Stephen V.

    1991-01-01

    Spoken language is the most convenient and natural means by which people interact with each other and is, therefore, a promising candidate for human-machine interactions. Speech also offers an additional channel for hands-busy applications, complementing the use of motor output channels for control. Current speech recognition systems vary considerably across a number of important characteristics, including vocabulary size, speaking mode, training requirements for new speakers, robustness to acoustic environments, and accuracy. Algorithmically, these systems range from rule-based techniques through more probabilistic or self-learning approaches such as hidden Markov modeling and neural networks. This tutorial begins with a brief summary of the relevant features of current speech recognition systems and the strengths and weaknesses of the various algorithmic approaches.

  7. Discovering words in fluent speech: the contribution of two kinds of statistical information.

    PubMed

    Thiessen, Erik D; Erickson, Lucy C

    2012-01-01

    To efficiently segment fluent speech, infants must discover the predominant phonological form of words in the native language. In English, for example, content words typically begin with a stressed syllable. To discover this regularity, infants need to identify a set of words. We propose that statistical learning plays two roles in this process. First, it provides a cue that allows infants to segment words from fluent speech, even without language-specific phonological knowledge. Second, once infants have identified a set of lexical forms, they can learn from the distribution of acoustic features across those word forms. The current experiments demonstrate both processes are available to 5-month-old infants. This demonstration of sensitivity to statistical structure in speech, weighted more heavily than phonological cues to segmentation at an early age, is consistent with theoretical accounts that claim statistical learning plays a role in helping infants to adapt to the structure of their native language from very early in life. PMID:23335903

  8. Deletion of 4.4 Mb at 2q33.2q33.3 May Cause Growth Deficiency in a Patient with Mental Retardation, Facial Dysmorphic Features and Speech Delay.

    PubMed

    Papoulidis, Ioannis; Paspaliaris, Vassilis; Papageorgiou, Elena; Siomou, Elissavet; Dagklis, Themistoklis; Sotiriou, Sotirios; Thomaidis, Loretta; Manolakos, Emmanouil

    2015-01-01

    A patient with a rare interstitial deletion of chromosomal band 2q33.2q33.3 is described. The clinical features resembled the 2q33.1 microdeletion syndrome (Glass syndrome), including mental retardation, facial dysmorphism, high-arched narrow palate, growth deficiency, and speech delay. The chromosomal aberration was characterized by whole genome BAC aCGH. A comparison of the current patient and Glass syndrome features revealed that this case displayed a relatively mild phenotype. Overall, it is suggested that the deleted region of 2q33 causative for Glass syndrome may be larger than initially suggested. PMID:25925190

  9. Virtual acoustics displays

    NASA Technical Reports Server (NTRS)

    Wenzel, Elizabeth M.; Fisher, Scott S.; Stone, Philip K.; Foster, Scott H.

    1991-01-01

    The real time acoustic display capabilities are described which were developed for the Virtual Environment Workstation (VIEW) Project at NASA-Ames. The acoustic display is capable of generating localized acoustic cues in real time over headphones. An auditory symbology, a related collection of representational auditory 'objects' or 'icons', can be designed using ACE (Auditory Cue Editor), which links both discrete and continuously varying acoustic parameters with information or events in the display. During a given display scenario, the symbology can be dynamically coordinated in real time with 3-D visual objects, speech, and gestural displays. The types of displays feasible with the system range from simple warnings and alarms to the acoustic representation of multidimensional data or events.

  10. Speech disorders reflect differing pathophysiology in Parkinson's disease, progressive supranuclear palsy and multiple system atrophy.

    PubMed

    Rusz, Jan; Bonnet, Cecilia; Klempíř, Jiří; Tykalová, Tereza; Baborová, Eva; Novotný, Michal; Rulseh, Aaron; Růžička, Evžen

    2015-01-01

    Although speech disorder is frequently an early and prominent clinical feature of Parkinson's disease (PD) as well as atypical parkinsonian syndromes (APS) such as progressive supranuclear palsy (PSP) and multiple system atrophy (MSA), there is a lack of objective and quantitative evidence to verify whether any specific speech characteristics allow differentiation between PD, PSP and MSA. Speech samples were acquired from 77 subjects including 15 PD, 12 PSP, 13 MSA and 37 healthy controls. The accurate differential diagnosis of dysarthria subtypes was based on the quantitative acoustic analysis of 16 speech dimensions. Dysarthria was uniformly present in all parkinsonian patients but was more severe in PSP and MSA than in PD. Whilst PD speakers manifested pure hypokinetic dysarthria, ataxic components were more affected in MSA whilst PSP subjects demonstrated severe deficits in hypokinetic and spastic elements of dysarthria. Dysarthria in PSP was dominated by increased dysfluency, decreased slow rate, inappropriate silences, deficits in vowel articulation and harsh voice quality whereas MSA by pitch fluctuations, excess intensity variations, prolonged phonemes, vocal tremor and strained-strangled voice quality. Objective speech measurements were able to discriminate between APS and PD with 95% accuracy and between PSP and MSA with 75% accuracy. Dysarthria severity in APS was related to overall disease severity (r = 0.54, p = 0.006). Dysarthria with various combinations of hypokinetic, spastic and ataxic components reflects differing pathophysiology in PD, PSP and MSA. Thus, motor speech examination may provide useful information in the evaluation of these diseases with similar manifestations. PMID:25683763

  11. Prediction and constraint in audiovisual speech perception.

    PubMed

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration

  12. Headphone localization of speech

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.

    1993-01-01

    Three-dimensional acoustic display systems have recently been developed that synthesize virtual sound sources over headphones based on filtering by head-related transfer functions (HRTFs), the direction-dependent spectral changes caused primarily by the pinnae. In this study, 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with nonindividualized HRTFs. About half of the subjects 'pulled' their judgments toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgments; 15 to 46 percent of stimuli were heard inside the head, with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by nonindividualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.
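
    Filtering by HRTFs is commonly implemented in the time domain by convolving a mono signal with a left-ear and a right-ear impulse response. The sketch below shows only that operation. The impulse responses are toy values invented for illustration, not measured HRTFs, and the function names are hypothetical. A real system would normally use an FFT-based convolution from a numerical library.

```python
def convolve(signal, kernel):
    """Direct-form linear convolution of two non-empty sequences."""
    if not signal or not kernel:
        raise ValueError("signal and kernel must be non-empty")
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def spatialize(mono, hrir_left, hrir_right):
    """Return (left, right) headphone signals for one virtual source."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses: the right ear receives a delayed, attenuated copy,
# a crude stand-in for a source located to the listener's left.
left, right = spatialize([1.0, 0.5], [1.0, 0.0], [0.0, 0.5])
print(left, right)
```

    With nonindividualized HRTFs, as in the study, the same pair of impulse responses is applied for every listener rather than a pair measured at that listener's own ears.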

  13. Acoustic Target Location and Scattering Feature identification for a solid cylinder utilizing reversible Synthetic Aperture Sonar filtering

    NASA Astrophysics Data System (ADS)

    Eastland, Grant; Marston, Timothy; Marston, Philip

    2010-10-01

    Understanding the scattering features of proud and partially exposed cylinders is relevant to understanding the high frequency scattering by a variety of simple targets. We performed various experiments where partial exposure was studied by lowering a solid aluminum cylinder through a flat free surface into a tank of water insonified at grazing incidence with short pulses to identify different features while monitoring evolution of the scattering as a function of the amount of exposure. The present investigation also allows for the recording of bistatic scattering and reversible filtering based on a form of synthetic aperture sonar (SAS). The slope of the feature timing, derived using ray theory, expressed by the derivative dt/dh where t is the measured time of the feature, depends on the feature type as well as the source and receiver grazing angles. Free surface interactions for features revealed by the slopes are accurately identified using reversible SAS filtering.

  14. Auditory-Perceptual Learning Improves Speech Motor Adaptation in Children

    PubMed Central

    Shiller, Douglas M.; Rochon, Marie-Lyne

    2015-01-01

    Auditory feedback plays an important role in children’s speech development by providing the child with information about speech outcomes that is used to learn and fine-tune speech motor plans. The use of auditory feedback in speech motor learning has been extensively studied in adults by examining oral motor responses to manipulations of auditory feedback during speech production. Children are also capable of adapting speech motor patterns to perceived changes in auditory feedback; however, it is not known whether their capacity for motor learning is limited by immature auditory-perceptual abilities. Here, the link between speech perceptual ability and the capacity for motor learning was explored in two groups of 5–7-year-old children who underwent a period of auditory perceptual training followed by tests of speech motor adaptation to altered auditory feedback. One group received perceptual training on a speech acoustic property relevant to the motor task while a control group received perceptual training on an irrelevant speech contrast. Learned perceptual improvements led to an enhancement in speech motor adaptation (proportional to the perceptual change) only for the experimental group. The results indicate that children’s ability to perceive relevant speech acoustic properties has a direct influence on their capacity for sensory-based speech motor adaptation. PMID:24842067

  15. Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding

    SciTech Connect

    Hogden, J.

    1996-11-05

    The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation may decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.

  16. An Acoustic Study of the Relationships among Neurologic Disease, Dysarthria Type, and Severity of Dysarthria

    ERIC Educational Resources Information Center

    Kim, Yunjung; Kent, Raymond D.; Weismer, Gary

    2011-01-01

    Purpose: This study examined acoustic predictors of speech intelligibility in speakers with several types of dysarthria secondary to different diseases and conducted classification analysis solely by acoustic measures according to 3 variables (disease, speech severity, and dysarthria type). Method: Speech recordings from 107 speakers with…

  17. Articulatory-to-Acoustic Relations in Response to Speaking Rate and Loudness Manipulations

    ERIC Educational Resources Information Center

    Mefferd, Antje S.; Green, Jordan R.

    2010-01-01

    Purpose: In this investigation, the authors determined the strength of association between tongue kinematic and speech acoustics changes in response to speaking rate and loudness manipulations. Performance changes in the kinematic and acoustic domains were measured using two aspects of speech production presumably affecting speech clarity:…

  18. Speech Communication.

    ERIC Educational Resources Information Center

    Anderson, Betty

    The communications approach to teaching speech to high school students views speech as the study of the communication process in order to develop an awareness of and a sensitivity to the variables that affect human interaction. In using this approach the student is encouraged to try out as many types of messages using as many techniques and…

  19. Speech Aids

    NASA Technical Reports Server (NTRS)

    1987-01-01

    Designed to assist deaf and hearing-impaired persons in achieving better speech, Resnick Worldwide Inc.'s device provides a visual means of cuing the deaf as a speech-improvement measure. This is done by electronically processing the subjects' sounds and comparing them with optimum values, which are displayed for comparison.

  20. Symbolic Speech

    ERIC Educational Resources Information Center

    Podgor, Ellen S.

    1976-01-01

    The concept of symbolic speech emanates from the 1967 case of United States v. O'Brien. These discussions of flag desecration, grooming and dress codes, nude entertainment, buttons and badges, and musical expression show that the courts place symbolic speech in different strata from verbal communication. (LBH)

  1. Some behavioral and neurobiological constraints on theories of audiovisual speech integration: a review and suggestions for new directions.

    PubMed

    Altieri, Nicholas; Pisoni, David B; Townsend, James T

    2011-01-01

    Summerfield (1987) proposed several accounts of audiovisual speech perception, a field of research that has burgeoned in recent years. The proposed accounts included the integration of discrete phonetic features, vectors describing the values of independent acoustical and optical parameters, the filter function of the vocal tract, and articulatory dynamics of the vocal tract. The latter two accounts assume that the representations of audiovisual speech perception are based on abstract gestures, while the former two assume that the representations consist of symbolic or featural information obtained from visual and auditory modalities. Recent converging evidence from several different disciplines reveals that the general framework of Summerfield's feature-based theories should be expanded. An updated framework building upon the feature-based theories is presented. We propose a processing model arguing that auditory and visual brain circuits provide facilitatory information when the inputs are correctly timed, and that auditory and visual speech representations do not necessarily undergo translation into a common code during information processing. Future research on multisensory processing in speech perception should investigate the connections between auditory and visual brain regions, and utilize dynamic modeling tools to further understand the timing and information processing mechanisms involved in audiovisual speech integration. PMID:21968081

  2. Some Behavioral and Neurobiological Constraints on Theories of Audiovisual Speech Integration: A Review and Suggestions for New Directions

    PubMed Central

    Altieri, Nicholas; Pisoni, David B.; Townsend, James T.

    2012-01-01

    Summerfield (1987) proposed several accounts of audiovisual speech perception, a field of research that has burgeoned in recent years. The proposed accounts included the integration of discrete phonetic features, vectors describing the values of independent acoustical and optical parameters, the filter function of the vocal tract, and articulatory dynamics of the vocal tract. The latter two accounts assume that the representations of audiovisual speech perception are based on abstract gestures, while the former two assume that the representations consist of symbolic or featural information obtained from visual and auditory modalities. Recent converging evidence from several different disciplines reveals that the general framework of Summerfield’s feature-based theories should be expanded. An updated framework building upon the feature-based theories is presented. We propose a processing model arguing that auditory and visual brain circuits provide facilitatory information when the inputs are correctly timed, and that auditory and visual speech representations do not necessarily undergo translation into a common code during information processing. Future research on multisensory processing in speech perception should investigate the connections between auditory and visual brain regions, and utilize dynamic modeling tools to further understand the timing and information processing mechanisms involved in audiovisual speech integration. PMID:21968081

  3. Speech coding

    SciTech Connect

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long-haul transmissions which use repeaters to compensate for the loss in signal strength on transmission links also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link. Hence, from a transmission point of view, digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques and is often used interchangeably with speech coding is the term voice coding. This term is more generic in the sense that the

  4. Speech-cue transmission by an algorithm to increase consonant recognition in noise for hearing-impaired listeners

    PubMed Central

    Healy, Eric W.; Yoho, Sarah E.; Wang, Yuxuan; Apoux, Frédéric; Wang, DeLiang

    2014-01-01

    Consonant recognition was assessed following extraction of speech from noise using a more efficient version of the speech-segregation algorithm described in Healy, Yoho, Wang, and Wang [(2013) J. Acoust. Soc. Am. 134, 3029–3038]. Substantial increases in recognition were observed following algorithm processing, which were significantly larger for hearing-impaired (HI) than for normal-hearing (NH) listeners in both speech-shaped noise and babble backgrounds. As observed previously for sentence recognition, older HI listeners having access to the algorithm performed as well or better than young NH listeners in conditions of identical noise. It was also found that the binary masks estimated by the algorithm transmitted speech features to listeners in a fashion highly similar to that of the ideal binary mask (IBM), suggesting that the algorithm is estimating the IBM with substantial accuracy. Further, the speech features associated with voicing, manner of articulation, and place of articulation were all transmitted with relative uniformity and at relatively high levels, indicating that the algorithm and the IBM transmit speech cues without obvious deficiency. Because the current implementation of the algorithm is much more efficient, it should be more amenable to real-time implementation in devices such as hearing aids and cochlear implants. PMID:25480077

  5. Speech-cue transmission by an algorithm to increase consonant recognition in noise for hearing-impaired listeners.

    PubMed

    Healy, Eric W; Yoho, Sarah E; Wang, Yuxuan; Apoux, Frédéric; Wang, DeLiang

    2014-12-01

    Consonant recognition was assessed following extraction of speech from noise using a more efficient version of the speech-segregation algorithm described in Healy, Yoho, Wang, and Wang [(2013) J. Acoust. Soc. Am. 134, 3029-3038]. Substantial increases in recognition were observed following algorithm processing, which were significantly larger for hearing-impaired (HI) than for normal-hearing (NH) listeners in both speech-shaped noise and babble backgrounds. As observed previously for sentence recognition, older HI listeners having access to the algorithm performed as well or better than young NH listeners in conditions of identical noise. It was also found that the binary masks estimated by the algorithm transmitted speech features to listeners in a fashion highly similar to that of the ideal binary mask (IBM), suggesting that the algorithm is estimating the IBM with substantial accuracy. Further, the speech features associated with voicing, manner of articulation, and place of articulation were all transmitted with relative uniformity and at relatively high levels, indicating that the algorithm and the IBM transmit speech cues without obvious deficiency. Because the current implementation of the algorithm is much more efficient, it should be more amenable to real-time implementation in devices such as hearing aids and cochlear implants. PMID:25480077

  6. Critique: auditory form and gestural topology in the perception of speech.

    PubMed

    Remez, R E

    1996-03-01

    Some influential accounts of speech perception have asserted that the goal of perception is to recover the articulatory gestures that create the acoustic signal, while others have proposed that speech perception proceeds by a method of acoustic categorization of signal elements. These accounts have been frustrated by difficulties in identifying a set of primitive articulatory constituents underlying speech production, and a set of primitive acoustic-auditory elements underlying speech perception. An argument by Lindblom favors an account of production and perception based on the auditory form of speech and its cognitive elaboration, rejecting the aim of defining a set of articulatory primitives by appealing to theoretical principle, while recognizing the empirical difficulty of identifying a set of acoustic or auditory primitives. An examination of this thesis found opportunities to defend some of its conclusions with independent evidence, but favors a characterization of the constituents of speech perception as linguistic rather than as articulatory or acoustic. PMID:8964930

  7. Phrase-level speech simulation with an airway modulation model of speech production

    PubMed Central

    Story, Brad H.

    2012-01-01

    Artificial talkers and speech synthesis systems have long been used as a means of understanding both speech production and speech perception. The development of an airway modulation model is described that simulates the time-varying changes of the glottis and vocal tract, as well as acoustic wave propagation, during speech production. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by humans and how that sound is perceived by a listener. The primary components of the model are introduced and simulations of words and phrases are demonstrated. PMID:23503742

  8. The Rhythm of Perception: Acoustic Rhythmic Entrainment Induces Subsequent Perceptual Oscillation

    PubMed Central

    Hickok, Gregory; Farahbod, Haleh; Saberi, Kourosh

    2015-01-01

    Acoustic rhythms are pervasive in speech, music, and environmental sounds. Evidence for neural codes representing periodic information has recently emerged, and these codes seem a likely neural basis for the ability to detect rhythm. Rhythmic information has also been found to modulate auditory system excitability, providing a potential mechanism for parsing the acoustic stream. Here we explore the effects of a previous rhythmic stimulus on subsequent auditory perception. We found that a low-frequency (3 Hz) amplitude-modulated signal induces a subsequent oscillation of perceptual detectability of a brief non-periodic acoustic stimulus (1 kHz tone); the frequency but not phase of the perceptual oscillation matches the entrained stimulus-driven rhythmic oscillation. This provides evidence that rhythmic contexts have a direct influence on subsequent auditory perception of discrete acoustic events. Rhythm coding is likely a fundamental feature of auditory system design that predates the development of explicit human enjoyment of rhythm in music or poetry. PMID:25968248

  9. The Rhythm of Perception: Entrainment to Acoustic Rhythms Induces Subsequent Perceptual Oscillation.

    PubMed

    Hickok, Gregory; Farahbod, Haleh; Saberi, Kourosh

    2015-07-01

    Acoustic rhythms are pervasive in speech, music, and environmental sounds. Recent evidence for neural codes representing periodic information suggests that they may be a neural basis for the ability to detect rhythm. Further, rhythmic information has been found to modulate auditory-system excitability, which provides a potential mechanism for parsing the acoustic stream. Here, we explored the effects of a rhythmic stimulus on subsequent auditory perception. We found that a low-frequency (3 Hz), amplitude-modulated signal induces a subsequent oscillation of the perceptual detectability of a brief nonperiodic acoustic stimulus (1-kHz tone); the frequency but not the phase of the perceptual oscillation matches the entrained stimulus-driven rhythmic oscillation. This provides evidence that rhythmic contexts have a direct influence on subsequent auditory perception of discrete acoustic events. Rhythm coding is likely a fundamental feature of auditory-system design that predates the development of explicit human enjoyment of rhythm in music or poetry. PMID:25968248
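
    The stimulus described in this abstract can be sketched in a few lines. The example below is only an illustration: the abstract gives the 3 Hz modulation rate and the 1 kHz probe tone, but the noise carrier, sinusoidal envelope shape, modulation depth, sample rate, and the function names am_envelope, am_noise and probe_tone are all assumptions made here.

```python
import math
import random


def am_envelope(t, mod_hz=3.0, depth=1.0):
    """Sinusoidal amplitude envelope in [0, 1] at time t (seconds)."""
    return 0.5 * (1.0 + depth * math.sin(2.0 * math.pi * mod_hz * t))


def am_noise(duration_s, fs=8000, mod_hz=3.0, depth=1.0, seed=0):
    """Uniform noise carrier (an assumption) multiplied by the envelope."""
    rng = random.Random(seed)
    n = int(duration_s * fs)
    return [am_envelope(i / fs, mod_hz, depth) * rng.uniform(-1.0, 1.0)
            for i in range(n)]


def probe_tone(duration_s, fs=8000, freq_hz=1000.0):
    """Brief unmodulated sine tone, standing in for the 1 kHz probe."""
    n = int(duration_s * fs)
    return [math.sin(2.0 * math.pi * freq_hz * i / fs) for i in range(n)]


if __name__ == "__main__":
    entrainer = am_noise(1.0)   # one second contains three modulation cycles
    probe = probe_tone(0.05)    # 50 ms probe; the duration is illustrative
    print(len(entrainer), len(probe))
```

    With a 3 Hz modulator the envelope repeats every 1/3 s, so a probe placed at different delays after the entrainer's offset falls at different phases of the expected cycle.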

  10. Variability in English vowels is comparable in articulation and acoustics

    PubMed Central

    Noiray, Aude; Iskarous, Khalil; Whalen, D. H.

    2014-01-01

    The nature of the links between speech production and perception has been the subject of longstanding debate. The present study investigated the articulatory parameter of tongue height and the acoustic F1-F0 difference for the phonological distinction of vowel height in American English front vowels. Multiple repetitions of /i, ɪ, e, ε, æ/ in [(h)Vd] sequences were recorded in seven adult speakers. Articulatory (ultrasound) and acoustic data were collected simultaneously to provide a direct comparison of variability in vowel production in both domains. Results showed idiosyncratic patterns of articulation for contrasting the three front vowel pairs /i-ɪ/, /e-ε/ and /ε-æ/ across subjects, with the degree of variability in vowel articulation comparable to that observed in the acoustics for all seven participants. However, contrary to what was expected, some speakers showed reversals for tongue height for /ɪ/-/e/ that were also reflected in acoustics with F1 higher for /ɪ/ than for /e/. The data suggest the phonological distinction of height is conveyed via speaker-specific articulatory-acoustic patterns that do not strictly match feature descriptions. However, the acoustic signal is faithful to the articulatory configuration that generated it, carrying the crucial information for perceptual contrast. PMID:25101144

  11. Cascading Influences on the Production of Speech: Evidence from Articulation

    PubMed Central

    McMillan, Corey T.

    2010-01-01

    Recent investigations have supported the suggestion that phonological speech errors may reflect the simultaneous activation of more than one phonemic representation. This presents a challenge for speech error evidence which is based on the assumption of well-formedness, because we may continue to perceive well-formed errors, even when they are not produced. To address this issue, we present two tongue-twister experiments in which the articulation of onset consonants is quantified and compared to baseline measures from cases where there is no phonemic competition. We report three measures of articulatory variability: changes in tongue-to-palate contact using electropalatography (EPG, Experiment 1), changes in midsagittal spline of the tongue using ultrasound (Experiment 2), and acoustic changes manifested as voice-onset-time (VOT). These three sources provide converging evidence that articulatory variability increases when competing onsets differ by one phonological feature, but the increase is attenuated when onsets differ by two features. This finding provides clear evidence, based solely on production, that the articulation of phonemes is influenced by cascading activation from the speech plan. PMID:20947071

  12. Improving the speech intelligibility in classrooms

    NASA Astrophysics Data System (ADS)

    Lam, Choi Ling Coriolanus

    One of the major acoustical concerns in classrooms is the establishment of effective verbal communication between teachers and students. Non-optimal acoustical conditions, resulting in reduced verbal communication, can cause two main problems. First, they can lead to reduced learning efficiency. Second, they can also cause fatigue, stress, vocal strain and health problems, such as headaches and sore throats, among teachers who are forced to compensate for poor acoustical conditions by raising their voices. In addition, inadequate acoustical conditions can prompt the use of public address systems, and improper use of such amplifiers or loudspeakers can impair students' hearing. The social costs of poor classroom acoustics are large because they impair children's learning. This invisible problem has far-reaching implications for learning, but is easily solved. Many studies have accurately and concisely summarized the research findings on classroom acoustics, but a number of challenging questions remain unanswered. Most objective indices for speech intelligibility are essentially based on studies of Western languages. Although several studies of tonal languages such as Mandarin have been conducted, there is much less work on Cantonese. In this research, measurements were made in unoccupied rooms to investigate the acoustical parameters and characteristics of the classrooms. Speech intelligibility tests based on English, Mandarin and Cantonese, together with a survey, were carried out on students aged from 5 to 22 years. The research aims to investigate the differences in intelligibility between English, Mandarin and Cantonese in classrooms in Hong Kong. The significance of the speech transmission index (STI) in relation to Phonetically Balanced (PB) word scores will be developed further. Together with developed empirical relationship between the speech intelligibility in classrooms with the variations

  13. Some articulatory details of emotional speech

    NASA Astrophysics Data System (ADS)

    Lee, Sungbok; Yildirim, Serdar; Bulut, Murtaza; Kazemzadeh, Abe; Narayanan, Shrikanth

    2005-09-01

    Differences in speech articulation among four emotion types, neutral, anger, sadness, and happiness, are investigated by analyzing tongue tip, jaw, and lip movement data collected from one male and one female speaker of American English. The data were collected using an electromagnetic articulography (EMA) system while subjects produced simulated emotional speech. Pitch, root-mean-square (rms) energy and the first three formants were estimated for vowel segments. For both speakers, angry speech exhibited the largest rms energy and largest articulatory activity in terms of displacement range and movement speed. Happy speech is characterized by the largest pitch variability. It has higher rms energy than neutral speech but articulatory activity is rather comparable to, or less than, neutral speech. That is, happy speech is more prominent in voicing activity than in articulation. Sad speech exhibits the longest sentence duration and lower rms energy. However, its articulatory activity is no less than that of neutral speech. Interestingly, for the male speaker, articulation for vowels in sad speech is consistently more peripheral (i.e., more forwarded displacements) when compared to other emotions. However, this does not hold for the female subject. These and other results will be discussed in detail with associated acoustics and perceived emotional qualities. [Work supported by NIH.]

  14. Acoustic biosensors

    PubMed Central

    Fogel, Ronen; Seshia, Ashwin A.

    2016-01-01

    Resonant and acoustic wave devices have been researched for several decades for application in the gravimetric sensing of a variety of biological and chemical analytes. These devices operate by coupling the measurand (e.g. analyte adsorption) as a modulation in the physical properties of the acoustic wave (e.g. resonant frequency, acoustic velocity, dissipation) that can then be correlated with the amount of adsorbed analyte. These devices can also be miniaturized with advantages in terms of cost, size and scalability, as well as potential additional features including integration with microfluidics and electronics, scaled sensitivities associated with smaller dimensions and higher operational frequencies, the ability to multiplex detection across arrays of hundreds of devices embedded in a single chip, increased throughput and the ability to interrogate a wider range of modes including within the same device. Additionally, device fabrication is often compatible with semiconductor volume batch manufacturing techniques enabling cost scalability and a high degree of precision and reproducibility in the manufacturing process. Integration with microfluidics handling also enables suitable sample pre-processing/separation/purification/amplification steps that could improve selectivity and the overall signal-to-noise ratio. Three device types are reviewed here: (i) bulk acoustic wave sensors, (ii) surface acoustic wave sensors, and (iii) micro/nano-electromechanical system (MEMS/NEMS) sensors. PMID:27365040

  15. Acoustic biosensors.

    PubMed

    Fogel, Ronen; Limson, Janice; Seshia, Ashwin A

    2016-06-30

    Resonant and acoustic wave devices have been researched for several decades for application in the gravimetric sensing of a variety of biological and chemical analytes. These devices operate by coupling the measurand (e.g. analyte adsorption) as a modulation in the physical properties of the acoustic wave (e.g. resonant frequency, acoustic velocity, dissipation) that can then be correlated with the amount of adsorbed analyte. These devices can also be miniaturized with advantages in terms of cost, size and scalability, as well as potential additional features including integration with microfluidics and electronics, scaled sensitivities associated with smaller dimensions and higher operational frequencies, the ability to multiplex detection across arrays of hundreds of devices embedded in a single chip, increased throughput and the ability to interrogate a wider range of modes including within the same device. Additionally, device fabrication is often compatible with semiconductor volume batch manufacturing techniques enabling cost scalability and a high degree of precision and reproducibility in the manufacturing process. Integration with microfluidics handling also enables suitable sample pre-processing/separation/purification/amplification steps that could improve selectivity and the overall signal-to-noise ratio. Three device types are reviewed here: (i) bulk acoustic wave sensors, (ii) surface acoustic wave sensors, and (iii) micro/nano-electromechanical system (MEMS/NEMS) sensors. PMID:27365040

  16. A t(5;16)(p15.32;q23.3) generating 16q23.3 --> qter duplication and 5p15.32 --> pter deletion in two siblings with mental retardation, dysmorphic features, and speech delay.

    PubMed

    Hellani, Ali; Mohamed, Sarar; Al-Akoum, Siham; Bosley, Thomas M; Abu-Amero, Khaled K

    2010-06-01

    We report on two siblings (half brothers on the paternal side) with a syndrome consisting of delayed development, cardiac anomalies, chest deformity, hip rotation, metatarsus adductus, genital hypoplasia, dysmorphic face, depressed nasal bridge, mental retardation, and speech delay. All metaphases examined showed a normal karyotype in the patients, their father, and both mothers. High-resolution array CGH examination revealed a 16q (6 Mb) duplication dup(16)(16q23.3 --> 16qter) and a 5p (0.97 Mb) terminal deletion del(5)(p15.32 --> pter) in both affected boys but not their healthy siblings or parents. Interphase fluorescence in situ hybridization (FISH) confirmed both the 16q duplicated region and the 5p terminal deletion. Clinical abnormalities in the patients included thin upper lip, clinodactyly, and foot deformity, which were reported previously with duplications in 16q23.3. Pectus excavatum, hip rotation, metatarsus adductus, umbilical hernia, brachycephaly, and esotropia were not reported previously in chromosome 16q duplications but may be features that occur intermittently. The 5p deleted region has been associated previously only with speech delay, which was present in both patients. These patients display certain phenotypic characteristics not reported previously in 16q duplication and confirm 5p terminal deletion as an important chromosome anomaly for speech delay. PMID:20503335

  17. Measures to Evaluate the Effects of DBS on Speech Production

    PubMed Central

    Weismer, Gary; Yunusova, Yana; Bunton, Kate

    2011-01-01

    The purpose of this paper is to review and evaluate measures of speech production that could be used to document effects of Deep Brain Stimulation (DBS) on speech performance, especially in persons with Parkinson disease (PD). A small set of evaluative criteria for these measures is presented first, followed by consideration of several speech physiology and speech acoustic measures that have been studied frequently and reported on in the literature on normal speech production, and speech production affected by neuromotor disorders (dysarthria). Each measure is reviewed and evaluated against the evaluative criteria. Embedded within this review and evaluation is a presentation of new data relating speech motions to speech intelligibility measures in speakers with PD, amyotrophic lateral sclerosis (ALS), and control speakers (CS). These data are used to support the conclusion that at the present time the slope of second formant transitions (F2 slope), an acoustic measure, is well suited to make inferences to speech motion and to predict speech intelligibility. The use of other measures should not be ruled out, however, and we encourage further development of evaluative criteria for speech measures designed to probe the effects of DBS or any treatment with potential effects on speech production and communication skills. PMID:24932066
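
    As a rough illustration of the F2 slope measure mentioned above, the sketch below fits a straight line to an F2 track over a transition and reports the slope in Hz/ms. The abstract does not say how the slope is computed, so the least-squares fit, the function name f2_slope, and the sample values are assumptions for illustration only.

```python
def f2_slope(times_ms, f2_hz):
    """Least-squares slope (Hz/ms) of an F2 track sampled at times_ms.

    Fitting a line is an assumed method; other definitions, such as the
    endpoint difference divided by the duration, are equally possible.
    """
    n = len(times_ms)
    if n < 2 or n != len(f2_hz):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times_ms) / n
    mean_f = sum(f2_hz) / n
    sxx = sum((t - mean_t) ** 2 for t in times_ms)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times_ms, f2_hz))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical transition: F2 rises 300 Hz over 30 ms.
    print(f2_slope([0, 10, 20, 30], [1200, 1300, 1400, 1500]))
```

    A shallower slope over the same transition would indicate a slower or smaller change in vocal tract shape, which is the sense in which the measure is used to make inferences about speech motion.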

  18. Speech entrainment enables patients with Broca's aphasia to produce fluent speech.

    PubMed

    Fridriksson, Julius; Hubbard, H Isabel; Hudspeth, Sarah Grace; Holland, Audrey L; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

    2012-12-01

    A distinguishing feature of Broca's aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect 'speech entrainment' and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca's aphasia. In Experiment 1, 13 patients with Broca's aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca's area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production. Behavioural and

  19. Contemporary Issues in Phoneme Production by Hearing-Impaired Persons: Physiological and Acoustic Aspects.

    ERIC Educational Resources Information Center

    McGarr, Nancy S.; Whitehead, Robert

    1992-01-01

    This paper on physiologic correlates of speech production in children and youth with hearing impairments focuses specifically on the production of phonemes and includes data on respiration for speech production, phonation, speech aerodynamics, articulation, and acoustic analyses of speech by hearing-impaired persons. (Author/DB)

  20. Audiovisual Speech Synchrony Measure: Application to Biometrics

    NASA Astrophysics Data System (ADS)

    Bredin, Hervé; Chollet, Gérard

    2007-12-01

    Speech is a means of communication which is intrinsically bimodal: the audio signal originates from the dynamics of the articulators. This paper reviews recent works in the field of audiovisual speech, and more specifically techniques developed to measure the level of correspondence between audio and visual speech. It overviews the most common audio and visual speech front-end processing, transformations performed on audio, visual, or joint audiovisual feature spaces, and the actual measure of correspondence between audio and visual speech. Finally, the use of a synchrony measure for biometric identity verification based on talking faces is tested on the BANCA database.

  1. Prediction and constraint in audiovisual speech perception

    PubMed Central

    Peelle, Jonathan E.; Sommers, Mitchell S.

    2015-01-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported

  2. Correlation study of predictive and descriptive metrics of speech intelligibility

    NASA Astrophysics Data System (ADS)

    Stefaniw, Abigail; Shimizu, Yasushi; Smith, Dana

    2002-11-01

    There exists a wide range of speech-intelligibility metrics, each of which is designed to encapsulate a different aspect of room acoustics that relates to speech intelligibility. This study reviews the different definitions of and correlations between various proposed speech-intelligibility measures. Speech-intelligibility metrics can be grouped by two main uses: prediction of designed rooms and description of existing rooms. Two descriptive metrics still under investigation are Ease of Hearing and Acoustical Comfort. These are measured by a simple questionnaire, and their relationships with each other and with significant speech-intelligibility metrics are explored. A variety of rooms are modeled and auralized in cooperation with a larger study, including classrooms, lecture halls, and offices. Auralized rooms are used to conveniently provide calculated metrics and cross-talk canceled auralizations for diagnostic and descriptive intelligibility tests. Rooms are modeled in CATT-Acoustic and auralized with a multi-channel speaker array in a hemi-anechoic chamber.

  3. Computational speech segregation based on an auditory-inspired modulation analysis.

    PubMed

    May, Tobias; Dau, Torsten

    2014-12-01

    A monaural speech segregation system is presented that estimates the ideal binary mask from noisy speech based on the supervised learning of amplitude modulation spectrogram (AMS) features. Instead of using linearly scaled modulation filters with constant absolute bandwidth, an auditory-inspired modulation filterbank with logarithmically scaled filters is employed. To reduce the dependency of the AMS features on the overall background noise level, a feature normalization stage is applied. In addition, a spectro-temporal integration stage is incorporated in order to exploit the context information about speech activity present in neighboring time-frequency units. In order to evaluate the generalization performance of the system to unseen acoustic conditions, the speech segregation system is trained with a limited set of low signal-to-noise ratio (SNR) conditions, but tested over a wide range of SNRs up to 20 dB. A systematic evaluation of the system demonstrates that auditory-inspired modulation processing can substantially improve the mask estimation accuracy in the presence of stationary and fluctuating interferers. PMID:25480079
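
The ideal binary mask that this system is trained to estimate can be illustrated with a minimal sketch. It assumes that the speech and noise powers of each time-frequency unit are known separately, which holds only when the mixture is built artificially, and it uses a local SNR criterion of 0 dB. The function name `ideal_binary_mask`, the parameter `lc_db`, and the toy values are illustrative assumptions, not details from the abstract.

```python
import math

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Label each time-frequency unit 1 if its local SNR exceeds lc_db, else 0.

    speech_power and noise_power are equally shaped nested lists
    (rows = frequency channels, columns = time frames) of strictly
    positive powers. lc_db is a hypothetical local-criterion threshold.
    """
    mask = []
    for speech_row, noise_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(speech_row, noise_row):
            local_snr_db = 10.0 * math.log10(s / n)
            row.append(1 if local_snr_db > lc_db else 0)
        mask.append(row)
    return mask

# Toy example: 2 frequency channels x 3 time frames (made-up values).
speech = [[4.0, 1.0, 0.1], [0.5, 2.0, 9.0]]
noise = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
mask = ideal_binary_mask(speech, noise)
```

In a segregation system of the kind described, a classifier would predict these labels from features of the noisy mixture alone; the sketch shows only the target being predicted.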

  4. Acoustic neuroma

    MedlinePlus

    Vestibular schwannoma; Tumor - acoustic; Cerebellopontine angle tumor; Angle tumor ... Acoustic neuromas have been linked with the genetic disorder neurofibromatosis type 2 (NF2). Acoustic neuromas are uncommon.

  5. Speech Enhancement Using Microphone Arrays.

    NASA Astrophysics Data System (ADS)

    Adugna, Eneyew

    Arrays of sensors have been employed effectively in communication systems for the directional transmission and reception of electromagnetic waves. Among the numerous benefits, this helps improve the signal-to-interference ratio (SIR) of the signal at the receiver. Arrays have since been used in related areas that employ propagating waves for the transmission of information. Several investigators have successfully adapted array principles to acoustics, sonar, seismic, and medical imaging. In speech applications the microphone is used as the sensor for acoustic data acquisition. The performance of subsequent speech processing algorithms--such as speech recognition or speaker recognition--relies heavily on the level of interference within the transduced or recorded speech signal. The normal practice is to use a single, hand-held or head-mounted, microphone. Under most environmental conditions, i.e., environments where other acoustic sources are also active, the speech signal from a single microphone is a superposition of acoustic signals present in the environment. Such cases represent a lower SIR value. To alleviate this problem, arrays of microphones--linear, planar, and 3-dimensional--have been suggested and implemented. This work focuses on microphone arrays in room environments where reverberation is the main source of interference. The acoustic wave incident on the array from a point source is sampled and recorded by a linear array of sensors along with reflected waves. Array signal processing algorithms are developed and used to remove reverberations from the signal received by the array. Signals from other positions are considered as interference. Unlike most studies that deal with plane waves, we base our algorithm on spherical waves originating at a source point. This is especially true for room environments. The algorithm consists of two stages--a first stage to locate the source and a second stage to focus on the source.
The first part

  6. Classification of Fricative Consonants for Speech Enhancement in Hearing Devices

    PubMed Central

    Kong, Ying-Yee; Mullangi, Ala; Kokkinakis, Kostas

    2014-01-01

    Objective: To investigate a set of acoustic features and classification methods for the classification of three groups of fricative consonants differing in place of articulation. Method: A support vector machine (SVM) algorithm was used to classify the fricatives extracted from the TIMIT database in quiet and also in speech babble noise at various signal-to-noise ratios (SNRs). Spectral features including four spectral moments, peak, slope, Mel-frequency cepstral coefficients (MFCC), Gammatone filter outputs, and magnitudes of the fast Fourier transform (FFT) spectrum were used for the classification. The analysis frame was restricted to only 8 msec. In addition, commonly used linear and nonlinear principal component analysis dimensionality reduction techniques that project a high-dimensional feature vector onto a lower dimensional space were examined. Results: With 13 MFCC coefficients, 14 or 24 Gammatone filter outputs, classification performance was greater than or equal to 85% in quiet and at +10 dB SNR. Using 14 Gammatone filter outputs above 1 kHz, classification accuracy remained high (greater than 80%) for a wide range of SNRs from +20 to +5 dB SNR. Conclusions: High levels of classification accuracy for fricative consonants in quiet and in noise could be achieved using only spectral features extracted from a short time window. Results of this work have a direct impact on the development of speech enhancement algorithms for hearing devices. PMID:24747721
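
As an illustration of the four spectral moments mentioned above, the following sketch computes them for a single 8 ms frame by treating the normalized power spectrum as a probability distribution over frequency. Several choices are assumptions rather than details from the abstract: the 16 kHz sampling rate, the use of the power (not magnitude) spectrum, the absence of a window function, and reporting excess kurtosis. The function names are hypothetical.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for very short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectral_moments(frame, sample_rate):
    """Return (mean, variance, skewness, excess kurtosis) of the power spectrum."""
    n = len(frame)
    power = [m * m for m in magnitude_spectrum(frame)]
    freqs = [k * sample_rate / n for k in range(len(power))]
    total = sum(power)
    weights = [p / total for p in power]  # normalized spectrum sums to 1
    mean = sum(f * w for f, w in zip(freqs, weights))
    var = sum((f - mean) ** 2 * w for f, w in zip(freqs, weights))
    sd = math.sqrt(var)
    skew = sum((f - mean) ** 3 * w for f, w in zip(freqs, weights)) / sd ** 3
    kurt = sum((f - mean) ** 4 * w for f, w in zip(freqs, weights)) / sd ** 4 - 3.0
    return mean, var, skew, kurt

# Synthetic 8 ms frame at an assumed 16 kHz rate: equal tones at 2 kHz and 6 kHz.
SAMPLE_RATE = 16000
frame = [
    math.sin(2 * math.pi * 2000 * t / SAMPLE_RATE)
    + math.sin(2 * math.pi * 6000 * t / SAMPLE_RATE)
    for t in range(128)
]
mean, var, skew, kurt = spectral_moments(frame, SAMPLE_RATE)
```

For this symmetric two-tone frame the spectral mean falls midway between the tones and the skewness is close to zero, which gives a quick sanity check before applying the function to real fricative frames.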

  7. Recognition of information-bearing elements in speech

    NASA Astrophysics Data System (ADS)

    Hermansky, Hynek

    2003-10-01

    An acoustic speech signal carries many different kinds of information: the basic linguistic message, many characteristics of the speaker of the message, details of the environment in which the message was produced and transmitted, etc. The human auditory/cognitive system is able to detect, decode, and separate all these information sources. Understanding this ability and emulating it on a machine has been an important but elusive scientific and engineering goal for a long time. This talk critically surveys the situation in the speech recognition field. It puts automatic recognition of speech in perspective with other acoustic signal detection and classification tasks, reviews some historical, contemporary, and evolving techniques for machine recognition of speech, critically compares competing techniques, and gives some examples of applications in speech, speaker, and language recognition and identification. The talk is intended for an audience interested but not directly involved in the processing of speech.

  8. Coding pitch differences in voiceless fricatives: Whispered relative to normal speech.

    PubMed

    Heeren, Willemijn F L

    2015-12-01

    Intonation can be perceived in whispered speech despite the absence of the fundamental frequency. In the past, acoustic correlates of pitch in whisper have been sought in vowel content, but, recently, studies of normal speech demonstrated correlates of intonation in consonants as well. This study examined how consonants may contribute to the coding of intonation in whispered relative to normal speech. The acoustic characteristics of whispered, voiceless fricatives /s/ and /f/, produced at different pitch targets (low, mid, high), were investigated and compared to corresponding normal speech productions to assess if whisper contained secondary or compensatory pitch correlates. Furthermore, listener sensitivity to fricative cues to pitch in whisper was established, also relative to normal speech. Consistent with recent studies, acoustic correlates of whispered and normal speech fricatives systematically varied with pitch target. Comparable findings across speech modes showed that acoustic correlates were secondary. Discrimination of vowel-fricative-vowel stimuli was less accurate and slower in whispered than normal speech, which is attributed to differences in acoustic cues available. Perception of fricatives presented without their vowel contexts, however, revealed comparable processing speeds and response accuracies between speech modes, supporting the finding that within fricatives, acoustic correlates of pitch are similar across speech modes. PMID:26723300

  9. Acoustic Radiation Force Impulse Technology in the Differential Diagnosis of Solid Breast Masses with Different Sizes: Which Features Are Most Efficient?

    PubMed Central

    Bai, Min; Zhang, Hui-Ping; Xing, Jin-Fang; Shi, Qiu-Sheng; Gu, Ji-Ying; Li, Fan; Chen, Hui-Li; Zhang, Xue-Mei; Fang, Yun; Du, Lian-Fang

    2015-01-01

    Purpose. To evaluate diagnostic performance of acoustic radiation force impulse (ARFI) technology for solid breast masses with different sizes and determine which features are most efficient. Materials and Methods. 271 solid breast masses in 242 women were examined with ARFI, and their shear wave velocities (SWVs), Virtual Touch tissue imaging (VTI) patterns, and area ratios (ARs) were measured and compared with their histopathological outcomes. Receiver operating characteristic curves (ROC) were calculated to assess diagnostic performance of ARFI for small masses (6–14 mm) and big masses (15–30 mm). Results. SWV of mass was shown to be positively associated with mass size (P < 0.001). For small masses, area under ROC (Az) of AR was larger than that of SWV (P < 0.001) and VTI pattern (P < 0.001); no significant difference was found between Az of SWV and that of VTI pattern (P = 0.906). For big masses, Az of VTI pattern was less than that of SWV (P = 0.008) and AR (P = 0.002); no significant difference was identified between Az of SWV and that of AR (P = 0.584). Conclusions. For big masses, SWV and AR are both efficient measures; nevertheless, for small masses, AR seems to be the best feature. PMID:26258138
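
The area under an ROC curve used to compare the features can be sketched empirically as the probability that a randomly chosen malignant mass scores higher than a randomly chosen benign one. This is an assumption about the estimator; the abstract does not say how Az was computed, and the scores in the example are made up for illustration only.

```python
def roc_auc(scores_positive, scores_negative):
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking (Mann-Whitney formulation).
    """
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))

# Made-up feature values, not data from the study.
auc_separable = roc_auc([3.0, 4.0], [1.0, 2.0])
auc_chance = roc_auc([1.0, 2.0], [1.0, 2.0])
```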

  10. Speech recognition based on pattern recognition techniques

    NASA Astrophysics Data System (ADS)

    Rabiner, Lawrence R.

    1990-05-01

    Algorithms for speech recognition can be characterized broadly as pattern recognition approaches and acoustic phonetic approaches. To date, the greatest degree of success in speech recognition has been obtained using pattern recognition paradigms. Pattern recognition techniques were applied to the problems of isolated word (or discrete utterance) recognition, connected word recognition, and continuous speech recognition. It is shown that understanding (and consequently the resulting recognizer performance) is best for the simplest recognition tasks and is considerably less well developed for large scale recognition systems.

  11. Articulatory Mediation of Speech Perception: A Causal Analysis of Multi-Modal Imaging Data

    ERIC Educational Resources Information Center

    Gow, David W., Jr.; Segawa, Jennifer A.

    2009-01-01

    The inherent confound between the organization of articulation and the acoustic-phonetic structure of the speech signal makes it exceptionally difficult to evaluate the competing claims of motor and acoustic-phonetic accounts of how listeners recognize coarticulated speech. Here we use Granger causation analysis of high spatiotemporal resolution…

  12. Compensation for Coarticulation: Disentangling Auditory and Gestural Theories of Perception of Coarticulatory Effects in Speech

    ERIC Educational Resources Information Center

    Viswanathan, Navin; Magnuson, James S.; Fowler, Carol A.

    2010-01-01

    According to one approach to speech perception, listeners perceive speech by applying general pattern matching mechanisms to the acoustic signal (e.g., Diehl, Lotto, & Holt, 2004). An alternative is that listeners perceive the phonetic gestures that structured the acoustic signal (e.g., Fowler, 1986). The two accounts have offered different…

  13. A new feature constituting approach to detection of vocal fold pathology

    NASA Astrophysics Data System (ADS)

    Hariharan, M.; Polat, Kemal; Yaacob, Sazali

    2014-08-01

    In the last two decades, non-invasive methods based on acoustic analysis of the voice signal have proved to be an excellent and reliable tool for diagnosing vocal fold pathologies. This paper proposes a new feature vector based on the wavelet packet transform and singular value decomposition for the detection of vocal fold pathology. k-means clustering based feature weighting is proposed to increase the distinguishing performance of the proposed features. In this work, two databases, the Massachusetts Eye and Ear Infirmary (MEEI) voice disorders database and the MAPACI speech pathology database, are used. Four different supervised classifiers, namely k-nearest neighbour (k-NN), least-square support vector machine, probabilistic neural network and general regression neural network, are employed for testing the proposed features. The experimental results show that the proposed features give a very promising classification accuracy of 100% for both the MEEI database and the MAPACI speech pathology database.

  14. [Improving the speech with a prosthetic construction].

    PubMed

    Stalpers, M J; Engelen, M; van der Stappen, J A A M; Weijs, W L J; Takes, R P; van Heumen, C C M

    2016-03-01

    A 12-year-old boy had problems with his speech due to a defect in the soft palate. This defect was caused by the surgical removal of a synovial sarcoma. Testing with a nasometer revealed hypernasality above normal values. Given the size and severity of the defect in the soft palate, the possibility of improving the speech with speech therapy was limited. At a centre for special dentistry an attempt was made with a prosthetic construction to improve the performance of the palate and, in that way, the speech. This construction consisted of a denture with an obturator attached to it. With it, an effective closure of the palate could be achieved. New measurements with acoustic nasometry showed scores within the normal values. The nasality in the speech largely disappeared. The obturator is an effective and relatively easy solution for palatal insufficiency resulting from surgical resection. Intrusive reconstructive surgery can be avoided in this way. PMID:26973984

  15. Strategies for distant speech recognition in reverberant environments

    NASA Astrophysics Data System (ADS)

    Delcroix, Marc; Yoshioka, Takuya; Ogawa, Atsunori; Kubo, Yotaro; Fujimoto, Masakiyo; Ito, Nobutaka; Kinoshita, Keisuke; Espi, Miquel; Araki, Shoko; Hori, Takaaki; Nakatani, Tomohiro

    2015-12-01

    Reverberation and noise are known to severely affect the automatic speech recognition (ASR) performance of speech recorded by distant microphones. Therefore, we must deal with reverberation if we are to realize high-performance hands-free speech recognition. In this paper, we review a recognition system that we developed at our laboratory to deal with reverberant speech. The system consists of a speech enhancement (SE) front-end that employs long-term linear prediction-based dereverberation followed by noise reduction. We combine our SE front-end with an ASR back-end that uses neural networks for acoustic and language modeling. The proposed system achieved top scores on the ASR task of the REVERB challenge. This paper describes the different technologies used in our system and presents detailed experimental results that justify our implementation choices and may provide hints for designing distant ASR systems.

  16. Infant-Directed Speech Is Modulated by Infant Feedback

    ERIC Educational Resources Information Center

    Smith, Nicholas A.; Trainor, Laurel J.

    2008-01-01

    When mothers engage in infant-directed (ID) speech, their voices change in a number of characteristic ways, including adopting a higher overall pitch. Studies have examined these acoustical cues and have tested infants' preferences for ID speech. However, little is known about how these cues change with maternal sensitivity to infant feedback in…

  17. The Effects of Macroglossia on Speech: A Case Study

    ERIC Educational Resources Information Center

    Mekonnen, Abebayehu Messele

    2012-01-01

    This article presents a case study of speech production in a 14-year-old Amharic-speaking boy. The boy had developed secondary macroglossia, related to a disturbance of growth hormones, following a history of normal speech development. Perceptual analysis combined with acoustic analysis and static palatography is used to investigate the specific…

  18. Integrated Spacesuit Audio System Enhances Speech Quality and Reduces Noise

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng Arden; Chen, Jingdong; Chen, Shaoyan Sharyl

    2009-01-01

    A new approach has been proposed for increasing astronaut comfort and speech capture. Currently, the special design of a spacesuit forms an extreme acoustic environment making it difficult to capture clear speech without compromising comfort. The proposed Integrated Spacesuit Audio (ISA) system is to incorporate the microphones into the helmet and use software to extract voice signals from background noise.

  19. The Pump-Valve Model of Speech Articulation.

    ERIC Educational Resources Information Center

    Dew, Donald

    The traditional respiration-phonation-articulation-resonation model of speech production which permeates introductory literature is not the only suitable model of this process. The pump-valve model, which derives from the acoustic theory of speech production, is a viable alternative. This newer model is also consistent with modern theories. It…

  20. Does Signal Degradation Affect Top-Down Processing of Speech?

    PubMed

    Wagner, Anita; Pals, Carina; de Blecourt, Charlotte M; Sarampalis, Anastasios; Başkent, Deniz

    2016-01-01

    Speech perception is formed based on both the acoustic signal and listeners' knowledge of the world and semantic context. Access to semantic information can facilitate interpretation of degraded speech, such as speech in background noise or the speech signal transmitted via cochlear implants (CIs). This paper focuses on the latter, and investigates the time course of understanding words, and how sentential context reduces listeners' dependency on the acoustic signal for natural and degraded speech via an acoustic CI simulation. In an eye-tracking experiment we combined recordings of listeners' gaze fixations with pupillometry, to capture effects of semantic information on both the time course and effort of speech processing. Normal-hearing listeners were presented with sentences with or without a semantically constraining verb (e.g., crawl) preceding the target (baby), and their ocular responses were recorded to four pictures, including the target, a phonological (bay) competitor and a semantic (worm) and an unrelated distractor. The results show that in natural speech, listeners' gazes reflect their uptake of acoustic information, and integration of preceding semantic context. Degradation of the signal leads to a later disambiguation of phonologically similar words, and to a delay in integration of semantic information. Complementary to this, the pupil dilation data show that early semantic integration reduces the effort in disambiguating phonologically similar words. Processing degraded speech comes with increased effort due to the impoverished nature of the signal. Delayed integration of semantic information further constrains listeners' ability to compensate for inaudible signals. PMID:27080670

  1. Across-formant integration and speech intelligibility: Effects of acoustic source properties in the presence and absence of a contralateral interferer.

    PubMed

    Summers, Robert J; Bailey, Peter J; Roberts, Brian

    2016-08-01

    The role of source properties in across-formant integration was explored using three-formant (F1+F2+F3) analogues of natural sentences (targets). In experiment 1, F1+F3 were harmonic analogues (H1+H3) generated using a monotonous buzz source and second-order resonators; in experiment 2, F1+F3 were tonal analogues (T1+T3). F2 could take either form (H2 or T2). Target formants were always presented monaurally; the receiving ear was assigned randomly on each trial. In some conditions, only the target was present; in others, a competitor for F2 (F2C) was presented contralaterally. Buzz-excited or tonal competitors were created using the time-reversed frequency and amplitude contours of F2. Listeners must reject F2C to optimize keyword recognition. Whether or not a competitor was present, there was no effect of source mismatch between F1+F3 and F2. The impact of adding F2C was modest when it was tonal but large when it was harmonic, irrespective of whether F2C matched F1+F3. This pattern was maintained when harmonic and tonal counterparts were loudness-matched (experiment 3). Source type and competition, rather than acoustic similarity, governed the phonetic contribution of a formant. Contrary to earlier research using dichotic targets, requiring across-ear integration to optimize intelligibility, H2C was an equally effective informational masker for H2 as for T2. PMID:27586751

  2. Spectral identification of sperm whales from Littoral Acoustic Demonstration Center passive acoustic recordings

    NASA Astrophysics Data System (ADS)

    Sidorovskaia, Natalia A.; Richard, Blake; Ioup, George E.; Ioup, Juliette W.

    2005-09-01

    The Littoral Acoustic Demonstration Center (LADC) made a series of passive broadband acoustic recordings in the Gulf of Mexico and Ligurian Sea to study noise and marine mammal phonations. The collected data contain a large amount of various types of sperm whale phonations, such as isolated clicks and communication codas. It was previously reported that the spectrograms of the extracted clicks and codas contain well-defined null patterns that seem to be unique for individuals. The null pattern is formed due to individual features of the sound production organs of an animal. These observations motivated the present studies of adapting human speech identification techniques for deep-diving marine mammal phonations. A three-state trained hidden Markov model (HMM) was used with the phonation spectra of sperm whales. The HMM algorithm gave 75% accuracy in identifying individuals when it was initially tested on the acoustic data set correlated with visual observations of sperm whales. A comparison of the identification accuracy based on null-pattern similarity analysis and the HMM algorithm is presented. The results can establish the foundation for developing an acoustic identification database for sperm whales and possibly other deep-diving marine mammals that would be difficult to observe visually. [Research supported by ONR.]
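
One way HMM-based identification of individuals can work is to train one model per animal and assign a new phonation to the model that gives it the highest likelihood. The sketch below shows that scoring step with the scaled forward algorithm for three-state discrete HMMs. It assumes the spectra have already been quantized into a small set of symbols; the model names `whale_a` and `whale_b` and all probabilities are invented for illustration and are not taken from the study.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    states = range(len(start))
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step and weight by the emission probability.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in states) * emit[s][obs[t]]
                for s in states
            ]
        scale = sum(alpha)
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p

def identify(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))

# Hypothetical left-to-right three-state models over three spectral symbols.
start = [1.0, 0.0, 0.0]
trans = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
models = {
    "whale_a": (start, trans, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]),
    "whale_b": (start, trans, [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1]]),
}
best = identify([0, 0, 1, 1, 2, 2], models)
```

Training the per-individual parameters (for example with Baum-Welch) is omitted here; the sketch covers only the comparison of likelihoods.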

  3. Recognition of speaker-dependent continuous speech with KEAL

    NASA Astrophysics Data System (ADS)

    Mercier, G.; Bigorgne, D.; Miclet, L.; Le Guennec, L.; Querre, M.

    1989-04-01

    A description of the speaker-dependent continuous speech recognition system KEAL is given. An unknown utterance is recognized by means of the following procedures: acoustic analysis, phonetic segmentation and identification, and word and sentence analysis. The combination of feature-based, speaker-independent coarse phonetic segmentation with speaker-dependent statistical classification techniques is one of the main design features of the acoustic-phonetic decoder. The lexical access component is essentially based on a statistical dynamic programming technique which aims at matching a phonemic lexical entry containing various phonological forms, against a phonetic lattice. Sentence recognition is achieved by use of a context-free grammar and a parsing algorithm derived from Earley's parser. A speaker adaptation module allows some of the system parameters to be adjusted by matching known utterances with their acoustical representation. The task to be performed, described by its vocabulary and its grammar, is given as a parameter of the system. Continuously spoken sentences extracted from a 'pseudo-Logo' language are analyzed and results are presented.
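
The dynamic programming idea behind lexical access can be sketched in a heavily simplified form: the phonetic lattice is flattened to a single decoded phone sequence, and each lexical entry is scored by edit distance with unit costs. KEAL itself is described as using statistical costs and a lattice, so this is only an illustration of the general technique; the function name and the phone strings are hypothetical.

```python
def edit_distance(lexical_phonemes, decoded_phones):
    """Minimum number of substitutions, insertions and deletions (unit costs)."""
    rows = len(lexical_phonemes) + 1
    cols = len(decoded_phones) + 1
    # dist[i][j] = cost of aligning the first i phonemes with the first j phones.
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if lexical_phonemes[i - 1] == decoded_phones[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # phoneme missing from decoding
                dist[i][j - 1] + 1,                 # extra decoded phone
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[-1][-1]

# Illustrative phone strings only.
exact = edit_distance(["b", "o", "n"], ["b", "o", "n"])
one_deletion = edit_distance(["b", "o", "n", "j", "u", "r"], ["b", "o", "j", "u", "r"])
```

Replacing the unit costs with negative log-probabilities of confusions would move this sketch closer to a statistical formulation.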

  4. Effects of charge design features on parameters of acoustic and seismic waves and cratering, for SMR chemical surface explosions

    NASA Astrophysics Data System (ADS)

    Gitterman, Y.

    2012-04-01

    time delays clearly separated for the shot of IMI explosives (characterized by much higher detonation velocity than ANFO). Additionally, acoustic records at close distances from WSMR explosions Distant Image (2440 tons of ANFO) and Minor Uncle (2725 tons of ANFO) were used to extend the charge and distance range for the SS delay scaled relationship, which showed consistency with SMR ANFO shots. The developed specific charge design contributed to the success of this unique dual Sayarim explosion experiment, providing the strongest GT0 sources since the establishment of the IMS network, that demonstrated clearly the most favorable westward/eastward infrasound propagation up to 3400/6250 km according to appropriate summer/winter weather pattern and stratospheric wind directions, respectively, and thus verified empirically common models of infrasound propagation in the atmosphere. The research was supported by the CTBTO, Vienna, and the Israel Ministry of Immigrant Absorption.

  5. Formant trajectory characteristics in speakers with dysarthria and homogeneous speech intelligibility scores: Further data

    NASA Astrophysics Data System (ADS)

    Kim, Yunjung; Weismer, Gary; Kent, Ray D.

    2005-09-01

    In previous work [J. Acoust. Soc. Am. 117, 2605 (2005)], we reported on formant trajectory characteristics of a relatively large number of speakers with dysarthria and near-normal speech intelligibility. The purpose of that analysis was to begin a documentation of the variability, within relatively homogeneous speech-severity groups, of acoustic measures commonly used to predict across-speaker variation in speech intelligibility. In that study we found that even with near-normal speech intelligibility (90%-100%), many speakers had reduced formant slopes for some words and distributional characteristics of acoustic measures that were different than values obtained from normal speakers. In the current report we extend those findings to a group of speakers with dysarthria with somewhat poorer speech intelligibility than the original group. Results are discussed in terms of the utility of certain acoustic measures as indices of speech intelligibility, and as explanatory data for theories of dysarthria. [Work supported by NIH Award R01 DC00319.]

  6. When speech sounds like music.

    PubMed

    Falk, Simone; Rathcke, Tamara; Dalla Bella, Simone

    2014-08-01

    Repetition can boost memory and perception. However, repeating the same stimulus several times in immediate succession also induces intriguing perceptual transformations and illusions. Here, we investigate the Speech to Song Transformation (S2ST), a massed repetition effect in the auditory modality, which crosses the boundaries between language and music. In the S2ST, a phrase repeated several times shifts to being heard as sung. To better understand this unique cross-domain transformation, we examined the perceptual determinants of the S2ST, in particular the role of acoustics. In 2 experiments, the effects of 2 pitch properties and 3 rhythmic properties on the probability and speed of occurrence of the transformation were examined. Results showed that both pitch and rhythmic properties are key features fostering the transformation. However, some properties proved to be more conducive to the S2ST than others. Stable tonal targets that allowed for the perception of a musical melody led more often and more quickly to the S2ST than scalar intervals. Recurring durational contrasts arising from segmental grouping favoring a metrical interpretation of the stimulus also facilitated the S2ST. This was, however, not the case for a regular beat structure within and across repetitions. In addition, individual perceptual abilities predicted the likelihood of the S2ST. Overall, the study demonstrated that repetition enables listeners to reinterpret specific prosodic features of spoken utterances in terms of musical structures. The findings underline a tight link between language and music, but they also reveal important differences in communicative functions of prosodic structure in the 2 domains. PMID:24911013

  7. Acoustic Emphasis in Four Year Olds

    ERIC Educational Resources Information Center

    Wonnacott, Elizabeth; Watson, Duane G.

    2008-01-01

    Acoustic emphasis may convey a range of subtle discourse distinctions, yet little is known about how this complex ability develops in children. This paper presents a first investigation of the factors which influence the production of acoustic prominence in young children's spontaneous speech. In a production experiment, SVO sentences were…

  8. Audiovisual Speech Perception in Children with Developmental Language Disorder in Degraded Listening Conditions

    ERIC Educational Resources Information Center

    Meronen, Auli; Tiippana, Kaisa; Westerholm, Jari; Ahonen, Timo

    2013-01-01

    Purpose: The effect of the signal-to-noise ratio (SNR) on the perception of audiovisual speech in children with and without developmental language disorder (DLD) was investigated by varying the noise level and the sound intensity of acoustic speech. The main hypotheses were that the McGurk effect (in which incongruent visual speech alters the…

  9. The Hypothesis of Apraxia of Speech in Children with Autism Spectrum Disorder

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Paul, Rhea; Black, Lois M.; van Santen, Jan P.

    2011-01-01

    In a sample of 46 children aged 4-7 years with Autism Spectrum Disorder (ASD) and intelligible speech, there was no statistical support for the hypothesis of concomitant Childhood Apraxia of Speech (CAS). Perceptual and acoustic measures of participants' speech, prosody, and voice were compared with data from 40 typically-developing children, 13…

  10. Children's Perception of Speech Produced in a Two-Talker Background

    ERIC Educational Resources Information Center

    Baker, Mallory; Buss, Emily; Jacks, Adam; Taylor, Crystal; Leibold, Lori J.

    2014-01-01

    Purpose: This study evaluated the degree to which children benefit from the acoustic modifications made by talkers when they produce speech in noise. Method: A repeated measures design compared the speech perception performance of children (5-11 years) and adults in a 2-talker masker. Target speech was produced in a 2-talker background or in…

  11. Effects of Elicitation Task Variables on Speech Production by Children with Cochlear Implants

    ERIC Educational Resources Information Center

    McCleary, Elizabeth A.; Ide-Helvie, Dana L.; Lotto, Andrew J.; Carney, Arlene Earley; Higgins, Maureen B.

    2007-01-01

    Given the interest in comparing speech production development in children with normal hearing and hearing impairment, it is important to evaluate how variables within speech elicitation tasks can differentially affect the acoustics of speech production for these groups. In a first experiment, children (6-14 years old) with cochlear implants…

  12. Speech Processing Application Based on Phonetics and Phonology of the Polish Language

    NASA Astrophysics Data System (ADS)

    Kłosowski, Piotr

    The article presents methods of improving speech processing based on the phonetics and phonology of the Polish language. The newly presented speech recognition method is based on detecting distinctive acoustic parameters of Polish phonemes. Distinctiveness was taken as the most important criterion for selecting the parameters that represent objects from the recognized classes. Speech recognition is widely used in telecommunications applications.

  13. Free Speech Yearbook: 1972.

    ERIC Educational Resources Information Center

    Tedford, Thomas L., Ed.

    This book is a collection of essays on free speech issues and attitudes, compiled by the Commission on Freedom of Speech of the Speech Communication Association. Four articles focus on freedom of speech in classroom situations as follows: a philosophic view of teaching free speech, effects of a course on free speech on student attitudes,…

  14. Speech analyzer

    NASA Technical Reports Server (NTRS)

    Lokerson, D. C. (Inventor)

    1977-01-01

    A speech signal is analyzed by applying the signal to formant filters which derive first, second and third signals respectively representing the frequency of the speech waveform in the first, second and third formants. A first pulse train having approximately a pulse rate representing the average frequency of the first formant is derived; second and third pulse trains having pulse rates respectively representing zero crossings of the second and third formants are derived. The first formant pulse train is derived by establishing N signal level bands, where N is an integer at least equal to two. Adjacent ones of the signal bands have common boundaries, each of which is a predetermined percentage of the peak level of a complete cycle of the speech waveform.
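    As a rough digital illustration of the zero-crossing idea in this abstract (not the patented circuit itself), the sketch below estimates the dominant frequency of an already band-limited signal from its rate of sign changes. The function name `zero_crossing_frequency` and the sampled-signal representation are assumptions made for this example; the abstract describes formant filters and pulse trains, not software.

```python
import math


def zero_crossing_frequency(samples, sample_rate_hz):
    """Estimate the dominant frequency (Hz) of a band-limited signal.

    A sinusoid crosses zero twice per cycle, so the crossing rate divided
    by two approximates its frequency. Hypothetical helper for illustration.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # Count sign changes between consecutive samples.
    crossings = sum(
        1
        for prev, cur in zip(samples, samples[1:])
        if (prev < 0 <= cur) or (prev >= 0 > cur)
    )
    duration_s = (len(samples) - 1) / sample_rate_hz
    return crossings / (2.0 * duration_s)


# Example: a 1500 Hz tone sampled at 16 kHz for 0.1 s, standing in for
# the output of a single formant filter.
rate = 16000
tone = [math.sin(2 * math.pi * 1500 * n / rate + 0.1) for n in range(1600)]
estimate_hz = zero_crossing_frequency(tone, rate)
```

    The estimate is only meaningful when one frequency component dominates, which is why the abstract applies formant filters before deriving the pulse trains.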

  15. Start/End Delays of Voiced and Unvoiced Speech Signals

    SciTech Connect

    Herrnstein, A

    1999-09-24

    Recent experiments using low-power EM radar-like sensors (e.g., GEMs) have demonstrated a new method for measuring vocal fold activity and the onset times of voiced speech, as vocal fold contact begins to take place. Similarly, the end time of a voiced speech segment can be measured. Second, it appears that in most normal uses of American English speech, unvoiced speech segments directly precede or directly follow voiced speech segments. For many applications, it is useful to know the typical durations of these unvoiced speech segments. A previously assembled corpus of spoken "TIMIT" words, phrases, and sentences, recorded with simultaneously measured acoustic and EM-sensor glottal signals from 16 male speakers, was used for this study. By inspecting the onset (or end) of unvoiced speech using the acoustic signal, and the onset (or end) of voiced speech using the EM sensor signal, the average duration of unvoiced segments preceding the onset of vocalization was found to be 300 ms, and of following segments, 500 ms. An unvoiced speech period is then defined in time, first by using the onset of the EM-sensed glottal signal as the onset-time marker for the voiced speech segment and the end marker for the unvoiced segment. Then, by subtracting 300 ms from the onset time mark of voicing, the unvoiced speech segment start time is found. Similarly, the times for a following unvoiced speech segment can be found. While data of this nature have proven useful for work in our laboratory, a great deal of additional work remains to validate such data for use with general populations of users. These procedures have been useful for applying optimal processing algorithms over time segments of unvoiced, voiced, and non-speech acoustic signals. For example, these data appear to be of use in speaker validation, in vocoding, and in denoising algorithms.
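    The windowing rule described in this abstract can be sketched as follows. The 300 ms and 500 ms values are the averages reported above; the `Segment` type, the function name `unvoiced_windows`, the clamping at time zero, and the reading that a following unvoiced segment ends 500 ms after voicing stops are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Tuple

PRE_VOICING_MS = 300.0   # average unvoiced duration before voicing onset (from the abstract)
POST_VOICING_MS = 500.0  # average unvoiced duration after voicing ends (from the abstract)


@dataclass(frozen=True)
class Segment:
    """A time interval in milliseconds (hypothetical helper type)."""
    start_ms: float
    end_ms: float


def unvoiced_windows(voiced: Segment) -> Tuple[Segment, Segment]:
    """Derive the unvoiced windows surrounding one voiced segment.

    The voiced onset marks the end of the preceding unvoiced window, whose
    start is found by subtracting 300 ms. The following window is built
    by analogy (assumption): it starts at the voiced end and lasts 500 ms.
    """
    # Clamp at zero so a window never starts before the recording (assumption).
    before = Segment(max(0.0, voiced.start_ms - PRE_VOICING_MS), voiced.start_ms)
    after = Segment(voiced.end_ms, voiced.end_ms + POST_VOICING_MS)
    return before, after


# Example: voicing detected by the EM sensor from 1000 ms to 1800 ms.
before, after = unvoiced_windows(Segment(1000.0, 1800.0))
```

    Because the two durations are corpus averages from 16 male speakers, windows computed this way are approximate bounds rather than measured segment boundaries.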

  16. Primary Progressive Aphasia and Apraxia of Speech

    PubMed Central

    Jung, Youngsin; Duffy, Joseph R.; Josephs, Keith A.

    2014-01-01

    Primary progressive aphasia is a neurodegenerative syndrome characterized by progressive language dysfunction. The majority of primary progressive aphasia cases can be classified into three subtypes: non-fluent/agrammatic, semantic, and logopenic variants of primary progressive aphasia. Each variant presents with unique clinical features, and is associated with distinctive underlying pathology and neuroimaging findings. Unlike primary progressive aphasia, apraxia of speech is a disorder that involves inaccurate production of sounds secondary to impaired planning or programming of speech movements. Primary progressive apraxia of speech is a neurodegenerative form of apraxia of speech, and it should be distinguished from primary progressive aphasia given its discrete clinicopathological presentation. Recently, there have been substantial advances in our understanding of these speech and language disorders. Here, we review clinical, neuroimaging, and histopathological features of primary progressive aphasia and apraxia of speech. The distinctions among these disorders will be crucial since accurate diagnosis will be important from a prognostic and therapeutic standpoint. PMID:24234355

  17. Talker Versus Dialect Effects on Speech Intelligibility: A Symmetrical Study.

    PubMed

    McCloy, Daniel R; Wright, Richard A; Souza, Pamela E

    2015-09-01

    This study investigates the relative effects of talker-specific variation and dialect-based variation on speech intelligibility. Listeners from two dialects of American English performed speech-in-noise tasks with sentences spoken by talkers of each dialect. An initial statistical model showed no significant effects for either talker or listener dialect group, and no interaction. However, a mixed-effects regression model including several acoustic measures of the talker's speech revealed a subtle effect of talker dialect once the various acoustic dimensions were accounted for. Results are discussed in relation to other recent studies of cross-dialect intelligibility. PMID:26529902

  18. Talker versus dialect effects on speech intelligibility: a symmetrical study

    PubMed Central

    McCloy, Daniel R.; Wright, Richard A.; Souza, Pamela E.

    2014-01-01

    This study investigates the relative effects of talker-specific variation and dialect-based variation on speech intelligibility. Listeners from two dialects of American English performed speech-in-noise tasks with sentences spoken by talkers of each dialect. An initial statistical model showed no significant effects for either talker or listener dialect group, and no interaction. However, a mixed-effects regression model including several acoustic measures of the talker’s speech revealed a subtle effect of talker dialect once the various acoustic dimensions were accounted for. Results are discussed in relation to other recent studies of cross-dialect intelligibility. PMID:26529902

  19. Learning Vowel Categories from Maternal Speech in Gurindji Kriol

    ERIC Educational Resources Information Center

    Jones, Caroline; Meakins, Felicity; Muawiyath, Shujau

    2012-01-01

    Distributional learning is a proposal for how infants might learn early speech sound categories from acoustic input before they know many words. When categories in the input differ greatly in relative frequency and overlap in acoustic space, research in bilingual development suggests that this affects the course of development. In the present…

  20. Speech Intelligibility

    NASA Astrophysics Data System (ADS)

    Brand, Thomas

    Speech intelligibility (SI) is important in various fields of research, engineering, and diagnostics for quantifying very different phenomena, such as the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, the benefit of using hearing aids, or combinations of these.