Science.gov

Sample records for acoustic speech features

  1. Adding articulatory features to acoustic features for automatic speech recognition

    SciTech Connect

    Zlokarnik, I.

    1995-05-01

    A hidden-Markov-model (HMM) based speech recognition system was evaluated that makes use of simultaneously recorded acoustic and articulatory data. The articulatory measurements were gathered by means of electromagnetic articulography and describe the movement of small coils fixed to the speakers' tongue and jaw during the production of German V₁CV₂ sequences [P. Hoole and S. Gfoerer, J. Acoust. Soc. Am. Suppl. 1 87, S123 (1990)]. Using the coordinates of the coil positions as an articulatory representation, acoustic and articulatory features were combined to make up an acoustic-articulatory feature vector. The discriminant power of this combined representation was evaluated for two subjects on a speaker-dependent isolated word recognition task. When the articulatory measurements were used both for training and testing the HMMs, the articulatory representation was capable of reducing the error rate of comparable acoustic-based HMMs by a relative percentage of more than 60%. In a separate experiment, the articulatory movements during the testing phase were estimated using a multilayer perceptron that performed an acoustic-to-articulatory mapping. Under these more realistic conditions, when articulatory measurements are only available during the training, the error rate could be reduced by a relative percentage of 18% to 25%.
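
    The core idea, combining per-frame acoustic and articulatory measurements into one feature vector before HMM training, can be sketched as follows. This is a minimal illustration that assumes both streams have already been resampled to a common frame rate; the array names, dimensions, and normalization are assumptions, not details from the paper.

    ```python
    import numpy as np

    def combine_features(mfcc, coils):
        """Concatenate per-frame acoustic and articulatory features.

        mfcc  : (T, D_a) array of acoustic features (e.g., 13 MFCCs per 10 ms frame)
        coils : (T, D_g) array of articulatory features (x/y coil coordinates)

        Both streams are assumed to share the same frame rate; dimensions and
        names are illustrative, not taken from the paper.
        """
        T = min(len(mfcc), len(coils))              # trim to common length
        acoustic, articulatory = mfcc[:T], coils[:T]

        def zscore(x):
            # per-stream mean/variance normalization before concatenation
            return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

        return np.hstack([zscore(acoustic), zscore(articulatory)])

    # Example: 13 MFCCs + 8 coil coordinates -> 21-dimensional combined vectors
    combined = combine_features(np.random.randn(300, 13), np.random.randn(300, 8))
    print(combined.shape)  # (300, 21)
    ```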

  2. Primary Progressive Apraxia of Speech: Clinical Features and Acoustic and Neurologic Correlates

    PubMed Central

    Strand, Edythe A.; Clark, Heather; Machulda, Mary; Whitwell, Jennifer L.; Josephs, Keith A.

    2015-01-01

    Purpose This study summarizes 2 illustrative cases of a neurodegenerative speech disorder, primary progressive apraxia of speech (AOS), as a vehicle for providing an overview of the disorder and an approach to describing and quantifying its perceptual features and some of its temporal acoustic attributes. Method Two individuals with primary progressive AOS underwent speech-language and neurologic evaluations on 2 occasions, ranging from 2.0 to 7.5 years postonset. Performance on several tests, tasks, and rating scales, as well as several acoustic measures, were compared over time within and between cases. Acoustic measures were compared with performance of control speakers. Results Both patients initially presented with AOS as the only or predominant sign of disease and without aphasia or dysarthria. The presenting features and temporal progression were captured in an AOS Rating Scale, an Articulation Error Score, and temporal acoustic measures of utterance duration, syllable rates per second, rates of speechlike alternating motion and sequential motion, and a pairwise variability index measure. Conclusions AOS can be the predominant manifestation of neurodegenerative disease. Clinical ratings of its attributes and acoustic measures of some of its temporal characteristics can support its diagnosis and help quantify its salient characteristics and progression over time. PMID:25654422
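
    One of the temporal measures named above, the pairwise variability index, quantifies how much successive durations (e.g., syllable or vowel durations) differ from one another. The abstract does not specify which PVI variant was used; the sketch below implements the commonly cited normalized PVI (nPVI, Grabe & Low, 2002) purely as an illustration.

    ```python
    import numpy as np

    def npvi(durations):
        """Normalized pairwise variability index over successive durations.

        Commonly cited formulation:
            nPVI = 100/(m-1) * sum_k |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2)
        This is an illustrative implementation, not necessarily the exact
        variant used in the study.
        """
        d = np.asarray(durations, dtype=float)
        terms = [abs(a - b) / ((a + b) / 2.0) for a, b in zip(d[:-1], d[1:])]
        return 100.0 * np.mean(terms)

    # Example: syllable durations in seconds
    print(npvi([0.18, 0.22, 0.15, 0.30, 0.21]))
    ```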

  3. Analysis of Acoustic Features in Speakers with Cognitive Disorders and Speech Impairments

    NASA Astrophysics Data System (ADS)

    Saz, Oscar; Simón, Javier; Rodríguez, W. Ricardo; Lleida, Eduardo; Vaquero, Carlos

    2009-12-01

    This work presents the results of an analysis of the acoustic features (formants and the three suprasegmental features: tone, intensity and duration) of vowel production in a group of 14 young speakers with different kinds of speech impairments due to physical and cognitive disorders. A corpus of unimpaired children's speech is used to determine reference values for these features in speakers without any kind of speech impairment within the same domain as the impaired speakers, namely 57 isolated words. The signal processing to extract formant and pitch values is based on a Linear Prediction Coefficients (LPC) analysis of the segments considered as vowels in a Hidden Markov Model (HMM) based Viterbi forced alignment. Intensity and duration are also based on the outcome of the automated segmentation. The main conclusion of the work is that the intelligibility of vowel production is lowered in impaired speakers even when the vowel is perceived as correct by human labelers. The decrease in intelligibility is due to a 30% increase in confusability in the formant map, a 50% reduction in the discriminative power of energy between stressed and unstressed vowels, and a 50% increase in the standard deviation of vowel length. On the other hand, impaired speakers keep good control of tone in the production of stressed and unstressed vowels.
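
    The formant measurements described above rest on LPC analysis of vowel segments. A minimal, self-contained sketch of LPC-based formant estimation for a single frame is shown below; the LPC order, windowing, and root-selection heuristics are illustrative choices, not the study's exact settings.

    ```python
    import numpy as np
    from scipy.linalg import solve_toeplitz

    def lpc_formants(frame, sr, order=12):
        """Rough formant estimates for one vowel frame via autocorrelation LPC.

        Textbook sketch: solve the LPC normal equations, take the roots of the
        prediction polynomial, and convert root angles to frequencies. The
        study itself takes vowel segments from an HMM forced alignment.
        """
        x = frame * np.hamming(len(frame))
        r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
        a = solve_toeplitz((r[:-1], r[:-1]), r[1:])     # LPC coefficients
        poly = np.concatenate(([1.0], -a))
        roots = [z for z in np.roots(poly) if z.imag > 0]
        freqs = sorted(np.angle(z) * sr / (2 * np.pi) for z in roots)
        return [f for f in freqs if f > 90][:3]         # F1-F3, drop near-DC roots

    # Example with a synthetic 8 kHz frame (small noise keeps LPC well-conditioned)
    sr = 8000
    t = np.arange(0, 0.03, 1 / sr)
    frame = (np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
             + 0.01 * np.random.randn(len(t)))
    print(lpc_formants(frame, sr))
    ```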

  4. Speech Recognition and Acoustic Features in Combined Electric and Acoustic Stimulation

    ERIC Educational Resources Information Center

    Yoon, Yang-soo; Li, Yongxin; Fu, Qian-Jie

    2012-01-01

    Purpose: In this study, the authors aimed to identify speech information processed by a hearing aid (HA) that is additive to information processed by a cochlear implant (CI) as a function of signal-to-noise ratio (SNR). Method: Speech recognition was measured with CI alone, HA alone, and CI + HA. Ten participants were separated into 2 groups; good…

  5. Acoustic and Articulatory Features of Diphthong Production: A Speech Clarity Study

    ERIC Educational Resources Information Center

    Tasko, Stephen M.; Greilick, Kristin

    2010-01-01

    Purpose: The purpose of this study was to evaluate how speaking clearly influences selected acoustic and orofacial kinematic measures associated with diphthong production. Method: Forty-nine speakers, drawn from the University of Wisconsin X-Ray Microbeam Speech Production Database (J. R. Westbury, 1994), served as participants. Samples of clear…

  6. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced speech, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.
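
    The deconvolution step, recovering a per-frame vocal-tract transfer function once the excitation function is known, can be illustrated with a regularized spectral division. The patent does not commit to this exact numerical recipe; the regularization constant and frame length below are assumptions.

    ```python
    import numpy as np

    def estimate_transfer_function(acoustic_frame, excitation_frame, eps=1e-6):
        """Estimate a per-frame transfer function H = Y / E by deconvolution.

        Y: spectrum of the recorded acoustic speech frame
        E: spectrum of the excitation function for the same frame
           (in the patent, derived from EM-sensor measurements)

        Regularized spectral division is one standard way to deconvolve; it is
        shown here only as an illustration of the idea.
        """
        Y = np.fft.rfft(acoustic_frame)
        E = np.fft.rfft(excitation_frame)
        return Y * np.conj(E) / (np.abs(E) ** 2 + eps)   # Wiener-style division

    # Example: one 20 ms frame at 16 kHz (synthetic placeholders)
    sr = 16000
    frame = np.random.randn(int(0.02 * sr))
    excitation = np.random.randn(int(0.02 * sr))
    print(estimate_transfer_function(frame, excitation).shape)
    ```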

  7. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced speech, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.

  8. Acoustic Feature Optimization Based on F-Ratio for Robust Speech Recognition

    NASA Astrophysics Data System (ADS)

    Sun, Yanqing; Zhou, Yu; Zhao, Qingwei; Yan, Yonghong

    This paper focuses on the problem of performance degradation in mismatched speech recognition. The F-Ratio analysis method is used to analyze the significance of different frequency bands for speech unit classification, and we find that frequencies around 1 kHz and 3 kHz, which are the upper bounds of the first and second formants for most vowels, should be emphasized in comparison to the Mel-frequency cepstral coefficients (MFCC). The analysis result is further observed to be stable in several typical mismatched situations. Similar to the Mel-frequency scale, another frequency scale called the F-Ratio scale is thus proposed to optimize the filter bank design for the MFCC features, so that each subband contains equal significance for speech unit classification. Under comparable conditions, the modified features yield a relative decrease of 43.20% in sentence error rate compared with the MFCC for emotion-affected speech recognition, 35.54% and 23.03% for noisy speech recognition at 15 dB and 0 dB SNR (signal-to-noise ratio), respectively, and 64.50% for the three years' 863 test data. The application of the F-Ratio analysis on the clean training set of the Aurora2 database demonstrates its robustness over languages, texts and sampling rates.
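
    The F-Ratio statistic at the heart of this approach compares between-class and within-class variance of a band's energy across speech-unit classes. The sketch below shows one simple per-band F-ratio; the paper's exact statistic, band layout, and class definitions are not reproduced here.

    ```python
    import numpy as np

    def band_f_ratio(band_energy, labels):
        """Fisher F-ratio of one band's (log-)energy across speech-unit classes.

        band_energy : (N,) energies of the band over N labelled frames
        labels      : (N,) class label (e.g., phone identity) of each frame

        F = between-class variance / within-class variance; larger values mean
        the band separates the classes better. Illustrative only.
        """
        x = np.asarray(band_energy, dtype=float)
        labels = np.asarray(labels)
        classes = np.unique(labels)
        grand_mean = x.mean()
        between = np.mean([(x[labels == c].mean() - grand_mean) ** 2 for c in classes])
        within = np.mean([x[labels == c].var() for c in classes])
        return between / (within + 1e-12)

    # Toy example: a band whose energy differs systematically between two phones
    energy = np.concatenate([np.random.normal(1.0, 0.1, 100),
                             np.random.normal(2.0, 0.1, 100)])
    labels = np.array([0] * 100 + [1] * 100)
    print(band_f_ratio(energy, labels))
    ```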

  9. Exploration of Acoustic Features for Automatic Vowel Discrimination in Spontaneous Speech

    ERIC Educational Resources Information Center

    Tyson, Na'im R.

    2012-01-01

    In an attempt to understand what acoustic/auditory feature sets motivated transcribers towards certain labeling decisions, I built machine learning models that were capable of discriminating between canonical and non-canonical vowels excised from the Buckeye Corpus. Specifically, I wanted to model when the dictionary form and the transcribed-form…

  10. Time-Frequency Feature Representation Using Multi-Resolution Texture Analysis and Acoustic Activity Detector for Real-Life Speech Emotion Recognition

    PubMed Central

    Wang, Kun-Ching

    2015-01-01

    The classification of emotional speech is widely considered in speech-related research on human-computer interaction (HCI). The purpose of this paper is to present a novel feature extraction based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for the characterization and classification of different emotions in a speech signal. The motivation is that emotions have different intensity values in different frequency bands. In terms of human visual perception, the texture properties of the multi-resolution spectrogram of emotional speech should form a good feature set for emotion classification in speech. Furthermore, multi-resolution texture analysis can give clearer discrimination between emotions than uniform-resolution texture analysis. In order to provide high accuracy of emotional discrimination, especially in real-life conditions, an acoustic activity detection (AAD) algorithm is applied within the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally occurring dialogs recorded in real-life call centers. Compared with the traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and state-of-the-art features, the MRTII features improve the correct classification rates of the proposed systems across different language databases. Experimental results show that the proposed MRTII-based feature information, inspired by human visual perception of the spectrogram image, can provide significant classification for real-life emotion recognition in speech. PMID:25594590
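
    As a rough stand-in for the MRTII idea, the following sketch computes a speech spectrogram at several FFT resolutions and summarizes each one with simple texture-like contrast statistics. The actual MRTII descriptors are not specified in the abstract, so everything here (resolutions, statistics, names) is an illustrative assumption.

    ```python
    import numpy as np
    from scipy.signal import spectrogram

    def spectrogram_texture_features(x, sr, resolutions=(256, 512, 1024)):
        """Simple multi-resolution texture statistics of a speech spectrogram.

        For each FFT resolution, the log-spectrogram is treated as an image and
        summarized by contrast-like statistics (mean absolute differences along
        time and frequency). A schematic stand-in, not the paper's MRTII.
        """
        feats = []
        for nfft in resolutions:
            f, t, S = spectrogram(x, fs=sr, nperseg=nfft, noverlap=nfft // 2)
            img = np.log(S + 1e-10)
            feats.extend([
                np.mean(np.abs(np.diff(img, axis=1))),   # temporal contrast
                np.mean(np.abs(np.diff(img, axis=0))),   # spectral contrast
                img.std(),                               # overall dynamic range
            ])
        return np.array(feats)

    # Example: 1 s of noise at 16 kHz -> 9-dimensional texture feature vector
    print(spectrogram_texture_features(np.random.randn(16000), 16000).shape)
    ```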

  11. Speech recognition: Acoustic, phonetic and lexical

    NASA Astrophysics Data System (ADS)

    Zue, V. W.

    1985-10-01

    Our long-term research goal is the development and implementation of speaker-independent continuous speech recognition systems. It is our conviction that proper utilization of speech-specific knowledge is essential for advanced speech recognition systems. With this in mind, we have continued to make progress on the acquisition of acoustic-phonetic and lexical knowledge. We have completed the development of a continuous digit recognition system. The system was constructed to investigate the utilization of acoustic-phonetic knowledge in a speech recognition system. Significant developments of this study include a soft-failure procedure for lexical access and the discovery of a set of acoustic-phonetic features for verification. We have completed a study of the constraints provided by lexical stress on word recognition. We found that lexical stress information alone can, on the average, reduce the number of word candidates from a large dictionary by more than 80%. In conjunction with this study, we successfully developed a system that automatically determines the stress pattern of a word from the acoustic signal.

  12. Speech recognition: Acoustic, phonetic and lexical knowledge

    NASA Astrophysics Data System (ADS)

    Zue, V. W.

    1985-08-01

    During this reporting period we continued to make progress on the acquisition of acoustic-phonetic and lexical knowledge. We completed development of a continuous digit recognition system. The system was constructed to investigate the use of acoustic-phonetic knowledge in a speech recognition system. The significant achievements of this study include the development of a soft-failure procedure for lexical access and the discovery of a set of acoustic-phonetic features for verification. We completed a study of the constraints that lexical stress imposes on word recognition. We found that lexical stress information alone can, on the average, reduce the number of word candidates from a large dictionary by more than 80 percent. In conjunction with this study, we successfully developed a system that automatically determines the stress pattern of a word from the acoustic signal. We performed an acoustic study on the characteristics of nasal consonants and nasalized vowels. We have also developed recognition algorithms for nasal murmurs and nasalized vowels in continuous speech. We finished the preliminary development of a system that aligns a speech waveform with the corresponding phonetic transcription.

  13. Methods and apparatus for non-acoustic speech characterization and recognition

    SciTech Connect

    Holzrichter, J.F.

    1999-12-21

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well-defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated in each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  14. Methods and apparatus for non-acoustic speech characterization and recognition

    DOEpatents

    Holzrichter, John F.

    1999-01-01

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well-defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated in each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  15. Acoustics of Clear Speech: Effect of Instruction

    ERIC Educational Resources Information Center

    Lam, Jennifer; Tjaden, Kris; Wilding, Greg

    2012-01-01

    Purpose: This study investigated how different instructions for eliciting clear speech affected selected acoustic measures of speech. Method: Twelve speakers were audio-recorded reading 18 different sentences from the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1984). Sentences were produced in habitual, clear,…

  16. Acoustic Characteristics of Ataxic Speech in Japanese Patients with Spinocerebellar Degeneration (SCD)

    ERIC Educational Resources Information Center

    Ikui, Yukiko; Tsukuda, Mamoru; Kuroiwa, Yoshiyuki; Koyano, Shigeru; Hirose, Hajime; Taguchi, Takahide

    2012-01-01

    Background: In English- and German-speaking countries, ataxic speech is often described as showing scanning based on acoustic impressions. Although the term "scanning" is generally considered to represent abnormal speech features including prosodic excess or insufficiency, any precise acoustic analysis of ataxic speech has not been performed in…

  17. Near-Term Fetuses Process Temporal Features of Speech

    ERIC Educational Resources Information Center

    Granier-Deferre, Carolyn; Ribeiro, Aurelie; Jacquet, Anne-Yvonne; Bassereau, Sophie

    2011-01-01

    The perception of speech and music requires processing of variations in spectra and amplitude over different time intervals. Near-term fetuses can discriminate acoustic features, such as frequencies and spectra, but whether they can process complex auditory streams, such as speech sequences and more specifically their temporal variations, fast or…

  18. Acoustic assessment of speech privacy curtains in two nursing units

    PubMed Central

    Pope, Diana S.; Miller-Klein, Erik T.

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s' standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption and more compact, fragmented nursing unit floor plate shapes should be considered. PMID:26780959

  19. Acoustic assessment of speech privacy curtains in two nursing units.

    PubMed

    Pope, Diana S; Miller-Klein, Erik T

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s' standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption and more compact, fragmented nursing unit floor plate shapes should be considered. PMID:26780959

  20. Acoustic assessment of speech privacy curtains in two nursing units.

    PubMed

    Pope, Diana S; Miller-Klein, Erik T

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s' standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption and more compact, fragmented nursing unit floor plate shapes should be considered.

  1. Analog Acoustic Expression in Speech Communication

    ERIC Educational Resources Information Center

    Shintel, Hadas; Nusbaum, Howard C.; Okrent, Arika

    2006-01-01

    We present the first experimental evidence of a phenomenon in speech communication we call "analog acoustic expression." Speech is generally thought of as conveying information in two distinct ways: discrete linguistic-symbolic units such as words and sentences represent linguistic meaning, and continuous prosodic forms convey information about…

  2. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-08-08

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  3. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2004-03-23

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  4. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-02-14

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  5. Acoustic characteristics of listener-constrained speech

    NASA Astrophysics Data System (ADS)

    Ashby, Simone; Cummins, Fred

    2003-04-01

    Relatively little is known about the acoustical modifications speakers employ to meet the various constraints (auditory, linguistic and otherwise) of their listeners. Similarly, the manner by which perceived listener constraints interact with speakers' adoption of specialized speech registers is poorly understood. Hyper- and Hypo-articulation (H&H) theory offers a framework for examining the relationship between speech production and output-oriented goals for communication, suggesting that under certain circumstances speakers may attempt to minimize phonetic ambiguity by employing a "hyperarticulated" speaking style (Lindblom, 1990). It remains unclear, however, what the acoustic correlates of hyperarticulated speech are, and how, if at all, we might expect phonetic properties to change with respect to different listener-constrained conditions. This paper is part of a preliminary investigation concerned with comparing the prosodic characteristics of speech produced across a range of listener constraints. Analyses are drawn from a corpus of read hyperarticulated speech data comprising eight adult, female speakers of English. Specialized registers include speech to foreigners, infant-directed speech, speech produced under noisy conditions, and human-machine interaction. The authors gratefully acknowledge financial support of the Irish Higher Education Authority, allocated to Fred Cummins for collaborative work with Media Lab Europe.

  6. Acoustic Evidence for Phonologically Mismatched Speech Errors

    ERIC Educational Resources Information Center

    Gormley, Andrea

    2015-01-01

    Speech errors are generally said to accommodate to their new phonological context. This accommodation has been validated by several transcription studies. The transcription methodology is not the best choice for detecting errors at this level, however, as this type of error can be difficult to perceive. This paper presents an acoustic analysis of…

  7. Perceptual centres in speech - an acoustic analysis

    NASA Astrophysics Data System (ADS)

    Scott, Sophie Kerttu

    Perceptual centres, or P-centres, represent the perceptual moments of occurrence of acoustic signals, the 'beat' of a sound. P-centres underlie the perception and production of rhythm in perceptually regular speech sequences. P-centres have been modelled both in speech and non-speech (music) domains. The three aims of this thesis were: (a) to test current P-centre models to determine which best accounted for the experimental data; (b) to identify a candidate parameter to map P-centres onto (a local approach), as opposed to the previous global models, which rely upon the whole signal to determine the P-centre; and (c) to develop a model of P-centre location which could be applied to speech and non-speech signals. The first aim was investigated by a series of experiments examining (a) whether different models could account for variation between speakers, (b) whether rendering the amplitude-time plot of a speech signal affects the P-centre of the signal, and (c) whether increasing the amplitude at the offset of a speech signal alters P-centres in the production and perception of speech. The second aim was carried out by (a) manipulating the rise time of different speech signals to determine whether the P-centre was affected, and whether the type of speech sound ramped affected the P-centre shift; (b) manipulating the rise time and decay time of a synthetic vowel to determine whether the onset alteration had more effect on the P-centre than the offset manipulation; and (c) determining whether the duration of a vowel affected the P-centre if other attributes (amplitude, spectral content) were held constant. The third aim, modelling P-centres, was based on these results. The Frequency-dependent Amplitude Increase Model of P-centre location (FAIM) was developed using a modelling protocol, the APU GammaTone Filterbank and the speech from different speakers. The P-centres of the stimuli corpus were highly predicted by attributes of

  8. Clinical investigation of speech signal features among patients with schizophrenia

    PubMed Central

    ZHANG, Jing; PAN, Zhongde; GUI, Chao; CUI, Donghong

    2016-01-01

    Background A new area of interest in the search for biomarkers for schizophrenia is the study of the acoustic parameters of speech called 'speech signal features'. Several of these features have been shown to be related to emotional responsiveness, a characteristic that is notably restricted in patients with schizophrenia, particularly those with prominent negative symptoms. Aim Assess the relationship of selected acoustic parameters of speech to the severity of clinical symptoms in patients with chronic schizophrenia and compare these characteristics between patients and matched healthy controls. Methods Ten speech signal features-six prosody features, formant bandwidth and amplitude, and two spectral features-were assessed using 15-minute speech samples obtained by smartphone from 26 inpatients with chronic schizophrenia (at enrollment and 1 week later) and from 30 healthy controls (at enrollment only). Clinical symptoms of the patients were also assessed at baseline and 1 week later using the Positive and Negative Syndrome Scale, the Scale for the Assessment of Negative Symptoms, and the Clinical Global Impression-Schizophrenia scale. Results In the patient group the symptoms were stable over the 1-week interval and the 1-week test-retest reliability of the 10 speech features was good (intraclass correlation coefficients [ICC] ranging from 0.55 to 0.88). Comparison of the speech features between patients and controls found no significant differences in the six prosody features or in the formant bandwidth and amplitude features, but the two spectral features were different: the Mel-frequency cepstral coefficient (MFCC) scores were significantly lower in the patient group than in the control group, and the linear prediction coding (LPC) scores were significantly higher in the patient group than in the control group. Within the patient group, 10 of the 170 associations between the 10 speech features considered and the 17 clinical parameters considered were

  9. Clinical investigation of speech signal features among patients with schizophrenia

    PubMed Central

    ZHANG, Jing; PAN, Zhongde; GUI, Chao; CUI, Donghong

    2016-01-01

    Background A new area of interest in the search for biomarkers for schizophrenia is the study of the acoustic parameters of speech called 'speech signal features'. Several of these features have been shown to be related to emotional responsiveness, a characteristic that is notably restricted in patients with schizophrenia, particularly those with prominent negative symptoms. Aim Assess the relationship of selected acoustic parameters of speech to the severity of clinical symptoms in patients with chronic schizophrenia and compare these characteristics between patients and matched healthy controls. Methods Ten speech signal features-six prosody features, formant bandwidth and amplitude, and two spectral features-were assessed using 15-minute speech samples obtained by smartphone from 26 inpatients with chronic schizophrenia (at enrollment and 1 week later) and from 30 healthy controls (at enrollment only). Clinical symptoms of the patients were also assessed at baseline and 1 week later using the Positive and Negative Syndrome Scale, the Scale for the Assessment of Negative Symptoms, and the Clinical Global Impression-Schizophrenia scale. Results In the patient group the symptoms were stable over the 1-week interval and the 1-week test-retest reliability of the 10 speech features was good (intraclass correlation coefficients [ICC] ranging from 0.55 to 0.88). Comparison of the speech features between patients and controls found no significant differences in the six prosody features or in the formant bandwidth and amplitude features, but the two spectral features were different: the Mel-frequency cepstral coefficient (MFCC) scores were significantly lower in the patient group than in the control group, and the linear prediction coding (LPC) scores were significantly higher in the patient group than in the control group. Within the patient group, 10 of the 170 associations between the 10 speech features considered and the 17 clinical parameters considered were

  10. Emotion Identification Using Extremely Low Frequency Components of Speech Feature Contours

    PubMed Central

    Lin, Chang-Hong; Liao, Wei-Kai; Hsieh, Wen-Chi; Liao, Wei-Jiun

    2014-01-01

    The investigation of emotional speech identification can be divided into two main parts, features and classifiers. In this paper, how to extract an effective speech feature set for emotional speech identification is addressed. In our speech feature set, we use not only statistical analysis of frame-based acoustical features, but also approximated speech feature contours, which are obtained by extracting extremely low frequency components from the speech feature contours. Furthermore, principal component analysis (PCA) is applied to the approximated speech feature contours so that an efficient representation of the approximated contours can be derived. The proposed speech feature set is fed into support vector machines (SVMs) to perform multiclass emotion identification. The experimental results demonstrate the performance of the proposed system with an 82.26% identification rate. PMID:24982991
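
    A schematic version of that pipeline, low-pass filtering each per-frame feature contour to keep only its very-low-frequency trend, reducing the contours with PCA, and feeding the result to an SVM, is sketched below. The cutoff frequency, frame rate, contour length, and toy labels are assumptions, not the paper's settings.

    ```python
    import numpy as np
    from scipy.signal import butter, filtfilt
    from sklearn.decomposition import PCA
    from sklearn.svm import SVC

    def approximate_contour(contour, frame_rate=100.0, cutoff_hz=1.0):
        """Keep only the very-low-frequency trend of a per-frame feature contour
        (e.g., an F0 or energy track sampled at 100 frames/s). Filter order and
        cutoff are illustrative choices."""
        b, a = butter(2, cutoff_hz / (frame_rate / 2.0), btype='low')
        return filtfilt(b, a, contour)

    # Schematic pipeline: smoothed contours -> fixed-length vectors -> PCA -> SVM
    rng = np.random.default_rng(0)
    X = np.stack([approximate_contour(rng.standard_normal(200)) for _ in range(60)])
    y = rng.integers(0, 4, size=60)                 # four emotion classes (toy labels)
    X_reduced = PCA(n_components=8).fit_transform(X)
    clf = SVC(kernel='rbf').fit(X_reduced, y)
    print(clf.predict(X_reduced[:5]))
    ```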

  11. Segment-based acoustic models for continuous speech recognition

    NASA Astrophysics Data System (ADS)

    Ostendorf, Mari; Rohlicek, J. R.

    1994-02-01

    In this work, we are interested in the problem of large-vocabulary, speaker-independent continuous speech recognition, and primarily in the acoustic modeling component of this problem. In developing acoustic models for speech recognition, we have conflicting goals. On one hand, the models should be robust to inter- and intra-speaker variability, to the use of a different vocabulary in recognition than in training, and to the effects of moderately noisy environments. In order to accomplish this, we need to model gross features and global trends. On the other hand, the models must be sensitive and detailed enough to detect fine acoustic differences between similar words in a large vocabulary task. To answer these opposing demands requires improvements in acoustic modeling at several levels: the frame level (e.g., signal processing), the phoneme level (e.g., modeling feature dynamics), and the utterance level (e.g., defining a structural context for representing the intra-utterance dependence across phonemes). This project addresses the problem of acoustic modeling, specifically focusing on modeling at the segment level and above.

  12. Acoustic evidence for phonologically mismatched speech errors.

    PubMed

    Gormley, Andrea

    2015-04-01

    Speech errors are generally said to accommodate to their new phonological context. This accommodation has been validated by several transcription studies. The transcription methodology is not the best choice for detecting errors at this level, however, as this type of error can be difficult to perceive. This paper presents an acoustic analysis of speech errors that uncovers non-accommodated or mismatch errors. A mismatch error is a sub-phonemic error that results in an incorrect surface phonology. This type of error could arise during the processing of phonological rules, or it could be made at the motor level of implementation. The results of this work have important implications for both experimental and theoretical research. For experimentalists, it validates the tools used for error induction and the acoustic determination of errors free of perceptual bias. For theorists, this methodology can be used to test the nature of the processes proposed in language production.

  13. Acoustic analysis of speech under stress.

    PubMed

    Sondhi, Savita; Khan, Munna; Vijay, Ritu; Salhan, Ashok K; Chouhan, Satish

    2015-01-01

    When a person is emotionally charged, stress can be discerned in his or her voice. This paper presents a simplified and non-invasive approach to detecting psycho-physiological stress by monitoring the acoustic modifications during a stressful conversation. The voice database consists of audio clips from eight different popular FM broadcasts wherein the host of the show vexes the subjects, who are otherwise unaware of the charade. The audio clips are obtained from real-life stressful conversations (no simulated emotions). Analysis is done using the PRAAT software to evaluate mean fundamental frequency (F0) and formant frequencies (F1, F2, F3, F4) in both the neutral and stressed states. Results suggest that F0 increases with stress, whereas formant frequencies decrease with stress. Comparison of Fourier and chirp spectra of short vowel segments shows that for relaxed speech the two spectra are similar, whereas for stressed speech they differ in the high frequency range due to increased pitch modulation. PMID:26558301
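
    For illustration, the kind of frame-level F0 measurement being compared between neutral and stressed speech can be approximated with a simple autocorrelation pitch estimator. The study itself uses PRAAT; the sketch below is only a self-contained stand-in, and its search range and frame length are arbitrary choices.

    ```python
    import numpy as np

    def estimate_f0(frame, sr, fmin=75.0, fmax=400.0):
        """Crude autocorrelation-based F0 estimate for one voiced frame.

        Finds the strongest autocorrelation peak within a plausible pitch-period
        range; illustrative only, not the PRAAT algorithm used in the study.
        """
        x = frame - frame.mean()
        ac = np.correlate(x, x, mode='full')[len(x) - 1:]
        lag_min = int(sr / fmax)
        lag_max = int(sr / fmin)
        lag = lag_min + np.argmax(ac[lag_min:lag_max])
        return sr / lag

    # Example: a 200 Hz synthetic vowel-like frame at 16 kHz
    sr = 16000
    t = np.arange(0, 0.04, 1 / sr)
    frame = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 400 * t)
    print(round(estimate_f0(frame, sr), 1))   # ~200.0
    ```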

  14. System And Method For Characterizing Voiced Excitations Of Speech And Acoustic Signals, Removing Acoustic Noise From Speech, And Synthesizi

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-04-25

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  15. Acoustic characteristics of Korean stops in Korean child-directed speech

    NASA Astrophysics Data System (ADS)

    Kim, Minjung; Stoel-Gammon, Carol

    2005-04-01

    A variety of cross-linguistic studies have documented that the acoustic properties of speech addressed to young children include exaggeration of pitch contours and acoustically salient features of phonetic units. It has been suggested that phonetic modifications of child-directed speech facilitate young children's speech perception by providing detailed phonetic information about the target word. While there are several studies reporting vowel modifications in speech to infants (i.e., hyper-articulated vowels), there has been relatively little research on consonant modifications in speech to young children (except for VOT). The present study examines acoustic properties of Korean stops in Korean mothers' speech to their children aged 29 to 38 months (N=6). Korean tense, lax, and aspirated stops are all voiceless in word-initial position, and are perceptually differentiated by several acoustic parameters including VOT, f0 of the following vowel, and the amplitude difference of the first and second harmonics at the voice onset of the following vowel. This study compares values of these parameters in Korean motherese to those in speech to adult Koreans from the same speakers. Results focus on the acoustic properties of Korean stops in child-directed speech and how they are modified to help Korean young children learn the three-way phonetic contrast.

  16. Acoustic Study of Acted Emotions in Speech

    NASA Astrophysics Data System (ADS)

    Wang, Rong

    An extensive set of carefully recorded utterances provided a speech database for investigating acoustic correlates among eight emotional states. Four professional actors and four professional actresses simulated the emotional states of joy, conversation, nervousness, anger, sadness, hate, fear, and depression. The values of 14 acoustic parameters were extracted from analyses of the simulated portrayals. Normalization of the parameters was made to reduce the talker-dependence. Correlates of emotion were investigated by means of principal components analysis. Sadness and depression were found to be "ambiguous" with respect to each other, but "unique" with respect to joy and anger in the principal components space. Joy, conversation, nervousness, anger, hate, and fear did not separate well in the space and so exhibited ambiguity with respect to one another. The different talkers expressed joy, anger, sadness, and depression more consistently than the other four emotions. The analysis results were compared with the results of a subjective study using the same speech database and considerable consistency between the two was found.

  17. Acoustic differences among casual, conversational, and read speech

    NASA Astrophysics Data System (ADS)

    Pinnow, DeAnna

    Speech is a complex behavior that allows speakers to use many variations to satisfy the demands connected with multiple speaking environments. Speech research typically obtains speech samples in a controlled laboratory setting using read material, yet anecdotal observations of such speech, particularly from talkers with a speech and language impairment, have identified a "performance" effect in the produced speech which masks the characteristics of impaired speech outside of the lab (Goberman, Recker, & Parveen, 2010). The aim of the current study was to investigate acoustic differences among laboratory read, laboratory conversational, and casual speech through well-defined speech tasks in the laboratory and in talkers' natural environments. Eleven healthy research participants performed lab recording tasks (19 read sentences and a dialogue about their life) and collected natural-environment recordings of themselves over 3-day periods using portable recorders. Segments were analyzed for articulatory, voice, and prosodic acoustic characteristics using computer software and hand counting. The current study results indicate that lab-read speech was significantly different from casual speech: greater articulation range, improved voice quality measures, lower speech rate, and lower mean pitch. One implication of the results is that different laboratory techniques may be beneficial in obtaining speech samples that are more like casual speech, thus making it easier to correctly analyze abnormal speech characteristics with fewer errors.

  18. Acoustics in Halls for Speech and Music

    NASA Astrophysics Data System (ADS)

    Gade, Anders C.

    This chapter deals specifically with concepts, tools, and architectural variables of importance when designing auditoria for speech and music. The focus will be on cultivating the useful components of the sound in the room rather than on avoiding noise from outside or from installations, which is dealt with in Chap. 11. The chapter starts by presenting the subjective aspects of the room acoustic experience according to consensus at the time of writing. Then follows a description of their objective counterparts, the objective room acoustic parameters, among which the classical reverberation time measure is only one of many, but still of fundamental value. After explanations on how these parameters can be measured and predicted during the design phase, the remainder of the chapter deals with how the acoustic properties can be controlled by the architectural design of auditoria. This is done by presenting the influence of individual design elements as well as brief descriptions of halls designed for specific purposes, such as drama, opera, and symphonic concerts. Finally, some important aspects of loudspeaker installations in auditoria are briefly touched upon.
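
    As a reminder of the classical measure referred to above, Sabine's formula estimates a room's reverberation time from its volume and total absorption. The sketch below is a textbook illustration with made-up room data, not material from the chapter itself.

    ```python
    def sabine_rt60(volume_m3, surfaces):
        """Sabine's reverberation-time estimate, T60 = 0.161 * V / A (SI units).

        surfaces: list of (area_m2, absorption_coefficient) pairs.
        Real auditorium design uses far more detailed prediction and measurement.
        """
        A = sum(area * alpha for area, alpha in surfaces)   # total absorption (metric sabins)
        return 0.161 * volume_m3 / A

    # Example: a small lecture room (illustrative absorption coefficients)
    room = [(200.0, 0.10),   # walls: plaster
            (80.0, 0.70),    # ceiling: acoustic tile
            (80.0, 0.05)]    # floor: linoleum
    print(round(sabine_rt60(500.0, room), 2), "s")   # ~1.01 s
    ```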

  19. The acoustic properties of bilingual infant-directed speech.

    PubMed

    Danielson, D Kyle; Seidl, Amanda; Onishi, Kristine H; Alamian, Golnoush; Cristia, Alejandrina

    2014-02-01

    Does the acoustic input for bilingual infants equal the conjunction of the input heard by monolinguals of each separate language? The present letter tackles this question, focusing on maternal speech addressed to 11-month-old infants, on the cusp of perceptual attunement. The acoustic characteristics of the point vowels /a,i,u/ were measured in the spontaneous infant-directed speech of French-English bilingual mothers, as well as in the speech of French and English monolingual mothers. Bilingual caregivers produced their two languages with acoustic prosodic separation equal to that of the monolinguals, while also conveying distinct spectral characteristics of the point vowels in their two languages.

  20. A Comprehensive Noise Robust Speech Parameterization Algorithm Using Wavelet Packet Decomposition-Based Denoising and Speech Feature Representation Techniques

    NASA Astrophysics Data System (ADS)

    Kotnik, Bojan; Kačič, Zdravko

    2007-12-01

    This paper concerns the problem of automatic speech recognition in noise-intense and adverse environments. The main goal of the proposed work is the definition, implementation, and evaluation of a novel noise robust speech signal parameterization algorithm. The proposed procedure is based on time-frequency speech signal representation using wavelet packet decomposition. A new modified soft thresholding algorithm based on time-frequency adaptive threshold determination was developed to efficiently reduce the level of additive noise in the input noisy speech signal. A two-stage Gaussian mixture model (GMM)-based classifier was developed to perform speech/nonspeech as well as voiced/unvoiced classification. The adaptive topology of the wavelet packet decomposition tree based on voiced/unvoiced detection was introduced to separately analyze voiced and unvoiced segments of the speech signal. The main feature vector consists of a combination of log-root compressed wavelet packet parameters, and autoregressive parameters. The final output feature vector is produced using a two-staged feature vector postprocessing procedure. In the experimental framework, the noisy speech databases Aurora 2 and Aurora 3 were applied together with corresponding standardized acoustical model training/testing procedures. The automatic speech recognition performance achieved using the proposed noise robust speech parameterization procedure was compared to the standardized mel-frequency cepstral coefficient (MFCC) feature extraction procedures ETSI ES 201 108 and ETSI ES 202 050.
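
    The denoising stage can be illustrated with a generic wavelet-packet soft-thresholding routine. This sketch assumes the PyWavelets package and applies a single universal threshold; the paper's algorithm uses time-frequency adaptive thresholds and a voiced/unvoiced-dependent tree topology, which are not reproduced here.

    ```python
    import numpy as np
    import pywt

    def wp_soft_denoise(x, wavelet='db4', level=4):
        """Wavelet-packet soft-threshold denoising of a noisy speech signal.

        Generic sketch: decompose, soft-threshold every leaf node with one
        universal threshold, and reconstruct. Illustrative only.
        """
        wp = pywt.WaveletPacket(data=x, wavelet=wavelet, mode='symmetric',
                                maxlevel=level)
        nodes = wp.get_level(level, order='natural')
        # Noise estimate from the median absolute deviation of one detail subband
        sigma = np.median(np.abs(nodes[-1].data)) / 0.6745
        thr = sigma * np.sqrt(2.0 * np.log(len(x)))
        denoised_wp = pywt.WaveletPacket(data=None, wavelet=wavelet, mode='symmetric')
        for node in nodes:
            denoised_wp[node.path] = pywt.threshold(node.data, thr, mode='soft')
        y = denoised_wp.reconstruct(update=False)
        return y[:len(x)]                     # reconstruction may be padded

    # Example: denoise a synthetic noisy tone
    sr = 8000
    t = np.arange(0, 0.5, 1 / sr)
    noisy = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.randn(len(t))
    print(wp_soft_denoise(noisy).shape)
    ```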

  1. Clear Speech Variants: An Acoustic Study in Parkinson's Disease

    ERIC Educational Resources Information Center

    Lam, Jennifer; Tjaden, Kris

    2016-01-01

    Purpose: The authors investigated how different variants of clear speech affect segmental and suprasegmental acoustic measures of speech in speakers with Parkinson's disease and a healthy control group. Method: A total of 14 participants with Parkinson's disease and 14 control participants served as speakers. Each speaker produced 18 different…

  2. Acoustic Markers of Prosodic Boundaries in Spanish Spontaneous Alaryngeal Speech

    ERIC Educational Resources Information Center

    Cuenca, M. H.; Barrio, M. M.

    2010-01-01

    Prosodic information aids segmentation of the continuous speech signal and thereby facilitates auditory speech processing. Durational and pitch variations are prosodic cues especially necessary to convey prosodic boundaries, but alaryngeal speakers have inconsistent control over acoustic parameters such as F0 and duration, being as a result noisy…

  3. Study of acoustic correlates associate with emotional speech

    NASA Astrophysics Data System (ADS)

    Yildirim, Serdar; Lee, Sungbok; Lee, Chul Min; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Ebrahim; Narayanan, Shrikanth

    2004-10-01

    This study investigates the acoustic characteristics of four different emotions expressed in speech. The aim is to obtain detailed acoustic knowledge on how a speech signal is modulated by changes from neutral to a certain emotional state. Such knowledge is necessary for automatic emotion recognition and classification and for emotional speech synthesis. Speech data obtained from two semi-professional actresses are analyzed and compared. Each subject produces 211 sentences with four different emotions: neutral, sad, angry, and happy. We analyze changes in temporal and acoustic parameters such as magnitude and variability of segmental duration, fundamental frequency and the first three formant frequencies as a function of emotion. Acoustic differences among the emotions are also explored with mutual information computation, multidimensional scaling and acoustic likelihood comparison with normal speech. Results indicate that speech associated with anger and happiness is characterized by longer duration, shorter interword silence, and higher pitch and rms energy with wider ranges. Sadness is distinguished from other emotions by lower rms energy and longer interword silence. Interestingly, the difference in formant pattern between [happiness/anger] and [neutral/sadness] is better reflected in back vowels such as /a/ (as in 'father') than in front vowels. Detailed results on intra- and interspeaker variability will be reported.

  4. Optimizing acoustical conditions for speech intelligibility in classrooms

    NASA Astrophysics Data System (ADS)

    Yang, Wonyoung

    High speech intelligibility is imperative in classrooms where verbal communication is critical. However, the optimal acoustical conditions to achieve a high degree of speech intelligibility have previously been investigated with inconsistent results, and practical room-acoustical solutions to optimize the acoustical conditions for speech intelligibility have not been developed. This experimental study validated auralization for speech-intelligibility testing, investigated the optimal reverberation for speech intelligibility for both normal and hearing-impaired listeners using more realistic room-acoustical models, and proposed an optimal sound-control design for speech intelligibility based on the findings. The auralization technique was used to perform subjective speech-intelligibility tests. The validation study, comparing auralization results with those of real classroom speech-intelligibility tests, found that if the room to be auralized is not very absorptive or noisy, speech-intelligibility tests using auralization are valid. The speech-intelligibility tests were done in two different auralized sound fields---approximately diffuse and non-diffuse---using the Modified Rhyme Test and both normal and hearing-impaired listeners. A hybrid room-acoustical prediction program was used throughout the work, and it and a 1/8 scale-model classroom were used to evaluate the effects of ceiling barriers and reflectors. For both subject groups, in approximately diffuse sound fields, when the speech source was closer to the listener than the noise source, the optimal reverberation time was zero. When the noise source was closer to the listener than the speech source, the optimal reverberation time was 0.4 s (with another peak at 0.0 s) with relative output power levels of the speech and noise sources SNS = 5 dB, and 0.8 s with SNS = 0 dB. In non-diffuse sound fields, when the noise source was between the speaker and the listener, the optimal reverberation time was 0.6 s with

  5. Evaluating a topographical mapping from speech acoustics to tongue positions

    SciTech Connect

    Hogden, J.; Heard, M.

    1995-05-01

    The continuity mapping algorithm, a procedure for learning to recover the relative positions of the articulators from speech signals, is evaluated using human speech data. The advantage of continuity mapping is that it is an unsupervised algorithm; that is, it can potentially be trained to make a mapping from speech acoustics to speech articulation without articulator measurements. The procedure starts by vector quantizing short windows of a speech signal so that each window is represented (encoded) by a single number. Next, multidimensional scaling is used to map quantization codes that were temporally close in the encoded speech to nearby points in a continuity map. Since speech sounds produced sufficiently close together in time must have been produced by similar articulator configurations, and speech sounds produced close together in time are close to each other in the continuity map, sounds produced by similar articulator positions should be mapped to similar positions in the continuity map. The data set used for evaluating the continuity mapping algorithm is comprised of simultaneously collected articulator and acoustic measurements made using an electromagnetic midsagittal articulometer on a human subject. Comparisons between measured articulator positions and those recovered using continuity mapping will be presented.
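
    The two stages described above, vector quantization of short spectral windows followed by multidimensional scaling over a temporal-adjacency dissimilarity, can be sketched with standard scikit-learn components. The dissimilarity definition and all parameter values below are illustrative assumptions; the abstract does not specify them.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.manifold import MDS

    def continuity_map(frames, n_codes=64, n_dims=2, random_state=0):
        """Schematic continuity mapping.

        1) Vector-quantize short-time frames so each frame gets a code.
        2) Build a dissimilarity between codes from how rarely they occur in
           adjacent frames (temporal neighbours assumed articulatorily close).
        3) Use multidimensional scaling to place codes in a low-dimensional map.
        """
        codes = KMeans(n_clusters=n_codes, n_init=10,
                       random_state=random_state).fit_predict(frames)
        counts = np.ones((n_codes, n_codes))            # +1 smoothing
        for a, b in zip(codes[:-1], codes[1:]):
            counts[a, b] += 1
            counts[b, a] += 1
        dissim = 1.0 / counts                           # frequent neighbours -> small distance
        np.fill_diagonal(dissim, 0.0)
        mds = MDS(n_components=n_dims, dissimilarity='precomputed',
                  random_state=random_state)
        return codes, mds.fit_transform(dissim)

    # Example with random "spectral" frames
    codes, cmap = continuity_map(np.random.randn(2000, 20))
    print(cmap.shape)   # (64, 2)
    ```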

  6. Investigation of the optimum acoustical conditions for speech using auralization

    NASA Astrophysics Data System (ADS)

    Yang, Wonyoung; Hodgson, Murray

    2001-05-01

    Speech intelligibility is mainly affected by reverberation and by signal-to-noise level difference, the difference between the speech-signal and background-noise levels at a receiver. An important question for the design of rooms for speech (e.g., classrooms) is, what are the optimal values of these factors? This question has been studied experimentally and theoretically. Experimental studies found zero optimal reverberation time, but theoretical predictions found nonzero reverberation times. These contradictory results are partly caused by the different ways of accounting for background noise. Background noise sources and their locations inside the room are the most detrimental factors in speech intelligibility. However, noise levels also interact with reverberation in rooms. In this project, two major room-acoustical factors for speech intelligibility were controlled using speech and noise sources of known relative output levels located in a virtual room with known reverberation. Speech intelligibility test signals were played in the virtual room and auralized for listeners. The Modified Rhyme Test (MRT) and babble noise were used to measure subjective speech intelligibility quality. Optimal reverberation times, and the optimal values of other speech intelligibility metrics, for normal-hearing people and for hard-of-hearing people, were identified and compared.

  7. Acoustic Speech Analysis Of Wayang Golek Puppeteer

    NASA Astrophysics Data System (ADS)

    Hakim, Faisal Abdul; Mandasari, Miranti Indar; Sarwono, Joko

    2010-12-01

    Actively disguised speech is one problem to be taken into account in forensic speaker verification and identification processes. The verification process is usually carried out by comparing unknown samples with known samples, and active disguising can occur in both. To simulate the conditions of speech disguising, voices of Wayang Golek puppeteers were used. A wayang golek puppeteer is assumed to be a master of disguise who can manipulate his voice into the voices of many different characters. This paper discusses the speech characteristics of two puppeteers, comparing each puppeteer's habitual voice with his manipulated voices.

  8. EEG oscillations entrain their phase to high-level features of speech sound.

    PubMed

    Zoefel, Benedikt; VanRullen, Rufin

    2016-01-01

    Phase entrainment of neural oscillations, the brain's adjustment to rhythmic stimulation, is a central component in recent theories of speech comprehension: the alignment between brain oscillations and speech sound improves speech intelligibility. However, phase entrainment to everyday speech sound could also be explained by oscillations passively following the low-level periodicities (e.g., in sound amplitude and spectral content) of auditory stimulation-and not by an adjustment to the speech rhythm per se. Recently, using novel speech/noise mixture stimuli, we have shown that behavioral performance can entrain to speech sound even when high-level features (including phonetic information) are not accompanied by fluctuations in sound amplitude and spectral content. In the present study, we report that neural phase entrainment might underlie our behavioral findings. We observed phase-locking between electroencephalogram (EEG) and speech sound in response not only to original (unprocessed) speech but also to our constructed "high-level" speech/noise mixture stimuli. Phase entrainment to original speech and speech/noise sound did not differ in the degree of entrainment, but rather in the actual phase difference between EEG signal and sound. Phase entrainment was not abolished when speech/noise stimuli were presented in reverse (which disrupts semantic processing), indicating that acoustic (rather than linguistic) high-level features play a major role in the observed neural entrainment. Our results provide further evidence for phase entrainment as a potential mechanism underlying speech processing and segmentation, and for the involvement of high-level processes in the adjustment to the rhythm of speech.

  9. Music, Speech and Hearing: A Course in Descriptive Acoustics.

    ERIC Educational Resources Information Center

    Strong, William J.; Dudley, J. Duane

    A general undergraduate education course in descriptive acoustics for students in music, speech, and environmental disciplines is described. Student background, course philosophy, and course format are presented. Resource materials, which include books, a study guide, films, audio tapes, demonstrations, walk-in laboratories, and an examination…

  10. Acoustical conditions for speech communication in active elementary school classrooms

    NASA Astrophysics Data System (ADS)

    Sato, Hiroshi; Bradley, John

    2005-04-01

    Detailed acoustical measurements were made in 34 active elementary school classrooms with typical rectangular room shape in schools near Ottawa, Canada. There was an average of 21 students per classroom. The measurements were made to obtain accurate indications of the acoustical quality of conditions for speech communication during actual teaching activities. Mean speech and noise levels were determined from the distribution of recorded sound levels and the average speech-to-noise ratio was 11 dBA. Measured mid-frequency reverberation times (RT) during the same occupied conditions varied from 0.3 to 0.6 s, and were a little less than for the unoccupied rooms. RT values were not related to noise levels. Octave band speech and noise levels, useful-to-detrimental ratios, and Speech Transmission Index values were also determined. Key results included: (1) the average vocal effort of teachers corresponded to a level louder than Pearsons' raised voice level; (2) teachers increase their voice level to overcome ambient noise; (3) effective speech levels can be enhanced by up to 5 dB by early reflection energy; and (4) student activity is seen to be the dominant noise source, increasing average noise levels by up to 10 dBA during teaching activities. [Work supported by CLLRnet.]

  11. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2002-01-01

    Low power EM waves are used to detect motions of vocal tract tissues of the human speech system before, during, and after voiced speech. A voiced excitation function is derived. The excitation function provides speech production information to enhance speech characterization and to enable noise removal from human speech.

  12. Speech Intelligibility Advantages using an Acoustic Beamformer Display

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Sunder, Kaushik; Godfroy, Martine; Otto, Peter

    2015-01-01

    A speech intelligibility test conforming to the Modified Rhyme Test of ANSI S3.2 "Method for Measuring the Intelligibility of Speech Over Communication Systems" was conducted using a prototype 12-channel acoustic beamformer system. The target speech material (signal) was identified against speech babble (noise), with calculated signal-noise ratios of 0, 5 and 10 dB. The signal was delivered at a fixed beam orientation of 135 deg (re 90 deg as the frontal direction of the array) and the noise at 135 deg (co-located) and 0 deg (separated). A significant improvement in intelligibility from 57% to 73% was found for spatial separation for the same signal-noise ratio (0 dB). Significant effects for improved intelligibility due to spatial separation were also found for higher signal-noise ratios (5 and 10 dB).

  13. RAPID ACOUSTIC PROCESSING IN THE AUDITORY BRAINSTEM IS NOT RELATED TO CORTICAL ASYMMETRY FOR THE SYLLABLE RATE OF SPEECH

    PubMed Central

    Abrams, Daniel A.; Nicol, Trent; Zecker, Steven; Kraus, Nina

    2010-01-01

    Objective Temporal acuity in the auditory brainstem is correlated with left-dominant patterns of cortical asymmetry for processing rapid speech-sound stimuli. Here we investigate whether a similar relationship exists between brainstem processing of rapid speech components and cortical processing of syllable patterns in speech. Methods We measured brainstem and cortical evoked potentials in response to speech tokens in 23 children. We used established measures of auditory brainstem and cortical activity to examine functional relationships between these structures. Results We found no relationship between brainstem responses to fast acoustic elements of speech and right-dominant cortical processing of syllable patterns. Conclusions Brainstem processing of rapid elements in speech is not functionally related to rightward cortical asymmetry associated with the processing of syllable-rate features in speech. Viewed together with previous evidence linking brainstem timing with leftward cortical asymmetry for faster acoustic features, findings support the existence of distinct mechanisms for encoding rapid vs. slow elements of speech. Significance Results provide a fundamental advance in our knowledge of the segregation of subcortical input associated with cortical asymmetries for acoustic rate processing in the human auditory system. Implications of these findings for auditory perception, reading ability and development are discussed. PMID:20378402

  14. The Effects of Noise on Speech Recognition in Cochlear Implant Subjects: Predictions and Analysis Using Acoustic Models

    NASA Astrophysics Data System (ADS)

    Remus, Jeremiah J.; Collins, Leslie M.

    2005-12-01

    Cochlear implants can provide partial restoration of hearing, even with limited spectral resolution and loss of fine temporal structure, to severely deafened individuals. Studies have indicated that background noise has significant deleterious effects on the speech recognition performance of cochlear implant patients. This study investigates the effects of noise on speech recognition using acoustic models of two cochlear implant speech processors and several predictive signal-processing-based analyses. The results of a listening test for vowel and consonant recognition in noise are presented and analyzed using the rate of phonemic feature transmission for each acoustic model. Three methods for predicting patterns of consonant and vowel confusion that are based on signal processing techniques calculating a quantitative difference between speech tokens are developed and tested using the listening test results. Results of the listening test and confusion predictions are discussed in terms of comparisons between acoustic models and confusion prediction performance.
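
    Acoustic models of cochlear implant processors for normal-hearing listeners are commonly built as noise-excited channel vocoders. The sketch below is a generic illustration under that assumption, not the specific processor models used in the study; channel count, frequency range, envelope cutoff, and a sampling rate of at least 16 kHz are illustrative assumptions.

    ```python
    # Generic noise-band vocoder: bandpass analysis, envelope extraction, and
    # envelope-modulated noise carriers, summed across channels.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def noise_vocoder(x, fs, n_channels=8, f_lo=200.0, f_hi=7000.0, env_cut=160.0):
        edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # log-spaced channel edges
        env_sos = butter(4, env_cut / (fs / 2), btype="low", output="sos")
        rng = np.random.default_rng(0)
        out = np.zeros(len(x), dtype=float)
        for lo, hi in zip(edges[:-1], edges[1:]):
            band_sos = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band", output="sos")
            band = sosfiltfilt(band_sos, x)                            # analysis band
            env = np.clip(sosfiltfilt(env_sos, np.abs(band)), 0, None) # temporal envelope
            carrier = sosfiltfilt(band_sos, rng.standard_normal(len(x)))  # noise carrier
            out += env * carrier                                       # modulated noise band
        return out / (np.max(np.abs(out)) + 1e-12)
    ```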

  15. Speech recognition: Acoustic-phonetic knowledge acquisition and representation

    NASA Astrophysics Data System (ADS)

    Zue, Victor W.

    1988-09-01

    The long-term research goal is to develop and implement speaker-independent continuous speech recognition systems. It is believed that the proper utilization of speech-specific knowledge is essential for such advanced systems. This research is thus directed toward the acquisition, quantification, and representation of acoustic-phonetic and lexical knowledge, and the application of this knowledge to speech recognition algorithms. In addition, we are exploring new speech recognition alternatives based on artificial intelligence and connectionist techniques. We developed a statistical model for predicting the acoustic realization of stop consonants in various positions in the syllable template. A unification-based grammatical formalism was developed for incorporating this model into the lexical access algorithm. We provided an information-theoretic justification for the hierarchical structure of the syllable template. We analyzed segmental durations for vowels and fricatives in continuous speech. Based on contextual information, we developed durational models for vowels and fricatives that account for over 70 percent of the variance, using data from multiple, unknown speakers. We rigorously evaluated the ability of human spectrogram readers to identify stop consonants spoken by many talkers and in a variety of phonetic contexts. Incorporating the declarative knowledge used by the readers, we developed a knowledge-based system for stop identification. We achieved system performance comparable to that of the readers.

  16. LANGUAGE- AND TALKER-DEPENDENT VARIATION IN GLOBAL FEATURES OF NATIVE AND NON-NATIVE SPEECH.

    PubMed

    Bradlow, Ann R; Ackerman, Lauren; Burchfield, L Ann; Hesterberg, Lisa; Luque, Jenna; Mok, Kelsey

    2011-01-01

    We motivate and present a corpus of scripted and spontaneous speech in both the native and the non-native language of talkers from various language backgrounds. Using corpus recordings from 11 native English and 11 late Mandarin-English bilinguals we compared speech timing across native English, native Mandarin, and Mandarin-accented English. Findings showed similarities across native Mandarin and native English in speaking rate and in reduction of the number of acoustic relative to orthographic syllables. The two languages differed in silence-to-speech ratio and in the number of words between pauses, possibly reflecting phrase-level structural differences between English and Mandarin. Non-native English had a significantly slower speaking rate and lower rate of syllable reduction than both native English and native Mandarin. But, non-native English was similar to native English in terms of silence-to-speech ratio and was similar to native Mandarin in terms of words per pause. Finally, some talker-specificity in terms of (non)optimal speech timing appeared to transfer from native to non-native speech within the Mandarin-English bilinguals. These findings provide an empirical base for testing how language-dependent, structural features combine with general features of non-native speech production and with talker-dependent features in determining foreign-language speech production.

  17. Predicting the intelligibility of deaf children's speech from acoustic measures

    NASA Astrophysics Data System (ADS)

    Uchanski, Rosalie M.; Geers, Ann E.; Brenner, Christine M.; Tobey, Emily A.

    2001-05-01

    A weighted combination of speech-acoustic measures may provide an objective assessment of speech intelligibility in deaf children that could be used to evaluate the benefits of sensory aids and rehabilitation programs. This investigation compared the accuracy of two different approaches, multiple linear regression and a simple neural net. These two methods were applied to identical sets of acoustic measures, including both segmental (e.g., voice-onset times of plosives, spectral moments of fricatives, second formant frequencies of vowels) and suprasegmental measures (e.g., sentence duration, number and frequency of intersentence pauses). These independent variables were obtained from digitized recordings of deaf children's imitations of 11 simple sentences. The dependent measure was the percentage of spoken words from the 36 McGarr Sentences understood by groups of naive listeners. The two predictive methods were trained on speech measures obtained from 123 out of 164 8- and 9-year-old deaf children who used cochlear implants. Then, predictions were obtained using speech measures from the remaining 41 children. Preliminary results indicate that multiple linear regression is a better predictor of intelligibility than the neural net, accounting for 79% as opposed to 65% of the variance in the data. [Work supported by NIH.]
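
    A sketch of the comparison described above (multiple linear regression versus a small neural network for predicting listener intelligibility scores from acoustic measures) is given below. The train/test split size follows the abstract; the network architecture and evaluation metric are illustrative assumptions, and scikit-learn is assumed to be available.

    ```python
    # Compare linear regression and a small neural net as predictors of
    # intelligibility (percent words understood) from acoustic measures.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neural_network import MLPRegressor
    from sklearn.metrics import r2_score

    def compare_predictors(X, y, n_train=123, seed=0):
        """X: (n_children, n_acoustic_measures) array, y: percent words understood."""
        rng = np.random.default_rng(seed)
        order = rng.permutation(len(y))
        tr, te = order[:n_train], order[n_train:]

        linreg = LinearRegression().fit(X[tr], y[tr])
        net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000,
                           random_state=seed).fit(X[tr], y[tr])

        return {"linear_regression_R2": r2_score(y[te], linreg.predict(X[te])),
                "neural_net_R2": r2_score(y[te], net.predict(X[te]))}
    ```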

  18. Tongue-Palate Contact Pressure, Oral Air Pressure, and Acoustics of Clear Speech

    ERIC Educational Resources Information Center

    Searl, Jeff; Evitts, Paul M.

    2013-01-01

    Purpose: The authors compared articulatory contact pressure (ACP), oral air pressure (Po), and speech acoustics for conversational versus clear speech. They also assessed the relationship of these measures to listener perception. Method: Twelve adults with normal speech produced monosyllables in a phrase using conversational and clear speech.…

  19. The acoustic features of human laughter

    NASA Astrophysics Data System (ADS)

    Bachorowski, Jo-Anne; Owren, Michael J.

    2002-05-01

    Remarkably little is known about the acoustic features of laughter, despite laughter's ubiquitous role in human vocal communication. Outcomes are described for 1024 naturally produced laugh bouts recorded from 97 young adults. Acoustic analysis focused on temporal characteristics, production modes, source- and filter-related effects, and indexical cues to laugher sex and individual identity. The results indicate that laughter is a remarkably complex vocal signal, with evident diversity in both production modes and fundamental frequency characteristics. Also of interest was finding a consistent lack of articulation effects in supralaryngeal filtering. Outcomes are compared to previously advanced hypotheses and conjectures about this species-typical vocal signal.

  20. Aging Affects Neural Synchronization to Speech-Related Acoustic Modulations

    PubMed Central

    Goossens, Tine; Vercammen, Charlotte; Wouters, Jan; van Wieringen, Astrid

    2016-01-01

    As people age, speech perception problems become highly prevalent, especially in noisy situations. In addition to peripheral hearing and cognition, temporal processing plays a key role in speech perception. Temporal processing of speech features is mediated by synchronized activity of neural oscillations in the central auditory system. Previous studies indicate that both the degree and hemispheric lateralization of synchronized neural activity relate to speech perception performance. Based on these results, we hypothesize that impaired speech perception in older persons may, in part, originate from deviances in neural synchronization. In this study, auditory steady-state responses that reflect synchronized activity of theta, beta, low and high gamma oscillations (i.e., 4, 20, 40, and 80 Hz ASSR, respectively) were recorded in young, middle-aged, and older persons. As all participants had normal audiometric thresholds and were screened for (mild) cognitive impairment, differences in synchronized neural activity across the three age groups were likely to be attributed to age. Our data yield novel findings regarding theta and high gamma oscillations in the aging auditory system. At an older age, synchronized activity of theta oscillations is increased, whereas high gamma synchronization is decreased. In contrast to young persons who exhibit a right hemispheric dominance for processing of high gamma range modulations, older adults show a symmetrical processing pattern. These age-related changes in neural synchronization may very well underlie the speech perception problems in aging persons. PMID:27378906

  1. Denoising of human speech using combined acoustic and em sensor signal processing

    SciTech Connect

    Ng, L C; Burnett, G C; Holzrichter, J F; Gable, T J

    1999-11-29

    Low-power EM radar-like sensors have made it possible to measure properties of the human speech production system in real time, without acoustic interference. This greatly enhances the quality and quantity of information for many speech-related applications [see Holzrichter, Burnett, Ng, and Lea, J. Acoust. Soc. Am. 103 (1), 622 (1998)]. By using combined glottal EM-sensor and acoustic signals, segments of voiced, unvoiced, and no-speech can be reliably defined. Real-time denoising filters can then be constructed to remove noise from the user's corresponding speech signal.

  2. An acoustic comparison of two women's infant- and adult-directed speech

    NASA Astrophysics Data System (ADS)

    Andruski, Jean; Katz-Gershon, Shiri

    2003-04-01

    In addition to having prosodic characteristics that are attractive to infant listeners, infant-directed (ID) speech shares certain characteristics of adult-directed (AD) clear speech, such as increased acoustic distance between vowels, that might be expected to make ID speech easier for adults to perceive in noise than AD conversational speech. However, perceptual tests of two women's ID productions by Andruski and Bessega [J. Acoust. Soc. Am. 112, 2355] showed that this is not always the case. In a word identification task that compared ID speech with AD clear and conversational speech, one speaker's ID productions were less well-identified than AD clear speech, but better identified than AD conversational speech. For the second woman, ID speech was the least accurately identified of the three speech registers. For both speakers, hard words (infrequent words with many lexical neighbors) were also at an increased disadvantage relative to easy words (frequent words with few lexical neighbors) in speech registers that were less accurately perceived. This study will compare several acoustic properties of these women's productions, including pitch and formant-frequency characteristics. Results of the acoustic analyses will be examined with the original perceptual results to suggest reasons for differences in listeners' accuracy in identifying these two women's ID speech in noise.

  3. Acoustic Analysis of the Voiced-Voiceless Distinction in Dutch Tracheoesophageal Speech

    ERIC Educational Resources Information Center

    Jongmans, Petra; Wempe, Ton G.; van Tinteren, Harm; Hilgers, Frans J. M.; Pols, Louis C. W.; van As-Brooks, Corina J.

    2010-01-01

    Purpose: Confusions between voiced and voiceless plosives and voiced and voiceless fricatives are common in Dutch tracheoesophageal (TE) speech. This study investigates (a) which acoustic measures are found to convey a correct voicing contrast in TE speech and (b) whether different measures are found in TE speech than in normal laryngeal (NL)…

  4. Perceptual and Acoustic Reliability Estimates for the Speech Disorders Classification System (SDCS)

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.

    2010-01-01

    A companion paper describes three extensions to a classification system for paediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). The SDCS uses perceptual and acoustic data reduction methods to obtain information on a speaker's speech, prosody, and voice. The present paper provides reliability estimates for…

  5. Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing.

    PubMed

    Doelling, Keith B; Arnal, Luc H; Ghitza, Oded; Poeppel, David

    2014-01-15

    A growing body of research suggests that intrinsic neuronal slow (<10 Hz) oscillations in auditory cortex appear to track incoming speech and other spectro-temporally complex auditory signals. Within this framework, several recent studies have identified critical-band temporal envelopes as the specific acoustic feature being reflected by the phase of these oscillations. However, how this alignment between speech acoustics and neural oscillations might underpin intelligibility is unclear. Here we test the hypothesis that the 'sharpness' of temporal fluctuations in the critical band envelope acts as a temporal cue to speech syllabic rate, driving delta-theta rhythms to track the stimulus and facilitate intelligibility. We interpret our findings as evidence that sharp events in the stimulus cause cortical rhythms to re-align and parse the stimulus into syllable-sized chunks for further decoding. Using magnetoencephalographic recordings, we show that by removing temporal fluctuations that occur at the syllabic rate, envelope-tracking activity is reduced. By artificially reinstating these temporal fluctuations, envelope-tracking activity is regained. These changes in tracking correlate with intelligibility of the stimulus. Together, the results suggest that the sharpness of fluctuations in the stimulus, as reflected in the cochlear output, drive oscillatory activity to track and entrain to the stimulus, at its syllabic rate. This process likely facilitates parsing of the stimulus into meaningful chunks appropriate for subsequent decoding, enhancing perception and intelligibility.
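
    The kind of stimulus manipulation described above can be illustrated by extracting a speech temporal envelope and attenuating its fluctuations in an assumed syllabic-rate band before re-imposing the envelope on the temporal fine structure. The band limits (here 2-9 Hz) and filter settings are assumptions for illustration, not the authors' exact processing chain; scipy is assumed to be available.

    ```python
    # Attenuate syllabic-rate envelope fluctuations in a speech signal.
    import numpy as np
    from scipy.signal import hilbert, butter, sosfiltfilt

    def remove_syllabic_fluctuations(x, fs, band=(2.0, 9.0)):
        analytic = hilbert(x)
        env = np.abs(analytic)                    # temporal envelope
        fine = np.cos(np.angle(analytic))         # temporal fine structure

        # Band-stop the envelope around the assumed syllabic rate.
        # (In practice the envelope is often downsampled before filtering.)
        sos = butter(2, [band[0] / (fs / 2), band[1] / (fs / 2)],
                     btype="bandstop", output="sos")
        env_flat = np.clip(sosfiltfilt(sos, env), 0.0, None)
        return env_flat * fine
    ```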

  6. Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing

    PubMed Central

    Doelling, Keith; Arnal, Luc; Ghitza, Oded; Poeppel, David

    2013-01-01

    A growing body of research suggests that intrinsic neuronal slow (< 10 Hz) oscillations in auditory cortex appear to track incoming speech and other spectro-temporally complex auditory signals. Within this framework, several recent studies have identified critical-band temporal envelopes as the specific acoustic feature being reflected by the phase of these oscillations. However, how this alignment between speech acoustics and neural oscillations might underpin intelligibility is unclear. Here we test the hypothesis that the ‘sharpness’ of temporal fluctuations in the critical band envelope acts as a temporal cue to speech syllabic rate, driving delta-theta rhythms to track the stimulus and facilitate intelligibility. We interpret our findings as evidence that sharp events in the stimulus cause cortical rhythms to re-align and parse the stimulus into syllable-sized chunks for further decoding. Using magnetoencephalographic recordings, we show that by removing temporal fluctuations that occur at the syllabic rate, envelope-tracking activity is reduced. By artificially reinstating these temporal fluctuations, envelope-tracking activity is regained. These changes in tracking correlate with intelligibility of the stimulus. Together, the results suggest that the sharpness of fluctuations in the stimulus, as reflected in the cochlear output, drive oscillatory activity to track and entrain to the stimulus, at its syllabic rate. This process likely facilitates parsing of the stimulus into meaningful chunks appropriate for subsequent decoding, enhancing perception and intelligibility. PMID:23791839

  7. Logopenic and nonfluent variants of primary progressive aphasia are differentiated by acoustic measures of speech production.

    PubMed

    Ballard, Kirrie J; Savage, Sharon; Leyton, Cristian E; Vogel, Adam P; Hornberger, Michael; Hodges, John R

    2014-01-01

    Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging since it hinges on expert-based evaluation of speech and language production. In this study, acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold-standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r(2) = 0.47) with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, only affected in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy. Findings suggest that a measure of relative vowel duration from a polysyllable word repetition task

  8. Logopenic and Nonfluent Variants of Primary Progressive Aphasia Are Differentiated by Acoustic Measures of Speech Production

    PubMed Central

    Ballard, Kirrie J.; Savage, Sharon; Leyton, Cristian E.; Vogel, Adam P.; Hornberger, Michael; Hodges, John R.

    2014-01-01

    Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging since it hinges on expert-based evaluation of speech and language production. In this study, acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold-standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r2 = 0.47) with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, only affected in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy. Findings suggest that a measure of relative vowel duration from a polysyllable word repetition task

  9. Moving to the Speed of Sound: Context Modulation of the Effect of Acoustic Properties of Speech

    ERIC Educational Resources Information Center

    Shintel, Hadas; Nusbaum, Howard C.

    2008-01-01

    Suprasegmental acoustic patterns in speech can convey meaningful information and affect listeners' interpretation in various ways, including through systematic analog mapping of message-relevant information onto prosody. We examined whether the effect of analog acoustic variation is governed by the acoustic properties themselves. For example, fast…

  10. Mandarin Speech Perception in Combined Electric and Acoustic Stimulation

    PubMed Central

    Li, Yongxin; Zhang, Guoping; Galvin, John J.; Fu, Qian-Jie

    2014-01-01

    For deaf individuals with residual low-frequency acoustic hearing, combined use of a cochlear implant (CI) and hearing aid (HA) typically provides better speech understanding than with either device alone. Because of coarse spectral resolution, CIs do not provide fundamental frequency (F0) information that contributes to understanding of tonal languages such as Mandarin Chinese. The HA can provide good representation of F0 and, depending on the range of aided acoustic hearing, first and second formant (F1 and F2) information. In this study, Mandarin tone, vowel, and consonant recognition in quiet and noise was measured in 12 adult Mandarin-speaking bimodal listeners with the CI-only and with the CI+HA. Tone recognition was significantly better with the CI+HA in noise, but not in quiet. Vowel recognition was significantly better with the CI+HA in quiet, but not in noise. There was no significant difference in consonant recognition between the CI-only and the CI+HA in quiet or in noise. There was a wide range in bimodal benefit, with improvements often greater than 20 percentage points in some tests and conditions. The bimodal benefit was compared to CI subjects’ HA-aided pure-tone average (PTA) thresholds between 250 and 2000 Hz; subjects were divided into two groups: “better” PTA (<50 dB HL) or “poorer” PTA (>50 dB HL). The bimodal benefit differed significantly between groups only for consonant recognition. The bimodal benefit for tone recognition in quiet was significantly correlated with CI experience, suggesting that bimodal CI users learn to better combine low-frequency spectro-temporal information from acoustic hearing with temporal envelope information from electric hearing. Given the small number of subjects in this study (n = 12), further research with Chinese bimodal listeners may provide more information regarding the contribution of acoustic and electric hearing to tonal language perception. PMID:25386962

  11. Acoustical Characteristics of Mastication Sounds: Application of Speech Analysis Techniques

    NASA Astrophysics Data System (ADS)

    Brochetti, Denise

    Food scientists have used acoustical methods to study characteristics of mastication sounds in relation to food texture. However, a model for analysis of the sounds has not been identified, and reliability of the methods has not been reported. Therefore, speech analysis techniques were applied to mastication sounds, and variation in measures of the sounds was examined. To meet these objectives, two experiments were conducted. In the first experiment, a digital sound spectrograph generated waveforms and wideband spectrograms of sounds by 3 adult subjects (1 male, 2 females) for initial chews of food samples differing in hardness and fracturability. Acoustical characteristics were described and compared. For all sounds, formants appeared in the spectrograms, and energy occurred across a 0 to 8000-Hz range of frequencies. Bursts characterized waveforms for peanut, almond, raw carrot, ginger snap, and hard candy. Duration and amplitude of the sounds varied with the subjects. In the second experiment, the spectrograph was used to measure the duration, amplitude, and formants of sounds for the initial 2 chews of cylindrical food samples (raw carrot, teething toast) differing in diameter (1.27, 1.90, 2.54 cm). Six adult subjects (3 males, 3 females) having normal occlusions and temporomandibular joints chewed the samples between the molar teeth and with the mouth open. Ten repetitions per subject were examined for each food sample. Analysis of estimates of variation indicated an inconsistent intrasubject variation in the acoustical measures. Food type and sample diameter also affected the estimates, indicating the variable nature of mastication. Generally, intrasubject variation was greater than intersubject variation. Analysis of ranks of the data indicated that the effect of sample diameter on the acoustical measures was inconsistent and depended on the subject and type of food. If inferences are to be made concerning food texture from acoustical measures of mastication

  12. How stable are acoustic metrics of contrastive speech rhythm?

    PubMed

    Wiget, Lukas; White, Laurence; Schuppler, Barbara; Grenon, Izabelle; Rauch, Olesya; Mattys, Sven L

    2010-03-01

    Acoustic metrics of contrastive speech rhythm, based on vocalic and intervocalic interval durations, are intended to capture stable typological differences between languages. They should consequently be robust to variation between speakers, sentence materials, and measurers. This paper assesses the impact of these sources of variation on the metrics %V (proportion of utterance comprised of vocalic intervals), VarcoV (rate-normalized standard deviation of vocalic interval duration), and nPVI-V (a measure of the durational variability between successive pairs of vocalic intervals). Five measurers analyzed the same corpus of speech: five sentences read by six speakers of Standard Southern British English. Differences between sentences were responsible for the greatest variation in rhythm scores. Inter-speaker differences were also a source of significant variability. However, there was relatively little variation due to segmentation differences between measurers following an agreed protocol. An automated phone alignment process was also used: Rhythm scores thus derived showed good agreement with the human measurers. A number of recommendations for researchers wishing to exploit contrastive rhythm metrics are offered in conclusion. PMID:20329856
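
    The three rhythm metrics named above can be computed directly from lists of vocalic and intervocalic interval durations. The functions below follow the standard definitions described in the abstract; segmentation of the utterance into intervals is assumed to have been done already.

    ```python
    # Contrastive rhythm metrics from interval durations (in seconds).
    import numpy as np

    def percent_v(vocalic, intervocalic):
        """%V: proportion of the utterance made up of vocalic intervals."""
        v, c = np.sum(vocalic), np.sum(intervocalic)
        return 100.0 * v / (v + c)

    def varco_v(vocalic):
        """VarcoV: rate-normalized standard deviation of vocalic durations."""
        v = np.asarray(vocalic, dtype=float)
        return 100.0 * np.std(v) / np.mean(v)

    def npvi_v(vocalic):
        """nPVI-V: mean normalized difference between successive vocalic intervals."""
        v = np.asarray(vocalic, dtype=float)
        pairs = np.abs(v[1:] - v[:-1]) / ((v[1:] + v[:-1]) / 2.0)
        return 100.0 * np.mean(pairs)
    ```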

  13. Age-related Changes in Acoustic Modifications of Mandarin Maternal Speech to Preverbal Infants and Five-Year-Old Children: A Longitudinal Study

    PubMed Central

    Liu, Huei-Mei; Tsao, Feng-Ming; Kuhl, Patricia K.

    2010-01-01

    Acoustic-phonetic exaggeration of infant-directed speech (IDS) is well documented, but few studies address whether these features are modified with a child's age. Mandarin-speaking mothers were recorded while addressing an adult and their child at two ages (7-12 months and 5 years) to examine the acoustic-phonetic differences between IDS and child-directed speech (CDS). CDS exhibits an exaggeration pattern resembling that of IDS—expanded vowel space, longer vowels, higher pitch, and greater lexical tone differences—when compared to adult-directed speech (ADS). Longitudinal analysis demonstrated that the extent of acoustic exaggeration is significantly smaller in CDS than in IDS. Age-related changes in maternal speech provide some support for the hypothesis that mothers adjust their speech directed toward children as a function of the child's language ability. PMID:19232142

  14. Acoustic properties of naturally produced clear speech at normal speaking rates

    NASA Astrophysics Data System (ADS)

    Krause, Jean C.; Braida, Louis D.

    2004-01-01

    Sentences spoken ``clearly'' are significantly more intelligible than those spoken ``conversationally'' for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.
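
    The two global-level properties highlighted above can be illustrated with simple signal measures: energy in the 1000-3000 Hz region of the long-term spectrum, and the depth of low-frequency modulations of the intensity envelope. The window length and modulation band below are illustrative assumptions rather than the measures' exact definitions in the study; scipy is assumed to be available.

    ```python
    # Global-level clear-speech measures: 1-3 kHz long-term spectral energy
    # and low-frequency intensity-envelope modulation depth.
    import numpy as np
    from scipy.signal import welch, butter, sosfiltfilt, hilbert

    def longterm_1to3k_ratio_db(x, fs):
        f, pxx = welch(x, fs=fs, nperseg=4096)
        band = (f >= 1000) & (f <= 3000)
        return 10.0 * np.log10(np.sum(pxx[band]) / np.sum(pxx) + 1e-12)

    def low_freq_modulation_depth(x, fs, mod_band=(1.0, 16.0)):
        env = np.abs(hilbert(x))                             # intensity envelope
        sos = butter(2, [mod_band[0] / (fs / 2), mod_band[1] / (fs / 2)],
                     btype="band", output="sos")
        mod = sosfiltfilt(sos, env)                          # low-frequency modulations
        return np.std(mod) / (np.mean(env) + 1e-12)          # depth relative to mean level
    ```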

  15. DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1

    NASA Astrophysics Data System (ADS)

    Garofolo, J. S.; Lamel, L. F.; Fisher, W. M.; Fiscus, J. G.; Pallett, D. S.

    1993-02-01

    The Texas Instruments/Massachusetts Institute of Technology (TIMIT) corpus of read speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems. TIMIT contains speech from 630 speakers representing 8 major dialect divisions of American English, each speaking 10 phonetically-rich sentences. The TIMIT corpus includes time-aligned orthographic, phonetic, and word transcriptions, as well as speech waveform data for each spoken sentence. The release of TIMIT contains several improvements over the Prototype CD-ROM released in December, 1988: (1) full 630-speaker corpus, (2) checked and corrected transcriptions, (3) word-alignment transcriptions, (4) NIST SPHERE-headered waveform files and header manipulation software, (5) phonemic dictionary, (6) new test and training subsets balanced for dialectal and phonetic coverage, and (7) more extensive documentation.

  16. Formant Centralization Ratio: A Proposal for a New Acoustic Measure of Dysarthric Speech

    ERIC Educational Resources Information Center

    Sapir, Shimon; Ramig, Lorraine O.; Spielman, Jennifer L.; Fox, Cynthia

    2010-01-01

    Purpose: The vowel space area (VSA) has been used as an acoustic metric of dysarthric speech, but with varying degrees of success. In this study, the authors aimed to test an alternative metric to the VSA--the "formant centralization ratio" (FCR), which is hypothesized to more effectively differentiate dysarthric from healthy speech and register…
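
    Both metrics compared above are computed from the first two formants (F1, F2, in Hz) of the corner vowels /i/, /u/, and /a/. The formulas below follow the definitions commonly cited in the clinical acoustics literature; they are included as an assumption for illustration, since the truncated abstract does not spell them out.

    ```python
    # Vowel space area (VSA) and formant centralization ratio (FCR) from
    # corner-vowel formants, as commonly defined (assumed here).
    def vowel_space_area(f1_i, f2_i, f1_u, f2_u, f1_a, f2_a):
        """Triangular VSA via the shoelace formula (Hz^2)."""
        return abs(f1_i * (f2_a - f2_u) + f1_a * (f2_u - f2_i)
                   + f1_u * (f2_i - f2_a)) / 2.0

    def formant_centralization_ratio(f1_i, f2_i, f1_u, f2_u, f1_a, f2_a):
        """FCR increases as vowels centralize (formants converge)."""
        return (f2_u + f2_a + f1_i + f1_u) / (f2_i + f1_a)
    ```

    With centralized (dysarthric-like) formant values, the FCR rises while the VSA shrinks, which is the contrast the two metrics are meant to capture.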

  17. Effects of Training on the Acoustic-Phonetic Representation of Synthetic Speech

    ERIC Educational Resources Information Center

    Francis, Alexander L.; Nusbaum, Howard C.; Fenn, Kimberly

    2007-01-01

    Purpose: Investigate training-related changes in acoustic-phonetic representation of consonants produced by a text-to-speech (TTS) computer speech synthesizer. Method: Forty-eight adult listeners were trained to better recognize words produced by a TTS system. Nine additional untrained participants served as controls. Before and after training,…

  18. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians.

    PubMed

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J; Adatia, Ian

    2016-01-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% and 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and could be used to screen for PH and encourage earlier specialist referral. PMID:27609672
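
    A minimal sketch of the classification pipeline described above is given below: mel-frequency cepstral coefficients as features and one Gaussian mixture model per class (PH vs. normal), with classification by comparing log-likelihoods. The sampling rate, MFCC settings, and mixture size are illustrative assumptions, not the study's exact configuration; librosa and scikit-learn are assumed to be available.

    ```python
    # MFCC features + per-class Gaussian mixture models for heart-sound classification.
    import numpy as np
    import librosa
    from sklearn.mixture import GaussianMixture

    def mfcc_features(wav_path, sr=4000, n_mfcc=13):
        y, sr = librosa.load(wav_path, sr=sr)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, n_mfcc)

    def train_models(ph_feature_list, normal_feature_list, n_components=8, seed=0):
        gmm_ph = GaussianMixture(n_components, covariance_type="diag", random_state=seed)
        gmm_no = GaussianMixture(n_components, covariance_type="diag", random_state=seed)
        gmm_ph.fit(np.vstack(ph_feature_list))      # frames pooled across PH recordings
        gmm_no.fit(np.vstack(normal_feature_list))  # frames pooled across normal recordings
        return gmm_ph, gmm_no

    def classify(features, gmm_ph, gmm_no):
        # Higher average per-frame log-likelihood wins.
        return "PH" if gmm_ph.score(features) > gmm_no.score(features) else "normal"
    ```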

  19. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians

    PubMed Central

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

    2016-01-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% and 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and could be used to screen for PH and encourage earlier specialist referral. PMID:27609672

  20. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians

    NASA Astrophysics Data System (ADS)

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y.; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

    2016-09-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% and 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and could be used to screen for PH and encourage earlier specialist referral.

  1. The Use of Artificial Neural Networks to Estimate Speech Intelligibility from Acoustic Variables: A Preliminary Analysis.

    ERIC Educational Resources Information Center

    Metz, Dale Evan; And Others

    1992-01-01

    A preliminary scheme for estimating the speech intelligibility of hearing-impaired speakers from acoustic parameters, using a computerized artificial neural network to mathematically process the acoustic input variables, is outlined. Tests with 60 hearing-impaired speakers found the scheme to be highly accurate in identifying speakers separated by…

  2. Speech feature discrimination in deaf children following cochlear implantation

    NASA Astrophysics Data System (ADS)

    Bergeson, Tonya R.; Pisoni, David B.; Kirk, Karen Iler

    2002-05-01

    Speech feature discrimination is a fundamental perceptual skill that is often assumed to underlie word recognition and sentence comprehension performance. To investigate the development of speech feature discrimination in deaf children with cochlear implants, we conducted a retrospective analysis of results from the Minimal Pairs Test (Robbins et al., 1988) selected from patients enrolled in a longitudinal study of speech perception and language development. The MP test uses a 2AFC procedure in which children hear a word and select one of two pictures (bat-pat). All 43 children were prelingually deafened, received a cochlear implant before 6 years of age or between ages 6 and 9, and used either oral or total communication. Children were tested once every 6 months to 1 year for 7 years; not all children were tested at each interval. By 2 years postimplant, the majority of these children achieved near-ceiling levels of discrimination performance for vowel height, vowel place, and consonant manner. Most of the children also achieved plateaus but did not reach ceiling performance for consonant place and voicing. The relationship between speech feature discrimination, spoken word recognition, and sentence comprehension will be discussed. [Work supported by NIH/NIDCD Research Grant No. R01DC00064 and NIH/NIDCD Training Grant No. T32DC00012.]

  3. School cafeteria noise-The impact of room acoustics and speech intelligibility on children's voice levels

    NASA Astrophysics Data System (ADS)

    Bridger, Joseph F.

    2002-05-01

    The impact of room acoustics and speech intelligibility conditions of different school cafeterias on the voice levels of children is examined. Methods of evaluating cafeteria designs and predicting noise levels are discussed. Like adults, children are shown to modify their voice levels with changes in speech intelligibility. Reverberation and signal-to-noise ratio are the important acoustical factors affecting speech intelligibility. Children have much more difficulty than adults in conditions where noise and reverberation are present. To evaluate the relationship of voice level and speech intelligibility, a database of real sound levels and room acoustics data was generated from measurements and data recorded during visits to a variety of existing cafeterias under different occupancy conditions. The effects of speech intelligibility and room acoustics on children's voice levels are demonstrated. A new method is presented for predicting speech intelligibility conditions and resulting noise levels for the design of new cafeterias and renovation of existing facilities. Measurements are provided for an existing school cafeteria before and after new room acoustics treatments were added. This will be helpful for acousticians, architects, school systems, regulatory agencies, and Parent Teacher Associations to create less noisy cafeteria environments.

  4. A maximum likelihood approach to estimating articulator positions from speech acoustics

    SciTech Connect

    Hogden, J.

    1996-09-23

    This proposal presents an algorithm called maximum likelihood continuity mapping (MALCOM) which recovers the positions of the tongue, jaw, lips, and other speech articulators from measurements of the sound-pressure waveform of speech. MALCOM differs from other techniques for recovering articulator positions from speech in three critical respects: it does not require training on measured or modeled articulator positions, it does not rely on any particular model of sound propagation through the vocal tract, and it recovers a mapping from acoustics to articulator positions that is linearly, not topographically, related to the actual mapping from acoustics to articulation. The approach categorizes short-time windows of speech into a finite number of sound types, and assumes the probability of using any articulator position to produce a given sound type can be described by a parameterized probability density function. MALCOM then uses maximum likelihood estimation techniques to: (1) find the most likely smooth articulator path given a speech sample and a set of distribution functions (one distribution function for each sound type), and (2) change the parameters of the distribution functions to better account for the data. Using this technique improves the accuracy of articulator position estimates compared to continuity mapping -- the only other technique that learns the relationship between acoustics and articulation solely from acoustics. The technique has potential application to computer speech recognition, speech synthesis and coding, teaching the hearing impaired to speak, improving foreign language instruction, and teaching dyslexics to read. 34 refs., 7 figs.
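
    A highly simplified sketch of the path-estimation step described above is given below: each short acoustic window is assigned a sound type, each sound type is associated with a Gaussian over (unknown) map positions, and the most likely smooth path trades fit to those Gaussians against frame-to-frame continuity. With isotropic Gaussians and a quadratic smoothness penalty this step has a closed form; the re-estimation of the distribution parameters that MALCOM also performs is omitted, and the initialization of the map means is purely illustrative. All settings are assumptions, not the proposal's actual implementation.

    ```python
    # Simplified smooth-path estimation under per-sound-type Gaussians.
    import numpy as np
    from sklearn.cluster import KMeans

    def smooth_map_path(frames, n_sound_types=32, map_dim=2, smoothness=5.0, seed=0):
        # Assign each window to a sound type (vector quantization).
        km = KMeans(n_clusters=n_sound_types, random_state=seed, n_init=10).fit(frames)
        codes = km.labels_

        # Illustrative initial map means per sound type: a random projection of
        # the code centroids into the low-dimensional map space.
        rng = np.random.default_rng(seed)
        proj = rng.standard_normal((frames.shape[1], map_dim))
        targets = (km.cluster_centers_ @ proj)[codes]      # per-frame Gaussian means

        # Most likely smooth path: minimize ||X - targets||^2
        # + smoothness * sum_t ||x_t - x_{t-1}||^2 (closed-form linear system).
        T = len(codes)
        lap = np.zeros((T, T))
        idx = np.arange(T)
        lap[idx, idx] = 2.0
        lap[0, 0] = lap[-1, -1] = 1.0
        lap[idx[:-1], idx[1:]] = -1.0
        lap[idx[1:], idx[:-1]] = -1.0
        # A banded solver would be preferable for long utterances.
        path = np.linalg.solve(np.eye(T) + smoothness * lap, targets)
        return path, codes
    ```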

  5. Language-specific developmental differences in speech production: a cross-language acoustic study.

    PubMed

    Li, Fangfang

    2012-01-01

    Speech productions of 40 English- and 40 Japanese-speaking children (aged 2-5) were examined and compared with the speech produced by 20 adult speakers (10 speakers per language). Participants were recorded while repeating words that began with "s" and "sh" sounds. Clear language-specific patterns in adults' speech were found, with English speakers differentiating "s" and "sh" in 1 acoustic dimension (i.e., spectral mean) and Japanese speakers differentiating the 2 categories in 3 acoustic dimensions (i.e., spectral mean, standard deviation, and onset F2 frequency). For both language groups, children's speech exhibited a gradual change from an early undifferentiated form to later differentiated categories. The separation processes, however, only occur in those acoustic dimensions used by adults in the corresponding languages.
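
    Two of the acoustic dimensions named above, spectral mean and standard deviation, are the first two spectral moments of the frication noise and can be computed from a windowed segment of the "s" or "sh" sound. The windowing and FFT settings below are illustrative assumptions; onset F2 would require a separate formant tracker.

    ```python
    # Spectral mean and standard deviation of a fricative segment.
    import numpy as np

    def spectral_mean_sd(segment, fs):
        win = segment * np.hanning(len(segment))
        spec = np.abs(np.fft.rfft(win)) ** 2                   # power spectrum
        freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
        p = spec / np.sum(spec)                                # treat as a distribution
        mean = np.sum(freqs * p)                               # spectral mean (1st moment)
        sd = np.sqrt(np.sum(((freqs - mean) ** 2) * p))        # spectral SD (2nd moment)
        return mean, sd
    ```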

  6. Language Comprehension in Language-Learning Impaired Children Improved with Acoustically Modified Speech

    NASA Astrophysics Data System (ADS)

    Tallal, Paula; Miller, Steve L.; Bedi, Gail; Byma, Gary; Wang, Xiaoqin; Nagarajan, Srikantan S.; Schreiner, Christoph; Jenkins, William M.; Merzenich, Michael M.

    1996-01-01

    A speech processing algorithm was developed to create more salient versions of the rapidly changing elements in the acoustic waveform of speech that have been shown to be deficiently processed by language-learning impaired (LLI) children. LLI children received extensive daily training, over a 4-week period, with listening exercises in which all speech was translated into this synthetic form. They also received daily training with computer "games" designed to adaptively drive improvements in temporal processing thresholds. Significant improvements in speech discrimination and language comprehension abilities were demonstrated in two independent groups of LLI children.

  7. A Hybrid Acoustic and Pronunciation Model Adaptation Approach for Non-native Speech Recognition

    NASA Astrophysics Data System (ADS)

    Oh, Yoo Rhee; Kim, Hong Kook

    In this paper, we propose a hybrid model adaptation approach in which pronunciation and acoustic models are adapted by incorporating the pronunciation and acoustic variabilities of non-native speech in order to improve the performance of non-native automatic speech recognition (ASR). Specifically, the proposed hybrid model adaptation can be performed at either the state-tying or triphone-modeling level, depending on the level at which acoustic model adaptation is performed. In both methods, we first analyze the pronunciation variant rules of non-native speakers and then classify each rule as either a pronunciation variant or an acoustic variant. The state-tying level hybrid method then adapts pronunciation models and acoustic models by accommodating the pronunciation variants in the pronunciation dictionary and by clustering the states of triphone acoustic models using the acoustic variants, respectively. On the other hand, the triphone-modeling level hybrid method initially adapts pronunciation models in the same way as in the state-tying level hybrid method; however, for the acoustic model adaptation, the triphone acoustic models are then re-estimated based on the adapted pronunciation models and the states of the re-estimated triphone acoustic models are clustered using the acoustic variants. From the Korean-spoken English speech recognition experiments, it is shown that ASR systems employing the state-tying and triphone-modeling level adaptation methods can reduce the average word error rates (WERs) by a relative 17.1% and 22.1% for non-native speech, respectively, when compared to a baseline ASR system.

  8. The acoustics for speech of eight auditoriums in the city of Sao Paulo

    NASA Astrophysics Data System (ADS)

    Bistafa, Sylvio R.

    2002-11-01

    Eight auditoriums with a proscenium type of stage, which usually operate as dramatic theaters in the city of Sao Paulo, were acoustically surveyed in terms of their adequacy to unassisted speech. Reverberation times, early decay times, and speech levels were measured in different positions, together with objective measures of speech intelligibility. The measurements revealed reverberation time values rather uniform throughout the rooms, whereas significant variations were found in the values of the other acoustical measures with position. The early decay time was found to be better correlated with the objective measures of speech intelligibility than the reverberation time. The results from the objective measurements of speech intelligibility revealed that the speech transmission index STI, and its simplified version RaSTI, are strongly correlated with the early-to-late sound ratio C50 (1 kHz). However, it was found that the criterion value of acceptability of the latter is more easily met than the former. The results from these measurements make it possible to understand how the characteristics of the architectural design determine the acoustical quality for speech. Measurements of Gade's ST1 were made in an attempt to validate it as an objective measure of "support" for the actor. Preliminary diagnostic results from ray-tracing simulations are also presented.
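
    The early-to-late sound ratio mentioned above has a standard definition: the ratio, in dB, of impulse-response energy arriving before and after 50 ms. A minimal sketch of that computation, assuming a measured room impulse response array, is given below.

```python
import numpy as np

def clarity_c50(impulse_response, rate):
    """Early-to-late arriving energy ratio C50 (dB) from a room impulse
    response, using a 50 ms split after the direct sound."""
    h = np.asarray(impulse_response, dtype=float)
    onset = np.argmax(np.abs(h))            # crude direct-sound detection
    split = onset + int(0.050 * rate)       # 50 ms after the direct sound
    early = np.sum(h[onset:split] ** 2)
    late = np.sum(h[split:] ** 2)
    return 10.0 * np.log10(early / late)
```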

  9. A magnetic resonance imaging study on the articulatory and acoustic speech parameters of Malay vowels

    PubMed Central

    2014-01-01

    The phonetic properties of six Malay vowels are investigated using magnetic resonance imaging (MRI) to visualize the vocal tract and obtain dynamic articulatory parameters during speech production. To resolve image blurring caused by tongue movement during scanning, a method based on active contour extraction is used to track tongue contours. The proposed method tracks tongue contours efficiently despite the partial blurring of the MRI images. Consequently, the articulatory parameters are measured effectively as tongue movement is observed, and the specific shape and position of the tongue are determined for all six uttered Malay vowels. Speech rehabilitation procedures demand a visually perceivable prototype of speech articulation. To assess the validity of the measured articulatory parameters against the acoustic theory of speech production, an acoustic analysis of the vowels uttered by the subjects was performed. When the acoustic and articulatory parameters of the uttered speech were examined, a correlation between formant frequencies and articulatory parameters was observed: a positive correlation between the constriction location of the tongue body and the first formant frequency, and a negative correlation between the constriction location of the tongue tip and the second formant frequency. The results demonstrate that the proposed method is an effective tool for the dynamic study of speech production. PMID:25060583
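
    A frame-to-frame tracking loop of the general kind described, in which each frame's contour is initialized from the previous frame's result, might look like the sketch below using scikit-image's active contour implementation. The parameter values and smoothing are illustrative assumptions, not the authors' settings.

```python
# Illustrative sketch only (not the authors' code).
import numpy as np
from skimage.filters import gaussian
from skimage.segmentation import active_contour

def track_tongue(frames, initial_snake):
    """frames: iterable of 2-D MRI images; initial_snake: (N, 2) array of (row, col) points."""
    snake = initial_snake
    contours = []
    for frame in frames:
        smoothed = gaussian(frame, sigma=2, preserve_range=True)   # reduce blur/noise
        snake = active_contour(smoothed, snake,
                               alpha=0.015, beta=10.0, gamma=0.001)
        contours.append(snake.copy())                              # seed for the next frame
    return contours
```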

  10. Speech and melody recognition in binaurally combined acoustic and electric hearing

    NASA Astrophysics Data System (ADS)

    Kong, Ying-Yee; Stickney, Ginger S.; Zeng, Fan-Gang

    2005-03-01

    Speech recognition in noise and music perception are especially challenging for current cochlear implant users. The present study utilizes the residual acoustic hearing in the nonimplanted ear in five cochlear implant users to elucidate the role of temporal fine structure at low frequencies in auditory perception and to test the hypothesis that combined acoustic and electric hearing produces better performance than either mode alone. The first experiment measured speech recognition in the presence of competing noise. It was found that, although the residual low-frequency (<1000 Hz) acoustic hearing produced essentially no speech recognition in noise, it significantly enhanced performance when combined with the electric hearing. The second experiment measured melody recognition in the same group of subjects and found that, contrary to the speech recognition result, the low-frequency acoustic hearing produced significantly better performance than the electric hearing. It is hypothesized that listeners with combined acoustic and electric hearing might use the correlation between the salient pitch in low-frequency acoustic hearing and the weak pitch in the envelope to enhance segregation between signal and noise. The present study suggests the importance and urgency of accurately encoding the fine-structure cue in cochlear implants.

  11. Zebra finches are sensitive to prosodic features of human speech.

    PubMed

    Spierings, Michelle J; ten Cate, Carel

    2014-07-22

    Variation in pitch, amplitude and rhythm adds crucial paralinguistic information to human speech. Such prosodic cues can reveal information about the meaning or emphasis of a sentence or the emotional state of the speaker. To examine the hypothesis that sensitivity to prosodic cues is language independent and not human specific, we tested prosody perception in a controlled experiment with zebra finches. Using a go/no-go procedure, subjects were trained to discriminate between speech syllables arranged in XYXY patterns with prosodic stress on the first syllable and XXYY patterns with prosodic stress on the final syllable. To systematically determine the salience of the various prosodic cues (pitch, duration and amplitude) to the zebra finches, they were subjected to five tests with different combinations of these cues. The zebra finches generalized the prosodic pattern to sequences that consisted of new syllables and used prosodic features over structural ones to discriminate between stimuli. This strong sensitivity to the prosodic pattern was maintained when only a single prosodic cue was available. The change in pitch was treated as more salient than changes in the other prosodic features. These results show that zebra finches are sensitive to the same prosodic cues known to affect human speech perception. PMID:24870039

  12. Audio-video feature correlation: faces and speech

    NASA Astrophysics Data System (ADS)

    Durand, Gwenael; Montacie, Claude; Caraty, Marie-Jose; Faudemay, Pascal

    1999-08-01

    This paper presents a study of the correlation of features automatically extracted from the audio stream and the video stream of audiovisual documents. In particular, we were interested in finding out whether speech analysis tools could be combined with face detection methods, and to what extent they should be combined. A generic audio signal partitioning algorithm was first used to detect Silence/Noise/Music/Speech segments in a full length movie. A generic object detection method was applied to the keyframes extracted from the movie in order to detect the presence or absence of faces. The correlation between the presence of a face in the keyframes and of the corresponding voice in the audio stream was studied. A third stream, which is the script of the movie, is warped on the speech channel in order to automatically label faces appearing in the keyframes with the name of the corresponding character. We naturally found that extracted audio and video features were related in many cases, and that significant benefits can be obtained from the joint use of audio and video analysis methods.
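
    Once both streams have been reduced to per-keyframe binary labels, the face/voice correlation itself can be summarized very simply. The toy example below, with made-up labels, computes the phi coefficient, i.e. the Pearson correlation of two binary series.

```python
import numpy as np

# Hypothetical per-keyframe indicators: 1 if a face was detected in the keyframe,
# and 1 if the aligned audio segment was labelled as speech.
face_present   = np.array([1, 1, 0, 0, 1, 1, 1, 0, 0, 1])
speech_present = np.array([1, 0, 0, 0, 1, 1, 1, 0, 1, 1])

phi = np.corrcoef(face_present, speech_present)[0, 1]
print(f"face/speech correlation: {phi:.2f}")
```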

  13. Mathematical model of acoustic speech production with mobile walls of the vocal tract

    NASA Astrophysics Data System (ADS)

    Lyubimov, N. A.; Zakharov, E. V.

    2016-03-01

    A mathematical speech production model is considered that describes acoustic oscillation propagation in a vocal tract with mobile walls. The wave field function satisfies the Helmholtz equation with boundary conditions of the third kind (impedance type). The impedance model corresponds to a three-parameter pendulum oscillation model. The experimental results demonstrate the nonlinear character of the influence of vocal-tract wall mobility on the spectral envelope of a speech signal.
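
    In generic textbook notation (the symbols below are not necessarily the authors'), the boundary-value problem described above can be written as:

```latex
\begin{align}
  \Delta p(\mathbf{x}) + k^{2}\, p(\mathbf{x}) &= 0,
    && \mathbf{x} \in \Omega \quad \text{(vocal-tract interior)}, \\
  \frac{\partial p}{\partial n}(\mathbf{x}) + \frac{i k}{\zeta(\mathbf{x})}\, p(\mathbf{x}) &= 0,
    && \mathbf{x} \in \partial\Omega \quad \text{(impedance walls)},
\end{align}
```

    where p is the acoustic pressure, k = omega/c the wavenumber, and zeta a normalized wall impedance whose value would be governed by the pendulum-type wall model.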

  14. Feature based passive acoustic detection of underwater threats

    NASA Astrophysics Data System (ADS)

    Stolkin, Rustam; Sutin, Alexander; Radhakrishnan, Sreeram; Bruno, Michael; Fullerton, Brian; Ekimov, Alexander; Raftery, Michael

    2006-05-01

    Stevens Institute of Technology is performing research aimed at determining the acoustical parameters that are necessary for detecting and classifying underwater threats. This paper specifically addresses the problems of passive acoustic detection of small targets in noisy urban river and harbor environments. We describe experiments to determine the acoustic signatures of these threats and the background acoustic noise. Based on these measurements, we present an algorithm for robustly discriminating threat presence from severe acoustic background noise. Measurements of the target's acoustic radiation signal were conducted in the Hudson River. The acoustic noise in the Hudson River was also recorded for various environmental conditions. A useful discriminating feature can be extracted from the acoustic signal of the threat, calculated by detecting packets of multi-spectral high frequency sound which occur repetitively at low frequency intervals. We use experimental data to show how the feature varies with range between the sensor and the detected underwater threat. We also estimate the effective detection range by evaluating this feature for hydrophone signals, recorded in the river both with and without threat presence.
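
    One way to realize a feature of the kind described is to band-pass the high-frequency region, take its envelope, and look for low-frequency periodicity in that envelope. The sketch below only illustrates that idea; the band edges and repetition-rate range are placeholders, not the values used by the authors.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def envelope_periodicity(x, rate, band=(10e3, 20e3)):
    """Return envelope-spectrum frequencies (Hz) and magnitudes for a
    high-frequency band of a hydrophone signal x (band edges are placeholders)."""
    sos = butter(4, band, btype="bandpass", fs=rate, output="sos")
    env = np.abs(hilbert(sosfiltfilt(sos, x)))      # envelope of the band
    env -= env.mean()
    spec = np.abs(np.fft.rfft(env * np.hanning(len(env))))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / rate)
    keep = freqs < 10.0                              # slow repetition rates of interest
    return freqs[keep], spec[keep]
```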

  15. Evaluation of acoustical conditions for speech communication in working elementary school classrooms.

    PubMed

    Sato, Hiroshi; Bradley, John S

    2008-04-01

    Detailed acoustical measurements were made in 41 working elementary school classrooms near Ottawa, Canada to obtain more representative and more accurate indications of the acoustical quality of conditions for speech communication during actual teaching activities. This paper describes the room acoustics characteristics and noise environment of 27 traditional rectangular classrooms from the 41 measured rooms. The purpose of the work was to better understand how to improve speech communication between teachers and students. The study found that, on average, students experienced teacher speech levels of 60.4 dB(A), noise levels of 49.1 dB(A), and a mean speech-to-noise ratio of 11 dB during teaching activities. The mean reverberation time in the occupied classrooms was 0.41 s, which was 10% less than in the unoccupied rooms. The reverberation time measurements were used to determine the average absorption added by each student. Detailed analyses of early and late-arriving speech sounds showed that these sound levels could be predicted quite accurately, suggesting improved approaches to room acoustics design.
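
    The per-student absorption estimate follows from the Sabine relation A = 0.161 V / T applied to the occupied and unoccupied reverberation times. The numbers in the sketch below are placeholders chosen only to match the rough magnitudes quoted above, not values from the paper.

```python
def absorption_per_student(volume_m3, rt_unoccupied_s, rt_occupied_s, n_students):
    """Added absorption (m^2 Sabine) per student, via A = 0.161 * V / T."""
    added = 0.161 * volume_m3 * (1.0 / rt_occupied_s - 1.0 / rt_unoccupied_s)
    return added / n_students

# Hypothetical classroom: 250 m^3, RT 0.46 s empty vs 0.41 s occupied, 25 students.
print(absorption_per_student(250.0, 0.46, 0.41, 25))   # roughly 0.4 m^2 per student
```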

  16. Vowel Acoustics in Adults with Apraxia of Speech

    ERIC Educational Resources Information Center

    Jacks, Adam; Mathes, Katey A.; Marquardt, Thomas P.

    2010-01-01

    Purpose: To investigate the hypothesis that vowel production is more variable in adults with acquired apraxia of speech (AOS) relative to healthy individuals with unimpaired speech. Vowel formant frequency measures were selected as the specific target of focus. Method: Seven adults with AOS and aphasia produced 15 repetitions of 6 American English…

  17. Precategorical Acoustic Storage and the Perception of Speech

    ERIC Educational Resources Information Center

    Frankish, Clive

    2008-01-01

    Theoretical accounts of both speech perception and of short term memory must consider the extent to which perceptual representations of speech sounds might survive in relatively unprocessed form. This paper describes a novel version of the serial recall task that can be used to explore this area of shared interest. In immediate recall of digit…

  18. Acoustic properties of vowels in clear and conversational speech by female non-native English speakers

    NASA Astrophysics Data System (ADS)

    Li, Chi-Nin; So, Connie K.

    2005-04-01

    Studies have shown that talkers can improve the intelligibility of their speech when instructed to speak as if talking to a hearing-impaired person. The improvement of speech intelligibility is associated with specific acoustic-phonetic changes: increases in vowel duration and fundamental frequency (F0), a wider pitch range, and a shift in formant frequencies for F1 and F2. Most previous studies of clear speech production have been conducted with native speakers; research with second language speakers is much less common. The present study examined the acoustic properties of non-native English vowels produced in a clear speaking style. Five female Cantonese speakers and a comparison group of English speakers were recorded producing four vowels (/i u ae a/) in /bVt/ context in conversational and clear speech. Vowel durations, F0, pitch range, and the first two formants for each of the four vowels were measured. Analyses revealed that, for both groups of speakers, vowel duration, F0, pitch range, and F1 were greater in clear speech than in conversational speech. However, F2 was higher in conversational speech than in clear speech. The findings suggest that female non-native English speakers exhibit acoustic-phonetic patterns similar to those of native speakers when asked to produce English vowels clearly.
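
    Formant measurements of the kind reported here are commonly obtained from LPC analysis of the vowel steady state. The following is a minimal sketch of that approach, not the procedure used in the study; the file name is hypothetical and the root filtering is deliberately crude.

```python
import numpy as np
import librosa

def lpc_formants(segment, rate, order=12):
    """Return formant candidates (Hz) from the roots of an LPC polynomial."""
    pre = np.append(segment[0], segment[1:] - 0.97 * segment[:-1])   # pre-emphasis
    a = librosa.lpc(pre * np.hamming(len(pre)), order=order)
    roots = [r for r in np.roots(a) if np.imag(r) > 0]               # keep upper half-plane
    freqs = sorted(np.angle(roots) * rate / (2 * np.pi))             # root angle -> Hz
    return [f for f in freqs if f > 90]                              # drop near-DC roots

y, sr = librosa.load("bVt_token.wav", sr=None)   # hypothetical vowel token
print(lpc_formants(y, sr)[:2])                   # rough F1, F2 estimates
```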

  19. Speech privacy and annoyance considerations in the acoustic environment of passenger cars of high-speed trains.

    PubMed

    Jeon, Jin Yong; Hong, Joo Young; Jang, Hyung Suk; Kim, Jae Hyeon

    2015-12-01

    It is necessary to consider not only the annoyance of interior noise but also speech privacy to achieve acoustic comfort in a passenger car of a high-speed train, because speech from other passengers can be annoying. This study aimed to explore an optimal acoustic environment to satisfy speech privacy and reduce annoyance in a passenger car. Two experiments were conducted using speech sources and compartment noise of a high-speed train with varying speech-to-noise ratios (SNRA) and background noise levels (BNL). Speech intelligibility was tested in experiment I, and in experiment II, perceived speech privacy, annoyance, and acoustic comfort of combined sounds with speech and background noise were assessed. The results show that speech privacy and annoyance were significantly influenced by the SNRA. In particular, the acoustic comfort was evaluated as acceptable when the SNRA was less than -6 dB for both speech privacy and noise annoyance. In addition, annoyance increased significantly as the BNL exceeded 63 dBA, whereas the effect of the background-noise level on speech privacy was not significant. These findings suggest that an optimal level of interior noise in a passenger car might exist between 59 and 63 dBA, taking normal speech levels into account.

  20. Education in acoustics and speech science using vocal-tract models.

    PubMed

    Arai, Takayuki

    2012-03-01

    Several vocal-tract models were reviewed, with special focus given to the sliding vocal-tract model [T. Arai, Acoust. Sci. Technol. 27(6), 384-388 (2006)]. All of the models have been shown to be excellent tools for teaching acoustics and speech science to elementary through university level students. The sliding three-tube model is based on Fant's three-tube model [G. Fant, Acoustic Theory of Speech Production (Mouton, The Hague, The Netherlands, 2006)] and consists of a long tube with a slider simulating tongue constriction. In this article, the design of the sliding vocal-tract model was reviewed. Then a science workshop was discussed where children were asked to make their own sliding vocal-tract models using simple materials. It was also discussed how the sliding vocal-tract model compares to our other vocal-tract models, emphasizing how the model can be used to instruct students at higher levels, such as undergraduate and graduate education in acoustics and speech science. Through this discussion the vocal-tract models were shown to be a powerful tool for education in acoustics and speech science for all ages of students.
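
    The acoustics these tube models demonstrate can be previewed with the textbook closed-open (quarter-wave) resonator formula f_n = (2n-1)c/(4L), which gives the familiar pattern of roughly 500, 1500, and 2500 Hz for a 17 cm neutral vocal tract. A minimal sketch:

```python
def uniform_tube_formants(length_m=0.17, c=350.0, n=3):
    """First n resonance frequencies (Hz) of a tube closed at the glottis
    and open at the lips (quarter-wave resonator)."""
    return [(2 * k - 1) * c / (4 * length_m) for k in range(1, n + 1)]

print(uniform_tube_formants())   # approximately [515, 1544, 2574] Hz
```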

  1. Perceiving speech in context: Compensation for contextual variability during acoustic cue encoding and categorization

    NASA Astrophysics Data System (ADS)

    Toscano, Joseph Christopher

    Several fundamental questions about speech perception concern how listeners understand spoken language despite considerable variability in speech sounds across different contexts (the problem of lack of invariance in speech). This contextual variability is caused by several factors, including differences between individual talkers' voices, variation in speaking rate, and effects of coarticulatory context. A number of models have been proposed to describe how the speech system handles differences across contexts. Critically, these models make different predictions about (1) whether contextual variability is handled at the level of acoustic cue encoding or categorization, (2) whether it is driven by feedback from category-level processes or interactions between cues, and (3) whether listeners discard fine-grained acoustic information to compensate for contextual variability. Separating the effects of cue- and category-level processing has been difficult because behavioral measures tap processes that occur well after initial cue encoding and are influenced by task demands and linguistic information. Recently, we have used the event-related brain potential (ERP) technique to examine cue encoding and online categorization. Specifically, we have looked at differences in the auditory N1 as a measure of acoustic cue encoding and the P3 as a measure of categorization. This allows us to examine multiple levels of processing during speech perception and can provide a useful tool for studying effects of contextual variability. Here, I apply this approach to determine the point in processing at which context has an effect on speech perception and to examine whether acoustic cues are encoded continuously. Several types of contextual variability (talker gender, speaking rate, and coarticulation), as well as several acoustic cues (voice onset time, formant frequencies, and bandwidths), are examined in a series of experiments. The results suggest that (1) at early stages of speech

  2. A speech processing study using an acoustic model of a multiple-channel cochlear implant

    NASA Astrophysics Data System (ADS)

    Xu, Ying

    1998-10-01

    A cochlear implant is an electronic device designed to provide sound information for adults and children who have bilateral profound hearing loss. The task of representing speech signals as electrical stimuli is central to the design and performance of cochlear implants. Studies have shown that the current speech-processing strategies provide significant benefits to cochlear implant users. However, the evaluation and development of speech-processing strategies have been complicated by hardware limitations and large variability in user performance. To alleviate these problems, an acoustic model of a cochlear implant with the SPEAK strategy is implemented in this study, in which a set of acoustic stimuli whose psychophysical characteristics are as close as possible to those produced by a cochlear implant is presented to normal-hearing subjects. To test the effectiveness and feasibility of this acoustic model, a psychophysical experiment was conducted to match the performance of a normal-hearing listener using model-processed signals to that of a cochlear implant user. Good agreement was found between an implanted patient and an age-matched normal-hearing subject in a dynamic signal discrimination experiment, indicating that this acoustic model is a reasonably good approximation of a cochlear implant with the SPEAK strategy. The acoustic model was then used to examine the potential of the SPEAK strategy in terms of its temporal and frequency encoding of speech. It was hypothesized that better temporal and frequency encoding of speech can be accomplished by higher stimulation rates and a larger number of activated channels. Vowel and consonant recognition tests were conducted on normal-hearing subjects using speech tokens processed by the acoustic model, with different combinations of stimulation rate and number of activated channels. The results showed that vowel recognition was best at 600 pps and 8 activated channels, but further increases in stimulation rate and
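
    A generic noise-excited channel vocoder of the kind used for such acoustic simulations might be sketched as follows. The channel count, band edges, and filter settings are illustrative defaults, not the SPEAK parameters evaluated in the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, rate, n_channels=8, f_lo=100.0, f_hi=6000.0):
    """Replace each analysis band of x with envelope-modulated band-limited noise."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # log-spaced band edges
    noise = np.random.randn(len(x))
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, (lo, hi), btype="bandpass", fs=rate, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))      # channel envelope
        out += sosfiltfilt(sos, noise) * env            # modulate matching noise band
    return out / np.max(np.abs(out))
```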

  3. Low-frequency speech cues and simulated electric-acoustic hearing

    PubMed Central

    Brown, Christopher A.; Bacon, Sid P.

    2009-01-01

    The addition of low-frequency acoustic information to real or simulated electric stimulation (so-called electric-acoustic stimulation or EAS) often results in large improvements in intelligibility, particularly in competing backgrounds. This may reflect the availability of fundamental frequency (F0) information in the acoustic region. The contributions of F0 and the amplitude envelope (as well as voicing) of speech to simulated EAS were examined by replacing the low-frequency speech with a tone that was modulated in frequency to track the F0 of the speech, in amplitude with the envelope of the low-frequency speech, or both. A four-channel vocoder simulated electric hearing. Significant benefit over vocoder alone was observed with the addition of a tone carrying F0 or envelope cues, and both cues combined typically provided significantly more benefit than either alone. The intelligibility improvement over vocoder was between 24 and 57 percentage points, and was unaffected by the presence of a tone carrying these cues from a background talker. These results confirm the importance of the F0 of target speech for EAS (in simulation). They indicate that significant benefit can be provided by a tone carrying F0 and amplitude envelope cues. The results support a glimpsing account of EAS and argue against segregation. PMID:19275323
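
    A tone of the kind described, frequency-modulated to track F0 and amplitude-modulated by the low-frequency envelope, could be generated roughly as in the sketch below. The file name, F0 search range, and 500 Hz cutoff are assumptions of the sketch, not the study's parameters.

```python
import numpy as np
import librosa
from scipy.signal import butter, sosfiltfilt, hilbert

y, sr = librosa.load("target_sentence.wav", sr=16000)     # hypothetical file

# F0 track (NaN where unvoiced), interpolated onto the sample grid.
f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=300, sr=sr)
t_frames = librosa.times_like(f0, sr=sr)
f0 = np.where(np.isnan(f0), 0.0, f0)
f0_samples = np.interp(np.arange(len(y)) / sr, t_frames, f0)

# Low-frequency (<500 Hz) amplitude envelope of the speech.
sos = butter(4, 500, btype="lowpass", fs=sr, output="sos")
env = np.abs(hilbert(sosfiltfilt(sos, y)))

# Tone: FM from the F0 track, AM from the envelope, silenced where unvoiced.
phase = 2 * np.pi * np.cumsum(f0_samples) / sr
tone = np.sin(phase) * env * (f0_samples > 0)
```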

  4. Acoustic and perceptual evaluation of category goodness of /t/ and /k/ in typical and misarticulated children's speech.

    PubMed

    Strömbergsson, Sofia; Salvi, Giampiero; House, David

    2015-06-01

    This investigation explores perceptual and acoustic characteristics of children's successful and unsuccessful productions of /t/ and /k/, with a specific aim of exploring perceptual sensitivity to phonetic detail, and the extent to which this sensitivity is reflected in the acoustic domain. Recordings were collected from 4- to 8-year-old children with a speech sound disorder (SSD) who misarticulated one of the target plosives, and compared to productions recorded from peers with typical speech development (TD). Perceptual responses were registered on a visual-analog scale ranging from "clear [t]" to "clear [k]." Statistical models of prototypical productions were built, based on spectral moments and discrete cosine transform features, and used in the scoring of SSD productions. In the perceptual evaluation, "clear substitutions" were rated as less prototypical than correct productions. Moreover, target-appropriate productions of /t/ and /k/ produced by children with SSD were rated as less prototypical than those produced by TD peers. The acoustical modeling could to a large extent discriminate between the gross categories /t/ and /k/, and scored the SSD utterances on a continuous scale that was largely consistent with the category of production. However, none of the methods exhibited the same sensitivity to phonetic detail as the human listeners.
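
    The discrete cosine transform features referred to above are, in essence, a low-order description of spectral shape. A minimal sketch of that kind of feature, not the paper's extraction pipeline, is shown below for a stop-burst segment.

```python
import numpy as np
from scipy.fft import dct

def dct_spectral_features(burst, n_coeffs=6):
    """First few DCT coefficients of the log power spectrum of a burst segment."""
    spectrum = np.abs(np.fft.rfft(burst * np.hanning(len(burst)))) ** 2
    log_spec = np.log(spectrum + 1e-12)          # avoid log(0)
    return dct(log_spec, norm="ortho")[:n_coeffs]
```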

  5. Disordered speech disrupts conversational entrainment: a study of acoustic-prosodic entrainment and communicative success in populations with communication challenges.

    PubMed

    Borrie, Stephanie A; Lubold, Nichola; Pon-Barry, Heather

    2015-01-01

    Conversational entrainment, a pervasive communication phenomenon in which dialogue partners adapt their behaviors to align more closely with one another, is considered essential for successful spoken interaction. While well-established in other disciplines, this phenomenon has received limited attention in the field of speech pathology and the study of communication breakdowns in clinical populations. The current study examined acoustic-prosodic entrainment, as well as a measure of communicative success, in three distinctly different dialogue groups: (i) healthy native vs. healthy native speakers (Control), (ii) healthy native vs. foreign-accented speakers (Accented), and (iii) healthy native vs. dysarthric speakers (Disordered). Dialogue group comparisons revealed significant differences in how the groups entrain on particular acoustic-prosodic features, including pitch, intensity, and jitter. Most notably, the Disordered dialogues were characterized by significantly less acoustic-prosodic entrainment than the Control dialogues. Further, a positive relationship between entrainment indices and communicative success was identified. These results suggest that the study of conversational entrainment in speech pathology will have essential implications for both scientific theory and clinical application in this domain.

  6. Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation.

    PubMed

    Kong, Ying-Yee; Donaldson, Gail; Somarowthu, Ala

    2015-05-01

    Low-frequency acoustic cues have been shown to improve speech perception in cochlear-implant listeners. However, the mechanisms underlying this benefit are still not well understood. This study investigated the extent to which low-frequency cues can facilitate listeners' use of linguistic knowledge in simulated electric-acoustic stimulation (EAS). Experiment 1 examined differences in the magnitude of EAS benefit at the phoneme, word, and sentence levels. Speech materials were processed via noise-channel vocoding and lowpass (LP) filtering. The amount of spectral degradation in the vocoder speech was varied by applying different numbers of vocoder channels. Normal-hearing listeners were tested on vocoder-alone, LP-alone, and vocoder + LP conditions. Experiment 2 further examined factors that underlie the context effect on EAS benefit at the sentence level by limiting the low-frequency cues to temporal envelope and periodicity (AM + FM). Results showed that EAS benefit was greater for higher-context than for lower-context speech materials even when the LP ear received only low-frequency AM + FM cues. Possible explanations for the greater EAS benefit observed with higher-context materials may lie in the interplay between perceptual and expectation-driven processes for EAS speech recognition, and/or the band-importance functions for different types of speech materials. PMID:25994712

  7. Contributions of electric and acoustic hearing to bimodal speech and music perception.

    PubMed

    Crew, Joseph D; Galvin, John J; Landsberger, David M; Fu, Qian-Jie

    2015-01-01

    Cochlear implant (CI) users have difficulty understanding speech in noisy listening conditions and perceiving music. Aided residual acoustic hearing in the contralateral ear can mitigate these limitations. The present study examined contributions of electric and acoustic hearing to speech understanding in noise and melodic pitch perception. Data was collected with the CI only, the hearing aid (HA) only, and both devices together (CI+HA). Speech reception thresholds (SRTs) were adaptively measured for simple sentences in speech babble. Melodic contour identification (MCI) was measured with and without a masker instrument; the fundamental frequency of the masker was varied to be overlapping or non-overlapping with the target contour. Results showed that the CI contributes primarily to bimodal speech perception and that the HA contributes primarily to bimodal melodic pitch perception. In general, CI+HA performance was slightly improved relative to the better ear alone (CI-only) for SRTs but not for MCI, with some subjects experiencing a decrease in bimodal MCI performance relative to the better ear alone (HA-only). Individual performance was highly variable, and the contribution of either device to bimodal perception was both subject- and task-dependent. The results suggest that individualized mapping of CIs and HAs may further improve bimodal speech and music perception. PMID:25790349

  8. Classifying acoustic signals into phoneme categories: average and dyslexic readers make use of complex dynamical patterns and multifractal scaling properties of the speech signal.

    PubMed

    Hasselman, Fred

    2015-01-01

    Several competing aetiologies of developmental dyslexia suggest that the problems with acquiring literacy skills are causally entailed by low-level auditory and/or speech perception processes. The purpose of this study is to evaluate the diverging claims about the specific deficient perceptual processes under conditions of strong inference. Theoretically relevant acoustic features were extracted from a set of artificial speech stimuli that lie on a /bAk/-/dAk/ continuum. The features were tested on their ability to enable a simple classifier (Quadratic Discriminant Analysis) to reproduce the observed classification performance of average and dyslexic readers in a speech perception experiment. The 'classical' features examined were based on component process accounts of developmental dyslexia such as the supposed deficit in Envelope Rise Time detection and the deficit in the detection of rapid changes in the distribution of energy in the frequency spectrum (formant transitions). Studies examining these temporal processing deficit hypotheses do not employ measures that quantify the temporal dynamics of stimuli. It is shown that measures based on quantification of the dynamics of complex, interaction-dominant systems (Recurrence Quantification Analysis and the multifractal spectrum) enable QDA to classify the stimuli almost identically as observed in dyslexic and average reading participants. It seems unlikely that participants used any of the features that are traditionally associated with accounts of (impaired) speech perception. The nature of the variables quantifying the temporal dynamics of the speech stimuli implies that the classification of speech stimuli cannot be regarded as a linear aggregate of component processes that each parse the acoustic signal independent of one another, as is assumed by the 'classical' aetiologies of developmental dyslexia. It is suggested that the results imply that the differences in speech perception performance between average and

  9. The advantages of sound localization and speech perception of bilateral electric acoustic stimulation

    PubMed Central

    Moteki, Hideaki; Kitoh, Ryosuke; Tsukada, Keita; Iwasaki, Satoshi; Nishio, Shin-Ya

    2015-01-01

    Conclusion: Bilateral electric acoustic stimulation (EAS) effectively improved speech perception in noise and sound localization in patients with high-frequency hearing loss. Objective: To evaluate bilateral EAS efficacy of sound localization detection and speech perception in noise in two cases of high-frequency hearing loss. Methods: Two female patients, aged 38 and 45 years, respectively, received bilateral EAS sequentially. Pure-tone audiometry was performed preoperatively and postoperatively to evaluate the hearing preservation in the lower frequencies. Speech perception outcomes in quiet and noise and sound localization were assessed with unilateral and bilateral EAS. Results: Residual hearing in the lower frequencies was well preserved after insertion of a FLEX24 electrode (24 mm) using the round window approach. After bilateral EAS, speech perception improved in quiet and even more so in noise. In addition, the sound localization ability of both cases with bilateral EAS improved remarkably. PMID:25423260

  10. Vowel Acoustics in Dysarthria: Speech Disorder Diagnosis and Classification

    ERIC Educational Resources Information Center

    Lansford, Kaitlin L.; Liss, Julie M.

    2014-01-01

    Purpose: The purpose of this study was to determine the extent to which vowel metrics are capable of distinguishing healthy from dysarthric speech and among different forms of dysarthria. Method: A variety of vowel metrics were derived from spectral and temporal measurements of vowel tokens embedded in phrases produced by 45 speakers with…

  11. Noise Robust Feature Scheme for Automatic Speech Recognition Based on Auditory Perceptual Mechanisms

    NASA Astrophysics Data System (ADS)

    Cai, Shang; Xiao, Yeming; Pan, Jielin; Zhao, Qingwei; Yan, Yonghong

    Mel Frequency Cepstral Coefficients (MFCC) are the most popular acoustic features used in automatic speech recognition (ASR), mainly because the coefficients capture the most useful information of the speech and fit well with the assumptions used in hidden Markov models. As is well known, MFCCs already employ several principles which have known counterparts in the peripheral properties of human hearing: decoupling across frequency, mel-warping of the frequency axis, log-compression of energy, etc. It is natural to introduce more mechanisms in the auditory periphery to improve the noise robustness of MFCC. In this paper, a k-nearest neighbors based frequency masking filter is proposed to reduce the audibility of spectral valleys, which are sensitive to noise. In addition, Moore and Glasberg's critical-band equivalent rectangular bandwidth (ERB) expression is utilized to determine the filter bandwidth. Furthermore, a new bandpass infinite impulse response (IIR) filter is proposed to imitate the temporal masking phenomenon of the human auditory system. These three auditory perceptual mechanisms are combined with the standard MFCC algorithm in order to investigate their effects on ASR performance, and a revised MFCC extraction scheme is presented. Recognition performances with the standard MFCC, RASTA perceptual linear prediction (RASTA-PLP) and the proposed feature extraction scheme are evaluated on a medium-vocabulary isolated-word recognition task and a more complex large vocabulary continuous speech recognition (LVCSR) task. Experimental results show that consistent robustness against background noise is achieved on these two tasks, and the proposed method outperforms both the standard MFCC and RASTA-PLP.
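
    The ERB expression referred to is the standard Glasberg and Moore formula, which maps a centre frequency to a critical bandwidth:

```python
def erb_hz(centre_freq_hz):
    """Equivalent rectangular bandwidth (Hz) at a given centre frequency,
    per Glasberg and Moore's expression ERB = 24.7 * (4.37 * F_kHz + 1)."""
    return 24.7 * (4.37 * centre_freq_hz / 1000.0 + 1.0)

print(erb_hz(1000.0))   # about 132.6 Hz at 1 kHz
```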

  12. Contribution of Consonant Landmarks to Speech Recognition in Simulated Acoustic-Electric Hearing

    PubMed Central

    Chen, Fei; Loizou, Philipos C.

    2009-01-01

    Objectives The purpose of this study is to assess the contribution of information provided by obstruent consonants (e.g. stops, fricatives) to speech intelligibility in simulated acoustic-electric hearing. As a secondary objective, this study examines the performance of an objective measure that can potentially be used for predicting the intelligibility of vocoded speech. Design Noise-corrupted sentences are used in Exp. 1 in which the noise-corrupted obstruent consonants are replaced with clean obstruent consonants, while leaving the sonorant sounds (vowels, semivowels and nasals) corrupted. In one condition, listeners have only access to the low-frequency (<600 Hz) acoustic portion of the clean consonant spectra, in another condition listeners have only access to the higher frequency (>600 Hz) portion (vocoded) of the clean consonant spectra, and in the third condition they have access to both. In Exp. 2, we investigate a speech-coding strategy that selectively attenuates the low-frequency portion of the consonant spectra while leaving the vocoded portion corrupted by noise. Finally, using the data collected from Exp. 1 and 2, we evaluate the performance of an objective measure in terms of predicting intelligibility of vocoded speech. This measure was originally designed to predict speech quality and has never been evaluated with vocoded speech. Results Significant improvements (about 30 percentage points) in intelligibility were noted in Exp. 1 in steady and two-talker masker conditions when the listeners had access to the clean obstruent consonants in both the acoustic and vocoded portions of the spectrum. The improvement was more evident in the low SNR levels (−5 and 0 dB). Further analysis indicated that it was access to the vocoded portion of the consonant spectra, rather than access to the low-frequency acoustic portion of the consonant spectra that contributed the most to the large improvements in performance. In Exp. 2, a small (14 percentage points

  13. Fundamental frequency is critical to speech perception in noise in combined acoustic and electric hearing

    PubMed Central

    Carroll, Jeff; Tiaden, Stephanie; Zeng, Fan-Gang

    2011-01-01

    Cochlear implant (CI) users have been shown to benefit from residual low-frequency hearing, specifically in pitch related tasks. It remains unclear whether this benefit is dependent on fundamental frequency (F0) or other acoustic cues. Three experiments were conducted to determine the role of F0, as well as its frequency modulated (FM) and amplitude modulated (AM) components, in speech recognition with a competing voice. In simulated CI listeners, the signal-to-noise ratio was varied to estimate the 50% correct response. Simulation results showed that the F0 cue contributes to a significant proportion of the benefit seen with combined acoustic and electric hearing, and additionally that this benefit is due to the FM rather than the AM component. In actual CI users, sentence recognition scores were collected with either the full F0 cue containing both the FM and AM components or the 500-Hz low-pass speech cue containing the F0 and additional harmonics. The F0 cue provided a benefit similar to the low-pass cue for speech in noise, but not in quiet. Poorer CI users benefited more from the F0 cue than better users. These findings suggest that F0 is critical to improving speech perception in noise in combined acoustic and electric hearing. PMID:21973360

  14. Prosodic Features and Speech Naturalness in Individuals with Dysarthria

    ERIC Educational Resources Information Center

    Klopfenstein, Marie I.

    2012-01-01

    Despite the importance of speech naturalness to treatment outcomes, little research has been done on what constitutes speech naturalness and how to best maximize naturalness in relationship to other treatment goals like intelligibility. In addition, previous literature alludes to the relationship between prosodic aspects of speech and speech…

  15. Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition.

    PubMed

    Schädler, Marc René; Kollmeier, Birger

    2015-04-01

    To test if simultaneous spectral and temporal processing is required to extract robust features for automatic speech recognition (ASR), the robust spectro-temporal two-dimensional-Gabor filter bank (GBFB) front-end from Schädler, Meyer, and Kollmeier [J. Acoust. Soc. Am. 131, 4134-4151 (2012)] was decomposed into a spectral one-dimensional-Gabor filter bank and a temporal one-dimensional-Gabor filter bank. A feature set that is extracted with these separate spectral and temporal modulation filter banks was introduced, the separate Gabor filter bank (SGBFB) features, and evaluated on the CHiME (Computational Hearing in Multisource Environments) keywords-in-noise recognition task. From the perspective of robust ASR, the results showed that spectral and temporal processing can be performed independently and are not required to interact with each other. Using SGBFB features permitted the signal-to-noise ratio (SNR) to be lowered by 1.2 dB while still performing as well as the GBFB-based reference system, which corresponds to a relative improvement of the word error rate by 12.8%. Additionally, the real time factor of the spectro-temporal processing could be reduced by more than an order of magnitude. Compared to human listeners, the SNR needed to be 13 dB higher when using Mel-frequency cepstral coefficient features, 11 dB higher when using GBFB features, and 9 dB higher when using SGBFB features to achieve the same recognition performance. PMID:25920855
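
    The idea of separable spectro-temporal processing can be illustrated with a single spectral and a single temporal 1-D Gabor filter applied to a log-mel spectrogram, as in the toy sketch below. This only illustrates separability; the published SGBFB front-end uses whole filter banks with specific modulation frequencies and a different combination scheme, and all parameter values here are invented.

```python
import numpy as np
from scipy.ndimage import convolve1d

def gabor_1d(omega, size=15):
    """Real 1-D Gabor filter with modulation frequency omega (cycles/sample)."""
    n = np.arange(size) - size // 2
    return np.exp(-0.5 * (n / (size / 4)) ** 2) * np.cos(2 * np.pi * omega * n)

def separable_gabor_features(log_mel):
    """log_mel: (n_bands, n_frames) array; filter spectrally, then temporally."""
    spectral = convolve1d(log_mel, gabor_1d(0.10), axis=0, mode="nearest")
    temporal = convolve1d(spectral, gabor_1d(0.05), axis=1, mode="nearest")
    return temporal
```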

  16. Effects of deafness on acoustic characteristics of American English tense/lax vowels in maternal speech to infants

    PubMed Central

    Kondaurova, Maria V.; Bergeson, Tonya R.; Dilley, Laura C.

    2012-01-01

    Recent studies have demonstrated that mothers exaggerate phonetic properties of infant-directed (ID) speech. However, these studies focused on a single acoustic dimension (frequency), whereas speech sounds are composed of multiple acoustic cues. Moreover, little is known about how mothers adjust phonetic properties of speech to children with hearing loss. This study examined mothers’ production of frequency and duration cues to the American English tense/lax vowel contrast in speech to profoundly deaf (N = 14) and normal-hearing (N = 14) infants, and to an adult experimenter. First and second formant frequencies and vowel duration of tense (/i/, /u/) and lax (/I/, /ʊ/) vowels were measured. Results demonstrated that for both infant groups mothers hyperarticulated the acoustic vowel space and increased vowel duration in ID speech relative to adult-directed speech. Mean F2 values were decreased for the /u/ vowel and increased for the /I/ vowel, and vowel duration was longer for the /i/, /u/, and /I/ vowels in ID speech. However, neither acoustic cue differed in speech to hearing-impaired or normal-hearing infants. These results suggest that both formant frequencies and vowel duration that differentiate American English tense/lax vowel contrasts are modified in ID speech regardless of the hearing status of the addressee. PMID:22894224

  17. A Chimpanzee Recognizes Synthetic Speech With Significantly Reduced Acoustic Cues to Phonetic Content

    PubMed Central

    Heimbauer, Lisa A.; Beran, Michael J.; Owren, Michael J.

    2011-01-01

    Summary A long-standing debate concerns whether humans are specialized for speech perception [1–7], which some researchers argue is demonstrated by the ability to understand synthetic speech with significantly reduced acoustic cues to phonetic content [2–4,7]. We tested a chimpanzee (Pan troglodytes) that recognizes 128 spoken words [8,9], asking whether she could understand such speech. Three experiments presented 48 individual words, with the animal selecting a corresponding visuo-graphic symbol from among four alternatives. Experiment 1 tested spectrally reduced, noise-vocoded (NV) synthesis, originally developed to simulate input received by human cochlear-implant users [10]. Experiment 2 tested “impossibly unspeechlike” [3] sine-wave (SW) synthesis, which reduces speech to just three moving tones [11]. Although receiving only intermittent and non-contingent reward, the chimpanzee performed well above chance level, including when hearing synthetic versions for the first time. Recognition of SW words was least accurate, but improved in Experiment 3 when natural words in the same session were rewarded. The chimpanzee was more accurate with NV than SW versions, as were 32 human participants hearing these items. The chimpanzee's ability to spontaneously recognize acoustically reduced synthetic words suggests that experience rather than specialization is critical for speech-perception capabilities that some have suggested are uniquely human [12–14]. PMID:21723125

  1. Formant Centralization Ratio (FCR): A proposal for a new acoustic measure of dysarthric speech

    PubMed Central

    Sapir, Shimon; Ramig, Lorraine O.; Spielman, Jennifer L.; Fox, Cynthia

    2009-01-01

    Background and Aims The vowel space area (VSA) has been used as an acoustic metric of dysarthric speech, but with varying degrees of success. Here we test an alternative metric, the formant centralization ratio (FCR), that is hypothesized to more effectively differentiate dysarthric from healthy speech and register treatment effects. Methods Speech recordings of 38 individuals with idiopathic Parkinson disease (IPD) and dysarthria (19 of whom received one month of intensive speech therapy (LSVT® LOUD)) and 14 healthy controls were acoustically analyzed. Vowels were extracted from short phrases. The same vowel-formant elements were used to construct the FCR, expressed as (F2u + F2a + F1i + F1u)/(F2i + F1a); the VSA, expressed as |F1i(F2a - F2u) + F1a(F2u - F2i) + F1u(F2i - F2a)|/2; a logarithmically scaled version of the VSA (LnVSA); and the F2i/F2u ratio (the subscripts i, u, and a denote formants measured in the corner vowels /i/, /u/, and /a/). Results Unlike the VSA and the LnVSA, the FCR and F2i/F2u robustly differentiated dysarthric from healthy speech and were not gender-sensitive. All metrics effectively registered treatment effects and were strongly correlated with each other. Conclusions Albeit preliminary, the present findings indicate that the FCR is a sensitive, valid and reliable acoustic metric for distinguishing dysarthric from normal speech and for monitoring treatment effects, probably so because of reduced sensitivity to inter-speaker variability and enhanced sensitivity to vowel centralization. PMID:19948755
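
    The two metrics as defined above translate directly into code; in the sketch below the formant values are invented for illustration only.

```python
def fcr(f1i, f1u, f1a, f2i, f2u, f2a):
    """Formant centralization ratio."""
    return (f2u + f2a + f1i + f1u) / (f2i + f1a)

def vsa(f1i, f1u, f1a, f2i, f2u, f2a):
    """Triangular vowel space area spanned by /i/, /u/, /a/ (Hz^2)."""
    return abs(f1i * (f2a - f2u) + f1a * (f2u - f2i) + f1u * (f2i - f2a)) / 2

formants = dict(f1i=300, f1u=350, f1a=750, f2i=2300, f2u=900, f2a=1300)  # made-up values
print(fcr(**formants), vsa(**formants))
```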

  2. From prosodic structure to acoustic saliency: A fMRI investigation of speech rate, clarity, and emphasis

    NASA Astrophysics Data System (ADS)

    Golfinopoulos, Elisa

    Acoustic variability in fluent speech can arise at many stages in speech production planning and execution. For example, at the phonological encoding stage, the grouping of phonemes into syllables determines which segments are coarticulated and, by consequence, segment-level acoustic variation. Likewise phonetic encoding, which determines the spatiotemporal extent of articulatory gestures, will affect the acoustic detail of segments. Functional magnetic resonance imaging (fMRI) was used to measure brain activity of fluent adult speakers in four speaking conditions: fast, normal, clear, and emphatic (or stressed) speech. These speech manner changes typically result in acoustic variations that do not change the lexical or semantic identity of productions but do affect the acoustic saliency of phonemes, syllables and/or words. Acoustic responses recorded inside the scanner were assessed quantitatively using eight acoustic measures and sentence duration was used as a covariate of non-interest in the neuroimaging analysis. Compared to normal speech, emphatic speech was characterized acoustically by a greater difference between stressed and unstressed vowels in intensity, duration, and fundamental frequency, and neurally by increased activity in right middle premotor cortex and supplementary motor area, and bilateral primary sensorimotor cortex. These findings are consistent with right-lateralized motor planning of prosodic variation in emphatic speech. Clear speech involved an increase in average vowel and sentence durations and average vowel spacing, along with increased activity in left middle premotor cortex and bilateral primary sensorimotor cortex. These findings are consistent with an increased reliance on feedforward control, resulting in hyper-articulation, under clear as compared to normal speech. Fast speech was characterized acoustically by reduced sentence duration and average vowel spacing, and neurally by increased activity in left anterior frontal

  3. Knowledge and attitudes of teachers regarding the impact of classroom acoustics on speech perception and learning.

    PubMed

    Ramma, Lebogang

    2009-01-01

    This study investigated the knowledge and attitude of primary school teachers regarding the impact of poor classroom acoustics on learners' speech perception and learning in class. Classrooms with excessive background noise and reflective surfaces could be a barrier to learning, and it is important that teachers are aware of this. There is currently limited research data about teachers' knowledge regarding the topic of classroom acoustics. Seventy teachers from three Johannesburg primary schools participated in this study. A survey by way of a structured self-administered questionnaire was the primary data collection method. The findings showed that most of the participants did not have adequate knowledge of classroom acoustics. Most of the participants were also unaware of the impact that classrooms with poor acoustic environments can have on speech perception and learning. These results are discussed in relation to the practical implication of empowering teachers to manage the acoustic environment of their classrooms, limitations of the study as well as implications for future research.

  4. An eighth-scale speech source for subjective assessments in acoustic models

    NASA Astrophysics Data System (ADS)

    Orlowski, R. J.

    1981-08-01

    The design of a source is described which is suitable for making speech recordings in eighth-scale acoustic models of auditoria. An attempt was made to match the directionality of the source with the directionality of the human voice using data reported in the literature. A narrow aperture was required for the design which was provided by mounting an inverted conical horn over the diaphragm of a high frequency loudspeaker. Resonance problems were encountered with the use of a horn and a description is given of the electronic techniques adopted to minimize the effect of these resonances. Subjective and objective assessments on the completed speech source have proved satisfactory. It has been used in a modelling exercise concerned with the acoustic design of a theatre with a thrust-type stage.

  5. Acoustic-Phonetic Differences between Infant- and Adult-Directed Speech: The Role of Stress and Utterance Position

    ERIC Educational Resources Information Center

    Wang, Yuanyuan; Seidl, Amanda; Cristia, Alejandrina

    2015-01-01

    Previous studies have shown that infant-directed speech (IDS) differs from adult-directed speech (ADS) on a variety of dimensions. The aim of the current study was to investigate whether acoustic differences between IDS and ADS in English are modulated by prosodic structure. We compared vowels across the two registers (IDS, ADS) in both stressed…

  6. Speech after Radial Forearm Free Flap Reconstruction of the Tongue: A Longitudinal Acoustic Study of Vowel and Diphthong Sounds

    ERIC Educational Resources Information Center

    Laaksonen, Juha-Pertti; Rieger, Jana; Happonen, Risto-Pekka; Harris, Jeffrey; Seikaly, Hadi

    2010-01-01

    The purpose of this study was to use acoustic analyses to describe speech outcomes over the course of 1 year after radial forearm free flap (RFFF) reconstruction of the tongue. Eighteen Canadian English-speaking females and males with reconstruction for oral cancer had speech samples recorded (pre-operative, and 1 month, 6 months, and 1 year…

  7. Disordered speech disrupts conversational entrainment: a study of acoustic-prosodic entrainment and communicative success in populations with communication challenges

    PubMed Central

    Borrie, Stephanie A.; Lubold, Nichola; Pon-Barry, Heather

    2015-01-01

    Conversational entrainment, a pervasive communication phenomenon in which dialogue partners adapt their behaviors to align more closely with one another, is considered essential for successful spoken interaction. While well-established in other disciplines, this phenomenon has received limited attention in the field of speech pathology and the study of communication breakdowns in clinical populations. The current study examined acoustic-prosodic entrainment, as well as a measure of communicative success, in three distinctly different dialogue groups: (i) healthy native vs. healthy native speakers (Control), (ii) healthy native vs. foreign-accented speakers (Accented), and (iii) healthy native vs. dysarthric speakers (Disordered). Dialogue group comparisons revealed significant differences in how the groups entrain on particular acoustic–prosodic features, including pitch, intensity, and jitter. Most notably, the Disordered dialogues were characterized by significantly less acoustic-prosodic entrainment than the Control dialogues. Further, a positive relationship between entrainment indices and communicative success was identified. These results suggest that the study of conversational entrainment in speech pathology will have essential implications for both scientific theory and clinical application in this domain. PMID:26321996
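
    One simple way to illustrate the kind of acoustic-prosodic entrainment index used in work like this (though not necessarily the exact measure computed by the authors) is to correlate turn-level prosodic features across the two dialogue partners. A minimal sketch, assuming turn-level feature values have already been extracted and aligned:

```python
# Illustrative sketch (not the authors' exact index): quantify entrainment as
# the correlation between partners' turn-level prosodic features (e.g., mean
# pitch or mean intensity). Feature values below are placeholders.
import numpy as np

def entrainment_index(speaker_a_feats, speaker_b_feats):
    """Pearson correlation between partners' turn-level feature values.

    The two arrays hold one value per turn, aligned so that index i pairs
    speaker A's turn with speaker B's reply.
    """
    a = np.asarray(speaker_a_feats, dtype=float)
    b = np.asarray(speaker_b_feats, dtype=float)
    return np.corrcoef(a, b)[0, 1]

# Toy example: mean pitch (Hz) per turn for two partners over six turn pairs
pitch_a = [210, 205, 198, 220, 215, 208]
pitch_b = [190, 188, 182, 201, 196, 189]
print(f"pitch entrainment r = {entrainment_index(pitch_a, pitch_b):.2f}")
```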

  8. Monaural speech intelligibility and detection in maskers with varying amounts of spectro-temporal speech features.

    PubMed

    Schubotz, Wiebke; Brand, Thomas; Kollmeier, Birger; Ewert, Stephan D

    2016-07-01

    Speech intelligibility is strongly affected by the presence of maskers. Depending on the spectro-temporal structure of the masker and its similarity to the target speech, different masking aspects can occur, typically referred to as energetic, amplitude modulation, and informational masking. In this study, speech intelligibility and speech detection were measured in maskers that vary systematically in the time-frequency domain from steady-state noise to a single interfering talker. Male and female target speech was used in combination with maskers based on speech from the same or a different gender. Observed data were compared to predictions of the speech intelligibility index, the extended speech intelligibility index, the multi-resolution speech-based envelope-power-spectrum model, and the short-time objective intelligibility measure. The different models served as analysis tools to help distinguish between the masking aspects. The comparison shows that overall masking can to a large extent be explained by short-term energetic masking. However, the other masking aspects (amplitude modulation and informational masking) influence speech intelligibility as well. In addition, all models showed considerable deviations from the data. The current study therefore provides a benchmark for further evaluation of speech intelligibility prediction models. PMID:27475175

  9. Are you a good mimic? Neuro-acoustic signatures for speech imitation ability.

    PubMed

    Reiterer, Susanne M; Hu, Xiaochen; Sumathi, T A; Singh, Nandini C

    2013-01-01

    We investigated individual differences in speech imitation ability in late bilinguals using a neuro-acoustic approach. One hundred and thirty-eight German-English bilinguals matched on various behavioral measures were tested for "speech imitation ability" in a foreign language, Hindi, and categorized into "high" and "low ability" groups. Brain activations and speech recordings were obtained from 26 participants from the two extreme groups as they performed a functional neuroimaging experiment which required them to "imitate" sentences in three conditions: (A) German, (B) English, and (C) German with a fake English accent. We used a recently developed acoustic analysis, the "articulation space," as a metric to compare the speech imitation abilities of the two groups. Across all three conditions, direct comparisons between the two groups revealed brain activations (FWE corrected, p < 0.05) that were more widespread, with significantly higher peak activity in the left supramarginal gyrus and postcentral areas, for the low ability group. The high ability group, on the other hand, showed a significantly larger articulation space in all three conditions. In addition, articulation space correlated positively with imitation ability (Pearson's r = 0.7, p < 0.01). Our results suggest that an expanded articulation space for high ability individuals allows access to a larger repertoire of sounds, thereby providing skilled imitators greater flexibility in pronunciation and language learning. PMID:24155739

  10. Are you a good mimic? Neuro-acoustic signatures for speech imitation ability

    PubMed Central

    Reiterer, Susanne M.; Hu, Xiaochen; Sumathi, T. A.; Singh, Nandini C.

    2013-01-01

    We investigated individual differences in speech imitation ability in late bilinguals using a neuro-acoustic approach. One hundred and thirty-eight German-English bilinguals matched on various behavioral measures were tested for “speech imitation ability” in a foreign language, Hindi, and categorized into “high” and “low ability” groups. Brain activations and speech recordings were obtained from 26 participants from the two extreme groups as they performed a functional neuroimaging experiment which required them to “imitate” sentences in three conditions: (A) German, (B) English, and (C) German with a fake English accent. We used a recently developed acoustic analysis, the “articulation space,” as a metric to compare the speech imitation abilities of the two groups. Across all three conditions, direct comparisons between the two groups revealed brain activations (FWE corrected, p < 0.05) that were more widespread, with significantly higher peak activity in the left supramarginal gyrus and postcentral areas, for the low ability group. The high ability group, on the other hand, showed a significantly larger articulation space in all three conditions. In addition, articulation space correlated positively with imitation ability (Pearson's r = 0.7, p < 0.01). Our results suggest that an expanded articulation space for high ability individuals allows access to a larger repertoire of sounds, thereby providing skilled imitators greater flexibility in pronunciation and language learning. PMID:24155739

  11. Are you a good mimic? Neuro-acoustic signatures for speech imitation ability.

    PubMed

    Reiterer, Susanne M; Hu, Xiaochen; Sumathi, T A; Singh, Nandini C

    2013-01-01

    We investigated individual differences in speech imitation ability in late bilinguals using a neuro-acoustic approach. One hundred and thirty-eight German-English bilinguals matched on various behavioral measures were tested for "speech imitation ability" in a foreign language, Hindi, and categorized into "high" and "low ability" groups. Brain activations and speech recordings were obtained from 26 participants from the two extreme groups as they performed a functional neuroimaging experiment which required them to "imitate" sentences in three conditions: (A) German, (B) English, and (C) German with a fake English accent. We used a recently developed acoustic analysis, the "articulation space," as a metric to compare the speech imitation abilities of the two groups. Across all three conditions, direct comparisons between the two groups revealed brain activations (FWE corrected, p < 0.05) that were more widespread, with significantly higher peak activity in the left supramarginal gyrus and postcentral areas, for the low ability group. The high ability group, on the other hand, showed a significantly larger articulation space in all three conditions. In addition, articulation space correlated positively with imitation ability (Pearson's r = 0.7, p < 0.01). Our results suggest that an expanded articulation space for high ability individuals allows access to a larger repertoire of sounds, thereby providing skilled imitators greater flexibility in pronunciation and language learning.

  12. Reverberant speech recognition combining deep neural networks and deep autoencoders augmented with a phone-class feature

    NASA Astrophysics Data System (ADS)

    Mimura, Masato; Sakai, Shinsuke; Kawahara, Tatsuya

    2015-12-01

    We propose an approach to reverberant speech recognition adopting deep learning in the front-end as well as the back-end of a reverberant speech recognition system, and a novel method to improve the dereverberation performance of the front-end network using phone-class information. At the front-end, we adopt a deep autoencoder (DAE) for enhancing the speech feature parameters, and speech recognition is performed in the back-end using DNN-HMM acoustic models trained on multi-condition data. The system was evaluated through the ASR task in the Reverb Challenge 2014. The DNN-HMM system trained on the multi-condition training set achieved a conspicuously higher word accuracy compared to the MLLR-adapted GMM-HMM system trained on the same data. Furthermore, feature enhancement with the deep autoencoder contributed to the improvement of recognition accuracy especially in the more adverse conditions. While the mapping between reverberant and clean speech in DAE-based dereverberation is conventionally conducted only with the acoustic information, we presume the mapping is also dependent on the phone information. Therefore, we propose a new scheme (pDAE), which augments a phone-class feature to the standard acoustic features as input. Two types of the phone-class feature are investigated. One is the hard recognition result of monophones, and the other is a soft representation derived from the posterior outputs of monophone DNN. The augmented feature in either type results in a significant improvement (7-8 % relative) from the standard DAE.
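
    The pDAE scheme described above amounts to concatenating a phone-class vector, either a hard monophone decision or soft monophone posteriors, onto each frame of acoustic features before dereverberation. A minimal sketch of that augmentation step, with placeholder shapes and random arrays standing in for real features and posteriors:

```python
# Minimal sketch of the pDAE input-augmentation idea: per-frame acoustic
# features are concatenated with a phone-class feature, either soft monophone
# posteriors or a hard one-hot monophone decision. Shapes and arrays below are
# illustrative assumptions, not the system's actual features.
import numpy as np

T, D, P = 300, 40, 45                       # frames, acoustic dim, number of monophones
acoustic = np.random.randn(T, D)            # stand-in for log-mel / MFCC features
posteriors = np.random.dirichlet(np.ones(P), size=T)   # stand-in for monophone posteriors

# Soft augmentation: append the posterior vector to every frame
dae_input_soft = np.concatenate([acoustic, posteriors], axis=1)     # (T, D + P)

# Hard augmentation: one-hot encode the most likely monophone per frame
hard_ids = posteriors.argmax(axis=1)
one_hot = np.eye(P)[hard_ids]
dae_input_hard = np.concatenate([acoustic, one_hot], axis=1)        # (T, D + P)

print(dae_input_soft.shape, dae_input_hard.shape)
```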

  13. Decoding spectrotemporal features of overt and covert speech from the human cortex

    PubMed Central

    Martin, Stéphanie; Brunner, Peter; Holdgraf, Chris; Heinze, Hans-Jochen; Crone, Nathan E.; Rieger, Jochem; Schalk, Gerwin; Knight, Robert T.; Pasley, Brian N.

    2014-01-01

    Auditory perception and auditory imagery have been shown to activate overlapping brain regions. We hypothesized that these phenomena also share a common underlying neural representation. To assess this, we used electrocorticography intracranial recordings from epileptic patients performing an out loud or a silent reading task. In these tasks, short stories scrolled across a video screen in two conditions: subjects read the same stories both aloud (overt) and silently (covert). In a control condition the subject remained in a resting state. We first built a high gamma (70–150 Hz) neural decoding model to reconstruct spectrotemporal auditory features of self-generated overt speech. We then evaluated whether this same model could reconstruct auditory speech features in the covert speech condition. Two speech models were tested: a spectrogram and a modulation-based feature space. For the overt condition, reconstruction accuracy was evaluated as the correlation between original and predicted speech features, and was significant in each subject (p < 10−5; paired two-sample t-test). For the covert speech condition, dynamic time warping was first used to realign the covert speech reconstruction with the corresponding original speech from the overt condition. Reconstruction accuracy was then evaluated as the correlation between original and reconstructed speech features. Covert reconstruction accuracy was compared to the accuracy obtained from reconstructions in the baseline control condition. Reconstruction accuracy for the covert condition was significantly better than for the control condition (p < 0.005; paired two-sample t-test). The superior temporal gyrus, pre- and post-central gyrus provided the highest reconstruction information. The relationship between overt and covert speech reconstruction depended on anatomy. These results provide evidence that auditory representations of covert speech can be reconstructed from models that are built from an overt speech
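
    The evaluation procedure described above realigns the covert-speech reconstruction to the overt reference with dynamic time warping (DTW) and then correlates the aligned sequences. A hedged, illustrative sketch of that step for one-dimensional feature trajectories (the toy signals and the plain dynamic-programming implementation are assumptions, not the authors' code):

```python
# Illustrative DTW realignment followed by correlation of the aligned
# sequences. Toy signals stand in for reconstructed vs. original features.
import numpy as np

def dtw_path(a, b):
    """Return the DTW alignment path between 1-D sequences a and b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from (n, m) to (0, 0)
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Toy example: a time-shifted, noisy copy of a slow envelope
t = np.linspace(0, 1, 100)
reference = np.sin(2 * np.pi * 2 * t)
reconstruction = np.sin(2 * np.pi * 2 * (t - 0.05)) + 0.1 * np.random.default_rng(0).standard_normal(100)
idx = np.array(dtw_path(reference, reconstruction))
aligned_ref, aligned_rec = reference[idx[:, 0]], reconstruction[idx[:, 1]]
print("aligned correlation:", round(np.corrcoef(aligned_ref, aligned_rec)[0, 1], 2))
```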

  14. Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech.

    PubMed

    Leong, Victoria; Goswami, Usha

    2015-01-01

    When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72-82% (freely-read CDS) and 90-98% (rhythmically-regular CDS) stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across
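
    A minimal sketch of the core S-AMPH idea follows: extract the broadband amplitude envelope and band-pass filter it around the Stress (~2 Hz), Syllable (~5 Hz), and Phoneme (~20 Hz) modulation timescales. The band edges and filter design below are illustrative assumptions; the published model derives its bands from spectral and temporal PCA.

```python
# Hedged sketch: Hilbert envelope plus band-pass filtering at three modulation
# timescales. Band edges and filter order are illustrative assumptions.
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def am_bands(x, fs, bands=((0.9, 2.5), (2.5, 12.0), (12.0, 40.0))):
    envelope = np.abs(hilbert(x))                  # broadband amplitude envelope
    out = {}
    for lo, hi in bands:
        sos = butter(2, [lo, hi], btype="bandpass", fs=fs, output="sos")
        out[f"{lo}-{hi} Hz"] = sosfiltfilt(sos, envelope)
    return out

if __name__ == "__main__":
    fs = 16000
    t = np.arange(0, 2.0, 1 / fs)
    # Toy "speech-like" signal: a carrier modulated at roughly 2 Hz and 5 Hz
    mod = (1 + 0.5 * np.sin(2 * np.pi * 2 * t)) * (1 + 0.3 * np.sin(2 * np.pi * 5 * t))
    x = mod * np.sin(2 * np.pi * 500 * t)
    for name, band in am_bands(x, fs).items():
        print(name, band.shape)
```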

  15. Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech

    PubMed Central

    Leong, Victoria; Goswami, Usha

    2015-01-01

    When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72–82% (freely-read CDS) and 90–98% (rhythmically-regular CDS) stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across

  16. Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech.

    PubMed

    Leong, Victoria; Goswami, Usha

    2015-01-01

    When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72-82% (freely-read CDS) and 90-98% (rhythmically-regular CDS) stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across

  17. Voice-over: perceptual and acoustic analysis of vocal features.

    PubMed

    Medrado, Reny; Ferreira, Leslie Piccolotto; Behlau, Mara

    2005-09-01

    Voice-overs are professional voice users who use their voices to market products in the electronic media. The purposes of this study were to (1) analyze voice-overed and non-voice-overed productions of an advertising text in two groups consisting of 10 male professional voice-overs and 10 male non-voice-overs; and (2) determine specific acoustic features of voice-over productions in both groups. A group of naïve listeners was engaged for the perceptual analysis of the recorded advertising text. The voice-overed production samples from both groups were submitted for analysis of acoustic and temporal features. The following parameters were analyzed: (1) the total text length, (2) the length of the three emphatic pauses, (3) the mean, (4) minimum, and (5) maximum fundamental frequency, and (6) the semitone range. The majority of voice-overs and non-voice-overs were correctly identified by the listeners in both productions. However, voice-overs were more consistently identified correctly than non-voice-overs. The total text length was greater for voice-overs. The pause time distribution was statistically more homogeneous for the voice-overs. The acoustic analysis indicated that the voice-overs had lower values of mean, minimum, and maximum fundamental frequency and a greater range of semitones. The voice-overs carry their voice-overed production features over to their non-voice-overed productions. PMID:16102662
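
    The fundamental-frequency measures listed above (mean, minimum, and maximum f0 plus the semitone range) are straightforward to compute once an f0 contour is available. A small sketch, with the pitch tracker abstracted away and a placeholder contour:

```python
# Sketch of f0 statistics: mean, min, max, and semitone range computed as
# 12 * log2(max / min). The contour would normally come from a pitch tracker;
# here it is a placeholder array with unvoiced frames coded as 0.
import numpy as np

def f0_statistics(f0_hz):
    f0 = np.asarray(f0_hz, dtype=float)
    f0 = f0[f0 > 0]                       # drop unvoiced frames
    return {
        "mean_hz": f0.mean(),
        "min_hz": f0.min(),
        "max_hz": f0.max(),
        "semitone_range": 12.0 * np.log2(f0.max() / f0.min()),
    }

contour = [0, 118, 121, 130, 142, 0, 155, 149, 137, 0, 110]   # placeholder (Hz)
print(f0_statistics(contour))
```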

  18. The Effectiveness of Clear Speech as a Masker

    ERIC Educational Resources Information Center

    Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.

    2010-01-01

    Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…

  19. Associations between speech features and phenotypic severity in Treacher Collins syndrome

    PubMed Central

    2014-01-01

    Background Treacher Collins syndrome (TCS, OMIM 154500) is a rare congenital disorder of craniofacial development. Characteristic hypoplastic malformations of the ears, zygomatic arch, mandible and pharynx have been described in detail. However, reports on the impact of these malformations on speech are few. Exploring speech features and investigating if speech function is related to phenotypic severity are essential for optimizing follow-up and treatment. Methods Articulation, nasal resonance, voice and intelligibility were examined in 19 individuals (5–74 years, median 34 years) divided into three groups comprising children 5–10 years (n = 4), adolescents 11–18 years (n = 4) and adults 29 years and older (n = 11). A speech composite score (0–6) was calculated to reflect the variability of speech deviations. TCS severity scores of phenotypic expression and total scores of Nordic Orofacial Test-Screening (NOT-S) measuring orofacial dysfunction were used in analyses of correlation with speech characteristics (speech composite scores). Results Children and adolescents presented with significantly higher speech composite scores (median 4, range 1–6) than adults (median 1, range 0–5). Nearly all children and adolescents (6/8) displayed speech deviations of articulation, nasal resonance and voice, while only three adults were identified with multiple speech aberrations. The variability of speech dysfunction in TCS was exhibited by individual combinations of speech deviations in 13/19 participants. The speech composite scores correlated with TCS severity scores and NOT-S total scores. Speech composite scores higher than 4 were associated with cleft palate. The percent of intelligible words in connected speech was significantly lower in children and adolescents (median 77%, range 31–99) than in adults (98%, range 93–100). Intelligibility of speech among the children was markedly inconsistent and clearly affecting the understandability

  20. Particle swarm optimization based feature enhancement and feature selection for improved emotion recognition in speech and glottal signals.

    PubMed

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features, and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were utilized to gauge the proposed method. An extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted, and the results show that the proposed method significantly improves speech emotion recognition performance compared to previous works published in the literature.
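
    As a rough illustration of one stage of such a pipeline, the sketch below extracts utterance-level MFCC statistics and feeds them to a simple classifier. librosa is assumed for feature extraction, a k-NN classifier stands in for the ELM, and the PSO-based clustering and wrapper selection stages are not reproduced; the utterances and labels are synthetic placeholders:

```python
# Illustrative MFCC-statistics front-end plus a stand-in classifier (k-NN in
# place of the ELM; no PSO stages). Utterances below are synthetic placeholders.
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def utterance_features(y, sr, n_mfcc=13):
    """Summarize an utterance by per-coefficient MFCC mean and std."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

sr = 16000
rng = np.random.default_rng(0)
# Synthetic stand-ins for emotional-speech recordings (real data would be
# loaded from audio files instead)
utterances = [np.sin(2 * np.pi * f0 * np.arange(sr) / sr) + 0.05 * rng.standard_normal(sr)
              for f0 in (120, 125, 220, 230)]
labels = ["neutral", "neutral", "angry", "angry"]     # placeholder emotion labels

X = np.vstack([utterance_features(y, sr) for y in utterances])
clf = KNeighborsClassifier(n_neighbors=1).fit(X, labels)
print(clf.predict(X[:1]))
```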

  1. Particle Swarm Optimization Based Feature Enhancement and Feature Selection for Improved Emotion Recognition in Speech and Glottal Signals

    PubMed Central

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features, and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were utilized to gauge the proposed method. An extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted, and the results show that the proposed method significantly improves speech emotion recognition performance compared to previous works published in the literature. PMID:25799141

  2. Control of spoken vowel acoustics and the influence of phonetic context in human speech sensorimotor cortex.

    PubMed

    Bouchard, Kristofer E; Chang, Edward F

    2014-09-17

    Speech production requires the precise control of vocal tract movements to generate individual speech sounds (phonemes) which, in turn, are rapidly organized into complex sequences. Multiple productions of the same phoneme can exhibit substantial variability, some of which is inherent to control of the vocal tract and its biomechanics, and some of which reflects the contextual effects of surrounding phonemes ("coarticulation"). The role of the CNS in these aspects of speech motor control is not well understood. To address these issues, we recorded multielectrode cortical activity directly from human ventral sensory-motor cortex (vSMC) during the production of consonant-vowel syllables. We analyzed the relationship between the acoustic parameters of vowels (pitch and formants) and cortical activity on a single-trial level. We found that vSMC activity robustly predicted acoustic parameters across vowel categories (up to 80% of variance), as well as different renditions of the same vowel (up to 25% of variance). Furthermore, we observed significant contextual effects on vSMC representations of produced phonemes that suggest active control of coarticulation: vSMC representations for vowels were biased toward the representations of the preceding consonant, and conversely, representations for consonants were biased toward upcoming vowels. These results reveal that vSMC activity for phonemes are not invariant and provide insight into the cortical mechanisms of coarticulation.

  3. Control of Spoken Vowel Acoustics and the Influence of Phonetic Context in Human Speech Sensorimotor Cortex

    PubMed Central

    Bouchard, Kristofer E.

    2014-01-01

    Speech production requires the precise control of vocal tract movements to generate individual speech sounds (phonemes) which, in turn, are rapidly organized into complex sequences. Multiple productions of the same phoneme can exhibit substantial variability, some of which is inherent to control of the vocal tract and its biomechanics, and some of which reflects the contextual effects of surrounding phonemes (“coarticulation”). The role of the CNS in these aspects of speech motor control is not well understood. To address these issues, we recorded multielectrode cortical activity directly from human ventral sensory-motor cortex (vSMC) during the production of consonant-vowel syllables. We analyzed the relationship between the acoustic parameters of vowels (pitch and formants) and cortical activity on a single-trial level. We found that vSMC activity robustly predicted acoustic parameters across vowel categories (up to 80% of variance), as well as different renditions of the same vowel (up to 25% of variance). Furthermore, we observed significant contextual effects on vSMC representations of produced phonemes that suggest active control of coarticulation: vSMC representations for vowels were biased toward the representations of the preceding consonant, and conversely, representations for consonants were biased toward upcoming vowels. These results reveal that vSMC activity for phonemes are not invariant and provide insight into the cortical mechanisms of coarticulation. PMID:25232105

  4. How private is your consultation? Acoustic and audiological measures of speech privacy in the otolaryngology clinic.

    PubMed

    Clamp, Philip J; Grant, David G; Zapala, David A; Hawkins, David B

    2011-01-01

    The right to confidentiality is a central tenet of the doctor-patient relationship. In the United Kingdom this right to confidentiality is recognised in published GMC guidance. In the USA, the Health Insurance Portability and Accountability Act of 1996 (HIPAA) strengthened the legal requirement to protect patient information in all forms, and failure to do so now constitutes a federal offence. The aim of this study was to assess the acoustic privacy of an otolaryngology outpatient consultation room. Acoustic privacy was measured using the articulation index (AI) and Bamford-Kowal-Bench (BKB) speech discrimination tests. BKB speech tests were calibrated to normal conversational volume (50 dB SPL). Both AI and BKB were calculated in four positions around the ENT clinic: within the consultation room, outside the consulting room door, in the nearest waiting area chair, and in the farthest waiting area chair. Tests were undertaken with the clinic room door closed and open to assess the effect on privacy. With the clinic room door closed, mean BKB scores in the nearest and farthest waiting area chairs were 51% and 41%, respectively, and AI scores in the waiting area chairs were 0.03 and 0.02. With the clinic room door open, privacy was lost in both AI and BKB testing, with almost 100% of words discernible at normal talking levels. The results of this study highlight the poor level of speech privacy within a standard ENT outpatient department. AI is a poor predictor of privacy.

  5. Teachers and Teaching: Speech Production Accommodations Due to Changes in the Acoustic Environment

    PubMed Central

    Hunter, Eric J.; Bottalico, Pasquale; Graetzer, Simone; Leishman, Timothy W.; Berardi, Mark L.; Eyring, Nathan G.; Jensen, Zachary R.; Rolins, Michael K.; Whiting, Jennifer K.

    2016-01-01

    School teachers have an elevated risk of voice problems due to the vocal demands in the workplace. This manuscript presents the results of three studies investigating teachers’ voice use at work. In the first study, 57 teachers were observed for 2 weeks (waking hours) to compare how they used their voice in the school environment and in non-school environments. In a second study, 45 participants performed a short vocal task in two different rooms: a variable-acoustics room and an anechoic chamber. Subjects were taken back and forth between the two rooms. Each time they entered the variable-acoustics room, the reverberation time and/or the background noise condition had been modified. In this latter study, subjects responded to questions about their vocal comfort and their perception of changes in the acoustic environment. In a third study, 20 untrained vocalists performed a simple vocal task in the following conditions: with and without background babble and with and without transparent plexiglass shields to increase the first reflection. Relationships were examined between [1] the results for the room acoustic parameters; [2] the subjects’ perception of the room; and [3] the recorded speech acoustics. Several differences between male and female subjects were found; some of those differences held for each room condition (at school vs. not at school, reverberation level, noise level, and early reflection). PMID:26949426

  6. Effect of several acoustic cues on perceiving Mandarin retroflex affricates and fricatives in continuous speech.

    PubMed

    Zhu, Jian; Chen, Yaping

    2016-07-01

    Relatively little attention has been paid to the perception of the three-way contrast between unaspirated affricates, aspirated affricates and fricatives in Mandarin Chinese. This study reports two experiments that explore the acoustic cues relevant to the contrast between the Mandarin retroflex series /tʂ/, /tʂ(h)/ and /ʂ/ in continuous speech. Twenty participants performed two three-alternative forced-choice tasks, in which acoustic cues including closure, frication duration (FD), aspiration, and vocalic contexts (VCs) were systematically manipulated and presented in a carrier phrase. A subsequent classification tree analysis shows that FD distinguishes /tʂ/ from /tʂ(h)/ and /ʂ/, and that closure cues the affricate manner. Interactions between VC and individual cues are also found. The FD threshold for separating /ʂ/ and /tʂ/ is susceptible to the influence of the following vocalic segments, shifting to lower values if frication is followed by the low vowel /a/. On the other hand, while aspiration cues /tʂ(h)/ before /a/ and //, this acoustic cue is obscured by gesture continuation when /tʂ(h)/ precedes its homorganic approximant /ɻ/ in natural speech, which might cause potential confusion between /tʂ(h)/ and /ʂ/. PMID:27475170

  7. Modification of computational auditory scene analysis (CASA) for noise-robust acoustic feature

    NASA Astrophysics Data System (ADS)

    Kwon, Minseok

    While there have been many attempts to mitigate interference from background noise, the performance of automatic speech recognition (ASR) can still be easily degraded by various factors. However, normal-hearing listeners can accurately perceive the sounds they are interested in, which is believed to be a result of Auditory Scene Analysis (ASA). As a first attempt, a simulation of human auditory processing, called computational auditory scene analysis (CASA), was developed through physiological and psychological investigations of ASA. The CASA system comprised a Zilany-Bruce auditory model, followed by fundamental frequency tracking for voiced segmentation and detection of onset/offset pairs at each characteristic frequency (CF) for unvoiced segmentation. The resulting time-frequency (T-F) representation of the acoustic stimulation was converted into an acoustic feature, gammachirp-tone frequency cepstral coefficients (GFCC). Eleven keywords under various environmental conditions were used, and the robustness of GFCC was evaluated by spectral distance (SD) and dynamic time warping distance (DTW). In "clean" and "noisy" conditions, the application of CASA generally improved the noise robustness of the acoustic feature compared to a conventional method with or without noise suppression using an MMSE estimator. The initial study, however, not only showed a noise-type dependency at low SNR but also called the evaluation methods into question. Some modifications were made to capture better spectral continuity from the acoustic feature matrix, to obtain faster processing speed, and to describe the human auditory system more precisely. The proposed framework includes: 1) multi-scale integration to capture more accurate continuity in feature extraction, 2) contrast enhancement (CE) of each CF by competition with neighboring frequency bands, and 3) auditory model modifications. The model modifications include the introduction of a higher Q factor and a middle ear filter more analogous to the human auditory system

  8. The role of metrical information in apraxia of speech. Perceptual and acoustic analyses of word stress.

    PubMed

    Aichert, Ingrid; Späth, Mona; Ziegler, Wolfram

    2016-02-01

    Several factors are known to influence speech accuracy in patients with apraxia of speech (AOS), e.g., syllable structure or word length. However, the impact of word stress has largely been neglected so far. More generally, the role of prosodic information at the phonetic encoding stage of speech production often remains unconsidered in models of speech production. This study aimed to investigate the influence of word stress on error production in AOS. Two-syllable words with stress on the first syllable (trochees) vs. the second syllable (iambs) were compared in 14 patients with AOS, three of them exhibiting pure AOS, and in a control group of six normal speakers. The patients produced significantly more errors on iambic than on trochaic words. The most prominent metrical effect was obtained for segmental errors. Acoustic analyses of word durations revealed a disproportionate advantage of the trochaic meter in the patients relative to the healthy controls. The results indicate that German apraxic speakers are sensitive to metrical information. It is assumed that metrical patterns function as prosodic frames for articulation planning, and that the regular metrical pattern in German, the trochaic form, has a facilitating effect on word production in patients with AOS. PMID:26792367

  9. Enhancing the magnitude spectrum of speech features for robust speech recognition

    NASA Astrophysics Data System (ADS)

    Hung, Jeih-weih; Fan, Hao-teng; Tu, Wen-hsiang

    2012-12-01

    In this article, we present an effective compensation scheme to improve noise robustness for the spectra of speech signals. In this compensation scheme, called magnitude spectrum enhancement (MSE), a voice activity detection (VAD) process is performed on the frame sequence of the utterance. The magnitude spectra of non-speech frames are then reduced, while those of speech frames are amplified. In experiments conducted on the Aurora-2 noisy digits database, MSE achieves an error reduction rate of nearly 42% relative to baseline processing. This method outperforms well-known spectral-domain speech enhancement techniques, including spectral subtraction (SS) and Wiener filtering (WF). In addition, the proposed MSE can be integrated with cepstral-domain robustness methods, such as mean and variance normalization (MVN) and histogram equalization (HEQ), to achieve further improvements in recognition accuracy under noise-corrupted environments.
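
    A hedged sketch of the general MSE idea follows: an energy-based VAD labels frames, non-speech magnitude spectra are attenuated, speech-frame magnitudes are amplified, and the signal is resynthesized. The VAD threshold and gain factors below are illustrative assumptions, not the parameters of the published method.

```python
# Sketch of an MSE-style enhancement step: crude energy-based VAD, then
# frame-wise scaling of magnitude spectra before resynthesis.
import numpy as np
from scipy.signal import stft, istft

def mse_like_enhancement(x, fs, boost=1.2, cut=0.5, nperseg=256):
    f, t, Z = stft(x, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    frame_energy = (mag ** 2).sum(axis=0)
    threshold = 0.1 * frame_energy.mean()          # crude VAD threshold (assumption)
    speech = frame_energy > threshold              # True for putative speech frames
    scale = np.where(speech, boost, cut)           # amplify speech, attenuate non-speech
    Z_enh = (mag * scale) * np.exp(1j * phase)
    _, x_enh = istft(Z_enh, fs=fs, nperseg=nperseg)
    return x_enh

if __name__ == "__main__":
    fs = 16000
    x = np.random.randn(fs)                        # stand-in for a noisy utterance
    y = mse_like_enhancement(x, fs)
    print(y.shape)
```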

  10. The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene.

    PubMed

    Rimmele, Johanna M; Zion Golumbic, Elana; Schröger, Erich; Poeppel, David

    2015-07-01

    Attending to one speaker in multi-speaker situations is challenging. One neural mechanism proposed to underlie the ability to attend to a particular speaker is phase-locking of low-frequency activity in auditory cortex to speech's temporal envelope ("speech-tracking"), which is more precise for attended speech. However, it is not known what brings about this attentional effect, and specifically if it reflects enhanced processing of the fine structure of attended speech. To investigate this question we compared attentional effects on speech-tracking of natural versus vocoded speech which preserves the temporal envelope but removes the fine structure of speech. Pairs of natural and vocoded speech stimuli were presented concurrently and participants attended to one stimulus and performed a detection task while ignoring the other stimulus. We recorded magnetoencephalography (MEG) and compared attentional effects on the speech-tracking response in auditory cortex. Speech-tracking of natural, but not vocoded, speech was enhanced by attention, whereas neural tracking of ignored speech was similar for natural and vocoded speech. These findings suggest that the more precise speech-tracking of attended natural speech is related to processing its fine structure, possibly reflecting the application of higher-order linguistic processes. In contrast, when speech is unattended its fine structure is not processed to the same degree and thus elicits less precise speech-tracking more similar to vocoded speech.

  11. Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network

    NASA Astrophysics Data System (ADS)

    Wang, Xiaojia; Mao, Qirong; Zhan, Yongzhao

    2008-11-01

    There are many emotion features. If all of these features are employed to recognize emotions, redundant features may exist. Furthermore, the recognition results are unsatisfactory and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on a contribution analysis algorithm of a neural network (NN) is presented. The emotion features are selected from the 95 extracted features by using the contribution analysis algorithm of the NN. Cluster analysis is applied to analyze the effectiveness of the selected features, and the time of feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and reduce the time of feature extraction.

  12. Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network

    SciTech Connect

    Wang Xiaojia; Mao Qirong; Zhan Yongzhao

    2008-11-06

    There are many emotion features. If all of these features are employed to recognize emotions, redundant features may exist. Furthermore, the recognition results are unsatisfactory and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on a contribution analysis algorithm of a neural network (NN) is presented. The emotion features are selected from the 95 extracted features by using the contribution analysis algorithm of the NN. Cluster analysis is applied to analyze the effectiveness of the selected features, and the time of feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and reduce the time of feature extraction.

  13. Acoustic feature recognition in the dogbane tiger moth, Cycnia tenera.

    PubMed

    Fullard, James H; Ratcliffe, John M; Christie, Christopher G

    2007-07-01

    Certain tiger moths (Arctiidae) defend themselves against bats by phonoresponding to their echolocation calls with trains of ultrasonic clicks. The dogbane tiger moth, Cycnia tenera, preferentially phonoresponds to the calls produced by attacking versus searching bats, suggesting that it either recognizes some acoustic feature of this phase of the bat's echolocation calls or that it simply reacts to their increased power as the bat closes. Here, we used a habituation/generalization paradigm to demonstrate that C. tenera responds neither to the shift in echolocation call frequencies nor to the change in pulse duration that is exhibited during the bat's attack phase unless these changes are accompanied by either an increase in duty cycle or a decrease in pulse period. To separate these features, we measured the moth's phonoresponse thresholds to pulsed stimuli with variable versus constant duty cycles and demonstrate that C. tenera is most sensitive to echolocation call periods expressed by an attacking bat. We suggest that, under natural conditions, C. tenera identifies an attacking bat by recognizing the pulse period of its echolocation calls but that this feature recognition is influenced by acoustic power and can be overridden by unnaturally intense sounds.

  14. Advantages from bilateral hearing in speech perception in noise with simulated cochlear implants and residual acoustic hearing.

    PubMed

    Schoof, Tim; Green, Tim; Faulkner, Andrew; Rosen, Stuart

    2013-02-01

    Acoustic simulations were used to study the contributions of spatial hearing that may arise from combining a cochlear implant with either a second implant or contralateral residual low-frequency acoustic hearing. Speech reception thresholds (SRTs) were measured in twenty-talker babble. Spatial separation of speech and noise was simulated using a spherical head model. While low-frequency acoustic information contralateral to the implant simulation produced substantially better SRTs there was no effect of spatial cues on SRT, even when interaural differences were artificially enhanced. Simulated bilateral implants showed a significant head shadow effect, but no binaural unmasking based on interaural time differences, and weak, inconsistent overall spatial release from masking. There was also a small but significant non-spatial summation effect. It appears that typical cochlear implant speech processing strategies may substantially reduce the utility of spatial cues, even in the absence of degraded neural processing arising from auditory deprivation. PMID:23363118

  15. The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene

    PubMed Central

    Rimmele, Johanna M.; Golumbic, Elana Zion; Schröger, Erich; Poeppel, David

    2015-01-01

    Attending to one speaker in multi-speaker situations is challenging. One neural mechanism proposed to underlie the ability to attend to a particular speaker is phase-locking of low-frequency activity in auditory cortex to speech’s temporal envelope (“speech-tracking”), which is more precise for attended speech. However, it is not known what brings about this attentional effect, and specifically if it reflects enhanced processing of the fine structure of attended speech. To investigate this question we compared attentional effects on speech-tracking of natural vs. vocoded speech which preserves the temporal envelope but removes the fine-structure of speech. Pairs of natural and vocoded speech stimuli were presented concurrently and participants attended to one stimulus and performed a detection task while ignoring the other stimulus. We recorded magnetoencephalography (MEG) and compared attentional effects on the speech-tracking response in auditory cortex. Speech-tracking of natural, but not vocoded, speech was enhanced by attention, whereas neural tracking of ignored speech was similar for natural and vocoded speech. These findings suggest that the more precise speech tracking of attended natural speech is related to processing its fine structure, possibly reflecting the application of higher-order linguistic processes. In contrast, when speech is unattended its fine structure is not processed to the same degree and thus elicits less precise speech tracking more similar to vocoded speech. PMID:25650107

  16. Revisiting Neil Armstrong's Moon-Landing Quote: Implications for Speech Perception, Function Word Reduction, and Acoustic Ambiguity.

    PubMed

    Baese-Berk, Melissa M; Dilley, Laura C; Schmidt, Stephanie; Morrill, Tuuli H; Pitt, Mark A

    2016-01-01

    Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said one small step for a man, instead of one small step for man. What he said is unclear in part because function words like a can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of for and for a in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of for and for a. The results suggest that the distributions of for and for a overlap substantially, both in terms of temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word a varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate. PMID:27603209

  17. Revisiting Neil Armstrong's Moon-Landing Quote: Implications for Speech Perception, Function Word Reduction, and Acoustic Ambiguity

    PubMed Central

    Baese-Berk, Melissa M.; Dilley, Laura C.; Schmidt, Stephanie; Morrill, Tuuli H.; Pitt, Mark A.

    2016-01-01

    Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said one small step for a man, instead of one small step for man. What he said is unclear in part because function words like a can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of for and for a in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of for and for a. The results suggest that the distributions of for and for a overlap substantially, both in terms of temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word a varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate. PMID:27603209

  18. Revisiting Neil Armstrong's Moon-Landing Quote: Implications for Speech Perception, Function Word Reduction, and Acoustic Ambiguity.

    PubMed

    Baese-Berk, Melissa M; Dilley, Laura C; Schmidt, Stephanie; Morrill, Tuuli H; Pitt, Mark A

    2016-01-01

    Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said one small step for a man, instead of one small step for man. What he said is unclear in part because function words like a can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of for and for a in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of for and for a. The results suggest that the distributions of for and for a overlap substantially, both in terms of temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word a varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate.

  19. Feature Characteristics of Spontaneous Speech Production in Young Deaf Children.

    ERIC Educational Resources Information Center

    Geffner, Donna

    1980-01-01

    Sixty-five six-year-old deaf children from state supported schools were given an adaptation of the Goldman Fristoe test of articulation to assess their spontaneous speech production. Journal availability: Elsevier North Holland, Inc., 52 Vanderbilt Avenue, New York, NY 10017. (Author)

  20. Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System

    PubMed Central

    Partila, Pavol; Voznak, Miroslav; Tovarek, Jaromir

    2015-01-01

    The impact of the classification method and feature selection on speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the computational complexity of the system. This step is necessary especially for systems that will be deployed in real-time applications. The motivation for developing and improving speech emotion recognition systems is their wide applicability in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. The classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture models is measured considering the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of a speech emotion recognition system notable for its accuracy and efficiency. PMID:26346654
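
    A hedged sketch of the kind of comparison described above: select a subset of features and measure cross-validated classification accuracy. SelectKBest with an ANOVA criterion and a k-NN classifier stand in for the paper's specific selection and classification methods, and the feature matrix is synthetic rather than the Berlin database.

```python
# Illustrative feature-selection / classifier comparison on synthetic data.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))           # 200 utterances x 40 prosodic/spectral features
y = rng.integers(0, 4, size=200)         # 4 emotion classes (placeholder labels)

for k in (5, 10, 20, 40):
    pipe = make_pipeline(SelectKBest(f_classif, k=k), KNeighborsClassifier(n_neighbors=5))
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"k={k:2d} selected features -> mean CV accuracy {acc:.2f}")
```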

  1. Lexico-semantic and acoustic-phonetic processes in the perception of noise-vocoded speech: implications for cochlear implantation

    PubMed Central

    McGettigan, Carolyn; Rosen, Stuart; Scott, Sophie K.

    2014-01-01

    Noise-vocoding is a transformation which, when applied to speech, severely reduces spectral resolution and eliminates periodicity, yielding a stimulus that sounds “like a harsh whisper” (Scott et al., 2000, p. 2401). This process simulates a cochlear implant, where the activity of many thousand hair cells in the inner ear is replaced by direct stimulation of the auditory nerve by a small number of tonotopically-arranged electrodes. Although a cochlear implant offers a powerful means of restoring some degree of hearing to profoundly deaf individuals, the outcomes for spoken communication are highly variable (Moore and Shannon, 2009). Some variability may arise from differences in peripheral representation (e.g., the degree of residual nerve survival) but some may reflect differences in higher-order linguistic processing. In order to explore this possibility, we used noise-vocoding to examine speech recognition and perceptual learning in normal-hearing listeners tested across several levels of the linguistic hierarchy: segments (consonants and vowels), single words, and sentences. Listeners improved significantly on all tasks across two test sessions. In the first session, individual differences analyses revealed two independently varying sources of variability: one lexico-semantic in nature and implicating the recognition of words and sentences, and the other an acoustic-phonetic factor associated with words and segments. However, consequent to learning, by the second session there was a more uniform covariance pattern concerning all stimulus types. A further analysis of phonetic feature recognition allowed greater insight into learning-related changes in perception and showed that, surprisingly, participants did not make full use of cues that were preserved in the stimuli (e.g., vowel duration). We discuss these findings in relation to cochlear implantation, and suggest auditory training strategies to maximize speech recognition performance in the absence of

  2. Exploiting Acoustic and Syntactic Features for Automatic Prosody Labeling in a Maximum Entropy Framework

    PubMed Central

    Sridhar, Vivek Kumar Rangarajan; Bangalore, Srinivas; Narayanan, Shrikanth S.

    2009-01-01

    In this paper, we describe a maximum entropy-based automatic prosody labeling framework that exploits both language and speech information. We apply the proposed framework to both prominence and phrase structure detection within the Tones and Break Indices (ToBI) annotation scheme. Our framework utilizes novel syntactic features in the form of supertags and a quantized acoustic–prosodic feature representation that is similar to linear parameterizations of the prosodic contour. The proposed model is trained discriminatively and is robust in the selection of appropriate features for the task of prosody detection. The proposed maximum entropy acoustic–syntactic model achieves pitch accent and boundary tone detection accuracies of 86.0% and 93.1% on the Boston University Radio News corpus, and, 79.8% and 90.3% on the Boston Directions corpus. The phrase structure detection through prosodic break index labeling provides accuracies of 84% and 87% on the two corpora, respectively. The reported results are significantly better than previously reported results and demonstrate the strength of maximum entropy model in jointly modeling simple lexical, syntactic, and acoustic features for automatic prosody labeling. PMID:19603083

  3. Improved perception of speech in noise and Mandarin tones with acoustic simulations of harmonic coding for cochlear implants.

    PubMed

    Li, Xing; Nie, Kaibao; Imennov, Nikita S; Won, Jong Ho; Drennan, Ward R; Rubinstein, Jay T; Atlas, Les E

    2012-11-01

    Harmonic and temporal fine structure (TFS) information are important cues for speech perception in noise and music perception. However, due to the inherently coarse spectral and temporal resolution in electric hearing, the question of how to deliver harmonic and TFS information to cochlear implant (CI) users remains unresolved. A harmonic-single-sideband-encoder [(HSSE); Nie et al. (2008). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing; Lie et al., (2010). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing] strategy has been proposed that explicitly tracks the harmonics in speech and transforms them into modulators conveying both amplitude modulation and fundamental frequency information. For unvoiced speech, HSSE transforms the TFS into a slowly varying yet still noise-like signal. To investigate its potential, four- and eight-channel vocoder simulations of HSSE and the continuous-interleaved-sampling (CIS) strategy were implemented, respectively. Using these vocoders, five normal-hearing subjects' speech recognition performance was evaluated under different masking conditions; another five normal-hearing subjects' Mandarin tone identification performance was also evaluated. Additionally, the neural discharge patterns evoked by HSSE- and CIS-encoded Mandarin tone stimuli were simulated using an auditory nerve model. All subjects scored significantly higher with HSSE than with CIS vocoders. The modeling analysis demonstrated that HSSE can convey temporal pitch cues better than CIS. Overall, the results suggest that HSSE is a promising strategy to enhance speech perception with CIs. PMID:23145619

  4. Speech intelligibility in rooms: Effect of prior listening exposure interacts with room acoustics.

    PubMed

    Zahorik, Pavel; Brandewie, Eugene J

    2016-07-01

    There is now converging evidence that a brief period of prior listening exposure to a reverberant room can influence speech understanding in that environment. Although the effect appears to depend critically on the amplitude modulation characteristic of the speech signal reaching the ear, the extent to which the effect may be influenced by room acoustics has not been thoroughly evaluated. This study seeks to fill this gap in knowledge by testing the effect of prior listening exposure or listening context on speech understanding in five different simulated sound fields, ranging from anechoic space to a room with broadband reverberation time (T60) of approximately 3 s. Although substantial individual variability in the effect was observed and quantified, the context effect was, on average, strongly room dependent. At threshold, the effect was minimal in anechoic space, increased to a maximum of 3 dB on average in moderate reverberation (T60 = 1 s), and returned to minimal levels again in high reverberation. This interaction suggests that the functional effects of prior listening exposure may be limited to sound fields with moderate reverberation (0.4 ≤ T60 ≤ 1 s). PMID:27475133

  5. Discrimination of Speech Stimuli Based on Neuronal Response Phase Patterns Depends on Acoustics But Not Comprehension

    PubMed Central

    Poeppel, David

    2010-01-01

    Speech stimuli give rise to neural activity in the listener that can be observed as waveforms using magnetoencephalography. Although waveforms vary greatly from trial to trial due to activity unrelated to the stimulus, it has been demonstrated that spoken sentences can be discriminated based on theta-band (3–7 Hz) phase patterns in single-trial response waveforms. Furthermore, manipulations of the speech signal envelope and fine structure that reduced intelligibility were found to produce correlated reductions in discrimination performance, suggesting a relationship between theta-band phase patterns and speech comprehension. This study investigates the nature of this relationship, hypothesizing that theta-band phase patterns primarily reflect cortical processing of low-frequency (<40 Hz) modulations present in the acoustic signal and required for intelligibility, rather than processing exclusively related to comprehension (e.g., lexical, syntactic, semantic). Using stimuli that are quite similar to normal spoken sentences in terms of low-frequency modulation characteristics but are unintelligible (i.e., their time-inverted counterparts), we find that discrimination performance based on theta-band phase patterns is equal for both types of stimuli. Consistent with earlier findings, we also observe that whereas theta-band phase patterns differ across stimuli, power patterns do not. We use a simulation model of the single-trial response to spoken sentence stimuli to demonstrate that phase-locked responses to low-frequency modulations of the acoustic signal can account not only for the phase but also for the power results. The simulation offers insight into the interpretation of the empirical results with respect to phase-resetting and power-enhancement models of the evoked response. PMID:20484530
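
    A rough sketch of this kind of phase-pattern discrimination, under simplifying assumptions: a single response channel, a 3-7 Hz band-pass filter, Hilbert phase, and nearest-template matching by mean circular distance. The data shapes and classifier are illustrative, not the study's exact analysis.

    ```python
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def theta_phase(trial, fs):
        """Theta-band (3-7 Hz) phase time series of a single-trial response."""
        sos = butter(4, [3.0, 7.0], btype="bandpass", fs=fs, output="sos")
        return np.angle(hilbert(sosfiltfilt(sos, trial)))

    def classify(test_trial, templates, fs):
        """templates: dict sentence_id -> mean training phase pattern (same length as trial)."""
        phi = theta_phase(test_trial, fs)
        # circular distance: 1 - mean cos(phase difference); smaller means more similar
        dists = {sid: 1.0 - np.mean(np.cos(phi - tmpl)) for sid, tmpl in templates.items()}
        return min(dists, key=dists.get)
    ```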

  6. An acoustical assessment of pitch-matching accuracy in relation to speech frequency, speech frequency range, age and gender in preschool children

    NASA Astrophysics Data System (ADS)

    Trollinger, Valerie L.

    This study investigated the relationship between acoustical measures of singing accuracy and speech fundamental frequency, speech fundamental frequency range, age, and gender in preschool-aged children. Seventy subjects from Southeastern Pennsylvania; the San Francisco Bay Area, California; and Terre Haute, Indiana, participated in the study. Speech frequency was measured by having the subjects participate in spontaneous and guided speech activities with the researcher, with 18 diverse samples extracted from each subject's recording for acoustical analysis of fundamental frequency in Hz with the CSpeech computer program. The fundamental frequencies were averaged together to derive a mean speech frequency score for each subject. Speech range was calculated by subtracting the lowest fundamental frequency produced from the highest fundamental frequency produced, resulting in a speech range measured in Hz. Singing accuracy was measured by having the subjects each echo-sing six randomized patterns using the pitches Middle C, D, E, F♯, G and A (440), using the solfege syllables of Do and Re, which were recorded by a 5-year-old female model. For each subject, 18 samples of singing were recorded. All samples were analyzed with CSpeech for fundamental frequency. For each subject, deviation scores in Hz were derived by calculating the difference between what the model sang in Hz and what the subject sang in response in Hz. Individual scores for each child consisted of an overall mean total deviation frequency, mean frequency deviations for each pattern, and mean frequency deviation for each pitch. Pearson correlations, MANOVA and ANOVA analyses, Multiple Regressions and Discriminant Analysis revealed the following findings: (1) moderate but significant (p < .001) relationships emerged between mean speech frequency and the ability to sing the pitches E, F♯, G and A in the study; (2) mean speech frequency also emerged as the strongest
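
    One plausible reading of the deviation-score computation described above, sketched below with the six target pitches from the abstract and invented child responses; averaging absolute (rather than signed) differences is an assumption here.

    ```python
    import numpy as np

    model_f0 = np.array([261.6, 293.7, 329.6, 370.0, 392.0, 440.0])  # C4, D4, E4, F#4, G4, A4 in Hz
    child_f0 = np.array([270.2, 301.1, 320.5, 355.8, 401.3, 452.0])  # illustrative sung responses

    deviation_per_pitch = np.abs(child_f0 - model_f0)   # Hz error for each target pitch
    mean_total_deviation = deviation_per_pitch.mean()   # overall pitch-matching accuracy score
    print(deviation_per_pitch, mean_total_deviation)
    ```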

  7. Quantifying the Effect of Compression Hearing Aid Release Time on Speech Acoustics and Intelligibility

    ERIC Educational Resources Information Center

    Jenstad, Lorienne M.; Souza, Pamela E.

    2005-01-01

    Compression hearing aids have the inherent, and often adjustable, feature of release time from compression. Research to date does not provide a consensus on how to choose or set release time. The current study had 2 purposes: (a) a comprehensive evaluation of the acoustic effects of release time for a single-channel compression system in quiet and…

  8. Transient Auditory Storage of Acoustic Details Is Associated with Release of Speech from Informational Masking in Reverberant Conditions

    ERIC Educational Resources Information Center

    Huang, Ying; Huang, Qiang; Chen, Xun; Wu, Xihong; Li, Liang

    2009-01-01

    Perceptual integration of the sound directly emanating from the source with reflections needs both temporal storage and correlation computation of acoustic details. We examined whether the temporal storage is frequency dependent and associated with speech unmasking. In Experiment 1, a break in correlation (BIC) between interaurally correlated…

  9. Discrimination and Comprehension of Synthetic Speech by Students with Visual Impairments: The Case of Similar Acoustic Patterns

    ERIC Educational Resources Information Center

    Papadopoulos, Konstantinos; Argyropoulos, Vassilios S.; Kouroupetroglou, Georgios

    2008-01-01

    This study examined the perceptions held by sighted students and students with visual impairments of the intelligibility and comprehensibility of similar acoustic patterns produced by synthetic speech. It determined the types of errors the students made and compared the performance of the two groups on auditory discrimination and comprehension.

  10. Observed features of acoustic gravity waves in the heterosphere

    NASA Astrophysics Data System (ADS)

    Fedorenko, A. K.; Kryuchkov, E. I.

    2014-01-01

    Using measurements from the Dynamic Explorer 2 satellite, features of the propagation of acoustic gravity waves (AGWs) in the multicomponent upper atmosphere have been investigated. In the altitude range 250-400 km, amplitude and phase differences have been observed in the wave-induced concentration variations of some atmospheric gases. Using the approach proposed in this paper, the AGW variations in different gases have been divided into components associated with elastic compression, adiabatic expansion, and the vertical background distribution. The amplitude and phase differences observed in different gases are explained by analyzing these components. It is shown how this effect can be used to determine the wave propagation, the vertical displacement of the volume element, the wave frequency, and the spatial distribution of the wave energy density.

  11. Speech recognition in reverberant and noisy environments employing multiple feature extractors and i-vector speaker adaptation

    NASA Astrophysics Data System (ADS)

    Alam, Md Jahangir; Gupta, Vishwa; Kenny, Patrick; Dumouchel, Pierre

    2015-12-01

    The REVERB challenge provides a common framework for the evaluation of feature extraction techniques in the presence of both reverberation and additive background noise. State-of-the-art speech recognition systems perform well in controlled environments, but their performance degrades in realistic acoustical conditions, especially in real as well as simulated reverberant environments. In this contribution, we utilize multiple feature extractors including the conventional mel-filterbank, multi-taper spectrum estimation-based mel-filterbank, robust mel and compressive gammachirp filterbank, iterative deconvolution-based dereverberated mel-filterbank, and maximum likelihood inverse filtering-based dereverberated mel-frequency cepstral coefficient features for speech recognition with multi-condition training data. In order to improve speech recognition performance, we combine their results using ROVER (Recognizer Output Voting Error Reduction). For the two- and eight-channel tasks, to benefit from the multi-channel data, we also use ROVER, instead of the multi-microphone signal processing method, to reduce word error rate by selecting the best scoring word at each channel. As in a previous work, we also apply i-vector-based speaker adaptation, which was found effective. In a speech recognition task, speaker adaptation tries to reduce mismatch between the training and test speakers. Speech recognition experiments are conducted on the REVERB challenge 2014 corpora using the Kaldi recognizer. In our experiments, we use both utterance-based batch processing and full batch processing. In the single-channel task, full batch processing reduced word error rate (WER) from 10.0 to 9.3 % on SimData as compared to utterance-based batch processing. Using full batch processing, we obtained an average WER of 9.0 and 23.4 % on the SimData and RealData, respectively, for the two-channel task, whereas for the eight-channel task on the SimData and RealData, the average WERs found were 8
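
    For reference, the word error rate (WER) reported above is the Levenshtein distance between hypothesis and reference word strings divided by the reference length; a minimal sketch with made-up sentences follows (the ROVER combination step itself is not reproduced here).

    ```python
    # Standard WER via dynamic-programming edit distance over words.
    def wer(reference, hypothesis):
        ref, hyp = reference.split(), hypothesis.split()
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution or match
        return d[len(ref)][len(hyp)] / max(len(ref), 1)

    print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1 error / 6 words ≈ 0.167
    ```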

  12. Comparing the effects of reverberation and of noise on speech recognition in simulated electric-acoustic listening

    PubMed Central

    Helms Tillery, Kate; Brown, Christopher A.; Bacon, Sid P.

    2012-01-01

    Cochlear implant users report difficulty understanding speech in both noisy and reverberant environments. Electric-acoustic stimulation (EAS) is known to improve speech intelligibility in noise. However, little is known about the potential benefits of EAS in reverberation, or about how such benefits relate to those observed in noise. The present study used EAS simulations to examine these questions. Sentences were convolved with impulse responses from a model of a room whose estimated reverberation times were varied from 0 to 1 sec. These reverberated stimuli were then vocoded to simulate electric stimulation, or presented as a combination of vocoder plus low-pass filtered speech to simulate EAS. Monaural sentence recognition scores were measured in two conditions: reverberated speech and speech in a reverberated noise. The long-term spectrum and amplitude modulations of the noise were equated to the reverberant energy, allowing a comparison of the effects of the interferer (speech vs noise). Results indicate that, at least in simulation, (1) EAS provides significant benefit in reverberation; (2) the benefits of EAS in reverberation may be underestimated by those in a comparable noise; and (3) the EAS benefit in reverberation likely arises from partially preserved cues in this background accessible via the low-frequency acoustic component. PMID:22280603

  13. Acoustic Differences between Humorous and Sincere Communicative Intentions

    ERIC Educational Resources Information Center

    Hoicka, Elena; Gattis, Merideth

    2012-01-01

    Previous studies indicate that the acoustic features of speech discriminate between positive and negative communicative intentions, such as approval and prohibition. Two studies investigated whether acoustic features of speech can discriminate between two positive communicative intentions: humour and sweet-sincerity, where sweet-sincerity involved…

  14. Recognition of emotions in Mexican Spanish speech: an approach based on acoustic modelling of emotion-specific vowels.

    PubMed

    Caballero-Morales, Santiago-Omar

    2013-01-01

    An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). Then, estimation of the emotional state from a spoken sentence is performed by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, accuracy of 87-100% was achieved for the recognition of emotional state of Mexican Spanish speech.

  15. Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels

    PubMed Central

    Caballero-Morales, Santiago-Omar

    2013-01-01

    An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). Then, estimation of the emotional state from a spoken sentence is performed by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, accuracy of 87–100% was achieved for the recognition of emotional state of Mexican Spanish speech. PMID:23935410
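
    A minimal sketch of the counting-based decision rule described above; the emotion-tagged phoneme labels ("a_anger", "e_neutral", ...) and the fallback for untagged output are assumptions, and the HMM recognizer itself is not reproduced.

    ```python
    from collections import Counter

    def estimate_emotion(asr_phonemes):
        """Pick the emotion whose tagged vowels appear most often in the ASR output."""
        tags = [p.split("_", 1)[1] for p in asr_phonemes if "_" in p]
        if not tags:
            return "neutral"
        return Counter(tags).most_common(1)[0][0]

    print(estimate_emotion(["k", "a_anger", "s", "a_anger", "e_neutral", "o_anger"]))  # -> "anger"
    ```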

  16. Feature Compensation Employing Multiple Environmental Models for Robust In-Vehicle Speech Recognition

    NASA Astrophysics Data System (ADS)

    Kim, Wooil; Hansen, John H. L.

    An effective feature compensation method is developed for reliable speech recognition in real-life in-vehicle environments. The CU-Move corpus, used for evaluation, contains a range of speech and noise signals collected for a number of speakers under actual driving conditions. PCGMM-based feature compensation, considered in this paper, utilizes parallel model combination to generate a noise-corrupted speech model by combining the clean speech and noise models. In order to address unknown time-varying background noise, an interpolation method over multiple environmental models is employed. To alleviate the computational expense of multiple models, an Environment Transition Model is employed, motivated by the Noise Language Model used in Environmental Sniffing. An environment-dependent mixture sharing scheme is proposed and shown to be more effective in reducing computational complexity. A smaller environmental model set is determined by the environment transition model for mixture sharing. The proposed scheme is evaluated on the connected single digits portion of the CU-Move database using the Aurora2 evaluation toolkit. Experimental results indicate that our feature compensation method is effective for improving speech recognition in real-life in-vehicle conditions. A reduction of 73.10% in the computational requirements was obtained by employing the environment-dependent mixture sharing scheme with only a slight change in recognition performance. This demonstrates that the proposed method is effective in maintaining the distinctive characteristics among the different environmental models, even when selecting a large number of Gaussian components for mixture sharing.

  17. Acoustic markers of prominence influence infants' and adults' segmentation of speech sequences.

    PubMed

    Bion, Ricardo A H; Benavides-Varela, Silvia; Nespor, Marina

    2011-03-01

    Two experiments investigated the way acoustic markers of prominence influence the grouping of speech sequences by adults and 7-month-old infants. In the first experiment, adults were familiarized with and asked to memorize sequences of adjacent syllables that alternated in either pitch or duration. During the test phase, participants heard pairs of syllables with constant pitch and duration and were asked whether the syllables had appeared adjacently during familiarization. Adults were better at remembering pairs of syllables that during familiarization had short syllables preceding long syllables, or high-pitched syllables preceding low-pitched syllables. In the second experiment, infants were familiarized and tested with similar stimuli as in the first experiment, and their preference for pairs of syllables was assessed using the head-turn preference paradigm. When familiarized with syllables alternating in pitch, infants showed a preference to listen to pairs of syllables that had high pitch in the first syllable. However, no preference was found when the familiarization stream alternated in duration. It is proposed that these perceptual biases help infants and adults find linguistic units in the continuous speech stream. While the bias for grouping based on pitch appears early in development, biases for durational grouping might rely on more extensive linguistic experience. PMID:21524015

  18. Statistical evidence that musical universals derive from the acoustic characteristics of human speech

    NASA Astrophysics Data System (ADS)

    Schwartz, David; Howe, Catharine; Purves, Dale

    2003-04-01

    Listeners of all ages and societies produce a similar consonance ordering of chromatic scale tone combinations. Despite intense interest in this perceptual phenomenon over several millennia, it has no generally accepted explanation in physical, psychological, or physiological terms. Here we show that the musical universal of consonance ordering can be understood in terms of the statistical relationship between a pattern of sound pressure at the ear and the possible generative sources of the acoustic energy pattern. Since human speech is the principal naturally occurring source of tone-evoking (i.e., periodic) sound energy for human listeners, we obtained normalized spectra from more than 100000 recorded speech segments. The probability distribution of amplitude/frequency combinations derived from these spectra predicts both the fundamental frequency ratios that define the chromatic scale intervals and the consonance ordering of chromatic scale tone combinations. We suggest that these observations reveal the statistical character of the perceptual process by which the auditory system guides biologically successful behavior in response to inherently ambiguous sound stimuli.

  19. A machine for neural computation of acoustical patterns with application to real time speech recognition

    NASA Astrophysics Data System (ADS)

    Mueller, P.; Lazzaro, J.

    1986-08-01

    400 analog electronic neurons have been assembled and connected for the analysis and recognition of acoustical patterns, including speech. Input to the net comes from a set of 18 band pass filters (Qmax 300 dB/octave; 180 to 6000 Hz, log scale). The net is organized into two parts: the first performs in real time the decomposition of the input patterns into their primitives of energy, space (frequency) and time relations. The other part decodes the set of primitives. 216 neurons are dedicated to pattern decomposition. The output of the individual filters is rectified and fed to two sets of 18 neurons in an opponent center-surround organization of synaptic connections ("on center" and "off center"). These units compute maxima and minima of energy at different frequencies. The next two sets of neurons compute the temporal boundaries ("on" and "off"), and the following two compute the movement of the energy maxima (formants) up or down the frequency axis. There are in addition "hyperacuity" units which expand the frequency resolution to 36, other units tuned to a particular range of duration of the "on center" units, and others tuned exclusively to very low energy sounds. In order to recognize speech sounds at the phoneme or diphone level, the set of primitives belonging to the phoneme is decoded such that only one neuron or a non-overlapping group of neurons fires when the sound pattern is present at the input. For display and translation into phonetic symbols, the output from these neurons is fed into an EPROM decoder and computer which displays in real time a phonetic representation of the speech input.

  20. The effect of different open plan and enclosed classroom acoustic conditions on speech perception in Kindergarten children.

    PubMed

    Mealings, Kiri T; Demuth, Katherine; Buchholz, Jörg M; Dillon, Harvey

    2015-10-01

    Open plan classrooms, where several classes are in the same room, have recently re-emerged in Australian primary schools. This paper explores how the acoustics of four Kindergarten classrooms [an enclosed classroom (25 children), double classroom (44 children), fully open plan triple classroom (91 children), and a semi-open plan K-6 "21st century learning space" (205 children)] affect speech perception. Twenty-two to 23 5-6-year-old children in each classroom participated in an online four-picture choice speech perception test while adjacent classes engaged in quiet versus noisy activities. The noise levels recorded during the test were higher the larger the classroom, except in the noisy condition for the K-6 classroom, possibly due to acoustic treatments. Linear mixed effects models revealed children's performance accuracy and speed decreased as noise level increased. Additionally, children's speech perception abilities decreased the further away they were seated from the loudspeaker in noise levels above 50 dBA. These results suggest that fully open plan classrooms are not appropriate learning environments for critical listening activities with young children due to their high intrusive noise levels which negatively affect speech perception. If open plan classrooms are desired, they need to be acoustically designed to be appropriate for critical listening activities.

  1. Speaker height estimation from speech: Fusing spectral regression and statistical acoustic models.

    PubMed

    Hansen, John H L; Williams, Keri; Bořil, Hynek

    2015-08-01

    Estimating speaker height can assist in voice forensic analysis and provide additional side knowledge to benefit automatic speaker identification or acoustic model selection for automatic speech recognition. In this study, a statistical approach to height estimation that incorporates acoustic models within a non-uniform height bin width Gaussian mixture model structure as well as a formant analysis approach that employs linear regression on selected phones are presented. The accuracy and trade-offs of these systems are explored by examining the consistency of the results, location, and causes of error, as well as a combined fusion of the two systems, using data from the TIMIT corpus. Open set testing is also presented using the Multi-session Audio Research Project corpus and publicly available YouTube audio to examine the effect of channel mismatch between training and testing data and provide a realistic open domain testing scenario. The proposed algorithms achieve a highly competitive performance to previously published literature. Although the different data partitioning in the literature and this study may prevent performance comparisons in absolute terms, the mean average error of 4.89 cm for males and 4.55 cm for females provided by the proposed algorithm on TIMIT utterances containing selected phones suggests a considerable estimation error decrease compared to past efforts.
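
    As an illustration of the formant-regression half of such a system, the sketch below fits an ordinary least-squares model from per-speaker mean formant frequencies to height; the feature choice and the small data set are hypothetical, and the GMM acoustic-model branch and the fusion step are not shown.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[730.0, 1090.0, 2440.0],   # mean F1/F2/F3 (Hz) of a selected phone, per speaker
                  [660.0, 1720.0, 2410.0],
                  [530.0, 1840.0, 2480.0],
                  [570.0,  840.0, 2410.0]])
    heights_cm = np.array([182.0, 175.0, 168.0, 178.0])   # made-up speaker heights

    reg = LinearRegression().fit(X, heights_cm)
    print(reg.predict(np.array([[640.0, 1190.0, 2390.0]])))   # height estimate in cm
    ```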

  2. Speech timing and linguistic rhythm: on the acoustic bases of rhythm typologies.

    PubMed

    Rathcke, Tamara V; Smith, Rachel H

    2015-05-01

    Research into linguistic rhythm has been dominated by the idea that languages can be classified according to rhythmic templates, amenable to assessment by acoustic measures of vowel and consonant durations. This study tested predictions of two proposals explaining the bases of rhythmic typologies: the Rhythm Class Hypothesis which assumes that the templates arise from an extensive vs a limited use of durational contrasts, and the Control and Compensation Hypothesis which proposes that the templates are rooted in more vs less flexible speech production strategies. Temporal properties of segments, syllables and rhythmic feet were examined in two accents of British English, a "stress-timed" variety from Leeds, and a "syllable-timed" variety spoken by Panjabi-English bilinguals from Bradford. Rhythm metrics were calculated. A perception study confirmed that the speakers of the two varieties differed in their perceived rhythm. The results revealed that both typologies were informative in that to a certain degree, they predicted temporal patterns of the two varieties. None of the metrics tested was capable of adequately reflecting the temporal complexity found in the durational data. These findings contribute to the critical evaluation of the explanatory adequacy of rhythm metrics. Acoustic bases and limitations of the traditional rhythmic typologies are discussed.
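
    The abstract does not list the metrics, but one commonly computed in this literature is the normalized pairwise variability index (nPVI) over successive vocalic interval durations; a sketch with invented durations follows.

    ```python
    import numpy as np

    def npvi(durations):
        """nPVI = 100 * mean(|d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2)) over successive intervals."""
        d = np.asarray(durations, dtype=float)
        pair_diffs = np.abs(d[1:] - d[:-1]) / ((d[1:] + d[:-1]) / 2.0)
        return 100.0 * pair_diffs.mean()

    vowel_durs_ms = [62, 110, 58, 95, 70, 140]   # successive vocalic intervals (ms)
    print(npvi(vowel_durs_ms))                   # higher values are typical of "stress-timed" varieties
    ```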

  3. Speaker height estimation from speech: Fusing spectral regression and statistical acoustic models.

    PubMed

    Hansen, John H L; Williams, Keri; Bořil, Hynek

    2015-08-01

    Estimating speaker height can assist in voice forensic analysis and provide additional side knowledge to benefit automatic speaker identification or acoustic model selection for automatic speech recognition. In this study, a statistical approach to height estimation that incorporates acoustic models within a non-uniform height bin width Gaussian mixture model structure as well as a formant analysis approach that employs linear regression on selected phones are presented. The accuracy and trade-offs of these systems are explored by examining the consistency of the results, location, and causes of error, as well as a combined fusion of the two systems, using data from the TIMIT corpus. Open set testing is also presented using the Multi-session Audio Research Project corpus and publicly available YouTube audio to examine the effect of channel mismatch between training and testing data and provide a realistic open domain testing scenario. The proposed algorithms achieve a highly competitive performance to previously published literature. Although the different data partitioning in the literature and this study may prevent performance comparisons in absolute terms, the mean average error of 4.89 cm for males and 4.55 cm for females provided by the proposed algorithm on TIMIT utterances containing selected phones suggests a considerable estimation error decrease compared to past efforts. PMID:26328721

  4. Speech prosody abnormalities and specific dimensional schizotypy features: are relationships limited to male participants?

    PubMed

    Bedwell, Jeffrey S; Cohen, Alex S; Trachik, Benjamin J; Deptula, Andrew E; Mitchell, Jonathan C

    2014-10-01

    In schizophrenia, diminished vocal expressivity is associated with lower quality of life. Studies using computerized acoustic analysis of speech have found no evidence of diminished vocal prosody related to categorically defined schizotypy, a subclinical analogue of schizophrenia. However, existing studies have not examined the interaction between schizotypy and sex with vocal prosody measures. The current study examined 44 young adults (50% men) who were recruited to represent a continuous range of schizotypy. Speech samples were digitally recorded during autobiographical narratives and analyzed for prosody. In the male participants, variability of fundamental frequency and variability of intensity were each negatively related to the Schizotypal Personality Questionnaire (SPQ) ideas of reference subscale, whereas SPQ suspiciousness was related to a greater number of utterances, and SPQ odd behavior was related to a greater number of pauses. Because the relationships were restricted to men, and not significant in women, the results may explain earlier negative findings with schizotypy.

  5. Speech Prosody Abnormalities and Specific Dimensional Schizotypy Features: Are Relationships Limited to Males?

    PubMed Central

    Bedwell, Jeffrey S.; Cohen, Alex S.; Trachik, Benjamin J.; Deptula, Andrew E.; Mitchell, Jonathan C.

    2014-01-01

    In schizophrenia, diminished vocal expressivity is associated with lower quality of life. Studies using computerized acoustic analysis of speech have found no evidence of diminished vocal prosody related to categorically-defined schizotypy, a subclinical analog of schizophrenia. However, existing studies have not examined the interaction between schizotypy and sex with vocal prosody measures. The current study examined 44 young adults (50% male) who were recruited to represent a continuous range of schizotypy. Speech samples were digitally recorded during autobiographical narratives and analyzed for prosody. In male participants, variability of fundamental frequency and variability of intensity were each negatively related to Schizotypal Personality Questionnaire (SPQ) Ideas of Reference subscale, while SPQ Suspiciousness was related to a greater number of utterances, and SPQ Odd Behavior was related to a greater number of pauses. As the relationships were restricted to males, and not significant in females, results may explain earlier negative findings with schizotypy. PMID:25198702

  6. Joint spatial-spectral feature space clustering for speech activity detection from ECoG signals.

    PubMed

    Kanas, Vasileios G; Mporas, Iosif; Benz, Heather L; Sgarbas, Kyriakos N; Bezerianos, Anastasios; Crone, Nathan E

    2014-04-01

    Brain-machine interfaces for speech restoration have been extensively studied for more than two decades. The success of such a system will depend in part on selecting the best brain recording sites and signal features corresponding to speech production. The purpose of this study was to detect speech activity automatically from electrocorticographic signals based on joint spatial-frequency clustering of the ECoG feature space. For this study, the ECoG signals were recorded while a subject performed two different syllable repetition tasks. We found that the optimal frequency resolution to detect speech activity from ECoG signals was 8 Hz, achieving 98.8% accuracy by employing support vector machines as a classifier. We also defined the cortical areas that held the most information about the discrimination of speech and nonspeech time intervals. Additionally, the results shed light on the distinct cortical areas associated with the two syllables repetition tasks and may contribute to the development of portable ECoG-based communication.
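
    A simplified sketch of the detection pipeline described above: per-channel log band power at roughly 8 Hz spectral resolution, concatenated into a feature vector and classified with a support vector machine. The window length, channel count, frequency cut-off, and random toy data are assumptions.

    ```python
    import numpy as np
    from scipy.signal import welch
    from sklearn.svm import SVC

    def window_features(ecog_window, fs, resolution_hz=8.0):
        """ecog_window: (n_channels, n_samples) array for one analysis window."""
        nperseg = int(fs / resolution_hz)                  # frequency bins ~8 Hz wide
        feats = []
        for ch in ecog_window:
            f, pxx = welch(ch, fs=fs, nperseg=nperseg)
            feats.append(np.log(pxx[f <= 200.0] + 1e-12))  # log band power up to 200 Hz
        return np.concatenate(feats)

    rng = np.random.default_rng(0)
    fs = 1000
    X = np.stack([window_features(rng.standard_normal((4, fs)), fs) for _ in range(40)])
    y = np.repeat([0, 1], 20)                              # 0 = nonspeech, 1 = speech (toy labels)
    clf = SVC(kernel="rbf").fit(X, y)
    print(clf.score(X, y))
    ```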

  7. Perception of Suprasegmental Features of Speech by Children with Cochlear Implants and Children with Hearing Aids

    ERIC Educational Resources Information Center

    Most, Tova; Peled, Miriam

    2007-01-01

    This study assessed perception of suprasegmental features of speech by 30 prelingual children with sensorineural hearing loss. Ten children had cochlear implants (CIs), and 20 children wore hearing aids (HA): 10 with severe hearing loss and 10 with profound hearing loss. Perception of intonation, syllable stress, word emphasis, and word pattern…

  8. MOOD STATE PREDICTION FROM SPEECH OF VARYING ACOUSTIC QUALITY FOR INDIVIDUALS WITH BIPOLAR DISORDER

    PubMed Central

    Gideon, John; Provost, Emily Mower; McInnis, Melvin

    2016-01-01

    Speech contains patterns that can be altered by the mood of an individual. There is an increasing focus on automated and distributed methods to collect and monitor speech from large groups of patients suffering from mental health disorders. However, as the scope of these collections increases, the variability in the data also increases. This variability is due in part to the range in the quality of the devices, which in turn affects the quality of the recorded data, negatively impacting the accuracy of automatic assessment. It is necessary to mitigate variability effects in order to expand the impact of these technologies. This paper explores speech collected from phone recordings for analysis of mood in individuals with bipolar disorder. Two different phones with varying amounts of clipping, loudness, and noise are employed. We describe methodologies for use during preprocessing, feature extraction, and data modeling to correct these differences and make the devices more comparable. The results demonstrate that these pipeline modifications result in statistically significantly higher performance, which highlights the potential of distributed mental health systems. PMID:27570493

  9. Phonological Segments and Features as Planning Units in Speech Production.

    ERIC Educational Resources Information Center

    Roelofs, Ardi

    1999-01-01

    Examined phonological processes in spoken-word production, applying a form-preparation model to the question of whether phonological features could be preplanned to facilitate word production. Results are explained in terms of the WEAVER model of word-form encoding, which follows a serial encoding of segments with a parallel activation of…

  10. Linguistic Features of Speech in Tense Emotional State.

    ERIC Educational Resources Information Center

    Nosenko, E. E.

    The report contains a comparison of oral expression by the same experimental subjects under normal conditions and in a state of emotional stress. The study permits isolation of linguistic features of the formation of oral expression under emotional tension. This report is a translation of an article originally written in Russian. (NTIS/KM)

  11. Flow and acoustic features of a supersonic tapered nozzle

    NASA Astrophysics Data System (ADS)

    Gutmark, E.; Bowman, H. L.; Schadow, K. C.

    1992-05-01

    The acoustic and flow characteristics of a supersonic tapered jet were measured for free and shrouded flow configurations. Measurements were performed for a full range of pressure ratios including over- and underexpanded and design conditions. The supersonic tapered jet issues from a converging-diverging nozzle with a 3∶1 rectangular slotted throat and a conical diverging section leading to a circular exit. The jet was compared to circular and rectangular supersonic jets operating at identical conditions. The distinct feature of the jet is the absence of screech tones in the entire range of operation. Its near-field pressure fluctuations have a wide band spectrum in the entire range of measurements, for Mach numbers of 1 to 2.5, for over- and underexpanded conditions. The free jet's spreading rate is nearly constant and similar to that of the rectangular jet, and, in a shroud, the pressure drop it induces is linearly proportional to the primary jet Mach number. This behavior persisted in high adverse pressure gradients at overexpanded conditions, and with nozzle divergence angles of up to 35°, no inside flow separation was observed.

  12. Acoustic source characteristics, across-formant integration, and speech intelligibility under competitive conditions.

    PubMed

    Roberts, Brian; Summers, Robert J; Bailey, Peter J

    2015-06-01

    An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics--for example, fundamental frequency (F0) differences between formants promote their segregation. This study explored the role of more radical differences in source characteristics. Three-formant (F1+F2+F3) synthetic speech analogues were derived from natural sentences. In Experiment 1, F1+F3 were generated by passing a harmonic glottal source (F0 = 140 Hz) through second-order resonators (H1+H3); in Experiment 2, F1+F3 were tonal (sine-wave) analogues (T1+T3). F2 could take either form (H2 or T2). In some conditions, the target formants were presented alone, either monaurally or dichotically (left ear = F1+F3; right ear = F2). In others, they were accompanied by a competitor for F2 (F1+F2C+F3; F2), which listeners must reject to optimize recognition. Competitors (H2C or T2C) were created using the time-reversed frequency and amplitude contours of F2. Dichotic presentation of F2 and F2C ensured that the impact of the competitor arose primarily through informational masking. In the absence of F2C, the effect of a source mismatch between F1+F3 and F2 was relatively modest. When F2C was present, intelligibility was lowest when F2 was tonal and F2C was harmonic, irrespective of which type matched F1+F3. This finding suggests that source type and context, rather than similarity, govern the phonetic contribution of a formant. It is proposed that wideband harmonic analogues are more effective informational maskers than narrowband tonal analogues, and so become dominant in across-frequency integration of phonetic information when placed in competition. PMID:25751040

  13. Sine-wave and noise-vocoded sine-wave speech in a tone language: Acoustic details matter.

    PubMed

    Rosen, Stuart; Hui, Sze Ngar Catherine

    2015-12-01

    Sine-wave speech (SWS) is a highly simplified version of speech consisting only of frequency- and amplitude-modulated sinusoids representing the formants. That listeners can successfully understand SWS has led to claims that speech perception must be based on abstract properties of the stimuli far removed from their specific acoustic form. Here it is shown, in bilingual Cantonese/English listeners, that performance with Cantonese SWS is improved by noise vocoding, with no effect on English SWS utterances. This manipulation preserves the abstract informational structure in the signals but changes its surface form. The differential effects of noise vocoding likely arise from the fact that Cantonese is a tonal language and hence more reliant on fundamental frequency (F0) contours for its intelligibility. SWS does not preserve tonal information from the original speech but does have false tonal information signalled by the lowest frequency sinusoid. Noise vocoding SWS appears to minimise the tonal percept, which thus interferes less in the perception of Cantonese. It has no effect in English, which is minimally reliant on F0 variations for intelligibility. Therefore it is not only the informational structure of a sound that is important but also how its acoustic detail interacts with the phonological structure of a given language.
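
    For concreteness, sine-wave speech can be synthesised by driving a few sinusoids with formant frequency and amplitude tracks; the sketch below uses short invented tracks (real SWS would use tracks measured from a recorded utterance, and this is not the stimulus generation used in the study).

    ```python
    import numpy as np

    def sws(formant_tracks, amp_tracks, fs):
        """formant_tracks, amp_tracks: arrays of shape (n_formants, n_samples) at rate fs."""
        out = np.zeros(formant_tracks.shape[1])
        for f_track, a_track in zip(formant_tracks, amp_tracks):
            phase = 2.0 * np.pi * np.cumsum(f_track) / fs   # integrate frequency to get phase
            out += a_track * np.sin(phase)
        return out / (np.max(np.abs(out)) + 1e-12)

    fs = 16000
    n = fs // 2                                             # half a second of signal
    f = np.stack([np.linspace(500, 700, n),                 # F1 contour (Hz)
                  np.linspace(1500, 1200, n),               # F2 contour (Hz)
                  np.full(n, 2500.0)])                      # F3 held constant
    a = np.ones_like(f)
    signal = sws(f, a, fs)                                  # three-formant sine-wave analogue
    ```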

  14. Effects of a music therapy voice protocol on speech intelligibility, vocal acoustic measures, and mood of individuals with Parkinson's disease.

    PubMed

    Haneishi, E

    2001-01-01

    This study examined the effects of a Music Therapy Voice Protocol (MTVP) on speech intelligibility, vocal intensity, maximum vocal range, maximum duration of sustained vowel phonation, vocal fundamental frequency, vocal fundamental frequency variability, and mood of individuals with Parkinson's disease. Four female patients, who demonstrated voice and speech problems, served as their own controls and participated in baseline assessment (study pretest), a series of MTVP sessions involving vocal and singing exercises, and final evaluation (study posttest). In study pre and posttests, data for speech intelligibility and all acoustic variables were collected. Statistically significant increases were found in speech intelligibility, as rated by caregivers, and in vocal intensity from study pretest to posttest as the results of paired samples t-tests. In addition, before and after each MTVP session (session pre and posttests), self-rated mood scores and selected acoustic variables were collected. No significant differences were found in any of the variables from the session pretests to posttests, across the entire treatment period, or their interactions as the results of two-way ANOVAs with repeated measures. Although not significant, the mean of mood scores in session posttests (M = 8.69) was higher than that in session pretests (M = 7.93). PMID:11796078

  15. Sine-wave and noise-vocoded sine-wave speech in a tone language: Acoustic details matter.

    PubMed

    Rosen, Stuart; Hui, Sze Ngar Catherine

    2015-12-01

    Sine-wave speech (SWS) is a highly simplified version of speech consisting only of frequency- and amplitude-modulated sinusoids representing the formants. That listeners can successfully understand SWS has led to claims that speech perception must be based on abstract properties of the stimuli far removed from their specific acoustic form. Here it is shown, in bilingual Cantonese/English listeners, that performance with Cantonese SWS is improved by noise vocoding, with no effect on English SWS utterances. This manipulation preserves the abstract informational structure in the signals but changes its surface form. The differential effects of noise vocoding likely arise from the fact that Cantonese is a tonal language and hence more reliant on fundamental frequency (F0) contours for its intelligibility. SWS does not preserve tonal information from the original speech but does have false tonal information signalled by the lowest frequency sinusoid. Noise vocoding SWS appears to minimise the tonal percept, which thus interferes less in the perception of Cantonese. It has no effect in English, which is minimally reliant on F0 variations for intelligibility. Therefore it is not only the informational structure of a sound that is important but also how its acoustic detail interacts with the phonological structure of a given language. PMID:26723325

  16. Voice Modulations in German Ironic Speech

    ERIC Educational Resources Information Center

    Scharrer, Lisa; Christmann, Ursula; Knoll, Monja

    2011-01-01

    Previous research has shown that in different languages ironic speech is acoustically modulated compared to literal speech, and these modulations are assumed to aid the listener in the comprehension process by acting as cues that mark utterances as ironic. The present study was conducted to identify paraverbal features of German "ironic criticism"…

  17. Speech spectrogram expert

    SciTech Connect

    Johannsen, J.; Macallister, J.; Michalek, T.; Ross, S.

    1983-01-01

    Various authors have pointed out that humans can become quite adept at deriving phonetic transcriptions from speech spectrograms (as good as 90 percent accuracy at the phoneme level). The authors describe an expert system which attempts to simulate this performance. The speech spectrogram expert (spex) is actually a society made up of three experts: a 2-dimensional vision expert, an acoustic-phonetic expert, and a phonetics expert. The visual reasoning expert finds important visual features of the spectrogram. The acoustic-phonetic expert reasons about how visual features relate to phonemes, and about how phonemes change visually in different contexts. The phonetics expert reasons about allowable phoneme sequences and transformations, and deduces an English spelling for phoneme strings. The speech spectrogram expert is highly interactive, allowing users to investigate hypotheses and edit rules. 10 references.

  18. Acoustic changes in the production of lexical stress during Lombard speech.

    PubMed

    Arciuli, Joanne; Simpson, Briony S; Vogel, Adam P; Ballard, Kirrie J

    2014-06-01

    The Lombard effect describes the phenomenon of individuals increasing their vocal intensity when speaking in the presence of background noise. Here, we conducted an investigation of the production of lexical stress during Lombard speech. Participants (N = 27) produced the same sentences in three conditions: one quiet condition and two noise conditions at 70 dB (white noise; multi-talker babble). Manual acoustic analyses (syllable duration, vowel intensity, and vowel fundamental frequency) were completed for repeated productions of two trisyllabic words with opposing patterns of lexical stress (weak-strong; strong-weak) in each of the three conditions. In total, 324 productions were analysed (12 utterances per participant). Results revealed that, rather than increasing vocal intensity equally across syllables, participants alter the degree of stress contrastivity when speaking in noise. This was especially evident in the production of strong-weak lexical stress where there was an increase in contrastivity across syllables in terms of intensity and fundamental frequency. This preliminary study paves the way for further research that is needed to establish these findings using a larger set of multisyllabic stimuli.

  19. Aerodynamic and acoustical measures of speech, operatic, and Broadway vocal styles in a professional female singer.

    PubMed

    Stone, R E; Cleveland, Thomas F; Sundberg, P Johan; Prokop, Jan

    2003-09-01

    Understanding how the voice is used in different styles of singing is commonly based on intuitive descriptions offered by performers who are proficient in only one style. Such descriptions are debatable, lack reproducibility, and lack scientifically derived explanations of the characteristics. We undertook acoustic and aerodynamic analyses of a female subject with professional experience in both operatic and Broadway styles of singing, who sang examples in these two styles. How representative the examples are of the respective styles was investigated by means of a listening test. Further, as a reference point, we compared the styles with her speech. Variation in styles associated with pitch and vocal loudness was investigated for various parameters: subglottal pressure, closed quotient, glottal leakage, H1-H2 difference (the level difference between the two lowest partials of the source spectrum), and glottal compliance (the ratio between the air volume displaced in a glottal pulse and the subglottal pressure). Formant frequencies, long-term-average spectrum, and vibrato characteristics were also studied. Characteristics of operatic style emerge as distinctly different from Broadway style, the latter being more similar to speaking. PMID:14513952
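
    Two of the source measures mentioned above can be written compactly: H1-H2 as the level difference in dB between the two lowest harmonics of the source spectrum, and glottal compliance as the air volume displaced per glottal pulse divided by subglottal pressure. The synthetic source, the assumed F0, and the example numbers below are illustrative; real measurements would use inverse-filtered flow and pressure recordings.

    ```python
    import numpy as np

    def h1_h2(source, fs, f0):
        """Level difference (dB) between the harmonics nearest F0 and 2*F0."""
        spec = np.abs(np.fft.rfft(source * np.hanning(len(source))))
        freqs = np.fft.rfftfreq(len(source), 1.0 / fs)
        h1 = spec[np.argmin(np.abs(freqs - f0))]
        h2 = spec[np.argmin(np.abs(freqs - 2 * f0))]
        return 20.0 * np.log10(h1 / (h2 + 1e-12))

    def glottal_compliance(pulse_volume_m3, subglottal_pressure_pa):
        """Air volume displaced per glottal pulse divided by subglottal pressure."""
        return pulse_volume_m3 / subglottal_pressure_pa

    fs, f0 = 16000, 220.0
    t = np.arange(fs) / fs
    source = np.sin(2 * np.pi * f0 * t) + 0.4 * np.sin(2 * np.pi * 2 * f0 * t)  # toy glottal source
    print(h1_h2(source, fs, f0))                  # about 8 dB for this toy source
    print(glottal_compliance(1.5e-7, 800.0))      # units: m^3 / Pa (made-up values)
    ```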

  20. Production and perception of clear speech

    NASA Astrophysics Data System (ADS)

    Bradlow, Ann R.

    2003-04-01

    When a talker believes that the listener is likely to have speech perception difficulties due to a hearing loss, background noise, or a different native language, she or he will typically adopt a clear speaking style. Previous research has established that, with a simple set of instructions to the talker, ``clear speech'' can be produced by most talkers under laboratory recording conditions. Furthermore, there is reliable evidence that adult listeners with either impaired or normal hearing typically find clear speech more intelligible than conversational speech. Since clear speech production involves listener-oriented articulatory adjustments, a careful examination of the acoustic-phonetic and perceptual consequences of the conversational-to-clear speech transformation can serve as an effective window into talker- and listener-related forces in speech communication. Furthermore, clear speech research has considerable potential for the development of speech enhancement techniques. After reviewing previous and current work on the acoustic properties of clear versus conversational speech, this talk will present recent data from a cross-linguistic study of vowel production in clear speech and a cross-population study of clear speech perception. Findings from these studies contribute to an evolving view of clear speech production and perception as reflecting both universal, auditory and language-specific, phonological contrast enhancement features.

  1. Language-Specific Developmental Differences in Speech Production: A Cross-Language Acoustic Study

    ERIC Educational Resources Information Center

    Li, Fangfang

    2012-01-01

    Speech productions of 40 English- and 40 Japanese-speaking children (aged 2-5) were examined and compared with the speech produced by 20 adult speakers (10 speakers per language). Participants were recorded while repeating words that began with "s" and "sh" sounds. Clear language-specific patterns in adults' speech were found, with English…

  2. Gunshot acoustic signature specific features and false alarms reduction

    NASA Astrophysics Data System (ADS)

    Donzier, Alain; Millet, Joel

    2005-05-01

    This paper provides a detailed analysis of the most specific parameters of gunshot signatures through models as well as through real data. The models for the different contributions to the typical gunshot signature (shock and muzzle blast) are presented and used to discuss the variation of measured signatures over different environmental conditions and shot configurations. The analysis is followed by a description of the performance requirements for gunshot detection systems, from sniper detection, which was the main concern 10 years ago, to the new and more challenging conditions faced in today's operations. The work presented examines the process of how systems are deployed and used as well as how the operational environment has changed. The main sources of false alarms and new threats such as RPGs and mortars that acoustic gunshot detection systems have to face today are also defined and discussed. Finally, different strategies for reducing false alarms are proposed based on the acoustic signatures. These strategies are presented through various examples of specific missions ranging from vehicle protection to area protection. They not only include recommendations on how to handle acoustic information for the best efficiency of the acoustic detector but also recommend some add-on sensors to enhance overall system performance.

  3. Clinical and acoustical variability in hypokinetic dysarthria

    SciTech Connect

    Metter, E.J.; Hanson, W.R.

    1986-10-01

    Ten male patients with parkinsonism secondary to Parkinson's disease or progressive supranuclear palsy had clinical neurological, speech, and acoustical speech evaluations. In addition, seven of the patients were evaluated by x-ray computed tomography (CT) and (F-18)-fluorodeoxyglucose (FDG) positron emission tomography (PET). Extensive variability of speech features, both clinical and acoustical, was found and seemed to be independent of the severity of any parkinsonian sign, CT, or FDG PET. In addition, little relationship existed between the variability across each measured speech feature. What appeared to be important for the appearance of abnormal acoustic measures was the degree of overall severity of the dysarthria. These observations suggest that a better understanding of hypokinetic dysarthria may result from more extensive examination of the variability between patients. Emphasizing a specific feature such as rapid speaking rate in characterizing hypokinetic dysarthria focuses on a single and inconstant finding in a complex speech pattern.

  4. Modeling words with subword units in an articulatorily constrained speech recognition algorithm

    SciTech Connect

    Hogden, J.

    1997-11-20

    The goal of speech recognition is to find the most probable word given the acoustic evidence, i.e. a string of VQ codes or acoustic features. Speech recognition algorithms typically take advantage of the fact that the probability of a word, given a sequence of VQ codes, can be calculated.
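
    In code, that decision rule amounts to choosing the word that maximizes P(word | codes), proportional to P(codes | word) * P(word); the tiny likelihood table and priors below are invented for illustration.

    ```python
    import math

    priors = {"yes": 0.6, "no": 0.4}                     # word prior probabilities
    # log-likelihood of one observed VQ code string under each word model (made up)
    log_likelihood = {
        ("yes", "c3 c7 c7 c1"): -4.2,
        ("no",  "c3 c7 c7 c1"): -6.8,
    }

    def recognize(codes):
        """Return the word with the highest posterior score for the code string."""
        return max(priors, key=lambda w: log_likelihood[(w, codes)] + math.log(priors[w]))

    print(recognize("c3 c7 c7 c1"))   # -> "yes"
    ```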

  5. Acoustics in educational settings. Subcommittee on Acoustics in Educational Settings of the Bioacoustics Standards and Noise Standards Committee American Speech-Language-Hearing Association.

    PubMed

    1995-03-01

    The Americans with Disabilities Act (enacted July 26, 1990) has brought into focus the need for removing barriers and improving accessibility of all buildings and facilities. It is clear that the definition of barrier must be expanded to include not only structural features that limit physical accessibility, but also acoustical barriers that limit access to communication and information. Acoustical interference caused by inappropriate levels of background noise and reverberation presents a barrier to learning and communication in educational settings and school-sponsored extracurricular activities, particularly for students with hearing loss or other language/learning concerns. ASHA has provided these guidelines and acoustical improvement strategies in order to assist communication-related professionals, teachers, school officials, architects, contractors, state education agencies, and others in developing the best possible learning environment for all students. Additional research on both the acoustical characteristics of learning environments and the communication requirements of learners is encouraged. PMID:7696882

  6. Voice modulations in German ironic speech.

    PubMed

    Scharrer, Lisa; Christmann, Ursula; Knoll, Monja

    2011-12-01

    Previous research has shown that in different languages ironic speech is acoustically modulated compared to literal speech, and these modulations are assumed to aid the listener in the comprehension process by acting as cues that mark utterances as ironic. The present study was conducted to identify paraverbal features of German 'ironic criticism' that may possibly act as irony cues by comparing acoustic measures of ironic and literal speech. For this purpose, samples of scripted ironic and literal target utterances produced by 14 female speakers were recorded and acoustically analyzed. Results showed that in contrast to literal remarks, ironic criticism was characterized by a decreased mean fundamental frequency (F0), raised energy levels and increased vowel duration, whereas F0-contours differed only marginally between both speech types. Furthermore, we found ironic speech to be characterized by vowel hyperarticulation, an acoustic feature which has so far not been considered as a possible irony cue. Contrary to our expectations, voice modulations in ironic speech were applied independently of the availability of additional, visual irony cues. The results are discussed in light of previous findings on acoustic features of irony yielded for other languages.

  7. Associations between voice ergonomic risk factors and acoustic features of the voice.

    PubMed

    Rantala, Leena M; Hakala, Suvi; Holmqvist, Sofia; Sala, Eeva

    2015-10-01

    The associations between voice ergonomic risk factors in 40 classrooms and the acoustic parameters of 40 schoolteachers' voices were investigated. The risk factors assessed were connected to participants' working practices, working postures, and the indoor air quality in their workplaces. The teachers recorded spontaneous speech and sustained /a/ before and after a working day. Fundamental frequency, sound pressure level, the slope of the spectrum, perturbation, and harmonic-to-noise ratio were analysed. The results showed that the more the voice ergonomic risk factors were involved, the louder the teachers' voices became. Working practices correlated most often with the acoustic parameters; associations were found especially before a working day. The results suggest that a risky voice ergonomic environment affects voice production. PMID:24007529
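
    The voice parameters listed (fundamental frequency, sound pressure level, and so on) are standard acoustic measures. As a rough, uncalibrated sketch of how such measures can be pulled from a recording (not the analysis software used in the study; the frame length, lag range, and voicing threshold below are arbitrary choices), mean F0 can be taken from the autocorrelation peak of short frames and level from frame RMS:

```python
import numpy as np

def frame_f0_and_level(x, sr, frame_len=0.04, fmin=75.0, fmax=500.0):
    """Rough per-frame F0 (autocorrelation peak) and relative level in dB.

    Illustrative sketch only; production voice analysis uses more robust
    pitch trackers and calibrated SPL measurement.
    """
    n = int(frame_len * sr)
    lo, hi = int(sr / fmax), int(sr / fmin)   # lag search range
    f0s, levels = [], []
    for start in range(0, len(x) - n, n):
        frame = x[start:start + n] * np.hanning(n)
        rms = np.sqrt(np.mean(frame ** 2)) + 1e-12
        levels.append(20 * np.log10(rms))     # dB re full scale, not calibrated SPL
        ac = np.correlate(frame, frame, mode="full")[n - 1:]
        lag = lo + np.argmax(ac[lo:hi])
        if ac[lag] > 0.3 * ac[0]:             # crude voicing check
            f0s.append(sr / lag)
    return np.array(f0s), np.array(levels)

# Example with a synthetic 200 Hz "voice"
sr = 16000
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 200 * t)
f0s, levels = frame_f0_and_level(x, sr)
print(round(float(np.median(f0s)), 1), "Hz,", round(float(np.mean(levels)), 1), "dBFS")
```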

  9. Designing acoustics for linguistically diverse classrooms: Effects of background noise, reverberation and talker foreign accent on speech comprehension by native and non-native English-speaking listeners

    NASA Astrophysics Data System (ADS)

    Peng, Zhao Ellen

    The current classroom acoustics standard (ANSI S12.60-2010) recommends core learning spaces not to exceed background noise level (BNL) of 35 dBA and reverberation time (RT) of 0.6 second, based on speech intelligibility performance mainly by the native English-speaking population. Existing literature has not correlated these recommended values well with student learning outcomes. With a growing population of non-native English speakers in American classrooms, the special needs for perceiving degraded speech among non-native listeners, either due to realistic room acoustics or talker foreign accent, have not been addressed in the current standard. This research seeks to investigate the effects of BNL and RT on the comprehension of English speech from native English and native Mandarin Chinese talkers as perceived by native and non-native English listeners, and to provide acoustic design guidelines to supplement the existing standard. This dissertation presents two studies on the effects of RT and BNL on more realistic classroom learning experiences. How do native and non-native English-speaking listeners perform on speech comprehension tasks under adverse acoustic conditions, if the English speech is produced by talkers of native English (Study 1) versus native Mandarin Chinese (Study 2)? Speech comprehension materials were played back in a listening chamber to individual listeners: native and non-native English-speaking in Study 1; native English, native Mandarin Chinese, and other non-native English-speaking in Study 2. Each listener was screened for baseline English proficiency level, and completed dual tasks simultaneously involving speech comprehension and adaptive dot-tracing under 15 acoustic conditions, comprised of three BNL conditions (RC-30, 40, and 50) and five RT scenarios (0.4 to 1.2 seconds). The results show that BNL and RT negatively affect both objective performance and subjective perception of speech comprehension, more severely for non

  10. Acoustic Analysis of Clear Versus Conversational Speech in Individuals with Parkinson Disease

    ERIC Educational Resources Information Center

    Goberman, A.M.; Elmer, L.W.

    2005-01-01

    A number of studies have been devoted to the examination of clear versus conversational speech in non-impaired speakers. The purpose of these previous studies has been primarily to help increase speech intelligibility for the benefit of hearing-impaired listeners. The goal of the present study was to examine differences between conversational and…

  11. A Computational Model of Word Segmentation from Continuous Speech Using Transitional Probabilities of Atomic Acoustic Events

    ERIC Educational Resources Information Center

    Rasanen, Okko

    2011-01-01

    Word segmentation from continuous speech is a difficult task that is faced by human infants when they start to learn their native language. Several studies indicate that infants might use several different cues to solve this problem, including intonation, linguistic stress, and transitional probabilities between subsequent speech sounds. In this…

  12. Acoustic Analysis of the Speech of Children with Cochlear Implants: A Longitudinal Study

    ERIC Educational Resources Information Center

    Liker, Marko; Mildner, Vesna; Sindija, Branka

    2007-01-01

    The aim of the study was to analyse the speech of the children with cochlear implants, and compare it with the speech of hearing controls. We focused on three categories of Croatian sounds: vowels (F1 and F2 frequencies), fricatives (noise frequencies of /s/ and /[esh]/ ), and affricates (total duration and the pattern of stop-fricative components…

  13. Prosodic Contrasts in Ironic Speech

    ERIC Educational Resources Information Center

    Bryant, Gregory A.

    2010-01-01

    Prosodic features in spontaneous speech help disambiguate implied meaning not explicit in linguistic surface structure, but little research has examined how these signals manifest themselves in real conversations. Spontaneously produced verbal irony utterances generated between familiar speakers in conversational dyads were acoustically analyzed…

  14. Fifty years of progress in acoustic phonetics

    NASA Astrophysics Data System (ADS)

    Stevens, Kenneth N.

    2004-10-01

    Three events that occurred 50 or 60 years ago shaped the study of acoustic phonetics, and in the following few decades these events influenced research and applications in speech disorders, speech development, speech synthesis, speech recognition, and other subareas in speech communication. These events were: (1) the source-filter theory of speech production (Chiba and Kajiyama; Fant); (2) the development of the sound spectrograph and its interpretation (Potter, Kopp, and Green; Joos); and (3) the birth of research that related distinctive features to acoustic patterns (Jakobson, Fant, and Halle). Following these events there has been systematic exploration of the articulatory, acoustic, and perceptual bases of phonological categories, and some quantification of the sources of variability in the transformation of this phonological representation of speech into its acoustic manifestations. This effort has been enhanced by studies of how children acquire language in spite of this variability and by research on speech disorders. Gaps in our knowledge of this inherent variability in speech have limited the directions of applications such as synthesis and recognition of speech, and have led to the implementation of data-driven techniques rather than theoretical principles. Some examples of advances in our knowledge, and limitations of this knowledge, are reviewed.

  15. Acoustic correlates of vowel intelligibility in clear and conversational speech for young normal-hearing and elderly hearing-impaired listeners.

    PubMed

    Ferguson, Sarah Hargus; Quené, Hugo

    2014-06-01

    The present investigation carried out acoustic analyses of vowels in clear and conversational speech produced by 41 talkers. Mixed-effects models were then deployed to examine relationships among acoustic and perceptual data for these vowels. Acoustic data include vowel duration, steady-state formant frequencies, and two measures of dynamic formant movement. Perceptual data consist of vowel intelligibility in noise for young normal-hearing and elderly hearing-impaired listeners, as reported by Ferguson in 2004 and 2012 [J. Acoust. Soc. Am. 116, 2365-2373 (2004); J. Speech Lang. Hear. Res. 55, 779-790 (2012)], respectively. Significant clear speech effects were observed for all acoustic metrics, although not all measures changed for all vowels and considerable talker variability was observed. Mixed-effects analyses revealed that the contribution of duration and steady-state formant information to vowel intelligibility differed for the two listener groups. This outcome is consistent with earlier research suggesting that hearing loss, and possibly aging, alters the way acoustic cues are used for identifying vowels.
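
    A hedged sketch of the kind of mixed-effects analysis described, with synthetic stand-in data rather than the Ferguson vowel set, could use statsmodels to regress intelligibility on acoustic predictors with a random intercept per talker:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data: per-token intelligibility scores with acoustic
# predictors and a talker grouping factor (purely illustrative).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "talker": rng.integers(0, 10, n),
    "duration_ms": rng.normal(150, 30, n),
    "f1_hz": rng.normal(600, 80, n),
})
df["intelligibility"] = (
    50 + 0.1 * df["duration_ms"] - 0.01 * df["f1_hz"] + rng.normal(0, 5, n)
)

# Fixed effects for duration and F1; random intercept per talker.
model = smf.mixedlm("intelligibility ~ duration_ms + f1_hz", df, groups=df["talker"])
print(model.fit().summary())
```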

  16. The neuroanatomical and functional organization of speech perception.

    PubMed

    Scott, Sophie K; Johnsrude, Ingrid S

    2003-02-01

    A striking property of speech perception is its resilience in the face of acoustic variability (among speech sounds produced by different speakers at different times, for example). The robustness of speech perception might, in part, result from multiple, complementary representations of the input, which operate in both acoustic-phonetic feature-based and articulatory-gestural domains. Recent studies of the anatomical and functional organization of the non-human primate auditory cortical system point to multiple, parallel, hierarchically organized processing pathways that involve the temporal, parietal and frontal cortices. Functional neuroimaging evidence indicates that a similar organization might underlie speech perception in humans. These parallel, hierarchical processing 'streams', both within and across hemispheres, might operate on distinguishable, complementary types of representations and subserve complementary types of processing. Two long-opposing views of speech perception have posited a basis either in acoustic feature processing or in gestural motor processing; the view put forward here might help reconcile these positions. PMID:12536133

  17. A Model for Speech Processing in Second Language Listening Activities

    ERIC Educational Resources Information Center

    Zoghbor, Wafa Shahada

    2016-01-01

    Teachers' understanding of the process of speech perception could inform practice in listening classrooms. Catford (1950) developed a model for speech perception taking into account the influence of the acoustic features of the linguistic forms used by the speaker, whereby the listener "identifies" and "interprets" these…

  18. Cross-Channel Amplitude Sweeps Are Crucial to Speech Intelligibility

    ERIC Educational Resources Information Center

    Prendergast, Garreth; Green, Gary G. R.

    2012-01-01

    Classical views of speech perception argue that the static and dynamic characteristics of spectral energy peaks (formants) are the acoustic features that underpin phoneme recognition. Here we use representations where the amplitude modulations of sub-band filtered speech are described, precisely, in terms of co-sinusoidal pulses. These pulses are…

  19. Acoustic and aerodynamic characteristics of Country-Western, Operatic and Broadway singing styles compared to speech

    NASA Astrophysics Data System (ADS)

    Stone, Robert E.; Cleveland, Thomas F.; Sundberg, P. Johan

    2003-04-01

    Acoustic and aerodynamic measures were used to objectively describe characteristics of Country-Western (C-W) singing in a group of six premier performers in a series of studies, and of operatic and Broadway singing in a female subject with professional experience in both styles. For comparison purposes the same measures were also applied to the individuals while speaking the same material as sung. Changes in pitch and vocal loudness were investigated for various dependent variables, including subglottal pressure, closed quotient, glottal leakage, the H1-H2 difference (the level difference between the two lowest partials of the source spectrum), glottal compliance (the ratio between the air volume displaced in a glottal pulse and the subglottal pressure), formant frequencies, the long-term-average spectrum, and vibrato characteristics (in operatic versus Broadway singing). Data from the C-W singers suggest they use higher subglottal pressures in singing than in speaking, and the change in vocal intensity for a doubling of subglottal pressure is smaller than that reported for classical singers. Several measures were similar for both speaking and C-W singing. Whereas the results provide objective specification of differences between the operatic and Broadway styles of singing, the latter appears similar to features of a conversational speaking style.

  20. Nonlinear Statistical Modeling of Speech

    NASA Astrophysics Data System (ADS)

    Srinivasan, S.; Ma, T.; May, D.; Lazarou, G.; Picone, J.

    2009-12-01

    Contemporary approaches to speech and speaker recognition decompose the problem into four components: feature extraction, acoustic modeling, language modeling and search. Statistical signal processing is an integral part of each of these components, and Bayes Rule is used to merge these components into a single optimal choice. Acoustic models typically use hidden Markov models based on Gaussian mixture models for state output probabilities. This popular approach suffers from an inherent assumption of linearity in speech signal dynamics. Language models often employ a variety of maximum entropy techniques, but can employ many of the same statistical techniques used for acoustic models. In this paper, we focus on introducing nonlinear statistical models to the feature extraction and acoustic modeling problems as a first step towards speech and speaker recognition systems based on notions of chaos and strange attractors. Our goal in this work is to improve the generalization and robustness properties of a speech recognition system. Three nonlinear invariants are proposed for feature extraction: Lyapunov exponents, correlation fractal dimension, and correlation entropy. We demonstrate an 11% relative improvement on speech recorded under noise-free conditions, but show a comparable degradation occurs for mismatched training conditions on noisy speech. We conjecture that the degradation is due to difficulties in estimating invariants reliably from noisy data. To circumvent these problems, we introduce two dynamic models to the acoustic modeling problem: (1) a linear dynamic model (LDM) that uses a state space-like formulation to explicitly model the evolution of hidden states using an autoregressive process, and (2) a data-dependent mixture of autoregressive (MixAR) models. Results show that LDM and MixAR models can achieve comparable performance with HMM systems while using significantly fewer parameters. Currently we are developing Bayesian parameter estimation and
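
    Of the three nonlinear invariants named, the correlation fractal dimension is typically estimated from a delay embedding of the signal via the Grassberger-Procaccia correlation sum. The following is a rough numpy sketch of that generic estimator, not the authors' implementation; the embedding dimension, delay, and radius range are arbitrary illustrative choices:

```python
import numpy as np

def correlation_dimension(x, dim=5, tau=4, radii=None):
    """Crude Grassberger-Procaccia estimate of the correlation dimension.

    Illustrative only: real front-ends need careful choices of embedding
    dimension, delay, radius range, and much longer signals.
    """
    # Delay embedding: rows are points [x(t), x(t+tau), ..., x(t+(dim-1)tau)]
    n = len(x) - (dim - 1) * tau
    emb = np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    d = d[np.triu_indices(n, k=1)]            # all pairwise distances
    if radii is None:
        radii = np.geomspace(np.percentile(d, 5), np.percentile(d, 50), 8)
    corr_sum = np.array([np.mean(d < r) for r in radii])
    # Slope of log C(r) versus log r approximates the correlation dimension.
    slope, _ = np.polyfit(np.log(radii), np.log(corr_sum + 1e-12), 1)
    return slope

# Example: a slightly noisy sine has a low estimated dimension (near 1).
t = np.linspace(0, 20 * np.pi, 1000)
print(correlation_dimension(np.sin(t) + 0.01 * np.random.randn(len(t))))
```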

  1. Acoustic features of normal-hearing pre-term infant cry.

    PubMed

    Cacace, A T; Robb, M P; Saxman, J H; Risemberg, H; Koltai, P

    1995-11-01

    Acoustic features of expiratory cry vocalizations were studied in 125 pre-term infants prior to being discharged from a level-3 neonatal intensive care unit. The purpose was to describe various phonatory behaviors in infants in whom significant hearing loss could be ruled out. We also compared these results with normal-hearing full-term infants, and evaluated whether linkage exists among acoustic cry features and various anthropometric, diagnostic and treatment variables obtained throughout the peri- and neonatal periods. Our analysis revealed that cry duration was significantly related to total days receiving respiratory assistance. The occurrence of other complex spectral and temporal aspects of acoustic cry vocalizations including harmonic doubling and vibrato also increased in infants receiving some form of respiratory assistance. The presence of harmonic doubling also depended on weight and conceptional age at test. The discussion focuses on the implication of these relationships and directions for future research. PMID:8557478

  2. Acoustic Variations in Adductor Spasmodic Dysphonia as a Function of Speech Task.

    ERIC Educational Resources Information Center

    Sapienza, Christine M.; Walton, Suzanne; Murry, Thomas

    1999-01-01

    Acoustic phonatory events were identified in 14 women diagnosed with adductor spasmodic dysphonia (ADSD), a focal laryngeal dystonia that disturbs phonatory function, and compared with those of 14 age-matched women with no vocal dysfunction. Findings indicated ADSD subjects produced more aberrant acoustic events than controls during tasks of…

  3. Acoustic features of infant vocalic utterances at 3, 6, and 9 months.

    PubMed

    Kent, R D; Murray, A D

    1982-08-01

    Recordings were obtained of the comfort-state vocalizations of infants at 3, 6, and 9 months of age during a session of play and vocal interaction with the infant's mother and the experimenter. Acoustic analysis, primarily spectrography, was used to determine utterance durations, formant frequencies of vocalic utterances, patterns of f0 frequency change during vocalizations, variations in source excitation of the vocal tract, and general properties of the utterances. Most utterances had durations of less than 400 ms although occasional sounds lasted 2 s or more. An increase in the ranges of both the F1 and F2 frequencies was observed across both periods of age increase, but the center of the F1-F2 plot for the group vowels appeared to change very little. Phonatory characteristics were at least generally compatible with published descriptions of infant cry. The f0 frequency averaged 445 Hz for 3-month-olds, 450 Hz for 6-month-olds, and 415 Hz for 9-month-olds. As has been previously reported for infant cry, the vocalizations frequently were associated with tremor (vibrato), harmonic doubling, abrupt f0 shift, vocal fry (or roll), and noise segments. Thus, from a strictly acoustic perspective, early cry and the later vocalizations of cooing and babbling appear to be vocal performances in continuity. Implications of the acoustic analyses are discussed for phonetic development and speech acquisition.
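
    Formant frequencies such as the F1 and F2 values discussed here are now more often estimated from the roots of a linear-prediction polynomial than from spectrograms. The sketch below shows that standard LPC procedure on a synthetic frame; it is not the spectrographic method used in the study, and the pre-emphasis constant, LPC order, and test frequencies are illustrative assumptions:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_formants(frame, sr, order=10):
    """Estimate formant-like frequencies (Hz) from one frame via autocorrelation LPC.

    Sketch only: practical formant tracking adds bandwidth thresholds and
    continuity constraints across frames.
    """
    x = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])   # pre-emphasis
    x = x * np.hamming(len(x))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = solve_toeplitz(r[:-1], r[1:])           # LPC predictor coefficients
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]           # keep one of each conjugate pair
    freqs = np.sort(np.angle(roots) * sr / (2 * np.pi))
    return freqs[freqs > 90]                    # discard near-DC roots

# Example: synthetic vowel-like frame with energy near 500 and 1500 Hz
sr = 10000
t = np.arange(int(0.03 * sr)) / sr
frame = (np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
         + 0.01 * np.random.randn(len(t)))
print(np.round(lpc_formants(frame, sr)))
```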

  4. Perception of Suprasegmental Speech Features via Bimodal Stimulation: Cochlear Implant on One Ear and Hearing Aid on the Other

    ERIC Educational Resources Information Center

    Most, Tova; Harel, Tamar; Shpak, Talma; Luntz, Michal

    2011-01-01

    Purpose: The purpose of the study was to evaluate the contribution of acoustic hearing to the perception of suprasegmental features by adults who use a cochlear implant (CI) and a hearing aid (HA) in opposite ears. Method: 23 adults participated in this study. Perception of suprasegmental features--intonation, syllable stress, and word…

  5. Neuromagnetic Evidence for a Featural Distinction of English Consonants: Sensor- and Source-Space Data

    ERIC Educational Resources Information Center

    Scharinger, Mathias; Merickel, Jennifer; Riley, Joshua; Idsardi, William J.

    2011-01-01

    Speech sounds can be classified on the basis of their underlying articulators or on the basis of the acoustic characteristics resulting from particular articulatory positions. Research in speech perception suggests that distinctive features are based on both articulatory and acoustic information. In recent years, neuroelectric and neuromagnetic…

  6. Comparison of Acoustic and Kinematic Approaches to Measuring Utterance-Level Speech Variability

    ERIC Educational Resources Information Center

    Howell, Peter; Anderson, Andrew J.; Bartrip, Jon; Bailey, Eleanor

    2009-01-01

    Purpose: The spatiotemporal index (STI) is one measure of variability. As currently implemented, kinematic data are used, requiring equipment that cannot be used with some patient groups or in scanners. An experiment is reported that addressed whether STI can be extended to an audio measure of sound pressure of the speech envelope over time that…

  7. The Exploitation of Subphonemic Acoustic Detail in L2 Speech Segmentation

    ERIC Educational Resources Information Center

    Shoemaker, Ellenor

    2014-01-01

    The current study addresses an aspect of second language (L2) phonological acquisition that has received little attention to date--namely, the acquisition of allophonic variation as a word boundary cue. The role of subphonemic variation in the segmentation of speech by native speakers has been indisputably demonstrated; however, the acquisition of…

  8. Impact of Aberrant Acoustic Properties on the Perception of Sound Quality in Electrolarynx Speech

    ERIC Educational Resources Information Center

    Meltzner, Geoffrey S.; Hillman, Robert E.

    2005-01-01

    A large percentage of patients who have undergone laryngectomy to treat advanced laryngeal cancer rely on an electrolarynx (EL) to communicate verbally. Although serviceable, EL speech is plagued by shortcomings in both sound quality and intelligibility. This study sought to better quantify the relative contributions of previously identified…

  9. Alarming features: birds use specific acoustic properties to identify heterospecific alarm calls

    PubMed Central

    Fallow, Pamela M.; Pitcher, Benjamin J.; Magrath, Robert D.

    2013-01-01

    Vertebrates that eavesdrop on heterospecific alarm calls must distinguish alarms from sounds that can safely be ignored, but the mechanisms for identifying heterospecific alarm calls are poorly understood. While vertebrates learn to identify heterospecific alarms through experience, some can also respond to unfamiliar alarm calls that are acoustically similar to conspecific alarm calls. We used synthetic calls to test the role of specific acoustic properties in alarm call identification by superb fairy-wrens, Malurus cyaneus. Individuals fled more often in response to synthetic calls with peak frequencies closer to those of conspecific calls, even if other acoustic features were dissimilar to that of fairy-wren calls. Further, they then spent more time in cover following calls that had both peak frequencies and frequency modulation rates closer to natural fairy-wren means. Thus, fairy-wrens use similarity in specific acoustic properties to identify alarms and adjust a two-stage antipredator response. Our study reveals how birds respond to heterospecific alarm calls without experience, and, together with previous work using playback of natural calls, shows that both acoustic similarity and learning are important for interspecific eavesdropping. More generally, this study reconciles contrasting views on the importance of alarm signal structure and learning in recognition of heterospecific alarms. PMID:23303539

  10. A combination of vocal f0 dynamic and summary features discriminates between three pragmatic categories of infant-directed speech.

    PubMed

    Katz, G S; Cohn, J F; Moore, C A

    1996-02-01

    To assess the relative contribution of dynamic and summary features of vocal fundamental frequency (f0) to the statistical discrimination of pragmatic categories in infant-directed speech, 49 mothers were instructed to use their voice to get their 4-month-old baby's attention, show approval, and provide comfort. Vocal f0 from 621 tokens was extracted using a Computerized Speech Laboratory and custom software. Dynamic features were measured with convergent methods (visual judgment and quantitative modeling of f0 contour shape). Summary features were f0 mean, standard deviation, and duration. Dynamic and summary features both individually and in combination statistically discriminated between each of the pragmatic categories. Classification rates were 69% and 62% in initial and cross-validation DFAs, respectively.
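
    The discriminant function analysis (DFA) described corresponds to what scikit-learn exposes as linear discriminant analysis. A hedged sketch with invented feature vectors (mean f0, f0 standard deviation, duration) and invented category means, not the study's data, might look like this:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Synthetic stand-ins for (mean f0 in Hz, f0 SD in Hz, duration in s) per token,
# with three pragmatic categories: attention, approval, comfort (illustrative).
rng = np.random.default_rng(1)
means = {"attention": (320, 60, 0.9), "approval": (280, 80, 1.2), "comfort": (220, 30, 1.5)}
X, y = [], []
for label, (m, s, d) in means.items():
    for _ in range(50):
        X.append([rng.normal(m, 25), rng.normal(s, 10), rng.normal(d, 0.2)])
        y.append(label)
X, y = np.array(X), np.array(y)

lda = LinearDiscriminantAnalysis()
print("cross-validated accuracy:", cross_val_score(lda, X, y, cv=5).mean())
```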

  11. New Features Using Robust MVDR Spectrum of Filtered Autocorrelation Sequence for Robust Speech Recognition

    PubMed Central

    Gazor, Saeed

    2013-01-01

    This paper presents a novel noise-robust feature extraction method for speech recognition using the robust perceptual minimum variance distortionless response (MVDR) spectrum of a temporally filtered autocorrelation sequence. The perceptual MVDR spectrum of the filtered short-time autocorrelation sequence can reduce the effects of the residual nonstationary additive noise that remains after filtering the autocorrelation. To achieve a more robust front-end, we also modify the robust distortionless constraint of the MVDR spectral estimation method via revised weighting of the subband power spectrum values based on the subband signal-to-noise ratios (SNRs). This new weighting function allows the components of the input signal at the frequencies least affected by noise to pass with larger weights and attenuates the noisy and undesired components more effectively. The modification reduces the noise residuals in the spectrum estimated from the filtered autocorrelation sequence, thereby leading to a more robust algorithm. When evaluated on the Aurora 2 recognition task, the proposed method outperformed the baseline Mel frequency cepstral coefficients (MFCC), relative autocorrelation sequence MFCC (RAS-MFCC), and the MVDR-based features in several different noisy conditions. PMID:24501584
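
    The MVDR (Capon) spectrum at the core of these features can be computed directly from a short-time autocorrelation sequence. The sketch below shows only that bare estimator, without the perceptual warping, temporal filtering, or SNR-based weighting the paper proposes; the model order and regularization constant are arbitrary:

```python
import numpy as np
from scipy.linalg import toeplitz

def mvdr_spectrum(x, order=12, nfft=256):
    """Minimum variance distortionless response (Capon) spectrum estimate.

    Uses S(w) = (order + 1) / (e(w)^H R^{-1} e(w)), with R the Toeplitz
    autocorrelation matrix and e(w) the complex steering vector.
    Sketch only; the cited front-end adds perceptual and SNR-based weighting.
    """
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order] / len(x)
    R = toeplitz(r)                               # (order+1) x (order+1)
    R_inv = np.linalg.inv(R + 1e-8 * np.eye(order + 1))
    freqs = np.linspace(0, np.pi, nfft)
    k = np.arange(order + 1)
    spectrum = np.empty(nfft)
    for i, w in enumerate(freqs):
        e = np.exp(1j * w * k)                    # steering vector
        spectrum[i] = (order + 1) / np.real(e.conj() @ R_inv @ e)
    return freqs, spectrum

# Example: spectrum of a noisy sinusoid at 0.2*pi rad/sample peaks near that frequency
n = np.arange(1024)
x = np.cos(0.2 * np.pi * n) + 0.1 * np.random.randn(len(n))
freqs, S = mvdr_spectrum(x)
print(freqs[np.argmax(S)] / np.pi, "x pi rad/sample")
```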

  12. Acoustic Analyses of Speech Sounds and Rhythms in Japanese- and English-Learning Infants

    PubMed Central

    Yamashita, Yuko; Nakajima, Yoshitaka; Ueda, Kazuo; Shimada, Yohko; Hirsh, David; Seno, Takeharu; Smith, Benjamin Alexander

    2013-01-01

    The purpose of this study was to explore developmental changes in spectral fluctuations and temporal periodicity in Japanese- and English-learning infants. Three age groups (15, 20, and 24 months) were selected, because infants diversify their phonetic inventories with age. Natural speech of the infants was recorded. We utilized a critical-band-filter bank, which simulated the frequency resolution of the adult auditory periphery. First, the correlations between the power fluctuations of the critical-band outputs, represented by factor analysis, were examined in order to see how the critical bands should be connected to each other if a listener is to differentiate sounds in the infants' speech. We then analyzed the temporal fluctuations of the factor scores by calculating autocorrelations. The analysis identified, at 24 months of age in both linguistic environments, the same three factors that had been observed in adult speech; these factors were shifted to a higher frequency range, corresponding to the infants' smaller vocal tracts. The results suggest that the vocal tract structures of the infants had developed toward an adult-like configuration by 24 months of age in both language environments. The proportion of utterances with short-term periodicity increased with age in both environments, and this trend was clearer in the Japanese environment. PMID:23450824

  13. Depression Diagnoses and Fundamental Frequency-Based Acoustic Cues in Maternal Infant-Directed Speech

    ERIC Educational Resources Information Center

    Porritt, Laura L.; Zinser, Michael C.; Bachorowski, Jo-Anne; Kaplan, Peter S.

    2014-01-01

    F0-based acoustic measures were extracted from a brief, sentence-final target word spoken during structured play interactions between mothers and their 3- to 14-month-old infants and were analyzed based on demographic variables and DSM-IV Axis-I clinical diagnoses and their common modifiers. F0 range (ΔF0) was…

  14. Linguistic Emphasis in Maternal Speech to Preschool Language Learners with Language Impairments: An Acoustical Perspective.

    ERIC Educational Resources Information Center

    Scheffel, Debora L.; Ingrisano, Dennis R-S

    2000-01-01

    The nature of linguistic emphasis was studied from audio recordings of 29 mother-child dyads. Nineteen dyads involved mothers interacting with their 4-year-olds who evidenced language impairments. Approximately 84 percent of children evidencing language impairments could be so classified based on the acoustic variables associated with maternal use…

  15. Intelligibility of Telephone Speech for the Hearing Impaired When Various Microphones Are Used for Acoustic Coupling.

    ERIC Educational Resources Information Center

    Janota, Claus P.; Janota, Jeanette Olach

    1991-01-01

    Various candidate microphones were evaluated for acoustic coupling of hearing aids to a telephone receiver. Results from testing by 9 hearing-impaired adults found comparable listening performance with a pressure gradient microphone at a 10 decibel higher level of interfering noise than with a normal pressure-sensitive microphone. (Author/PB)

  16. Speech, Speech!

    ERIC Educational Resources Information Center

    McComb, Gordon

    1982-01-01

    Discussion focuses on the nature of computer-generated speech and voice synthesis today. State-of-the-art devices for home computers are called text-to-speech (TTS) systems. Details about the operation and use of TTS synthesizers are provided, and the time saving in programing over previous methods is emphasized. (MP)

  17. Music-induced emotions can be predicted from a combination of brain activity and acoustic features.

    PubMed

    Daly, Ian; Williams, Duncan; Hallowell, James; Hwang, Faustina; Kirke, Alexis; Malik, Asad; Weaver, James; Miranda, Eduardo; Nasuto, Slawomir J

    2015-12-01

    It is widely acknowledged that music can communicate and induce a wide range of emotions in the listener. However, music is a highly complex audio signal composed of a wide range of complex time- and frequency-varying components. Additionally, music-induced emotions are known to differ greatly between listeners. Therefore, it is not immediately clear what emotions will be induced in a given individual by a piece of music. We attempt to predict the music-induced emotional response in a listener by measuring the activity in the listener's electroencephalogram (EEG). We combine these measures with acoustic descriptors of the music, an approach that allows us to consider music as a complex set of time-varying acoustic features, independently of any specific music theory. Regression models are found which allow us to predict the music-induced emotions of our participants with a correlation between the actual and predicted responses of up to r=0.234, p<0.001. This regression fit suggests that over 20% of the variance of the participants' music-induced emotions can be predicted by their neural activity and the properties of the music. Given the large amount of noise, non-stationarity, and non-linearity in both EEG and music, this is an encouraging result. Additionally, the combination of measures of brain activity and acoustic features describing the music played to our participants allows us to predict music-induced emotions with significantly higher accuracies than either feature type alone (p<0.01).
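
    The modelling step reported here, predicting an emotion rating from concatenated EEG and acoustic feature vectors and correlating predicted with actual responses, can be sketched with random stand-in data (the feature counts, the ridge regressor, and the cross-validation scheme below are assumptions, not the authors' pipeline):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from scipy.stats import pearsonr

# Random stand-ins: 120 music excerpts, 32 EEG-derived features, 10 acoustic
# descriptors, and one continuous emotion rating per excerpt (illustrative only).
rng = np.random.default_rng(0)
eeg = rng.normal(size=(120, 32))
acoustic = rng.normal(size=(120, 10))
rating = 0.4 * eeg[:, 0] - 0.3 * acoustic[:, 2] + rng.normal(scale=1.0, size=120)

for name, X in [("EEG only", eeg), ("acoustic only", acoustic),
                ("combined", np.hstack([eeg, acoustic]))]:
    pred = cross_val_predict(Ridge(alpha=1.0), X, rating, cv=10)
    r, p = pearsonr(rating, pred)
    print(f"{name}: r = {r:.3f}, p = {p:.3g}")
```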

  18. Acoustic Correlates of Emphatic Stress in Central Catalan

    ERIC Educational Resources Information Center

    Nadeu, Marianna; Hualde, Jose Ignacio

    2012-01-01

    A common feature of public speech in Catalan is the placement of prominence on lexically unstressed syllables ("emphatic stress"). This paper presents an acoustic study of radio speech data. Instances of emphatic stress were perceptually identified. Within-word comparison between vowels with emphatic stress and vowels with primary lexical stress…

  19. Requirements for the evaluation of computational speech segregation systems.

    PubMed

    May, Tobias; Dau, Torsten

    2014-12-01

    Recent studies on computational speech segregation reported improved speech intelligibility in noise when estimating and applying an ideal binary mask with supervised learning algorithms. However, an important requirement for such systems in technical applications is their robustness to acoustic conditions not considered during training. This study demonstrates that the spectro-temporal noise variations that occur during training and testing determine the achievable segregation performance. In particular, such variations strongly affect the identification of acoustical features in the system associated with perceptual attributes in speech segregation. The results could help establish a framework for a systematic evaluation of future segregation systems.
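
    The ideal binary mask referred to here marks time-frequency units whose local SNR exceeds a criterion. A compact sketch of its construction from separate speech and noise signals follows; estimating this mask from the mixture alone is the hard problem the cited study evaluates, and the STFT settings and local criterion below are arbitrary:

```python
import numpy as np
from scipy.signal import stft, istft

def ideal_binary_mask(speech, noise, sr, lc_db=0.0, nperseg=512):
    """Ideal binary mask: 1 where local SNR exceeds lc_db, else 0.

    Sketch only; segregation systems must *estimate* this mask from the
    mixture, which is what the cited study evaluates.
    """
    _, _, S = stft(speech, fs=sr, nperseg=nperseg)
    _, _, N = stft(noise, fs=sr, nperseg=nperseg)
    local_snr_db = 10 * np.log10((np.abs(S) ** 2 + 1e-12) / (np.abs(N) ** 2 + 1e-12))
    return (local_snr_db > lc_db).astype(float)

def apply_mask(mixture, mask, sr, nperseg=512):
    """Resynthesize the mixture after applying a time-frequency mask."""
    _, _, M = stft(mixture, fs=sr, nperseg=nperseg)
    _, out = istft(M * mask[:, :M.shape[1]], fs=sr, nperseg=nperseg)
    return out

# Example with noise-like stand-ins for the speech and noise signals
sr = 16000
rng = np.random.default_rng(0)
speech, noise = rng.normal(size=sr), 0.5 * rng.normal(size=sr)
mask = ideal_binary_mask(speech, noise, sr)
enhanced = apply_mask(speech + noise, mask, sr)
print(mask.shape, enhanced.shape)
```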

  20. Acoustic features of male baboon loud calls: Influences of context, age, and individuality

    NASA Astrophysics Data System (ADS)

    Fischer, Julia; Hammerschmidt, Kurt; Cheney, Dorothy L.; Seyfarth, Robert M.

    2002-03-01

    The acoustic structure of loud calls ("wahoos") recorded from free-ranging male baboons (Papio cynocephalus ursinus) in the Moremi Game Reserve, Botswana, was examined for differences between and within contexts, using calls given in response to predators (alarm wahoos), during male contests (contest wahoos), and when a male had become separated from the group (contact wahoos). Calls were recorded from adolescent, subadult, and adult males. In addition, male alarm calls were compared with those recorded from females. Despite their superficial acoustic similarity, the analysis revealed a number of significant differences between alarm, contest, and contact wahoos. Contest wahoos are given at a much higher rate, exhibit lower frequency characteristics, have a longer "hoo" duration, and a relatively louder "hoo" portion than alarm wahoos. Contact wahoos are acoustically similar to contest wahoos, but are given at a much lower rate. Both alarm and contest wahoos also exhibit significant differences among individuals. Some of the acoustic features that vary in relation to age and sex presumably reflect differences in body size, whereas others are possibly related to male stamina and endurance. The finding that calls serving markedly different functions constitute variants of the same general call type suggests that the vocal production in nonhuman primates is evolutionarily constrained.

  1. Do acoustic features of lion, Panthera leo, roars reflect sex and male condition?

    PubMed

    Pfefferle, Dana; West, Peyton M; Grinnell, Jon; Packer, Craig; Fischer, Julia

    2007-06-01

    Long distance calls function to regulate intergroup spacing, attract mating partners, and/or repel competitors. They may therefore provide information not only about the sex of the caller (if both sexes call) but also about the caller's condition. This paper provides a description of the acoustic features of roars recorded from 18 male and 6 female lions (Panthera leo) living in the Serengeti National Park, Tanzania. After analyzing whether these roars differ between the sexes, tests were conducted to determine whether male roars function as indicators of fighting ability or condition. To this end, call characteristics were tested for relationships to anatomical features such as body size, mane color, and mane length. The call characteristics included acoustic parameters that have previously been implicated as indicators of size and fighting ability, e.g., call length, fundamental frequency, and peak frequency. The analysis revealed differences in relation to sex that were entirely explained by variation in body size. No evidence was found that the acoustic variables were related to male condition, indicating that sexual selection may be only a weak force modulating the lion's roar. Instead, lion roars may have mainly been selected to effectively advertise territorial boundaries.

  2. A Robust Approach For Acoustic Noise Suppression In Speech Using ANFIS

    NASA Astrophysics Data System (ADS)

    Martinek, Radek; Kelnar, Michal; Vanus, Jan; Bilik, Petr; Zidek, Jan

    2015-11-01

    The authors of this article deal with the implementation of a combination of fuzzy-system and artificial-intelligence techniques in the application area of non-linear noise and interference suppression. The structure used is called an Adaptive Neuro-Fuzzy Inference System (ANFIS). This system finds practical use mainly in audio telephone (mobile) communication in noisy environments (transport, production halls, sports matches, etc.). Experimental methods based on the two-input adaptive noise cancellation concept are clearly outlined. Within the experiments carried out, the authors created, based on the ANFIS structure, a comprehensive system for adaptive suppression of the unwanted background interference that occurs in audio communication and degrades the audio signal. The system designed has been tested on real voice signals. This article presents an investigation and comparison of three distinct approaches to noise cancellation in speech: LMS (least mean squares) and RLS (recursive least squares) adaptive filtering, and ANFIS. A careful review of the literature indicated the importance of non-linear adaptive algorithms over linear ones in noise cancellation. It was concluded that the ANFIS approach had the best overall performance, as it efficiently cancelled noise even in highly noise-degraded speech. Results were drawn from the successful experimentation; subjective tests were used to analyse comparative performance, while objective tests were used to validate it. Implementation of the algorithms was carried out experimentally in Matlab to justify the claims and determine their relative performances.
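
    Of the three approaches compared, the two-input LMS noise canceller is the simplest to sketch. The following illustrative numpy version (not the authors' ANFIS system; the filter order, step size, and synthetic signals are assumptions) adapts a filter on the reference noise so that the error output approximates the clean speech:

```python
import numpy as np

def lms_noise_canceller(primary, reference, order=32, mu=0.01):
    """Two-input adaptive noise cancellation with the LMS algorithm.

    primary   : speech + noise picked up by the main microphone
    reference : correlated noise from a second (noise-only) microphone
    Returns the error signal e[n], which approximates the clean speech.
    """
    w = np.zeros(order)
    e = np.zeros(len(primary))
    for n in range(order, len(primary)):
        x = reference[n - order:n][::-1]      # most recent reference samples
        y = w @ x                             # filter's estimate of the noise
        e[n] = primary[n] - y                 # noise-cancelled output
        w += 2 * mu * e[n] * x                # LMS weight update
    return e

# Example: a "speech" tone corrupted by filtered noise seen at the reference mic
rng = np.random.default_rng(0)
n = np.arange(20000)
speech = 0.5 * np.sin(2 * np.pi * 0.01 * n)
ref_noise = rng.normal(size=len(n))
noise_at_primary = np.convolve(ref_noise, [0.6, 0.3, 0.1], mode="same")
cleaned = lms_noise_canceller(speech + noise_at_primary, ref_noise)
print("residual noise power:", np.mean((cleaned[5000:] - speech[5000:]) ** 2))
```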

  4. Cortical speech and non-speech discrimination in relation to cognitive measures in preschool children.

    PubMed

    Kuuluvainen, Soila; Alku, Paavo; Makkonen, Tommi; Lipsanen, Jari; Kujala, Teija

    2016-03-01

    Effective speech sound discrimination at preschool age is known to be a prerequisite for the development of language skills and later literacy acquisition. However, the speech specificity of cortical discrimination skills in small children is currently not known, as previous research has either studied speech functions without comparison with non-speech sounds, or used much simpler sounds such as harmonic or sinusoidal tones as control stimuli. We investigated the cortical discrimination of five syllable features (consonant, vowel, vowel duration, fundamental frequency, and intensity), covering both segmental and prosodic phonetic changes, and their acoustically matched non-speech counterparts in 63 6-year-old typically developed children, by using a multi-feature mismatch negativity (MMN) paradigm. Each of the five investigated features elicited a unique pattern of differentiating negativities: an early differentiating negativity, MMN, and a late differentiating negativity. All five studied features showed speech-related enhancement of at least one of these responses, suggesting experience-related neural commitment in both phonetic and prosodic speech processing. In addition, the cognitive performance and language skills of the children were tested extensively. The speech-related neural enhancement was positively associated with the level of performance in several neurocognitive tasks, indicating a relationship between successful establishment of cortical memory traces for speech and enhanced cognitive functioning. The results contribute to the understanding of typical developmental trajectories of linguistic vs. non-linguistic auditory skills, and provide a reference for future studies investigating deficits in language-related disorders at preschool age. PMID:26647120

  5. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    ERIC Educational Resources Information Center

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  6. Scanning Acoustic Microscopy for Characterization of Coatings and Near-Surface Features of Ceramics

    SciTech Connect

    Qu, Jun; Blau, Peter Julian

    2006-01-01

    Scanning Acoustic Microscopy (SAcM) has been widely used for non-destructive evaluation (NDE) in various fields such as material characterization, electronics, and biomedicine. SAcM uses high-frequency acoustic waves (60 MHz to 2.0 GHz) providing much higher resolution (up to 0.5 µm) compared to conventional ultrasonic NDE, which is typically about 500 µm. SAcM offers the ability to non-destructively image subsurface features and visualize the variations in elastic properties. These attributes make SAcM a valuable tool for characterizing near-surface material properties and detecting fine-scale flaws. This paper presents some recent applications of SAcM in detecting subsurface damage, assessing coatings, and visualizing residual stress for ceramic and semiconductor materials.

  7. Prenatal features of Pena-Shokeir sequence with atypical response to acoustic stimulation.

    PubMed

    Pittyanont, Sirida; Jatavan, Phudit; Suwansirikul, Songkiat; Tongsong, Theera

    2016-09-01

    A fetal sonographic screening examination performed at 23 weeks showed polyhydramnios, micrognathia, fixed postures of all long bones, but no movement and no breathing. The fetus showed fetal heart rate acceleration but no movement when acoustic stimulation was applied with artificial larynx. All these findings persisted on serial examinations. The neonate was stillborn at 37 weeks and a final diagnosis of Pena-Shokeir sequence was made. In addition to typical sonographic features of Pena-Shokeir sequence, fetal heart rate accelerations with no movement in response to acoustic stimulation suggests that peripheral myopathy may possibly play an important role in the pathogenesis of the disease. © 2016 Wiley Periodicals, Inc. J Clin Ultrasound 44:459-462, 2016. PMID:27312123

  8. Is the linguistic content of speech less salient than its perceptual features in autism?

    PubMed

    Järvinen-Pasley, Anna; Pasley, John; Heaton, Pamela

    2008-02-01

    Open-ended tasks are rarely used to investigate cognition in autism. No known studies have directly examined whether increased attention to the perceptual level of speech in autism might contribute to a reduced tendency to process language meaningfully. The present study investigated linguistic versus perceptual speech processing preferences. Children with autism and controls were tested on a quasi-open-format paradigm, in which speech stimuli contained competing linguistic and perceptual information, and could be processed at either level. Relative to controls, children with autism exhibited superior perceptual processing of speech. However, whilst their tendency to preferentially process linguistic rather than perceptual information was weaker than that of controls, it was nevertheless their primary processing mode. Implications for language acquisition in autism are discussed.

  9. [Quantification and improvement of speech transmission performance using headphones in acoustic stimulated functional magnetic resonance imaging].

    PubMed

    Yamamura, Ken ichiro; Takatsu, Yasuo; Miyati, Tosiaki; Kimura, Tetsuya

    2014-10-01

    Functional magnetic resonance imaging (fMRI) has made a major contribution to the understanding of higher brain function, but fMRI with auditory stimulation, used in the planning of brain tumor surgery, is often inaccurate because the sounds used in the trial may not be correctly transmitted to the subjects due to acoustic noise. This prompted us to devise a method of quantifying sound transmission ability from the accuracy rate for 67 syllables, classified into three types. We evaluated this with and without acoustic noise during imaging. We also improved the structure of the headphones and compared their sound transmission ability with that of the conventional headphones attached to an MRI device (a GE Signa HDxt 3.0 T). The 95 percent upper confidence limit (UCL) was used as the threshold for the hearing accuracy rate for both headphone models. There was a statistically significant difference between the conventional and improved models during imaging (p < 0.01), with the accuracy rate of the improved model 16 percent higher. Twenty-nine syllables were accurate at the 95% UCL with the improved model, versus 22 with the conventional model. The evaluation system used in this study proved useful for correctly identifying syllables during fMRI.

  10. Effects of speech style, room acoustics, and vocal fatigue on vocal effort.

    PubMed

    Bottalico, Pasquale; Graetzer, Simone; Hunter, Eric J

    2016-05-01

    Vocal effort is a physiological measure that accounts for changes in voice production as vocal loading increases. It has been quantified in terms of sound pressure level (SPL). This study investigates how vocal effort is affected by speaking style, room acoustics, and short-term vocal fatigue. Twenty subjects were recorded while reading a text at normal and loud volumes in anechoic, semi-reverberant, and reverberant rooms in the presence of classroom babble noise. The acoustics in each environment were modified by creating a strong first reflection in the talker position. After each task, the subjects answered questions addressing their perception of the vocal effort, comfort, control, and clarity of their own voice. Variation in SPL for each subject was measured per task. It was found that SPL and self-reported effort increased in the loud style and decreased when the reflective panels were present and when reverberation time increased. Self-reported comfort and control decreased in the loud style, while self-reported clarity increased when panels were present. The lowest magnitude of vocal fatigue was experienced in the semi-reverberant room. The results indicate that early reflections may be used to reduce vocal effort without modifying reverberation time.

  11. Nonlinear spectro-temporal features based on a cochlear model for automatic speech recognition in a noisy situation.

    PubMed

    Choi, Yong-Sun; Lee, Soo-Young

    2013-09-01

    A nonlinear speech feature extraction algorithm was developed by modeling human cochlear functions, and demonstrated as a noise-robust front-end for speech recognition systems. The algorithm was based on a model of the Organ of Corti in the human cochlea, with such features as the basilar membrane (BM), outer hair cells (OHCs), and inner hair cells (IHCs). The frequency-dependent nonlinear compression and amplification of the OHCs were modeled by lateral inhibition to enhance spectral contrasts. In particular, the compression coefficients had a frequency dependency based on psychoacoustic evidence. Spectral subtraction and temporal adaptation were applied in the time-frame domain. With long-term and short-term adaptation characteristics, these factors remove stationary or slowly varying components and amplify temporal changes such as onsets or offsets. The proposed features were evaluated with a noisy speech database and showed better performance than baseline methods such as mel-frequency cepstral coefficients (MFCCs) and RASTA-PLP in unknown noisy conditions. PMID:23558292
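
    Of the processing stages listed, spectral subtraction is the most compact to illustrate. The sketch below is plain magnitude subtraction with a leading noise-only segment, without the cochlear filterbank, lateral inhibition, or temporal adaptation of the proposed front-end; the STFT settings and spectral floor are arbitrary:

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, sr, noise_seconds=0.25, nperseg=512, floor=0.05):
    """Basic magnitude spectral subtraction using a leading noise-only segment.

    Sketch only; the cited front-end performs denoising inside a cochlear model
    with frequency-dependent compression and lateral inhibition.
    """
    f, t, X = stft(noisy, fs=sr, nperseg=nperseg)
    mag, phase = np.abs(X), np.angle(X)
    noise_frames = int(noise_seconds * sr / (nperseg // 2))   # frames in the lead-in
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)      # subtract, keep a floor
    _, out = istft(clean_mag * np.exp(1j * phase), fs=sr, nperseg=nperseg)
    return out

# Example: a tone embedded in white noise, preceded by a noise-only half second
sr = 16000
rng = np.random.default_rng(0)
noise = 0.3 * rng.normal(size=2 * sr)
tone = np.concatenate([np.zeros(sr // 2),
                       0.5 * np.sin(2 * np.pi * 440 * np.arange(3 * sr // 2) / sr)])
enhanced = spectral_subtraction(noise + tone, sr)
print(enhanced.shape)
```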

  12. Speech Modification by a Deaf Child through Dynamic Orometric Modeling and Feedback.

    ERIC Educational Resources Information Center

    Fletcher, Samuel G.; Hasegawa, Akira

    1983-01-01

    A three-and-one-half-year-old profoundly deaf girl, whose physiologic, acoustic, and phonetic data indicated poor speech production, rapidly learned goal articulation gestures (positional and timing features of speech) after visual articulatory modeling and feedback on tongue position with a microprocessor-based instrument and video display.…

  13. Acoustic evidence for the development of gestural coordination in the speech of 2-year-olds: a longitudinal study.

    PubMed

    Goodell, E W; Studdert-Kennedy, M

    1993-08-01

    Studies of child phonology have often assumed that young children first master a repertoire of phonemes and then build their lexicon by forming combinations of these abstract, contrastive units. However, evidence from children's systematic errors suggests that children first build a repertoire of words as integral sequences of gestures and then gradually differentiate these sequences into their gestural and segmental components. Recently, experimental support for this position has been found in the acoustic records of the speech of 3-, 5-, and 7-year-old children, suggesting that even in older children some phonemes have not yet fully segregated as units of gestural organization and control. The present longitudinal study extends this work to younger children (22- and 32-month-olds). Results demonstrate clear differences in the duration and coordination of gestures between children and adults, and a clear shift toward the patterns of adult speakers during roughly the third year of life. Details of the child-adult differences and developmental changes vary from one aspect of an utterance to another. PMID:8377484

  14. Is infant-directed speech prosody a result of the vocal expression of emotion?

    PubMed

    Trainor, L J; Austin, C M; Desjardins, R N

    2000-05-01

    Many studies have found that infant-directed (ID) speech has higher pitch, has more exaggerated pitch contours, has a larger pitch range, has a slower tempo, and is more rhythmic than typical adult-directed (AD) speech. We show that the ID speech style reflects free vocal expression of emotion to infants, in comparison with more inhibited expression of emotion in typical AD speech. When AD speech does express emotion, the same acoustic features are used as in ID speech. We recorded ID and AD samples of speech expressing love-comfort, fear, and surprise. The emotions were equally discriminable in the ID and AD samples. Acoustic analyses showed few differences between the ID and AD samples, but robust differences across the emotions. We conclude that ID prosody itself is not special. What is special is the widespread expression of emotion to infants in comparison with the more inhibited expression of emotion in typical adult interactions.

  15. Subjective evaluation of speech and noise in learning environments in the realm of classroom acoustics: Results from laboratory and field experiments

    NASA Astrophysics Data System (ADS)

    Meis, Markus; Nocke, Christian; Hofmann, Simone; Becker, Bernhard

    2005-04-01

    The impact of different acoustical conditions in learning environments on noise annoyance and the evaluation of speech quality were tested in a series of three experiments. In Experiment 1 (n=79) the auralization of seven classrooms with reverberation times from 0.55 to 3.21 s [average between 250 Hz to 2 kHz] served to develop a Semantic Differential, evaluating a simulated teacher's voice. Four factors were found: acoustical comfort, roughness, sharpness, and loudness. In Experiment 2, the effects of two classroom renovations were examined from a holistic perspective. The rooms were treated acoustically with acoustic ceilings (RT=0.5 s [250 Hz-2 kHz]) and muffling floor materials as well as non-acoustically with a new lighting system and color design. The results indicate that pupils (n=61) in renovated classrooms judged the simulated voice more positively, were less annoyed from the noise in classrooms, and were more motivated to participate in the lessons. In Experiment 3 the sound environments from six different lecture rooms (RT=0.8 to 1.39 s [250 Hz-2 kHz]) in two Universities of Oldenburg were evaluated by 321 students during the lectures. Evidence found supports the assumption that acoustical comfort in rooms is dependent on frequency for rooms with higher reverberation times.

  16. Music-induced emotions can be predicted from a combination of brain activity and acoustic features.

    PubMed

    Daly, Ian; Williams, Duncan; Hallowell, James; Hwang, Faustina; Kirke, Alexis; Malik, Asad; Weaver, James; Miranda, Eduardo; Nasuto, Slawomir J

    2015-12-01

    It is widely acknowledged that music can communicate and induce a wide range of emotions in the listener. However, music is a highly complex audio signal composed of a wide range of complex time- and frequency-varying components. Additionally, music-induced emotions are known to differ greatly between listeners. Therefore, it is not immediately clear what emotions will be induced in a given individual by a piece of music. We attempt to predict the music-induced emotional response in a listener by measuring the activity in the listener's electroencephalogram (EEG). We combine these measures with acoustic descriptors of the music, an approach that allows us to consider music as a complex set of time-varying acoustic features, independently of any specific music theory. Regression models are found which allow us to predict the music-induced emotions of our participants with a correlation between the actual and predicted responses of up to r=0.234, p<0.001. This regression fit suggests that over 20% of the variance of the participants' music-induced emotions can be predicted by their neural activity and the properties of the music. Given the large amount of noise, non-stationarity, and non-linearity in both EEG and music, this is an encouraging result. Additionally, the combination of measures of brain activity and acoustic features describing the music played to our participants allows us to predict music-induced emotions with significantly higher accuracies than either feature type alone (p<0.01). PMID:26544602

  17. Improving robustness of speech recognition systems

    NASA Astrophysics Data System (ADS)

    Mitra, Vikramjit

    2010-11-01

    speech databases: X-ray microbeam and Aurora-2 were annotated, where the former was used to train a TV-estimator and the latter was used to train a Dynamic Bayesian Network (DBN) based ASR architecture. The DBN architecture used two sets of observations: (a) acoustic features in the form of mel-frequency cepstral coefficients (MFCCs) and (b) TVs (estimated from the acoustic speech signal). In this setup the articulatory gestures were modeled as hidden random variables, hence eliminating the necessity for explicit gesture recognition. Word recognition results using the DBN architecture indicate that articulatory representations not only can help to account for coarticulatory variations but can also significantly improve the noise robustness of the ASR system.
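
    The acoustic observation stream described here, MFCCs, is routinely computed with an off-the-shelf call; a minimal example is shown below (librosa is assumed to be available, and the synthetic waveform is a stand-in for real speech). The TV estimator and the DBN itself are beyond a short sketch:

```python
import numpy as np
import librosa  # assumed available; any MFCC implementation would do

# Synthetic one-second signal as a stand-in for a speech waveform.
sr = 16000
y = 0.1 * np.random.randn(sr).astype(np.float32)

# 13 MFCCs per frame, the conventional acoustic observation for HMM/DBN ASR.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
print(mfcc.shape)   # (13, number_of_frames)
```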

  18. Asymmetry in infants' selective attention to facial features during visual processing of infant-directed speech

    PubMed Central

    Smith, Nicholas A.; Gibilisco, Colleen R.; Meisinger, Rachel E.; Hankey, Maren

    2013-01-01

    Two experiments used eye tracking to examine how infant and adult observers distribute their eye gaze on videos of a mother producing infant- and adult-directed speech. Both groups showed greater attention to the eyes than to the nose and mouth, as well as an asymmetrical focus on the talker's right eye for infant-directed speech stimuli. Observers continued to look more at the talker's apparent right eye when the video stimuli were mirror flipped, suggesting that the asymmetry reflects a perceptual processing bias rather than a stimulus artifact, which may be related to cerebral lateralization of emotion processing. PMID:24062705

  19. Hemispheric Asymmetries in Speech Perception: Sense, Nonsense and Modulations

    PubMed Central

    Rosen, Stuart; Wise, Richard J. S.; Chadha, Shabneet; Conway, Eleanor-Jayne; Scott, Sophie K.

    2011-01-01

    Background The well-established left hemisphere specialisation for language processing has long been claimed to be based on a low-level auditory specialization for specific acoustic features in speech, particularly regarding ‘rapid temporal processing’. Methodology A novel analysis/synthesis technique was used to construct a variety of sounds based on simple sentences which could be manipulated in spectro-temporal complexity, and whether they were intelligible or not. All sounds consisted of two noise-excited spectral prominences (based on the lower two formants in the original speech) which could be static or varying in frequency and/or amplitude independently. Dynamically varying both acoustic features based on the same sentence led to intelligible speech but when either or both acoustic features were static, the stimuli were not intelligible. Using the frequency dynamics from one sentence with the amplitude dynamics of another led to unintelligible sounds of comparable spectro-temporal complexity to the intelligible ones. Positron emission tomography (PET) was used to compare which brain regions were active when participants listened to the different sounds. Conclusions Neural activity to spectral and amplitude modulations sufficient to support speech intelligibility (without actually being intelligible) was seen bilaterally, with a right temporal lobe dominance. A left dominant response was seen only to intelligible sounds. It thus appears that the left hemisphere specialisation for speech is based on the linguistic properties of utterances, not on particular acoustic features. PMID:21980349

  20. Explicit authenticity and stimulus features interact to modulate BOLD response induced by emotional speech.

    PubMed

    Drolet, Matthis; Schubotz, Ricarda I; Fischer, Julia

    2013-06-01

    Context has been found to have a profound effect on the recognition of social stimuli and correlated brain activation. The present study was designed to determine whether knowledge about emotional authenticity influences emotion recognition expressed through speech intonation. Participants classified emotionally expressive speech in an fMRI experimental design as sad, happy, angry, or fearful. For some trials, stimuli were cued as either authentic or play-acted in order to manipulate participant top-down belief about authenticity, and these labels were presented both congruently and incongruently to the emotional authenticity of the stimulus. Contrasting authentic versus play-acted stimuli during uncued trials indicated that play-acted stimuli spontaneously up-regulate activity in the auditory cortex and regions associated with emotional speech processing. In addition, a clear interaction effect of cue and stimulus authenticity showed up-regulation in the posterior superior temporal sulcus and the anterior cingulate cortex, indicating that cueing had an impact on the perception of authenticity. In particular, when a cue indicating an authentic stimulus was followed by a play-acted stimulus, additional activation occurred in the temporoparietal junction, probably pointing to increased load on perspective taking in such trials. While actual authenticity has a significant impact on brain activation, individual belief about stimulus authenticity can additionally modulate the brain response to differences in emotionally expressive speech.

  1. A Study of Additive Noise Model for Robust Speech Recognition

    NASA Astrophysics Data System (ADS)

    Awatade, Manisha H.

    2011-12-01

    A model of how speech amplitude spectra are affected by additive noise is studied. Acoustic features are extracted from the noise-robust parts of the speech spectra without losing discriminative information. Two existing non-linear processing methods, harmonic demodulation and spectral peak-to-valley ratio locking, are used to minimize the mismatch between clean and noisy speech features. Previously studied methods, including peak isolation [1], do not require noise estimation and are effective in dealing with both stationary and non-stationary noise.
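
    The additive model can be illustrated numerically: for uncorrelated speech and noise, power spectra add approximately, so strong spectral peaks of speech change little while spectral valleys are filled by noise. The toy example below is purely illustrative and is not the paper's implementation.

      import numpy as np

      rng = np.random.default_rng(1)
      sr, n = 16000, 2048
      t = np.arange(n) / sr

      # Toy "speech" (two harmonics on exact FFT bins) plus white noise.
      speech = np.sin(2 * np.pi * 250 * t) + 0.5 * np.sin(2 * np.pi * 500 * t)
      noise = 0.3 * rng.normal(size=n)

      S = np.abs(np.fft.rfft(speech)) ** 2
      N = np.abs(np.fft.rfft(noise)) ** 2
      Y = np.abs(np.fft.rfft(speech + noise)) ** 2

      peak = int(np.argmax(S))
      print(Y[peak] / S[peak])                    # ~1: a strong peak is barely affected
      print(np.mean(Y), np.mean(S) + np.mean(N))  # powers add approximately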

  2. Segregation of unvoiced speech from nonspeech interference.

    PubMed

    Hu, Guoning; Wang, DeLiang

    2008-08-01

    Monaural speech segregation has proven to be extremely challenging. While efforts in computational auditory scene analysis have led to considerable progress in voiced speech segregation, little attention has been given to unvoiced speech, which lacks harmonic structure and has weaker energy, and is hence more susceptible to interference. This study proposes a new approach to the problem of segregating unvoiced speech from nonspeech interference. The study first addresses the question of how much speech is unvoiced. The segregation process occurs in two stages: segmentation and grouping. In segmentation, the proposed model decomposes an input mixture into contiguous time-frequency segments by a multiscale analysis of event onsets and offsets. Grouping of unvoiced segments is based on Bayesian classification of acoustic-phonetic features. The proposed model for unvoiced speech segregation joins an existing model for voiced speech segregation to produce an overall system that can deal with both voiced and unvoiced speech. Systematic evaluation shows that the proposed system extracts a majority of unvoiced speech without including much interference, and it performs substantially better than spectral subtraction. PMID:18681616
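
    A highly simplified, single-scale sketch of the onset/offset idea behind the segmentation stage is given below; the actual model performs multiscale analysis over a full time-frequency decomposition, so the function and its parameters are illustrative only.

      import numpy as np
      from scipy.ndimage import gaussian_filter1d
      from scipy.signal import find_peaks

      def onset_offset_candidates(x, sr, frame=0.01, sigma=3.0):
          """Candidate onset/offset frames from a smoothed log-energy contour."""
          hop = int(frame * sr)
          n_frames = len(x) // hop
          energy = np.array([np.sum(x[i * hop:(i + 1) * hop] ** 2)
                             for i in range(n_frames)])
          d = np.diff(gaussian_filter1d(np.log(energy + 1e-10), sigma))
          onsets, _ = find_peaks(d, height=0.1)    # sharp energy increases
          offsets, _ = find_peaks(-d, height=0.1)  # sharp energy decreases
          return onsets, offsets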

  3. Neural Oscillations Carry Speech Rhythm through to Comprehension

    PubMed Central

    Peelle, Jonathan E.; Davis, Matthew H.

    2012-01-01

    A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners’ processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging – particularly electroencephalography (EEG) and magnetoencephalography (MEG) – point to phase locking by ongoing cortical oscillations to low-frequency information (~4–8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain. PMID:22973251

  4. Comparison of spatial frequency domain features for the detection of side attack explosive ballistics in synthetic aperture acoustics

    NASA Astrophysics Data System (ADS)

    Dowdy, Josh; Anderson, Derek T.; Luke, Robert H.; Ball, John E.; Keller, James M.; Havens, Timothy C.

    2016-05-01

    Explosive hazards in current and former conflict zones are a threat to both military and civilian personnel. As a result, much effort has been dedicated to identifying automated algorithms and systems to detect these threats. However, robust detection is complicated due to factors like the varied composition and anatomy of such hazards. In order to solve this challenge, a number of platforms (vehicle-based, handheld, etc.) and sensors (infrared, ground penetrating radar, acoustics, etc.) are being explored. In this article, we investigate the detection of side attack explosive ballistics via a vehicle-mounted acoustic sensor. In particular, we explore three acoustic features, one in the time domain and two on synthetic aperture acoustic (SAA) beamformed imagery. The idea is to exploit the varying acoustic frequency profile of a target due to its unique geometry and material composition with respect to different viewing angles. The first two features build their angle specific frequency information using a highly constrained subset of the signal data and the last feature builds its frequency profile using all available signal data for a given region of interest (centered on the candidate target location). Performance is assessed in the context of receiver operating characteristic (ROC) curves on cross-validation experiments for data collected at a U.S. Army test site on different days with multiple target types and clutter. Our preliminary results are encouraging and indicate that the top performing feature is the unrolled two dimensional discrete Fourier transform (DFT) of SAA beamformed imagery.
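
    The top-performing feature, an unrolled two-dimensional DFT of a beamformed image chip, is straightforward to reproduce in outline; the array names and chip size below are hypothetical.

      import numpy as np

      def unrolled_dft_feature(image_chip):
          """Magnitude of the 2-D DFT of a beamformed-image region of interest,
          flattened ('unrolled') into a single feature vector."""
          spectrum = np.fft.fftshift(np.fft.fft2(image_chip))
          return np.abs(spectrum).ravel()

      chip = np.random.default_rng(0).normal(size=(32, 32))  # synthetic ROI
      print(unrolled_dft_feature(chip).shape)                # (1024,)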

  5. Auditory emotion recognition impairments in Schizophrenia: Relationship to acoustic features and cognition

    PubMed Central

    Gold, Rinat; Butler, Pamela; Revheim, Nadine; Leitman, David; Hansen, John A.; Gur, Ruben; Kantrowitz, Joshua T.; Laukka, Petri; Juslin, Patrik N.; Silipo, Gail S.; Javitt, Daniel C.

    2013-01-01

    Objective Schizophrenia is associated with deficits in the ability to perceive emotion based upon tone of voice. The basis for this deficit, however, remains unclear and assessment batteries remain limited. We evaluated performance in schizophrenia on a novel voice emotion recognition battery with well-characterized physical features, relative to impairments in more general emotional and cognitive function. Methods We studied a primary sample of 92 patients relative to 73 controls. Stimuli were characterized according to both intended emotion and physical features (e.g., pitch, intensity) that contributed to the emotional percept. Parallel measures of visual emotion recognition, pitch perception, general cognition, and overall outcome were obtained. More limited measures were obtained in an independent replication sample of 36 patients, 31 age-matched controls, and 188 general comparison subjects. Results Patients showed significant, large effect size deficits in voice emotion recognition (F=25.4, p<.00001, d=1.1), and were preferentially impaired in recognition of emotion based upon pitch, but not intensity, features (group X feature interaction: F=7.79, p=.006). Emotion recognition deficits were significantly correlated with pitch perception impairments both across (r=.56, p<.0001) and within (r=.47, p<.0001) groups. Path analysis showed both sensory-specific and general cognitive contributions to auditory emotion recognition deficits in schizophrenia. Similar patterns of results were observed in the replication sample. Conclusions The present study demonstrates impairments in auditory emotion recognition in schizophrenia relative to acoustic features of underlying stimuli. Furthermore, it provides tools and highlights the need for greater attention to physical features of stimuli used for the study of social cognition in neuropsychiatric disorders. PMID:22362394

  6. Neurophysiological evidence that musical training influences the recruitment of right hemispheric homologues for speech perception.

    PubMed

    Jantzen, McNeel G; Howe, Bradley M; Jantzen, Kelly J

    2014-01-01

    Musicians have a more accurate temporal and tonal representation of auditory stimuli than their non-musician counterparts (Musacchia et al., 2007; Parbery-Clark et al., 2009a; Zendel and Alain, 2009; Kraus and Chandrasekaran, 2010). Musicians who are adept at the production and perception of music are also more sensitive to key acoustic features of speech such as voice onset timing and pitch. Together, these data suggest that musical training may enhance the processing of acoustic information for speech sounds. In the current study, we sought to provide neural evidence that musicians process speech and music in a similar way. We hypothesized that for musicians, right hemisphere areas traditionally associated with music are also engaged for the processing of speech sounds. In contrast, we predicted that in non-musicians processing of speech sounds would be localized to traditional left hemisphere language areas. Speech stimuli differing in voice onset time were presented using a dichotic listening paradigm. Subjects either indicated the aural location of a specified speech sound or identified a specific speech sound from a directed aural location. Musical training effects and the organization of acoustic features were reflected by activity in source generators of the P50. This included greater activation of the right middle temporal gyrus and superior temporal gyrus in musicians. The findings demonstrate recruitment of the right hemisphere in musicians for discriminating speech sounds and a putative broadening of their language network. Musicians appear to have an increased sensitivity to acoustic features and enhanced selective attention to temporal features of speech that is facilitated by musical training and supported, in part, by right hemisphere homologues of established speech processing regions of the brain. PMID:24624107

  7. Formant-frequency variation and informational masking of speech by extraneous formants: evidence against dynamic and speech-specific acoustical constraints.

    PubMed

    Roberts, Brian; Summers, Robert J; Bailey, Peter J

    2014-08-01

    How speech is separated perceptually from other speech remains poorly understood. Recent research indicates that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This study explored the effects of manipulating the depth and pattern of that variation. Three formants (F1+F2+F3) constituting synthetic analogues of natural sentences were distributed across the 2 ears, together with a competitor for F2 (F2C) that listeners must reject to optimize recognition (left = F1+F2C; right = F2+F3). The frequency contours of F1 - F3 were each scaled to 50% of their natural depth, with little effect on intelligibility. Competitors were created either by inverting the frequency contour of F2 about its geometric mean (a plausibly speech-like pattern) or using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Adding a competitor typically reduced intelligibility; this reduction depended on the depth of F2C variation, being greatest for 100%-depth, intermediate for 50%-depth, and least for 0%-depth (constant) F2Cs. This suggests that competitor impact depends on overall depth of frequency variation, not depth relative to that for the target formants. The absence of tuning (i.e., no minimum in intelligibility for the 50% case) suggests that the ability to reject an extraneous formant does not depend on similarity in the depth of formant-frequency variation. Furthermore, triangle-wave competitors were as effective as their more speech-like counterparts, suggesting that the selection of formants from the ensemble also does not depend on speech-specific constraints. PMID:24842068

  8. Perception of Sentence Stress in Speech Correlates with the Temporal Unpredictability of Prosodic Features

    ERIC Educational Resources Information Center

    Kakouros, Sofoklis; Räsänen, Okko

    2016-01-01

    Numerous studies have examined the acoustic correlates of sentential stress and its underlying linguistic functionality. However, the mechanism that connects stress cues to the listener's attentional processing has remained unclear. Also, the learnability versus innateness of stress perception has not been widely discussed. In this work, we…

  9. Early recognition of speech

    PubMed Central

    Remez, Robert E; Thomas, Emily F

    2013-01-01

    Classic research on the perception of speech sought to identify minimal acoustic correlates of each consonant and vowel. In explaining perception, this view designated momentary components of an acoustic spectrum as cues to the recognition of elementary phonemes. This conceptualization of speech perception is untenable given the findings of phonetic sensitivity to modulation independent of the acoustic and auditory form of the carrier. The empirical key is provided by studies of the perceptual organization of speech, a low-level integrative function that finds and follows the sensory effects of speech amid concurrent events. These projects have shown that the perceptual organization of speech is keyed to modulation; fast; unlearned; nonsymbolic; indifferent to short-term auditory properties; and organization requires attention. The ineluctably multisensory nature of speech perception also imposes conditions that distinguish language among cognitive systems. WIREs Cogn Sci 2013, 4:213–223. doi: 10.1002/wcs.1213 PMID:23926454

  10. Speech masking and cancelling and voice obscuration

    DOEpatents

    Holzrichter, John F.

    2013-09-10

    A non-acoustic sensor is used to measure a user's speech and then broadcasts an obscuring acoustic signal diminishing the user's vocal acoustic output intensity and/or distorting the voice sounds making them unintelligible to persons nearby. The non-acoustic sensor is positioned proximate or contacting a user's neck or head skin tissue for sensing speech production information.

  11. A Technique for Estimating Intensity of Emotional Expressions and Speaking Styles in Speech Based on Multiple-Regression HSMM

    NASA Astrophysics Data System (ADS)

    Nose, Takashi; Kobayashi, Takao

    In this paper, we propose a technique for estimating the degree or intensity of emotional expressions and speaking styles appearing in speech. The key idea is based on a style control technique for speech synthesis using a multiple regression hidden semi-Markov model (MRHSMM), and the proposed technique can be viewed as the inverse of the style control. In the proposed technique, the acoustic features of spectrum, power, fundamental frequency, and duration are simultaneously modeled using the MRHSMM. We derive an algorithm for estimating explanatory variables of the MRHSMM, each of which represents the degree or intensity of emotional expressions and speaking styles appearing in acoustic features of speech, based on a maximum likelihood criterion. We show experimental results to demonstrate the ability of the proposed technique using two types of speech data, simulated emotional speech and spontaneous speech with different speaking styles. It is found that the estimated values have correlation with human perception.
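
    Schematically, the MRHSMM expresses each state output mean as a linear (multiple regression) function of a style vector, and style estimation inverts that relationship by maximum likelihood. The notation below is simplified from the paper: H_i is the regression matrix of state i, s the style/intensity vector, o the observed acoustic features, and lambda the remaining model parameters.

      \mu_i = H_i \, \xi, \qquad \xi = \begin{bmatrix} 1 \\ s \end{bmatrix},
      \qquad \hat{s} = \arg\max_{s} \; p(\mathbf{o} \mid \lambda, s)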

  12. Acoustics

    NASA Technical Reports Server (NTRS)

    Goodman, Jerry R.; Grosveld, Ferdinand

    2007-01-01

    The acoustics environment in space operations is important to maintain at manageable levels so that the crewperson can remain safe, functional, effective, and reasonably comfortable. High acoustic levels can produce temporary or permanent hearing loss, or cause other physiological symptoms such as auditory pain, headaches, discomfort, strain in the vocal cords, or fatigue. Noise is defined as undesirable sound. Excessive noise may result in psychological effects such as irritability, inability to concentrate, decrease in productivity, annoyance, errors in judgment, and distraction. A noisy environment can also result in the inability to sleep, or sleep well. Elevated noise levels can affect the ability to communicate, understand what is being said, hear what is going on in the environment, degrade crew performance and operations, and create habitability concerns. Superfluous noise emissions can also create the inability to hear alarms or other important auditory cues such as an equipment malfunctioning. Recent space flight experience, evaluations of the requirements in crew habitable areas, and lessons learned (Goodman 2003; Allen and Goodman 2003; Pilkinton 2003; Grosveld et al. 2003) show the importance of maintaining an acceptable acoustics environment. This is best accomplished by having a high-quality set of limits/requirements early in the program, the "designing in" of acoustics in the development of hardware and systems, and by monitoring, testing and verifying the levels to ensure that they are acceptable.

  13. Design Foundations for Content-Rich Acoustic Interfaces: Investigating Audemes as Referential Non-Speech Audio Cues

    ERIC Educational Resources Information Center

    Ferati, Mexhid Adem

    2012-01-01

    To access interactive systems, blind and visually impaired users can leverage their auditory senses by using non-speech sounds. The current structure of non-speech sounds, however, is geared toward conveying user interface operations (e.g., opening a file) rather than large theme-based information (e.g., a history passage) and, thus, is ill-suited…

  14. Multilevel Analysis in Analyzing Speech Data

    ERIC Educational Resources Information Center

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  15. Production and perception of clear speech in Croatian and English

    NASA Astrophysics Data System (ADS)

    Smiljanić, Rajka; Bradlow, Ann R.

    2005-09-01

    Previous research has established that naturally produced English clear speech is more intelligible than English conversational speech. The major goal of this paper was to establish the presence of the clear speech effect in production and perception of a language other than English, namely Croatian. A systematic investigation of the conversational-to-clear speech transformations across languages with different phonological properties (e.g., large versus small vowel inventory) can provide a window into the interaction of general auditory-perceptual and phonological, structural factors that contribute to the high intelligibility of clear speech. The results of this study showed that naturally produced clear speech is a distinct, listener-oriented, intelligibility-enhancing mode of speech production in both languages. Furthermore, the acoustic-phonetic features of the conversational-to-clear speech transformation revealed cross-language similarities in clear speech production strategies. In both languages, talkers exhibited a decrease in speaking rate and an increase in pitch range, as well as an expansion of the vowel space. Notably, the findings of this study showed equivalent vowel space expansion in English and Croatian clear speech, despite the difference in vowel inventory size across the two languages, suggesting that the extent of vowel contrast enhancement in hyperarticulated clear speech is independent of vowel inventory size.

  16. A self-organizing neural network architecture for auditory and speech perception with applications to acoustic and other temporal prediction problems

    NASA Astrophysics Data System (ADS)

    Cohen, Michael; Grossberg, Stephen

    1994-09-01

    This project is developing autonomous neural network models for the real-time perception and production of acoustic and speech signals. Our SPINET pitch model was developed to take real-time acoustic input and to simulate the key pitch data. SPINET was embedded into a model for auditory scene analysis, or how the auditory system separates sound sources in environments with multiple sources. The model groups frequency components based on pitch and spatial location cues and resonantly binds them within different streams. The model simulates psychophysical grouping data, such as how an ascending tone groups with a descending tone even if noise exists at the intersection point, and how a tone before and after a noise burst is perceived to continue through the noise. These resonant streams input to working memories, wherein phonetic percepts adapt to global speech rate. Computer simulations quantitatively generate the experimentally observed category boundary shifts for voiced stop pairs that have the same or different place of articulation, including why the interval to hear a double (geminate) stop is twice as long as that to hear two different stops. This model also uses resonant feedback, here between list categories and working memory.

  17. Robust Speech Rate Estimation for Spontaneous Speech

    PubMed Central

    Wang, Dagen; Narayanan, Shrikanth S.

    2010-01-01

    In this paper, we propose a direct method for speech rate estimation from acoustic features without requiring any automatic speech transcription. We compare various spectral and temporal signal analysis and smoothing strategies to better characterize the underlying syllable structure to derive speech rate. The proposed algorithm extends the methods of spectral subband correlation by including temporal correlation and the use of prominent spectral subbands for improving the signal correlation essential for syllable detection. Furthermore, to address some of the practical robustness issues in previously proposed methods, we introduce some novel components into the algorithm such as the use of pitch confidence for filtering spurious syllable envelope peaks, a magnifying window for tackling neighboring syllable smearing, and relative peak measure thresholds for pseudo peak rejection. We also describe an automated approach for learning algorithm parameters from data, and find the optimal settings through Monte Carlo simulations and parameter sensitivity analysis. Final experimental evaluations are conducted on a portion of the Switchboard corpus for which manual phonetic segmentation information and published results are available for direct comparison. The results show a correlation coefficient of 0.745 with respect to the ground truth based on manual segmentation. This result is about a 17% improvement compared to the current best single estimator and an 11% improvement over the multiestimator evaluated on the same Switchboard database. PMID:20428476
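
    A much-reduced sketch of envelope-peak-based syllable counting is shown below; the published algorithm adds subband correlation, pitch-confidence filtering, a magnifying window, and peak-rejection heuristics, so treat the parameters here as illustrative.

      import numpy as np
      from scipy.signal import butter, filtfilt, find_peaks, hilbert

      def rough_speech_rate(x, sr):
          """Estimate syllables per second from smoothed envelope peaks (toy version)."""
          envelope = np.abs(hilbert(x))
          b, a = butter(2, 8.0 / (sr / 2))      # keep slow (<8 Hz) modulations
          smooth = filtfilt(b, a, envelope)
          peaks, _ = find_peaks(smooth, height=0.2 * smooth.max(),
                                distance=int(0.1 * sr))  # >=100 ms apart
          return len(peaks) / (len(x) / sr)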

  18. Differences in acoustic features of vocalizations produced by killer whales cross-socialized with bottlenose dolphins.

    PubMed

    Musser, Whitney B; Bowles, Ann E; Grebner, Dawn M; Crance, Jessica L

    2014-10-01

    Limited previous evidence suggests that killer whales (Orcinus orca) are capable of vocal production learning. However, vocal contextual learning has not been studied, nor the factors promoting learning. Vocalizations were collected from three killer whales with a history of exposure to bottlenose dolphins (Tursiops truncatus) and compared with data from seven killer whales held with conspecifics and nine bottlenose dolphins. The three whales' repertoires were distinguishable by a higher proportion of click trains and whistles. Time-domain features of click trains were intermediate between those of whales held with conspecifics and dolphins. These differences provided evidence for contextual learning. One killer whale spontaneously learned to produce artificial chirps taught to dolphins; acoustic features fell within the range of inter-individual differences among the dolphins. This whale also produced whistles similar to a stereotyped whistle produced by one dolphin. Thus, results provide further support for vocal production learning and show that killer whales are capable of contextual learning. That killer whales produce similar repertoires when associated with another species suggests substantial vocal plasticity and motivation for vocal conformity with social associates. PMID:25324098

  20. The Auditory-Brainstem Response to Continuous, Non-repetitive Speech Is Modulated by the Speech Envelope and Reflects Speech Processing

    PubMed Central

    Reichenbach, Chagit S.; Braiman, Chananel; Schiff, Nicholas D.; Hudspeth, A. J.; Reichenbach, Tobias

    2016-01-01

    The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the ABR is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function. PMID:27303286

  1. Psychoacoustic cues to emotion in speech prosody and music.

    PubMed

    Coutinho, Eduardo; Dibben, Nicola

    2013-01-01

    There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music close to the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain. PMID:23057507
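
    Two of the seven psychoacoustic features, spectral centroid and spectral flux, can be computed directly from a short-time spectrum; the minimal sketch below is not the authors' feature extractor and its frame parameters are arbitrary.

      import numpy as np

      def centroid_and_flux(x, sr, n_fft=1024, hop=512):
          """Frame-wise spectral centroid (Hz) and spectral flux from an STFT."""
          frames = np.array([x[i:i + n_fft] * np.hanning(n_fft)
                             for i in range(0, len(x) - n_fft, hop)])
          mags = np.abs(np.fft.rfft(frames, axis=1))
          freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
          centroid = (mags * freqs).sum(axis=1) / (mags.sum(axis=1) + 1e-10)
          flux = np.sqrt((np.diff(mags, axis=0) ** 2).sum(axis=1))
          return centroid, flux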

  3. Acoustic features contributing to the individuality of wild agile gibbon (Hylobates agilis agilis) songs.

    PubMed

    Oyakawa, Chisako; Koda, Hiroki; Sugiura, Hideki

    2007-07-01

    We examined acoustic individuality in wild agile gibbons (Hylobates agilis agilis) and determined the acoustic variables that contribute to individual discrimination using multivariate analyses. We recorded 125 female-specific songs (great calls) from six groups in west Sumatra and measured 58 acoustic variables for each great call. We performed principal component analysis to summarize the 58 variables into six acoustic principal components (PCs). Generally, each PC corresponded to a part of the great call. Significant individual differences were found across the six individual gibbons in each of the six PCs. Moreover, strong acoustic individuality was found in the introductory and climax parts of the great call. In contrast, the terminal part contributed little to individual identification. Discriminant analysis showed that these PCs contributed to individual discrimination with high repeatability. Although we cannot conclude that agile gibbons use these acoustic components for individual discrimination, they are potential candidates for individual recognition.
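
    The statistical pipeline described (PCA to summarize the 58 variables, then discriminant analysis for individual classification) maps directly onto standard tooling; the sketch below uses random stand-in data rather than the recorded great calls.

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.model_selection import cross_val_score

      # Stand-in data: 125 great calls x 58 acoustic variables, 6 individuals.
      rng = np.random.default_rng(0)
      X = rng.normal(size=(125, 58))
      y = rng.integers(0, 6, size=125)

      pcs = PCA(n_components=6).fit_transform(X)            # six acoustic PCs
      scores = cross_val_score(LinearDiscriminantAnalysis(), pcs, y, cv=5)
      print(scores.mean())                                  # discrimination accuracy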

  4. Experimental investigation of the effects of the acoustical conditions in a simulated classroom on speech recognition and learning in children a

    PubMed Central

    Valente, Daniel L.; Plevinsky, Hallie M.; Franco, John M.; Heinrichs-Graham, Elizabeth C.; Lewis, Dawna E.

    2012-01-01

    The potential effects of acoustical environment on speech understanding are especially important as children enter school where students’ ability to hear and understand complex verbal information is critical to learning. However, this ability is compromised because of widely varied and unfavorable classroom acoustics. The extent to which unfavorable classroom acoustics affect children’s performance on longer learning tasks is largely unknown as most research has focused on testing children using words, syllables, or sentences as stimuli. In the current study, a simulated classroom environment was used to measure comprehension performance of two classroom learning activities: a discussion and lecture. Comprehension performance was measured for groups of elementary-aged students in one of four environments with varied reverberation times and background noise levels. The reverberation time was either 0.6 or 1.5 s, and the signal-to-noise level was either +10 or +7 dB. Performance is compared to adult subjects as well as to sentence-recognition in the same condition. Significant differences were seen in comprehension scores as a function of age and condition; both increasing background noise and reverberation degraded performance in comprehension tasks compared to minimal differences in measures of sentence-recognition. PMID:22280587

  5. Fatigue features study on the crankshaft material of 42CrMo steel using acoustic emission

    NASA Astrophysics Data System (ADS)

    Shi, Yue; Dong, Lihong; Wang, Haidou; Li, Guolu; Liu, Shenshui

    2016-09-01

    Crankshafts are important engine components and an important application of remanufacturing because of their high added value. However, research on the fatigue failure of remanufactured crankshafts is still at an early stage, so monitoring and investigating the fatigue failure of remanufactured crankshafts is crucial. In this paper, acoustic emission (AE) technology and machine vision are used to monitor the four-point bending fatigue of 42CrMo, the crankshaft material. The specimens are divided into two categories, with and without pre-existing cracks, which simulate the crankshaft and the crankshaft blank, respectively. Parameter-based AE techniques, wavelet transform (WT) and SEM analysis are combined to identify the stages of fatigue failure; identifying these stages is the basis for applying AE technology to remanufactured crankshafts. The experimental results show that fatigue cracks propagate by transgranular, brittle fracture, and that the differences between the two specimen categories depend mainly on the form of crack initiation. Various AE signals are detected by the parameter analysis method, and wavelet threshold denoising and the WT are combined to extract the spectral features of the AE signals at the different stages of fatigue failure.
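
    Wavelet threshold denoising of an AE waveform, of the kind combined with the WT here, can be sketched with PyWavelets; the wavelet, level, and threshold rule below are illustrative choices, not those of the paper.

      import numpy as np
      import pywt

      def wavelet_denoise(signal, wavelet="db4", level=4):
          """Soft-threshold detail coefficients using a universal threshold."""
          coeffs = pywt.wavedec(signal, wavelet, level=level)
          sigma = np.median(np.abs(coeffs[-1])) / 0.6745      # noise estimate
          thr = sigma * np.sqrt(2 * np.log(len(signal)))      # universal threshold
          coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
          return pywt.waverec(coeffs, wavelet)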

  6. A gearbox fault diagnosis scheme based on near-field acoustic holography and spatial distribution features of sound field

    NASA Astrophysics Data System (ADS)

    Lu, Wenbo; Jiang, Weikang; Yuan, Guoqing; Yan, Li

    2013-05-01

    Vibration signal analysis is the main technique in machine condition monitoring or fault diagnosis, whereas in some cases vibration-based diagnosis is limited by its need for contact measurement. Acoustic-based diagnosis (ABD) with non-contact measurement has received little attention, although the sound field may contain abundant information related to the fault pattern. A new scheme of ABD for gearboxes based on near-field acoustic holography (NAH) and spatial distribution features of the sound field is presented in this paper. It focuses on applying the distribution information of the sound field to gearbox fault diagnosis. A two-stage industrial helical gearbox is experimentally studied in a semi-anechoic chamber and a lab workshop, respectively. Firstly, multi-class faults (mild pitting, moderate pitting, severe pitting and tooth breakage) are simulated, respectively. Secondly, sound fields and corresponding acoustic images in different gearbox running conditions are obtained by fast Fourier transform (FFT) based NAH. Thirdly, by introducing texture analysis to fault diagnosis, spatial distribution features are extracted from the acoustic images to capture the fault patterns underlying the sound field. Finally, the features are fed into a multi-class support vector machine for fault pattern identification. The feasibility and effectiveness of the proposed scheme are demonstrated by good experimental results and by comparison with a traditional ABD method. Even with strong noise interference, spatial distribution features of the sound field can reliably reveal the fault patterns of the gearbox, and thus satisfactory accuracy can be obtained. The combination of histogram features and gray level gradient co-occurrence matrix features is suggested for good diagnosis accuracy and low time cost.
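
    The texture-analysis idea (co-occurrence statistics computed from acoustic images and fed to a multi-class SVM) can be sketched as follows; scikit-image and scikit-learn are assumptions rather than the authors' toolchain, and the images are random stand-ins.

      import numpy as np
      from skimage.feature import graycomatrix, graycoprops  # 'greycomatrix' in older releases
      from sklearn.svm import SVC

      def glcm_features(acoustic_image):
          """Contrast/homogeneity/energy/correlation of a quantized acoustic image."""
          img = np.uint8(255 * (acoustic_image - acoustic_image.min())
                         / (np.ptp(acoustic_image) + 1e-10))
          glcm = graycomatrix(img, distances=[1], angles=[0, np.pi / 2],
                              levels=256, symmetric=True, normed=True)
          props = ["contrast", "homogeneity", "energy", "correlation"]
          return np.hstack([graycoprops(glcm, p).ravel() for p in props])

      # Stand-in acoustic images and four fault-class labels.
      rng = np.random.default_rng(0)
      images = rng.random((40, 64, 64))
      labels = rng.integers(0, 4, size=40)
      X = np.array([glcm_features(im) for im in images])
      clf = SVC(kernel="rbf").fit(X, labels)  # multi-class SVM (one-vs-one)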

  7. Perception of Speech Features by French-Speaking Children with Cochlear Implants

    ERIC Educational Resources Information Center

    Bouton, Sophie; Serniclaes, Willy; Bertoncini, Josiane; Cole, Pascale

    2012-01-01

    Purpose: The present study investigates the perception of phonological features in French-speaking children with cochlear implants (CIs) compared with normal-hearing (NH) children matched for listening age. Method: Scores for discrimination and identification of minimal pairs for all features defining consonants (e.g., place, voicing, manner,…

  8. Smartphone Application for the Analysis of Prosodic Features in Running Speech with a Focus on Bipolar Disorders: System Performance Evaluation and Case Study.

    PubMed

    Guidi, Andrea; Salvi, Sergio; Ottaviano, Manuel; Gentili, Claudio; Bertschy, Gilles; de Rossi, Danilo; Scilingo, Enzo Pasquale; Vanello, Nicola

    2015-11-06

    Bipolar disorder is one of the most common mood disorders, characterized by large and invalidating mood swings. Several projects focus on the development of decision support systems that monitor and advise patients, as well as clinicians. Voice monitoring and speech signal analysis can be exploited to reach this goal. In this study, an Android application was designed for analyzing running speech using a smartphone device. The application can record audio samples and estimate the speech fundamental frequency, F0, and its changes. F0-related features are estimated locally on the smartphone, with some advantages with respect to remote processing approaches in terms of privacy protection and reduced upload costs. The raw features can be sent to a central server and further processed. The quality of the audio recordings, algorithm reliability and the performance of the overall system were evaluated in terms of voiced segment detection and feature estimation. The results demonstrate that the mean F0 of each voiced segment can be reliably estimated, thus describing prosodic features across the speech sample. In contrast, features related to F0 variability within each voiced segment performed poorly. A case study performed on a bipolar patient is presented.
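
    For illustration, a bare-bones autocorrelation estimator of F0 for a single voiced frame is sketched below; it is not the application's algorithm, and the search range is an assumption.

      import numpy as np

      def frame_f0(frame, sr, fmin=70.0, fmax=400.0):
          """Estimate F0 (Hz) of one voiced frame from its autocorrelation peak."""
          frame = frame - frame.mean()
          ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
          lo, hi = int(sr / fmax), int(sr / fmin)
          lag = lo + int(np.argmax(ac[lo:hi]))
          return sr / lag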

  10. Differential Diagnosis of Severe Speech Disorders Using Speech Gestures

    ERIC Educational Resources Information Center

    Bahr, Ruth Huntley

    2005-01-01

    The differentiation of childhood apraxia of speech from severe phonological disorder is a common clinical problem. This article reports on an attempt to describe speech errors in children with childhood apraxia of speech on the basis of gesture use and acoustic analyses of articulatory gestures. The focus was on the movement of articulators and…

  11. Measurement of acoustical characteristics of mosques in Saudi Arabia.

    PubMed

    Abdou, Adel A

    2003-03-01

    The study of mosque acoustics, with regard to acoustical characteristics, sound quality for speech intelligibility, and other applicable acoustic criteria, has been largely neglected. This study provides background on why mosques are designed as they are and how mosque design is influenced by worship considerations. The acoustical characteristics of typically constructed contemporary mosques in Saudi Arabia were investigated employing a well-known impulse response technique. Extensive field measurements were taken in 21 representative mosques of different sizes and architectural features in order to characterize their acoustical quality and to identify the impact of air conditioning, ceiling fans, and sound reinforcement systems on their acoustics. Objective room-acoustic indicators such as reverberation time (RT) and clarity (C50) were measured. Background noise (BN) was assessed with and without the operation of air conditioning and fans. The speech transmission index (STI) was also evaluated with and without the operation of existing sound reinforcement systems. The existence of acoustical deficiencies was confirmed and quantified. The study, in addition to describing mosque acoustics, compares design goals to results obtained in practice and suggests acoustical target values for mosque design. The results show that the acoustical quality in the investigated mosques deviates from optimum conditions when unoccupied, but is much better in the occupied condition.
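
    For reference, the clarity index C50 reported here is the early-to-late energy ratio of the room impulse response p(t), with a 50 ms split point (the standard room-acoustics definition):

      C_{50} = 10 \log_{10} \frac{\int_{0}^{50\,\mathrm{ms}} p^{2}(t)\, dt}
                                  {\int_{50\,\mathrm{ms}}^{\infty} p^{2}(t)\, dt} \quad [\mathrm{dB}]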

  12. The Critical Shortage of Speech-Language Pathologists in the Public School Setting: Features of the Work Environment that Affect Recruitment and Retention

    ERIC Educational Resources Information Center

    Edgar, Debra L.; Rosa-Lugo, Linda I.

    2007-01-01

    Purpose: The primary focus of this study was to elicit the perspectives of speech-language pathologists (SLPs) regarding features of the work environment that contribute to and/or hinder recruitment and retention in the public school setting. Method: A questionnaire was distributed to SLPs employed in 10 school districts in Central Florida…

  13. Effect of Acoustic Spectrographic Instruction on Production of English /i/ and /I/ by Spanish Pre-Service English Teachers

    ERIC Educational Resources Information Center

    Quintana-Lara, Marcela

    2014-01-01

    This study investigates the effects of Acoustic Spectrographic Instruction on the production of the English phonological contrast /i/ and / I /. Acoustic Spectrographic Instruction is based on the assumption that physical representations of speech sounds and spectrography allow learners to objectively see and modify those non-accurate features in…

  14. Maternal depression and the learning-promoting effects of infant-directed speech: Roles of maternal sensitivity, depression diagnosis, and speech acoustic cues.

    PubMed

    Kaplan, Peter S; Danko, Christina M; Cejka, Anna M; Everhart, Kevin D

    2015-11-01

    The hypothesis that the associative learning-promoting effects of infant-directed speech (IDS) depend on infants' social experience was tested in a conditioned-attention paradigm with a cumulative sample of 4- to 14-month-old infants. Following six forward pairings of a brief IDS segment and a photographic slide of a smiling female face, infants of clinically depressed mothers exhibited evidence of having acquired significantly weaker voice-face associations than infants of non-depressed mothers. Regression analyses revealed that maternal depression was significantly related to infant learning even after demographic correlates of depression, antidepressant medication use, and extent of pitch modulation in maternal IDS had been taken into account. However, after maternal depression had been accounted for, maternal emotional availability, coded by blind raters from separate play interactions, accounted for significant further increments in the proportion of variance accounted for in infant learning scores. Both maternal depression and maternal insensitivity negatively, and additively, predicted poor learning.

  15. Room Acoustics

    NASA Astrophysics Data System (ADS)

    Kuttruff, Heinrich; Mommertz, Eckard

    The traditional task of room acoustics is to create or formulate conditions which ensure the best possible propagation of sound in a room from a sound source to a listener. The objects of room acoustics are thus in particular assembly halls of all kinds, such as auditoria and lecture halls, conference rooms, theaters, concert halls or churches. It has to be pointed out at this point that these conditions essentially depend on whether speech or music is to be transmitted; in the first case, the criterion for transmission quality is good speech intelligibility, whereas in the second case the success of room-acoustical efforts depends on other factors that cannot be quantified as easily, not least the hearing habits of the listeners. In any case, absolutely "good acoustics" of a room do not exist.

  16. Is Birdsong More Like Speech or Music?

    PubMed

    Shannon, Robert V

    2016-04-01

    Music and speech share many acoustic cues but not all are equally important. For example, harmonic pitch is essential for music but not for speech. When birds communicate is their song more like speech or music? A new study contrasting pitch and spectral patterns shows that birds perceive their song more like humans perceive speech. PMID:26944220

  17. ON THE NATURE OF SPEECH SCIENCE.

    ERIC Educational Resources Information Center

    PETERSON, GORDON E.

    IN THIS ARTICLE THE NATURE OF THE DISCIPLINE OF SPEECH SCIENCE IS CONSIDERED AND THE VARIOUS BASIC AND APPLIED AREAS OF THE DISCIPLINE ARE DISCUSSED. THE BASIC AREAS ENCOMPASS THE VARIOUS PROCESSES OF THE PHYSIOLOGY OF SPEECH PRODUCTION, THE ACOUSTICAL CHARACTERISTICS OF SPEECH, INCLUDING THE SPEECH WAVE TYPES AND THE INFORMATION-BEARING ACOUSTIC…

  18. Temporal lobe regions engaged during normal speech comprehension.

    PubMed

    Crinion, Jennifer T; Lambon-Ralph, Matthew A; Warburton, Elizabeth A; Howard, David; Wise, Richard J S

    2003-05-01

    Processing of speech is obligatory. Thus, during normal speech comprehension, the listener is aware of the overall meaning of the speaker's utterance without the need to direct attention to individual linguistic and paralinguistic (intonational, prosodic, etc.) features contained within the speech signal. However, most functional neuroimaging studies of speech perception have used metalinguistic tasks that required the subjects to attend to specific features of the stimuli. Such tasks have demanded a forced-choice decision and a motor response from the subjects, which will engage frontal systems and may include unpredictable top-down modulation of the signals observed in one or more of the temporal lobe neural systems engaged during speech perception. This study contrasted the implicit comprehension of simple narrative speech with listening to reversed versions of the narratives: the latter are as acoustically complex as speech but are unintelligible in terms of both linguistic and paralinguistic information. The result demonstrated that normal comprehension, free of task demands that do not form part of everyday discourse, engages regions distributed between the two temporal lobes, more widely on the left. In particular, comprehension is dependent on anterolateral and ventral left temporal regions, as suggested by observations on patients with semantic dementia, as well as posterior regions described in studies on aphasic stroke patients. The only frontal contribution was confined to the ventrolateral left prefrontal cortex, compatible with observations that comprehension of simple speech is preserved in patients with left posterior frontal infarction. PMID:12690058

  19. Low-Frequency Cortical Entrainment to Speech Reflects Phoneme-Level Processing.

    PubMed

    Di Liberto, Giovanni M; O'Sullivan, James A; Lalor, Edmund C

    2015-10-01

    The human ability to understand speech is underpinned by a hierarchical auditory system whose successive stages process increasingly complex attributes of the acoustic input. It has been suggested that to produce categorical speech perception, this system must elicit consistent neural responses to speech tokens (e.g., phonemes) despite variations in their acoustics. Here, using electroencephalography (EEG), we provide evidence for this categorical phoneme-level speech processing by showing that the relationship between continuous speech and neural activity is best described when that speech is represented using both low-level spectrotemporal information and categorical labeling of phonetic features. Furthermore, the mapping between phonemes and EEG becomes more discriminative for phonetic features at longer latencies, in line with what one might expect from a hierarchical system. Importantly, these effects are not seen for time-reversed speech. These findings may form the basis for future research on natural language processing in specific cohorts of interest and for broader insights into how brains transform acoustic input into meaning. PMID:26412129

  20. COSMIC TRANSPARENCY: A TEST WITH THE BARYON ACOUSTIC FEATURE AND TYPE Ia SUPERNOVAE

    SciTech Connect

    More, Surhud; Hogg, David W.; Bovy, Jo

    2009-05-10

    Conservation of the phase-space density of photons plus Lorentz invariance requires that the cosmological luminosity distance be larger than the angular diameter distance by a factor of (1 + z)^2, where z is the redshift. Because this is a fundamental symmetry, this prediction, known sometimes as the 'Etherington relation' or the 'Tolman test', is independent of the world model, or even the assumptions of homogeneity and isotropy. It depends, however, on Lorentz invariance and transparency. Transparency can be affected by intergalactic dust or interactions between photons and the dark sector. Baryon acoustic feature (BAF) and type Ia supernovae (SNeIa) measures of the expansion history are differently sensitive to the angular diameter and luminosity distances and can therefore be used in conjunction to limit cosmic transparency. At the present day, the comparison only limits the change Δτ in the optical depth from redshift 0.20 to 0.35 at visible wavelengths to Δτ < 0.13 at 95% confidence. In a model with a constant comoving number density n of scatterers of constant proper cross section σ, this limit implies nσ < 2 × 10^-4 h Mpc^-1. These limits depend weakly on the cosmological world model. Assuming a concordance world model, the best-fit value of Δτ to current data is negative at the 2σ level. This could signal interesting new physics or could be the result of unidentified systematics in the BAF/SNeIa measurements. Within the next few years, the limits on transparency could extend to redshifts z ≈ 2.5 and improve to nσ < 1.1 × 10^-5 h Mpc^-1. Cosmic variance will eventually limit the sensitivity of any test using the BAF at the nσ ≈ 4 × 10^-7 h Mpc^-1 level. Comparison with other measures of the transparency is provided; no other measure in the visible is as free of astrophysical assumptions.
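
    The symmetry being tested, and the way opacity enters it, can be written compactly; the second relation is a common parameterization in which an optical depth tau(z) dims sources and inflates the inferred luminosity distance (a paraphrase, not a quotation of the paper's notation):

      D_L = (1 + z)^{2} D_A, \qquad
      D_{L,\mathrm{obs}} = (1 + z)^{2} D_A \, e^{\tau(z)/2}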

  1. Two-microphone spatial filtering provides speech reception benefits for cochlear implant users in difficult acoustic environments

    PubMed Central

    Goldsworthy, Raymond L.; Delhorne, Lorraine A.; Desloge, Joseph G.; Braida, Louis D.

    2014-01-01

    This article introduces and provides an assessment of a spatial-filtering algorithm based on two closely-spaced (∼1 cm) microphones in a behind-the-ear shell. The evaluated spatial-filtering algorithm used fast (∼10 ms) temporal-spectral analysis to determine the location of incoming sounds and to enhance sounds arriving from straight ahead of the listener. Speech reception thresholds (SRTs) were measured for eight cochlear implant (CI) users using consonant and vowel materials under three processing conditions: An omni-directional response, a dipole-directional response, and the spatial-filtering algorithm. The background noise condition used three simultaneous time-reversed speech signals as interferers located at 90°, 180°, and 270°. Results indicated that the spatial-filtering algorithm can provide speech reception benefits of 5.8 to 10.7 dB SRT compared to an omni-directional response in a reverberant room with multiple noise sources. Given the observed SRT benefits, coupled with an efficient design, the proposed algorithm is promising as a CI noise-reduction solution. PMID:25096120
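
    The evaluated algorithm itself is not reproduced in the abstract; the sketch below shows a generic version of the underlying idea, assuming two closely spaced microphone signals: estimate the inter-microphone delay per time-frequency bin and attenuate bins whose delay indicates an off-axis source. The spacing, STFT size, and threshold are assumptions.

        # Generic two-microphone phase-difference masking sketch (not the
        # evaluated algorithm): keep time-frequency bins whose inter-microphone
        # delay is near zero (sound from straight ahead), attenuate the rest.
        import numpy as np
        from scipy.signal import stft, istft

        def front_mask_filter(x_left, x_right, fs, d=0.01, c=343.0, frac=0.25):
            f, t, L = stft(x_left, fs, nperseg=256)
            _, _, R = stft(x_right, fs, nperseg=256)
            max_delay = d / c                          # ~29 microseconds for 1 cm spacing
            phase_diff = np.angle(L * np.conj(R))
            # Convert phase difference to delay, avoiding division by zero at DC.
            delay = phase_diff / (2 * np.pi * np.maximum(f[:, None], 1e-9))
            mask = (np.abs(delay) < frac * max_delay).astype(float)
            _, y = istft(0.5 * (L + R) * mask, fs, nperseg=256)
            return y

        fs = 16000
        rng = np.random.default_rng(1)
        left = rng.standard_normal(fs)                 # stand-in for one BTE microphone
        right = left + 0.1 * rng.standard_normal(fs)   # frontal source: near-zero delay
        out = front_mask_filter(left, right, fs)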

  2. Effects of Semantic Context and Feedback on Perceptual Learning of Speech Processed through an Acoustic Simulation of a Cochlear Implant

    ERIC Educational Resources Information Center

    Loebach, Jeremy L.; Pisoni, David B.; Svirsky, Mario A.

    2010-01-01

    The effect of feedback and materials on perceptual learning was examined in listeners with normal hearing who were exposed to cochlear implant simulations. Generalization was most robust when feedback paired the spectrally degraded sentences with their written transcriptions, promoting mapping between the degraded signal and its acoustic-phonetic…

  3. Investigations of High Pressure Acoustic Waves in Resonators with Seal-Like Features

    NASA Technical Reports Server (NTRS)

    Daniels, Christopher C.; Steinetz, Bruce M.; Finkbeiner, Joshua R.; Li, Xiao-Fan; Raman, Ganesh

    2004-01-01

    1) Standing waves with maximum pressures of 188 kPa have been produced in resonators containing ambient pressure air; 2) Addition of structures inside the resonator shifts the fundamental frequency and decreases the amplitude of the generated pressure waves; 3) Addition of holes to the resonator does reduce the magnitude of the acoustic waves produced, but their addition does not prohibit the generation of large magnitude non-linear standing waves; 4) The feasibility of reducing leakage using non-linear acoustics has been confirmed.

  4. Tutorial on architectural acoustics

    NASA Astrophysics Data System (ADS)

    Shaw, Neil; Talaske, Rick; Bistafa, Sylvio

    2002-11-01

    This tutorial is intended to provide an overview of current knowledge and practice in architectural acoustics. Topics covered will include basic concepts and history, acoustics of small rooms (small rooms for speech such as classrooms and meeting rooms, music studios, small critical listening spaces such as home theatres) and the acoustics of large rooms (larger assembly halls, auditoria, and performance halls).

  5. The neural basis of sublexical speech and corresponding nonspeech processing: a combined EEG-MEG study.

    PubMed

    Kuuluvainen, Soila; Nevalainen, Päivi; Sorokin, Alexander; Mittag, Maria; Partanen, Eino; Putkinen, Vesa; Seppänen, Miia; Kähkönen, Seppo; Kujala, Teija

    2014-03-01

    We addressed the neural organization of speech versus nonspeech sound processing by investigating preattentive cortical auditory processing of changes in five features of a consonant-vowel syllable (consonant, vowel, sound duration, frequency, and intensity) and their acoustically matched nonspeech counterparts in a simultaneous EEG-MEG recording of mismatch negativity (MMN/MMNm). Overall, speech-sound processing was enhanced compared to nonspeech sound processing. This effect was strongest in the left hemisphere for changes that affect word meaning (consonant, vowel, and vowel duration), and also held for the vowel identity change in the right hemisphere. Furthermore, in the right hemisphere, speech-sound frequency and intensity changes were processed faster than their nonspeech counterparts, and there was a trend toward speech enhancement in frequency processing. In summary, the results support the proposed existence of long-term memory traces for speech sounds in the auditory cortices, and indicate at least partly distinct neural substrates for speech and nonspeech sound processing.

  6. Comparison of the Speech Syntactic Features between Hearing-Impaired and Normal Hearing Children

    PubMed Central

    PahlavanNezhad, Mohammad Reza; Tayarani Niknezhad, Hamid

    2014-01-01

    Introduction: The present study describes and analyzes the syntactic features of children with severe hearing loss who had access to hearing aids, compared with children with normal hearing, with the groups further divided by gender. Materials and Methods: Eight children with severe hearing impairment who used hearing aids and eight hearing children matched for age and gender were selected using convenience (availability) sampling, based on the principles of the auditory-verbal approach. The hearing children had a mean age of 5.45 ± 1.9 years and the hearing-impaired children a mean age of 5.43 ± 2.17 years; rehabilitation of the latter had begun before 18 months of age. The assessment instrument of the study was the language development test TOLDP-3. The syntactic skills of the hearing-impaired children were analyzed and compared with those of hearing children of the same age, by gender. Results: There was a significant difference between the syntactic scores of the hearing-impaired children and those of hearing children of the same age on the “sentence imitation” (t = −2.90, P < 0.05) and “grammatical completion” (t = −3.39, P < 0.05) subtests, with no significant difference on the “grammatical understanding” subtest (t = 1.67, P > 0.05). Moreover, there was no significant difference between male and female children with hearing impairment in syntactic skill development. Conclusion: With early diagnosis and timely rehabilitative intervention, children with hearing loss can perform similarly to children of their age with normal hearing in some syntactic areas. Furthermore, gender had no effect on the development of syntactic skills in children with hearing loss. PMID:24744994

  7. Advances in speech processing

    NASA Astrophysics Data System (ADS)

    Ince, A. Nejat

    1992-10-01

    The field of speech processing is undergoing a rapid growth in terms of both performance and applications and this is fueled by the advances being made in the areas of microelectronics, computation, and algorithm design. The use of voice for civil and military communications is discussed considering advantages and disadvantages including the effects of environmental factors such as acoustic and electrical noise and interference and propagation. The structure of the existing NATO communications network and the evolving Integrated Services Digital Network (ISDN) concept are briefly reviewed to show how they meet the present and future requirements. The paper then deals with the fundamental subject of speech coding and compression. Recent advances in techniques and algorithms for speech coding now permit high quality voice reproduction at remarkably low bit rates. The subject of speech synthesis is next treated where the principle objective is to produce natural quality synthetic speech from unrestricted text input. Speech recognition where the ultimate objective is to produce a machine which would understand conversational speech with unrestricted vocabulary, from essentially any talker, is discussed. Algorithms for speech recognition can be characterized broadly as pattern recognition approaches and acoustic phonetic approaches. To date, the greatest degree of success in speech recognition has been obtained using pattern recognition paradigms. It is for this reason that the paper is concerned primarily with this technique.

  8. Isolating Neural Indices of Continuous Speech Processing at the Phonetic Level.

    PubMed

    Di Liberto, Giovanni M; Lalor, Edmund C

    2016-01-01

    The human ability to understand speech across an enormous range of listening conditions is underpinned by a hierarchical auditory processing system whose successive stages process increasingly complex attributes of the acoustic input. In order to produce a categorical perception of words and phonemes, it has been suggested that, while earlier areas of the auditory system undoubtedly respond to acoustic differences in speech tokens, later areas must exhibit consistent neural responses to those tokens. Neural indices of such hierarchical processing in the context of continuous speech have been identified using low-frequency scalp-recorded electroencephalography (EEG) data. The relationship between continuous speech and its associated neural responses has been shown to be best described when that speech is represented using both its low-level spectrotemporal information and also the categorical labelling of its phonetic features (Di Liberto et al., Curr Biol 25(19):2457-2465, 2015). While the phonetic features have been shown to carry extra information not captured by the spectrotemporal representation of speech, the causes of this EEG activity remain unclear. This study aims to demonstrate a framework for examining speech-specific processing and for disentangling high-level neural activity related to intelligibility from low-level activity in response to spectrotemporal fluctuations of speech. Preliminary results suggest that a neural measure of processing at the phonetic level can be isolated. PMID:27080674

  10. Acoustic constituents of prosodic typology

    NASA Astrophysics Data System (ADS)

    Komatsu, Masahiko

    Different languages sound different, and a considerable part of this difference derives from typological differences in prosody. Although such differences are often described in terms of lexical accent types (stress accent, pitch accent, and tone; e.g., English, Japanese, and Chinese, respectively) and rhythm types (stress-, syllable-, and mora-timed rhythm; e.g., English, Spanish, and Japanese, respectively), it is unclear whether these types are determined by acoustic properties. The thesis intends to provide a potential basis for the description of prosody in terms of acoustics. Through several experimental-phonetic studies, it argues for the hypothesis that the source component of the source-filter model (an acoustic feature) approximately corresponds to prosody (a linguistic feature). The study consists of four parts. (1) Preliminary experiment: Perceptual language identification tests were performed using English and Japanese speech samples whose frequency spectral information (i.e., the non-source component) was heavily reduced. The results indicated that humans can discriminate languages from such signals. (2) Discussion of the linguistic information that the source component contains: This part constitutes the foundation of the argument of the thesis. Perception tests of consonants using the source signal indicated that the source component carries information on broad categories of phonemes that contributes to the creation of rhythm. (3) Acoustic analysis: Speech samples of Chinese, English, Japanese, and Spanish, which differ in prosodic type, were analyzed. These languages showed differences in the acoustic characteristics of the source component. (4) Perceptual experiment: A language identification test for the above four languages was performed using the source signal with its acoustic features parameterized. It revealed that humans can discriminate prosodic types solely from the source features and that discrimination becomes easier as acoustic information increases. The
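
    One common way to approximate the source component discussed above is LPC inverse filtering: fit an all-pole (filter) model and treat the residual as the source estimate. This is a sketch of that idea only, not necessarily the thesis's procedure; the file name and LPC order are hypothetical.

        # LPC inverse-filtering sketch: estimate an all-pole vocal-tract filter
        # and take the residual as an approximation of the source component.
        import numpy as np
        import librosa
        from scipy.signal import lfilter

        y, sr = librosa.load("utterance.wav", sr=16000)   # hypothetical recording
        a = librosa.lpc(y, order=16)        # all-pole (filter) coefficients, a[0] = 1
        residual = lfilter(a, [1.0], y)     # inverse filter -> source estimate
        print("residual/total energy:", float(np.sum(residual**2) / np.sum(y**2)))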

  11. Acoustic and Perceptual Measurement of Expressive Prosody in High-Functioning Autism: Increased Pitch Range and What it Means to Listeners

    ERIC Educational Resources Information Center

    Nadig, Aparna; Shaw, Holly

    2012-01-01

    Are there consistent markers of atypical prosody in speakers with high functioning autism (HFA) compared to typically-developing speakers? We examined: (1) acoustic measurements of pitch range, mean pitch and speech rate in conversation, (2) perceptual ratings of conversation for these features and overall prosody, and (3) acoustic measurements of…

  12. Emotional recognition from the speech signal for a virtual education agent

    NASA Astrophysics Data System (ADS)

    Tickle, A.; Raghu, S.; Elshaw, M.

    2013-06-01

    This paper explores the extraction of features from the speech waveform to perform intelligent emotion recognition. A feature extraction tool (openSMILE) was used to obtain a baseline set of 998 acoustic features from a set of emotional speech recordings captured with a microphone. The initial feature set was reduced to the most important features so that emotion recognition could be performed with a supervised neural network. Given that the future use of virtual education agents lies in making the agents more interactive, developing agents with the capability to recognise and adapt to the emotional state of humans is an important step.
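
    A generic sketch of the reduce-then-classify step described above, assuming a precomputed utterance-by-feature matrix (for example, 998 openSMILE functionals exported to an array) and placeholder emotion labels; it is not the paper's exact pipeline.

        # Reduce-then-classify sketch: select the most informative acoustic
        # features, then train a small supervised neural network. The feature
        # matrix X and labels y below are random placeholders.
        import numpy as np
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.neural_network import MLPClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        X = rng.standard_normal((240, 998))      # stand-in for openSMILE features
        y = rng.integers(0, 6, size=240)         # stand-in for 6 emotion labels

        clf = make_pipeline(StandardScaler(),
                            SelectKBest(f_classif, k=50),
                            MLPClassifier(hidden_layer_sizes=(32,), max_iter=500))
        print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())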

  13. Machine Translation from Speech

    NASA Astrophysics Data System (ADS)

    Schwartz, Richard; Olive, Joseph; McCary, John; Christianson, Caitlin

    This chapter describes approaches for translation from speech. Translation from speech presents two new issues. First, of course, we must recognize the speech in the source language. Although speech recognition has improved considerably over the last three decades, it is still far from being a solved problem. In the best conditions, when the speech is of high quality and carefully enunciated, and on common topics (such as speech read by a trained news broadcaster), the word error rate is typically on the order of 5%. Humans can typically transcribe speech like this with less than 1% disagreement between annotators, so even this best number is still far worse than human performance. However, the task gets much harder when anything changes from this ideal condition. Higher error rates arise when the topic is unusual, when the speech is spontaneous rather than read, when the speakers have an accent or speak a dialect, or when there is acoustic degradation such as noise or reverberation. In these cases, the word error rate can increase significantly to 20%, 30%, or higher. Accordingly, most of this chapter discusses techniques for improving speech recognition accuracy, while one section discusses techniques for integrating speech recognition with translation.

  14. Segmenting Words from Natural Speech: Subsegmental Variation in Segmental Cues

    ERIC Educational Resources Information Center

    Rytting, C. Anton; Brew, Chris; Fosler-Lussier, Eric

    2010-01-01

    Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption, and preserves acoustic variation present in speech. We…

  15. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of the signal (i.e., speech) and the noise that exist in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will improve both crewmember usability and operational efficiency. It offers a fast rate of data/text entry, a small overall size, and light weight. In addition, this design frees the hands and eyes of a suited crewmember. The system components and steps include beamforming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select appropriate tasks in the face of constraints on computational resources.
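
    The flight system's multichannel method exploits signal/noise statistics and is more sophisticated than what fits here; as a stand-in for the beamforming step, the sketch below implements a plain frequency-domain delay-and-sum beamformer with an assumed four-microphone line array.

        # Simple delay-and-sum beamformer sketch for the beamforming/multichannel
        # noise-reduction step; the actual flight method is statistical and more
        # sophisticated. Geometry and signals are assumed placeholders.
        import numpy as np

        def delay_and_sum(x, mic_positions, look_dir, fs, c=343.0):
            """x: (n_mics, n_samples); mic_positions, look_dir: metres (3-vectors)."""
            delays = mic_positions @ look_dir / c        # seconds, per microphone
            n = x.shape[1]
            freqs = np.fft.rfftfreq(n, d=1.0 / fs)
            X = np.fft.rfft(x, axis=1)
            # Phase-shift each channel so the look direction adds coherently.
            X *= np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
            return np.fft.irfft(X.mean(axis=0), n=n)

        fs = 16000
        mics = np.array([[0, 0, 0], [0.03, 0, 0], [0.06, 0, 0], [0.09, 0, 0]])  # 4-mic line
        look = np.array([1.0, 0.0, 0.0])                 # steer along the array axis
        x = np.random.default_rng(2).standard_normal((4, fs))  # stand-in recordings
        y = delay_and_sum(x, mics, look, fs)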

  16. Common cues to emotion in the dynamic facial expressions of speech and song.

    PubMed

    Livingstone, Steven R; Thompson, William F; Wanderley, Marcelo M; Palmer, Caroline

    2015-01-01

    Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech-song differences. Vocalists' jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech-song. Vocalists' emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists' facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotion judgements for voice-only singing were poorly identified, yet were accurate for all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song, yet were equivalent in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, yet highlight differences in perception and acoustic-motor production.

  17. Detection of prosodics by using a speech recognition system

    NASA Astrophysics Data System (ADS)

    Hupp, N. A.

    1991-07-01

    The problem was to determine the ability of a speech recognizer to extract prosodic speech features, such as pitch and stress, and to examine these features for application to future voice recognition systems. The Speech Systems Incorporated (SSI) speech recognizer demonstrated that it could detect prosodic features and that these features do indicate the word and/or syllable that is stressed by the speaker. The research examined the effect of prosodics, such as pitch, amplitude, and duration, on word and syllable stress by using the SSI. Subjects read phrases and sentences, using a given intonation and stress. The three sections of the experiment compared questions and answers, words stressed within a sentence, and noun/verb pairs, such as object and subject. The results were analyzed both on the syllable level and the word level. In all cases, there was a significant increase in pitch, amplitude, and duration when comparing stressed words and syllables to unstressed words and syllables. When comparing unstressed words only, it was also noted that the first word in a sentence has an increase in pitch, amplitude, and duration. Thresholds could be set in recognition systems for each of these parameters. Current speech recognizers do not use acoustic data above the word level. This research shows that we have the capability of developing better speech systems by incorporating prosodics with new linguistic software.
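
    A sketch of measuring the three cues compared above (pitch, amplitude, duration) for a single word segment. The word boundaries are assumed to come from a recognizer's alignment; the file name, times, and pitch range are hypothetical.

        # Word-level prosodic cues: duration, mean pitch, mean RMS amplitude.
        # File name and word boundaries are hypothetical placeholders.
        import numpy as np
        import librosa

        y, sr = librosa.load("sentence.wav", sr=16000)   # hypothetical recording
        start, end = 0.42, 0.81                          # hypothetical word boundaries (s)
        seg = y[int(start * sr):int(end * sr)]

        f0, voiced, _ = librosa.pyin(seg, fmin=75, fmax=400, sr=sr)
        features = {
            "duration_s": end - start,
            "mean_f0_hz": float(np.nanmean(f0)),
            "mean_rms": float(librosa.feature.rms(y=seg).mean()),
        }
        print(features)   # compare against thresholds to flag a stressed word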

  18. Processing changes when listening to foreign-accented speech

    PubMed Central

    Romero-Rivas, Carlos; Martin, Clara D.; Costa, Albert

    2015-01-01

    This study investigates the mechanisms responsible for fast changes in processing foreign-accented speech. Event Related brain Potentials (ERPs) were obtained while native speakers of Spanish listened to native and foreign-accented speakers of Spanish. We observed a less positive P200 component for foreign-accented speech relative to native speech comprehension. This suggests that the extraction of spectral information and other important acoustic features was hampered during foreign-accented speech comprehension. However, the amplitude of the N400 component for foreign-accented speech comprehension decreased across the experiment, suggesting the use of a higher level, lexical mechanism. Furthermore, during native speech comprehension, semantic violations in the critical words elicited an N400 effect followed by a late positivity. During foreign-accented speech comprehension, semantic violations only elicited an N400 effect. Overall, our results suggest that, despite a lack of improvement in phonetic discrimination, native listeners experience changes at lexical-semantic levels of processing after brief exposure to foreign-accented speech. Moreover, these results suggest that lexical access, semantic integration and linguistic re-analysis processes are permeable to external factors, such as the accent of the speaker. PMID:25859209

  19. Laminar cortical dynamics of conscious speech perception: neural model of phonemic restoration using subsequent context in noise.

    PubMed

    Grossberg, Stephen; Kazerounian, Sohrob

    2011-07-01

    How are laminar circuits of neocortex organized to generate conscious speech and language percepts? How does the brain restore information that is occluded by noise, or absent from an acoustic signal, by integrating contextual information over many milliseconds to disambiguate noise-occluded acoustical signals? How are speech and language heard in the correct temporal order, despite the influence of contexts that may occur many milliseconds before or after each perceived word? A neural model describes key mechanisms in forming conscious speech percepts, and quantitatively simulates a critical example of contextual disambiguation of speech and language; namely, phonemic restoration. Here, a phoneme deleted from a speech stream is perceptually restored when it is replaced by broadband noise, even when the disambiguating context occurs after the phoneme was presented. The model describes how the laminar circuits within a hierarchy of cortical processing stages may interact to generate a conscious speech percept that is embodied by a resonant wave of activation that occurs between acoustic features, acoustic item chunks, and list chunks. Chunk-mediated gating allows speech to be heard in the correct temporal order, even when what is heard depends upon future context.

  20. The vocal repertoire of the domesticated zebra finch: a data-driven approach to decipher the information-bearing acoustic features of communication signals.

    PubMed

    Elie, Julie E; Theunissen, Frédéric E

    2016-03-01

    Although a universal code for the acoustic features of animal vocal communication calls may not exist, the thorough analysis of the distinctive acoustical features of vocalization categories is important not only to decipher the acoustical code for a specific species but also to understand the evolution of communication signals and the mechanisms used to produce and understand them. Here, we recorded more than 8000 examples of almost all the vocalizations of the domesticated zebra finch, Taeniopygia guttata: vocalizations produced to establish contact, to form and maintain pair bonds, to sound an alarm, to communicate distress or to advertise hunger or aggressive intents. We characterized each vocalization type using complete representations that avoided any a priori assumptions on the acoustic code, as well as classical bioacoustics measures that could provide more intuitive interpretations. We then used these acoustical features to rigorously determine the potential information-bearing acoustical features for each vocalization type using both a novel regularized classifier and an unsupervised clustering algorithm. Vocalization categories are discriminated by the shape of their frequency spectrum and by their pitch saliency (noisy to tonal vocalizations) but not particularly by their fundamental frequency. Notably, the spectral shape of zebra finch vocalizations contains peaks or formants that vary systematically across categories and that would be generated by active control of both the vocal organ (source) and the upper vocal tract (filter). PMID:26581377
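
    The paper's regularized classifier is its own; as a generic illustration of the two complementary analyses (supervised classification of vocalization category from acoustic features, and unsupervised clustering of the same features), the sketch below uses L2-regularized logistic regression and k-means on a placeholder feature matrix.

        # Supervised classification plus unsupervised clustering of acoustic
        # features; the feature matrix and labels are random placeholders and
        # the paper's own regularized classifier differs.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.cluster import KMeans
        from sklearn.preprocessing import StandardScaler
        from sklearn.pipeline import make_pipeline
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(5)
        X = rng.standard_normal((8000, 40))      # stand-in acoustic features per call
        y = rng.integers(0, 10, size=8000)       # stand-in vocalization categories

        clf = make_pipeline(StandardScaler(),
                            LogisticRegression(penalty="l2", C=0.1, max_iter=1000))
        print("supervised CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())

        Xs = StandardScaler().fit_transform(X)
        clusters = KMeans(n_clusters=10, n_init=10).fit_predict(Xs)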

  1. Simulation study and guidelines to generate Laser-induced Surface Acoustic Waves for human skin feature detection

    NASA Astrophysics Data System (ADS)

    Li, Tingting; Fu, Xing; Chen, Kun; Dorantes-Gonzalez, Dante J.; Li, Yanning; Wu, Sen; Hu, Xiaotang

    2015-12-01

    Despite the steadily increasing number of people contracting skin cancer every year, limited attention has been given to the investigation of human skin tissues. In this regard, Laser-induced Surface Acoustic Wave (LSAW) technology, with its accurate, non-invasive and rapid testing characteristics, has recently shown promising results in biological and biomedical tissues. In order to improve the accuracy and efficiency of detecting important features in highly opaque and soft surfaces such as human skin, this paper identifies the most important parameters of a pulsed laser source and provides practical guidelines on the recommended ranges for generating Surface Acoustic Waves (SAWs) for characterization purposes. Considering that melanoma is a serious type of skin cancer, we conducted finite-element simulation research on the generation and propagation of surface waves in human skin containing a melanoma-like feature, and determined the best ranges of pulsed-laser parameters, the simulation mesh size and time step, the working bandwidth, and the minimal size of detectable melanoma.

  2. Prosodic strengthening and featural enhancement: Evidence from acoustic and articulatory realizations of /ɑ, i/ in English

    NASA Astrophysics Data System (ADS)

    Cho, Taehong

    2005-06-01

    This study investigates the effects of accent and prosodic boundaries on the production of the English vowels /ɑ/ and /i/ by concurrently examining acoustic vowel formants and articulatory maxima of the tongue, jaw, and lips obtained with EMA (Electromagnetic Articulography). The results demonstrate that prosodic strengthening (due to accent and/or prosodic boundaries) has differential effects depending on the source of prominence (in accented syllables versus at edges of prosodic domains; domain-initially versus domain-finally). The results are interpreted in terms of how prosodic strengthening is related to the phonetic realization of vowel features. For example, when accented, /i/ was fronter in both the acoustic and articulatory vowel spaces (enhancing [-back]), accompanied by an increase in both lip and jaw openings (enhancing sonority). By contrast, at edges of prosodic domains (especially domain-finally), /i/ was not necessarily fronter, but higher (enhancing [+high]), accompanied by an increase only in the lip (not jaw) opening. This suggests that the two aspects of prosodic structure (accent versus boundary) are differentiated by distinct phonetic patterns. Further, it implies that prosodic strengthening, though manifested in fine-grained phonetic details, is not simply a low-level phonetic event but a complex linguistic phenomenon, closely linked to the enhancement of phonological features and positional strength that may license phonological contrasts.

  3. Interaction of dust-ion acoustic solitary waves in nonplanar geometry with electrons featuring Tsallis distribution

    SciTech Connect

    Narayan Ghosh, Uday; Chatterjee, Prasanta; Tribeche, Mouloud

    2012-11-15

    The head-on collisions between nonplanar dust-ion acoustic solitary waves are dealt with by an extended version of Poincare-Lighthill-Kuo perturbation method, for a plasma having stationary dust grains, inertial ions, and nonextensive electrons. The nonplanar geometry modified analytical phase-shift after a head-on collision is derived. It is found that as the nonextensive character of the electrons becomes important, the phase-shift decreases monotonically before levelling-off at a constant value. This leads us to think that nonextensivity may have a stabilizing effect on the phase-shift.

  4. Acoustic analysis in Mudejar-Gothic churches: experimental results.

    PubMed

    Galindo, Miguel; Zamarreño, Teófilo; Girón, Sara

    2005-05-01

    This paper describes the preliminary results of research work in acoustics, conducted in a set of 12 Mudejar-Gothic churches in the city of Seville in the south of Spain. Despite common architectural style, the churches feature individual characteristics and have volumes ranging from 3947 to 10 708 m3. Acoustic parameters were measured in unoccupied churches according to the ISO-3382 standard. An extensive experimental study was carried out using impulse response analysis through a maximum length sequence measurement system in each church. It covered aspects such as reverberation (reverberation times, early decay times), distribution of sound levels (sound strength); early to late sound energy parameters derived from the impulse responses (center time, clarity for speech, clarity, definition, lateral energy fraction), and speech intelligibility (rapid speech transmission index), which all take both spectral and spatial distribution into account. Background noise was also measured to obtain the NR indices. The study describes the acoustic field inside each temple and establishes a discussion for each one of the acoustic descriptors mentioned by using the theoretical models available and the principles of architectural acoustics. Analysis of the quality of the spaces for music and speech is carried out according to the most widespread criteria for auditoria. PMID:15957758

  7. Speech Research

    NASA Astrophysics Data System (ADS)

    Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: A biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; phonetic factors in letter detection; categorical perception; short-term recall by deaf signers of American Sign Language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaries; and vowel information in postvocalic frictions.

  8. Age-related changes in prosodic features of maternal speech to prelingually deaf infants with cochlear implants

    PubMed Central

    Kondaurova, Maria V.; Bergeson, Tonya R.; Xu, Huipuing

    2012-01-01

    This study investigated prosodic and structural characteristics of infant-directed speech to hearing-impaired (HI) infants as they gained hearing experience with a cochlear implant over a 12-month period. Mothers were recorded during a play interaction with their HI infants (N = 27, mean age 18.4 months) at 3, 6, and 12 months post-implantation. Two separate control groups of mothers with age-matched normal-hearing infants (NH-AM) (N = 21, mean age 18.1 months) and hearing-experience-matched normal-hearing infants (NH-EM) (N = 24, mean age 3.1 months) were recorded at three testing sessions. Mothers produced less exaggerated pitch characteristics, a larger number of syllables per utterance, and a faster speaking rate when interacting with NH-AM as compared to HI infants. Mothers also produced more syllables and showed a trend toward a faster speaking rate in speech to NH-EM relative to HI infants. Age-related modifications included a decreased pitch standard deviation and an increased number of syllables in speech to NH-AM infants, and an increased number of syllables in speech to HI and NH-EM infants, across the 12-month period. These results suggest that mothers are sensitive to the hearing status of their infants and modify the characteristics of infant-directed speech over time. PMID:24244108

  9. The Tracking of Speech Envelope in the Human Cortex

    PubMed Central

    Kubanek, Jan; Brunner, Peter; Gunduz, Aysegul; Poeppel, David; Schalk, Gerwin

    2013-01-01

    Humans are highly adept at processing speech. Recently, it has been shown that slow temporal information in speech (i.e., the envelope of speech) is critical for speech comprehension. Furthermore, it has been found that evoked electric potentials in human cortex are correlated with the speech envelope. However, it has been unclear whether this essential linguistic feature is encoded differentially in specific regions, or whether it is represented throughout the auditory system. To answer this question, we recorded neural data with high temporal resolution directly from the cortex while human subjects listened to a spoken story. We found that the gamma activity in human auditory cortex robustly tracks the speech envelope. The effect is so marked that it is observed during a single presentation of the spoken story to each subject. The effect is stronger in regions situated relatively early in the auditory pathway (belt areas) compared to other regions involved in speech processing, including the superior temporal gyrus (STG) and the posterior inferior frontal gyrus (Broca's region). To further distinguish whether the speech envelope is encoded in the auditory system as a phonological (speech-related) feature or instead as a more general acoustic feature, we also probed the auditory system with a melodic stimulus. We found that belt areas tracked the melody envelope only weakly, and were the only regions considered to do so. Together, our data provide the first direct electrophysiological evidence that the envelope of speech is robustly tracked in non-primary auditory cortex (belt areas in particular), and suggest that the considered higher-order regions (STG and Broca's region) partake in a more abstract linguistic analysis. PMID:23408924
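
    A sketch of the core measurement, assuming one already-extracted cortical channel: correlate the broadband speech amplitude envelope with high-gamma band power. Band edges, sampling rate, smoothing, and the random stand-in signals are illustrative assumptions, not the study's pipeline.

        # Envelope tracking sketch: correlate the speech amplitude envelope with
        # high-gamma power from one recording channel. All parameters and the
        # random stand-in signals are illustrative assumptions.
        import numpy as np
        from scipy.signal import butter, filtfilt, hilbert

        def bandpass(x, lo, hi, fs, order=4):
            b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
            return filtfilt(b, a, x)

        fs = 1000
        rng = np.random.default_rng(3)
        audio = rng.standard_normal(60 * fs)    # stand-in for the spoken story
        neural = rng.standard_normal(60 * fs)   # stand-in for one cortical channel

        speech_env = np.abs(hilbert(bandpass(audio, 80, 400, fs)))    # speech envelope
        gamma_pow = np.abs(hilbert(bandpass(neural, 70, 170, fs)))    # high-gamma power
        # Smooth both to the slow (envelope-rate) time scale before correlating.
        k = np.ones(fs // 10) / (fs // 10)
        r = np.corrcoef(np.convolve(speech_env, k, "same"),
                        np.convolve(gamma_pow, k, "same"))[0, 1]
        print(f"envelope-tracking correlation: {r:.3f}")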

  10. General American Speech and Phonic Symbols.

    ERIC Educational Resources Information Center

    Calvert, Donald R.

    1982-01-01

    General American Symbols, speech and phonic symbols adapted from the Northampton symbols, are presented as a simplified system for teaching reading and speech to deaf children. Ways to use symbols for indicating features of speech production are suggested. (Author)

  11. A phonetic investigation of single word versus connected speech production in children with persisting speech difficulties relating to cleft palate.

    PubMed

    Howard, Sara

    2013-03-01

    Objective: To investigate the phonetic and phonological parameters of speech production associated with cleft palate in single words and in sentence repetition in order to explore the impact of connected speech processes, prosody, and word juncture on word production across contexts. Participants: Two boys (aged 9 years 5 months and 11 years 0 months) with persisting speech impairments related to a history of unilateral cleft lip and palate formed the main focus of the study; three typical adult male speakers provided control data. Method: Audio, video, and electropalatographic recordings were made of the participants producing single words and repeating two sets of sentences. The data were transcribed and the electropalatographic recordings were analyzed to explore lingual-palatal contact patterns across the different speech conditions. Acoustic analysis was used to further inform the perceptual analysis and to make specific durational measurements. Results: The two boys' speech production differed across the speech conditions. Both boys showed typical and atypical phonetic features in their connected speech production. One boy, although often unintelligible, resembled the adult speakers more closely prosodically and in his specific connected speech behaviors at word boundaries. The second boy produced developmentally atypical phonetic adjustments at word boundaries that appeared to promote intelligibility at the expense of naturalness. Conclusion: For older children with persisting speech impairments, it is particularly important to examine specific features of connected speech production, including word juncture and prosody. Sentence repetition data provide useful information to this end, but further investigations encompassing detailed perceptual and instrumental analysis of real conversational data are warranted.

  12. Nonlinear ion-acoustic double-layers in electronegative plasmas with electrons featuring Tsallis distribution

    NASA Astrophysics Data System (ADS)

    Ghebache, Siham; Tribeche, Mouloud

    2016-04-01

    Weakly nonlinear ion-acoustic (IA) double-layers (DLs), which accompany electronegative plasmas composed of positive ions, negative ions, and nonextensive electrons are investigated. A generalized Korteweg-de Vries equation with a cubic nonlinearity is derived using a reductive perturbation method. Different types of electronegative plasmas inspired from the experimental studies of Ichiki et al. (2001) are discussed. It is shown that the IA wave phase velocity, in different mixtures of negative and positive ions, decreases as the nonextensive parameter q increases, before levelling-off at a constant value for larger q. Moreover, a relative increase of Q involves an enhancement of the IA phase velocity. Existence domains of either solitary waves or double-layers are then presented and their parametric dependence is determined. Owing to the electron nonextensivity, our present plasma model can admit compressive as well as rarefactive IA-DLs.

  13. Nonlinear features of ion acoustic shock waves in dissipative magnetized dusty plasma

    SciTech Connect

    Sahu, Biswajit; Sinha, Anjana; Roychoudhury, Rajkumar

    2014-10-15

    The nonlinear propagation of small- as well as arbitrary-amplitude shocks is investigated in a magnetized dusty plasma consisting of inertialess Boltzmann-distributed electrons, inertial viscous cold ions, and stationary dust grains without dust-charge fluctuations. The effects of dissipation, due to ion viscosity and the external magnetic field, on the properties of the ion acoustic shock structure are investigated. It is found that, for small-amplitude waves, the Korteweg-de Vries-Burgers (KdVB) equation, derived using the reductive perturbation method, captures the qualitative behaviour of the transition from an oscillatory wave to a shock structure. The exact numerical solution for an arbitrary-amplitude wave differs somewhat in its details from the results obtained from the KdVB equation. However, the qualitative nature of the two solutions is similar, in the sense that a gradual transition from KdV oscillation to shock structure is observed as the dissipative parameter increases.
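
    For orientation, a KdV-Burgers equation of the general form obtained in such reductive-perturbation derivations is shown below; the nonlinear (A), dispersive (B), and dissipative (C) coefficients depend on the plasma parameters, and their exact expressions are given in the paper.

        % Generic KdV-Burgers form; the Burgers (right-hand) term carries the
        % viscous dissipation responsible for the shock structure.
        \frac{\partial \phi}{\partial \tau}
          + A\,\phi\,\frac{\partial \phi}{\partial \xi}
          + B\,\frac{\partial^{3} \phi}{\partial \xi^{3}}
          = C\,\frac{\partial^{2} \phi}{\partial \xi^{2}}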

  14. A Screening Approach for Classroom Acoustics Using Web-Based Listening Tests and Subjective Ratings

    PubMed Central

    Persson Waye, Kerstin; Magnusson, Lennart; Fredriksson, Sofie; Croy, Ilona

    2015-01-01

    Background: Perception of speech is crucial in school, where speech is the main mode of communication. The aim of the study was to evaluate whether a web-based approach including listening tests and questionnaires could be used as a screening tool for poor classroom acoustics. The prime focus was the relation between pupils' comprehension of speech, the classroom acoustics, and the pupils' descriptions of the acoustic qualities of the classroom. Methodology/Principal Findings: In total, 1106 pupils aged 13-19, from 59 classes and 38 schools in Sweden, participated in a listening study using Hagerman's sentences administered via the Internet. Four listening conditions were applied: high and low background noise levels, and positions close to and far away from the loudspeaker. The pupils described the acoustic quality of the classroom, and teachers provided information on the physical features of the classroom using questionnaires. Conclusions/Significance: In 69% of the classes, at least three pupils described the sound environment as adverse, and in 88% of the classes one or more pupils reported often having difficulties concentrating due to noise. The pupils' comprehension of speech was strongly influenced by the background noise level (p<0.001) and distance to the loudspeakers (p<0.001). Of the physical classroom features, the presence of suspended acoustic panels (p<0.05) and the length of the classroom (p<0.01) predicted speech comprehension. Of the pupils' descriptions of acoustic qualities, "clattery" significantly (p<0.05) predicted speech comprehension. "Clattery" was furthermore associated with difficulties understanding each other, while the description "noisy" was associated with concentration difficulties. The majority of classrooms do not seem to have an optimal sound environment. The pupils' descriptions of acoustic qualities and listening tests can be one way of predicting sound conditions in the classroom. PMID:25615692

  15. Speech Analysis Systems: An Evaluation.

    ERIC Educational Resources Information Center

    Read, Charles; And Others

    1992-01-01

    Performance characteristics are reviewed for seven computerized systems marketed for acoustic speech analysis: CSpeech, CSRE, ILS-PC, Kay Elemetrics model 550 Sona-Graph, MacSpeech Lab II, MSL, and Signalyze. Characteristics reviewed include system components, basic capabilities, documentation, user interface, data formats and journaling, and…

  16. Distinct Cortical Pathways for Music and Speech Revealed by Hypothesis-Free Voxel Decomposition.

    PubMed

    Norman-Haignere, Sam; Kanwisher, Nancy G; McDermott, Josh H

    2015-12-16

    The organization of human auditory cortex remains unresolved, due in part to the small stimulus sets common to fMRI studies and the overlap of neural populations within voxels. To address these challenges, we measured fMRI responses to 165 natural sounds and inferred canonical response profiles ("components") whose weighted combinations explained voxel responses throughout auditory cortex. This analysis revealed six components, each with interpretable response characteristics despite being unconstrained by prior functional hypotheses. Four components embodied selectivity for particular acoustic features (frequency, spectrotemporal modulation, pitch). Two others exhibited pronounced selectivity for music and speech, respectively, and were not explainable by standard acoustic features. Anatomically, music and speech selectivity concentrated in distinct regions of non-primary auditory cortex. However, music selectivity was weak in raw voxel responses, and its detection required a decomposition method. Voxel decomposition identifies primary dimensions of response variation across natural sounds, revealing distinct cortical pathways for music and speech. PMID:26687225
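
    The study used its own decomposition method; the sketch below illustrates the same idea with an ordinary non-negative matrix factorization, approximating a (voxels x sounds) response matrix as voxel weights times a small number of component response profiles. The matrix here is a random placeholder.

        # Generic decomposition sketch: responses (voxels x sounds) as weights
        # times component response profiles. NMF is only a familiar stand-in
        # for the study's own algorithm; data are random placeholders.
        import numpy as np
        from sklearn.decomposition import NMF

        rng = np.random.default_rng(4)
        responses = rng.random((11000, 165))        # stand-in: voxels x natural sounds

        model = NMF(n_components=6, init="nndsvd", max_iter=500)
        weights = model.fit_transform(responses)    # voxel weights (11000 x 6)
        components = model.components_              # component response profiles (6 x 165)
        print("reconstruction error:", model.reconstruction_err_)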

  18. Automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Espy-Wilson, Carol

    2005-04-01

    Great strides have been made in the development of automatic speech recognition (ASR) technology over the past thirty years. Most of this effort has been centered around the extension and improvement of Hidden Markov Model (HMM) approaches to ASR. Current commercially-available and industry systems based on HMMs can perform well for certain situational tasks that restrict variability such as phone dialing or limited voice commands. However, the holy grail of ASR systems is performance comparable to humans; in other words, the ability to automatically transcribe unrestricted conversational speech spoken by an infinite number of speakers under varying acoustic environments. This goal is far from being reached. Key to the success of ASR is effective modeling of variability in the speech signal. This tutorial will review the basics of ASR and the various ways in which our current knowledge of speech production, speech perception and prosody can be exploited to improve robustness at every level of the system.

  19. Human Superior Temporal Gyrus Organization of Spectrotemporal Modulation Tuning Derived from Speech Stimuli

    PubMed Central

    Hullett, Patrick W.; Hamilton, Liberty S.; Mesgarani, Nima; Schreiner, Christoph E.

    2016-01-01

    The human superior temporal gyrus (STG) is critical for speech perception, yet the organization of spectrotemporal processing of speech within the STG is not well understood. Here, to characterize the spatial organization of spectrotemporal processing of speech across human STG, we use high-density cortical surface field potential recordings while participants listened to natural continuous speech. While synthetic broad-band stimuli did not yield sustained activation of the STG, spectrotemporal receptive fields could be reconstructed from vigorous responses to speech stimuli. We find that the human STG displays a robust anterior–posterior spatial distribution of spectrotemporal tuning in which the posterior STG is tuned for temporally fast varying speech sounds that have relatively constant energy across the frequency axis (low spectral modulation) while the anterior STG is tuned for temporally slow varying speech sounds that have a high degree of spectral variation across the frequency axis (high spectral modulation). This work illustrates organization of spectrotemporal processing in the human STG, and illuminates processing of ethologically relevant speech signals in a region of the brain specialized for speech perception. SIGNIFICANCE STATEMENT Considerable evidence has implicated the human superior temporal gyrus (STG) in speech processing. However, the gross organization of spectrotemporal processing of speech within the STG is not well characterized. Here we use natural speech stimuli and advanced receptive field characterization methods to show that spectrotemporal features within speech are well organized along the posterior-to-anterior axis of the human STG. These findings demonstrate robust functional organization based on spectrotemporal modulation content, and illustrate that much of the encoded information in the STG represents the physical acoustic properties of speech stimuli. PMID:26865624

  20. Automatic detection of wheezes by evaluation of multiple acoustic feature extraction methods and C-weighted SVM

    NASA Astrophysics Data System (ADS)

    Sosa, Germán. D.; Cruz-Roa, Angel; González, Fabio A.

    2015-01-01

    This work addresses the problem of lung sound classification, in particular the problem of distinguishing between wheeze and normal sounds. Wheezing sound detection is an important step in associating lung sounds with an abnormal state of the respiratory system, usually related to tuberculosis or other chronic obstructive pulmonary diseases (COPD). The paper presents an approach for automatic lung sound classification that uses different state-of-the-art sound features in combination with a C-weighted support vector machine (SVM) classifier, which works better for unbalanced data. The feature extraction methods used here are commonly applied in speech recognition and related problems because they capture the most informative spectral content of the original signals. The evaluated methods were: the Fourier transform (FT), wavelet decomposition using a Wavelet Packet Transform filter bank (WPT), and Mel Frequency Cepstral Coefficients (MFCC). For comparison, we evaluated and contrasted the proposed approach against previous works using different combinations of features and/or classifiers. The different methods were evaluated on a set of lung sounds including normal and wheezing sounds. A leave-two-out per-case cross-validation approach was used, which, in each fold, chooses as the validation set a pair of cases, one including normal sounds and the other including wheezing sounds. Experimental results are reported in terms of traditional classification performance measures: sensitivity, specificity, and balanced accuracy. Our best results using the suggested approach, a C-weighted SVM with MFCC features, achieve 82.1% balanced accuracy, the best result reported for this problem to date. These results suggest that supervised classifiers based on kernel methods are able to learn better models for this challenging classification problem even when using the same feature extraction methods.
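
    A sketch of the best-reported combination (MFCC features with a class-weighted SVM). The file names and labels are hypothetical, summary statistics replace the paper's exact feature functionals, and the leave-two-out per-case evaluation is omitted for brevity.

        # MFCC + class-weighted SVM sketch for wheeze vs. normal lung sounds.
        # File names and labels are hypothetical; features are simplified.
        import numpy as np
        import librosa
        from sklearn.svm import SVC
        from sklearn.preprocessing import StandardScaler
        from sklearn.pipeline import make_pipeline

        def mfcc_summary(path, sr=8000, n_mfcc=13):
            y, _ = librosa.load(path, sr=sr)
            m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
            return np.concatenate([m.mean(axis=1), m.std(axis=1)])

        files = ["wheeze_01.wav", "normal_01.wav"]   # hypothetical lung-sound clips
        labels = np.array([1, 0])                    # 1 = wheeze, 0 = normal
        X = np.vstack([mfcc_summary(f) for f in files])

        clf = make_pipeline(StandardScaler(),
                            SVC(kernel="rbf", class_weight="balanced"))
        clf.fit(X, labels)   # the paper evaluated with leave-two-out per-case CV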

  1. [ACOUSTIC FEATURES OF VOCALIZATIONS, REFLECTING THE DISCOMFORT AND COMFORT STATE OF INFANTS AGED THREE AND SIX MONTHS].

    PubMed

    Pavlikova, M I; Makarov, A K; Lyakso, E E

    2015-08-01

    The paper examines adults' ability to recognize the comfort and discomfort states of 3- and 6-month-old infants on the basis of their vocalizations. The acoustic features of the vocalizations, and the voice characteristics, that are important for recognizing the infant's state are described. It is shown that discomfort vocalizations differ from comfort vocalizations in the average and maximum pitch values and in the pitch values in the central and final parts of the vocalization. A mathematical model is proposed, and a classification function for discomfort and comfort signals is described. Vocalizations that adults assigned to the comfort or discomfort categories with a probability of 0.75 or higher were also recognized with high reliability by the mathematical model based on the classification function.

  2. A keyword spotting model using perceptually significant energy features

    NASA Astrophysics Data System (ADS)

    Umakanthan, Padmalochini

    The task of a keyword recognition system is to detect the presence of certain words in a conversation based on the linguistic information present in human speech. Such keyword spotting systems have applications in homeland security, telephone surveillance and human-computer interfacing. The general procedure of a keyword spotting system involves feature generation and matching. In this work, a new set of features based on the psycho-acoustic masking nature of human speech is proposed. After developing these features, a time-aligned pattern matching process was implemented to locate the keywords within a set of unknown words. A word boundary detection technique based on frame classification using the nonlinear characteristics of speech is also addressed in this work. Validation of this keyword spotting model was carried out against the widely used cepstral features. The experimental results indicate the viability of using these perceptually significant features as an augmented feature set in keyword spotting.
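
    The proposed perceptual features are not reproduced here; the sketch below shows only the generic feature-generation-plus-time-aligned-matching skeleton, using MFCC frames and dynamic time warping on synthetic signals as stand-ins.

      # Sketch: keyword spotting as template matching with dynamic time warping (DTW)
      # over MFCC frames. Signals are synthetic; a real system would use recorded speech.
      import numpy as np
      import librosa

      sr = 16000

      def features(x):
          return librosa.feature.mfcc(y=x.astype(np.float32), sr=sr, n_mfcc=13).T  # frames x coeffs

      def dtw_distance(A, B):
          """Plain DTW over Euclidean frame distances, normalized by path length."""
          n, m = len(A), len(B)
          D = np.full((n + 1, m + 1), np.inf)
          D[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  cost = np.linalg.norm(A[i - 1] - B[j - 1])
                  D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
          return D[n, m] / (n + m)

      t = np.arange(sr) / sr
      keyword_template = np.sin(2 * np.pi * 300 * t) * np.hanning(t.size)   # stand-in "keyword"
      utterance_match  = keyword_template + 0.05 * np.random.randn(t.size)
      utterance_other  = np.sin(2 * np.pi * 800 * t) * np.hanning(t.size)

      tmpl = features(keyword_template)
      print("distance to matching utterance:", dtw_distance(tmpl, features(utterance_match)))
      print("distance to other utterance:   ", dtw_distance(tmpl, features(utterance_other)))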

  3. System and method for investigating sub-surface features of a rock formation using compressional acoustic sources

    DOEpatents

    Vu, Cung Khac; Skelt, Christopher; Nihei, Kurt; Johnson, Paul A.; Guyer, Robert; Ten Cate, James A.; Le Bas, Pierre-Yves; Larmat, Carene S.

    2016-09-27

    A system and method for investigating rock formations outside a borehole are provided. The method includes generating a first compressional acoustic wave at a first frequency by a first acoustic source; and generating a second compressional acoustic wave at a second frequency by a second acoustic source. The first and the second acoustic sources are arranged within a localized area of the borehole. The first and the second acoustic waves intersect in an intersection volume outside the borehole. The method further includes receiving a third shear acoustic wave at a third frequency, the third shear acoustic wave returning to the borehole due to a non-linear mixing process in a non-linear mixing zone within the intersection volume at a receiver arranged in the borehole. The third frequency is equal to a difference between the first frequency and the second frequency.

  4. Effect of body position on vocal tract acoustics: Acoustic pharyngometry and vowel formants.

    PubMed

    Vorperian, Houri K; Kurtzweil, Sara L; Fourakis, Marios; Kent, Ray D; Tillman, Katelyn K; Austin, Diane

    2015-08-01

    The anatomic basis and articulatory features of speech production are often studied with imaging studies that are typically acquired in the supine body position. It is important to determine if changes in body orientation to the gravitational field alter vocal tract dimensions and speech acoustics. The purpose of this study was to assess the effect of body position (upright versus supine) on (1) oral and pharyngeal measurements derived from acoustic pharyngometry and (2) acoustic measurements of fundamental frequency (F0) and the first four formant frequencies (F1-F4) for the quadrilateral point vowels. Data were obtained for 27 male and female participants, aged 17 to 35 yrs. Acoustic pharyngometry showed a statistically significant effect of body position on volumetric measurements, with smaller values in the supine than upright position, but no changes in length measurements. Acoustic analyses of vowels showed significantly larger values in the supine than upright position for the variables of F0, F3, and the Euclidean distance from the centroid to each corner vowel in the F1-F2-F3 space. Changes in body position affected measurements of vocal tract volume but not length. Body position also affected the aforementioned acoustic variables, but the main vowel formants were preserved.

  5. Effect of body position on vocal tract acoustics: Acoustic pharyngometry and vowel formants

    PubMed Central

    Vorperian, Houri K.; Kurtzweil, Sara L.; Fourakis, Marios; Kent, Ray D.; Tillman, Katelyn K.; Austin, Diane

    2015-01-01

    The anatomic basis and articulatory features of speech production are often studied with imaging studies that are typically acquired in the supine body position. It is important to determine if changes in body orientation to the gravitational field alter vocal tract dimensions and speech acoustics. The purpose of this study was to assess the effect of body position (upright versus supine) on (1) oral and pharyngeal measurements derived from acoustic pharyngometry and (2) acoustic measurements of fundamental frequency (F0) and the first four formant frequencies (F1–F4) for the quadrilateral point vowels. Data were obtained for 27 male and female participants, aged 17 to 35 yrs. Acoustic pharyngometry showed a statistically significant effect of body position on volumetric measurements, with smaller values in the supine than upright position, but no changes in length measurements. Acoustic analyses of vowels showed significantly larger values in the supine than upright position for the variables of F0, F3, and the Euclidean distance from the centroid to each corner vowel in the F1-F2-F3 space. Changes in body position affected measurements of vocal tract volume but not length. Body position also affected the aforementioned acoustic variables, but the main vowel formants were preserved. PMID:26328699
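
    The vowel-space measure used above is straightforward to compute once formants are extracted; the sketch below calculates the Euclidean distance from the centroid to each corner vowel in F1-F2-F3 space, using hypothetical formant values rather than the study's measurements.

      # Sketch: Euclidean distance from the centroid to each corner vowel in F1-F2-F3 space.
      # Formant values (Hz) are hypothetical examples, not data from the study.
      import numpy as np

      formants = {
          "i":  (300, 2300, 3000),
          "ae": (700, 1800, 2550),
          "a":  (750, 1100, 2500),
          "u":  (350,  900, 2250),
      }

      F = np.array(list(formants.values()), dtype=float)
      centroid = F.mean(axis=0)

      for vowel, values in zip(formants, F):
          dist = np.linalg.norm(values - centroid)
          print(f"/{vowel}/: distance from centroid = {dist:.1f} Hz")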

  6. Contributions of speech science to the technology of man-machine voice interactions

    NASA Technical Reports Server (NTRS)

    Lea, Wayne A.

    1977-01-01

    Research in speech understanding was reviewed. Plans which include prosodics research, phonological rules for speech understanding systems, and continued interdisciplinary phonetics research are discussed. Improved acoustic phonetic analysis capabilities in speech recognizers are suggested.

  7. Speaker independent acoustic-to-articulatory inversion

    NASA Astrophysics Data System (ADS)

    Ji, An

    Acoustic-to-articulatory inversion, the determination of articulatory parameters from acoustic signals, is a difficult but important problem for many speech processing applications, such as automatic speech recognition (ASR) and computer aided pronunciation training (CAPT). In recent years, several approaches have been successfully implemented for speaker dependent models with parallel acoustic and kinematic training data. However, in many practical applications inversion is needed for new speakers for whom no articulatory data is available. In order to address this problem, this dissertation introduces a novel speaker adaptation approach called Parallel Reference Speaker Weighting (PRSW), based on parallel acoustic and articulatory Hidden Markov Models (HMM). This approach uses a robust normalized articulatory space and palate referenced articulatory features combined with speaker-weighted adaptation to form an inversion mapping for new speakers that can accurately estimate articulatory trajectories. The proposed PRSW method is evaluated on the newly collected Marquette electromagnetic articulography -- Mandarin Accented English (EMA-MAE) corpus using 20 native English speakers. Cross-speaker inversion results show that given a good selection of reference speakers with consistent acoustic and articulatory patterns, the PRSW approach gives good speaker independent inversion performance even without kinematic training data.

  8. Automatic measurement and representation of prosodic features

    NASA Astrophysics Data System (ADS)

    Ying, Goangshiuan Shawn

    Effective measurement and representation of prosodic features of the acoustic signal for use in automatic speech recognition and understanding systems is the goal of this work. Prosodic features (stress, duration, and intonation) are variations of the acoustic signal whose domains extend beyond the boundaries of each individual phonetic segment. Listeners perceive prosodic features through a complex combination of acoustic correlates such as intensity, duration, and fundamental frequency (F0). We have developed new tools to measure F0 and intensity features. We apply a probabilistic global error correction routine to an Average Magnitude Difference Function (AMDF) pitch detector. A new short-term frequency-domain Teager energy algorithm is used to measure the energy of a speech signal. We have conducted a series of experiments performing lexical stress detection on words in continuous English speech from two speech corpora. We have experimented with two different approaches, a segment-based approach and a rhythm unit-based approach, in lexical stress detection. The first approach uses pattern recognition with energy- and duration-based measurements as features to build Bayesian classifiers to detect the stress level of a vowel segment. In the second approach we define a rhythm unit and use only the F0-based measurement and a scoring system to determine the stressed segment in the rhythm unit. A duration-based segmentation routine was developed to break polysyllabic words into rhythm units. The long-term goal of this work is to develop a system that can effectively detect the stress pattern for each word in continuous speech utterances. Stress information will be integrated as a constraint for pruning the word hypotheses in a word recognition system based on hidden Markov models.
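
    For orientation, the sketch below implements a basic AMDF pitch estimate and the classic time-domain Teager energy operator; the dissertation's frequency-domain Teager variant and probabilistic error-correction stage are not reproduced, and the test frame is synthetic.

      # Sketch: AMDF-based F0 estimate and the discrete time-domain Teager energy operator.
      import numpy as np

      sr = 16000
      t = np.arange(0, 0.04, 1 / sr)
      frame = np.sin(2 * np.pi * 150 * t) + 0.1 * np.random.randn(t.size)   # voiced-like frame

      def amdf_f0(x, sr, fmin=60, fmax=400):
          """Average Magnitude Difference Function: F0 is the lag with the deepest valley."""
          lags = np.arange(int(sr / fmax), int(sr / fmin))
          amdf = np.array([np.mean(np.abs(x[l:] - x[:-l])) for l in lags])
          return sr / lags[np.argmin(amdf)]

      def teager_energy(x):
          """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
          return x[1:-1] ** 2 - x[:-2] * x[2:]

      print("estimated F0 (Hz):", amdf_f0(frame, sr))
      print("mean Teager energy:", teager_energy(frame).mean())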

  9. Lexical and Acoustic Features of Maternal Utterances Addressing Preverbal Infants in Picture Book Reading Link to 5-Year-Old Children's Language Development

    ERIC Educational Resources Information Center

    Liu, Huei-Mei

    2014-01-01

    Research Findings: I examined the long-term association between the lexical and acoustic features of maternal utterances during book reading and the language skills of infants and children. Maternal utterances were collected from 22 mother-child dyads in picture book-reading episodes when children were ages 6-12 months and 5 years. Two aspects of…

  10. Inconsistency of speech in children with childhood apraxia of speech, phonological disorders, and typical speech

    NASA Astrophysics Data System (ADS)

    Iuzzini, Jenya

    There is a lack of agreement on the features used to differentiate Childhood Apraxia of Speech (CAS) from Phonological Disorders (PD). One criterion which has gained consensus is lexical inconsistency of speech (ASHA, 2007); however, no accepted measure of this feature has been defined. Although lexical assessment provides information about consistency of an item across repeated trials, it may not capture the magnitude of inconsistency within an item. In contrast, segmental analysis provides more extensive information about consistency of phoneme usage across multiple contexts and word-positions. The current research compared segmental and lexical inconsistency metrics in preschool-aged children with PD, CAS, and typical development (TD) to determine how inconsistency varies with age in typical and disordered speakers, and whether CAS and PD were differentiated equally well by both assessment levels. Whereas lexical and segmental analyses may be influenced by listener characteristics or speaker intelligibility, the acoustic signal is less vulnerable to these factors. In addition, the acoustic signal may reveal information which is not evident in the perceptual signal. A second focus of the current research was motivated by Blumstein et al.'s (1980) classic study on voice onset time (VOT) in adults with acquired apraxia of speech (AOS), which demonstrated a motor impairment underlying AOS. In the current study, VOT analyses were conducted to determine how age and group relate to the voicing distribution for bilabial and alveolar plosives. Findings revealed that 3-year-olds evidenced significantly higher inconsistency than 5-year-olds; segmental inconsistency approached 0% in 5-year-olds with TD, whereas it persisted in children with PD and CAS, suggesting that for children in this age range, inconsistency is a feature of speech disorder rather than typical development (Holm et al., 2007). Likewise, whereas segmental and lexical inconsistency were

  11. Linking Speech Perception and Neurophysiology: Speech Decoding Guided by Cascaded Oscillators Locked to the Input Rhythm

    PubMed Central

    Ghitza, Oded

    2011-01-01

    The premise of this study is that current models of speech perception, which are driven by acoustic features alone, are incomplete, and that the role of decoding time during memory access must be incorporated to account for the patterns of observed recognition phenomena. It is postulated that decoding time is governed by a cascade of neuronal oscillators, which guide template-matching operations at a hierarchy of temporal scales. Cascaded cortical oscillations in the theta, beta, and gamma frequency bands are argued to be crucial for speech intelligibility. Intelligibility is high so long as these oscillations remain phase locked to the auditory input rhythm. A model (Tempo) is presented which is capable of emulating recent psychophysical data on the intelligibility of speech sentences as a function of “packaging” rate (Ghitza and Greenberg, 2009). The data show that intelligibility of speech that is time-compressed by a factor of 3 (i.e., a high syllabic rate) is poor (above 50% word error rate), but is substantially restored when the information stream is re-packaged by the insertion of silent gaps in between successive compressed-signal intervals – a counterintuitive finding, difficult to explain using classical models of speech perception, but emerging naturally from the Tempo architecture. PMID:21743809

  12. Acoustic Longitudinal Field NIF Optic Feature Detection Map Using Time-Reversal & MUSIC

    SciTech Connect

    Lehman, S K

    2006-02-09

    We developed an ultrasonic longitudinal field time-reversal and MUltiple SIgnal Classification (MUSIC) based detection algorithm for identifying and mapping flaws in fused silica NIF optics. The algorithm requires a fully multistatic data set, that is, one with multiple, independently operated, spatially diverse transducers, each transmitter of which, in succession, launches a pulse into the optic while the scattered signal is measured and recorded at every receiver. We have successfully localized engineered "defects" larger than 1 mm in an optic. We confirmed detection and localization of 3 mm and 5 mm features in experimental data, and of a 0.5 mm feature in simulated data with sufficiently high signal-to-noise ratio. We present the theory, experimental results, and simulated results.
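
    As a rough illustration of the subspace idea only, the sketch below computes a generic narrowband MUSIC pseudospectrum for a simulated uniform linear array; the report's multistatic, longitudinal-field time-reversal MUSIC formulation for optics is considerably more involved and is not reproduced here.

      # Sketch: generic narrowband MUSIC pseudospectrum for a uniform linear array.
      # Two simulated scatterer directions are recovered from the noise subspace.
      import numpy as np

      rng = np.random.default_rng(3)
      n_sensors, n_snapshots = 8, 200
      wavelength, spacing = 1.0, 0.5           # spacing in wavelengths
      true_angles = np.deg2rad([-20.0, 35.0])  # simulated scatterer directions

      def steering(theta):
          k = 2 * np.pi / wavelength
          return np.exp(1j * k * spacing * np.arange(n_sensors) * np.sin(theta))

      A = np.column_stack([steering(th) for th in true_angles])
      S = rng.standard_normal((2, n_snapshots)) + 1j * rng.standard_normal((2, n_snapshots))
      X = A @ S + 0.1 * (rng.standard_normal((n_sensors, n_snapshots))
                         + 1j * rng.standard_normal((n_sensors, n_snapshots)))

      R = X @ X.conj().T / n_snapshots
      eigvals, eigvecs = np.linalg.eigh(R)
      En = eigvecs[:, :-2]                     # noise subspace (all but the 2 largest eigenvalues)

      scan = np.deg2rad(np.linspace(-90, 90, 361))
      p = np.array([1.0 / np.linalg.norm(En.conj().T @ steering(th)) ** 2 for th in scan])
      peak_idx = [i for i in range(1, len(p) - 1) if p[i] > p[i - 1] and p[i] > p[i + 1]]
      top2 = sorted(peak_idx, key=lambda i: p[i])[-2:]
      print("estimated directions (deg):", sorted(np.rad2deg(scan[top2]).round(1)))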

  13. Specific Features of Destabilization of the Wave Profile During Reflection of an Intense Acoustic Beam from a Soft Boundary

    NASA Astrophysics Data System (ADS)

    Deryabin, M. S.; Kasyanov, D. A.; Kurin, V. V.; Garasyov, M. A.

    2016-05-01

    We show that a significant energy redistribution occurs in the spectrum of reflected nonlinear waves, when an intense acoustic beam is reflected from an acoustically soft boundary, which manifests itself at short wave distances from a reflecting boundary. This effect leads to the appearance of extrema in the distributions of the amplitude and intensity of the field of the reflected acoustic beam near the reflecting boundary. The results of physical experiments are confirmed by numerical modeling of the process of transformation of nonlinear waves reflected from an acoustically soft boundary. Numerical modeling was performed by means of the Khokhlov-Zabolotskaya-Kuznetsov (KZK) equation.

  14. Deep bottleneck features for spoken language identification.

    PubMed

    Jiang, Bing; Song, Yan; Wei, Si; Liu, Jun-Hua; McLoughlin, Ian Vince; Dai, Li-Rong

    2014-01-01

    A key problem in spoken language identification (LID) is to design effective representations which are specific to language information. For example, in recent years, representations based on both phonotactic and acoustic features have proven their effectiveness for LID. Although advances in machine learning have led to significant improvements, LID performance is still lacking, especially for short duration speech utterances. With the hypothesis that language information is weak and represented only latently in speech, and is largely dependent on the statistical properties of the speech content, existing representations may be insufficient. Furthermore they may be susceptible to the variations caused by different speakers, specific content of the speech segments, and background noise. To address this, we propose using Deep Bottleneck Features (DBF) for spoken LID, motivated by the success of Deep Neural Networks (DNN) in speech recognition. We show that DBFs can form a low-dimensional compact representation of the original inputs with a powerful descriptive and discriminative capability. To evaluate the effectiveness of this, we design two acoustic models, termed DBF-TV and parallel DBF-TV (PDBF-TV), using a DBF based i-vector representation for each speech utterance. Results on NIST language recognition evaluation 2009 (LRE09) show significant improvements over state-of-the-art systems. By fusing the output of phonotactic and acoustic approaches, we achieve an EER of 1.08%, 1.89% and 7.01% for 30 s, 10 s and 3 s test utterances respectively. Furthermore, various DBF configurations have been extensively evaluated, and an optimal system proposed.

  15. Deep Bottleneck Features for Spoken Language Identification

    PubMed Central

    Jiang, Bing; Song, Yan; Wei, Si; Liu, Jun-Hua; McLoughlin, Ian Vince; Dai, Li-Rong

    2014-01-01

    A key problem in spoken language identification (LID) is to design effective representations which are specific to language information. For example, in recent years, representations based on both phonotactic and acoustic features have proven their effectiveness for LID. Although advances in machine learning have led to significant improvements, LID performance is still lacking, especially for short duration speech utterances. With the hypothesis that language information is weak and represented only latently in speech, and is largely dependent on the statistical properties of the speech content, existing representations may be insufficient. Furthermore they may be susceptible to the variations caused by different speakers, specific content of the speech segments, and background noise. To address this, we propose using Deep Bottleneck Features (DBF) for spoken LID, motivated by the success of Deep Neural Networks (DNN) in speech recognition. We show that DBFs can form a low-dimensional compact representation of the original inputs with a powerful descriptive and discriminative capability. To evaluate the effectiveness of this, we design two acoustic models, termed DBF-TV and parallel DBF-TV (PDBF-TV), using a DBF based i-vector representation for each speech utterance. Results on NIST language recognition evaluation 2009 (LRE09) show significant improvements over state-of-the-art systems. By fusing the output of phonotactic and acoustic approaches, we achieve an EER of 1.08%, 1.89% and 7.01% for 30 s, 10 s and 3 s test utterances respectively. Furthermore, various DBF configurations have been extensively evaluated, and an optimal system proposed. PMID:24983963
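
    The equal error rate (EER) reported above is a standard detection metric; the sketch below shows one common way to compute it from target and non-target scores, here drawn from synthetic distributions rather than an actual LID system.

      # Sketch: computing an equal error rate (EER) from detection scores.
      import numpy as np

      rng = np.random.default_rng(4)
      target_scores    = rng.normal(2.0, 1.0, 1000)   # scores for correct-language trials
      nontarget_scores = rng.normal(0.0, 1.0, 5000)   # scores for wrong-language trials

      thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
      fnr = np.array([(target_scores < t).mean() for t in thresholds])     # miss rate
      fpr = np.array([(nontarget_scores >= t).mean() for t in thresholds]) # false alarm rate

      i = np.argmin(np.abs(fnr - fpr))
      eer = (fnr[i] + fpr[i]) / 2
      print(f"EER = {100 * eer:.2f}% at threshold {thresholds[i]:.2f}")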

  17. Inherent emotional quality of human speech sounds.

    PubMed

    Myers-Schulz, Blake; Pujara, Maia; Wolf, Richard C; Koenigs, Michael

    2013-01-01

    During much of the past century, it was widely believed that phonemes, the human speech sounds that constitute words, have no inherent semantic meaning, and that the relationship between a combination of phonemes (a word) and its referent is simply arbitrary. Although recent work has challenged this picture by revealing psychological associations between certain phonemes and particular semantic contents, the precise mechanisms underlying these associations have not been fully elucidated. Here we provide novel evidence that certain phonemes have an inherent, non-arbitrary emotional quality. Moreover, we show that the perceived emotional valence of certain phoneme combinations depends on a specific acoustic feature, namely the dynamic shift within the phonemes' first two frequency components. These data suggest a phoneme-relevant acoustic property influencing the communication of emotion in humans, and provide further evidence against previously held assumptions regarding the structure of human language. This finding has potential applications for a variety of social, educational, clinical, and marketing contexts.

  18. Neuromagnetic evidence for a featural distinction of English consonants: Sensor- and source-space data

    PubMed Central

    Scharinger, Mathias; Merickel, Jennifer; Riley, Joshua; Idsardi, William J.

    2011-01-01

    Speech sounds can be classified on the basis of their underlying articulators or on the basis of the acoustic characteristics resulting from particular articulatory positions. Research in speech perception suggests that distinctive features are based on both articulatory and acoustic information. In recent years, neuroelectric and neuromagnetic investigations have provided evidence for the brain's early sensitivity to distinctive features and their acoustic consequences, particularly for place of articulation distinctions. Here, we compare English consonants in a Mismatch Field design across two broad and distinct places of articulation, LABIAL and CORONAL, and provide further evidence that early evoked auditory responses are sensitive to these features. We further add to the findings of asymmetric consonant processing, although we do not find support for coronal underspecification. Labial glides (Experiment 1) and fricatives (Experiment 2) elicited larger Mismatch responses than their coronal counterparts. Interestingly, their M100 dipoles differed along the anterior/posterior dimension in the auditory cortex that has previously been found to spatially reflect place of articulation differences. Our results are discussed with respect to acoustic and articulatory bases of featural speech sound classifications and with respect to a model that maps distinctive phonetic features onto long-term representations of speech sounds. PMID:21185073

  19. Fluid Dynamics of Human Phonation and Speech

    NASA Astrophysics Data System (ADS)

    Mittal, Rajat; Erath, Byron D.; Plesniak, Michael W.

    2013-01-01

    This article presents a review of the fluid dynamics, flow-structure interactions, and acoustics associated with human phonation and speech. Our voice is produced through the process of phonation in the larynx, and an improved understanding of the underlying physics of this process is essential to advancing the treatment of voice disorders. Insights into the physics of phonation and speech can also contribute to improved vocal training and the development of new speech compression and synthesis schemes. This article introduces the key biomechanical features of the laryngeal physiology, reviews the basic principles of voice production, and summarizes the progress made over the past half-century in understanding the flow physics of phonation and speech. Laryngeal pathologies, which significantly enhance the complexity of phonatory dynamics, are discussed. After a thorough examination of the state of the art in computational modeling and experimental investigations of phonatory biomechanics, we present a synopsis of the pacing issues in this arena and an outlook for research in this fascinating subject.

  20. The Natural Statistics of Audiovisual Speech

    PubMed Central

    Chandrasekaran, Chandramouli; Trubanova, Andrea; Stillittano, Sébastien; Caplier, Alice; Ghazanfar, Asif A.

    2009-01-01

    Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain where it can guide the selection of appropriate actions. To simplify this process, it's been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input space characterization is unavailable is human speech. We do not understand what signals our brain has to actively piece together from an audiovisual speech stream to arrive at a percept versus what is already embedded in the signal structure of the stream itself. In essence, we do not have a clear understanding of the natural statistics of audiovisual speech. In the present study, we identified the following major statistical features of audiovisual speech. First, we observed robust correlations and close temporal correspondence between the area of the mouth opening and the acoustic envelope. Second, we found the strongest correlation between the area of the mouth opening and vocal tract resonances. Third, we observed that both area of the mouth opening and the voice envelope are temporally modulated in the 2–7 Hz frequency range. Finally, we show that the timing of mouth movements relative to the onset of the voice is consistently between 100 and 300 ms. We interpret these data in the context of recent neural theories of speech which suggest that speech communication is a reciprocally coupled, multisensory event, whereby the outputs of the signaler are matched to the neural processes of the receiver. PMID:19609344
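
    The 2-7 Hz observation concerns the temporal modulation content of the envelope; the sketch below estimates the modulation spectrum of an amplitude envelope using a synthetic noise carrier modulated at 4 Hz in place of real audiovisual recordings.

      # Sketch: estimating the temporal modulation spectrum of an amplitude envelope.
      # The "speech" is a synthetic noise carrier with 4 Hz envelope modulation.
      import numpy as np
      from scipy.signal import hilbert, welch

      sr = 16000
      t = np.arange(0, 10, 1 / sr)
      rng = np.random.default_rng(5)
      carrier = rng.standard_normal(t.size)
      speech_like = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * carrier

      envelope = np.abs(hilbert(speech_like))
      env_sr = 100                                   # downsample before spectral analysis
      envelope_ds = envelope[:: sr // env_sr]

      f, pxx = welch(envelope_ds - envelope_ds.mean(), fs=env_sr, nperseg=256)
      band = (f >= 2) & (f <= 7)
      print("peak modulation frequency (Hz):", f[np.argmax(pxx)])
      print("fraction of modulation power in 2-7 Hz band:", pxx[band].sum() / pxx.sum())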

  1. System and method for investigating sub-surface features of a rock formation with acoustic sources generating coded signals

    SciTech Connect

    Vu, Cung Khac; Nihei, Kurt; Johnson, Paul A; Guyer, Robert; Ten Cate, James A; Le Bas, Pierre-Yves; Larmat, Carene S

    2014-12-30

    A system and a method for investigating rock formations include generating, by a first acoustic source, a first acoustic signal comprising a first plurality of pulses, each pulse including a first modulated signal at a central frequency; and generating, by a second acoustic source, a second acoustic signal comprising a second plurality of pulses. A receiver arranged within the borehole receives a detected signal including a signal generated by a non-linear mixing process from the first and second acoustic signals in a non-linear mixing zone within the intersection volume. The method also includes processing the received signal to extract the signal generated by the non-linear mixing process over noise or over signals generated by a linear interaction process, or both.

  2. Speech prosody in cerebellar ataxia

    NASA Astrophysics Data System (ADS)

    Casper, Maureen

    The present study sought an acoustic signature for the speech disturbance recognized in cerebellar degeneration. Magnetic resonance imaging was used for a radiological rating of cerebellar involvement in six cerebellar ataxic dysarthric speakers. Acoustic measures of the [pap] syllables in contrastive prosodic conditions and of normal vs. brain-damaged patients were used to further our understanding both of the speech degeneration that accompanies cerebellar pathology and of speech motor control and movement in general. Pair-wise comparisons of the prosodic conditions within the normal group showed statistically significant differences for four prosodic contrasts. For three of the four contrasts analyzed, the normal speakers showed both longer durations and higher formant and fundamental frequency values in the more prominent first condition of the contrast. The acoustic measures of the normal prosodic contrast values were then used as a model to measure the degree of speech deterioration for individual cerebellar subjects. This estimate of speech deterioration as determined by individual differences between cerebellar and normal subjects' acoustic values of the four prosodic contrasts was used in correlation analyses with MRI ratings. Moderate correlations between speech deterioration and cerebellar atrophy were found in the measures of syllable duration and f0. A strong negative correlation was found for F1. Moreover, the normal model presented by these acoustic data allows for a description of the flexibility of task- oriented behavior in normal speech motor control. These data challenge spatio-temporal theory which explains movement as an artifact of time wherein longer durations predict more extreme movements and give further evidence for gestural internal dynamics of movement in which time emerges from articulatory events rather than dictating those events. This model provides a sensitive index of cerebellar pathology with quantitative acoustic

  3. Discriminating between auditory and motor cortical responses to speech and non-speech mouth sounds

    PubMed Central

    Agnew, Z.K.; McGettigan, C.; Scott, S.K.

    2012-01-01

    Several perspectives on speech perception posit a central role for the representation of articulations in speech comprehension, supported by evidence for premotor activation when participants listen to speech. However no experiments have directly tested whether motor responses mirror the profile of selective auditory cortical responses to native speech sounds, or whether motor and auditory areas respond in different ways to sounds. We used fMRI to investigate cortical responses to speech and non-speech mouth (ingressive click) sounds. Speech sounds activated bilateral superior temporal gyri more than other sounds, a profile not seen in motor and premotor cortices. These results suggest that there are qualitative differences in the ways that temporal and motor areas are activated by speech and click sounds: anterior temporal lobe areas are sensitive to the acoustic/phonetic properties while motor responses may show more generalised responses to the acoustic stimuli. PMID:21812557

  4. Common cues to emotion in the dynamic facial expressions of speech and song

    PubMed Central

    Livingstone, Steven R.; Thompson, William F.; Wanderley, Marcelo M.; Palmer, Caroline

    2015-01-01

    Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech–song differences. Vocalists’ jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech–song. Vocalists’ emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists’ facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotion judgements for voice-only singing were poorly identified, yet were accurate for all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song, yet were equivalent in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, yet highlight differences in perception and acoustic-motor production. PMID:25424388

  5. Factors affecting the quality of sound recording for speech and voice analysis.

    PubMed

    Vogel, Adam P; Morgan, Angela T

    2009-01-01

    The importance and utility of objective evidence-based measurement of the voice is well documented. Therefore, greater consideration needs to be given to the factors that influence the quality of voice and speech recordings. This manuscript aims to bring together the many features that affect acoustically acquired voice and speech. Specifically, the paper considers the practical requirements of individual speech acquisition configurations through examining issues relating to hardware, software and microphone selection, the impact of environmental noise, analogue to digital conversion and file format as well as the acoustic measures resulting from varying levels of signal integrity. The type of recording environment required by a user is often dictated by a variety of clinical and experimental needs, including: the acoustic measures being investigated; portability of equipment; an individual's budget; and the expertise of the user. As the quality of recorded signals is influenced by many factors, awareness of these issues is essential. This paper aims to highlight the importance of these methodological considerations to those previously uninitiated with voice and speech acoustics. With current technology, the highest quality recording would be made using a stand-alone hard disc recorder, an independent mixer to attenuate the incoming signal, and insulated wiring combined with a high quality microphone in an anechoic chamber or sound treated room.

  6. The auditory representation of speech sounds in human motor cortex

    PubMed Central

    Cheung, Connie; Hamilton, Liberty S; Johnson, Keith; Chang, Edward F

    2016-01-01

    In humans, listening to speech evokes neural responses in the motor cortex. This has been controversially interpreted as evidence that speech sounds are processed as articulatory gestures. However, it is unclear what information is actually encoded by such neural activity. We used high-density direct human cortical recordings while participants spoke and listened to speech sounds. Motor cortex neural patterns during listening were substantially different than during articulation of the same sounds. During listening, we observed neural activity in the superior and inferior regions of ventral motor cortex. During speaking, responses were distributed throughout somatotopic representations of speech articulators in motor cortex. The structure of responses in motor cortex during listening was organized along acoustic features similar to auditory cortex, rather than along articulatory features as during speaking. Motor cortex does not contain articulatory representations of perceived actions in speech, but rather, represents auditory vocal information. DOI: http://dx.doi.org/10.7554/eLife.12577.001 PMID:26943778

  7. Why would Musical Training Benefit the Neural Encoding of Speech? The OPERA Hypothesis.

    PubMed

    Patel, Aniruddh D

    2011-01-01

    Mounting evidence suggests that musical training benefits the neural encoding of speech. This paper offers a hypothesis specifying why such benefits occur. The "OPERA" hypothesis proposes that such benefits are driven by adaptive plasticity in speech-processing networks, and that this plasticity occurs when five conditions are met. These are: (1) Overlap: there is anatomical overlap in the brain networks that process an acoustic feature used in both music and speech (e.g., waveform periodicity, amplitude envelope), (2) Precision: music places higher demands on these shared networks than does speech, in terms of the precision of processing, (3) Emotion: the musical activities that engage this network elicit strong positive emotion, (4) Repetition: the musical activities that engage this network are frequently repeated, and (5) Attention: the musical activities that engage this network are associated with focused attention. According to the OPERA hypothesis, when these conditions are met neural plasticity drives the networks in question to function with higher precision than needed for ordinary speech communication. Yet since speech shares these networks with music, speech processing benefits. The OPERA hypothesis is used to account for the observed superior subcortical encoding of speech in musically trained individuals, and to suggest mechanisms by which musical training might improve linguistic reading abilities.

  8. Multiple levels of linguistic and paralinguistic features contribute to voice recognition

    PubMed Central

    Mary Zarate, Jean; Tian, Xing; Woods, Kevin J. P.; Poeppel, David

    2015-01-01

    Voice or speaker recognition is critical in a wide variety of social contexts. In this study, we investigated the contributions of acoustic, phonological, lexical, and semantic information toward voice recognition. Native English speaking participants were trained to recognize five speakers in five conditions: non-speech, Mandarin, German, pseudo-English, and English. We showed that voice recognition significantly improved as more information became available, from purely acoustic features in non-speech to additional phonological information varying in familiarity. Moreover, we found that the recognition performance is transferable between training and testing in phonologically familiar conditions (German, pseudo-English, and English), but not in unfamiliar (Mandarin) or non-speech conditions. These results provide evidence suggesting that bottom-up acoustic analysis and top-down influence from phonological processing collaboratively govern voice recognition. PMID:26088739

  9. Ion acoustic solitary waves and double layers in a plasma with two temperature electrons featuring Tsallis distribution

    SciTech Connect

    Shalini, Saini, N. S.

    2014-10-15

    The propagation properties of large amplitude ion acoustic solitary waves (IASWs) are studied in a plasma containing cold fluid ions and multi-temperature electrons (cool and hot electrons) with nonextensive distribution. Employing the Sagdeev pseudopotential method, an energy balance equation has been derived, and from the expression for the Sagdeev potential function, ion acoustic solitary waves and double layers are investigated numerically. The Mach number (lower and upper limits) for the existence of solitary structures is determined. Positive as well as negative polarity solitary structures are observed. Further, conditions for the existence of ion acoustic double layers (IADLs) are also determined numerically in the form of the critical values of q_c, f and the Mach number (M). It is observed that the nonextensivity of electrons (via q_{c,h}), concentration of electrons (via f) and temperature ratio of cold to hot electrons (via β) significantly influence the characteristics of ion acoustic solitary waves as well as double layers.

  10. Speech Synthesis

    NASA Astrophysics Data System (ADS)

    Dutoit, Thierry; Bozkurt, Baris

    Text-to-speech (TTS) synthesis is the art of designing talking machines. It is often seen by engineers as an easy task, compared to speech recognition. It is true, indeed, that it is easier to create a bad, first trial text-to-speech (TTS) system than to design a rudimentary speech recognizer.

  11. System and method for investigating sub-surface features of a rock formation with acoustic sources generating conical broadcast signals

    SciTech Connect

    Vu, Cung Khac; Skelt, Christopher; Nihei, Kurt; Johnson, Paul A.; Guyer, Robert; Ten Cate, James A.; Le Bas, Pierre -Yves; Larmat, Carene S.

    2015-08-18

    A method of interrogating a formation includes generating a first conical acoustic signal at a first frequency and a second conical acoustic signal at a second frequency, each between approximately 500 Hz and 500 kHz, such that the signals intersect in a desired intersection volume outside the borehole. The method further includes receiving a difference signal returning to the borehole resulting from a non-linear mixing of the signals in a mixing zone within the intersection volume.

  12. Perception of Words and Pitch Patterns in Song and Speech

    PubMed Central

    Merrill, Julia; Sammler, Daniela; Bangert, Marc; Goldhahn, Dirk; Lohmann, Gabriele; Turner, Robert; Friederici, Angela D.

    2012-01-01

    This functional magnetic resonance imaging study examines shared and distinct cortical areas involved in the auditory perception of song and speech at the level of their underlying constituents: words and pitch patterns. Univariate and multivariate analyses were performed to isolate the neural correlates of the word- and pitch-based discrimination between song and speech, corrected for rhythmic differences in both. Therefore, six conditions, arranged in a subtractive hierarchy were created: sung sentences including words, pitch and rhythm; hummed speech prosody and song melody containing only pitch patterns and rhythm; and as a control the pure musical or speech rhythm. Systematic contrasts between these balanced conditions following their hierarchical organization showed a great overlap between song and speech at all levels in the bilateral temporal lobe, but suggested a differential role of the inferior frontal gyrus (IFG) and intraparietal sulcus (IPS) in processing song and speech. While the left IFG coded for spoken words and showed predominance over the right IFG in prosodic pitch processing, an opposite lateralization was found for pitch in song. The IPS showed sensitivity to discrete pitch relations in song as opposed to the gliding pitch in speech. Finally, the superior temporal gyrus and premotor cortex coded for general differences between words and pitch patterns, irrespective of whether they were sung or spoken. Thus, song and speech share many features which are reflected in a fundamental similarity of brain areas involved in their perception. However, fine-grained acoustic differences on word and pitch level are reflected in the IPS and the lateralized activity of the IFG. PMID:22457659

  13. Time-forward speech intelligibility in time-reversed rooms

    PubMed Central

    Longworth-Reed, Laricia; Brandewie, Eugene; Zahorik, Pavel

    2009-01-01

    The effects of time-reversed room acoustics on word recognition abilities were examined using virtual auditory space techniques, which allowed for temporal manipulation of the room acoustics independent of the speech source signals. Two acoustical conditions were tested: one in which room acoustics were simulated in a realistic time-forward fashion and one in which the room acoustics were reversed in time, causing reverberation and acoustic reflections to precede the direct-path energy. Significant decreases in speech intelligibility—from 89% on average to less than 25%—were observed between the time-forward and time-reversed rooms. This result is not predictable using standard methods for estimating speech intelligibility based on the modulation transfer function of the room. It may instead be due to increased degradation of onset information in the speech signals when room acoustics are time-reversed. PMID:19173377
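
    The core manipulation, reversing the room response while leaving the source signal intact, can be sketched by convolving a signal with a room impulse response and with its time-reversed copy; the exponentially decaying noise impulse response below is a crude stand-in for a measured or virtual-auditory-space response.

      # Sketch: rendering a signal through time-forward vs. time-reversed room acoustics
      # by convolution with a room impulse response (RIR).
      import numpy as np
      from scipy.signal import fftconvolve

      sr = 16000
      rng = np.random.default_rng(6)

      t_rir = np.arange(0, 0.5, 1 / sr)
      rir = rng.standard_normal(t_rir.size) * np.exp(-t_rir / 0.1)   # decaying reflections
      rir[0] = 1.0                                                    # direct path

      speech = rng.standard_normal(sr)            # 1 s stand-in for a speech signal

      forward  = fftconvolve(speech, rir)         # reverberation follows the direct sound
      reversed_ = fftconvolve(speech, rir[::-1])  # reflections precede the direct-path energy

      print("forward RMS:", np.sqrt(np.mean(forward ** 2)),
            "reversed RMS:", np.sqrt(np.mean(reversed_ ** 2)))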

  14. Irregular Speech Rate Dissociates Auditory Cortical Entrainment, Evoked Responses, and Frontal Alpha

    PubMed Central

    Kayser, Stephanie J.; Ince, Robin A.A.; Gross, Joachim

    2015-01-01

    The entrainment of slow rhythmic auditory cortical activity to the temporal regularities in speech is considered to be a central mechanism underlying auditory perception. Previous work has shown that entrainment is reduced when the quality of the acoustic input is degraded, but has also linked rhythmic activity at similar time scales to the encoding of temporal expectations. To understand these bottom-up and top-down contributions to rhythmic entrainment, we manipulated the temporal predictive structure of speech by parametrically altering the distribution of pauses between syllables or words, thereby rendering the local speech rate irregular while preserving intelligibility and the envelope fluctuations of the acoustic signal. Recording EEG activity in human participants, we found that this manipulation did not alter neural processes reflecting the encoding of individual sound transients, such as evoked potentials. However, the manipulation significantly reduced the fidelity of auditory delta (but not theta) band entrainment to the speech envelope. It also reduced left frontal alpha power and this alpha reduction was predictive of the reduced delta entrainment across participants. Our results show that rhythmic auditory entrainment in delta and theta bands reflect functionally distinct processes. Furthermore, they reveal that delta entrainment is under top-down control and likely reflects prefrontal processes that are sensitive to acoustical regularities rather than the bottom-up encoding of acoustic features. SIGNIFICANCE STATEMENT The entrainment of rhythmic auditory cortical activity to the speech envelope is considered to be critical for hearing. Previous work has proposed divergent views in which entrainment reflects either early evoked responses related to sound encoding or high-level processes related to expectation or cognitive selection. Using a manipulation of speech rate, we dissociated auditory entrainment at different time scales. Specifically, our

  15. Common neural substrates support speech and non-speech vocal tract gestures.

    PubMed

    Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M J; Poletto, Christopher J; Ludlow, Christy L

    2009-08-01

    The issue of whether speech is supported by the same neural substrates as non-speech vocal tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as non-sense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, was compared to the production of speech syllables without meaning. Brain activation related to overt production was captured with BOLD fMRI using a sparse sampling design for both conditions. Speech and non-speech were compared using voxel-wise whole brain analyses, and ROI analyses focused on frontal and temporoparietal structures previously reported to support speech production. Results showed substantial activation overlap between speech and non-speech function in regions. Although non-speech gesture production showed greater extent and amplitude of activation in the regions examined, both speech and non-speech showed comparable left laterality in activation for both target perception and production. These findings posit a more general role of the previously proposed "auditory dorsal stream" in the left hemisphere--to support the production of vocal tract gestures that are not limited to speech processing.

  16. Externality, internality, and (in)dispensability of grammatical features in speech production: evidence from Czech declension and conjugation.

    PubMed

    Bordag, Denisa; Pechmann, Thomas

    2009-03-01

    In 3 picture-word experiments, the authors explored the activation of 2 grammatical features in Czech during lexical access: declensional class of nouns and conjugational class of verbs. Experiments 1 and 2 demonstrated congruency effects of declensional and conjugational class, respectively. Picture naming times were reliably longer if the declensional or conjugational classes of the pictures' names and the distractors were incongruent. Experiment 3 explored the origin of the congruency effect in more detail. Congruency effects were obtained for declensional class regardless of whether the target name and the distractor differed in form, speaking for competition at the lemma level. These findings are discussed in comparison with gender congruency effects. The authors propose a differentiation between externally and internally specified features of lemmas, especially with respect to the time course of their activation. Internal features that become available only when the lemma is activated (e.g., gender, declensional or conjugational class of nouns and verbs) can be bypassed or not, depending on the grammatical specification of the earlier available external features (like case or number). Following this argument, supposedly inconsistent findings regarding grammatical gender and declensional or conjugational class can be explained straightforwardly.

  17. Inherited and de novo 22q11.2 distal duplications in two patients with autistic features, speech delay and no dysmorphology

    PubMed Central

    Hantash, Feras M.; Wang, Boris T.; Owen, Renius; Ross, Leslie P.; Mahon, Loretta W.; Boyar, Fatih Z.; Anguiano, Arturo; Strom, Charles M.

    2012-01-01

    In a screen of patients by fluorescence in-situ hybridization and array comparative genomic hybridization in the past two years (July 2007-July 2009), we identified two patients with duplications in the 22q11.22-23 region, occurring outside the common DiGeorge syndrome/velocardiofacial syndrome region. Fluorescent in-situ hybridization, multiplex ligation-dependent probe amplification and high-density bacterial artificial chromosome and oligo arrays were used to identify the extent of the duplications. In one patient the duplication extended from LCR22-E/5 to LCR22-H/8, which is similar to recently described 22q11.2 distal duplications, while in the second patient, a de novo duplication was identified extending from LCR22-E/5 to LCR22-F/6. The second proband also harbored a de novo 15q14 duplication, complicating phenotype interpretation. The patients were affected with speech delay and autistic features, but neither reported cardiac concerns nor dysmorphic features.

  18. Inherited and de novo 22q11.2 distal duplications in two patients with autistic features, speech delay and no dysmorphology.

    PubMed

    Hantash, Feras M; Wang, Boris T; Owen, Renius; Ross, Leslie P; Mahon, Loretta W; Boyar, Fatih Z; Anguiano, Arturo; Strom, Charles M

    2012-06-01

    In a screen of patients by fluorescence in-situ hybridization and array comparative genomic hybridization in the past two years (July 2007-July 2009), we identified two patients with duplications in the 22q11.22-23 region, occurring outside the common DiGeorge syndrome/velocardiofacial syndrome region. Fluorescent in-situ hybridization, multiplex ligation-dependent probe amplification and high-density bacterial artificial chromosome and oligo arrays were used to identify the extent of the duplications. In one patient the duplication extended from LCR22-E/5 to LCR22-H/8, which is similar to recently described 22q11.2 distal duplications, while in the second patient, a de novo duplication was identified extending from LCR22-E/5 to LCR22-F/6. The second proband also harbored a de novo 15q14 duplication, complicating phenotype interpretation. The patients were affected with speech delay and autistic features, but neither reported cardiac concerns nor dysmorphic features. PMID:27625811

  19. Interactions between distal speech rate, linguistic knowledge, and speech environment.

    PubMed

    Morrill, Tuuli; Baese-Berk, Melissa; Heffner, Christopher; Dilley, Laura

    2015-10-01

    During lexical access, listeners use both signal-based and knowledge-based cues, and information from the linguistic context can affect the perception of acoustic speech information. Recent findings suggest that the various cues used in lexical access are implemented with flexibility and may be affected by information from the larger speech context. We conducted 2 experiments to examine effects of a signal-based cue (distal speech rate) and a knowledge-based cue (linguistic structure) on lexical perception. In Experiment 1, we manipulated distal speech rate in utterances where an acoustically ambiguous critical word was either obligatory for the utterance to be syntactically well formed (e.g., Conner knew that bread and butter (are) both in the pantry) or optional (e.g., Don must see the harbor (or) boats). In Experiment 2, we examined identical target utterances as in Experiment 1 but changed the distribution of linguistic structures in the fillers. The results of the 2 experiments demonstrate that speech rate and linguistic knowledge about critical word obligatoriness can both influence speech perception. In addition, it is possible to alter the strength of a signal-based cue by changing information in the speech environment. These results provide support for models of word segmentation that include flexible weighting of signal-based and knowledge-based cues.

  20. Dynamic encoding of speech sequence probability in human temporal cortex.

    PubMed

    Leonard, Matthew K; Bouchard, Kristofer E; Tang, Claire; Chang, Edward F

    2015-05-01

    Sensory processing involves identification of stimulus features, but also integration with the surrounding sensory and cognitive context. Previous work in animals and humans has shown fine-scale sensitivity to context in the form of learned knowledge about the statistics of the sensory environment, including relative probabilities of discrete units in a stream of sequential auditory input. These statistics are a defining characteristic of one of the most important sequential signals humans encounter: speech. For speech, extensive exposure to a language tunes listeners to the statistics of sound sequences. To address how speech sequence statistics are neurally encoded, we used high-resolution direct cortical recordings from human lateral superior temporal cortex as subjects listened to words and nonwords with varying transition probabilities between sound segments. In addition to their sensitivity to acoustic features (including contextual features, such as coarticulation), we found that neural responses dynamically encoded the language-level probability of both preceding and upcoming speech sounds. Transition probability first negatively modulated neural responses, followed by positive modulation of neural responses, consistent with coordinated predictive and retrospective recognition processes, respectively. Furthermore, transition probability encoding was different for real English words compared with nonwords, providing evidence for online interactions with high-order linguistic knowledge. These results demonstrate that sensory processing of deeply learned stimuli involves integrating physical stimulus features with their contextual sequential structure. Despite not being consciously aware of phoneme sequence statistics, listeners use this information to process spoken input and to link low-level acoustic representations with linguistic information about word identity and meaning.
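
    The transition probabilities at issue are simple corpus statistics; the sketch below estimates phone-to-phone transition probabilities from a toy set of phoneme sequences, purely to illustrate the quantity being manipulated.

      # Sketch: estimating phone-to-phone transition probabilities from phoneme sequences.
      # The tiny "corpus" of phone strings below is purely illustrative.
      from collections import Counter

      corpus = [["k", "ae", "t"], ["k", "ae", "p"], ["b", "ae", "t"], ["k", "ih", "t"]]

      bigrams = Counter()
      unigrams = Counter()
      for seq in corpus:
          for a, b in zip(seq[:-1], seq[1:]):
              bigrams[(a, b)] += 1
              unigrams[a] += 1

      def transition_prob(a, b):
          """P(b | a) with a small additive floor to avoid zero probabilities."""
          return (bigrams[(a, b)] + 0.01) / (unigrams[a] + 0.01 * len(unigrams))

      print("P(ae | k) =", round(transition_prob("k", "ae"), 3))
      print("P(t | ae) =", round(transition_prob("ae", "t"), 3))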

  1. Speech-on-speech masking with variable access to the linguistic content of the masker speech.

    PubMed

    Calandruccio, Lauren; Dhar, Sumitrajit; Bradlow, Ann R

    2010-08-01

    It has been reported that listeners can benefit from a release in masking when the masker speech is spoken in a language that differs from the target speech compared to when the target and masker speech are spoken in the same language [Freyman, R. L. et al. (1999). J. Acoust. Soc. Am. 106, 3578-3588; Van Engen, K., and Bradlow, A. (2007), J. Acoust. Soc. Am. 121, 519-526]. It is unclear whether listeners benefit from this release in masking due to the lack of linguistic interference of the masker speech, from acoustic and phonetic differences between the target and masker languages, or a combination of these differences. In the following series of experiments, listeners' sentence recognition was evaluated using speech and noise maskers that varied in the amount of linguistic content, including native-English, Mandarin-accented English, and Mandarin speech. Results from three experiments indicated that the majority of differences observed between the linguistic maskers could be explained by spectral differences between the masker conditions. However, when the recognition task increased in difficulty, i.e., at a more challenging signal-to-noise ratio, a greater decrease in performance was observed for the maskers with more linguistically relevant information than what could be explained by spectral differences alone. PMID:20707455
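
    Experiments of this kind mix target and masker speech at controlled signal-to-noise ratios; the sketch below shows the standard scaling used to set a requested SNR, applied here to synthetic noise stand-ins for recorded sentences.

      # Sketch: mixing target and masker signals at a specified signal-to-noise ratio.
      import numpy as np

      def mix_at_snr(target, masker, snr_db):
          """Scale the masker so the target-to-masker power ratio equals snr_db."""
          p_target = np.mean(target ** 2)
          p_masker = np.mean(masker ** 2)
          gain = np.sqrt(p_target / (p_masker * 10 ** (snr_db / 10)))
          return target + gain * masker

      rng = np.random.default_rng(7)
      target = rng.standard_normal(16000)
      masker = 0.5 * rng.standard_normal(16000)

      for snr in (-5, 0, 5):
          mix = mix_at_snr(target, masker, snr)
          achieved = 10 * np.log10(np.mean(target ** 2) / np.mean((mix - target) ** 2))
          print(f"requested {snr:+d} dB, achieved {achieved:+.2f} dB")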

  2. Robust cortical encoding of slow temporal modulations of speech.

    PubMed

    Ding, Nai; Simon, Jonathan Z

    2013-01-01

    This study investigates the neural representation of speech in complex listening environments. Subjects listened to a narrated story, masked by either another speech stream or by stationary noise. Neural recordings were made using magnetoencephalography (MEG), which can measure cortical activity synchronized to the temporal envelope of speech. When two speech streams are presented simultaneously, cortical activity is predominantly synchronized to the speech stream the listener attends to, even if the unattended, competing-speech stream is more intense (up to 8 dB). When speech is presented together with spectrally matched stationary noise, cortical activity remains precisely synchronized to the temporal envelope of speech until the noise is 9 dB more intense. Critically, the precision of the neural synchronization to speech predicts subjectively rated speech intelligibility in noise. Further analysis reveals that it is longer-latency (∼100 ms) neural responses, but not shorter-latency (∼50 ms) neural responses, that show selectivity to the attended speech and invariance to background noise. This indicates a processing transition, from encoding the acoustic scene to encoding the behaviorally important auditory object, in auditory cortex. In sum, it is demonstrated that neural synchronization to the speech envelope is robust to acoustic interference, whether speech or noise, and therefore provides a strong candidate for the neural basis of acoustic-background invariant speech recognition. PMID:23716243
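
    A minimal sketch of the general idea of envelope tracking, assuming a Hilbert-transform envelope low-pass filtered below 10 Hz and a simple lagged correlation with a simulated response; this is illustrative only and not the MEG analysis pipeline used in the study.

    ```python
    # Minimal sketch: extract a slow (<10 Hz) speech envelope and correlate it
    # with a simulated "neural" signal at a single ~100 ms lag.
    import numpy as np
    from scipy.signal import hilbert, butter, filtfilt

    fs = 1000                                   # Hz, assumed common sampling rate
    t = np.arange(0, 10, 1 / fs)
    speech_like = np.random.randn(t.size) * (1 + np.sin(2 * np.pi * 4 * t))  # 4 Hz AM noise

    envelope = np.abs(hilbert(speech_like))     # broadband amplitude envelope
    b, a = butter(4, 10 / (fs / 2), btype="low")
    slow_env = filtfilt(b, a, envelope)         # keep only slow modulations

    neural = np.roll(slow_env, 100) + 0.5 * np.random.randn(t.size)  # toy response, ~100 ms lag
    lag = 100
    r = np.corrcoef(slow_env[:-lag], neural[lag:])[0, 1]
    print(f"envelope-response correlation at 100 ms lag: {r:.2f}")
    ```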

  3. Acoustic and Perceptual Correlates of Stress in Nonwords Produced by Children with Suspected Developmental Apraxia of Speech and Children with Phonological Disorder.

    ERIC Educational Resources Information Center

    Munson, Benjamin; Bjorum, Elissa M.; Windsor, Jennifer

    2003-01-01

    This study examined whether accuracy in producing linguistic stress reliably distinguished between five children with suspected developmental apraxia of speech (sDAS) and five children with phonological disorder (PD). No group differences in the production of stress were found; however, listeners judged that nonword repetitions of the children…

  4. REGARDING THE LINE-OF-SIGHT BARYONIC ACOUSTIC FEATURE IN THE SLOAN DIGITAL SKY SURVEY AND BARYON OSCILLATION SPECTROSCOPIC SURVEY LUMINOUS RED GALAXY SAMPLES

    SciTech Connect

    Kazin, Eyal A.; Blanton, Michael R.; Scoccimarro, Roman; McBride, Cameron K.; Berlind, Andreas A.

    2010-08-20

    We analyze the line-of-sight baryonic acoustic feature in the two-point correlation function ξ of the Sloan Digital Sky Survey luminous red galaxy (LRG) sample (0.16 < z < 0.47). By defining a narrow line-of-sight region, r_p < 5.5 h⁻¹ Mpc, where r_p is the transverse separation component, we measure a strong excess of clustering at ~110 h⁻¹ Mpc, as previously reported in the literature. We also test these results in an alternative coordinate system, by defining the line of sight as θ < 3°, where θ is the opening angle. This clustering excess appears much stronger than the feature in the better-measured monopole. A fiducial ΛCDM nonlinear model in redshift space predicts a much weaker signature. We use realistic mock catalogs to model the expected signal and noise. We find that the line-of-sight measurements can be explained well by our mocks as well as by a featureless ξ = 0. We conclude that there is no convincing evidence that the strong clustering measurement is the line-of-sight baryonic acoustic feature. We also evaluate how detectable such a signal would be in the upcoming Baryon Oscillation Spectroscopic Survey (BOSS) LRG volume. Mock LRG catalogs (z < 0.6) suggest that (1) the narrow line-of-sight cylinder and cone defined above probably will not reveal a detectable acoustic feature in BOSS; (2) a clustering measurement as high as that in the current sample can be ruled out (or confirmed) at a high confidence level using a BOSS-sized data set; (3) an analysis with wider angular cuts, which provide better signal-to-noise ratios, can nevertheless be used to compare line-of-sight and transverse distances, and thereby constrain the expansion rate H(z) and diameter distance D_A(z).
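
    For background, two-point correlation functions of this kind are commonly estimated from data and random catalogs with the Landy-Szalay estimator; the form below is shown only for orientation and is not necessarily the exact estimator used in the record above.

    ```latex
    % Landy-Szalay estimator for the two-point correlation function,
    % binned by transverse (r_p) and line-of-sight (pi) separation:
    \xi(r_p, \pi) = \frac{DD(r_p, \pi) - 2\,DR(r_p, \pi) + RR(r_p, \pi)}{RR(r_p, \pi)}
    % DD, DR, RR: normalized pair counts in the data-data, data-random,
    % and random-random catalogs, respectively.
    ```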

  5. Speech for the Deaf Child: Knowledge and Use.

    ERIC Educational Resources Information Center

    Connor, Leo E., Ed.

    Presented is a collection of 16 papers on speech development, handicaps, teaching methods, and educational trends for the aurally handicapped child. Arthur Boothroyd relates acoustic phonetics to speech teaching, and Jean Utley Lehman investigates a scheme of linguistic organization. Differences in speech production by deaf and normal hearing…

  6. Breathing-Impaired Speech after Brain Haemorrhage: A Case Study

    ERIC Educational Resources Information Center

    Heselwood, Barry

    2007-01-01

    Results are presented from an auditory and acoustic analysis of the speech of an adult male with impaired prosody and articulation due to brain haemorrhage. They show marked effects on phonation, speech rate and articulator velocity, and a speech rhythm disrupted by "intrusive" stresses. These effects are discussed in relation to the speaker's…

  7. Brainstem transcription of speech is disrupted in children with autism spectrum disorders.

    PubMed

    Russo, Nicole; Nicol, Trent; Trommer, Barbara; Zecker, Steve; Kraus, Nina

    2009-07-01

    Language impairment is a hallmark of autism spectrum disorders (ASD). The origin of the deficit is poorly understood although deficiencies in auditory processing have been detected in both perception and cortical encoding of speech sounds. Little is known about the processing and transcription of speech sounds at earlier (brainstem) levels or about how background noise may impact this transcription process. Unlike cortical encoding of sounds, brainstem representation preserves stimulus features with a degree of fidelity that enables a direct link between acoustic components of the speech syllable (e.g. onsets) to specific aspects of neural encoding (e.g. waves V and A). We measured brainstem responses to the syllable /da/, in quiet and background noise, in children with and without ASD. Children with ASD exhibited deficits in both the neural synchrony (timing) and phase locking (frequency encoding) of speech sounds, despite normal click-evoked brainstem responses. They also exhibited reduced magnitude and fidelity of speech-evoked responses and inordinate degradation of responses by background noise in comparison to typically developing controls. Neural synchrony in noise was significantly related to measures of core and receptive language ability. These data support the idea that abnormalities in the brainstem processing of speech contribute to the language impairment in ASD. Because it is both passively elicited and malleable, the speech-evoked brainstem response may serve as a clinical tool to assess auditory processing as well as the effects of auditory training in the ASD population.

  8. Brainstem transcription of speech is disrupted in children with autism spectrum disorders

    PubMed Central

    Russo, Nicole; Nicol, Trent; Trommer, Barbara; Zecker, Steve; Kraus, Nina

    2009-01-01

    Language impairment is a hallmark of autism spectrum disorders (ASD). The origin of the deficit is poorly understood although deficiencies in auditory processing have been detected in both perception and cortical encoding of speech sounds. Little is known about the processing and transcription of speech sounds at earlier (brainstem) levels or about how background noise may impact this transcription process. Unlike cortical encoding of sounds, brainstem representation preserves stimulus features with a degree of fidelity that enables a direct link between acoustic components of the speech syllable (e.g., onsets) to specific aspects of neural encoding (e.g., waves V and A). We measured brainstem responses to the syllable /da/, in quiet and background noise, in children with and without ASD. Children with ASD exhibited deficits in both the neural synchrony (timing) and phase locking (frequency encoding) of speech sounds, despite normal click-evoked brainstem responses. They also exhibited reduced magnitude and fidelity of speech-evoked responses and inordinate degradation of responses by background noise in comparison to typically developing controls. Neural synchrony in noise was significantly related to measures of core and receptive language ability. These data support the idea that abnormalities in the brainstem processing of speech contribute to the language impairment in ASD. Because it is both passively-elicited and malleable, the speech-evoked brainstem response may serve as a clinical tool to assess auditory processing as well as the effects of auditory training in the ASD population. PMID:19635083

  9. Detection and Classification of Whale Acoustic Signals

    NASA Astrophysics Data System (ADS)

    Xian, Yin

    vocalization data set. The word error rate of the DCTNet feature is similar to the MFSC in speech recognition tasks, suggesting that the convolutional network is able to reveal acoustic content of speech signals.

  10. Proposal for Classifying the Severity of Speech Disorder Using a Fuzzy Model in Accordance with the Implicational Model of Feature Complexity

    ERIC Educational Resources Information Center

    Brancalioni, Ana Rita; Magnago, Karine Faverzani; Keske-Soares, Marcia

    2012-01-01

    The objective of this study is to create a new proposal for classifying the severity of speech disorders using a fuzzy model in accordance with a linguistic model that represents the speech acquisition of Brazilian Portuguese. The fuzzy linguistic model was run in the MATLAB software fuzzy toolbox from a set of fuzzy rules, and it encompassed…

  11. Speech Development

    MedlinePlus

    … Lip and Palate. Bzoch (1997). Cleft Palate Speech Management: A Multidisciplinary Approach. Shprintzen, Bardach (1995). Cleft Palate: …

  12. Speech Problems

    MedlinePlus

    ... a person's ability to speak clearly. Some Common Speech Disorders Stuttering is a problem that interferes with fluent ... is a language disorder, while stuttering is a speech disorder. A person who stutters has trouble getting out ...

  13. Classroom Acoustics: Understanding Barriers to Learning.

    ERIC Educational Resources Information Center

    Crandell, Carl C., Ed.; Smaldino, Joseph J., Ed.

    2001-01-01

    This booklet explores classroom acoustics and their importance to the learning potential of children with hearing loss and related disabilities. The booklet also reviews research on classroom acoustics and the need for the development of classroom acoustics standards. Chapters examine: 1) a speech-perception model demonstrating the linkage between…

  14. Musical melody and speech intonation: singing a different tune.

    PubMed

    Zatorre, Robert J; Baum, Shari R

    2012-01-01

    Music and speech are often cited as characteristically human forms of communication. Both share the features of hierarchical structure, complex sound systems, and sensorimotor sequencing demands, and both are used to convey and influence emotions, among other functions [1]. Both music and speech also prominently use acoustical frequency modulations, perceived as variations in pitch, as part of their communicative repertoire. Given these similarities, and the fact that pitch perception and production involve the same peripheral transduction system (cochlea) and the same production mechanism (vocal tract), it might be natural to assume that pitch processing in speech and music would also depend on the same underlying cognitive and neural mechanisms. In this essay we argue that the processing of pitch information differs significantly for speech and music; specifically, we suggest that there are two pitch-related processing systems, one for more coarse-grained, approximate analysis and one for more fine-grained accurate representation, and that the latter is unique to music. More broadly, this dissociation offers clues about the interface between sensory and motor systems, and highlights the idea that multiple processing streams are a ubiquitous feature of neuro-cognitive architectures.

  15. Environment-dependent denoising autoencoder for distant-talking speech recognition

    NASA Astrophysics Data System (ADS)

    Ueda, Yuma; Wang, Longbiao; Kai, Atsuhiko; Ren, Bo

    2015-12-01

    In this paper, we propose an environment-dependent denoising autoencoder (DAE) and automatic environment identification based on a deep neural network (DNN) with blind reverberation estimation for robust distant-talking speech recognition. Recently, DAEs have been shown to be effective in many noise reduction and reverberation suppression applications because higher-level representations and increased flexibility of the feature mapping function can be learned. However, a DAE is not adequate in mismatched training and test environments. In a conventional DAE, parameters are trained using pairs of reverberant speech and clean speech under various acoustic conditions (that is, an environment-independent DAE). To address the above problem, we propose two environment-dependent DAEs to reduce the influence of mismatches between training and test environments. In the first approach, we train various DAEs using speech from different acoustic environments, and the DAE for the condition that best matches the test condition is automatically selected (that is, a two-step environment-dependent DAE). To improve environment identification performance, we propose a DNN that uses both reverberant speech and estimated reverberation. In the second approach, we add estimated reverberation features to the input of the DAE (that is, a one-step environment-dependent DAE or a reverberation-aware DAE). The proposed method is evaluated using speech in simulated and real reverberant environments. Experimental results show that the environment-dependent DAE outperforms the environment-independent one in both simulated and real reverberant environments. For the two-step environment-dependent DAE, the performance of environment identification based on the proposed DNN approach is also better than that of the conventional DNN approach, in which only reverberant speech is used and reverberation is not blindly estimated. Moreover, the one-step environment-dependent DAE significantly outperforms the two-step environment-dependent DAE.
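
    A minimal sketch of a denoising autoencoder of the general kind described, trained on pairs of reverberant and clean feature vectors; the architecture, feature dimension, and data below are placeholder assumptions rather than the authors' environment-dependent system (a reverberation-aware variant would simply append estimated reverberation features to the input).

    ```python
    # Minimal sketch (PyTorch): a denoising autoencoder mapping reverberant
    # feature vectors to clean targets. All data here are random placeholders.
    import torch
    import torch.nn as nn

    feat_dim = 40                                 # e.g., log mel filterbank size (assumed)
    dae = nn.Sequential(
        nn.Linear(feat_dim, 256), nn.ReLU(),      # encoder
        nn.Linear(256, feat_dim),                 # decoder back to the feature space
    )
    optimizer = torch.optim.Adam(dae.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    clean = torch.randn(1024, feat_dim)                       # placeholder clean features
    reverberant = clean + 0.3 * torch.randn(1024, feat_dim)   # placeholder distorted inputs

    for epoch in range(10):
        optimizer.zero_grad()
        loss = loss_fn(dae(reverberant), clean)   # reconstruct clean from reverberant
        loss.backward()
        optimizer.step()
    ```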

  16. Speech perception at the interface of neurobiology and linguistics.

    PubMed

    Poeppel, David; Idsardi, William J; van Wassenhove, Virginie

    2008-03-12

    Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate discrete representations that make contact with the lexical representations stored in long-term memory as output. Because the perceptual objects that are recognized by speech perception enter into subsequent linguistic computation, the format that is used for lexical representation and processing fundamentally constrains the speech perceptual processes. Consequently, theories of speech perception must, at some level, be tightly linked to theories of lexical representation. Minimally, speech perception must yield representations that smoothly and rapidly interface with stored lexical items. Adopting the perspective of Marr, we argue and provide neurobiological and psychophysical evidence for the following research programme. First, at the implementational level, speech perception is a multi-time resolution process, with perceptual analyses occurring concurrently on at least two time scales (approx. 20-80 ms, approx. 150-300 ms), commensurate with (sub)segmental and syllabic analyses, respectively. Second, at the algorithmic level, we suggest that perception proceeds on the basis of internal forward models, or uses an 'analysis-by-synthesis' approach. Third, at the computational level (in the sense of Marr), the theory of lexical representation that we adopt is principally informed by phonological research and assumes that words are represented in the mental lexicon in terms of sequences of discrete segments composed of distinctive features. One important goal of the research programme is to develop linking hypotheses between putative neurobiological primitives (e.g. temporal primitives) and those primitives derived from linguistic inquiry, to arrive ultimately at a biologically sensible and theoretically satisfying model of representation and computation in speech.

  17. Automatic audiovisual integration in speech perception.

    PubMed

    Gentilucci, Maurizio; Cattaneo, Luigi

    2005-11-01

    Two experiments aimed to determine whether features of both the visual and acoustical inputs are always merged into the perceived representation of speech and whether this audiovisual integration is based on either cross-modal binding functions or on imitation. In a McGurk paradigm, observers were required to repeat aloud a string of phonemes uttered by an actor (acoustical presentation of phonemic string) whose mouth, in contrast, mimicked pronunciation of a different string (visual presentation). In a control experiment participants read the same printed strings of letters. This condition aimed to analyze the pattern of voice and the lip kinematics controlling for imitation. In the control experiment and in the congruent audiovisual presentation, i.e. when the articulation mouth gestures were congruent with the emission of the string of phones, the voice spectrum and the lip kinematics varied according to the pronounced strings of phonemes. In the McGurk paradigm the participants were unaware of the incongruence between visual and acoustical stimuli. The acoustical analysis of the participants' spoken responses showed three distinct patterns: the fusion of the two stimuli (the McGurk effect), repetition of the acoustically presented string of phonemes, and, less frequently, of the string of phonemes corresponding to the mouth gestures mimicked by the actor. However, the analysis of the latter two responses showed that the formant 2 of the participants' voice spectra always differed from the value recorded in the congruent audiovisual presentation. It approached the value of the formant 2 of the string of phonemes presented in the other modality, which was apparently ignored. The lip kinematics of the participants repeating the string of phonemes acoustically presented were influenced by the observation of the lip movements mimicked by the actor, but only when pronouncing a labial consonant. The data are discussed in favor of the hypothesis that features of both the visual and acoustical inputs are merged into the perceived representation of speech.

  18. Perception of Speech Reflects Optimal Use of Probabilistic Speech Cues

    ERIC Educational Resources Information Center

    Clayards, Meghan; Tanenhaus, Michael K.; Aslin, Richard N.; Jacobs, Robert A.

    2008-01-01

    Listeners are exquisitely sensitive to fine-grained acoustic detail within phonetic categories for sounds and words. Here we show that this sensitivity is optimal given the probabilistic nature of speech cues. We manipulated the probability distribution of one probabilistic cue, voice onset time (VOT), which differentiates word initial labial…

  19. Multi-voxel Patterns Reveal Functionally Differentiated Networks Underlying Auditory Feedback Processing of Speech

    PubMed Central

    Zheng, Zane Z.; Vicente-Grabovetsky, Alejandro; MacDonald, Ewen N.; Munhall, Kevin G.; Cusack, Rhodri; Johnsrude, Ingrid S.

    2013-01-01

    The everyday act of speaking involves the complex processes of speech motor control. An important component of control is monitoring, detection and processing of errors when auditory feedback does not correspond to the intended motor gesture. Here we show, using fMRI and converging operations within a multi-voxel pattern analysis framework, that this sensorimotor process is supported by functionally differentiated brain networks. During scanning, a real-time speech-tracking system was employed to deliver two acoustically different types of distorted auditory feedback or unaltered feedback while human participants were vocalizing monosyllabic words, and to present the same auditory stimuli while participants were passively listening. Whole-brain analysis of neural-pattern similarity revealed three functional networks that were differentially sensitive to distorted auditory feedback during vocalization, compared to during passive listening. One network of regions appears to encode an ‘error signal’ irrespective of acoustic features of the error: this network, including right angular gyrus, right supplementary motor area, and bilateral cerebellum, yielded consistent neural patterns across acoustically different, distorted feedback types, only during articulation (not during passive listening). In contrast, a fronto-temporal network appears sensitive to the speech features of auditory stimuli during passive listening; this preference for speech features was diminished when the same stimuli were presented as auditory concomitants of vocalization. A third network, showing a distinct functional pattern from the other two, appears to capture aspects of both neural response profiles. Taken together, our findings suggest that auditory feedback processing during speech motor control may rely on multiple, interactive, functionally differentiated neural systems. PMID:23467350

  20. High-speed imaging, acoustic features, and aeroacoustic computations of jet noise from Strombolian (and Vulcanian) explosions

    NASA Astrophysics Data System (ADS)

    Taddeucci, J.; Sesterhenn, J.; Scarlato, P.; Stampka, K.; Del Bello, E.; Pena Fernandez, J. J.; Gaudin, D.

    2014-05-01

    High-speed imaging of explosive eruptions at Stromboli (Italy), Fuego (Guatemala), and Yasur (Vanuatu) volcanoes allowed visualization of pressure waves from seconds-long explosions. From the explosion jets, waves radiate with variable geometry, timing, and apparent direction and velocity. Both the explosion jets and their wave fields are replicated well by numerical simulations of supersonic jets impulsively released from a pressurized vessel. The scaled acoustic signal from one explosion at Stromboli displays a frequency pattern with an excellent match to those from the simulated jets. We conclude that both the observed waves and the audible sound from the explosions are jet noise, i.e., the typical acoustic field radiating from high-velocity jets. Volcanic jet noise was previously quantified only in the infrasonic emissions from large, sub-Plinian to Plinian eruptions. Our combined approach allows us to define the spatial and temporal evolution of audible jet noise from supersonic jets in small-scale volcanic eruptions.
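
    For orientation, jet-noise spectra are commonly characterized by the Strouhal number St = fD/U; the sketch below converts an assumed St, jet velocity, and vent diameter into a peak frequency, with all values illustrative rather than taken from these observations.

    ```python
    # Minimal sketch: peak jet-noise frequency from a target Strouhal number,
    # St = f * D / U. The Strouhal number, jet velocity, and vent diameter below
    # are illustrative assumptions, not measurements from the study.
    def peak_frequency(strouhal, jet_velocity_m_s, vent_diameter_m):
        return strouhal * jet_velocity_m_s / vent_diameter_m

    print(peak_frequency(strouhal=0.2, jet_velocity_m_s=200.0, vent_diameter_m=2.0))  # 20.0 Hz
    ```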

  1. Embedding speech into virtual realities

    NASA Technical Reports Server (NTRS)

    Bohn, Christian-Arved; Krueger, Wolfgang

    1993-01-01

    In this work, a speaker-independent speech recognition system suitable for implementation in Virtual Reality applications is presented. The use of an artificial neural network in connection with a special compression of the acoustic input leads to a system that is robust, fast, and easy to use, and that needs no additional hardware besides common VR equipment.

  2. Subjective and objective assessments in classrooms following acoustical renovation

    NASA Astrophysics Data System (ADS)

    Astolfi, Arianna

    2005-04-01

    The effectiveness of an expensive acoustical intervention in an old Italian high school building has been assessed in this work. The school building has fifty classrooms, the majority of which were acoustically renovated. A subjective survey and measurements were performed in both the renovated and non-renovated classrooms. With the assistance of psychologists from Turin University, a questionnaire was set up for the subjective analysis. The questionnaire, validated after numerous pilot tests, was submitted to all the students and the teachers in two periods of the year. The questions on acoustical features covered annoyance from room noise, reverberation, speech comprehension, overall acoustical satisfaction, and the consequences of bad acoustical conditions. Apart from the acoustics, other aspects of environmental quality, such as the thermal and visual features and indoor air quality (IAQ), were investigated. The statistical analysis of the subjective answers allowed aggregated information to be obtained on the users and different data to be correlated. The aim of the statistical correlation was to determine any significant relationships between the objective and subjective data, and between the overall satisfaction scores and the different environmental factors. The effects of bad environmental conditions and their influence on learning capacity were also examined.

  3. Intelligibility of laryngectomees' substitute speech: automatic speech recognition and subjective rating.

    PubMed

    Schuster, Maria; Haderlein, Tino; Nöth, Elmar; Lohscheller, Jörg; Eysholdt, Ulrich; Rosanowski, Frank

    2006-02-01

    Substitute speech after laryngectomy is characterized by restricted aero-acoustic properties in comparison with laryngeal speech and therefore has lower intelligibility. Until now, an objective means to determine and quantify the intelligibility has not existed, although the intelligibility can serve as a global outcome parameter of voice restoration after laryngectomy. An automatic speech recognition system was applied to recordings of a standard text read by 18 German male laryngectomees with tracheoesophageal substitute speech. The system was trained with normal laryngeal speakers and not adapted to severely disturbed voices. Substitute speech was compared to laryngeal speech of a control group. Subjective evaluation of intelligibility was performed by a panel of five experts and compared to automatic speech evaluation. Substitute speech showed a lower syllable rate (syllables/s) and lower word accuracy than laryngeal speech. Automatic speech recognition for substitute speech yielded word accuracies between 10.0% and 50.0% (28.7 ± 12.1%) with sufficient discrimination. These results agreed with the experts' subjective evaluations of intelligibility. The multi-rater kappa of the experts alone did not differ from the multi-rater kappa of experts and the recognizer. Automatic speech recognition serves as a good means to objectify and quantify the global speech outcome of laryngectomees. For clinical use, the speech recognition system will be adapted to disturbed voices and can also be applied in other languages. PMID:16001246
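
    Word accuracy figures of this kind are conventionally derived from the edit-distance alignment between the reference transcript and the recognizer output. A minimal sketch (toy sentences, not the German read text used in the study):

    ```python
    # Minimal sketch: word accuracy from the Levenshtein alignment between a
    # reference transcript and a recognizer's output (toy example sentences).
    def edit_distance(ref, hyp):
        """Dynamic-programming edit distance over word sequences."""
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
        return d[-1][-1]

    ref = "the north wind and the sun were disputing".split()
    hyp = "the north wind an the son were disputing".split()
    word_accuracy = 100.0 * (1 - edit_distance(ref, hyp) / len(ref))
    print(f"word accuracy: {word_accuracy:.1f}%")   # 75.0% for this toy pair
    ```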

  4. A fast algorithm for the phonemic segmentation of continuous speech

    NASA Astrophysics Data System (ADS)

    Smidt, D.

    1986-04-01

    The method of differential learning (DL method) was applied to the fast phonemic classification of acoustic speech spectra. The method was also tested with a simple algorithm for continuous speech recognition. In every learning step of the DL method, only the single pattern component that deviates most from the reference value is used to form a new rule. Several rules of this type were connected in a conjunctive or disjunctive way. Tests with a single speaker demonstrate good classification capability and very high speed. The automatic inclusion of additional features selected according to their relevance is discussed. It is shown that there is a correspondence between processes related to the DL method and pattern recognition in living beings, with their ability for generalization and differentiation.

  5. Neural pathways for visual speech perception

    PubMed Central

    Bernstein, Lynne E.; Liebenthal, Einat

    2014-01-01

    This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA) has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA. PMID:25520611

  6. Virtual acoustics displays

    NASA Technical Reports Server (NTRS)

    Wenzel, Elizabeth M.; Fisher, Scott S.; Stone, Philip K.; Foster, Scott H.

    1991-01-01

    The real time acoustic display capabilities are described which were developed for the Virtual Environment Workstation (VIEW) Project at NASA-Ames. The acoustic display is capable of generating localized acoustic cues in real time over headphones. An auditory symbology, a related collection of representational auditory 'objects' or 'icons', can be designed using ACE (Auditory Cue Editor), which links both discrete and continuously varying acoustic parameters with information or events in the display. During a given display scenario, the symbology can be dynamically coordinated in real time with 3-D visual objects, speech, and gestural displays. The types of displays feasible with the system range from simple warnings and alarms to the acoustic representation of multidimensional data or events.

  7. Applications for Subvocal Speech

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles; Betts, Bradley

    2007-01-01

    A research and development effort now underway is directed toward the use of subvocal speech for communication in settings in which (1) acoustic noise could interfere excessively with ordinary vocal communication and/or (2) acoustic silence or secrecy of communication is required. By "subvocal speech" is meant sub-audible electromyographic (EMG) signals, associated with speech, that are acquired from the surface of the larynx and lingual areas of the throat. Topics addressed in this effort include recognition of the sub-vocal EMG signals that represent specific original words or phrases; transformation (including encoding and/or enciphering) of the signals into forms that are less vulnerable to distortion, degradation, and/or interception; and reconstruction of the original words or phrases at the receiving end of a communication link. Potential applications include ordinary verbal communications among hazardous- material-cleanup workers in protective suits, workers in noisy environments, divers, and firefighters, and secret communications among law-enforcement officers and military personnel in combat and other confrontational situations.

  8. Revisiting speech interference in classrooms.

    PubMed

    Picard, M; Bradley, J S

    2001-01-01

    A review of the effects of ambient noise and reverberation on speech intelligibility in classrooms has been completed because of the long-standing lack of agreement on preferred acoustical criteria for unconstrained speech accessibility and communication in educational facilities. An overwhelming body of evidence has been collected to suggest that noise levels in particular are usually far in excess of any reasonable prescription for optimal conditions for understanding speech in classrooms. Quite surprisingly, poor classroom acoustics seem to be the prevailing condition for both normally-hearing and hearing-impaired students with reported A-weighted ambient noise levels 4-37 dB above values currently agreed upon to provide optimal understanding. Revision of currently proposed room acoustic performance criteria to ensure speech accessibility for all students indicates the need for a guideline weighted for age and one for more vulnerable groups. For teens (12-year-olds and older) and young adults having normal speech processing in noise, ambient noise levels not exceeding 40 dBA are suggested as acceptable, and reverberation times of about 0.5 s are concluded to be optimum. Younger students, having normal speech processing in noise for their age, would require noise levels ranging from 39 dBA for 10-11-year-olds to only 28.5 dBA for 6-7-year-olds. By contrast, groups suspected of delayed speech processing in noise may require levels as low as only 21.5 dBA at age 6-7. As one would expect, these more vulnerable students would include the hearing-impaired in the course of language development and non-native listeners. PMID:11688542

  9. Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

    PubMed

    Larm, Petra; Hongisto, Valtteri

    2006-02-01

    During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse. PMID:16521772
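
    As a rough sketch of how STI-family metrics turn modulation transfer values into a single index, the code below converts per-band m values into apparent SNRs, clips them to ±15 dB, rescales them to transmission indices, and averages with band weights; the m values and equal weights are illustrative assumptions, not the standardized IEC tables.

    ```python
    # Simplified sketch of an STI-style computation: modulation transfer values m
    # -> apparent SNR -> clipped to +/-15 dB -> transmission index -> weighted mean.
    import numpy as np

    m = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45])   # one m per octave band (toy values)
    weights = np.full(7, 1 / 7)                            # equal band weights (assumption)

    snr_apparent = 10 * np.log10(m / (1 - m))              # apparent speech-to-noise ratio
    snr_clipped = np.clip(snr_apparent, -15, 15)
    transmission_index = (snr_clipped + 15) / 30
    sti = float(np.sum(weights * transmission_index))
    print(f"STI ~ {sti:.2f}")                              # about 0.60 for these toy values
    ```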

  10. Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

    PubMed

    Larm, Petra; Hongisto, Valtteri

    2006-02-01

    During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.

  11. Discovering words in fluent speech: the contribution of two kinds of statistical information.

    PubMed

    Thiessen, Erik D; Erickson, Lucy C

    2012-01-01

    To efficiently segment fluent speech, infants must discover the predominant phonological form of words in the native language. In English, for example, content words typically begin with a stressed syllable. To discover this regularity, infants need to identify a set of words. We propose that statistical learning plays two roles in this process. First, it provides a cue that allows infants to segment words from fluent speech, even without language-specific phonological knowledge. Second, once infants have identified a set of lexical forms, they can learn from the distribution of acoustic features across those word forms. The current experiments demonstrate both processes are available to 5-month-old infants. This demonstration of sensitivity to statistical structure in speech, weighted more heavily than phonological cues to segmentation at an early age, is consistent with theoretical accounts that claim statistical learning plays a role in helping infants to adapt to the structure of their native language from very early in life.
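
    A minimal sketch of the statistical segmentation cue described above: transitional probabilities are estimated over a toy syllable stream, and word boundaries are posited where the probability dips; the artificial "language" and threshold are assumptions for illustration only.

    ```python
    # Minimal sketch: segment a syllable stream at dips in transitional
    # probability (TP). Toy "language" with three bisyllabic words.
    from collections import Counter, defaultdict
    import random

    words = [("ba", "do"), ("ti", "ku"), ("go", "la")]
    stream = [s for _ in range(200) for s in random.choice(words)]

    pair_counts = defaultdict(Counter)
    first_counts = Counter(stream[:-1])
    for a, b in zip(stream[:-1], stream[1:]):
        pair_counts[a][b] += 1

    def tp(a, b):
        """P(next syllable | current syllable), estimated from the stream."""
        return pair_counts[a][b] / first_counts[a]

    # Posit a word boundary wherever TP falls below a threshold (here 0.5):
    boundaries = [i + 1 for i, (a, b) in enumerate(zip(stream[:-1], stream[1:])) if tp(a, b) < 0.5]
    ```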

  12. Watch what you say, your computer might be listening: A review of automated speech recognition

    NASA Technical Reports Server (NTRS)

    Degennaro, Stephen V.

    1991-01-01

    Spoken language is the most convenient and natural means by which people interact with each other and is, therefore, a promising candidate for human-machine interactions. Speech also offers an additional channel for hands-busy applications, complementing the use of motor output channels for control. Current speech recognition systems vary considerably across a number of important characteristics, including vocabulary size, speaking mode, training requirements for new speakers, robustness to acoustic environments, and accuracy. Algorithmically, these systems range from rule-based techniques through more probabilistic or self-learning approaches such as hidden Markov modeling and neural networks. This tutorial begins with a brief summary of the relevant features of current speech recognition systems and the strengths and weaknesses of the various algorithmic approaches.

  13. Discovering words in fluent speech: the contribution of two kinds of statistical information.

    PubMed

    Thiessen, Erik D; Erickson, Lucy C

    2012-01-01

    To efficiently segment fluent speech, infants must discover the predominant phonological form of words in the native language. In English, for example, content words typically begin with a stressed syllable. To discover this regularity, infants need to identify a set of words. We propose that statistical learning plays two roles in this process. First, it provides a cue that allows infants to segment words from fluent speech, even without language-specific phonological knowledge. Second, once infants have identified a set of lexical forms, they can learn from the distribution of acoustic features across those word forms. The current experiments demonstrate both processes are available to 5-month-old infants. This demonstration of sensitivity to statistical structure in speech, weighted more heavily than phonological cues to segmentation at an early age, is consistent with theoretical accounts that claim statistical learning plays a role in helping infants to adapt to the structure of their native language from very early in life. PMID:23335903

  14. Auditory-neurophysiological responses to speech during early childhood: Effects of background noise.

    PubMed

    White-Schwoch, Travis; Davies, Evan C; Thompson, Elaine C; Woodruff Carr, Kali; Nicol, Trent; Bradlow, Ann R; Kraus, Nina

    2015-10-01

    Early childhood is a critical period of auditory learning, during which children are constantly mapping sounds to meaning. But this auditory learning rarely occurs in ideal listening conditions-children are forced to listen against a relentless din. This background noise degrades the neural coding of these critical sounds, in turn interfering with auditory learning. Despite the importance of robust and reliable auditory processing during early childhood, little is known about the neurophysiology underlying speech processing in children so young. To better understand the physiological constraints these adverse listening scenarios impose on speech sound coding during early childhood, auditory-neurophysiological responses were elicited to a consonant-vowel syllable in quiet and background noise in a cohort of typically-developing preschoolers (ages 3-5 yr). Overall, responses were degraded in noise: they were smaller, less stable across trials, slower, and there was poorer coding of spectral content and the temporal envelope. These effects were exacerbated in response to the consonant transition relative to the vowel, suggesting that the neural coding of spectrotemporally-dynamic speech features is more tenuous in noise than the coding of static features-even in children this young. Neural coding of speech temporal fine structure, however, was more resilient to the addition of background noise than coding of temporal envelope information. Taken together, these results demonstrate that noise places a neurophysiological constraint on speech processing during early childhood by causing a breakdown in neural processing of speech acoustics. These results may explain why some listeners have inordinate difficulties understanding speech in noise. Speech-elicited auditory-neurophysiological responses offer objective insight into listening skills during early childhood by reflecting the integrity of neural coding in quiet and noise; this paper documents typical response

  15. Modifying Speech to Children based on their Perceived Phonetic Accuracy

    PubMed Central

    Julien, Hannah M.; Munson, Benjamin

    2014-01-01

    Purpose We examined the relationship between adults' perception of the accuracy of children's speech, and acoustic detail in their subsequent productions to children. Methods Twenty-two adults participated in a task in which they rated the accuracy of 2- and 3-year-old children's word-initial /s/and /∫/ using a visual analog scale (VAS), then produced a token of the same word as if they were responding to the child whose speech they had just rated. Result The duration of adults' fricatives varied as a function of their perception of the accuracy of children's speech: longer fricatives were produced following productions that they rated as inaccurate. This tendency to modify duration in response to perceived inaccurate tokens was mediated by measures of self-reported experience interacting with children. However, speakers did not increase the spectral distinctiveness of their fricatives following the perception of inaccurate tokens. Conclusion These results suggest that adults modify temporal features of their speech in response to perceiving children's inaccurate productions. These longer fricatives are potentially both enhanced input to children, and an error-corrective signal. PMID:22744140

  16. Articulatory-to-Acoustic Relations in Response to Speaking Rate and Loudness Manipulations

    ERIC Educational Resources Information Center

    Mefferd, Antje S.; Green, Jordan R.

    2010-01-01

    Purpose: In this investigation, the authors determined the strength of association between tongue kinematic and speech acoustics changes in response to speaking rate and loudness manipulations. Performance changes in the kinematic and acoustic domains were measured using two aspects of speech production presumably affecting speech clarity:…

  17. An Acoustic Study of the Relationships among Neurologic Disease, Dysarthria Type, and Severity of Dysarthria

    ERIC Educational Resources Information Center

    Kim, Yunjung; Kent, Raymond D.; Weismer, Gary

    2011-01-01

    Purpose: This study examined acoustic predictors of speech intelligibility in speakers with several types of dysarthria secondary to different diseases and conducted classification analysis solely by acoustic measures according to 3 variables (disease, speech severity, and dysarthria type). Method: Speech recordings from 107 speakers with…

  18. An Investigation of Place and Voice Features Using fMRI-Adaptation.

    PubMed

    Lawyer, Laurel; Corina, David

    2014-01-01

    A widely accepted view of speech perception holds that in order to comprehend language, the variable acoustic signal must be parsed into a set of abstract linguistic representations. However, the neural basis of early phonological processing, including the nature of featural encoding of speech, is still poorly understood. In part, progress in this domain has been constrained by the difficulty inherent in extricating the influence of acoustic modulations from those which can be ascribed to the abstract, featural content of the stimuli. A further concern is that group averaging techniques may obscure subtle individual differences in cortical regions involved in early language processing. In this paper we present the results of an fMRI-adaptation experiment which finds evidence of areas in the superior and medial temporal lobes which respond selectively to changes in the major feature categories of voicing and place of articulation. We present both single-subject and group-averaged analyses. PMID:24187438

  19. Headphone localization of speech

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.

    1993-01-01

    Three-dimensional acoustic display systems have recently been developed that synthesize virtual sound sources over headphones based on filtering by head-related transfer functions (HRTFs), the direction-dependent spectral changes caused primarily by the pinnae. In this study, 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with nonindividualized HRTFs. About half of the subjects 'pulled' their judgments toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgments; 15 to 46 percent of stimuli were heard inside the head, with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by nonindividualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.
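
    For context, headphone virtual sources of this kind are rendered by convolving a mono signal with a left/right head-related impulse response pair. A minimal sketch with placeholder HRIRs (random filters standing in for measured, nonindividualized HRTFs):

    ```python
    # Minimal sketch: binaural rendering by convolving a mono signal with a
    # left/right head-related impulse response (HRIR) pair. The HRIRs here are
    # random placeholders, not measured filters.
    import numpy as np
    from scipy.signal import fftconvolve

    fs = 44100
    mono = np.random.randn(fs)                  # placeholder for a speech stimulus
    hrir_left = np.random.randn(256) * np.hanning(256)
    hrir_right = np.random.randn(256) * np.hanning(256)

    left = fftconvolve(mono, hrir_left, mode="full")
    right = fftconvolve(mono, hrir_right, mode="full")
    binaural = np.stack([left, right], axis=1)  # two-channel signal for headphones
    ```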

  20. Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding

    SciTech Connect

    Hogden, J.

    1996-11-05

    The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation may decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.

  1. Auditory-Perceptual Learning Improves Speech Motor Adaptation in Children

    PubMed Central

    Shiller, Douglas M.; Rochon, Marie-Lyne

    2015-01-01

    Auditory feedback plays an important role in children’s speech development by providing the child with information about speech outcomes that is used to learn and fine-tune speech motor plans. The use of auditory feedback in speech motor learning has been extensively studied in adults by examining oral motor responses to manipulations of auditory feedback during speech production. Children are also capable of adapting speech motor patterns to perceived changes in auditory feedback, however it is not known whether their capacity for motor learning is limited by immature auditory-perceptual abilities. Here, the link between speech perceptual ability and the capacity for motor learning was explored in two groups of 5–7-year-old children who underwent a period of auditory perceptual training followed by tests of speech motor adaptation to altered auditory feedback. One group received perceptual training on a speech acoustic property relevant to the motor task while a control group received perceptual training on an irrelevant speech contrast. Learned perceptual improvements led to an enhancement in speech motor adaptation (proportional to the perceptual change) only for the experimental group. The results indicate that children’s ability to perceive relevant speech acoustic properties has a direct influence on their capacity for sensory-based speech motor adaptation. PMID:24842067

  2. Auditory-perceptual learning improves speech motor adaptation in children.

    PubMed

    Shiller, Douglas M; Rochon, Marie-Lyne

    2014-08-01

    Auditory feedback plays an important role in children's speech development by providing the child with information about speech outcomes that is used to learn and fine-tune speech motor plans. The use of auditory feedback in speech motor learning has been extensively studied in adults by examining oral motor responses to manipulations of auditory feedback during speech production. Children are also capable of adapting speech motor patterns to perceived changes in auditory feedback; however, it is not known whether their capacity for motor learning is limited by immature auditory-perceptual abilities. Here, the link between speech perceptual ability and the capacity for motor learning was explored in two groups of 5- to 7-year-old children who underwent a period of auditory perceptual training followed by tests of speech motor adaptation to altered auditory feedback. One group received perceptual training on a speech acoustic property relevant to the motor task while a control group received perceptual training on an irrelevant speech contrast. Learned perceptual improvements led to an enhancement in speech motor adaptation (proportional to the perceptual change) only for the experimental group. The results indicate that children's ability to perceive relevant speech acoustic properties has a direct influence on their capacity for sensory-based speech motor adaptation.

  3. Speech disorders - children

    MedlinePlus

    ... of speech disorders may disappear on their own. Speech therapy may help with more severe symptoms or speech ... the disorder. Speech can often be improved with speech therapy. Early treatment is likely to have better results.

  4. Acoustic biosensors

    PubMed Central

    Fogel, Ronen; Seshia, Ashwin A.

    2016-01-01

    Resonant and acoustic wave devices have been researched for several decades for application in the gravimetric sensing of a variety of biological and chemical analytes. These devices operate by coupling the measurand (e.g. analyte adsorption) as a modulation in the physical properties of the acoustic wave (e.g. resonant frequency, acoustic velocity, dissipation) that can then be correlated with the amount of adsorbed analyte. These devices can also be miniaturized with advantages in terms of cost, size and scalability, as well as potential additional features including integration with microfluidics and electronics, scaled sensitivities associated with smaller dimensions and higher operational frequencies, the ability to multiplex detection across arrays of hundreds of devices embedded in a single chip, increased throughput and the ability to interrogate a wider range of modes including within the same device. Additionally, device fabrication is often compatible with semiconductor volume batch manufacturing techniques enabling cost scalability and a high degree of precision and reproducibility in the manufacturing process. Integration with microfluidics handling also enables suitable sample pre-processing/separation/purification/amplification steps that could improve selectivity and the overall signal-to-noise ratio. Three device types are reviewed here: (i) bulk acoustic wave sensors, (ii) surface acoustic wave sensors, and (iii) micro/nano-electromechanical system (MEMS/NEMS) sensors. PMID:27365040

  5. Acoustic biosensors.

    PubMed

    Fogel, Ronen; Limson, Janice; Seshia, Ashwin A

    2016-06-30

    Resonant and acoustic wave devices have been researched for several decades for application in the gravimetric sensing of a variety of biological and chemical analytes. These devices operate by coupling the measurand (e.g. analyte adsorption) as a modulation in the physical properties of the acoustic wave (e.g. resonant frequency, acoustic velocity, dissipation) that can then be correlated with the amount of adsorbed analyte. These devices can also be miniaturized with advantages in terms of cost, size and scalability, as well as potential additional features including integration with microfluidics and electronics, scaled sensitivities associated with smaller dimensions and higher operational frequencies, the ability to multiplex detection across arrays of hundreds of devices embedded in a single chip, increased throughput and the ability to interrogate a wider range of modes including within the same device. Additionally, device fabrication is often compatible with semiconductor volume batch manufacturing techniques enabling cost scalability and a high degree of precision and reproducibility in the manufacturing process. Integration with microfluidics handling also enables suitable sample pre-processing/separation/purification/amplification steps that could improve selectivity and the overall signal-to-noise ratio. Three device types are reviewed here: (i) bulk acoustic wave sensors, (ii) surface acoustic wave sensors, and (iii) micro/nano-electromechanical system (MEMS/NEMS) sensors. PMID:27365040
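
    For the bulk acoustic wave (quartz crystal microbalance) case, the mass-to-frequency-shift relation most often quoted is the Sauerbrey equation; the sketch below applies it with the commonly cited AT-cut quartz constants and toy input values, purely for illustration.

    ```python
    # Minimal sketch: Sauerbrey relation for a bulk acoustic wave (QCM-type)
    # gravimetric sensor, delta_f = -2 f0^2 delta_m / (A * sqrt(rho_q * mu_q)).
    # Constants are the commonly quoted AT-cut quartz values; inputs are toy numbers.
    import math

    RHO_Q = 2.648      # g/cm^3, density of quartz
    MU_Q = 2.947e11    # g/(cm*s^2), shear modulus of AT-cut quartz

    def sauerbrey_shift_hz(f0_hz, delta_mass_g, area_cm2):
        return -2.0 * f0_hz ** 2 * delta_mass_g / (area_cm2 * math.sqrt(RHO_Q * MU_Q))

    # 5 MHz crystal, 1 cm^2 active area, 1 microgram of adsorbed analyte:
    print(sauerbrey_shift_hz(5e6, 1e-6, 1.0))   # roughly -57 Hz
    ```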

  6. The Rhythm of Perception: Acoustic Rhythmic Entrainment Induces Subsequent Perceptual Oscillation

    PubMed Central

    Hickok, Gregory; Farahbod, Haleh; Saberi, Kourosh

    2015-01-01

    Acoustic rhythms are pervasive in speech, music, and environmental sounds. Evidence for neural codes representing periodic information has recently emerged, and these codes seem a likely neural basis for the ability to detect rhythm; rhythmic information has also been found to modulate auditory system excitability, providing a potential mechanism for parsing the acoustic stream. Here we explore the effects of a previous rhythmic stimulus on subsequent auditory perception. We found that a low-frequency (3 Hz) amplitude-modulated signal induces a subsequent oscillation of perceptual detectability of a brief non-periodic acoustic stimulus (1 kHz tone); the frequency, but not the phase, of the perceptual oscillation matches the entrained stimulus-driven rhythmic oscillation. This provides evidence that rhythmic contexts have a direct influence on subsequent auditory perception of discrete acoustic events. Rhythm coding is likely a fundamental feature of auditory system design that predates the development of explicit human enjoyment of rhythm in music or poetry. PMID:25968248
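
    A minimal sketch of the stimulus structure described (a 3 Hz amplitude-modulated entrainer followed, after a silent gap, by a brief 1 kHz probe); durations, modulation depth, and levels are illustrative assumptions, not the experimental parameters.

    ```python
    # Minimal sketch: a 3 Hz amplitude-modulated noise "entrainer" followed by a
    # brief 1 kHz probe tone. All parameters are illustrative assumptions.
    import numpy as np

    fs = 44100
    t_mod = np.arange(0, 3.0, 1 / fs)                              # 3 s of modulated carrier
    carrier = np.random.randn(t_mod.size)
    entrainer = carrier * (1 + np.sin(2 * np.pi * 3 * t_mod)) / 2  # 3 Hz amplitude modulation

    t_probe = np.arange(0, 0.05, 1 / fs)                           # 50 ms probe
    probe = 0.1 * np.sin(2 * np.pi * 1000 * t_probe)               # quiet 1 kHz tone

    gap = np.zeros(int(0.2 * fs))                                  # probe onset 200 ms after offset
    stimulus = np.concatenate([entrainer, gap, probe])
    ```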

  7. Variability in English vowels is comparable in articulation and acoustics

    PubMed Central

    Noiray, Aude; Iskarous, Khalil; Whalen, D. H.

    2014-01-01

    The nature of the links between speech production and perception has been the subject of longstanding debate. The present study investigated the articulatory parameter of tongue height and the acoustic F1-F0 difference for the phonological distinction of vowel height in American English front vowels. Multiple repetitions of /i, ɪ, e, ε, æ/ in [(h)Vd] sequences were recorded from seven adult speakers. Articulatory (ultrasound) and acoustic data were collected simultaneously to provide a direct comparison of variability in vowel production in both domains. Results showed idiosyncratic patterns of articulation for contrasting the three front vowel pairs /i-ɪ/, /e-ε/ and /ε-æ/ across subjects, with the degree of variability in vowel articulation comparable to that observed in the acoustics for all seven participants. However, contrary to what was expected, some speakers showed reversals of tongue height for /ɪ/-/e/ that were also reflected in the acoustics, with F1 higher for /ɪ/ than for /e/. The data suggest that the phonological distinction of height is conveyed via speaker-specific articulatory-acoustic patterns that do not strictly match feature descriptions. However, the acoustic signal is faithful to the articulatory configuration that generated it, carrying the crucial information for perceptual contrast. PMID:25101144

  8. Extensions to the Speech Disorders Classification System (SDCS).

    PubMed

    Shriberg, Lawrence D; Fourakis, Marios; Hall, Sheryl D; Karlsson, Heather B; Lohmeier, Heather L; McSweeny, Jane L; Potter, Nancy L; Scheer-Cohen, Alison R; Strand, Edythe A; Tilkens, Christie M; Wilson, David L

    2010-10-01

    This report describes three extensions to a classification system for paediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). Part I describes a classification extension to the SDCS to differentiate motor speech disorders from speech delay and to differentiate among three sub-types of motor speech disorders. Part II describes the Madison Speech Assessment Protocol (MSAP), an ∼ 2-hour battery of 25 measures that includes 15 speech tests and tasks. Part III describes the Competence, Precision, and Stability Analytics (CPSA) framework, a current set of ∼ 90 perceptual- and acoustic-based indices of speech, prosody, and voice used to quantify and classify sub-types of Speech Sound Disorders (SSD). A companion paper provides reliability estimates for the perceptual and acoustic data reduction methods used in the SDCS. The agreement estimates in the companion paper support the reliability of SDCS methods and illustrate the complementary roles of perceptual and acoustic methods in diagnostic analyses of SSD of unknown origin. Examples of research using the extensions to the SDCS described in the present report include diagnostic findings for a sample of youth with motor speech disorders associated with galactosemia, and a test of the hypothesis of apraxia of speech in a group of children with autism spectrum disorders. All SDCS methods and reference databases running in the PEPPER (Programs to Examine Phonetic and Phonologic Evaluation Records) environment will be disseminated without cost when complete.

  9. Relationships between speech production and speech perception skills in young cochlear-implant users.

    PubMed

    Tye-Murray, N; Spencer, L; Gilbert-Bedia, E

    1995-11-01

    The purpose of this investigation was to examine the relationships between young cochlear-implant users' abilities to produce the speech features of nasality, voicing, duration, frication, and place of articulation and their abilities to utilize the features in three different perceptual conditions: audition-only, vision-only, and audition-plus-vision. Subjects were 23 prelingually deafened children who had at least 2 years of experience (average: 34 months) with a Cochlear Corporation Nucleus cochlear implant. They completed both the production and perception versions of the Children's Audio-Visual Feature Test, which comprises ten consonant-vowel syllables. An information transmission analysis performed on the confusion matrices revealed that children produced the place of articulation feature fairly accurately and the voicing, duration, and frication features less accurately. Acoustic analysis indicated that voiced sounds were not distinguished from unvoiced sounds on the basis of voice onset time or syllabic duration. Subjects who were more likely to produce the place feature correctly were likely to have worn their cochlear implants for a greater length of time. Pearson correlations revealed that subjects who were most likely to hear the place of articulation, nasality, and voicing features in the audition-only condition were also most likely to speak these features correctly. Comparisons of test results collected longitudinally also revealed improvements in production of the features, probably as a result of cochlear-implant experience and/or maturation.

  10. Speech-cue transmission by an algorithm to increase consonant recognition in noise for hearing-impaired listeners

    PubMed Central

    Healy, Eric W.; Yoho, Sarah E.; Wang, Yuxuan; Apoux, Frédéric; Wang, DeLiang

    2014-01-01

    Consonant recognition was assessed following extraction of speech from noise using a more efficient version of the speech-segregation algorithm described in Healy, Yoho, Wang, and Wang [(2013) J. Acoust. Soc. Am. 134, 3029–3038]. Substantial increases in recognition were observed following algorithm processing, which were significantly larger for hearing-impaired (HI) than for normal-hearing (NH) listeners in both speech-shaped noise and babble backgrounds. As observed previously for sentence recognition, older HI listeners having access to the algorithm performed as well or better than young NH listeners in conditions of identical noise. It was also found that the binary masks estimated by the algorithm transmitted speech features to listeners in a fashion highly similar to that of the ideal binary mask (IBM), suggesting that the algorithm is estimating the IBM with substantial accuracy. Further, the speech features associated with voicing, manner of articulation, and place of articulation were all transmitted with relative uniformity and at relatively high levels, indicating that the algorithm and the IBM transmit speech cues without obvious deficiency. Because the current implementation of the algorithm is much more efficient, it should be more amenable to real-time implementation in devices such as hearing aids and cochlear implants. PMID:25480077
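
    The comparison above is against the ideal binary mask (IBM). A minimal sketch of how an IBM is conventionally constructed when the clean speech and the noise are available separately (the -5 dB local criterion is an illustrative choice); this is not the authors' estimation algorithm, which must work without access to the clean signals.

        import numpy as np
        from scipy.signal import stft, istft

        def ideal_binary_mask(speech, noise, fs=16000, local_criterion_db=-5.0):
            """Build the ideal binary mask: keep time-frequency units whose local
            speech-to-noise ratio exceeds the local criterion, zero the rest.
            speech and noise are assumed to be time-aligned and equally long."""
            _, _, S = stft(speech, fs=fs, nperseg=512)
            _, _, N = stft(noise, fs=fs, nperseg=512)
            local_snr_db = 10 * np.log10((np.abs(S)**2 + 1e-12) /
                                         (np.abs(N)**2 + 1e-12))
            return (local_snr_db > local_criterion_db).astype(float)

        def apply_mask(mixture, mask, fs=16000):
            """Resynthesize the mixture after masking (inverse STFT overlap-add);
            mixture must have the same length as the signals used for the mask."""
            _, _, M = stft(mixture, fs=fs, nperseg=512)
            _, out = istft(M * mask, fs=fs, nperseg=512)
            return out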

  11. Symbolic Speech

    ERIC Educational Resources Information Center

    Podgor, Ellen S.

    1976-01-01

    The concept of symbolic speech emanates from the 1967 case of United States v. O'Brien. These discussions of flag desecration, grooming and dress codes, nude entertainment, buttons and badges, and musical expression show that the courts place symbolic speech in different strata from verbal communication. (LBH)

  12. Speech Aids

    NASA Technical Reports Server (NTRS)

    1987-01-01

    Designed to assist deaf and hearing-impaired persons in achieving better speech, Resnick Worldwide Inc.'s device provides a visual means of cuing the deaf as a speech-improvement measure. This is done by electronically processing the subjects' sounds and comparing them with optimum values, which are displayed for comparison.

  13. Speech coding

    SciTech Connect

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence the end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link. From a transmission point of view, digital transmission has therefore been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the

  14. Multiple-input multiple-output (MIMO) analog-to-feature converter chipsets for sub-wavelength acoustic source localization and bearing estimation

    NASA Astrophysics Data System (ADS)

    Chakrabartty, Shantanu

    2010-04-01

    Localization of acoustic sources using miniature microphone arrays poses a significant challenge due to fundamental limitations imposed by the physics of sound propagation. With sub-wavelength distances between the microphones, resolving acute localization cues becomes difficult due to precision artifacts. In this work, we present the design of a miniature microphone array sensor based on a patented Multiple-input Multiple-output (MIMO) analog-to-feature converter (AFC) chip-set that overcomes the limitations due to precision artifacts. Measured results from fabricated prototypes demonstrate a bearing range of 0 degrees to 90 degrees with a resolution of less than 2 degrees. The power dissipation of the MIMO-ADC chip-set for this task was measured to be less than 75 microwatts, making it ideal for portable, battery-powered sniper and gunshot detection applications.
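
    For orientation only, the sketch below shows the generic geometry behind bearing estimation from a two-microphone time difference of arrival, estimated here via a cross-correlation peak; it is not the patented MIMO analog-to-feature converter described above, and it illustrates why sub-wavelength spacing makes the estimate extremely sensitive to timing precision.

        import numpy as np

        def bearing_from_tdoa(x1, x2, fs, mic_spacing_m, c=343.0):
            """Estimate the bearing (degrees, relative to the array axis) of a
            far-field source from the time difference of arrival between two
            microphone signals, using the peak of their cross-correlation.
            With sub-wavelength spacing the TDOA is tiny, so small timing errors
            translate into large bearing errors."""
            xc = np.correlate(x1, x2, mode="full")
            lag = np.argmax(xc) - (len(x2) - 1)            # lag in samples
            tdoa = lag / fs                                # seconds
            cos_theta = np.clip(c * tdoa / mic_spacing_m, -1.0, 1.0)
            return np.degrees(np.arccos(cos_theta))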

  15. Statistical modeling of infant-directed versus adult-directed speech: Insights from speech recognition

    NASA Astrophysics Data System (ADS)

    Kirchhoff, Katrin; Schimmel, Steven

    2003-10-01

    Studies on infant speech perception have shown that infant-directed speech (motherese) exhibits exaggerated acoustic properties, which are assumed to guide infants in the acquisition of phonemic categories. Training an automatic speech recognizer on such data might similarly lead to improved performance since classes can be expected to be more clearly separated in the training material. This claim was tested by training automatic speech recognizers on adult-directed (AD) versus infant-directed (ID) speech and testing them under identical versus mismatched conditions. 32 mother-infant conversations and 32 mother-adult conversations were used as training and test data. Both sets of conversations included a set of cue words containing unreduced vowels (e.g., sheep, boot, top, etc.), which mothers were encouraged to use repeatedly. Experiments on continuous speech recognition of the entire data set showed that recognizers trained on infant-directed speech did perform significantly better than those trained on adult-directed speech. However, isolated word recognition experiments focusing on the above-mentioned cue words showed that the drop in performance of the ID-trained speech recognizer on AD test speech was significantly smaller than vice versa, suggesting that speech with over-emphasized phonetic contrasts may indeed constitute better training material for speech recognition. [Work supported by CMBL, University of Washington.]

  16. Phrase-level speech simulation with an airway modulation model of speech production.

    PubMed

    Story, Brad H

    2013-06-01

    Artificial talkers and speech synthesis systems have long been used as a means of understanding both speech production and speech perception. The development of an airway modulation model is described that simulates the time-varying changes of the glottis and vocal tract, as well as acoustic wave propagation, during speech production. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by humans and how that sound is perceived by a listener. The primary components of the model are introduced, and simulations of words and phrases are demonstrated.

  17. Improving the speech intelligibility in classrooms

    NASA Astrophysics Data System (ADS)

    Lam, Choi Ling Coriolanus

    One of the major acoustical concerns in classrooms is the establishment of effective verbal communication between teachers and students. Non-optimal acoustical conditions, resulting in reduced verbal communication, can cause two main problems. First, they can lead to reduced learning efficiency. Second, they can also cause fatigue, stress, vocal strain and health problems, such as headaches and sore throats, among teachers who are forced to compensate for poor acoustical conditions by raising their voices. In addition, inadequate acoustical conditions can induce the use of public address systems, and improper use of such amplifiers or loudspeakers can lead to impairment of students' hearing. The social costs of poor classroom acoustics, which impairs children's learning, are large. This invisible problem has far-reaching implications for learning, but is easily solved. Many studies have been carried out that accurately and concisely summarize the research findings on classroom acoustics, though a number of challenging questions remain unanswered. Most objective indices for speech intelligibility are essentially based on studies of western languages; even though several studies of tonal languages such as Mandarin have been conducted, there is much less work on Cantonese. In this research, measurements were made in unoccupied rooms to investigate the acoustical parameters and characteristics of the classrooms. Speech intelligibility tests, based on English, Mandarin and Cantonese, and a survey were carried out on students aged from 5 to 22 years. The aim is to investigate the differences in intelligibility between English, Mandarin and Cantonese in Hong Kong classrooms. The relationship between the speech transmission index (STI) and Phonetically Balanced (PB) word scores will be developed further, together with an empirical relationship between the speech intelligibility in classrooms and the variations
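
    Since the work relates the speech transmission index (STI) to Phonetically Balanced word scores, a simplified sketch of the core STI computation from modulation transfer function values may be useful; it follows the general IEC 60268-16 scheme in outline but uses equal band weights and omits the standard's octave-band weighting and redundancy corrections.

        import numpy as np

        def sti_from_mtf(m):
            """Simplified STI from a matrix of modulation transfer values
            m[band, fm] (e.g. 7 octave bands x 14 modulation frequencies).
            The effective SNR per cell is 10*log10(m/(1-m)), clipped to +/-15 dB
            and rescaled to a 0-1 transmission index; bands are averaged with
            equal weights here for simplicity."""
            m = np.clip(np.asarray(m, dtype=float), 1e-6, 1 - 1e-6)
            snr_eff = 10.0 * np.log10(m / (1.0 - m))
            ti = (np.clip(snr_eff, -15.0, 15.0) + 15.0) / 30.0
            band_mti = ti.mean(axis=1)      # average over modulation frequencies
            return band_mti.mean()          # equal band weighting (simplified)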

  18. Contemporary Issues in Phoneme Production by Hearing-Impaired Persons: Physiological and Acoustic Aspects.

    ERIC Educational Resources Information Center

    McGarr, Nancy S.; Whitehead, Robert

    1992-01-01

    This paper on physiologic correlates of speech production in children and youth with hearing impairments focuses specifically on the production of phonemes and includes data on respiration for speech production, phonation, speech aerodynamics, articulation, and acoustic analyses of speech by hearing-impaired persons. (Author/DB)

  19. Some articulatory details of emotional speech

    NASA Astrophysics Data System (ADS)

    Lee, Sungbok; Yildirim, Serdar; Bulut, Murtaza; Kazemzadeh, Abe; Narayanan, Shrikanth

    2005-09-01

    Differences in speech articulation among four emotion types, neutral, anger, sadness, and happiness, are investigated by analyzing tongue tip, jaw, and lip movement data collected from one male and one female speaker of American English. The data were collected using an electromagnetic articulography (EMA) system while subjects produced simulated emotional speech. Pitch, root-mean-square (rms) energy and the first three formants were estimated for vowel segments. For both speakers, angry speech exhibited the largest rms energy and the largest articulatory activity in terms of displacement range and movement speed. Happy speech is characterized by the largest pitch variability. It has higher rms energy than neutral speech, but articulatory activity is rather comparable to, or less than, that of neutral speech. That is, happy speech is more prominent in voicing activity than in articulation. Sad speech exhibits the longest sentence durations and lower rms energy. However, its articulatory activity is no less than that of neutral speech. Interestingly, for the male speaker, articulation for vowels in sad speech is consistently more peripheral (i.e., more forwarded displacements) when compared to other emotions. However, this does not hold for the female subject. These and other results will be discussed in detail with associated acoustics and perceived emotional qualities. [Work supported by NIH.]

  20. Aeroacoustics production of fricative speech sounds

    NASA Astrophysics Data System (ADS)

    Krane, Michael; Gary, Settles

    2001-05-01

    The aeroacoustic production of fricative speech sounds is studied using schlieren imaging and acoustic measurements. The focus is the structure of the turbulent jets formed during fricative speech sound production, and how interaction of the jet with articulators affects the acoustic nature of the speech sound. Patterns of the jets formed during the articulation of both voiced and unvoiced fricatives (s and z) are shown using schlieren images. In particular, the interaction of the jet with articulators such as teeth and lips are clearly seen, and demonstrated by varying articulator positions. Pressure measurements were made in conjunction with the images using a microphone placed near the teeth and one in the farfield. The pressure measurements show the acoustic consequences of the various jet/articulator interactions, further clarifying which articulators are most important in determining the aeroacoustic source characteristics.

  1. Auditory-neurophysiological responses to speech during early childhood: Effects of background noise

    PubMed Central

    White-Schwoch, Travis; Davies, Evan C.; Thompson, Elaine C.; Carr, Kali Woodruff; Nicol, Trent; Bradlow, Ann R.; Kraus, Nina

    2015-01-01

    Early childhood is a critical period of auditory learning, during which children are constantly mapping sounds to meaning. But learning rarely occurs under ideal listening conditions—children are forced to listen against a relentless din. This background noise degrades the neural coding of these critical sounds, in turn interfering with auditory learning. Despite the importance of robust and reliable auditory processing during early childhood, little is known about the neurophysiology underlying speech processing in children so young. To better understand the physiological constraints these adverse listening scenarios impose on speech sound coding during early childhood, auditory-neurophysiological responses were elicited to a consonant-vowel syllable in quiet and background noise in a cohort of typically-developing preschoolers (ages 3–5 yr). Overall, responses were degraded in noise: they were smaller, less stable across trials, slower, and there was poorer coding of spectral content and the temporal envelope. These effects were exacerbated in response to the consonant transition relative to the vowel, suggesting that the neural coding of spectrotemporally-dynamic speech features is more tenuous in noise than the coding of static features—even in children this young. Neural coding of speech temporal fine structure, however, was more resilient to the addition of background noise than coding of temporal envelope information. Taken together, these results demonstrate that noise places a neurophysiological constraint on speech processing during early childhood by causing a breakdown in neural processing of speech acoustics. These results may explain why some listeners have inordinate difficulties understanding speech in noise. Speech-elicited auditory-neurophysiological responses offer objective insight into listening skills during early childhood by reflecting the integrity of neural coding in quiet and noise; this paper documents typical response properties

  2. Measures to Evaluate the Effects of DBS on Speech Production

    PubMed Central

    Weismer, Gary; Yunusova, Yana; Bunton, Kate

    2011-01-01

    The purpose of this paper is to review and evaluate measures of speech production that could be used to document effects of Deep Brain Stimulation (DBS) on speech performance, especially in persons with Parkinson disease (PD). A small set of evaluative criteria for these measures is presented first, followed by consideration of several speech physiology and speech acoustic measures that have been studied frequently and reported on in the literature on normal speech production, and speech production affected by neuromotor disorders (dysarthria). Each measure is reviewed and evaluated against the evaluative criteria. Embedded within this review and evaluation is a presentation of new data relating speech motions to speech intelligibility measures in speakers with PD, amyotrophic lateral sclerosis (ALS), and control speakers (CS). These data are used to support the conclusion that at the present time the slope of second formant transitions (F2 slope), an acoustic measure, is well suited to make inferences to speech motion and to predict speech intelligibility. The use of other measures should not be ruled out, however, and we encourage further development of evaluative criteria for speech measures designed to probe the effects of DBS or any treatment with potential effects on speech production and communication skills. PMID:24932066
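
    The review singles out the slope of second formant transitions (F2 slope) as an acoustic measure. A minimal sketch of one way such a slope might be computed from an already-extracted formant track over a marked transition interval (the interval boundaries and the formant tracking itself are assumed given):

        import numpy as np

        def f2_slope_hz_per_ms(times_s, f2_hz, t_start, t_end):
            """Least-squares slope of the F2 track (Hz/ms) over a hand- or
            automatically marked transition interval [t_start, t_end]."""
            times_s = np.asarray(times_s, dtype=float)
            f2_hz = np.asarray(f2_hz, dtype=float)
            sel = (times_s >= t_start) & (times_s <= t_end)
            slope_hz_per_s, _ = np.polyfit(times_s[sel], f2_hz[sel], 1)
            return slope_hz_per_s / 1000.0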

  3. Correlation study of predictive and descriptive metrics of speech intelligibility

    NASA Astrophysics Data System (ADS)

    Stefaniw, Abigail; Shimizu, Yasushi; Smith, Dana

    2002-11-01

    There exists a wide range of speech-intelligibility metrics, each of which is designed to encapsulate a different aspect of room acoustics that relates to speech intelligibility. This study reviews the different definitions of and correlations between various proposed speech intelligibility measures. Speech Intelligibility metrics can be grouped by two main uses: prediction of designed rooms and description of existing rooms. Two descriptive metrics still under investigation are Ease of Hearing and Acoustical Comfort. These are measured by a simple questionnaire, and their relationships with each other and with significant speech intelligibility metrics are explored. A variety of rooms are modeled and auralized in cooperation with a larger study, including classrooms, lecture halls, and offices. Auralized rooms are used to conveniently provide calculated metrics and cross-talk canceled auralizations for diagnostic and descriptive intelligibility tests. Rooms are modeled in CATT-Acoustic and auralized with a multi-channel speaker array in a hemi-anechoic chamber.

  4. Speech entrainment enables patients with Broca's aphasia to produce fluent speech.

    PubMed

    Fridriksson, Julius; Hubbard, H Isabel; Hudspeth, Sarah Grace; Holland, Audrey L; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

    2012-12-01

    A distinguishing feature of Broca's aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect 'speech entrainment' and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca's aphasia. In Experiment 1, 13 patients with Broca's aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca's area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production. Behavioural and

  5. Prediction and constraint in audiovisual speech perception

    PubMed Central

    Peelle, Jonathan E.; Sommers, Mitchell S.

    2015-01-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported

  6. Spectral identification of sperm whales from Littoral Acoustic Demonstration Center passive acoustic recordings

    NASA Astrophysics Data System (ADS)

    Sidorovskaia, Natalia A.; Richard, Blake; Ioup, George E.; Ioup, Juliette W.

    2005-09-01

    The Littoral Acoustic Demonstration Center (LADC) made a series of passive broadband acoustic recordings in the Gulf of Mexico and Ligurian Sea to study noise and marine mammal phonations. The collected data contain a large number of various types of sperm whale phonations, such as isolated clicks and communication codas. It was previously reported that the spectrograms of the extracted clicks and codas contain well-defined null patterns that seem to be unique to individuals. The null pattern is formed by individual features of the sound production organs of an animal. These observations motivated the present studies of adapting human speech identification techniques to deep-diving marine mammal phonations. A three-state trained hidden Markov model (HMM) was used with the phonation spectra of sperm whales. The HMM algorithm gave 75% accuracy in identifying individuals when initially tested on the acoustic data set correlated with visual observations of sperm whales. A comparison of the identification accuracy based on null-pattern similarity analysis and the HMM algorithm is presented. The results can establish the foundation for developing an acoustic identification database for sperm whales and possibly other deep-diving marine mammals that would be difficult to observe visually. [Research supported by ONR.]
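
    As a small illustration of the null-pattern idea (not the paper's HMM identifier), the sketch below extracts candidate spectral null frequencies from a single click by locating local minima in its smoothed magnitude spectrum; the FFT size, smoothing length and prominence threshold are illustrative choices.

        import numpy as np
        from scipy.signal import find_peaks

        def spectral_nulls(click, fs, nfft=2048, smooth=5):
            """Return frequencies (Hz) of local minima (nulls) in the smoothed
            magnitude spectrum of a single click waveform."""
            spec = np.abs(np.fft.rfft(click, nfft))
            kernel = np.ones(smooth) / smooth
            smooth_db = 20 * np.log10(np.convolve(spec, kernel, mode="same") + 1e-12)
            # minima of the spectrum are peaks of its negation
            troughs, _ = find_peaks(-smooth_db, prominence=3.0)
            freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
            return freqs[troughs]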

  7. Coding pitch differences in voiceless fricatives: Whispered relative to normal speech.

    PubMed

    Heeren, Willemijn F L

    2015-12-01

    Intonation can be perceived in whispered speech despite the absence of the fundamental frequency. In the past, acoustic correlates of pitch in whisper have been sought in vowel content, but, recently, studies of normal speech demonstrated correlates of intonation in consonants as well. This study examined how consonants may contribute to the coding of intonation in whispered relative to normal speech. The acoustic characteristics of whispered, voiceless fricatives /s/ and /f/, produced at different pitch targets (low, mid, high), were investigated and compared to corresponding normal speech productions to assess if whisper contained secondary or compensatory pitch correlates. Furthermore, listener sensitivity to fricative cues to pitch in whisper was established, also relative to normal speech. Consistent with recent studies, acoustic correlates of whispered and normal speech fricatives systematically varied with pitch target. Comparable findings across speech modes showed that acoustic correlates were secondary. Discrimination of vowel-fricative-vowel stimuli was less accurate and slower in whispered than normal speech, which is attributed to differences in acoustic cues available. Perception of fricatives presented without their vowel contexts, however, revealed comparable processing speeds and response accuracies between speech modes, supporting the finding that within fricatives, acoustic correlates of pitch are similar across speech modes. PMID:26723300

  8. Acoustic Emphasis in Four Year Olds

    ERIC Educational Resources Information Center

    Wonnacott, Elizabeth; Watson, Duane G.

    2008-01-01

    Acoustic emphasis may convey a range of subtle discourse distinctions, yet little is known about how this complex ability develops in children. This paper presents a first investigation of the factors which influence the production of acoustic prominence in young children's spontaneous speech. In a production experiment, SVO sentences were…

  9. Articulatory Mediation of Speech Perception: A Causal Analysis of Multi-Modal Imaging Data

    ERIC Educational Resources Information Center

    Gow, David W., Jr.; Segawa, Jennifer A.

    2009-01-01

    The inherent confound between the organization of articulation and the acoustic-phonetic structure of the speech signal makes it exceptionally difficult to evaluate the competing claims of motor and acoustic-phonetic accounts of how listeners recognize coarticulated speech. Here we use Granger causation analysis of high spatiotemporal resolution…

  10. Compensation for Coarticulation: Disentangling Auditory and Gestural Theories of Perception of Coarticulatory Effects in Speech

    ERIC Educational Resources Information Center

    Viswanathan, Navin; Magnuson, James S.; Fowler, Carol A.

    2010-01-01

    According to one approach to speech perception, listeners perceive speech by applying general pattern matching mechanisms to the acoustic signal (e.g., Diehl, Lotto, & Holt, 2004). An alternative is that listeners perceive the phonetic gestures that structured the acoustic signal (e.g., Fowler, 1986). The two accounts have offered different…

  11. Influence of Visual Echo and Visual Reverberation on Speech Fluency in Stutterers.

    ERIC Educational Resources Information Center

    Smolka, Elzbieta; Adamczyk, Bogdan

    1992-01-01

    The influence of visual signals (echo and reverberation) on speech fluency in 60 stutterers and nonstutterers was examined. Visual signals were found to exert a corrective influence on the speech of stutterers but less than the influence of acoustic stimuli. Use of visual signals in combination with acoustic and tactile signals is recommended. (DB)

  12. Effects of charge design features on parameters of acoustic and seismic waves and cratering, for SMR chemical surface explosions

    NASA Astrophysics Data System (ADS)

    Gitterman, Y.

    2012-04-01

    time delays clearly separated for the shot of IMI explosives (characterized by a much higher detonation velocity than ANFO). Additionally, acoustic records at close distances from the WSMR explosions Distant Image (2440 tons of ANFO) and Minor Uncle (2725 tons of ANFO) were used to extend the charge and distance range for the SS delay scaled relationship, which showed consistency with the SMR ANFO shots. The developed specific charge design contributed to the success of this unique dual Sayarim explosion experiment, providing the strongest GT0 sources since the establishment of the IMS network. These clearly demonstrated the most favorable westward/eastward infrasound propagation up to 3400/6250 km according to the appropriate summer/winter weather patterns and stratospheric wind directions, respectively, and thus verified empirically common models of infrasound propagation in the atmosphere. The research was supported by the CTBTO, Vienna, and the Israel Ministry of Immigrant Absorption.

  13. Across-formant integration and speech intelligibility: Effects of acoustic source properties in the presence and absence of a contralateral interferer.

    PubMed

    Summers, Robert J; Bailey, Peter J; Roberts, Brian

    2016-08-01

    The role of source properties in across-formant integration was explored using three-formant (F1+F2+F3) analogues of natural sentences (targets). In experiment 1, F1+F3 were harmonic analogues (H1+H3) generated using a monotonous buzz source and second-order resonators; in experiment 2, F1+F3 were tonal analogues (T1+T3). F2 could take either form (H2 or T2). Target formants were always presented monaurally; the receiving ear was assigned randomly on each trial. In some conditions, only the target was present; in others, a competitor for F2 (F2C) was presented contralaterally. Buzz-excited or tonal competitors were created using the time-reversed frequency and amplitude contours of F2. Listeners must reject F2C to optimize keyword recognition. Whether or not a competitor was present, there was no effect of source mismatch between F1+F3 and F2. The impact of adding F2C was modest when it was tonal but large when it was harmonic, irrespective of whether F2C matched F1+F3. This pattern was maintained when harmonic and tonal counterparts were loudness-matched (experiment 3). Source type and competition, rather than acoustic similarity, governed the phonetic contribution of a formant. Contrary to earlier research using dichotic targets, requiring across-ear integration to optimize intelligibility, H2C was an equally effective informational masker for H2 as for T2. PMID:27586751
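
    The targets are built from a buzz source and second-order resonators. A rough sketch of one way a single harmonic formant analogue might be generated, with an impulse-train buzz driving a second-order resonator whose centre frequency can follow a formant contour; all parameter values are illustrative rather than those of the study.

        import numpy as np

        def buzz_through_resonator(f0_hz, formant_hz, bw_hz, fs=16000, dur_s=0.5):
            """Excite a second-order resonator (centre frequency formant_hz,
            bandwidth bw_hz) with a monotone impulse-train 'buzz' at f0_hz.
            formant_hz may be a scalar or a per-sample array, so the resonator
            can follow a time-varying formant contour."""
            n = int(fs * dur_s)
            buzz = np.zeros(n)
            buzz[::int(fs / f0_hz)] = 1.0                   # impulse-train source
            f = np.broadcast_to(np.asarray(formant_hz, float), (n,))
            r = np.exp(-np.pi * bw_hz / fs)                 # pole radius from bandwidth
            y = np.zeros(n)                                 # zeros, so the wrap-around
            for i in range(n):                              # reads at i=0,1 are harmless
                a1 = 2.0 * r * np.cos(2 * np.pi * f[i] / fs)
                a2 = -r * r
                y[i] = (1 - r) * buzz[i] + a1 * y[i - 1] + a2 * y[i - 2]
            return y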

  14. Comparison of transducers and intraoral placement options for measuring lingua-palatal contact pressure during speech.

    PubMed

    Searl, Jeffrey P

    2003-12-01

    Two studies were completed that focused on instrumentation and procedural issues associated with measurement of lingua-palatal contact pressure (LPCP) during speech. In the first experiment, physical features and response characteristics of 2 miniature pressure transducers (Entran EPI-BO and Precision Measurement 60S) were evaluated to identify a transducer suitable for measuring LPCP during speech. The 2 transducers were comparable in terms of physical dimensions and most response characteristics. However, the Entran device was less affected by air temperature fluctuations, making it the more attractive option for speech LPCP measurement. In a second experiment, 3 methods of placing the Entran device in the mouth were compared. The 3 adhesion methods evaluated were (a) taping a transducer to the hard palate, (b) surface mounting on a mold of the palate, and (c) flush mounting on a mold of the palate. Directly taping the transducer to the alveolar ridge was the least acceptable option, as it resulted in changes in other aspects of speech production (consonant duration and centroid frequency of the burst/frication) suggesting that articulation was unduly altered. Direct taping was also rated as least acceptable by the speakers. Surface and flush mounting resulted in fewer changes in speech aerodynamic and acoustic parameters of /t/ and/s/ compared to the tape condition. Listener ratings also indicated less articulatory disturbance in the surface and flush mounting conditions compared to the tape condition. Surface mounting was technically easier than flush mounting and it allows for rapid repositioning of the transducer if needed.

  15. Effects of Consonant-Vowel Transitions in Speech Stimuli on Cortical Auditory Evoked Potentials in Adults

    PubMed Central

    Doellinger, Michael; Burger, Martin; Hoppe, Ulrich; Bosco, Enrico; Eysholdt, Ulrich

    2011-01-01

    We examined the neural activation to consonant-vowel transitions by means of cortical auditory evoked potentials (AEPs). The aim was to show whether cortical response patterns to speech stimuli contain components due to one of the temporal features, the voice-onset time (VOT). In seven normal-hearing adults, the cortical responses to four different monosyllabic words were compared with the cortical responses to noise stimuli with the same temporal envelope as the speech stimuli. Significant hemispheric asymmetries were found for speech- but not for noise-evoked potentials. The difference signals between the AEPs to speech and the corresponding noise stimuli revealed a significant negative component, which correlated with the VOT. The hemispheric asymmetries can be related to rapid spectral changes. The correlation with the VOT indicates that the significant component in the difference signal reflects the perception of the acoustic change within the consonant-vowel transition. Thus, at the level of automatic processing, the characteristics of speech evoked potentials appear to be determined primarily by temporal aspects of the eliciting stimuli. PMID:21643536

  16. Integrated Spacesuit Audio System Enhances Speech Quality and Reduces Noise

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng Arden; Chen, Jingdong; Chen, Shaoyan Sharyl

    2009-01-01

    A new approach has been proposed for increasing astronaut comfort and speech capture. Currently, the special design of a spacesuit forms an extreme acoustic environment making it difficult to capture clear speech without compromising comfort. The proposed Integrated Spacesuit Audio (ISA) system is to incorporate the microphones into the helmet and use software to extract voice signals from background noise.

  17. The Pump-Valve Model of Speech Articulation.

    ERIC Educational Resources Information Center

    Dew, Donald

    The traditional respiration-phonation-articulation-resonation model of speech production which permeates introductory literature is not the only suitable model of this process. The pump-valve model, which derives from the acoustic theory of speech production, is a viable alternative. This newer model is also consistent with modern theories. It…

  18. The Hyperarticulation Hypothesis of Infant-Directed Speech

    ERIC Educational Resources Information Center

    Cristia, Alejandrina; Seidl, Amanda

    2014-01-01

    Typically, the point vowels [i, ɑ, u] are acoustically more peripheral in infant-directed speech (IDS) compared to adult-directed speech (ADS). If caregivers seek to highlight lexically relevant contrasts in IDS, then two sounds that are contrastive should become more distinct, whereas two sounds that are surface realizations of the same underlying…

  19. The Effects of Macroglossia on Speech: A Case Study

    ERIC Educational Resources Information Center

    Mekonnen, Abebayehu Messele

    2012-01-01

    This article presents a case study of speech production in a 14-year-old Amharic-speaking boy. The boy had developed secondary macroglossia, related to a disturbance of growth hormones, following a history of normal speech development. Perceptual analysis combined with acoustic analysis and static palatography is used to investigate the specific…

  20. Does Signal Degradation Affect Top-Down Processing of Speech?

    PubMed

    Wagner, Anita; Pals, Carina; de Blecourt, Charlotte M; Sarampalis, Anastasios; Başkent, Deniz

    2016-01-01

    Speech perception is formed based on both the acoustic signal and listeners' knowledge of the world and semantic context. Access to semantic information can facilitate interpretation of degraded speech, such as speech in background noise or the speech signal transmitted via cochlear implants (CIs). This paper focuses on the latter, and investigates the time course of understanding words, and how sentential context reduces listeners' dependency on the acoustic signal for natural and degraded speech via an acoustic CI simulation. In an eye-tracking experiment we combined recordings of listeners' gaze fixations with pupillometry, to capture effects of semantic information on both the time course and effort of speech processing. Normal-hearing listeners were presented with sentences with or without a semantically constraining verb (e.g., crawl) preceding the target (baby), and their ocular responses were recorded to four pictures, including the target, a phonological (bay) competitor and a semantic (worm) and an unrelated distractor. The results show that in natural speech, listeners' gazes reflect their uptake of acoustic information, and integration of preceding semantic context. Degradation of the signal leads to a later disambiguation of phonologically similar words, and to a delay in integration of semantic information. Complementary to this, the pupil dilation data show that early semantic integration reduces the effort in disambiguating phonologically similar words. Processing degraded speech comes with increased effort due to the impoverished nature of the signal. Delayed integration of semantic information further constrains listeners' ability to compensate for inaudible signals. PMID:27080670

  1. Strategies for distant speech recognition in reverberant environments

    NASA Astrophysics Data System (ADS)

    Delcroix, Marc; Yoshioka, Takuya; Ogawa, Atsunori; Kubo, Yotaro; Fujimoto, Masakiyo; Ito, Nobutaka; Kinoshita, Keisuke; Espi, Miquel; Araki, Shoko; Hori, Takaaki; Nakatani, Tomohiro

    2015-12-01

    Reverberation and noise are known to severely affect the automatic speech recognition (ASR) performance of speech recorded by distant microphones. Therefore, we must deal with reverberation if we are to realize high-performance hands-free speech recognition. In this paper, we review a recognition system that we developed at our laboratory to deal with reverberant speech. The system consists of a speech enhancement (SE) front-end that employs long-term linear prediction-based dereverberation followed by noise reduction. We combine our SE front-end with an ASR back-end that uses neural networks for acoustic and language modeling. The proposed system achieved top scores on the ASR task of the REVERB challenge. This paper describes the different technologies used in our system and presents detailed experimental results that justify our implementation choices and may provide hints for designing distant ASR systems.
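
    As a much-simplified, single-channel illustration of long-term linear prediction-based dereverberation (the systems described above typically use multichannel, variance-weighted prediction, omitted here), the sketch below predicts each STFT frame from frames at least a few frames in the past and subtracts the prediction:

        import numpy as np
        from scipy.signal import stft, istft

        def delayed_lp_dereverb(x, fs=16000, order=20, delay=3, nperseg=512):
            """Single-channel delayed linear prediction dereverberation sketch.
            In each STFT bin, late reverberation is predicted from frames
            [t-delay, ..., t-delay-order+1] and subtracted from frame t."""
            _, _, X = stft(x, fs=fs, nperseg=nperseg)
            Y = X.copy()
            n_bins, n_frames = X.shape
            for k in range(n_bins):
                xk = X[k]
                rows, targets = [], []
                for t in range(delay + order, n_frames):
                    rows.append(xk[t - delay - order + 1: t - delay + 1][::-1])
                    targets.append(xk[t])
                if not rows:
                    continue
                A = np.array(rows)
                b = np.array(targets)
                w, *_ = np.linalg.lstsq(A, b, rcond=None)   # prediction filter
                Y[k, delay + order:] = b - A @ w            # remove predicted tail
            _, y = istft(Y, fs=fs, nperseg=nperseg)
            return y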

  2. When speech sounds like music.

    PubMed

    Falk, Simone; Rathcke, Tamara; Dalla Bella, Simone

    2014-08-01

    Repetition can boost memory and perception. However, repeating the same stimulus several times in immediate succession also induces intriguing perceptual transformations and illusions. Here, we investigate the Speech to Song Transformation (S2ST), a massed repetition effect in the auditory modality, which crosses the boundaries between language and music. In the S2ST, a phrase repeated several times shifts to being heard as sung. To better understand this unique cross-domain transformation, we examined the perceptual determinants of the S2ST, in particular the role of acoustics. In 2 Experiments, the effects of 2 pitch properties and 3 rhythmic properties on the probability and speed of occurrence of the transformation were examined. Results showed that both pitch and rhythmic properties are key features fostering the transformation. However, some properties proved to be more conducive to the S2ST than others. Stable tonal targets that allowed for the perception of a musical melody led more often and quickly to the S2ST than scalar intervals. Recurring durational contrasts arising from segmental grouping favoring a metrical interpretation of the stimulus also facilitated the S2ST. This was, however, not the case for a regular beat structure within and across repetitions. In addition, individual perceptual abilities made it possible to predict the likelihood of the S2ST. Overall, the study demonstrated that repetition enables listeners to reinterpret specific prosodic features of spoken utterances in terms of musical structures. The findings underline a tight link between language and music, but they also reveal important differences in communicative functions of prosodic structure in the 2 domains. PMID:24911013

  3. When speech sounds like music.

    PubMed

    Falk, Simone; Rathcke, Tamara; Dalla Bella, Simone

    2014-08-01

    Repetition can boost memory and perception. However, repeating the same stimulus several times in immediate succession also induces intriguing perceptual transformations and illusions. Here, we investigate the Speech to Song Transformation (S2ST), a massed repetition effect in the auditory modality, which crosses the boundaries between language and music. In the S2ST, a phrase repeated several times shifts to being heard as sung. To better understand this unique cross-domain transformation, we examined the perceptual determinants of the S2ST, in particular the role of acoustics. In 2 Experiments, the effects of 2 pitch properties and 3 rhythmic properties on the probability and speed of occurrence of the transformation were examined. Results showed that both pitch and rhythmic properties are key features fostering the transformation. However, some properties proved to be more conducive to the S2ST than others. Stable tonal targets that allowed for the perception of a musical melody led more often and quickly to the S2ST than scalar intervals. Recurring durational contrasts arising from segmental grouping favoring a metrical interpretation of the stimulus also facilitated the S2ST. This was, however, not the case for a regular beat structure within and across repetitions. In addition, individual perceptual abilities made it possible to predict the likelihood of the S2ST. Overall, the study demonstrated that repetition enables listeners to reinterpret specific prosodic features of spoken utterances in terms of musical structures. The findings underline a tight link between language and music, but they also reveal important differences in communicative functions of prosodic structure in the 2 domains.

  4. Some Issues in Infant Speech Perception: Do the Means Justify the Ends?

    ERIC Educational Resources Information Center

    Weitzman, Raymond S.

    2007-01-01

    A major focus of research on language acquisition in infancy involves experimental studies of the infant's ability to discriminate various kinds of speech or speech-like stimuli. This research has demonstrated that infants are sensitive to many fine-grained differences in the acoustic properties of speech utterances. Furthermore, these empirical…

  5. Audiovisual Speech Perception in Children with Developmental Language Disorder in Degraded Listening Conditions

    ERIC Educational Resources Information Center

    Meronen, Auli; Tiippana, Kaisa; Westerholm, Jari; Ahonen, Timo

    2013-01-01

    Purpose: The effect of the signal-to-noise ratio (SNR) on the perception of audiovisual speech in children with and without developmental language disorder (DLD) was investigated by varying the noise level and the sound intensity of acoustic speech. The main hypotheses were that the McGurk effect (in which incongruent visual speech alters the…

  6. Effects of Elicitation Task Variables on Speech Production by Children with Cochlear Implants

    ERIC Educational Resources Information Center

    McCleary, Elizabeth A.; Ide-Helvie, Dana L.; Lotto, Andrew J.; Carney, Arlene Earley; Higgins, Maureen B.

    2007-01-01

    Given the interest in comparing speech production development in children with normal hearing and hearing impairment, it is important to evaluate how variables within speech elicitation tasks can differentially affect the acoustics of speech production for these groups. In a first experiment, children (6-14 years old) with cochlear implants…

  7. Children's Perception of Speech Produced in a Two-Talker Background

    ERIC Educational Resources Information Center

    Baker, Mallory; Buss, Emily; Jacks, Adam; Taylor, Crystal; Leibold, Lori J.

    2014-01-01

    Purpose: This study evaluated the degree to which children benefit from the acoustic modifications made by talkers when they produce speech in noise. Method: A repeated measures design compared the speech perception performance of children (5-11 years) and adults in a 2-talker masker. Target speech was produced in a 2-talker background or in…

  8. The Hypothesis of Apraxia of Speech in Children with Autism Spectrum Disorder

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Paul, Rhea; Black, Lois M.; van Santen, Jan P.

    2011-01-01

    In a sample of 46 children aged 4-7 years with Autism Spectrum Disorder (ASD) and intelligible speech, there was no statistical support for the hypothesis of concomitant Childhood Apraxia of Speech (CAS). Perceptual and acoustic measures of participants' speech, prosody, and voice were compared with data from 40 typically-developing children, 13…

  9. EXPERIMENTAL ANALYSIS OF THE CONTROL OF SPEECH PRODUCTION AND PERCEPTION--VI. PROGRESS REPORT NO. 6.

    ERIC Educational Resources Information Center

    LANE, HARLAN

    VARIOUS ASPECTS OF THE PROGRESS OF AN EXPERIMENTAL PROGRAM IN SPEECH CONTROL WERE REPORTED. THE TOPICS COVERED IN THE DISCUSSIONS WERE (1) ACOUSTIC AND DISCRIMINATIVE PROPERTIES OF SPEECH SOUNDS, (2) MATCHING FUNCTIONS AND EQUAL-SENSATION CONTOURS FOR LOUDNESS, (3) RELATIONS BETWEEN IDENTIFICATION AND DISCRIMINATION FUNCTIONS FOR SPEECH AND…

  10. Free Speech Yearbook: 1972.

    ERIC Educational Resources Information Center

    Tedford, Thomas L., Ed.

    This book is a collection of essays on free speech issues and attitudes, compiled by the Commission on Freedom of Speech of the Speech Communication Association. Four articles focus on freedom of speech in classroom situations as follows: a philosophic view of teaching free speech, effects of a course on free speech on student attitudes,…

  11. A survey on acoustic signature recognition and classification techniques for persistent surveillance systems

    NASA Astrophysics Data System (ADS)

    Shirkhodaie, Amir; Alkilani, Amjad

    2012-06-01

    Application of acoustic sensors in Persistent Surveillance Systems (PSS) has received considerable attention over the last two decades because they can be rapidly deployed and have low cost. Conventional utilization of acoustic sensors in PSS spans a wide range of applications including vehicle classification, target tracking, activity understanding, speech recognition, shooter detection, etc. This paper presents a current survey of physics-based acoustic signature classification techniques for outdoor sound recognition and understanding. In particular, the paper focuses on the taxonomy and ontology of acoustic signatures resulting from group activities. The taxonomy and supportive ontology considered include human-vehicle, human-object, and human-human interactions. The paper also exploits the applicability of several spectral analysis techniques as a means to maximize the likelihood of correct acoustic source detection, recognition, and discrimination. Spectral analysis techniques based on the Fast Fourier Transform, Discrete Wavelet Transform, and Short-Time Fourier Transform are considered for the extraction of features from acoustic sources. In addition, comprehensive overviews of the most current research activities related to the scope of this work are presented with their applications. Furthermore, future potential directions of research in this area are discussed for the improvement of acoustic signature recognition and classification technology suitable for PSS applications.
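
    A minimal sketch of the kind of short-time spectral feature extraction such classifiers typically consume (per-frame log band energies via the STFT); the band count, frame length and band spacing are illustrative choices, not taken from the survey.

        import numpy as np
        from scipy.signal import stft

        def band_energy_features(x, fs, n_bands=16, nperseg=1024):
            """Short-time Fourier transform followed by log energy in n_bands
            equally spaced frequency bands -> one feature vector per frame."""
            f, _, X = stft(x, fs=fs, nperseg=nperseg)
            power = np.abs(X) ** 2                               # (freq, time)
            edges = np.linspace(0, len(f), n_bands + 1, dtype=int)
            feats = np.stack([power[edges[i]:edges[i + 1]].sum(axis=0)
                              for i in range(n_bands)], axis=1)  # (time, bands)
            return np.log(feats + 1e-12)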

  12. Statistical properties of infant-directed versus adult-directed speech: Insights from speech recognition

    NASA Astrophysics Data System (ADS)

    Kirchhoff, Katrin; Schimmel, Steven

    2005-04-01

    Previous studies have shown that infant-directed speech ('motherese') exhibits overemphasized acoustic properties which may facilitate the acquisition of phonetic categories by infant learners. It has been suggested that the use of infant-directed data for training automatic speech recognition systems might also enhance the automatic learning and discrimination of phonetic categories. This study investigates the properties of infant-directed vs. adult-directed speech from the point of view of the statistical pattern recognition paradigm underlying automatic speech recognition. Isolated-word speech recognizers were trained on adult-directed vs. infant-directed data sets and were tested on both matched and mismatched data. Results show that recognizers trained on infant-directed speech did not always exhibit better recognition performance; however, their relative loss in performance on mismatched data was significantly less severe than that of recognizers trained on adult-directed speech and presented with infant-directed test data. An analysis of the statistical distributions of a subset of phonetic classes in both data sets showed that this pattern is caused by larger class overlaps in infant-directed speech. This finding has implications for both automatic speech recognition and theories of infant speech perception.
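
    The abstract attributes the asymmetry to larger class overlaps in infant-directed speech. One common way to quantify overlap between two phonetic classes modelled as Gaussians in acoustic feature space is the Bhattacharyya distance, sketched below; the abstract does not state which overlap measure was actually used.

        import numpy as np

        def bhattacharyya_distance(mu1, cov1, mu2, cov2):
            """Bhattacharyya distance between two Gaussian class models
            (mean vectors and full covariance matrices); smaller values mean
            more overlap between the classes in feature space."""
            mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
            cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
            cov = 0.5 * (cov1 + cov2)
            diff = mu1 - mu2
            term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
            _, logdet = np.linalg.slogdet(cov)
            _, logdet1 = np.linalg.slogdet(cov1)
            _, logdet2 = np.linalg.slogdet(cov2)
            term2 = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
            return term1 + term2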

  13. Start/End Delays of Voiced and Unvoiced Speech Signals

    SciTech Connect

    Herrnstein, A

    1999-09-24

    Recent experiments using low-power EM-radar-like sensors (e.g., GEMs) have demonstrated a new method for measuring vocal fold activity and the onset times of voiced speech, as vocal fold contact begins to take place. Similarly, the end time of a voiced speech segment can be measured. Secondly, it appears that in most normal uses of American English speech, unvoiced-speech segments directly precede or directly follow voiced-speech segments. For many applications, it is useful to know typical duration times of these unvoiced speech segments. A corpus of spoken "Timit" words, phrases, and sentences, assembled earlier and recorded using simultaneously measured acoustic and EM-sensor glottal signals from 16 male speakers, was used for this study. By inspecting the onset (or end) of unvoiced speech using the acoustic signal, and the onset (or end) of voiced speech using the EM-sensor signal, the average durations of unvoiced segments preceding the onset of vocalization were found to be 300 ms, and of following segments, 500 ms. An unvoiced speech period is then defined in time, first by using the onset of the EM-sensed glottal signal as the onset-time marker for the voiced speech segment and the end marker for the unvoiced segment. Then, by subtracting 300 ms from the onset time mark of voicing, the unvoiced speech segment start time is found. Similarly, the times for a following unvoiced speech segment can be found. While data of this nature have proven to be useful for work in our laboratory, a great deal of additional work remains to validate such data for use with general populations of users. These procedures have been useful for applying optimal processing algorithms over time segments of unvoiced, voiced, and non-speech acoustic signals. For example, these data appear to be of use in speaker validation, in vocoding, and in denoising algorithms.
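
    A minimal sketch of applying the reported averages: mark the voiced segment from the EM-sensed glottal onset and end, then place an unvoiced segment 300 ms before the onset and 500 ms after the end. The 300 ms and 500 ms values are the corpus averages quoted above; per-speaker values will differ.

        def unvoiced_segment_bounds(voiced_onset_s, voiced_end_s,
                                    pre_ms=300.0, post_ms=500.0):
            """Given EM-sensor voiced-segment onset/end times (s), return the
            estimated preceding and following unvoiced segments (s) using the
            average durations reported in the corpus study."""
            pre = (voiced_onset_s - pre_ms / 1000.0, voiced_onset_s)
            post = (voiced_end_s, voiced_end_s + post_ms / 1000.0)
            return pre, post

        # e.g. voicing sensed from 1.20 s to 1.65 s:
        print(unvoiced_segment_bounds(1.20, 1.65))
        # roughly ((0.90, 1.20), (1.65, 2.15))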

  14. Primary Progressive Aphasia and Apraxia of Speech

    PubMed Central

    Jung, Youngsin; Duffy, Joseph R.; Josephs, Keith A.

    2014-01-01

    Primary progressive aphasia is a neurodegenerative syndrome characterized by progressive language dysfunction. The majority of primary progressive aphasia cases can be classified into three subtypes: non-fluent/agrammatic, semantic, and logopenic variants of primary progressive aphasia. Each variant presents with unique clinical features and is associated with distinctive underlying pathology and neuroimaging findings. Unlike primary progressive aphasia, apraxia of speech is a disorder that involves inaccurate production of sounds secondary to impaired planning or programming of speech movements. Primary progressive apraxia of speech is a neurodegenerative form of apraxia of speech, and it should be distinguished from primary progressive aphasia given its discrete clinicopathological presentation. Recently, there have been substantial advances in our understanding of these speech and language disorders. Here, we review clinical, neuroimaging, and histopathological features of primary progressive aphasia and apraxia of speech. Distinguishing among these disorders is crucial, because accurate diagnosis is important from both a prognostic and a therapeutic standpoint. PMID:24234355

  15. Learning Vowel Categories from Maternal Speech in Gurindji Kriol

    ERIC Educational Resources Information Center

    Jones, Caroline; Meakins, Felicity; Muawiyath, Shujau

    2012-01-01

    Distributional learning is a proposal for how infants might learn early speech sound categories from acoustic input before they know many words. When categories in the input differ greatly in relative frequency and overlap in acoustic space, research in bilingual development suggests that this affects the course of development. In the present…

  16. Talker Versus Dialect Effects on Speech Intelligibility: A Symmetrical Study.

    PubMed

    McCloy, Daniel R; Wright, Richard A; Souza, Pamela E

    2015-09-01

    This study investigates the relative effects of talker-specific variation and dialect-based variation on speech intelligibility. Listeners from two dialects of American English performed speech-in-noise tasks with sentences spoken by talkers of each dialect. An initial statistical model showed no significant effects for either talker or listener dialect group, and no interaction. However, a mixed-effects regression model including several acoustic measures of the talker's speech revealed a subtle effect of talker dialect once the various acoustic dimensions were accounted for. Results are discussed in relation to other recent studies of cross-dialect intelligibility. PMID:26529902

  17. Talker versus dialect effects on speech intelligibility: a symmetrical study

    PubMed Central

    McCloy, Daniel R.; Wright, Richard A.; Souza, Pamela E.

    2014-01-01

    This study investigates the relative effects of talker-specific variation and dialect-based variation on speech intelligibility. Listeners from two dialects of American English performed speech-in-noise tasks with sentences spoken by talkers of each dialect. An initial statistical model showed no significant effects for either talker or listener dialect group, and no interaction. However, a mixed-effects regression model including several acoustic measures of the talker’s speech revealed a subtle effect of talker dialect once the various acoustic dimensions were accounted for. Results are discussed in relation to other recent studies of cross-dialect intelligibility. PMID:26529902
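
    The analysis described above can be sketched as a mixed-effects regression of intelligibility scores on dialect terms plus per-talker acoustic covariates. The example below, using statsmodels, is only a schematic of that kind of model: the column names, the synthetic data, and the single listener-level random intercept (the published model also included talker-level effects) are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data frame: one row per listener x sentence trial, with
# keyword-correct scores, dialect labels, and per-talker acoustic measures.
rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "score": rng.uniform(0.4, 1.0, n),             # proportion keywords correct
    "talker_dialect": rng.choice(["north", "west"], n),
    "listener_dialect": rng.choice(["north", "west"], n),
    "speech_rate": rng.normal(4.5, 0.5, n),        # syllables per second
    "f0_range": rng.normal(8.0, 2.0, n),           # semitones
    "listener": rng.integers(0, 20, n).astype(str),
})

# Mixed-effects model in the spirit of the study: dialect terms plus acoustic
# covariates as fixed effects, and a random intercept per listener.
model = smf.mixedlm(
    "score ~ talker_dialect * listener_dialect + speech_rate + f0_range",
    df, groups=df["listener"],
).fit()
print(model.summary())
```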

  18. GALLAUDET'S NEW HEARING AND SPEECH CENTER.

    ERIC Educational Resources Information Center

    FRISINA, D. ROBERT

    THIS REPORT DESCRIBES THE DESIGN OF A NEW SPEECH AND HEARING CENTER AND ITS INTEGRATION INTO THE OVERALL ARCHITECTURAL SCHEME OF THE CAMPUS. THE CIRCULAR SHAPE WAS SELECTED TO COMPLEMENT THE SURROUNDING STRUCTURES AND COMPENSATE FOR DIFFERENCES IN SITE, WHILE PROVIDING THE ACOUSTICAL ADVANTAGES OF NON-PARALLEL WALLS, AND FACILITATING TRAFFIC FLOW.…

  19. Enhancement of Electrolaryngeal Speech by Adaptive Filtering.

    ERIC Educational Resources Information Center

    Espy-Wilson, Carol Y.; Chari, Venkatesh R.; MacAuslan, Joel M.; Huang, Caroline B.; Walsh, Michael J.

    1998-01-01

    A study tested the quality and intelligibility, as judged by several listeners, of four users' electrolaryngeal speech, with and without filtering to compensate for perceptually objectionable acoustic characteristics. Results indicated that an adaptive filtering technique produced a noticeable improvement in the quality of the Transcutaneous…
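
    The abstract above is truncated, so the exact filtering scheme is not reproduced here; as a generic illustration of adaptive filtering for this kind of enhancement problem, the sketch below implements a normalized-LMS noise canceller that removes the component of the microphone signal predictable from a reference recording. The reference signal, filter order, and step size are assumptions, not the authors' configuration.

```python
import numpy as np

def nlms_cancel(primary, reference, order=32, mu=0.5, eps=1e-6):
    """Generic normalized-LMS noise canceller: adaptively estimates the
    component of `primary` that is predictable from `reference` and
    subtracts it, returning the enhanced signal."""
    w = np.zeros(order)
    out = np.zeros(len(primary))
    for n in range(order, len(primary)):
        x = reference[n - order:n][::-1]           # most recent samples first
        y = w @ x                                  # predicted interference
        e = primary[n] - y                         # enhanced sample
        w += mu * e * x / (x @ x + eps)            # NLMS weight update
        out[n] = e
    return out

# Toy usage: a synthetic "speech" signal corrupted by a filtered buzz that is
# correlated with a reference buzz recording.
rng = np.random.default_rng(2)
fs = 8000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 220 * t) * (t < 0.5)
buzz = np.sign(np.sin(2 * np.pi * 100 * t))        # crude buzz-like reference
leak = np.convolve(buzz, [0.6, 0.3, 0.1], mode="same")
enhanced = nlms_cancel(speech + leak, buzz)
```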

  1. Effects of Syntactic Expectations on Speech Segmentation

    ERIC Educational Resources Information Center

    Mattys, Sven L.; Melhorn, James F.; White, Laurence

    2007-01-01

    Although the effect of acoustic cues on speech segmentation has been extensively investigated, the role of higher order information (e.g., syntax) has received less attention. Here, the authors examined whether syntactic expectations based on subject-verb agreement have an effect on segmentation and whether they do so despite conflicting acoustic…

  2. Effects of Cognitive Load on Speech Recognition

    ERIC Educational Resources Information Center

    Mattys, Sven L.; Wiget, Lukas

    2011-01-01

    The effect of cognitive load (CL) on speech recognition has received little attention despite the prevalence of CL in everyday life, e.g., dual-tasking. To assess the effect of CL on the interaction between lexically-mediated and acoustically-mediated processes, we measured the magnitude of the "Ganong effect" (i.e., lexical bias on phoneme…

  3. Speech Intelligibility

    NASA Astrophysics Data System (ADS)

    Brand, Thomas

    Speech intelligibility (SI) is important in many fields of research, engineering, and diagnostics for quantifying very different phenomena: the quality of recordings, of communication and playback devices, the reverberation of auditoria, the characteristics of hearing impairment, the benefit of using hearing aids, or combinations of these factors.

  4. Speech neglect: A strange educational blind spot

    NASA Astrophysics Data System (ADS)

    Harris, Katherine Safford

    2005-09-01

    Speaking is universally acknowledged as an important human talent, yet as a topic of educated common knowledge it is peculiarly neglected. Partly this is a consequence of the relatively recent growth of research on speech perception, production, and development, but it is also a function of the way that information is sliced up by undergraduate colleges. Although the basic acoustic mechanism of vowel production was known to Helmholtz, the ability to view speech production as a physiological event is evolving even now with such techniques as fMRI. Intensive research on speech perception emerged only in the early 1930s as Fletcher and the engineers at Bell Telephone Laboratories developed the transmission of speech over telephone lines. The study of speech development was revolutionized by the papers of Eimas and his colleagues on speech perception in infants in the 1970s. Dissemination of knowledge in these fields is the responsibility of no single academic discipline. It forms a center for two departments, Linguistics and Speech and Hearing, but the former places a heavy emphasis on aspects of language other than speech, and the latter focuses on clinical practice. For psychologists, it is a rather minor component of a very diverse assembly of topics. I will focus on these three fields in proposing possible remedies.

  5. Speech vs. singing: infants choose happier sounds

    PubMed Central

    Corbeil, Marieve; Trehub, Sandra E.; Peretz, Isabelle

    2013-01-01

    Infants prefer speech to non-vocal sounds and to non-human vocalizations, and they prefer happy-sounding speech to neutral speech. They also exhibit an interest in singing, but there is little knowledge of their relative interest in speech and singing. The present study explored infants' attention to unfamiliar audio samples of speech and singing. In Experiment 1, infants 4–13 months of age were exposed to happy-sounding infant-directed speech vs. hummed lullabies by the same woman. They listened significantly longer to the speech, which had considerably greater acoustic variability and expressiveness, than to the lullabies. In Experiment 2, infants of comparable age who heard the lyrics of a Turkish children's song spoken vs. sung in a joyful/happy manner did not exhibit differential listening. Infants in Experiment 3 heard the happily sung lyrics of the Turkish children's song vs. a version that was spoken in an adult-directed or affectively neutral manner. They listened significantly longer to the sung version. Overall, happy voice quality rather than vocal mode (speech or singing) was the principal contributor to infant attention, regardless of age. PMID:23805119

  6. The Modulation Transfer Function for Speech Intelligibility

    PubMed Central

    Elliott, Taffeta M.; Theunissen, Frédéric E.

    2009-01-01

    We systematically determined which spectrotemporal modulations in speech are necessary for comprehension by human listeners. Speech comprehension has been shown to be robust to spectral and temporal degradations, but the specific relevance of particular degradations is arguable due to the complexity of the joint spectral and temporal information in the speech signal. We applied a novel modulation filtering technique to recorded sentences to restrict acoustic information quantitatively and to obtain a joint spectrotemporal modulation transfer function for speech comprehension, the speech MTF. For American English, the speech MTF showed the criticality of low modulation frequencies in both time and frequency. Comprehension was significantly impaired when temporal modulations <12 Hz or spectral modulations <4 cycles/kHz were removed. More specifically, the MTF was bandpass in temporal modulations and low-pass in spectral modulations: temporal modulations from 1 to 7 Hz and spectral modulations <1 cycles/kHz were the most important. We evaluated the importance of spectrotemporal modulations for vocal gender identification and found a different region of interest: removing spectral modulations between 3 and 7 cycles/kHz significantly increases gender misidentifications of female speakers. The determination of the speech MTF furnishes an additional method for producing speech signals with reduced bandwidth but high intelligibility. Such compression could be used for audio applications such as file compression or noise removal and for clinical applications such as signal processing for cochlear implants. PMID:19266016
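
    The modulation filtering idea can be sketched as a 2-D Fourier analysis of a log spectrogram followed by masking of modulation components above chosen temporal and spectral cutoffs. The code below is a minimal illustration of that idea only (it stops at the filtered log spectrogram and omits waveform resynthesis); the STFT parameters and cutoffs are assumptions, not the authors' pipeline.

```python
import numpy as np
from scipy.signal import stft

def modulation_lowpass(signal, fs, wt_max_hz=12.0, wf_max_cyc_per_khz=4.0,
                       nperseg=256, noverlap=192):
    """Minimal illustration of modulation filtering: take the 2-D FFT of a
    log-magnitude spectrogram, zero out modulation components above the given
    temporal (Hz) and spectral (cycles/kHz) cutoffs, and invert back to a
    filtered log spectrogram (waveform resynthesis is omitted)."""
    f, t, Z = stft(signal, fs=fs, nperseg=nperseg, noverlap=noverlap)
    logspec = np.log(np.abs(Z) + 1e-10)
    M = np.fft.fft2(logspec)
    # Modulation axes: spectral modulations in cycles/Hz, temporal in Hz.
    wf = np.fft.fftfreq(logspec.shape[0], d=f[1] - f[0])   # cycles per Hz
    wt = np.fft.fftfreq(logspec.shape[1], d=t[1] - t[0])   # Hz
    keep = (np.abs(wf)[:, None] * 1000.0 <= wf_max_cyc_per_khz) & \
           (np.abs(wt)[None, :] <= wt_max_hz)
    return np.real(np.fft.ifft2(M * keep)), f, t

# Example: low-pass the modulation spectrum of one second of noise at 16 kHz.
filtered_logspec, f, t = modulation_lowpass(np.random.randn(16000), 16000)
```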

  7. Brain-Computer Interfaces for Speech Communication

    PubMed Central

    Brumberg, Jonathan S.; Nieto-Castanon, Alfonso; Kennedy, Philip R.; Guenther, Frank H.

    2010-01-01

    This paper briefly reviews current silent speech methodologies for normal and disabled individuals. Current techniques utilizing electromyographic (EMG) recordings of vocal tract movements are useful for physically healthy individuals but fail for tetraplegic individuals who do not have accurate voluntary control over the speech articulators. Alternative methods utilizing EMG from other body parts (e.g., hand, arm, or facial muscles) or electroencephalography (EEG) can provide capable silent communication to severely paralyzed users, though current interfaces are extremely slow relative to normal conversation rates and require constant attention to a computer screen that provides visual feedback and/or cueing. We present a novel approach to the problem of silent speech via an intracortical microelectrode brain computer interface (BCI) to predict intended speech information directly from the activity of neurons involved in speech production. The predicted speech is synthesized and acoustically fed back to the user with a delay under 50 ms. We demonstrate that the Neurotrophic Electrode used in the BCI is capable of providing useful neural recordings for over 4 years, a necessary property for BCIs that need to remain viable over the lifespan of the user. Other design considerations include neural decoding techniques based on previous research involving BCIs for computer cursor or robotic arm control via prediction of intended movement kinematics from motor cortical signals in monkeys and humans. Initial results from a study of continuous speech production with instantaneous acoustic feedback show the BCI user was able to improve his control over an artificial speech synthesizer both within and across recording sessions. The success of this initial trial validates the potential of the intracortical microelectrode-based approach for providing a speech prosthesis that can allow much more rapid communication rates. PMID:20204164

  8. Brain-Computer Interfaces for Speech Communication.

    PubMed

    Brumberg, Jonathan S; Nieto-Castanon, Alfonso; Kennedy, Philip R; Guenther, Frank H

    2010-04-01

    This paper briefly reviews current silent speech methodologies for normal and disabled individuals. Current techniques utilizing electromyographic (EMG) recordings of vocal tract movements are useful for physically healthy individuals but fail for tetraplegic individuals who do not have accurate voluntary control over the speech articulators. Alternative methods utilizing EMG from other body parts (e.g., hand, arm, or facial muscles) or electroencephalography (EEG) can provide capable silent communication to severely paralyzed users, though current interfaces are extremely slow relative to normal conversation rates and require constant attention to a computer screen that provides visual feedback and/or cueing. We present a novel approach to the problem of silent speech via an intracortical microelectrode brain computer interface (BCI) to predict intended speech information directly from the activity of neurons involved in speech production. The predicted speech is synthesized and acoustically fed back to the user with a delay under 50 ms. We demonstrate that the Neurotrophic Electrode used in the BCI is capable of providing useful neural recordings for over 4 years, a necessary property for BCIs that need to remain viable over the lifespan of the user. Other design considerations include neural decoding techniques based on previous research involving BCIs for computer cursor or robotic arm control via prediction of intended movement kinematics from motor cortical signals in monkeys and humans. Initial results from a study of continuous speech production with instantaneous acoustic feedback show the BCI user was able to improve his control over an artificial speech synthesizer both within and across recording sessions. The success of this initial trial validates the potential of the intracortical microelectrode-based approach for providing a speech prosthesis that can allow much more rapid communication rates.
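
    The decoding stage described above maps neural activity to parameters that drive a speech synthesizer. As a schematic of one plausible decoder of this general kind, the sketch below fits a ridge regression from binned firing rates to a pair of formant frequencies; the bin size, unit count, and synthetic data are assumptions, and the authors' actual real-time decoder is not reproduced here.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical training data: firing rates of recorded units in 50 ms bins
# (n_bins x n_units) paired with the intended first/second formant (Hz).
rng = np.random.default_rng(3)
rates = rng.poisson(5.0, size=(2000, 40)).astype(float)
true_w = rng.normal(0, 20, size=(40, 2))
formants = rates @ true_w + np.array([500.0, 1500.0]) + rng.normal(0, 30, (2000, 2))

# Fit on the first part of the data, decode the rest bin by bin.
decoder = Ridge(alpha=1.0).fit(rates[:1500], formants[:1500])
predicted = decoder.predict(rates[1500:])       # (F1, F2) per 50 ms bin
print(predicted[:3])
```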

  9. Speech processing: An evolving technology

    SciTech Connect

    Crochiere, R.E.; Flanagan, J.L.

    1986-09-01

    As we enter the information age, speech processing is emerging as an important technology for making machines easier and more convenient for humans to use. It is both an old and a new technology - dating back to the invention of the telephone and forward, at least in aspirations, to the capabilities of HAL in 2001. Explosive advances in microelectronics now make it possible to implement economical real-time hardware for sophisticated speech processing - processing that formerly could be demonstrated only in simulations on main-frame computers. As a result, fundamentally new product concepts - as well as new features and functions in existing products - are becoming possible and are being explored in the marketplace. As the introductory piece to this issue, the authors draw a brief perspective on the evolving field of speech processing and assess the technology in the three constituent sectors: speech coding, synthesis, and recognition.

  10. Speech perception as an active cognitive process

    PubMed Central

    Heald, Shannon L. M.; Nusbaum, Howard C.

    2014-01-01

    One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process, which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing, such as masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine cognitive resources recruited during perception, including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augmentation or therapy. PMID

  11. Real-time speech-driven animation of expressive talking faces

    NASA Astrophysics Data System (ADS)

    Liu, Jia; You, Mingyu; Chen, Chun; Song, Mingli

    2011-05-01

    In this paper, we present a real-time facial animation system in which speech drives mouth movements and facial expressions synchronously. Considering five basic emotions, a hierarchical structure with an upper layer of emotion classification is established. Based on the recognized emotion label, the under-layer classification at the sub-phonemic level is modelled on the relationship between the acoustic features of frames and the audio labels in phonemes. Using certain constraints, the predicted emotion labels of speech are adjusted to obtain facial expression labels, which are combined with the sub-phonemic labels. The combinations are mapped onto facial action units (FAUs), and audio-visual synchronized animation with mouth movements and facial expressions is generated by morphing between FAUs. The experimental results demonstrate that the two-layer structure succeeds in both emotion and sub-phonemic classification, and the synthesized facial sequences reach a comparatively convincing quality.

  12. Lexical effects on speech production and intelligibility in Parkinson's disease

    NASA Astrophysics Data System (ADS)

    Chiu, Yi-Fang

    Individuals with Parkinson's disease (PD) often have speech deficits that lead to reduced speech intelligibility. Previous research provides a rich database regarding the articulatory deficits associated with PD including restricted vowel space (Skodda, Visser, & Schlegel, 2011) and flatter formant transitions (Tjaden & Wilding, 2004; Walsh & Smith, 2012). However, few studies consider the effect of higher level structural variables of word usage frequency and the number of similar sounding words (i.e. neighborhood density) on lower level articulation or on listeners' perception of dysarthric speech. The purpose of the study is to examine the interaction of lexical properties and speech articulation as measured acoustically in speakers with PD and healthy controls (HC) and the effect of lexical properties on the perception of their speech. Individuals diagnosed with PD and age-matched healthy controls read sentences with words that varied in word frequency and neighborhood density. Acoustic analysis was performed to compare second formant transitions in diphthongs, an indicator of the dynamics of tongue movement during speech production, across different lexical characteristics. Young listeners transcribed the spoken sentences and the transcription accuracy was compared across lexical conditions. The acoustic results indicate that both PD and HC speakers adjusted their articulation based on lexical properties but the PD group had significant reductions in second formant transitions compared to HC. Both groups of speakers increased second formant transitions for words with low frequency and low density, but the lexical effect is diphthong dependent. The change in second formant slope was limited in the PD group when the required formant movement for the diphthong is small. The data from listeners' perception of the speech by PD and HC show that listeners identified high frequency words with greater accuracy suggesting the use of lexical knowledge during the
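
    The acoustic measure at the center of this study, the second formant (F2) transition, is often operationalized as the slope of the F2 track across a diphthong. The sketch below shows one such operationalization with a least-squares fit, assuming a formant track has already been extracted by some formant tracker; the sampling step and the synthetic track are illustrative, not the study's measurements.

```python
import numpy as np

def f2_transition_slope(times_s, f2_hz):
    """Least-squares slope (Hz per second) of an F2 track across a diphthong;
    one common operationalization of formant-transition extent."""
    slope, _intercept = np.polyfit(times_s, f2_hz, 1)
    return slope

# Hypothetical F2 track for /ai/ sampled every 10 ms (values in Hz).
t = np.arange(0.0, 0.20, 0.01)
f2 = np.linspace(1100, 1900, t.size) + np.random.default_rng(4).normal(0, 25, t.size)
print(round(f2_transition_slope(t, f2), 1), "Hz/s")
```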

  13. Speech preparation in adults with persistent developmental stuttering.

    PubMed

    Mock, Jeffrey R; Foundas, Anne L; Golob, Edward J

    2015-10-01

    Motor efference copy conveys movement information to sensory areas before and during vocalization. We hypothesized that speech preparation would modulate auditory processing, via motor efference copy, differently in men who stutter (MWS) vs. fluent adults. Participants (n=12/group) had EEG recorded during a cue-target paradigm with two conditions: a speech condition, which allowed for speech preparation, and a control condition, which did not. Acoustic stimuli probed auditory responsiveness between the cue and target. MWS had longer vocal reaction times (p<0.01) when the cue and target differed (10% of trials), suggesting difficulty in rapidly updating their speech plans. Acoustic probes elicited a negative slow wave indexing motor efference copy that was smaller in MWS vs. fluent adults (p<0.03). Current density responses in MWS showed smaller left prefrontal responses and auditory responses that were delayed and correlated with stuttering rate. Taken together, the results provide insight into the cortical mechanisms underlying atypical speech planning and dysfluencies in MWS.

  14. Speech research: Studies on the nature of speech, instrumentation for its investigation, and practical applications

    NASA Astrophysics Data System (ADS)

    Liberman, A. M.

    1982-03-01

    This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: Speech perception and memory coding in relation to reading ability; The use of orthographic structure by deaf adults: Recognition of finger-spelled letters; Exploring the information support for speech; The stream of speech; Using the acoustic signal to make inferences about place and duration of tongue-palate contact; Patterns of human interlimb coordination emerge from the properties of nonlinear limit cycle oscillatory processes: Theory and data; Motor control: Which themes do we orchestrate? Exploring the nature of motor control in Down's syndrome; Periodicity and auditory memory: A pilot study; Reading skill and language skill: On the role of sign order and morphological structure in memory for American Sign Language sentences; Perception of nasal consonants with special reference to Catalan; and Speech production characteristics of the hearing impaired.

  15. Neuroscience-inspired computational systems for speech recognition under noisy conditions

    NASA Astrophysics Data System (ADS)

    Schafer, Phillip B.

    Humans routinely recognize speech in challenging acoustic environments with background music, engine sounds, competing talkers, and other acoustic noise. However, today's automatic speech recognition (ASR) systems perform poorly in such environments. In this dissertation, I present novel methods for ASR designed to approach human-level performance by emulating the brain's processing of sounds. I exploit recent advances in auditory neuroscience to compute neuron-based representations of speech, and design novel methods for decoding these representations to produce word transcriptions. I begin by considering speech representations modeled on the spectrotemporal receptive fields of auditory neurons. These representations can be tuned to optimize a variety of objective functions, which characterize the response properties of a neural population. I propose an objective function that explicitly optimizes the noise invariance of the neural responses, and find that it gives improved performance on an ASR task in noise compared to other objectives. The method as a whole, however, fails to significantly close the performance gap with humans. I next consider speech representations that make use of spiking model neurons. The neurons in this method are feature detectors that selectively respond to spectrotemporal patterns within short time windows in speech. I consider a number of methods for training the response properties of the neurons. In particular, I present a method using linear support vector machines (SVMs) and show that this method produces spikes that are robust to additive noise. I compute the spectrotemporal receptive fields of the neurons for comparison with previous physiological results. To decode the spike-based speech representations, I propose two methods designed to work on isolated word recordings. The first method uses a classical ASR technique based on the hidden Markov model. The second method is a novel template-based recognition scheme that takes
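
    The spiking feature detectors described above can be sketched as linear SVMs trained on short spectrotemporal windows, which "fire" when their decision function crosses zero. The example below shows that idea with scikit-learn on synthetic patches; the window size, the target pattern, and the thresholding rule are assumptions and do not reproduce the dissertation's full system.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Each "neuron" is a linear SVM trained on short spectrotemporal windows
# (here 20 frequency bins x 10 frames, flattened). Positive examples contain
# a target pattern; negatives are noise. The trained decision function acts
# as a feature detector: a "spike" is emitted when it crosses zero.
rng = np.random.default_rng(5)
pattern = np.zeros((20, 10))
pattern[8:12, 3:7] = 3.0                                  # toy target pattern
pos = np.stack([pattern + rng.normal(0, 1, (20, 10)) for _ in range(200)])
neg = rng.normal(0, 1, (200, 20, 10))
X = np.concatenate([pos, neg]).reshape(400, -1)
y = np.concatenate([np.ones(200), np.zeros(200)])

detector = LinearSVC(C=1.0, max_iter=5000).fit(X, y)

def spikes(spectrogram_windows):
    """1 where the detector fires on a window, 0 elsewhere."""
    flat = spectrogram_windows.reshape(len(spectrogram_windows), -1)
    return (detector.decision_function(flat) > 0).astype(int)

print(spikes(np.stack([pattern, np.zeros((20, 10))])))
```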

  16. Speech and noise levels for predicting the degree of speech security

    NASA Astrophysics Data System (ADS)

    Bradley, John S.; Gover, Bradford N.

    2005-09-01

    A meeting room is speech secure when it is difficult or impossible for an eavesdropper to overhear speech from within. The degree of security could range from less stringent conditions of being barely able to understand a few words from the meeting room, to higher levels, where transmitted speech would be completely inaudible. This paper reports on measurements to determine the statistical distribution of speech levels in meeting rooms and the distribution of ambient noise levels just outside meeting rooms. To select the required transmission characteristics for a meeting room wall, one would first decide on an acceptable level of risk, in terms of the probability of a speech security lapse occurring. This leads to the selection of a combination of a speech level in the meeting room and a noise level nearby that would occur together with this probability. The combination of appropriate estimates of meeting room speech levels and nearby ambient noise levels, together with the sound transmission characteristics of the intervening partition, makes it possible to calculate signal/noise ratio indices related to speech security [J. Acoust. Soc. Am. 116(6), 3480-3490 (2004)]. The value of these indices indicates if adequate speech security will be achieved.
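
    The signal/noise-ratio reasoning described above can be sketched numerically: the level of speech transmitted through the partition is the in-room speech level minus the transmission loss, and a security lapse occurs when that level exceeds the nearby ambient noise by more than a chosen threshold. The Monte Carlo sketch below samples from assumed normal level distributions; all numeric values are illustrative, not the measured distributions or indices from the paper.

```python
import numpy as np

def security_lapse_probability(speech_mean_db, speech_sd_db,
                               noise_mean_db, noise_sd_db,
                               transmission_loss_db, snr_threshold_db,
                               n=100_000, seed=0):
    """Monte Carlo sketch: probability that the speech level transmitted
    through the partition exceeds the nearby ambient noise by more than the
    chosen signal/noise threshold (i.e., a speech-security lapse)."""
    rng = np.random.default_rng(seed)
    speech = rng.normal(speech_mean_db, speech_sd_db, n)       # in-room level
    noise = rng.normal(noise_mean_db, noise_sd_db, n)          # outside noise
    transmitted_snr = (speech - transmission_loss_db) - noise
    return float(np.mean(transmitted_snr > snr_threshold_db))

# Illustrative values only (not the paper's measured distributions).
print(security_lapse_probability(speech_mean_db=60, speech_sd_db=6,
                                 noise_mean_db=35, noise_sd_db=5,
                                 transmission_loss_db=45,
                                 snr_threshold_db=-16))
```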

  17. Specific features of vowel-like signals of white whales

    NASA Astrophysics Data System (ADS)

    Bel'Kovich, V. M.; Kreichi, S. A.

    2004-05-01

    The set of acoustic signals of White-Sea white whales comprises about 70 types of signals. Six of them occur most often and constitute 75% of the total number of signals produced by these animals. Judging by their behavioral reactions, white whales distinguish each other by acoustic signals, which is also typical of other animal species and of humans. To investigate this phenomenon, signals perceived as vowel-like sounds of speech, including sounds perceived as a “bleat,” were chosen. A sample of 480 signals, recorded in June and July 2000 in the White Sea within a reproductive assemblage of white whales near Large Solovetskii Island, was studied. Signals were recorded on a digital data carrier (a SONY minidisk) in the frequency range of 0.06-20 kHz. The purpose of the study was to reveal the perceptive and acoustic features specific to individual animals. The study was carried out using the methods of structural analysis of vocal speech that are employed in lingual criminalistics to identify a speaking person. It was demonstrated that this approach allows one to group the signals by coincident perceptive and acoustic parameters while assigning individual attributes to single parameters. This made it possible to provisionally separate about 40 different sources of acoustic signals according to the totality of coincidences, which corresponded to the number of white whales observed visually. Thus, the application of this method appears very promising for the acoustic identification of white whales and other marine mammals, a possibility that is very important for biology.

  18. On the relationship between auditory cognition and speech intelligibility in cochlear implant users: An ERP study.

    PubMed

    Finke, Mareike; Büchner, Andreas; Ruigendijk, Esther; Meyer, Martin; Sandmann, Pascale

    2016-07-01

    There is a high degree of variability in speech intelligibility outcomes across cochlear-implant (CI) users. To better understand how auditory cognition affects speech intelligibility with the CI, we performed an electroencephalography study in which we examined the relationship between central auditory processing, cognitive abilities, and speech intelligibility. Postlingually deafened CI users (N=13) and matched normal-hearing (NH) listeners (N=13) performed an oddball task with words presented in different background conditions (quiet, stationary noise, modulated noise). Participants had to categorize words as living (targets) or non-living entities (standards). We also assessed participants' working memory (WM) capacity and verbal abilities. For the oddball task, we found lower hit rates and prolonged response times in CI users when compared with NH listeners. Noise-related prolongation of the N1 amplitude was found for all participants. Further, we observed group-specific modulation effects of event-related potentials (ERPs) as a function of background noise. While NH listeners showed stronger noise-related modulation of the N1 latency, CI users revealed enhanced modulation effects of the N2/N4 latency. In general, higher-order processing (N2/N4, P3) was prolonged in CI users in all background conditions when compared with NH listeners. The longer N2/N4 latencies in CI users suggest that these individuals have difficulties mapping acoustic-phonetic features onto lexical representations. These difficulties seem to be increased for speech-in-noise conditions compared with speech in a quiet background. Correlation analyses showed that shorter ERP latencies were related to enhanced speech intelligibility (N1, N2/N4), better lexical fluency (N1), and lower ratings of listening effort (N2/N4) in CI users. In sum, our findings suggest that CI users and NH listeners differ with regards to both the sensory and the higher-order processing of speech in quiet as well as in
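
    The ERP measures in this study (e.g., N1 and N2/N4 amplitudes and latencies) come from averaging stimulus-locked EEG epochs and locating component peaks within search windows. The sketch below shows that generic computation in NumPy; the sampling rate, baseline, search window, and synthetic epochs are assumptions, not the study's recording or analysis parameters.

```python
import numpy as np

def erp_and_peak_latency(epochs, fs, t0_s, window_s=(0.08, 0.16), polarity=-1):
    """Average stimulus-locked epochs (trials x samples) into an ERP and
    return the latency of the most negative (polarity=-1) peak inside the
    search window, e.g. an N1-like component."""
    erp = epochs.mean(axis=0)
    times = np.arange(erp.size) / fs - t0_s          # seconds re: stimulus onset
    mask = (times >= window_s[0]) & (times <= window_s[1])
    idx = np.argmax(polarity * erp[mask])
    return erp, float(times[mask][idx])

# Illustrative epochs: 60 trials, 0.6 s at 500 Hz, 0.1 s pre-stimulus baseline.
rng = np.random.default_rng(6)
fs, t0 = 500, 0.1
epochs = rng.normal(0, 1, (60, int(0.6 * fs)))
erp, n1_latency = erp_and_peak_latency(epochs, fs, t0)
print(f"N1-like peak latency: {n1_latency * 1000:.0f} ms")
```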

  19. The Neural Substrates of Infant Speech Perception

    ERIC Educational Resources Information Center

    Homae, Fumitaka; Watanabe, Hama; Taga, Gentaro

    2014-01-01

    Infants often pay special attention to speech sounds, and they appear to detect key features of these sounds. To investigate the neural foundation of speech perception in infants, we measured cortical activation using near-infrared spectroscopy. We presented the following three types of auditory stimuli while 3-month-old infants watched a silent…

  20. Melodic contour identification and sentence recognition using sung speech

    PubMed Central

    Crew, Joseph D.; Galvin, John J.; Fu, Qian-Jie

    2015-01-01

    For bimodal cochlear implant users, acoustic and electric hearing have been shown to contribute differently to speech and music perception. However, differences in test paradigms and stimuli in speech and music testing can make it difficult to assess the relative contributions of each device. To address these concerns, the Sung Speech Corpus (SSC) was created. The SSC contains 50 monosyllable words sung over an octave range and can be used to test both speech and music perception using the same stimuli. Here, SSC data are presented for normal-hearing listeners, and any advantage of musicianship is examined. PMID:26428838