Science.gov

Sample records for acoustic speech features

  1. Adding articulatory features to acoustic features for automatic speech recognition

    SciTech Connect

    Zlokarnik, I.

    1995-05-01

    A hidden-Markov-model (HMM) based speech recognition system was evaluated that makes use of simultaneously recorded acoustic and articulatory data. The articulatory measurements were gathered by means of electromagnetic articulography and describe the movement of small coils fixed to the speakers' tongue and jaw during the production of German V₁CV₂ sequences [P. Hoole and S. Gfoerer, J. Acoust. Soc. Am. Suppl. 1 87, S123 (1990)]. Using the coordinates of the coil positions as an articulatory representation, acoustic and articulatory features were combined to make up an acoustic-articulatory feature vector. The discriminant power of this combined representation was evaluated for two subjects on a speaker-dependent isolated word recognition task. When the articulatory measurements were used both for training and testing the HMMs, the articulatory representation was capable of reducing the error rate of comparable acoustic-based HMMs by a relative percentage of more than 60%. In a separate experiment, the articulatory movements during the testing phase were estimated using a multilayer perceptron that performed an acoustic-to-articulatory mapping. Under these more realistic conditions, when articulatory measurements are only available during the training, the error rate could be reduced by a relative percentage of 18% to 25%.
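
    The fusion described here amounts to frame-level concatenation of acoustic and articulatory vectors followed by per-word HMM training and maximum-likelihood decoding. A minimal sketch of that pipeline is below; it assumes frame-synchronous acoustic features and coil x/y coordinates, and uses the hmmlearn package as a stand-in HMM implementation rather than the system evaluated in the record.

```python
# Sketch: fuse frame-synchronous acoustic and articulatory features and train
# one Gaussian HMM per word. hmmlearn is our stand-in, not the paper's system.
import numpy as np
from hmmlearn import hmm

def fuse_features(acoustic, articulatory):
    """Concatenate per-frame acoustic (T x A) and coil-coordinate (T x C) features."""
    assert acoustic.shape[0] == articulatory.shape[0]
    return np.hstack([acoustic, articulatory])

def train_word_model(utterances, n_states=5):
    """Fit a Gaussian HMM to a list of (T_i x D) fused feature matrices of one word."""
    X = np.vstack(utterances)
    lengths = [u.shape[0] for u in utterances]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
    model.fit(X, lengths)
    return model

def recognize(word_models, features):
    """Return the word whose HMM assigns the highest log-likelihood to the features."""
    return max(word_models, key=lambda w: word_models[w].score(features))
```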

  2. Automatic speech recognition using articulatory features from subject-independent acoustic-to-articulatory inversion.

    PubMed

    Ghosh, Prasanta Kumar; Narayanan, Shrikanth

    2011-10-01

    An automatic speech recognition approach is presented which uses articulatory features estimated by a subject-independent acoustic-to-articulatory inversion. The inversion allows estimation of articulatory features from any talker's speech acoustics using only an exemplary subject's articulatory-to-acoustic map. Results are reported on a broad class phonetic classification experiment on speech from English talkers using data from three distinct English talkers as exemplars for inversion. Results indicate that the inclusion of the articulatory information improves classification accuracy but the improvement is more significant when the speaking style of the exemplar and the talker are matched compared to when they are mismatched.

  3. Dynamic Encoding of Acoustic Features in Neural Responses to Continuous Speech.

    PubMed

    Khalighinejad, Bahar; Cruzatto da Silva, Guilherme; Mesgarani, Nima

    2017-02-22

    Humans are unique in their ability to communicate using spoken language. However, it remains unclear how the speech signal is transformed and represented in the brain at different stages of the auditory pathway. In this study, we characterized electroencephalography responses to continuous speech by obtaining the time-locked responses to phoneme instances (phoneme-related potential). We showed that responses to different phoneme categories are organized by phonetic features. We found that each instance of a phoneme in continuous speech produces multiple distinguishable neural responses occurring as early as 50 ms and as late as 400 ms after the phoneme onset. Comparing the patterns of phoneme similarity in the neural responses and the acoustic signals confirms a repetitive appearance of acoustic distinctions of phonemes in the neural data. Analysis of the phonetic and speaker information in neural activations revealed that different time intervals jointly encode the acoustic similarity of both phonetic and speaker categories. These findings provide evidence for a dynamic neural transformation of low-level speech features as they propagate along the auditory pathway, and form an empirical framework to study the representational changes in learning, attention, and speech disorders. SIGNIFICANCE STATEMENT We characterized the properties of evoked neural responses to phoneme instances in continuous speech. We show that each instance of a phoneme in continuous speech produces several observable neural responses at different times occurring as early as 50 ms and as late as 400 ms after the phoneme onset. Each temporal event explicitly encodes the acoustic similarity of phonemes, and linguistic and nonlinguistic information are best represented at different time intervals. Finally, we show a joint encoding of phonetic and speaker information, where the neural representation of speakers is dependent on phoneme category. These findings provide compelling new evidence for…
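
    The phoneme-related potential described here is obtained by epoching the continuous EEG around phoneme onsets and averaging. A minimal sketch under stated assumptions: a continuous multichannel EEG array, onset times from a forced alignment, and an illustrative baseline window.

```python
# Sketch: phoneme-related potential as time-locked averaging of EEG epochs
# around phoneme onsets. Window and baseline choices here are illustrative.
import numpy as np

def phoneme_related_potential(eeg, onsets_s, fs, tmin=-0.1, tmax=0.4):
    """eeg: (n_channels, n_samples); onsets_s: phoneme onset times in seconds."""
    pre, post = int(-tmin * fs), int(tmax * fs)
    epochs = []
    for t in onsets_s:
        i = int(round(t * fs))
        if i - pre < 0 or i + post > eeg.shape[1]:
            continue  # skip onsets too close to the edges of the recording
        seg = eeg[:, i - pre:i + post]
        seg = seg - seg[:, :pre].mean(axis=1, keepdims=True)  # baseline correction
        epochs.append(seg)
    return np.mean(epochs, axis=0)  # (n_channels, pre + post) averaged response
```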

  4. Dynamic Encoding of Acoustic Features in Neural Responses to Continuous Speech

    PubMed Central

    Khalighinejad, Bahar; Cruzatto da Silva, Guilherme

    2017-01-01

    Humans are unique in their ability to communicate using spoken language. However, it remains unclear how the speech signal is transformed and represented in the brain at different stages of the auditory pathway. In this study, we characterized electroencephalography responses to continuous speech by obtaining the time-locked responses to phoneme instances (phoneme-related potential). We showed that responses to different phoneme categories are organized by phonetic features. We found that each instance of a phoneme in continuous speech produces multiple distinguishable neural responses occurring as early as 50 ms and as late as 400 ms after the phoneme onset. Comparing the patterns of phoneme similarity in the neural responses and the acoustic signals confirms a repetitive appearance of acoustic distinctions of phonemes in the neural data. Analysis of the phonetic and speaker information in neural activations revealed that different time intervals jointly encode the acoustic similarity of both phonetic and speaker categories. These findings provide evidence for a dynamic neural transformation of low-level speech features as they propagate along the auditory pathway, and form an empirical framework to study the representational changes in learning, attention, and speech disorders. SIGNIFICANCE STATEMENT We characterized the properties of evoked neural responses to phoneme instances in continuous speech. We show that each instance of a phoneme in continuous speech produces several observable neural responses at different times occurring as early as 50 ms and as late as 400 ms after the phoneme onset. Each temporal event explicitly encodes the acoustic similarity of phonemes, and linguistic and nonlinguistic information are best represented at different time intervals. Finally, we show a joint encoding of phonetic and speaker information, where the neural representation of speakers is dependent on phoneme category. These findings provide compelling new evidence for…

  5. Primary Progressive Apraxia of Speech: Clinical Features and Acoustic and Neurologic Correlates

    PubMed Central

    Strand, Edythe A.; Clark, Heather; Machulda, Mary; Whitwell, Jennifer L.; Josephs, Keith A.

    2015-01-01

    Purpose This study summarizes 2 illustrative cases of a neurodegenerative speech disorder, primary progressive apraxia of speech (AOS), as a vehicle for providing an overview of the disorder and an approach to describing and quantifying its perceptual features and some of its temporal acoustic attributes. Method Two individuals with primary progressive AOS underwent speech-language and neurologic evaluations on 2 occasions, ranging from 2.0 to 7.5 years postonset. Performance on several tests, tasks, and rating scales, as well as several acoustic measures, were compared over time within and between cases. Acoustic measures were compared with performance of control speakers. Results Both patients initially presented with AOS as the only or predominant sign of disease and without aphasia or dysarthria. The presenting features and temporal progression were captured in an AOS Rating Scale, an Articulation Error Score, and temporal acoustic measures of utterance duration, syllable rates per second, rates of speechlike alternating motion and sequential motion, and a pairwise variability index measure. Conclusions AOS can be the predominant manifestation of neurodegenerative disease. Clinical ratings of its attributes and acoustic measures of some of its temporal characteristics can support its diagnosis and help quantify its salient characteristics and progression over time. PMID:25654422

  6. Estimation of glottal source features from the spectral envelope of the acoustic speech signal

    NASA Astrophysics Data System (ADS)

    Torres, Juan Felix

    Speech communication encompasses diverse types of information, including phonetics, affective state, voice quality, and speaker identity. From a speech production standpoint, the acoustic speech signal can be mainly divided into glottal source and vocal tract components, which play distinct roles in rendering the various types of information it contains. Most deployed speech analysis systems, however, do not explicitly represent these two components as distinct entities, as their joint estimation from the acoustic speech signal becomes an ill-defined blind deconvolution problem. Nevertheless, because of the desire to understand glottal behavior and how it relates to perceived voice quality, there has been continued interest in explicitly estimating the glottal component of the speech signal. To this end, several inverse filtering (IF) algorithms have been proposed, but they are unreliable in practice because of the blind formulation of the separation problem. In an effort to develop a method that can bypass the challenging IF process, this thesis proposes a new glottal source information extraction method that relies on supervised machine learning to transform smoothed spectral representations of speech, which are already used in some of the most widely deployed and successful speech analysis applications, into a set of glottal source features. A transformation method based on Gaussian mixture regression (GMR) is presented and compared to current IF methods in terms of feature similarity, reliability, and speaker discrimination capability on a large speech corpus, and potential representations of the spectral envelope of speech are investigated for their ability to represent glottal source variation in a predictable manner. The proposed system was found to produce glottal source features that reasonably matched their IF counterparts in many cases, while being less susceptible to spurious errors. The development of the proposed method entailed a study into the aspects…
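
    Gaussian mixture regression in this setting amounts to fitting a joint GMM over stacked envelope and glottal feature vectors and taking the conditional mean of the glottal part given the envelope. The sketch below shows that conditional-mean computation, with sklearn and scipy as assumed tools; the actual feature definitions used in the thesis are not reproduced here.

```python
# Sketch of Gaussian mixture regression (GMR): fit a joint GMM on
# [spectral-envelope features, glottal-source features] and predict the glottal
# part as the conditional mean given the envelope. Feature choices are assumed.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(X_env, Y_glottal, n_components=8):
    """Fit a joint GMM on stacked [envelope, glottal] feature vectors."""
    return GaussianMixture(n_components=n_components, covariance_type="full").fit(
        np.hstack([X_env, Y_glottal]))

def gmr_predict(gmm, x, dx):
    """Conditional mean E[y | x] under the joint GMM; dx is the envelope dimension."""
    resp = np.empty(gmm.n_components)
    cond = []
    for k in range(gmm.n_components):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        mu_x, mu_y = mu[:dx], mu[dx:]
        Sxx, Syx = S[:dx, :dx], S[dx:, :dx]
        resp[k] = gmm.weights_[k] * multivariate_normal.pdf(x, mu_x, Sxx)
        cond.append(mu_y + Syx @ np.linalg.solve(Sxx, x - mu_x))
    resp /= resp.sum() + 1e-300  # normalize component responsibilities
    return sum(w * c for w, c in zip(resp, cond))
```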

  7. Speech Recognition and Acoustic Features in Combined Electric and Acoustic Stimulation

    ERIC Educational Resources Information Center

    Yoon, Yang-soo; Li, Yongxin; Fu, Qian-Jie

    2012-01-01

    Purpose: In this study, the authors aimed to identify speech information processed by a hearing aid (HA) that is additive to information processed by a cochlear implant (CI) as a function of signal-to-noise ratio (SNR). Method: Speech recognition was measured with CI alone, HA alone, and CI + HA. Ten participants were separated into 2 groups; good…

  8. Speech Recognition and Acoustic Features in Combined Electric and Acoustic Stimulation

    PubMed Central

    Yoon, Yang-soo; Li, Yongxin; Fu, Qian-Jie

    2011-01-01

    Purpose This study aimed to identify speech information processed by a hearing aid (HA) that is additive to information processed by a cochlear implant (CI) as a function of signal-to-noise ratio (SNR). Method Speech recognition was measured with CI alone, HA alone, and CI+HA. Ten participants were separated into two groups; good (aided pure-tone average (PTA) < 55 dB) and poor (aided PTA ≥ 55 dB) at audiometric frequencies ≤ 1 kHz in HA. Results Results showed that the good aided PTA group derived a clear bimodal benefit (performance difference between CI+HA and CI alone) for vowel and sentence recognition in noise while the poor aided PTA group received little benefit across speech tests and SNRs. Results also showed that a better aided PTA helped in processing cues embedded in both low and high frequencies; none of these cues were significantly perceived by the poor aided PTA group. Conclusions The aided PTA is an important indicator for bimodal advantage in speech perception. The lack of bimodal benefits in the poor group may be attributed to the non-optimal HA fitting. Bimodal listening provides a synergistic effect for cues in both low and high frequency components in speech. PMID:22199183

  9. Acoustic and Articulatory Features of Diphthong Production: A Speech Clarity Study

    ERIC Educational Resources Information Center

    Tasko, Stephen M.; Greilick, Kristin

    2010-01-01

    Purpose: The purpose of this study was to evaluate how speaking clearly influences selected acoustic and orofacial kinematic measures associated with diphthong production. Method: Forty-nine speakers, drawn from the University of Wisconsin X-Ray Microbeam Speech Production Database (J. R. Westbury, 1994), served as participants. Samples of clear…

  10. Fast multi-feature paradigm for recording several mismatch negativities (MMNs) to phonetic and acoustic changes in speech sounds.

    PubMed

    Pakarinen, Satu; Lovio, Riikka; Huotilainen, Minna; Alku, Paavo; Näätänen, Risto; Kujala, Teija

    2009-12-01

    In this study, we addressed whether a new fast multi-feature mismatch negativity (MMN) paradigm can be used for determining the central auditory discrimination accuracy for several acoustic and phonetic changes in speech sounds. We recorded the MMNs in the multi-feature paradigm to changes in syllable intensity, frequency, and vowel length, as well as for consonant and vowel change, and compared these MMNs to those obtained with the traditional oddball paradigm. In addition, we examined the reliability of the multi-feature paradigm by repeating the recordings with the same subjects 1-7 days after the first recordings. The MMNs recorded with the multi-feature paradigm were similar to those obtained with the oddball paradigm. Furthermore, only minor differences were observed in the MMN amplitudes across the two recording sessions. Thus, this new multi-feature paradigm with speech stimuli provides similar results as the oddball paradigm, and the MMNs recorded with the new paradigm were reproducible.

  11. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.
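
    The deconvolution step mentioned here, recovering a per-frame transfer function from a known excitation and the acoustic output, is commonly carried out as a regularized spectral division. A minimal sketch under that reading: the excitation frame is taken as given (e.g. derived from the EM measurement), and the window, FFT size and regularization constant are illustrative.

```python
# Sketch: per-frame vocal-tract transfer function via regularized spectral
# division of the acoustic output by a given excitation frame (equal lengths).
import numpy as np

def frame_transfer_function(speech_frame, excitation_frame, n_fft=1024, eps=1e-8):
    """Return H(f) ~ Y(f) / X(f) for one analysis frame."""
    w = np.hanning(len(speech_frame))
    Y = np.fft.rfft(speech_frame * w, n_fft)
    X = np.fft.rfft(excitation_frame * w, n_fft)
    return Y * np.conj(X) / (np.abs(X) ** 2 + eps)  # regularized deconvolution
```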

  12. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

  13. Exploration of Acoustic Features for Automatic Vowel Discrimination in Spontaneous Speech

    ERIC Educational Resources Information Center

    Tyson, Na'im R.

    2012-01-01

    In an attempt to understand what acoustic/auditory feature sets motivated transcribers towards certain labeling decisions, I built machine learning models that were capable of discriminating between canonical and non-canonical vowels excised from the Buckeye Corpus. Specifically, I wanted to model when the dictionary form and the transcribed-form…

  14. An acoustic feature-based similarity scoring system for speech rehabilitation assistance.

    PubMed

    Syauqy, Dahnial; Wu, Chao-Min; Setyawati, Onny

    2016-08-01

    The purpose of this study is to develop a tool to assist speech therapy and rehabilitation, which focused on automatic scoring based on the comparison of the patient's speech with a normal speech sample on several aspects including pitch, vowel, voiced-unvoiced segments, strident fricative and sound intensity. The pitch estimation employed a cepstrum-based algorithm for its robustness; the vowel classification used a multilayer perceptron (MLP) to classify vowels from pitch and formants; and the strident fricative detection was based on the major peak spectral intensity, its location and the presence of pitch in the segment. In order to evaluate the performance of the system, this study analyzed eight patients' speech recordings (four males, four females; aged 4-58 years), which had been recorded in a previous study in cooperation with Taipei Veterans General Hospital and Taoyuan General Hospital. The experimental results for the pitch algorithm showed that the cepstrum method had a gross pitch error of 5.3% over a total of 2086 frames. For the vowel classification algorithm, the MLP method provided 93% accuracy (men), 87% (women) and 84% (children). Overall, 156 of the tool's grading results (81%) were consistent with the 192 audio and visual observations made by four experienced respondents. Implications for Rehabilitation: Difficulties in communication may limit the ability of a person to transfer and exchange information. The fact that speech is one of the primary means of communication has encouraged the need for speech diagnosis and rehabilitation. Advances in computer-assisted speech therapy (CAST) technology improve the quality and time efficiency of the diagnosis and treatment of these disorders. The present study attempted to develop a tool to assist speech therapy and rehabilitation, providing a simple interface so that the assessment can be done even by the patient himself, without particular knowledge of speech processing, while at the…
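
    The cepstrum-based pitch estimator referenced above can be summarized in a few lines. The sketch below is a generic real-cepstrum F0 estimate, not the study's implementation; the search range and voicing threshold are assumptions.

```python
# Sketch of a cepstrum-based F0 estimate for one frame; the search range and
# the crude voicing check are our assumptions, not the study's implementation.
import numpy as np

def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Return an F0 estimate in Hz, or 0.0 if judged unvoiced.
    The frame should span at least fs/fmin samples (a few pitch periods)."""
    w = frame * np.hamming(len(frame))
    spectrum = np.fft.rfft(w)
    cepstrum = np.fft.irfft(np.log(np.abs(spectrum) + 1e-10))
    qmin, qmax = int(fs / fmax), int(fs / fmin)
    peak = qmin + int(np.argmax(cepstrum[qmin:qmax]))
    if cepstrum[peak] < 3 * np.std(cepstrum[qmin:qmax]):  # crude voicing check
        return 0.0
    return fs / peak
```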

  15. Speech acoustics: How much science?

    PubMed

    Tiwari, Manjul

    2012-01-01

    Human vocalizations are sounds made exclusively by a human vocal tract. Among other vocalizations, for example, laughs or screams, speech is the most important. Speech is the primary medium of that supremely human symbolic communication system called language. One of the functions of a voice, perhaps the main one, is to realize language, by conveying some of the speaker's thoughts in linguistic form. Speech is language made audible. Moreover, when phoneticians compare and describe voices, they usually do so with respect to linguistic units, especially speech sounds, like vowels or consonants. It is therefore necessary to understand the structure as well as the nature of speech sounds and how they are described. In order to understand and evaluate speech, it is important to have at least a basic understanding of the science of speech acoustics: how the acoustics of speech are produced, how they are described, and how differences, both between speakers and within speakers, arise in the acoustic output. One of the aims of this article is to try to facilitate this understanding.

  16. Speech Music Discrimination Using Class-Specific Features

    DTIC Science & Technology

    2004-08-01

    Speech Music Discrimination Using Class-Specific Features, Thomas Beierholm… between speech and music. Feature extraction is class-specific and can therefore be tailored to each class, meaning that segment size, model orders… interest. Some of the applications of audio signal classification are speech/music classification [1], acoustical environmental classification [2][3]…

  17. Methods and apparatus for non-acoustic speech characterization and recognition

    DOEpatents

    Holzrichter, John F.

    1999-01-01

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well-defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated in each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  18. Methods and apparatus for non-acoustic speech characterization and recognition

    SciTech Connect

    Holzrichter, J.F.

    1999-12-21

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well-defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated in each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  19. Mapping acoustics to kinematics in speech

    NASA Astrophysics Data System (ADS)

    Bali, Rohan

    An accurate mapping from speech acoustics to speech articulator movements has many practical applications, as well as theoretical implications for speech planning and perception science. This work can be divided into two parts. In the first part, we show that a simple codebook can be used to map acoustics to speech articulator movements in natural, conversational speech. In the second part, we incorporate cost optimization principles that have been shown to be relevant in motor control tasks into the codebook approach. These cost optimizations are defined as minimization of the integrals of the magnitudes of velocity, acceleration and jerk of the speech articulators, and are implemented using a dynamic programming technique. Results show that incorporating cost minimization of speech articulator movements can significantly improve the mapping from acoustics to speech articulator movements. This suggests underlying physiological or neural planning principles used by the speech articulators during speech production.
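
    The codebook-plus-dynamic-programming idea can be made concrete as follows: each acoustic frame retrieves several candidate articulator configurations from the codebook, and a Viterbi-style search selects the candidate sequence that trades acoustic match against articulator movement. The sketch below uses a squared-displacement movement cost as a stand-in for the velocity, acceleration and jerk integrals named in the abstract; the codebook contents and the weight lam are assumptions.

```python
# Sketch: codebook lookup per acoustic frame plus dynamic programming over the
# candidates, penalizing articulator movement between consecutive frames.
import numpy as np

def map_acoustics_to_articulation(acoustic, cb_acoustic, cb_artic, K=10, lam=1.0):
    """acoustic: (T, A); cb_acoustic: (N, A); cb_artic: (N, C). Returns (T, C)."""
    T = acoustic.shape[0]
    # K nearest codebook entries per frame (squared acoustic distance)
    d = ((acoustic[:, None, :] - cb_acoustic[None, :, :]) ** 2).sum(-1)  # (T, N)
    cand = np.argsort(d, axis=1)[:, :K]                                  # (T, K)
    match = np.take_along_axis(d, cand, axis=1)                          # (T, K)
    cost = match[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # movement cost between every candidate pair of consecutive frames
        move = ((cb_artic[cand[t - 1]][:, None, :] -
                 cb_artic[cand[t]][None, :, :]) ** 2).sum(-1)            # (K, K)
        total = cost[:, None] + lam * move
        back[t] = np.argmin(total, axis=0)
        cost = total[back[t], np.arange(K)] + match[t]
    # backtrack the cheapest path and return the articulator trajectory
    path = [int(np.argmin(cost))]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return cb_artic[cand[np.arange(T), path]]
```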

  20. Detecting suspicious behaviour using speech: acoustic correlates of deceptive speech -- an exploratory investigation.

    PubMed

    Kirchhübel, Christin; Howard, David M

    2013-09-01

    The current work intended to enhance our knowledge of changes or lack of changes in the speech signal when people were being deceptive. In particular, the study attempted to investigate the appropriateness of using speech cues in detecting deception. Truthful, deceptive and control speech were elicited from ten speakers in an interview setting. The data were subjected to acoustic analysis and results are presented on a range of speech parameters including fundamental frequency (f0), overall amplitude and mean vowel formants F1, F2 and F3. A significant correlation could not be established between deceptiveness/truthfulness and any of the acoustic features examined. Directions for future work are highlighted.

  1. Acoustics of Clear Speech: Effect of Instruction

    ERIC Educational Resources Information Center

    Lam, Jennifer; Tjaden, Kris; Wilding, Greg

    2012-01-01

    Purpose: This study investigated how different instructions for eliciting clear speech affected selected acoustic measures of speech. Method: Twelve speakers were audio-recorded reading 18 different sentences from the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1984). Sentences were produced in habitual, clear,…

  2. Acoustic Characteristics of Ataxic Speech in Japanese Patients with Spinocerebellar Degeneration (SCD)

    ERIC Educational Resources Information Center

    Ikui, Yukiko; Tsukuda, Mamoru; Kuroiwa, Yoshiyuki; Koyano, Shigeru; Hirose, Hajime; Taguchi, Takahide

    2012-01-01

    Background: In English- and German-speaking countries, ataxic speech is often described as showing scanning based on acoustic impressions. Although the term "scanning" is generally considered to represent abnormal speech features including prosodic excess or insufficiency, any precise acoustic analysis of ataxic speech has not been…

  3. Near-Term Fetuses Process Temporal Features of Speech

    ERIC Educational Resources Information Center

    Granier-Deferre, Carolyn; Ribeiro, Aurelie; Jacquet, Anne-Yvonne; Bassereau, Sophie

    2011-01-01

    The perception of speech and music requires processing of variations in spectra and amplitude over different time intervals. Near-term fetuses can discriminate acoustic features, such as frequencies and spectra, but whether they can process complex auditory streams, such as speech sequences and more specifically their temporal variations, fast or…

  4. Acoustic assessment of speech privacy curtains in two nursing units.

    PubMed

    Pope, Diana S; Miller-Klein, Erik T

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s' standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption and more compact, fragmented nursing unit floor plate shapes should be considered.

  5. Acoustic assessment of speech privacy curtains in two nursing units

    PubMed Central

    Pope, Diana S.; Miller-Klein, Erik T.

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s’ standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption and more compact, fragmented nursing unit floor plate shapes should be considered. PMID:26780959

  6. The acoustic-modeling problem in automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Brown, Peter F.

    1987-12-01

    This thesis examines the acoustic-modeling problem in automatic speech recognition from an information-theoretic point of view. This problem is to design a speech-recognition system which can extract from the speech waveform as much information as possible about the corresponding word sequence. The information extraction process is broken down into two steps: a signal processing step which converts a speech waveform into a sequence of information-bearing acoustic feature vectors, and a step which models such a sequence. This thesis is primarily concerned with the use of hidden Markov models to model sequences of feature vectors which lie in a continuous space such as R^N. It explores the trade-off between packing a lot of information into such sequences and being able to model them accurately. The difficulty of developing accurate models of continuous parameter sequences is addressed by investigating a method of parameter estimation which is specifically designed to cope with inaccurate modeling assumptions.

  7. Analog Acoustic Expression in Speech Communication

    ERIC Educational Resources Information Center

    Shintel, Hadas; Nusbaum, Howard C.; Okrent, Arika

    2006-01-01

    We present the first experimental evidence of a phenomenon in speech communication we call "analog acoustic expression." Speech is generally thought of as conveying information in two distinct ways: discrete linguistic-symbolic units such as words and sentences represent linguistic meaning, and continuous prosodic forms convey information about…

  8. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-02-14

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  9. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2004-03-23

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  10. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-08-08

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  11. An Acoustic Measure for Word Prominence in Spontaneous Speech

    PubMed Central

    Wang, Dagen; Narayanan, Shrikanth

    2010-01-01

    An algorithm for automatic speech prominence detection is reported in this paper. We describe a comparative analysis on various acoustic features for word prominence detection and report results using a spoken dialog corpus with manually assigned prominence labels. The focus is on features such as spectral intensity and speech rate that are directly extracted from speech based on a correlation-based approach without requiring explicit linguistic or phonetic knowledge. Additionally, various pitch-based measures are studied with respect to their discriminating ability for prominence detection. A parametric scheme for modeling pitch plateau is proposed and this feature alone is found to outperform the traditional local pitch statistics. Two sets of experiments are used to explore the usefulness of the acoustic score generated using these features. The first set focuses on a more traditional way of word prominence detection based on a manually-tagged corpus. A 76.8% classification accuracy was achieved on a corpus of role-playing spoken dialogs. Due to difficulties in manually tagging speech prominence into discrete levels (categories), the second set of experiments focuses on evaluating the score indirectly. Specifically, through experiments on the Switchboard corpus, it is shown that the proposed acoustic score can discriminate between content word and function words in a statistically significant way. The relation between speech prominence and content/function words is also explored. Since prominent words tend to be predominantly content words, and since content words can be automatically marked from text-derived part of speech (POS) information, it is shown that the proposed acoustic score can be indirectly cross-validated through POS information. PMID:20454538

  12. Acoustic Evidence for Phonologically Mismatched Speech Errors

    ERIC Educational Resources Information Center

    Gormley, Andrea

    2015-01-01

    Speech errors are generally said to accommodate to their new phonological context. This accommodation has been validated by several transcription studies. The transcription methodology is not the best choice for detecting errors at this level, however, as this type of error can be difficult to perceive. This paper presents an acoustic analysis of…

  13. Acoustic Detail But Not Predictability of Task-Irrelevant Speech Disrupts Working Memory

    PubMed Central

    Wöstmann, Malte; Obleser, Jonas

    2016-01-01

    Attended speech is comprehended better not only if more acoustic detail is available, but also if it is semantically highly predictable. But can more acoustic detail or higher predictability turn into disadvantages and distract a listener if the speech signal is to be ignored? Also, does the degree of distraction increase for older listeners who typically show a decline in attentional control ability? Adopting the irrelevant-speech paradigm, we tested whether younger (age 23–33 years) and older (60–78 years) listeners’ working memory for the serial order of spoken digits would be disrupted by the presentation of task-irrelevant speech varying in its acoustic detail (using noise-vocoding) and its semantic predictability (of sentence endings). More acoustic detail, but not higher predictability, of task-irrelevant speech aggravated memory interference. This pattern of results did not differ between younger and older listeners, despite generally lower performance in older listeners. Our findings suggest that the focus of attention determines how acoustics and predictability affect the processing of speech: first, as more acoustic detail is known to enhance speech comprehension and memory for speech, we here demonstrate that more acoustic detail of ignored speech enhances the degree of distraction. Second, while higher predictability of attended speech is known to also enhance speech comprehension under acoustically adverse conditions, higher predictability of ignored speech is unable to exert any distracting effect upon working memory performance in younger or older listeners. These findings suggest that features that make attended speech easier to comprehend do not necessarily enhance distraction by ignored speech. PMID:27826235
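
    Noise-vocoding, the manipulation of acoustic detail used in this study, replaces the fine structure within each frequency band with noise while preserving the band envelopes. A minimal sketch follows; the band count, band edges, filter orders and envelope cut-off are illustrative rather than the study's parameters (fmax must stay below fs/2).

```python
# Sketch of noise-vocoding: split speech into bands, extract each band's
# envelope, and use the envelopes to modulate band-limited noise.
import numpy as np
from scipy.signal import butter, filtfilt

def noise_vocode(x, fs, n_bands=8, fmin=100.0, fmax=7000.0, env_cut=30.0):
    edges = np.geomspace(fmin, fmax, n_bands + 1)     # log-spaced band edges
    rng = np.random.default_rng(0)
    out = np.zeros_like(x, dtype=float)
    b_env, a_env = butter(2, env_cut, btype="low", fs=fs)
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(3, [lo, hi], btype="band", fs=fs)
        band = filtfilt(b, a, x)
        env = filtfilt(b_env, a_env, np.abs(band))    # envelope: rectify + low-pass
        carrier = filtfilt(b, a, rng.standard_normal(len(x)))
        out += np.clip(env, 0, None) * carrier
    return out / (np.max(np.abs(out)) + 1e-9)         # rough level normalization
```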

  14. Perceptual centres in speech - an acoustic analysis

    NASA Astrophysics Data System (ADS)

    Scott, Sophie Kerttu

    Perceptual centres, or P-centres, represent the perceptual moments of occurrence of acoustic signals, the 'beat' of a sound. P-centres underlie the perception and production of rhythm in perceptually regular speech sequences. P-centres have been modelled in both speech and non-speech (music) domains. The three aims of this thesis were (a) to test out current P-centre models to determine which best accounted for the experimental data, (b) to identify a candidate parameter onto which P-centres could be mapped (a local approach), as opposed to the previous global models which rely upon the whole signal to determine the P-centre, and (c) to develop a model of P-centre location which could be applied to speech and non-speech signals. The first aim was investigated by a series of experiments (a) investigating speech from different speakers to determine whether different models could account for variation between speakers, (b) testing whether rendering the amplitude-time plot of a speech signal affects the P-centre of the signal, and (c) testing whether increasing the amplitude at the offset of a speech signal alters P-centres in the production and perception of speech. The second aim was carried out by (a) manipulating the rise time of different speech signals to determine whether the P-centre was affected, and whether the type of speech sound ramped affected the P-centre shift, (b) manipulating the rise time and decay time of a synthetic vowel to determine whether the onset alteration had more effect on the P-centre than the offset manipulation, and (c) testing whether the duration of a vowel affected the P-centre when other attributes (amplitude, spectral content) were held constant. The third aim, modelling P-centres, was based on these results. The Frequency-dependent Amplitude Increase Model of P-centre location (FAIM) was developed using a modelling protocol, the APU GammaTone Filterbank and the speech from different speakers. The P-centres of the stimulus corpus were highly predicted by attributes of…

  15. Acoustic Analysis of PD Speech

    PubMed Central

    Chenausky, Karen; MacAuslan, Joel; Goldhor, Richard

    2011-01-01

    According to the U.S. National Institutes of Health, approximately 500,000 Americans have Parkinson's disease (PD), with roughly another 50,000 receiving new diagnoses each year. 70%–90% of these people also have the hypokinetic dysarthria associated with PD. Deep brain stimulation (DBS) substantially relieves motor symptoms in advanced-stage patients for whom medication produces disabling dyskinesias. This study investigated speech changes as a result of DBS settings chosen to maximize motor performance. The speech of 10 PD patients and 12 normal controls was analyzed for syllable rate and variability, syllable length patterning, vowel fraction, voice-onset time variability, and spirantization. These were normalized by the controls' standard deviation to represent distance from normal and combined into a composite measure. Results show that DBS settings relieving motor symptoms can improve speech, making it up to three standard deviations closer to normal. However, the clinically motivated settings evaluated here show greater capacity to impair, rather than improve, speech. A feedback device developed from these findings could be useful to clinicians adjusting DBS parameters, as a means for ensuring they do not unwittingly choose DBS settings which impair patients' communication. PMID:21977333

  16. Les Traits acoustiques (Acoustic Features)

    ERIC Educational Resources Information Center

    Rossi, Mario

    1977-01-01

    An analysis of the theory of distinctive features advanced by Roman Jakobson, Gunnar Fant and Morris Halle in "Preliminaries to Speech Analysis." The notion of binarism, the criterion of distinctiveness and the definition of features are discussed. Questions leading to further research are raised. (Text is in French.) (AMH)

  17. Automatic speech segmentation using throat-acoustic correlation coefficients

    NASA Astrophysics Data System (ADS)

    Mussabayev, Rustam Rafikovich; Kalimoldayev, Maksat N.; Amirgaliyev, Yedilkhan N.; Mussabayev, Timur R.

    2016-11-01

    This work considers one approach to the task of automatic segmentation of a discrete speech signal. The aim is to construct an algorithm that meets the following requirements: segmentation of a signal into acoustically homogeneous segments, high accuracy and segmentation speed, unambiguity and reproducibility of segmentation results, and no need for preliminary training on a special set of manually segmented signals. Development of an algorithm meeting these requirements was motivated by the need to build large automatically segmented speech databases. One new approach to this task is presented in this article. For this purpose we use a new type of informative features named TAC coefficients (Throat-Acoustic Correlation coefficients), which provide sufficient segmentation accuracy and efficiency.
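
    The abstract does not define the TAC coefficients precisely, so the sketch below should be read only as the general idea: a frame-wise correlation between a throat-microphone channel and the acoustic channel, with a simple change detector on top. Window length, hop and threshold are assumptions, not the authors' algorithm.

```python
# Sketch: frame-wise Pearson correlation between throat and acoustic channels,
# plus a naive boundary rule on sharp changes of the correlation track.
import numpy as np

def tac_track(throat, acoustic, fs, win_s=0.025, hop_s=0.010):
    win, hop = int(win_s * fs), int(hop_s * fs)
    n = min(len(throat), len(acoustic))
    coeffs = []
    for start in range(0, n - win, hop):
        a = throat[start:start + win]
        b = acoustic[start:start + win]
        denom = np.std(a) * np.std(b)
        coeffs.append(0.0 if denom < 1e-12 else
                      float(np.mean((a - a.mean()) * (b - b.mean())) / denom))
    return np.array(coeffs)

def segment_boundaries(coeffs, delta=0.4):
    """Mark frame indices where the correlation changes sharply."""
    return np.where(np.abs(np.diff(coeffs)) > delta)[0] + 1
```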

  18. Gender difference in speech intelligibility using speech intelligibility tests and acoustic analyses

    PubMed Central

    2010-01-01

    PURPOSE The purpose of this study was to compare men with women in terms of speech intelligibility, to investigate the validity of objective acoustic parameters related to speech intelligibility, and to establish reference data for future studies in various fields of prosthodontics. MATERIALS AND METHODS Twenty men and women served as subjects in the present study. After recording of sample sounds, speech intelligibility tests by three speech pathologists and acoustic analyses were performed. Comparison of the speech intelligibility test scores and acoustic parameters such as fundamental frequency, fundamental frequency range, formant frequency, formant ranges, vowel working space area, and vowel dispersion was performed between men and women. In addition, the correlations between the speech intelligibility values and acoustic variables were analyzed. RESULTS Women showed significantly higher speech intelligibility scores than men, and there were significant differences between men and women in most of the acoustic parameters used in the present study. However, the correlations between the speech intelligibility scores and acoustic parameters were low. CONCLUSION The speech intelligibility tests and acoustic parameters used in the present study were effective in differentiating male from female voices, and their values might be used in future studies of patients involved with maxillofacial prosthodontics. However, further studies are needed on the correlation between speech intelligibility tests and objective acoustic parameters. PMID:21165272
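
    One of the listed parameters, the vowel working space area, can be computed from corner-vowel formant measurements with the shoelace formula, as in this small sketch (the formant values in the example are illustrative only).

```python
# Sketch: vowel working space area from corner-vowel formants via the shoelace
# formula over (F1, F2) points. The example values below are illustrative only.
import numpy as np

def vowel_space_area(formants):
    """formants: (F1, F2) pairs in Hz, ordered around the polygon. Area in Hz^2."""
    pts = np.asarray(formants, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# e.g. a triangle spanned by /i/, /a/, /u/ measurements
area = vowel_space_area([(300.0, 2300.0), (750.0, 1300.0), (350.0, 900.0)])
```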

  19. Acoustic assessment of erygmophonic speech of Moroccan laryngectomized patients

    PubMed Central

    Ouattassi, Naouar; Benmansour, Najib; Ridal, Mohammed; Zaki, Zouheir; Bendahhou, Karima; Nejjari, Chakib; Cherkaoui, Abdeljabbar; El Alami, Mohammed Nouredine El Amine

    2015-01-01

    Introduction Acoustic evaluation of alaryngeal voices is among the most prominent issues in the speech analysis field. In fact, many methods have been developed to date to substitute for classic perceptual evaluation. The aim of this study is to present our experience with the objective assessment of erygmophonic speech and to discuss the most widely used methods of acoustic speech appraisal. Through a prospective case-control study, we measured acoustic parameters of speech quality during one year of erygmophonic rehabilitation therapy in Moroccan laryngectomized patients. Methods We assessed acoustic parameters of erygmophonic speech samples from eleven laryngectomized patients throughout the speech rehabilitation therapy. Acoustic parameters were obtained by the perturbation analysis method and linear predictive coding algorithms, as well as through the broadband spectrogram. Results Using perturbation analysis methods, we found erygmophonic voice to be significantly poorer than normal speech, and it exhibits higher formant frequency values. However, erygmophonic voice also shows high and extremely variable error values, greater than the acceptable level, which casts doubt on the reliability of the results of those analytic methods. Conclusion Acoustic parameters for objective evaluation of alaryngeal voices should allow a reliable representation of the perceptual evaluation of speech quality. This requirement has not been fulfilled by the common methods used so far. Therefore, acoustical assessment of erygmophonic speech needs more investigation. PMID:26587121

  20. System And Method For Characterizing Voiced Excitations Of Speech And Acoustic Signals, Removing Acoustic Noise From Speech, And Synthesizing Speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-04-25

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  1. Acoustic Study of Acted Emotions in Speech

    NASA Astrophysics Data System (ADS)

    Wang, Rong

    An extensive set of carefully recorded utterances provided a speech database for investigating acoustic correlates among eight emotional states. Four professional actors and four professional actresses simulated the emotional states of joy, conversation, nervousness, anger, sadness, hate, fear, and depression. The values of 14 acoustic parameters were extracted from analyses of the simulated portrayals. Normalization of the parameters was made to reduce the talker-dependence. Correlates of emotion were investigated by means of principal components analysis. Sadness and depression were found to be "ambiguous" with respect to each other, but "unique" with respect to joy and anger in the principal components space. Joy, conversation, nervousness, anger, hate, and fear did not separate well in the space and so exhibited ambiguity with respect to one another. The different talkers expressed joy, anger, sadness, and depression more consistently than the other four emotions. The analysis results were compared with the results of a subjective study using the same speech database and considerable consistency between the two was found.

  2. Acoustics in Halls for Speech and Music

    NASA Astrophysics Data System (ADS)

    Gade, Anders C.

    This chapter deals specifically with concepts, tools, and architectural variables of importance when designing auditoria for speech and music. The focus will be on cultivating the useful components of the sound in the room rather than on avoiding noise from outside or from installations, which is dealt with in Chap. 11. The chapter starts by presenting the subjective aspects of the room acoustic experience according to consensus at the time of writing. Then follows a description of their objective counterparts, the objective room acoustic parameters, among which the classical reverberation time measure is only one of many, but still of fundamental value. After explanations on how these parameters can be measured and predicted during the design phase, the remainder of the chapter deals with how the acoustic properties can be controlled by the architectural design of auditoria. This is done by presenting the influence of individual design elements as well as brief descriptions of halls designed for specific purposes, such as drama, opera, and symphonic concerts. Finally, some important aspects of loudspeaker installations in auditoria are briefly touched upon.

  3. Suppressing aliasing noise in the speech feature domain for automatic speech recognition.

    PubMed

    Deng, Huiqun; O'Shaughnessy, Douglas

    2008-07-01

    This letter points out that, although in the audio signal domain low-pass filtering has been used to prevent aliasing noise from entering the baseband of speech signals, an antialias process in the speech feature domain is still needed to prevent high modulation frequency components from entering the baseband of speech features. The existence of aliasing noise in speech features is revealed via spectral analysis of speech feature streams. A method for suppressing such aliasing noise is proposed. Experiments on large vocabulary speech recognition show that antialias processing of speech features can improve speech recognition, especially for noisy speech.
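
    The letter's point is that feature trajectories, e.g. cepstral coefficients sampled at the frame rate, can carry modulation frequencies above what the back-end expects and should therefore be low-pass filtered along time. A minimal sketch of that idea; the frame rate, cut-off and filter order are assumptions, not the authors' settings.

```python
# Sketch: treat each feature dimension as a time series at the frame rate and
# low-pass filter it along time to suppress high modulation frequencies.
import numpy as np
from scipy.signal import butter, filtfilt

def antialias_features(feats, frame_rate=100.0, cutoff_hz=25.0, order=4):
    """feats: (T, D) feature matrix, one row per frame. Filters along time."""
    b, a = butter(order, cutoff_hz, btype="low", fs=frame_rate)
    return filtfilt(b, a, feats, axis=0)
```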

  4. Acoustic sleepiness detection: framework and validation of a speech-adapted pattern recognition approach.

    PubMed

    Krajewski, Jarek; Batliner, Anton; Golz, Martin

    2009-08-01

    This article describes a general framework for detecting sleepiness states on the basis of prosody, articulation, and speech-quality-related speech characteristics. The advantages of this automatic real-time approach are that obtaining speech data is nonobtrusive and is free from sensor application and calibration efforts. Different types of acoustic features derived from speech, speaker, and emotion recognition were employed (frame-level-based speech features). Combining these features with high-level contour descriptors, which capture the temporal information of frame-level descriptor contours, results in 45,088 features per speech sample. In general, the measurement process follows the speech-adapted steps of pattern recognition: (1) recording speech, (2) preprocessing, (3) feature computation (using perceptual and signal-processing-related features such as, e.g., fundamental frequency, intensity, pause patterns, formants, and cepstral coefficients), (4) dimensionality reduction, (5) classification, and (6) evaluation. After a correlation-filter-based feature subset selection was employed on the feature space in order to find the most relevant features, different classification models were trained. The best model, namely the support-vector machine, achieved 86.1% classification accuracy in predicting sleepiness in a sleep deprivation study (two-class problem, N=12; 01.00-08.00 a.m.).
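
    A condensed sketch of steps (4) and (5) as described: a correlation-based filter keeps the features most related to the sleepiness label, and a support-vector machine is trained on the reduced set. The feature count k, the kernel and C below are assumptions, not the study's configuration.

```python
# Sketch: correlation-filter feature selection followed by an SVM classifier.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def correlation_filter(X, y, k=500):
    """Indices of the k features with the largest |Pearson correlation| to y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(0) / (np.sqrt((Xc ** 2).sum(0) * (yc ** 2).sum()) + 1e-12)
    return np.argsort(np.abs(r))[::-1][:k]

def train_sleepiness_classifier(X, y, k=500):
    idx = correlation_filter(X, y, k)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(X[:, idx], y)
    return clf, idx  # keep idx so the same features are selected at test time
```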

  5. Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training

    PubMed Central

    Narayanan, Arun; Wang, DeLiang

    2015-01-01

    Although deep neural network (DNN) acoustic models are known to be inherently noise robust, especially with matched training and testing data, the use of speech separation as a frontend and for deriving alternative feature representations has been shown to improve performance in challenging environments. We first present a supervised speech separation system that significantly improves automatic speech recognition (ASR) performance in realistic noise conditions. The system performs separation via ratio time-frequency masking; the ideal ratio mask (IRM) is estimated using DNNs. We then propose a framework that unifies separation and acoustic modeling via joint adaptive training. Since the modules for acoustic modeling and speech separation are implemented using DNNs, unification is done by introducing additional hidden layers with fixed weights and appropriate network architecture. On the CHiME-2 medium-large vocabulary ASR task, and with log mel spectral features as input to the acoustic model, an independently trained ratio masking frontend improves word error rates by 10.9% (relative) compared to the noisy baseline. In comparison, the jointly trained system improves performance by 14.4%. We also experiment with alternative feature representations to augment the standard log mel features, like the noise and speech estimates obtained from the separation module, and the standard feature set used for IRM estimation. Our best system obtains a word error rate of 15.4% (absolute), an improvement of 4.6 percentage points over the next best result on this corpus. PMID:26973851
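
    The ratio-masking frontend rests on the ideal ratio mask, which the paper estimates with a DNN. The sketch below only makes the IRM definition and its application concrete, computing the mask from oracle clean and noise signals (assumed parallel and equal-length) rather than estimating it.

```python
# Sketch: ideal ratio mask (IRM) from oracle clean/noise signals, applied to the
# noisy mixture's STFT. In the paper the mask is estimated by a DNN instead.
import numpy as np
from scipy.signal import stft, istft

def ideal_ratio_mask(clean, noise, fs, nperseg=512):
    _, _, S = stft(clean, fs, nperseg=nperseg)
    _, _, N = stft(noise, fs, nperseg=nperseg)
    return np.sqrt(np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + 1e-12))

def apply_mask(mixture, mask, fs, nperseg=512):
    _, _, Y = stft(mixture, fs, nperseg=nperseg)
    _, x_hat = istft(mask * Y, fs, nperseg=nperseg)
    return x_hat
```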

  6. Acoustic markers of prosodic boundaries in Spanish spontaneous alaryngeal speech.

    PubMed

    Cuenca, M H; Barrio, M M

    2010-11-01

    Prosodic information aids segmentation of the continuous speech signal and thereby facilitates auditory speech processing. Durational and pitch variations are prosodic cues especially necessary to convey prosodic boundaries, but alaryngeal speakers have inconsistent control over acoustic parameters such as F0 and duration, with the result that their speech is noisier and less intelligible than laryngeal speech. This case study investigated whether one Spanish alaryngeal speaker proficient in both oesophageal and tracheoesophageal speech modes used the same acoustic cues for prosodic boundaries in both types of voicing. Pre-boundary lengthening, F0-excursions and pausing (number of pauses and position) were measured in spontaneous speech samples, using Praat. The acoustic analysis revealed that the subject relied on a different combination of cues in each type of voicing to convey the presence of prosodic boundaries.

  7. Age-Related Changes in Acoustic Characteristics of Adult Speech

    ERIC Educational Resources Information Center

    Torre, Peter, III; Barlow, Jessica A.

    2009-01-01

    This paper addresses effects of age and sex on certain acoustic properties of speech, given conflicting findings on such effects reported in prior research. The speech of 27 younger adults (15 women, 12 men; mean age 25.5 years) and 59 older adults (32 women, 27 men; mean age 75.2 years) was evaluated for identification of differences for sex and…

  8. Acoustic Markers of Prosodic Boundaries in Spanish Spontaneous Alaryngeal Speech

    ERIC Educational Resources Information Center

    Cuenca, M. H.; Barrio, M. M.

    2010-01-01

    Prosodic information aids segmentation of the continuous speech signal and thereby facilitates auditory speech processing. Durational and pitch variations are prosodic cues especially necessary to convey prosodic boundaries, but alaryngeal speakers have inconsistent control over acoustic parameters such as F0 and duration, being as a result noisy…

  9. Clear Speech Variants: An Acoustic Study in Parkinson's Disease

    ERIC Educational Resources Information Center

    Lam, Jennifer; Tjaden, Kris

    2016-01-01

    Purpose: The authors investigated how different variants of clear speech affect segmental and suprasegmental acoustic measures of speech in speakers with Parkinson's disease and a healthy control group. Method: A total of 14 participants with Parkinson's disease and 14 control participants served as speakers. Each speaker produced 18 different…

  10. Acoustic richness modulates the neural networks supporting intelligible speech processing.

    PubMed

    Lee, Yune-Sang; Min, Nam Eun; Wingfield, Arthur; Grossman, Murray; Peelle, Jonathan E

    2016-03-01

    The information contained in a sensory signal plays a critical role in determining what neural processes are engaged. Here we used interleaved silent steady-state (ISSS) functional magnetic resonance imaging (fMRI) to explore how human listeners cope with different degrees of acoustic richness during auditory sentence comprehension. Twenty-six healthy young adults underwent scanning while hearing sentences that varied in acoustic richness (high vs. low spectral detail) and syntactic complexity (subject-relative vs. object-relative center-embedded clause structures). We manipulated acoustic richness by presenting the stimuli as unprocessed full-spectrum speech, or noise-vocoded with 24 channels. Importantly, although the vocoded sentences were spectrally impoverished, all sentences were highly intelligible. These manipulations allowed us to test how intelligible speech processing was affected by orthogonal linguistic and acoustic demands. Acoustically rich speech showed stronger activation than acoustically less-detailed speech in a bilateral temporoparietal network with more pronounced activity in the right hemisphere. By contrast, listening to sentences with greater syntactic complexity resulted in increased activation of a left-lateralized network including left posterior lateral temporal cortex, left inferior frontal gyrus, and left dorsolateral prefrontal cortex. Significant interactions between acoustic richness and syntactic complexity occurred in left supramarginal gyrus, right superior temporal gyrus, and right inferior frontal gyrus, indicating that the regions recruited for syntactic challenge differed as a function of acoustic properties of the speech. Our findings suggest that the neural systems involved in speech perception are finely tuned to the type of information available, and that reducing the richness of the acoustic signal dramatically alters the brain's response to spoken language, even when intelligibility is high.

  11. Evaluation of disfluent speech by means of automatic acoustic measurements.

    PubMed

    Lustyk, Tomas; Bergl, Petr; Cmejla, Roman

    2014-03-01

    An experiment was carried out to determine whether the level of the speech fluency disorder can be estimated by means of automatic acoustic measurements. These measures analyze, for example, the amount of silence in a recording or the number of abrupt spectral changes in a speech signal. All the measures were designed to take into account symptoms of stuttering. In the experiment, 118 audio recordings of read speech by Czech native speakers were employed. The results indicate that the human-made rating of the speech fluency disorder in read speech can be predicted on the basis of automatic measurements. The number of abrupt spectral changes in the speech segments turns out to be the most appropriate measure to describe the overall speech performance. The results also imply that there are measures with good results describing partial symptoms (especially fixed postures without audible airflow).
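
    Two measures of the kind described, the proportion of silence in a recording and a count of abrupt spectral changes, can be sketched as follows; the frame settings and thresholds are illustrative assumptions rather than the paper's calibrated measures.

    ```python
    import numpy as np
    from scipy.signal import stft, find_peaks

    def percent_silence(x, sr, frame=0.02, thresh_db=-40.0):
        """Fraction of frames more than |thresh_db| below the loudest frame."""
        n = int(frame * sr)
        frames = x[:len(x) // n * n].reshape(-1, n)
        rms_db = 20 * np.log10(np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-12)
        return float(np.mean(rms_db < rms_db.max() + thresh_db))

    def abrupt_spectral_changes(x, sr, flux_quantile=0.95):
        """Count peaks in the spectral flux as a proxy for abrupt spectral changes."""
        _, _, S = stft(x, fs=sr, nperseg=512)
        flux = np.sqrt(np.sum(np.diff(np.abs(S), axis=1) ** 2, axis=0))
        peaks, _ = find_peaks(flux, height=np.quantile(flux, flux_quantile))
        return len(peaks)

    if __name__ == "__main__":
        sr = 16000
        rng = np.random.default_rng(2)
        x = np.concatenate([rng.standard_normal(sr), np.zeros(sr // 2),
                            rng.standard_normal(sr)])
        print(percent_silence(x, sr), abrupt_spectral_changes(x, sr))
    ```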

  12. Study of acoustic correlates associate with emotional speech

    NASA Astrophysics Data System (ADS)

    Yildirim, Serdar; Lee, Sungbok; Lee, Chul Min; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Ebrahim; Narayanan, Shrikanth

    2004-10-01

    This study investigates the acoustic characteristics of four different emotions expressed in speech. The aim is to obtain detailed acoustic knowledge on how a speech signal is modulated by changes from neutral to a certain emotional state. Such knowledge is necessary for automatic emotion recognition and classification and emotional speech synthesis. Speech data obtained from two semi-professional actresses are analyzed and compared. Each subject produces 211 sentences with four different emotions: neutral, sad, angry, and happy. We analyze changes in temporal and acoustic parameters such as magnitude and variability of segmental duration, fundamental frequency and the first three formant frequencies as a function of emotion. Acoustic differences among the emotions are also explored with mutual information computation, multidimensional scaling and acoustic likelihood comparison with normal speech. Results indicate that speech associated with anger and happiness is characterized by longer duration, shorter interword silence, higher pitch and rms energy with wider ranges. Sadness is distinguished from other emotions by lower rms energy and longer interword silence. Interestingly, the differences in formant pattern between [happiness/anger] and [neutral/sadness] are better reflected in back vowels such as /a/(/father/) than in front vowels. Detailed results on intra- and interspeaker variability will be reported.

  13. Speech feature extracting based on DSP

    NASA Astrophysics Data System (ADS)

    Niu, Jingtao; Shi, Zhongke

    2003-09-01

    In this paper, for voiced frames in speech processing, an implementation of LPC prediction coefficient computation using the Levinson-Durbin algorithm on a DSP-based system is proposed, and an implementation of L. R. Rabiner's fundamental frequency estimation is also discussed. At the end of the paper, several new methods for extracting acoustic features from voiced frames alone are also discussed.
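
    For reference, a plain NumPy version of the Levinson-Durbin recursion for LPC prediction coefficients is sketched below (a host-side illustration, not the DSP implementation discussed in the paper).

    ```python
    import numpy as np

    def levinson_durbin(r, order):
        """Solve for the prediction-error filter A(z) = 1 + a[1]z^-1 + ... + a[order]z^-order
        from the autocorrelation sequence r[0..order]; returns (a, final prediction error)."""
        a = np.zeros(order + 1)
        a[0] = 1.0
        err = float(r[0])
        for i in range(1, order + 1):
            acc = r[i]
            for j in range(1, i):
                acc += a[j] * r[i - j]
            k = -acc / err                      # reflection coefficient
            new_a = a.copy()
            for j in range(1, i):
                new_a[j] = a[j] + k * a[i - j]
            new_a[i] = k
            a = new_a
            err *= (1.0 - k * k)
        return a, err

    if __name__ == "__main__":
        # LPC analysis of a short voiced-like frame (decaying 150 Hz sinusoid at 8 kHz)
        n = np.arange(400)
        frame = np.exp(-0.005 * n) * np.sin(2 * np.pi * 150 * n / 8000)
        order = 10
        r = np.array([np.dot(frame[:len(frame) - lag], frame[lag:])
                      for lag in range(order + 1)])
        a, err = levinson_durbin(r, order)
        print(np.round(a, 3), err)
    ```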

  14. Speech intelligibility of deaf speakers and distinctive feature usage.

    PubMed

    Mencke, E O; Ochsner, G J; Testut, E W

    1984-01-01

    Speech samples (41 CNC monosyllables) of 22 deaf children were analyzed using two distinctive-feature systems, one acoustic and one physiologic. Moderate to high correlations between intelligibility scores assigned by listener judges and correct feature usage were obtained for positive as well as negative features of both systems. Further, higher correlations between percent-correct feature usage scores and listener intelligibility scores were observed for phonemes in the initial vs final position-in-word regardless of listener-judge experience, feature system, or presentation mode. These findings suggest that either acoustic or physiologic feature analysis can be employed in describing the articulation of deaf talkers. In general, either of these feature systems also predicts with fair to good accuracy the intelligibility of deaf speakers as judged by either experienced or inexperienced listeners. In view of the appreciably higher correlations obtained between feature use and intelligibility scores in initial compared to final position-in-word, however, caution should be exercised with either of the feature systems studied in predicting the intelligibility of a deaf speaker's final phoneme.

  15. Optimizing acoustical conditions for speech intelligibility in classrooms

    NASA Astrophysics Data System (ADS)

    Yang, Wonyoung

    High speech intelligibility is imperative in classrooms where verbal communication is critical. However, the optimal acoustical conditions to achieve a high degree of speech intelligibility have previously been investigated with inconsistent results, and practical room-acoustical solutions to optimize the acoustical conditions for speech intelligibility have not been developed. This experimental study validated auralization for speech-intelligibility testing, investigated the optimal reverberation for speech intelligibility for both normal and hearing-impaired listeners using more realistic room-acoustical models, and proposed an optimal sound-control design for speech intelligibility based on the findings. The auralization technique was used to perform subjective speech-intelligibility tests. The validation study, comparing auralization results with those of real classroom speech-intelligibility tests, found that if the room to be auralized is not very absorptive or noisy, speech-intelligibility tests using auralization are valid. The speech-intelligibility tests were done in two different auralized sound fields---approximately diffuse and non-diffuse---using the Modified Rhyme Test and both normal and hearing-impaired listeners. A hybrid room-acoustical prediction program was used throughout the work, and it and a 1/8 scale-model classroom were used to evaluate the effects of ceiling barriers and reflectors. For both subject groups, in approximately diffuse sound fields, when the speech source was closer to the listener than the noise source, the optimal reverberation time was zero. When the noise source was closer to the listener than the speech source, the optimal reverberation time was 0.4 s (with another peak at 0.0 s) with relative output power levels of the speech and noise sources SNS = 5 dB, and 0.8 s with SNS = 0 dB. In non-diffuse sound fields, when the noise source was between the speaker and the listener, the optimal reverberation time was 0.6 s with

  16. Acoustical properties of speech as indicators of depression and suicidal risk.

    PubMed

    France, D J; Shiavi, R G; Silverman, S; Silverman, M; Wilkes, D M

    2000-07-01

    Acoustic properties of speech have previously been identified as possible cues to depression, and there is evidence that certain vocal parameters may be used further to objectively discriminate between depressed and suicidal speech. Studies were performed to analyze and compare the speech acoustics of separate male and female samples comprised of normal individuals and individuals carrying diagnoses of depression and high-risk, near-term suicidality. The female sample consisted of ten control subjects, 17 dysthymic patients, and 21 major depressed patients. The male sample contained 24 control subjects, 21 major depressed patients, and 22 high-risk suicidal patients. Acoustic analyses of voice fundamental frequency (Fo), amplitude modulation (AM), formants, and power distribution were performed on speech samples extracted from audio recordings collected from the sample members. Multivariate feature and discriminant analyses were performed on feature vectors representing the members of the control and disordered classes. Features derived from the formant and power spectral density measurements were found to be the best discriminators of class membership in both the male and female studies. AM features emerged as strong class discriminators of the male classes. Features describing Fo were generally ineffective discriminators in both studies. The results support theories that identify psychomotor disturbances as central elements in depression and suicidality.

  17. Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit.

    PubMed

    Arnold, Denis; Tomaschek, Fabian; Sering, Konstantin; Lopez, Florence; Baayen, R Harald

    2017-01-01

    Sound units play a pivotal role in cognitive models of auditory comprehension. The general consensus is that during perception listeners break down speech into auditory words and subsequently phones. Indeed, cognitive speech recognition is typically taken to be computationally intractable without phones. Here we present a computational model trained on 20 hours of conversational speech that recognizes word meanings within the range of human performance (model 25%, native speakers 20-44%), without making use of phone or word form representations. Our model also successfully generates predictions about the speed and accuracy of human auditory comprehension. At the heart of the model is a 'wide' yet sparse two-layer artificial neural network with some hundred thousand input units representing summaries of changes in acoustic frequency bands, and proxies for lexical meanings as output units. We believe that our model holds promise for resolving longstanding theoretical problems surrounding the notion of the phone in linguistic theory.
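
    The specific network and acoustic summaries are not reproduced here, but the error-driven, two-layer discriminative learning idea can be sketched with a delta-rule (Widrow-Hoff) update from sparse binary cue indicators to lexical-outcome units; the toy cues and outcomes below are stand-ins for the paper's acoustic features and meanings.

    ```python
    import numpy as np

    def delta_rule_train(cues, outcomes, lr=0.01, epochs=20):
        """Error-driven (Widrow-Hoff) training of a single weight matrix from
        binary cue vectors to outcome vectors."""
        W = np.zeros((cues.shape[1], outcomes.shape[1]))
        for _ in range(epochs):
            for c, o in zip(cues, outcomes):
                W += lr * np.outer(c, o - c @ W)    # update toward the prediction error
        return W

    if __name__ == "__main__":
        rng = np.random.default_rng(3)
        n_samples, n_cues, n_out = 200, 1000, 10
        labels = rng.integers(0, n_out, n_samples)
        # Sparse random cues plus a few cues that are diagnostic of each outcome
        cues = (rng.random((n_samples, n_cues)) < 0.01).astype(float)
        for i, lab in enumerate(labels):
            cues[i, lab * 5:(lab + 1) * 5] = 1.0
        outcomes = np.eye(n_out)[labels]
        W = delta_rule_train(cues, outcomes)
        print("training accuracy:", np.mean(np.argmax(cues @ W, axis=1) == labels))
    ```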

  18. Relationship between acoustic measures and speech naturalness ratings in Parkinson's disease: A within-speaker approach.

    PubMed

    Klopfenstein, Marie

    2015-01-01

    This study investigated the acoustic basis of across-utterance, within-speaker variation in speech naturalness for four speakers with dysarthria secondary to Parkinson's disease (PD). Speakers read sentences and produced spontaneous speech. Acoustic measures of fundamental frequency, phrase-final syllable lengthening, intensity and speech rate were obtained. A group of listeners judged speech naturalness using a nine-point Likert scale. Relationships between judgements of speech naturalness and acoustic measures were determined for individual speakers with PD. Relationships among acoustic measures also were quantified. Despite variability between speakers, measures of mean F0, intensity range, articulation rate, average syllable duration, duration of final syllables, vocalic nucleus length of final unstressed syllables and pitch accent of final syllables emerged as possible acoustic variables contributing to within-speaker variations in speech naturalness. Results suggest that acoustic measures correlate with speech naturalness, but in dysarthric speech they depend on the speaker due to the within-speaker variation in speech impairment.

  19. EEG oscillations entrain their phase to high-level features of speech sound.

    PubMed

    Zoefel, Benedikt; VanRullen, Rufin

    2016-01-01

    Phase entrainment of neural oscillations, the brain's adjustment to rhythmic stimulation, is a central component in recent theories of speech comprehension: the alignment between brain oscillations and speech sound improves speech intelligibility. However, phase entrainment to everyday speech sound could also be explained by oscillations passively following the low-level periodicities (e.g., in sound amplitude and spectral content) of auditory stimulation-and not by an adjustment to the speech rhythm per se. Recently, using novel speech/noise mixture stimuli, we have shown that behavioral performance can entrain to speech sound even when high-level features (including phonetic information) are not accompanied by fluctuations in sound amplitude and spectral content. In the present study, we report that neural phase entrainment might underlie our behavioral findings. We observed phase-locking between electroencephalogram (EEG) and speech sound in response not only to original (unprocessed) speech but also to our constructed "high-level" speech/noise mixture stimuli. Phase entrainment to original speech and speech/noise sound did not differ in the degree of entrainment, but rather in the actual phase difference between EEG signal and sound. Phase entrainment was not abolished when speech/noise stimuli were presented in reverse (which disrupts semantic processing), indicating that acoustic (rather than linguistic) high-level features play a major role in the observed neural entrainment. Our results provide further evidence for phase entrainment as a potential mechanism underlying speech processing and segmentation, and for the involvement of high-level processes in the adjustment to the rhythm of speech.

  20. Clear Speech Variants: An Acoustic Study in Parkinson's Disease

    PubMed Central

    Tjaden, Kris

    2016-01-01

    Purpose The authors investigated how different variants of clear speech affect segmental and suprasegmental acoustic measures of speech in speakers with Parkinson's disease and a healthy control group. Method A total of 14 participants with Parkinson's disease and 14 control participants served as speakers. Each speaker produced 18 different sentences selected from the Sentence Intelligibility Test (Yorkston & Beukelman, 1996). All speakers produced stimuli in 4 speaking conditions (habitual, clear, overenunciate, and hearing impaired). Segmental acoustic measures included vowel space area and first moment (M1) coefficient difference measures for consonant pairs. Second formant slope of diphthongs and measures of vowel and fricative durations were also obtained. Suprasegmental measures included fundamental frequency, sound pressure level, and articulation rate. Results For the majority of adjustments, all variants of clear speech instruction differed from the habitual condition. The overenunciate condition elicited the greatest magnitude of change for segmental measures (vowel space area, vowel durations) and the slowest articulation rates. The hearing impaired condition elicited the greatest fricative durations and suprasegmental adjustments (fundamental frequency, sound pressure level). Conclusions Findings have implications for a model of speech production for healthy speakers as well as for speakers with dysarthria. Findings also suggest that particular clear speech instructions may target distinct speech subsystems. PMID:27355431
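
    One of the segmental measures named above, vowel space area, reduces to the area of the F1-F2 quadrilateral over the corner vowels; a sketch with the shoelace formula is given below, with placeholder formant values.

    ```python
    import numpy as np

    def vowel_space_area(formants):
        """Shoelace area of the F1-F2 polygon; formants is a list of (F1, F2) pairs
        in Hz, ordered around the quadrilateral."""
        pts = np.asarray(formants, dtype=float)
        x, y = pts[:, 0], pts[:, 1]
        return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

    if __name__ == "__main__":
        corner_vowels = [(300, 2300),   # /i/  (placeholder formant values)
                         (700, 1800),   # /ae/
                         (750, 1100),   # /a/
                         (350, 900)]    # /u/
        print(f"vowel space area: {vowel_space_area(corner_vowels):.0f} Hz^2")
    ```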

  1. Multiexpert automatic speech recognition using acoustic and myoelectric signals.

    PubMed

    Chan, Adrian D C; Englehart, Kevin B; Hudgins, Bernard; Lovely, Dennis F

    2006-04-01

    Classification accuracy of conventional automatic speech recognition (ASR) systems can decrease dramatically under acoustically noisy conditions. To improve classification accuracy and increase system robustness, a multiexpert ASR system is implemented. In this system, acoustic speech information is supplemented with information from facial myoelectric signals (MES). A new method of combining experts, known as the plausibility method, is employed to combine an acoustic ASR expert and a MES ASR expert. The plausibility method of combining multiple experts, which is based on the mathematical framework of evidence theory, is compared to the Borda count and score-based methods of combination. Acoustic and facial MES data were collected from 5 subjects, using a 10-word vocabulary across an 18-dB range of acoustic noise. As expected, the performance of an acoustic expert decreases with increasing acoustic noise; classification accuracies of the acoustic ASR expert are as low as 11.5%. The effect of noise is significantly reduced with the addition of the MES ASR expert. Classification accuracies remain above 78.8% across the 18-dB range of acoustic noise, when the plausibility method is used to combine the opinions of multiple experts. In addition, the plausibility method produced classification accuracies higher than any individual expert at all noise levels, as well as the highest classification accuracies, except at the 9-dB noise level. Using the Borda count and score-based multiexpert systems, classification accuracies are improved relative to the acoustic ASR expert but are as low as 51.5% and 59.5%, respectively.
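
    Of the combination schemes mentioned, the Borda count is the simplest to illustrate: each expert ranks the vocabulary, ranks are converted to points, and points are summed. The sketch below shows that baseline only; the evidence-theoretic plausibility method is not reproduced.

    ```python
    import numpy as np

    def borda_combine(score_lists, vocabulary):
        """Each expert's scores are converted to ranks (0 = worst); rank points are
        summed across experts and the vocabulary is sorted by total points."""
        total = np.zeros(len(vocabulary))
        for scores in score_lists:
            total += np.argsort(np.argsort(scores))
        return [vocabulary[i] for i in np.argsort(total)[::-1]]

    if __name__ == "__main__":
        vocab = ["yes", "no", "stop", "go"]          # hypothetical 4-word vocabulary
        acoustic_scores = [0.10, 0.50, 0.30, 0.10]   # acoustic ASR expert
        mes_scores = [0.05, 0.20, 0.60, 0.15]        # myoelectric (MES) ASR expert
        print(borda_combine([acoustic_scores, mes_scores], vocab))
    ```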

  2. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2002-01-01

    Low power EM waves are used to detect motions of vocal tract tissues of the human speech system before, during, and after voiced speech. A voiced excitation function is derived. The excitation function provides speech production information to enhance speech characterization and to enable noise removal from human speech.

  3. Speech Intelligibility Advantages using an Acoustic Beamformer Display

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Sunder, Kaushik; Godfroy, Martine; Otto, Peter

    2015-01-01

    A speech intelligibility test conforming to the Modified Rhyme Test of ANSI S3.2 "Method for Measuring the Intelligibility of Speech Over Communication Systems" was conducted using a prototype 12-channel acoustic beamformer system. The target speech material (signal) was identified against speech babble (noise), with calculated signal-noise ratios of 0, 5 and 10 dB. The signal was delivered at a fixed beam orientation of 135 deg (re 90 deg as the frontal direction of the array) and the noise at 135 deg (co-located) and 0 deg (separated). A significant improvement in intelligibility from 57% to 73% was found for spatial separation for the same signal-noise ratio (0 dB). Significant effects for improved intelligibility due to spatial separation were also found for higher signal-noise ratios (5 and 10 dB).

  4. Feature analysis of pathological speech signals using local discriminant bases technique.

    PubMed

    Umapathy, K; Krishnan, S

    2005-07-01

    Speech is an integral part of the human communication system. Various pathological conditions affect the vocal functions, inducing speech disorders. Acoustic parameters of speech are commonly used for the assessment of speech disorders and for monitoring the progress of the patient over the course of therapy. In the last two decades, signal-processing techniques have been successfully applied in screening speech disorders. In the paper, a novel approach is proposed to classify pathological speech signals using a local discriminant bases (LDB) algorithm and wavelet packet decompositions. The focus of the paper was to demonstrate the significance of identifying the signal subspaces that contribute to the discriminatory characteristics of normal and pathological speech signals in a computationally efficient way. Features were extracted from target subspaces for classification, and time-frequency decomposition was used to eliminate the need for segmentation of the speech signals. The technique was tested with a database of 212 speech signals (51 normal and 161 pathological) using the Daubechies wavelet (db4). Classification accuracies up to 96% were achieved for a two-group classification as normal and pathological speech signals, and 74% was achieved for a four-group classification as male normal, female normal, male pathological and female pathological signals.
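
    The wavelet-packet side of the approach can be sketched with PyWavelets: decompose each signal with the db4 wavelet and use subband log-energies as candidate features. The LDB search that selects the most discriminative subspaces is not reproduced in this sketch.

    ```python
    import numpy as np
    import pywt  # PyWavelets

    def wavelet_packet_features(x, wavelet="db4", level=3):
        """Log-energies of all wavelet-packet subbands at the chosen depth."""
        wp = pywt.WaveletPacket(data=x, wavelet=wavelet, mode="symmetric", maxlevel=level)
        nodes = wp.get_level(level, order="freq")
        energies = np.array([np.sum(np.square(node.data)) for node in nodes])
        return np.log(energies + 1e-12)

    if __name__ == "__main__":
        rng = np.random.default_rng(4)
        signal = rng.standard_normal(4096)           # stand-in for a speech recording
        print(wavelet_packet_features(signal))       # 2**3 = 8 subband features
    ```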

  5. [The acoustic aspect of the speech development in children during the third year of life].

    PubMed

    Liakso, E E; Gromova, A D; Frolova, O V; Romanova, O D

    2004-01-01

    The current part of a Russian language acquisition longitudinal study based on auditory, phonetic and instrumental analysis is devoted to the third year of the child's life. We examined the development of the supplementary acoustic and phonetic features of children's speech that make the speech recognizable. Instrumental analysis and statistical processing of vowel formant dynamics, as well as of stress, palatalization and VOT development, have been performed for the first time in Russian children. We showed that the high probability of children's words being recognized by auditors was due to the establishment of a system of acoustically stable features which, in combination with each other, provide the informative sufficiency of a message.

  6. Static and Dynamic Features for Improved HMM based Visual Speech Recognition

    NASA Astrophysics Data System (ADS)

    Rajavel, R.; Sathidevi, P. S.

    Visual speech recognition refers to the identification of utterances through the movements of lips, tongue, teeth, and other facial muscles of the speaker without using the acoustic signal. This work shows the relative benefits of both static and dynamic visual speech features for improved visual speech recognition. Two approaches for visual feature extraction have been considered: (1) an image-transform-based static feature approach in which the Discrete Cosine Transform (DCT) is applied to each video frame and 6×6 triangle region coefficients are considered as features; Principal Component Analysis (PCA) is applied over all 60 features corresponding to the video frame to reduce the redundancy, and the resultant 21 coefficients are taken as the static visual features; and (2) a motion-segmentation-based dynamic feature approach in which the facial movements are segmented from the video file using motion history images (MHI), DCT is applied to the MHI, and triangle region coefficients are taken as the dynamic visual features. Two types of experiments were done to identify the utterances: one with concatenated features and another with dimension-reduced features obtained using PCA. Left-right continuous HMMs are used as the visual speech classifier to classify nine MPEG-4 standard viseme consonants. The experimental results show that both the concatenated and the dimension-reduced features improve visual speech recognition, with high accuracies of 92.45% and 92.15%, respectively.
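
    The static-feature path can be sketched as follows: a 2-D DCT of each mouth-region frame, a low-order triangle of coefficients kept as features, and PCA down to 21 components. The exact triangle indexing that yields the paper's 60 coefficients is not given here, so the i + j < k rule below is an assumption.

    ```python
    import numpy as np
    from scipy.fft import dctn
    from sklearn.decomposition import PCA

    def dct_triangle_features(frame, k=10):
        """2-D DCT of a grayscale frame; keep the low-frequency coefficients with
        i + j < k (an assumed indexing for the 'triangle region')."""
        coeffs = dctn(frame, norm="ortho")
        return np.array([coeffs[i, j] for i in range(k) for j in range(k) if i + j < k])

    if __name__ == "__main__":
        rng = np.random.default_rng(5)
        frames = rng.random((100, 64, 64))                      # toy mouth-region video
        feats = np.array([dct_triangle_features(f) for f in frames])
        reduced = PCA(n_components=21).fit_transform(feats)     # 21 static features/frame
        print(feats.shape, reduced.shape)
    ```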

  7. Speech intelligibility in complex acoustic environments in young children

    NASA Astrophysics Data System (ADS)

    Litovsky, Ruth

    2003-04-01

    While the auditory system undergoes tremendous maturation during the first few years of life, it has become clear that in complex scenarios when multiple sounds occur and when echoes are present, children's performance is significantly worse than their adult counterparts. The ability of children (3-7 years of age) to understand speech in a simulated multi-talker environment and to benefit from spatial separation of the target and competing sounds was investigated. In these studies, competing sources vary in number, location, and content (speech, modulated or unmodulated speech-shaped noise and time-reversed speech). The acoustic spaces were also varied in size and amount of reverberation. Finally, children with chronic otitis media who received binaural training were tested pre- and post-training on a subset of conditions. Results indicated the following. (1) Children experienced significantly more masking than adults, even in the simplest conditions tested. (2) When the target and competing sounds were spatially separated speech intelligibility improved, but the amount varied with age, type of competing sound, and number of competitors. (3) In a large reverberant classroom there was no benefit of spatial separation. (4) Binaural training improved speech intelligibility performance in children with otitis media. Future work includes similar studies in children with unilateral and bilateral cochlear implants. [Work supported by NIDCD, DRF, and NOHR.]

  8. Tongue-Palate Contact Pressure, Oral Air Pressure, and Acoustics of Clear Speech

    ERIC Educational Resources Information Center

    Searl, Jeff; Evitts, Paul M.

    2013-01-01

    Purpose: The authors compared articulatory contact pressure (ACP), oral air pressure (Po), and speech acoustics for conversational versus clear speech. They also assessed the relationship of these measures to listener perception. Method: Twelve adults with normal speech produced monosyllables in a phrase using conversational and clear speech.…

  9. Intelligibility Evaluation of Pathological Speech through Multigranularity Feature Extraction and Optimization

    PubMed Central

    Ma, Lin; Zhang, Mancai

    2017-01-01

    Pathological speech usually refers to speech distortion resulting from illness or other biological insults. The assessment of pathological speech plays an important role in assisting the experts, while automatic evaluation of speech intelligibility is difficult because it is usually nonstationary and mutational. In this paper, we carry out an independent innovation of feature extraction and reduction, and we describe a multigranularity combined feature scheme which is optimized by a hierarchical visual method. A novel method of generating the feature set based on the S-transform and chaotic analysis is proposed. The set comprises basic acoustic features (BAFS, 430 dimensions), local spectral characteristics (MSCC, 84 Mel S-transform cepstrum coefficients), and chaotic features (12 dimensions). Finally, radar charts and the F-score are used to optimize the features through hierarchical visual fusion. The feature set could be optimized from 526 to 96 dimensions based on the NKI-CCRT corpus and 104 dimensions based on the SVD corpus. The experimental results show that the new features, classified with a support vector machine (SVM), give the best performance, with recognition rates of 84.4% on the NKI-CCRT corpus and 78.7% on the SVD corpus. The proposed method is thus shown to be effective and reliable for pathological speech intelligibility evaluation. PMID:28194222

  10. Intelligibility Evaluation of Pathological Speech through Multigranularity Feature Extraction and Optimization.

    PubMed

    Fang, Chunying; Li, Haifeng; Ma, Lin; Zhang, Mancai

    2017-01-01

    Pathological speech usually refers to speech distortion resulting from illness or other biological insults. The assessment of pathological speech plays an important role in assisting the experts, while automatic evaluation of speech intelligibility is difficult because it is usually nonstationary and mutational. In this paper, we carry out an independent innovation of feature extraction and reduction, and we describe a multigranularity combined feature scheme which is optimized by a hierarchical visual method. A novel method of generating the feature set based on the S-transform and chaotic analysis is proposed. The set comprises basic acoustic features (BAFS, 430 dimensions), local spectral characteristics (MSCC, 84 Mel S-transform cepstrum coefficients), and chaotic features (12 dimensions). Finally, radar charts and the F-score are used to optimize the features through hierarchical visual fusion. The feature set could be optimized from 526 to 96 dimensions based on the NKI-CCRT corpus and 104 dimensions based on the SVD corpus. The experimental results show that the new features, classified with a support vector machine (SVM), give the best performance, with recognition rates of 84.4% on the NKI-CCRT corpus and 78.7% on the SVD corpus. The proposed method is thus shown to be effective and reliable for pathological speech intelligibility evaluation.

  11. Effects and modeling of phonetic and acoustic confusions in accented speech

    NASA Astrophysics Data System (ADS)

    Fung, Pascale; Liu, Yi

    2005-11-01

    Accented speech recognition is more challenging than standard speech recognition due to the effects of phonetic and acoustic confusions. Phonetic confusion in accented speech occurs when an expected phone is pronounced as a different one, which leads to erroneous recognition. Acoustic confusion occurs when the pronounced phone is found to lie acoustically between two baseform models and can be equally recognized as either one. We propose that it is necessary to analyze and model these confusions separately in order to improve accented speech recognition without degrading standard speech recognition. Since low phonetic confusion units in accented speech do not give rise to automatic speech recognition errors, we focus on analyzing and reducing phonetic and acoustic confusability under high phonetic confusion conditions. We propose using likelihood ratio test to measure phonetic confusion, and asymmetric acoustic distance to measure acoustic confusion. Only accent-specific phonetic units with low acoustic confusion are used in an augmented pronunciation dictionary, while phonetic units with high acoustic confusion are reconstructed using decision tree merging. Experimental results show that our approach is effective and superior to methods modeling phonetic confusion or acoustic confusion alone in accented speech, with a significant 5.7% absolute WER reduction, without degrading standard speech recognition.

  12. The acoustic features of human laughter

    NASA Astrophysics Data System (ADS)

    Bachorowski, Jo-Anne; Owren, Michael J.

    2002-05-01

    Remarkably little is known about the acoustic features of laughter, despite laughter's ubiquitous role in human vocal communication. Outcomes are described for 1024 naturally produced laugh bouts recorded from 97 young adults. Acoustic analysis focused on temporal characteristics, production modes, source- and filter-related effects, and indexical cues to laugher sex and individual identity. The results indicate that laughter is a remarkably complex vocal signal, with evident diversity in both production modes and fundamental frequency characteristics. Also of interest was finding a consistent lack of articulation effects in supralaryngeal filtering. Outcomes are compared to previously advanced hypotheses and conjectures about this species-typical vocal signal.

  13. Influences of noise-interruption and information-bearing acoustic changes on understanding simulated electric-acoustic speech.

    PubMed

    Stilp, Christian; Donaldson, Gail; Oh, Soohee; Kong, Ying-Yee

    2016-11-01

    In simulations of electrical-acoustic stimulation (EAS), vocoded speech intelligibility is aided by preservation of low-frequency acoustic cues. However, the speech signal is often interrupted in everyday listening conditions, and effects of interruption on hybrid speech intelligibility are poorly understood. Additionally, listeners rely on information-bearing acoustic changes to understand full-spectrum speech (as measured by cochlea-scaled entropy [CSE]) and vocoded speech (CSECI), but how listeners utilize these informational changes to understand EAS speech is unclear. Here, normal-hearing participants heard noise-vocoded sentences with three to six spectral channels in two conditions: vocoder-only (80-8000 Hz) and simulated hybrid EAS (vocoded above 500 Hz; original acoustic signal below 500 Hz). In each sentence, four 80-ms intervals containing high-CSECI or low-CSECI acoustic changes were replaced with speech-shaped noise. As expected, performance improved with the preservation of low-frequency fine-structure cues (EAS). This improvement decreased for continuous EAS sentences as more spectral channels were added, but increased as more channels were added to noise-interrupted EAS sentences. Performance was impaired more when high-CSECI intervals were replaced by noise than when low-CSECI intervals were replaced, but this pattern did not differ across listening modes. Utilizing information-bearing acoustic changes to understand speech is predicted to generalize to cochlear implant users who receive EAS inputs.
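
    A minimal noise vocoder of the kind used in such simulations is sketched below: split the signal into bands, extract each band's envelope, and use it to modulate band-limited noise. Channel edges and filter orders are illustrative; a simulated-EAS condition would additionally mix in the unprocessed signal low-pass filtered at 500 Hz.

    ```python
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def noise_vocode(x, sr, edges=(80, 500, 1500, 4000, 7900)):
        """Band-split the signal, take each band's Hilbert envelope, and modulate
        band-limited noise with it.  Channel edges are illustrative."""
        rng = np.random.default_rng(0)
        out = np.zeros(len(x))
        for lo, hi in zip(edges[:-1], edges[1:]):
            sos = butter(4, [lo / (sr / 2), hi / (sr / 2)], btype="band", output="sos")
            band = sosfiltfilt(sos, x)
            env = np.abs(hilbert(band))                        # band envelope
            out += env * sosfiltfilt(sos, rng.standard_normal(len(x)))
        return out

    if __name__ == "__main__":
        sr = 16000
        t = np.arange(sr) / sr
        speechlike = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
        print(noise_vocode(speechlike, sr).shape)
    ```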

  14. Aging Affects Neural Synchronization to Speech-Related Acoustic Modulations

    PubMed Central

    Goossens, Tine; Vercammen, Charlotte; Wouters, Jan; van Wieringen, Astrid

    2016-01-01

    As people age, speech perception problems become highly prevalent, especially in noisy situations. In addition to peripheral hearing and cognition, temporal processing plays a key role in speech perception. Temporal processing of speech features is mediated by synchronized activity of neural oscillations in the central auditory system. Previous studies indicate that both the degree and hemispheric lateralization of synchronized neural activity relate to speech perception performance. Based on these results, we hypothesize that impaired speech perception in older persons may, in part, originate from deviances in neural synchronization. In this study, auditory steady-state responses that reflect synchronized activity of theta, beta, low and high gamma oscillations (i.e., 4, 20, 40, and 80 Hz ASSR, respectively) were recorded in young, middle-aged, and older persons. As all participants had normal audiometric thresholds and were screened for (mild) cognitive impairment, differences in synchronized neural activity across the three age groups were likely to be attributed to age. Our data yield novel findings regarding theta and high gamma oscillations in the aging auditory system. At an older age, synchronized activity of theta oscillations is increased, whereas high gamma synchronization is decreased. In contrast to young persons who exhibit a right hemispheric dominance for processing of high gamma range modulations, older adults show a symmetrical processing pattern. These age-related changes in neural synchronization may very well underlie the speech perception problems in aging persons. PMID:27378906

  15. Quantifying the effect of compression hearing aid release time on speech acoustics and intelligibility.

    PubMed

    Jenstad, Lorienne M; Souza, Pamela E

    2005-06-01

    Compression hearing aids have the inherent, and often adjustable, feature of release time from compression. Research to date does not provide a consensus on how to choose or set release time. The current study had 2 purposes: (a) a comprehensive evaluation of the acoustic effects of release time for a single-channel compression system in quiet and (b) an evaluation of the relation between the acoustic changes and speech recognition. The release times under study were 12, 100, and 800 ms. All of the stimuli were VC syllables from the Nonsense Syllable Task spoken by a female talker. The stimuli were processed through a hearing aid simulator at 3 input levels. Two acoustic measures were made on individual syllables: the envelope-difference index and CV ratio. These measurements allowed for quantification of the short-term amplitude characteristics of the speech signal and the changes to these amplitude characteristics caused by compression. The acoustic analyses revealed statistically significant effects among the 3 release times. The size of the effect was dependent on characteristics of the phoneme. Twelve listeners with moderate sensorineural hearing loss were tested for their speech recognition for the same stimuli. Although release time for this single-channel, 3:1 compression ratio system did not directly predict overall intelligibility for these nonsense syllables in quiet, the acoustic measurements reflecting the changes due to release time were significant predictors of phoneme recognition. Increased temporal-envelope distortion was predictive of reduced recognition for some individual phonemes, which is consistent with previous research on the importance of relative amplitude as a cue to syllable recognition for some phonemes.
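
    Simplified versions of the two acoustic measures named above can be sketched as follows; these are illustrative formulations (an envelope difference between unprocessed and compressed syllables, and a consonant-vowel level ratio in dB), not necessarily the exact definitions used in the study.

    ```python
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def envelope(x, sr, cutoff=50.0):
        b, a = butter(2, cutoff / (sr / 2), btype="low")
        return filtfilt(b, a, np.abs(hilbert(x)))

    def envelope_difference(x_ref, x_proc, sr):
        """Mean absolute difference between mean-normalized envelopes (0 = identical
        envelope shape); a simplified stand-in for the envelope-difference index."""
        e1 = envelope(x_ref, sr)
        e2 = envelope(x_proc, sr)
        return float(np.mean(np.abs(e1 / e1.mean() - e2 / e2.mean())) / 2.0)

    def cv_ratio_db(consonant, vowel):
        """Consonant-to-vowel RMS level difference in dB."""
        rms = lambda s: np.sqrt(np.mean(np.square(s)))
        return 20 * np.log10(rms(consonant) / (rms(vowel) + 1e-12))

    if __name__ == "__main__":
        sr = 16000
        t = np.arange(int(0.3 * sr)) / sr
        vowel = np.sin(2 * np.pi * 150 * t)
        consonant = 0.1 * np.random.default_rng(6).standard_normal(int(0.1 * sr))
        syllable = np.concatenate([vowel, consonant])     # VC-like syllable
        compressed = np.tanh(3 * syllable) / 3            # crude stand-in for compression
        print(envelope_difference(syllable, compressed, sr), cv_ratio_db(consonant, vowel))
    ```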

  16. DWT features performance analysis for automatic speech recognition of Urdu.

    PubMed

    Ali, Hazrat; Ahmad, Nasir; Zhou, Xianwei; Iqbal, Khalid; Ali, Sahibzada Muhammad

    2014-01-01

    This paper presents the work on Automatic Speech Recognition of Urdu language, using a comparative analysis of Discrete Wavelet Transform (DWT) based features and Mel Frequency Cepstral Coefficients (MFCC). These features have been extracted for one hundred isolated words of Urdu, each word uttered by ten different speakers. The words have been selected from the most frequently used words of Urdu. A variety of ages and dialects has been covered by using a balanced corpus approach. After extraction of features, the classification has been achieved by using Linear Discriminant Analysis. After the classification task, the confusion matrix obtained for the DWT features has been compared with the one obtained for MFCC-based speech recognition. The framework has been trained and tested for speech data recorded under controlled environments. The experimental results are useful in determination of the optimum features for the speech recognition task.
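
    The DWT feature path described above can be sketched as a multi-level wavelet decomposition per utterance, subband log-energies as the feature vector, and Linear Discriminant Analysis as the classifier; the synthetic signals below stand in for the Urdu isolated-word recordings.

    ```python
    import numpy as np
    import pywt
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def dwt_features(x, wavelet="db4", level=5):
        """Log-energy of each DWT subband (approximation + details) as the feature vector."""
        coeffs = pywt.wavedec(x, wavelet, level=level)
        return np.log(np.array([np.sum(np.square(c)) for c in coeffs]) + 1e-12)

    if __name__ == "__main__":
        rng = np.random.default_rng(7)
        sr, n_per_class = 8000, 30
        X, y = [], []
        for label, f0 in enumerate([200, 400, 800]):       # three synthetic "words"
            for _ in range(n_per_class):
                t = np.arange(sr // 2) / sr
                x = np.sin(2 * np.pi * f0 * t) + 0.3 * rng.standard_normal(len(t))
                X.append(dwt_features(x))
                y.append(label)
        lda = LinearDiscriminantAnalysis().fit(np.array(X), np.array(y))
        print("training accuracy:", lda.score(np.array(X), np.array(y)))
    ```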

  17. A Bayesian view on acoustic model-based techniques for robust speech recognition

    NASA Astrophysics Data System (ADS)

    Maas, Roland; Huemmer, Christian; Sehr, Armin; Kellermann, Walter

    2015-12-01

    This article provides a unifying Bayesian view on various approaches for acoustic model adaptation, missing feature, and uncertainty decoding that are well-known in the literature of robust automatic speech recognition. The representatives of these classes can often be deduced from a Bayesian network that extends the conventional hidden Markov models used in speech recognition. These extensions, in turn, can in many cases be motivated from an underlying observation model that relates clean and distorted feature vectors. By identifying and converting the observation models into a Bayesian network representation, we formulate the corresponding compensation rules. We thus summarize the various approaches as approximations or modifications of the same Bayesian decoding rule leading to a unified view on known derivations as well as to new formulations for certain approaches.

  18. Suppressed Alpha Oscillations Predict Intelligibility of Speech and its Acoustic Details

    PubMed Central

    Weisz, Nathan

    2012-01-01

    Modulations of human alpha oscillations (8–13 Hz) accompany many cognitive processes, but their functional role in auditory perception has proven elusive: Do oscillatory dynamics of alpha reflect acoustic details of the speech signal and are they indicative of comprehension success? Acoustically presented words were degraded in acoustic envelope and spectrum in an orthogonal design, and electroencephalogram responses in the frequency domain were analyzed in 24 participants, who rated word comprehensibility after each trial. First, the alpha power suppression during and after a degraded word depended monotonically on spectral and, to a lesser extent, envelope detail. The magnitude of this alpha suppression exhibited an additional and independent influence on later comprehension ratings. Second, source localization of alpha suppression yielded superior parietal, prefrontal, as well as anterior temporal brain areas. Third, multivariate classification of the time–frequency pattern across participants showed that patterns of late posterior alpha power allowed best for above-chance classification of word intelligibility. Results suggest that both magnitude and topography of late alpha suppression in response to single words can indicate a listener's sensitivity to acoustic features and the ability to comprehend speech under adverse listening conditions. PMID:22100354

  19. Talker Differences in Clear and Conversational Speech: Acoustic Characteristics of Vowels

    ERIC Educational Resources Information Center

    Ferguson, Sarah Hargus; Kewley-Port, Diane

    2007-01-01

    Purpose: To determine the specific acoustic changes that underlie improved vowel intelligibility in clear speech. Method: Seven acoustic metrics were measured for conversational and clear vowels produced by 12 talkers--6 who previously were found (S. H. Ferguson, 2004) to produce a large clear speech vowel intelligibility effect for listeners with…

  20. On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common

    PubMed Central

    Weninger, Felix; Eyben, Florian; Schuller, Björn W.; Mortillaro, Marcello; Scherer, Klaus R.

    2013-01-01

    Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning either of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow’s pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of “the sound that something makes,” in order to evaluate the system’s auditory environment and its own audio output. This article aims at a first step toward a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selection of appropriate descriptors, cross-domain arousal, and valence regression is feasible achieving significant correlations with the observer annotations of up to 0.78 for arousal (training on sound and testing on enacted speech) and 0.60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects. PMID:23750144

  1. Denoising of human speech using combined acoustic and em sensor signal processing

    SciTech Connect

    Ng, L C; Burnett, G C; Holzrichter, J F; Gable, T J

    1999-11-29

    Low Power EM radar-like sensors have made it possible to measure properties of the human speech production system in real-time, without acoustic interference. This greatly enhances the quality and quantity of information for many speech-related applications. See Holzrichter, Burnett, Ng, and Lea, J. Acoust. Soc. Am. 103 (1) 622 (1998). By using combined glottal-EM-sensor and acoustic signals, segments of voiced, unvoiced, and no-speech can be reliably defined. Real-time denoising filters can be constructed to remove noise from the user's corresponding speech signal.

  2. Acoustic Analysis of the Voiced-Voiceless Distinction in Dutch Tracheoesophageal Speech

    ERIC Educational Resources Information Center

    Jongmans, Petra; Wempe, Ton G.; van Tinteren, Harm; Hilgers, Frans J. M.; Pols, Louis C. W.; van As-Brooks, Corina J.

    2010-01-01

    Purpose: Confusions between voiced and voiceless plosives and voiced and voiceless fricatives are common in Dutch tracheoesophageal (TE) speech. This study investigates (a) which acoustic measures are found to convey a correct voicing contrast in TE speech and (b) whether different measures are found in TE speech than in normal laryngeal (NL)…

  3. Perceptual and Acoustic Reliability Estimates for the Speech Disorders Classification System (SDCS)

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.

    2010-01-01

    A companion paper describes three extensions to a classification system for paediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). The SDCS uses perceptual and acoustic data reduction methods to obtain information on a speaker's speech, prosody, and voice. The present paper provides reliability estimates for…

  4. Acoustic Predictors of Intelligibility for Segmentally Interrupted Speech: Temporal Envelope, Voicing, and Duration

    ERIC Educational Resources Information Center

    Fogerty, Daniel

    2013-01-01

    Purpose: Temporal interruption limits the perception of speech to isolated temporal glimpses. An analysis was conducted to determine the acoustic parameter that best predicts speech recognition from temporal fragments that preserve different types of speech information--namely, consonants and vowels. Method: Young listeners with normal hearing…

  5. Visual Cortical Entrainment to Motion and Categorical Speech Features during Silent Lipreading

    PubMed Central

    O’Sullivan, Aisling E.; Crosse, Michael J.; Di Liberto, Giovanni M.; Lalor, Edmund C.

    2017-01-01

    Speech is a multisensory percept, comprising an auditory and visual component. While the content and processing pathways of audio speech have been well characterized, the visual component is less well understood. In this work, we expand current methodologies using system identification to introduce a framework that facilitates the study of visual speech in its natural, continuous form. Specifically, we use models based on the unheard acoustic envelope (E), the motion signal (M) and categorical visual speech features (V) to predict EEG activity during silent lipreading. Our results show that each of these models performs similarly at predicting EEG in visual regions and that respective combinations of the individual models (EV, MV, EM and EMV) provide an improved prediction of the neural activity over their constituent models. In comparing these different combinations, we find that the model incorporating all three types of features (EMV) outperforms the individual models, as well as both the EV and MV models, while it performs similarly to the EM model. Importantly, EM does not outperform EV and MV, which, considering the higher dimensionality of the V model, suggests that more data is needed to clarify this finding. Nevertheless, the performance of EMV, and comparisons of the subject performances for the three individual models, provides further evidence to suggest that visual regions are involved in both low-level processing of stimulus dynamics and categorical speech perception. This framework may prove useful for investigating modality-specific processing of visual speech under naturalistic conditions. PMID:28123363

  6. Frequency overlap between electric and acoustic stimulation and speech-perception benefit in patients with combined electric and acoustic stimulation

    PubMed Central

    Zhang, Ting; Spahr, Anthony J.; Dorman, Michael F.

    2010-01-01

    Objectives Our aim was to assess, for patients with a cochlear implant in one ear and low-frequency acoustic hearing in the contralateral ear, whether reducing the overlap in frequencies conveyed in the acoustic signal and those analyzed by the cochlear implant speech processor would improve speech recognition. Design The recognition of monosyllabic words in quiet and sentences in noise was evaluated in three listening configurations: electric stimulation alone, acoustic stimulation alone, and combined electric and acoustic stimulation. The acoustic stimuli were either unfiltered or low-pass (LP) filtered at 250 Hz, 500 Hz, or 750 Hz. The electric stimuli were either unfiltered or high-pass (HP) filtered at 250 Hz, 500 Hz, or 750 Hz. In the combined condition, the unfiltered acoustic signal was paired with the unfiltered electric signal, the 250 Hz LP acoustic signal was paired with the 250 Hz HP electric signal, the 500 Hz LP acoustic signal was paired with the 500 Hz HP electric signal, and the 750 Hz LP acoustic signal was paired with the 750 Hz HP electric signal. Results For both acoustic and electric signals, performance increased as the bandwidth increased. The highest level of performance in the combined condition was observed in the unfiltered acoustic plus unfiltered electric condition. Conclusions Reducing the overlap in frequency representation between acoustic and electric stimulation does not increase speech understanding scores for patients who have residual hearing in the ear contralateral to the implant. We find that acoustic information below 250 Hz significantly improves performance for patients who combine electric and acoustic stimulation and accounts for the majority of the speech-perception benefit when acoustic stimulation is combined with electric stimulation. PMID:19915474

  7. Speech recognition in dental software systems: features and functionality.

    PubMed

    Yuhaniak Irwin, Jeannie; Fernando, Shawn; Schleyer, Titus; Spallek, Heiko

    2007-01-01

    Speech recognition allows clinicians a hands-free option for interacting with computers, which is important for dentists who have difficulty using a keyboard and a mouse when working with patients. While roughly 13% of all general dentists with computers at chairside use speech recognition for data entry, 16% have tried and discontinued using this technology. In this study, researchers explored the speech recognition features and functionality of four dental software applications. For each system, the documentation as well as the working program was evaluated to determine speech recognition capabilities. A comparison checklist was created to highlight each program's speech functionality. Next, after the development of charting scripts, feasibility user tests were conducted to determine if performance comparisons could be made across systems. While four systems were evaluated in the feature comparison, only two of the systems were reviewed during the feasibility user tests. Results show that current speech functionality, instead of being intuitive, is directly comparable to using a mouse. Further, systems require memorizing an enormous amount of specific terminology as opposed to using natural language. User testing is a feasible way to measure the performance of speech recognition across systems and will be conducted in the near future. Overall, limited speech functionality reduces the ability of clinicians to interact directly with the computer during clinical care. This can hinder the benefits of electronic patient records and clinical decision support systems.

  8. Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing.

    PubMed

    Doelling, Keith B; Arnal, Luc H; Ghitza, Oded; Poeppel, David

    2014-01-15

    A growing body of research suggests that intrinsic neuronal slow (<10 Hz) oscillations in auditory cortex appear to track incoming speech and other spectro-temporally complex auditory signals. Within this framework, several recent studies have identified critical-band temporal envelopes as the specific acoustic feature being reflected by the phase of these oscillations. However, how this alignment between speech acoustics and neural oscillations might underpin intelligibility is unclear. Here we test the hypothesis that the 'sharpness' of temporal fluctuations in the critical band envelope acts as a temporal cue to speech syllabic rate, driving delta-theta rhythms to track the stimulus and facilitate intelligibility. We interpret our findings as evidence that sharp events in the stimulus cause cortical rhythms to re-align and parse the stimulus into syllable-sized chunks for further decoding. Using magnetoencephalographic recordings, we show that by removing temporal fluctuations that occur at the syllabic rate, envelope-tracking activity is reduced. By artificially reinstating these temporal fluctuations, envelope-tracking activity is regained. These changes in tracking correlate with intelligibility of the stimulus. Together, the results suggest that the sharpness of fluctuations in the stimulus, as reflected in the cochlear output, drive oscillatory activity to track and entrain to the stimulus, at its syllabic rate. This process likely facilitates parsing of the stimulus into meaningful chunks appropriate for subsequent decoding, enhancing perception and intelligibility.

  9. Location and acoustic scale cues in concurrent speech recognition

    PubMed Central

    Ives, D. Timothy; Vestergaard, Martin D.; Kistler, Doris J.; Patterson, Roy D.

    2010-01-01

    Location and acoustic scale cues have both been shown to have an effect on the recognition of speech in multi-speaker environments. This study examines the interaction of these variables. Subjects were presented with concurrent triplets of syllables from a target voice and a distracting voice, and asked to recognize a specific target syllable. The task was made more or less difficult by changing (a) the location of the distracting speaker, (b) the scale difference between the two speakers, and/or (c) the relative level of the two speakers. Scale differences were produced by changing the vocal tract length and glottal pulse rate during syllable synthesis: 32 acoustic scale differences were used. Location cues were produced by convolving head-related transfer functions with the stimulus. The angle between the target speaker and the distracter was 0°, 4°, 8°, 16°, or 32° on the 0° horizontal plane. The relative level of the target to the distracter was 0 or −6 dB. The results show that location and scale difference interact, and the interaction is greatest when one of these cues is small. Increasing either the acoustic scale or the angle between target and distracter speakers quickly elevates performance to ceiling levels. PMID:20550271

  10. Logopenic and Nonfluent Variants of Primary Progressive Aphasia Are Differentiated by Acoustic Measures of Speech Production

    PubMed Central

    Ballard, Kirrie J.; Savage, Sharon; Leyton, Cristian E.; Vogel, Adam P.; Hornberger, Michael; Hodges, John R.

    2014-01-01

    Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging since it hinges on expert-based evaluation of speech and language production. In this study acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold-standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r2 = 0.47) with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, only affected in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy. Findings suggest that a measure of relative vowel duration from a polysyllable word repetition task

  11. Logopenic and nonfluent variants of primary progressive aphasia are differentiated by acoustic measures of speech production.

    PubMed

    Ballard, Kirrie J; Savage, Sharon; Leyton, Cristian E; Vogel, Adam P; Hornberger, Michael; Hodges, John R

    2014-01-01

    Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging since it hinges on expert-based evaluation of speech and language production. In this study acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold-standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r(2) = 0.47) with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, only affected in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy. Findings suggest that a measure of relative vowel duration from a polysyllable word repetition task

  12. Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech

    NASA Astrophysics Data System (ADS)

    Ge, Fengpei; Liu, Changliang; Shao, Jian; Pan, Fuping; Dong, Bin; Yan, Yonghong

    In this paper we present our investigation into improving the performance of our computer-assisted language learning (CALL) system through exploiting the acoustic model and features within the speech recognition framework. First, to alleviate channel distortion, speaker-dependent cepstrum mean normalization (CMN) is adopted and the average correlation coefficient (average CC) between machine and expert scores is improved from 78.00% to 84.14%. Second, heteroscedastic linear discriminant analysis (HLDA) is adopted to enhance the discriminability of the acoustic model, which successfully increases the average CC from 84.14% to 84.62%. Additionally, HLDA causes the scoring accuracy to be more stable at various pronunciation proficiency levels, and thus leads to an increase in the speaker correct-rank rate from 85.59% to 90.99%. Finally, we use maximum a posteriori (MAP) estimation to tune the acoustic model to fit strongly accented test speech. As a result, the average CC is improved from 84.62% to 86.57%. These three novel techniques improve the accuracy of evaluating pronunciation quality.
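
    As a concrete illustration of the CMN step, speaker-dependent cepstral mean normalization simply subtracts each speaker's mean cepstral vector, removing stationary convolutive (channel) effects. The sketch below assumes MFCCs from librosa as the cepstral features; the front end and parameters are illustrative assumptions, not the CALL system's actual implementation.

```python
# Minimal sketch of speaker-dependent cepstrum mean normalization (CMN):
# the per-speaker mean of each cepstral coefficient is subtracted so that a
# constant channel offset in the cepstral domain is reduced.
import numpy as np
import librosa

def speaker_cmn(utterances, sr=16000, n_mfcc=13):
    """utterances: list of 1-D waveforms from the same speaker."""
    feats = [librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T for y in utterances]
    speaker_mean = np.concatenate(feats).mean(axis=0)   # one mean per coefficient
    return [f - speaker_mean for f in feats]             # normalized feature matrices
```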

  13. Temporal acoustic measures distinguish primary progressive apraxia of speech from primary progressive aphasia.

    PubMed

    Duffy, Joseph R; Hanley, Holly; Utianski, Rene; Clark, Heather; Strand, Edythe; Josephs, Keith A; Whitwell, Jennifer L

    2017-02-07

    The purpose of this study was to determine if acoustic measures of duration and syllable rate during word and sentence repetition, and a measure of within-word lexical stress, distinguish speakers with primary progressive apraxia of speech (PPAOS) from nonapraxic speakers with the agrammatic or logopenic variants of primary progressive aphasia (PPA), and control speakers. Results revealed that the PPAOS group had longer durations and reduced rate of syllable production for most words and sentences, and differed on the measure of lexical stress. Sensitivity and specificity indices for the PPAOS versus the other groups were highest for longer multisyllabic words and sentences. For the PPAOS group, correlations between acoustic measures and perceptual ratings of AOS were moderately high to high. Several temporal measures used in this study may aid differential diagnosis and help quantify features of PPAOS that are distinct from those associated with PPA in which AOS is not present.

  14. Mandarin Speech Perception in Combined Electric and Acoustic Stimulation

    PubMed Central

    Li, Yongxin; Zhang, Guoping; Galvin, John J.; Fu, Qian-Jie

    2014-01-01

    For deaf individuals with residual low-frequency acoustic hearing, combined use of a cochlear implant (CI) and hearing aid (HA) typically provides better speech understanding than with either device alone. Because of coarse spectral resolution, CIs do not provide fundamental frequency (F0) information that contributes to understanding of tonal languages such as Mandarin Chinese. The HA can provide good representation of F0 and, depending on the range of aided acoustic hearing, first and second formant (F1 and F2) information. In this study, Mandarin tone, vowel, and consonant recognition in quiet and noise was measured in 12 adult Mandarin-speaking bimodal listeners with the CI-only and with the CI+HA. Tone recognition was significantly better with the CI+HA in noise, but not in quiet. Vowel recognition was significantly better with the CI+HA in quiet, but not in noise. There was no significant difference in consonant recognition between the CI-only and the CI+HA in quiet or in noise. There was a wide range in bimodal benefit, with improvements often greater than 20 percentage points in some tests and conditions. The bimodal benefit was compared to CI subjects’ HA-aided pure-tone average (PTA) thresholds between 250 and 2000 Hz; subjects were divided into two groups: “better” PTA (<50 dB HL) or “poorer” PTA (>50 dB HL). The bimodal benefit differed significantly between groups only for consonant recognition. The bimodal benefit for tone recognition in quiet was significantly correlated with CI experience, suggesting that bimodal CI users learn to better combine low-frequency spectro-temporal information from acoustic hearing with temporal envelope information from electric hearing. Given the small number of subjects in this study (n = 12), further research with Chinese bimodal listeners may provide more information regarding the contribution of acoustic and electric hearing to tonal language perception. PMID:25386962

  15. Noise-robust speech recognition through auditory feature detection and spike sequence decoding.

    PubMed

    Schafer, Phillip B; Jin, Dezhe Z

    2014-03-01

    Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences--one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common sub-sequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
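
    The template-based decoding lends itself to a compact sketch: score each candidate word by the length of the longest common subsequence (LCS) between the incoming spike-label sequence and that word's stored template sequences, then pick the best-scoring word. The label alphabet and template contents below are hypothetical.

```python
# Sketch of LCS-based template matching over spike-label sequences.
def lcs_length(a, b):
    """Classic O(len(a)*len(b)) dynamic program for LCS length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ai in enumerate(a, 1):
        for j, bj in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ai == bj else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def recognize(spike_seq, templates):
    """templates: dict mapping word label -> list of template spike sequences."""
    score = lambda label: max(lcs_length(spike_seq, t) for t in templates[label])
    return max(templates, key=score)

# Hypothetical usage with neuron indices as spike labels
templates = {"one": [[3, 7, 7, 12, 5]], "two": [[2, 9, 4, 4, 11]]}
print(recognize([3, 7, 12, 5, 5], templates))   # -> "one"
```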

  16. Acoustical Characteristics of Mastication Sounds: Application of Speech Analysis Techniques

    NASA Astrophysics Data System (ADS)

    Brochetti, Denise

    Food scientists have used acoustical methods to study characteristics of mastication sounds in relation to food texture. However, a model for analysis of the sounds has not been identified, and reliability of the methods has not been reported. Therefore, speech analysis techniques were applied to mastication sounds, and variation in measures of the sounds was examined. To meet these objectives, two experiments were conducted. In the first experiment, a digital sound spectrograph generated waveforms and wideband spectrograms of sounds by 3 adult subjects (1 male, 2 females) for initial chews of food samples differing in hardness and fracturability. Acoustical characteristics were described and compared. For all sounds, formants appeared in the spectrograms, and energy occurred across a 0 to 8000-Hz range of frequencies. Bursts characterized waveforms for peanut, almond, raw carrot, ginger snap, and hard candy. Duration and amplitude of the sounds varied with the subjects. In the second experiment, the spectrograph was used to measure the duration, amplitude, and formants of sounds for the initial 2 chews of cylindrical food samples (raw carrot, teething toast) differing in diameter (1.27, 1.90, 2.54 cm). Six adult subjects (3 males, 3 females) having normal occlusions and temporomandibular joints chewed the samples between the molar teeth and with the mouth open. Ten repetitions per subject were examined for each food sample. Analysis of estimates of variation indicated an inconsistent intrasubject variation in the acoustical measures. Food type and sample diameter also affected the estimates, indicating the variable nature of mastication. Generally, intrasubject variation was greater than intersubject variation. Analysis of ranks of the data indicated that the effect of sample diameter on the acoustical measures was inconsistent and depended on the subject and type of food. If inferences are to be made concerning food texture from acoustical measures of mastication

  17. Acoustic properties of naturally produced clear speech at normal speaking rates

    NASA Astrophysics Data System (ADS)

    Krause, Jean C.; Braida, Louis D.

    2004-01-01

    Sentences spoken "clearly" are significantly more intelligible than those spoken "conversationally" for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.

  18. Effect of Reflective Practice on Student Recall of Acoustics for Speech Science

    ERIC Educational Resources Information Center

    Walden, Patrick R.; Bell-Berti, Fredericka

    2013-01-01

    Researchers have developed models of learning through experience; however, these models are rarely named as a conceptual frame for educational research in the sciences. This study examined the effect of reflective learning responses on student recall of speech acoustics concepts. Two groups of undergraduate students enrolled in a speech science…

  19. Formant Centralization Ratio: A Proposal for a New Acoustic Measure of Dysarthric Speech

    ERIC Educational Resources Information Center

    Sapir, Shimon; Ramig, Lorraine O.; Spielman, Jennifer L.; Fox, Cynthia

    2010-01-01

    Purpose: The vowel space area (VSA) has been used as an acoustic metric of dysarthric speech, but with varying degrees of success. In this study, the authors aimed to test an alternative metric to the VSA--the "formant centralization ratio" (FCR), which is hypothesized to more effectively differentiate dysarthric from healthy speech and register…
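
    For reference, both metrics contrasted in this abstract are computed from the first two formants of the corner vowels /i/, /u/, and /a/. The sketch below uses the form in which the FCR is commonly cited in the acoustic literature; treat it as an illustrative approximation rather than the authors' exact definition, and the example formant values are assumed.

```python
# Hedged sketch: triangular vowel space area (VSA) and formant centralization
# ratio (FCR) from corner-vowel formants in Hz.
def vowel_space_area(f1_i, f2_i, f1_u, f2_u, f1_a, f2_a):
    """Triangular VSA spanned by /i/, /u/ and /a/ in the F1-F2 plane."""
    return 0.5 * abs(f1_i * (f2_a - f2_u) + f1_a * (f2_u - f2_i) + f1_u * (f2_i - f2_a))

def formant_centralization_ratio(f1_i, f2_i, f1_u, f2_u, f1_a, f2_a):
    """FCR increases as vowels centralize (formants drift toward mid values)."""
    return (f2_u + f2_a + f1_i + f1_u) / (f2_i + f1_a)

# Example with rough, assumed values for a healthy adult male speaker
print(formant_centralization_ratio(f1_i=300, f2_i=2300, f1_u=350, f2_u=900,
                                   f1_a=750, f2_a=1300))   # ~0.93
```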

  20. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians

    PubMed Central

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

    2016-01-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% versus 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and that could be used to screen for PH and encourage earlier specialist referral. PMID:27609672
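
    A minimal sketch of the pipeline summarized above, under assumed file names, sampling rate, and model sizes: MFCC features are extracted from each heart sound recording, one Gaussian mixture model is trained per class, and a test recording is assigned to the class with the higher average log-likelihood.

```python
# Sketch: MFCC features + per-class Gaussian mixture models for heart sounds.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=4000, n_mfcc=13):
    y, sr = librosa.load(path, sr=sr)                         # heart sounds are low-frequency
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, coefficients)

def train(paths_ph, paths_normal, n_components=8):
    gm_ph = GaussianMixture(n_components, covariance_type="diag").fit(
        np.vstack([mfcc_frames(p) for p in paths_ph]))
    gm_no = GaussianMixture(n_components, covariance_type="diag").fit(
        np.vstack([mfcc_frames(p) for p in paths_normal]))
    return gm_ph, gm_no

def classify(path, gm_ph, gm_no):
    X = mfcc_frames(path)
    return "PH" if gm_ph.score(X) > gm_no.score(X) else "normal"
```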

  1. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians

    NASA Astrophysics Data System (ADS)

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y.; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

    2016-09-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% versus 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and that could be used to screen for PH and encourage earlier specialist referral.

  2. A Frame-Based Context-Dependent Acoustic Modeling for Speech Recognition

    NASA Astrophysics Data System (ADS)

    Terashima, Ryuta; Zen, Heiga; Nankaku, Yoshihiko; Tokuda, Keiichi

    We propose a novel acoustic model for speech recognition, named FCD (Frame-based Context Dependent) model. It can obtain a probability distribution by using a top-down clustering technique to simultaneously consider the local frame position in phoneme, phoneme duration, and phoneme context. The model topology is derived from connecting left-to-right HMM models without self-loop transition for each phoneme duration. Because the FCD model can change the probability distribution into a sequence corresponding to one phoneme duration, it has the ability to generate a smooth trajectory of speech feature vectors. We also performed an experiment to evaluate the performance of speech recognition for the model. In the experiment, 132 questions for frame position, 66 questions for phoneme duration and 134 questions for phoneme context were used to train the sub-phoneme FCD model. In order to compare the performance, left-to-right HMM and two types of HSMM models with almost the same number of states were also trained. As a result, an 18% relative improvement in tri-phone accuracy was achieved by the FCD model.

  3. Dyadic Wavelet Features for Isolated Word Speaker Dependent Speech Recognition

    DTIC Science & Technology

    1994-03-01

    ...contains ten examples of each of the spoken digits ("zero" through "nine") for eight different speakers; four male and four female. The speech recordings... there were no overlapping windows. Once the feature vector was determined, the features were level normalized. This was achieved by subtracting each

  4. The Use of Artificial Neural Networks to Estimate Speech Intelligibility from Acoustic Variables: A Preliminary Analysis.

    ERIC Educational Resources Information Center

    Metz, Dale Evan; And Others

    1992-01-01

    A preliminary scheme for estimating the speech intelligibility of hearing-impaired speakers from acoustic parameters, using a computerized artificial neural network to process mathematically the acoustic input variables, is outlined. Tests with 60 hearing-impaired speakers found the scheme to be highly accurate in identifying speakers separated by…

  5. Speech production knowledge in automatic speech recognition.

    PubMed

    King, Simon; Frankel, Joe; Livescu, Karen; McDermott, Erik; Richmond, Korin; Wester, Mirjam

    2007-02-01

    Although much is known about how speech is produced, and research into speech production has resulted in measured articulatory data, feature systems of different kinds, and numerous models, speech production knowledge is almost totally ignored in current mainstream approaches to automatic speech recognition. Representations of speech production allow simple explanations for many phenomena observed in speech which cannot be easily analyzed from either acoustic signal or phonetic transcription alone. In this article, a survey of a growing body of work in which such representations are used to improve automatic speech recognition is provided.

  6. Fourier descriptor features for acoustic landmine detection

    NASA Astrophysics Data System (ADS)

    Keller, James M.; Cheng, Zhanqi; Gader, Paul D.; Hocaoglu, Ali K.

    2002-08-01

    Signatures of buried landmines are often difficult to separate from those of clutter objects. Often, shape information is not directly obtainable from the sensors used for landmine detection. The Acoustic Sensing Technology (AST), which uses a Laser Doppler Vibrometer (LDV) that measures the spatial pattern of particle velocity amplitude of the ground surface in a variety of frequency bands, offers a unique look at subsurface phenomena. It directly records shape related information. Generally, after preprocessing the frequency band images in a downward looking LDV system, landmines have fairly regular shapes (roughly circular) over a range of frequencies while clutter tends to exhibit irregular shapes different from those of landmines. Therefore, shape description has the potential to be used in discriminating mines from clutter. Normalized Fourier Descriptors (NFD) are shape parameters independent of size, angular orientation, position, and contour starting conditions. In this paper, the stack of 2D frequency images from the LDV system is preprocessed by a linear combination of order statistics (LOS) filter, thresholding, and 2D and 3D connected labeling. Contours are extracted from the connected components and aggregated to produce evenly spaced boundary points. Two types of Normalized Fourier Descriptors are computed from the outlines. Using images obtained from a standard data collection site, these features are analyzed for their ability to discriminate landmines from background and clutter such as wood and stones. From a standard feature selection procedure, it was found that a very small number of features are required to effectively separate landmines from background and clutter using simple pattern recognition algorithms. Details of the experiments are included.
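
    The invariances claimed for normalized Fourier descriptors follow from three simple operations on the contour's Fourier coefficients, sketched below for a closed contour given as evenly spaced (x, y) boundary points. The LOS filtering, thresholding, and connected-labeling steps from the paper are not reproduced; this is only an illustration of the descriptor itself.

```python
# Sketch of normalized Fourier descriptors (NFDs) for a closed contour.
import numpy as np

def normalized_fourier_descriptors(contour_xy, n_keep=10):
    """contour_xy: (N, 2) array of evenly spaced boundary points."""
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]   # complex boundary signal
    F = np.fft.fft(z)
    F[0] = 0.0                  # drop DC term          -> translation invariance
    mags = np.abs(F)            # drop phase            -> rotation / start-point invariance
    mags = mags / mags[1]       # scale by 1st harmonic -> size invariance
    return mags[2:n_keep + 2]   # descriptor 1 is 1.0 by construction; keep the next few
```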

  7. Speech feature discrimination in deaf children following cochlear implantation

    NASA Astrophysics Data System (ADS)

    Bergeson, Tonya R.; Pisoni, David B.; Kirk, Karen Iler

    2002-05-01

    Speech feature discrimination is a fundamental perceptual skill that is often assumed to underlie word recognition and sentence comprehension performance. To investigate the development of speech feature discrimination in deaf children with cochlear implants, we conducted a retrospective analysis of results from the Minimal Pairs Test (Robbins et al., 1988) selected from patients enrolled in a longitudinal study of speech perception and language development. The MP test uses a 2AFC procedure in which children hear a word and select one of two pictures (bat-pat). All 43 children were prelingually deafened, received a cochlear implant before 6 years of age or between ages 6 and 9, and used either oral or total communication. Children were tested once every 6 months to 1 year for 7 years; not all children were tested at each interval. By 2 years postimplant, the majority of these children achieved near-ceiling levels of discrimination performance for vowel height, vowel place, and consonant manner. Most of the children also achieved plateaus but did not reach ceiling performance for consonant place and voicing. The relationship between speech feature discrimination, spoken word recognition, and sentence comprehension will be discussed. [Work supported by NIH/NIDCD Research Grant No. R01DC00064 and NIH/NIDCD Training Grant No. T32DC00012.]

  8. School cafeteria noise-The impact of room acoustics and speech intelligibility on children's voice levels

    NASA Astrophysics Data System (ADS)

    Bridger, Joseph F.

    2002-05-01

    The impact of room acoustics and speech intelligibility conditions of different school cafeterias on the voice levels of children is examined. Methods of evaluating cafeteria designs and predicting noise levels are discussed. Like adults, children are shown to modify their voice levels with changes in speech intelligibility. Reverberation and signal-to-noise ratio are the important acoustical factors affecting speech intelligibility. Children have much more difficulty than adults in conditions where noise and reverberation are present. To evaluate the relationship of voice level and speech intelligibility, a database of real sound levels and room acoustics data was generated from measurements and data recorded during visits to a variety of existing cafeterias under different occupancy conditions. The effects of speech intelligibility and room acoustics on children's voice levels are demonstrated. A new method is presented for predicting speech intelligibility conditions and resulting noise levels for the design of new cafeterias and renovation of existing facilities. Measurements are provided for an existing school cafeteria before and after new room acoustics treatments were added. This will be helpful for acousticians, architects, school systems, regulatory agencies, and Parent Teacher Associations to create less noisy cafeteria environments.

  9. A bio-inspired feature extraction for robust speech recognition.

    PubMed

    Zouhir, Youssef; Ouni, Kaïs

    2014-01-01

    In this paper, a feature extraction method for robust speech recognition in noisy environments is proposed. The proposed method is motivated by a biologically inspired auditory model which simulates the outer/middle ear filtering by a low-pass filter and the spectral behaviour of the cochlea by the Gammachirp auditory filterbank (GcFB). The speech recognition performance of our method is tested on speech signals corrupted by real-world noises. The evaluation results show that the proposed method gives better recognition rates compared to classic techniques such as Perceptual Linear Prediction (PLP), Linear Predictive Coding (LPC), Linear Prediction Cepstral coefficients (LPCC) and Mel Frequency Cepstral Coefficients (MFCC). The recognition system used is based on Hidden Markov Models with continuous Gaussian mixture densities (HMM-GM).

  10. A maximum likelihood approach to estimating articulator positions from speech acoustics

    SciTech Connect

    Hogden, J.

    1996-09-23

    This proposal presents an algorithm called maximum likelihood continuity mapping (MALCOM) which recovers the positions of the tongue, jaw, lips, and other speech articulators from measurements of the sound-pressure waveform of speech. MALCOM differs from other techniques for recovering articulator positions from speech in three critical respects: it does not require training on measured or modeled articulator positions, it does not rely on any particular model of sound propagation through the vocal tract, and it recovers a mapping from acoustics to articulator positions that is linearly, not topographically, related to the actual mapping from acoustics to articulation. The approach categorizes short-time windows of speech into a finite number of sound types, and assumes the probability of using any articulator position to produce a given sound type can be described by a parameterized probability density function. MALCOM then uses maximum likelihood estimation techniques to: (1) find the most likely smooth articulator path given a speech sample and a set of distribution functions (one distribution function for each sound type), and (2) change the parameters of the distribution functions to better account for the data. Using this technique improves the accuracy of articulator position estimates compared to continuity mapping -- the only other technique that learns the relationship between acoustics and articulation solely from acoustics. The technique has potential application to computer speech recognition, speech synthesis and coding, teaching the hearing impaired to speak, improving foreign language instruction, and teaching dyslexics to read. 34 refs., 7 figs.

  11. Language-specific developmental differences in speech production: A cross-language acoustic study

    PubMed Central

    Li, Fangfang

    2013-01-01

    Speech productions of 40 English- and 40 Japanese-speaking children (aged 2 to 5) were examined and compared with the speech produced by 20 adult speakers (10 speakers per language). Participants were recorded while repeating words that began with “s” and “sh” sounds. Clear language-specific patterns in adults’ speech were found, with English speakers differentiating “s” and “sh” in one acoustic dimension (i.e., spectral mean) and Japanese speakers differentiating the two categories in three acoustic dimensions (i.e., spectral mean, standard deviation, and onset F2 frequency). For both language groups, children’s speech exhibited a gradual change from an early undifferentiated form to later differentiated categories. The separation processes, however, only occur in those acoustic dimensions used by adults in the corresponding languages. PMID:22540834

  12. Feature extraction and models for speech: An overview

    NASA Astrophysics Data System (ADS)

    Schroeder, Manfred

    2002-11-01

    Modeling of speech has a long history, beginning with Count von Kempelen's 1770 mechanical speaking machine. Even then human vowel production was seen as resulting from a source (the vocal cords) driving a physically separate resonator (the vocal tract). Homer Dudley's 1928 frequency-channel vocoder and many of its descendants are based on the same successful source-filter paradigm. For linguistic studies as well as practical applications in speech recognition, compression, and synthesis (see M. R. Schroeder, Computer Speech), the extant models require the (often difficult) extraction of numerous parameters such as the fundamental and formant frequencies and various linguistic distinctive features. Some of these difficulties were obviated by the introduction of linear predictive coding (LPC) in 1967, in which the filter part is an all-pole filter, reflecting the fact that for non-nasalized vowels the vocal tract is well approximated by an all-pole transfer function. In the now ubiquitous code-excited linear prediction (CELP), the source part is replaced by a code book which (together with a perceptual error criterion) permits speech compression to very low bit rates at high speech quality for the Internet and cell phones.
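
    The all-pole idea is easy to demonstrate: fit LPC coefficients to a short vowel frame and read rough formant candidates from the pole angles. The file name, frame length, and model order in the sketch below are assumptions for illustration.

```python
# Sketch: all-pole (LPC) fit to a vowel frame and crude formant candidates.
import numpy as np
import librosa

y, sr = librosa.load("vowel.wav", sr=16000)            # hypothetical vowel recording
n = int(0.03 * sr)                                     # 30 ms analysis frame
frame = y[:n] * np.hamming(n)

a = librosa.lpc(frame, order=12)                       # coefficients [1, a1, ..., ap]
roots = [r for r in np.roots(a) if np.imag(r) > 0]     # poles in the upper half-plane
formants = sorted(np.angle(roots) * sr / (2 * np.pi))  # pole angles -> frequencies (Hz)
print([round(f) for f in formants if f > 90])          # rough formant candidates
```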

  13. Intelligibility and acoustic characteristics of clear and conversational speech in telugu (a South Indian dravidian language).

    PubMed

    Durisala, Naresh; Prakash, S G R; Nambi, Arivudai; Batra, Ridhima

    2011-04-01

    The overall goal of this study is to examine the intelligibility differences of clear and conversational speech and also to objectively analyze the acoustic properties contributing to these differences. Seventeen post-lingual stable sensorineural hearing-impaired listeners with an age range of 17-40 years were recruited for the study. Forty Telugu sentences spoken by a female Telugu speaker in both clear and conversational speech styles were used as stimuli for the subjects. Results revealed that mean scores of clear speech were higher (mean = 84.5) when compared to conversational speech (mean = 61.4), an advantage of 23.1 percentage points. Acoustic properties revealed greater fundamental frequency (f0) and intensity, longer duration, higher consonant-vowel ratio (CVR) and greater temporal energy in clear speech.
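
    Several of the acoustic properties reported above (fundamental frequency, intensity, duration) can be estimated with standard tools. The sketch below computes them for a hypothetical sentence recording and is meant only to illustrate the kind of measurement involved, not the study's actual procedure.

```python
# Sketch: mean f0, overall RMS level, and duration of a sentence recording.
import numpy as np
import librosa

y, sr = librosa.load("sentence.wav", sr=16000)       # hypothetical recording
f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)

mean_f0 = np.nanmean(f0[voiced])                      # Hz, voiced frames only
rms_db = 20 * np.log10(np.sqrt(np.mean(y ** 2)))      # dB re full scale
duration_s = len(y) / sr
print(round(mean_f0, 1), round(rms_db, 1), round(duration_s, 2))
```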

  14. A Hybrid Acoustic and Pronunciation Model Adaptation Approach for Non-native Speech Recognition

    NASA Astrophysics Data System (ADS)

    Oh, Yoo Rhee; Kim, Hong Kook

    In this paper, we propose a hybrid model adaptation approach in which pronunciation and acoustic models are adapted by incorporating the pronunciation and acoustic variabilities of non-native speech in order to improve the performance of non-native automatic speech recognition (ASR). Specifically, the proposed hybrid model adaptation can be performed at either the state-tying or triphone-modeling level, depending on the level at which acoustic model adaptation is performed. In both methods, we first analyze the pronunciation variant rules of non-native speakers and then classify each rule as either a pronunciation variant or an acoustic variant. The state-tying level hybrid method then adapts pronunciation models and acoustic models by accommodating the pronunciation variants in the pronunciation dictionary and by clustering the states of triphone acoustic models using the acoustic variants, respectively. On the other hand, the triphone-modeling level hybrid method initially adapts pronunciation models in the same way as in the state-tying level hybrid method; however, for the acoustic model adaptation, the triphone acoustic models are then re-estimated based on the adapted pronunciation models and the states of the re-estimated triphone acoustic models are clustered using the acoustic variants. From the Korean-spoken English speech recognition experiments, it is shown that ASR systems employing the state-tying and triphone-modeling level adaptation methods can relatively reduce the average word error rates (WERs) by 17.1% and 22.1% for non-native speech, respectively, when compared to a baseline ASR system.

  15. Feature characteristics of spontaneous speech production in young deaf children.

    PubMed

    Geffner, D

    1980-12-01

    Sixty-five 6-yr-old deaf children from state supported schools were given an adaptation of the Goldman Fristoe test of articulation to assess their spontaneous speech production. Responses were measured in terms of features of manner, place, voice visibility, position, and error type and compared to imitative samples. A rank order of difficulty for each phoneme, error type, and word position is presented. Results show that of the phonemes, low back vowels, diphthongs, laterals, and voiced consonants were more easily produced. A relationship could be found between fundamental frequency, formant frequency, intensity, and phoneme production, suggesting that these variables and features may be providing the governance underlying the phonological rules in the development of speech in the deaf. Suggestions for training are given.

  16. The acoustics for speech of eight auditoriums in the city of Sao Paulo

    NASA Astrophysics Data System (ADS)

    Bistafa, Sylvio R.

    2002-11-01

    Eight auditoriums with a proscenium type of stage, which usually operate as dramatic theaters in the city of Sao Paulo, were acoustically surveyed in terms of their adequacy to unassisted speech. Reverberation times, early decay times, and speech levels were measured in different positions, together with objective measures of speech intelligibility. The measurements revealed reverberation time values rather uniform throughout the rooms, whereas significant variations were found in the values of the other acoustical measures with position. The early decay time was found to be better correlated with the objective measures of speech intelligibility than the reverberation time. The results from the objective measurements of speech intelligibility revealed that the speech transmission index STI, and its simplified version RaSTI, are strongly correlated with the early-to-late sound ratio C50 (1 kHz). However, it was found that the criterion value of acceptability of the latter is more easily met than that of the former. The results from these measurements enable an understanding of how the characteristics of the architectural design determine the acoustical quality for speech. Measurements of ST1-Gade were made in an attempt to validate it as an objective measure of "support" for the actor. Preliminary diagnostic results from ray-tracing simulations will also be presented.
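
    The early-to-late sound ratio C50 referred to above has a simple definition: the impulse-response energy arriving in the first 50 ms divided by the energy arriving later, expressed in dB. The sketch below assumes an onset-aligned, band-filtered impulse response (the paper reports C50 in the 1 kHz band).

```python
# Sketch: early-to-late energy ratio C50 from a room impulse response.
import numpy as np

def c50(impulse_response, fs):
    """Early (0-50 ms) to late (>50 ms) energy ratio, in dB."""
    h2 = np.asarray(impulse_response, dtype=float) ** 2
    k = int(0.050 * fs)                  # 50 ms boundary in samples
    early, late = h2[:k].sum(), h2[k:].sum()
    return 10.0 * np.log10(early / late)
```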

  17. A magnetic resonance imaging study on the articulatory and acoustic speech parameters of Malay vowels.

    PubMed

    Zourmand, Alireza; Mirhassani, Seyed Mostafa; Ting, Hua-Nong; Bux, Shaik Ismail; Ng, Kwan Hoong; Bilgen, Mehmet; Jalaludin, Mohd Amin

    2014-07-25

    The phonetic properties of six Malay vowels are investigated using magnetic resonance imaging (MRI) to visualize the vocal tract in order to obtain dynamic articulatory parameters during speech production. To resolve image blurring due to the tongue movement during the scanning process, a method based on active contour extraction is used to track tongue contours. The proposed method efficiently tracks tongue contours despite the partial blurring of MRI images. Consequently, the articulatory parameters are effectively measured as tongue movement is observed, and the specific shape of the tongue and its position for all six uttered Malay vowels are determined. Speech rehabilitation procedures demand some kind of visually perceivable prototype of speech articulation. To investigate the validity of the measured articulatory parameters based on the acoustic theory of speech production, an acoustic analysis of the vowels uttered by the subjects has been performed. As the acoustic and articulatory parameters of the uttered speech were examined, a correlation between formant frequencies and articulatory parameters was observed. The experiments reported a positive correlation between the constriction location of the tongue body and the first formant frequency, as well as a negative correlation between the constriction location of the tongue tip and the second formant frequency. The results demonstrate that the proposed method is an effective tool for the dynamic study of speech production.

  18. Speech and melody recognition in binaurally combined acoustic and electric hearing

    NASA Astrophysics Data System (ADS)

    Kong, Ying-Yee; Stickney, Ginger S.; Zeng, Fan-Gang

    2005-03-01

    Speech recognition in noise and music perception are especially challenging for current cochlear implant users. The present study utilizes the residual acoustic hearing in the nonimplanted ear in five cochlear implant users to elucidate the role of temporal fine structure at low frequencies in auditory perception and to test the hypothesis that combined acoustic and electric hearing produces better performance than either mode alone. The first experiment measured speech recognition in the presence of competing noise. It was found that, although the residual low-frequency (<1000 Hz) acoustic hearing produced essentially no recognition for speech recognition in noise, it significantly enhanced performance when combined with the electric hearing. The second experiment measured melody recognition in the same group of subjects and found that, contrary to the speech recognition result, the low-frequency acoustic hearing produced significantly better performance than the electric hearing. It is hypothesized that listeners with combined acoustic and electric hearing might use the correlation between the salient pitch in low-frequency acoustic hearing and the weak pitch in the envelope to enhance segregation between signal and noise. The present study suggests the importance and urgency of accurately encoding the fine-structure cue in cochlear implants.

  19. Discrimination of environmental background noise in presence of speech using sample-pairs statistics based features

    NASA Astrophysics Data System (ADS)

    Jhanwar, D.; Sharma, Kamlesh K.; Modani, S. G.

    2015-09-01

    A methodology to discriminate different classes of background noise using new features based on samples of the signal is presented here. Two consecutive samples of different amplitude in the discrete-time signal are termed a sample-pair, and 14 types of sample-pairs are considered here as fundamental features. Simulation results prove that the counts of some of these sample-pair types, as well as the counts of a few combinations of two, three, and four such sample-pairs, are useful for detecting and discriminating the different acoustic noises mixed with speech signals. On the basis of the simulation results, the proposed features perform better than other spectral features such as Mel Frequency Cepstral Coefficients (MFCC), Spectral Centroid, Spectral Flux, and Spectral Roll-off in terms of discrimination capability, simplicity of the extraction process, and lower dependency on the speech utterances mixed with noise. These sample-pair-based features have the advantage of not requiring frame decomposition or silence-period removal. Their discrimination capabilities are shown using Fisher's F-ratio as a performance index. A multiclass Support Vector Machine (SVM) is used as the classifier.
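
    Fisher's F-ratio, used above as the performance index, compares between-class variance to within-class variance of a scalar feature. The sketch below is a simplified, unweighted version for illustration; the class labels and feature values are hypothetical.

```python
# Sketch: simplified Fisher F-ratio for a scalar feature across classes.
import numpy as np

def fisher_f_ratio(values, labels):
    """Between-class variance of class means / mean within-class variance."""
    x = np.asarray(values, dtype=float)
    y = np.asarray(labels)
    classes = np.unique(y)
    grand_mean = x.mean()
    between = np.mean([(x[y == c].mean() - grand_mean) ** 2 for c in classes])
    within = np.mean([x[y == c].var() for c in classes])
    return between / within

# Hypothetical feature values for two noise classes
print(fisher_f_ratio([1.0, 1.2, 0.9, 3.1, 2.8, 3.0], ["car"] * 3 + ["rain"] * 3))
```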

  20. Changes in speech production in a child with a cochlear implant: acoustic and kinematic evidence.

    PubMed

    Goffman, Lisa; Ertmer, David J; Erdle, Christa

    2002-10-01

    A method is presented for examining change in motor patterns used to produce linguistic contrasts. In this case study, the method is applied to a child receiving new auditory input following cochlear implantation. This child experienced hearing loss at age 3 years and received a multichannel cochlear implant at age 7 years. Data collection points occurred both pre- and postimplant and included acoustic and kinematic analyses. Overall, this child's speech output was transcribed as accurate across the pre- and postimplant periods. Postimplant, with the onset of new auditory experience, acoustic durations showed a predictable maturational change, usually decreasing in duration. Conversely, the spatiotemporal stability of speech movements initially became more variable postimplantation. The auditory perturbations experienced by this child during development led to changes in the physiological underpinnings of speech production, even when speech output was perceived as accurate.

  1. Zebra finches are sensitive to prosodic features of human speech.

    PubMed

    Spierings, Michelle J; ten Cate, Carel

    2014-07-22

    Variation in pitch, amplitude and rhythm adds crucial paralinguistic information to human speech. Such prosodic cues can reveal information about the meaning or emphasis of a sentence or the emotional state of the speaker. To examine the hypothesis that sensitivity to prosodic cues is language independent and not human specific, we tested prosody perception in a controlled experiment with zebra finches. Using a go/no-go procedure, subjects were trained to discriminate between speech syllables arranged in XYXY patterns with prosodic stress on the first syllable and XXYY patterns with prosodic stress on the final syllable. To systematically determine the salience of the various prosodic cues (pitch, duration and amplitude) to the zebra finches, they were subjected to five tests with different combinations of these cues. The zebra finches generalized the prosodic pattern to sequences that consisted of new syllables and used prosodic features over structural ones to discriminate between stimuli. This strong sensitivity to the prosodic pattern was maintained when only a single prosodic cue was available. The change in pitch was treated as more salient than changes in the other prosodic features. These results show that zebra finches are sensitive to the same prosodic cues known to affect human speech perception.

  2. Excavation Equipment Recognition Based on Novel Acoustic Statistical Features.

    PubMed

    Cao, Jiuwen; Wang, Wei; Wang, Jianzhong; Wang, Ruirong

    2016-09-30

    Excavation equipment recognition has attracted increasing attention in recent years due to its significance in underground pipeline network protection and civil construction management. In this paper, a novel classification algorithm based on acoustics processing is proposed for four representative types of excavation equipment. New acoustic statistical features, namely, the short frame energy ratio, concentration of spectrum amplitude ratio, truncated energy range, and interval of pulse, are first developed to characterize acoustic signals. Then, probability density distributions of these acoustic features are analyzed and a novel classifier is presented. Experiments on real recorded acoustics of the four excavation devices are conducted to demonstrate the effectiveness of the proposed algorithm. Comparisons with two popular machine learning methods, support vector machine and extreme learning machine, combined with the popular linear prediction cepstral coefficients are provided to show the generalization capability of our method. A real surveillance system using our algorithm is developed and installed in a metro construction site for real-time recognition performance validation.

  3. Mathematical model of acoustic speech production with mobile walls of the vocal tract

    NASA Astrophysics Data System (ADS)

    Lyubimov, N. A.; Zakharov, E. V.

    2016-03-01

    A mathematical speech production model is considered that describes acoustic oscillation propagation in a vocal tract with mobile walls. The wave field function satisfies the Helmholtz equation with boundary conditions of the third kind (impedance type). The impedance mode corresponds to a three-parameter pendulum oscillation model. The experimental research demonstrates the nonlinear character of how the mobility of the vocal tract walls influences the spectral envelope of a speech signal.

  4. Speech privacy and annoyance considerations in the acoustic environment of passenger cars of high-speed trains.

    PubMed

    Jeon, Jin Yong; Hong, Joo Young; Jang, Hyung Suk; Kim, Jae Hyeon

    2015-12-01

    It is necessary to consider not only annoyance of interior noises but also speech privacy to achieve acoustic comfort in a passenger car of a high-speed train because speech from other passengers can be annoying. This study aimed to explore an optimal acoustic environment to satisfy speech privacy and reduce annoyance in a passenger car. Two experiments were conducted using speech sources and compartment noise of a high speed train with varying speech-to-noise ratios (SNRA) and background noise levels (BNL). Speech intelligibility was tested in experiment I, and in experiment II, perceived speech privacy, annoyance, and acoustic comfort of combined sounds with speech and background noise were assessed. The results show that speech privacy and annoyance were significantly influenced by the SNRA. In particular, the acoustic comfort was evaluated as acceptable when the SNRA was less than -6 dB for both speech privacy and noise annoyance. In addition, annoyance increased significantly as the BNL exceeded 63 dBA, whereas the effect of the background-noise level on the speech privacy was not significant. These findings suggest that an optimal level of interior noise in a passenger car might exist between 59 and 63 dBA, taking normal speech levels into account.

  5. Prosodic influences on speech production in children with specific language impairment and speech deficits: kinematic, acoustic, and transcription evidence.

    PubMed

    Goffman, L

    1999-12-01

    It is often hypothesized that young children's difficulties with producing weak-strong (iambic) prosodic forms arise from perceptual or linguistically based production factors. A third possible contributor to errors in the iambic form may be biological constraints, or biases, of the motor system. In the present study, 7 children with specific language impairment (SLI) and speech deficits were matched to same-age peers. Multiple levels of analysis, including kinematic (modulation and stability of movement), acoustic, and transcription, were applied to children's productions of iambic (weak-strong) and trochaic (strong-weak) prosodic forms. Findings suggest that a motor bias toward producing unmodulated rhythmic articulatory movements, similar to that observed in canonical babbling, contributes to children's acquisition of metrical forms. Children with SLI and speech deficits show less mature segmental and speech motor systems, as well as decreased modulation of movement in later-developing iambic forms. Further, components of prosodic and segmental acquisition develop independently and at different rates.

  6. Precategorical Acoustic Storage and the Perception of Speech

    ERIC Educational Resources Information Center

    Frankish, Clive

    2008-01-01

    Theoretical accounts of both speech perception and of short term memory must consider the extent to which perceptual representations of speech sounds might survive in relatively unprocessed form. This paper describes a novel version of the serial recall task that can be used to explore this area of shared interest. In immediate recall of digit…

  7. Vowel Acoustics in Adults with Apraxia of Speech

    ERIC Educational Resources Information Center

    Jacks, Adam; Mathes, Katey A.; Marquardt, Thomas P.

    2010-01-01

    Purpose: To investigate the hypothesis that vowel production is more variable in adults with acquired apraxia of speech (AOS) relative to healthy individuals with unimpaired speech. Vowel formant frequency measures were selected as the specific target of focus. Method: Seven adults with AOS and aphasia produced 15 repetitions of 6 American English…

  8. Acoustic features of objects matched by an echolocating bottlenose dolphin.

    PubMed

    Delong, Caroline M; Au, Whitlow W L; Lemonds, David W; Harley, Heidi E; Roitblat, Herbert L

    2006-03-01

    The focus of this study was to investigate how dolphins use acoustic features in returning echolocation signals to discriminate among objects. An echolocating dolphin performed a match-to-sample task with objects that varied in size, shape, material, and texture. After the task was completed, the features of the object echoes were measured (e.g., target strength, peak frequency). The dolphin's error patterns were examined in conjunction with the between-object variation in acoustic features to identify the acoustic features that the dolphin used to discriminate among the objects. The present study explored two hypotheses regarding the way dolphins use acoustic information in echoes: (1) use of a single feature, or (2) use of a linear combination of multiple features. The results suggested that dolphins do not use a single feature across all object sets or a linear combination of six echo features. Five features appeared to be important to the dolphin on four or more sets: the echo spectrum shape, the pattern of changes in target strength and number of highlights as a function of object orientation, and peak and center frequency. These data suggest that dolphins use multiple features and integrate information across echoes from a range of object orientations.

  9. Perceiving speech in context: Compensation for contextual variability during acoustic cue encoding and categorization

    NASA Astrophysics Data System (ADS)

    Toscano, Joseph Christopher

    Several fundamental questions about speech perception concern how listeners understand spoken language despite considerable variability in speech sounds across different contexts (the problem of lack of invariance in speech). This contextual variability is caused by several factors, including differences between individual talkers' voices, variation in speaking rate, and effects of coarticulatory context. A number of models have been proposed to describe how the speech system handles differences across contexts. Critically, these models make different predictions about (1) whether contextual variability is handled at the level of acoustic cue encoding or categorization, (2) whether it is driven by feedback from category-level processes or interactions between cues, and (3) whether listeners discard fine-grained acoustic information to compensate for contextual variability. Separating the effects of cue- and category-level processing has been difficult because behavioral measures tap processes that occur well after initial cue encoding and are influenced by task demands and linguistic information. Recently, we have used the event-related brain potential (ERP) technique to examine cue encoding and online categorization. Specifically, we have looked at differences in the auditory N1 as a measure of acoustic cue encoding and the P3 as a measure of categorization. This allows us to examine multiple levels of processing during speech perception and can provide a useful tool for studying effects of contextual variability. Here, I apply this approach to determine the point in processing at which context has an effect on speech perception and to examine whether acoustic cues are encoded continuously. Several types of contextual variability (talker gender, speaking rate, and coarticulation), as well as several acoustic cues (voice onset time, formant frequencies, and bandwidths), are examined in a series of experiments. The results suggest that (1) at early stages of speech

  10. Acoustical features of two Mayan monuments at Chichen Itza: Accident or design?

    NASA Astrophysics Data System (ADS)

    Lubman, David

    2002-11-01

    Chichen Itza dominated the early postclassic Maya world, ca. 900-1200 C.E. Two of its colossal monuments, the Great Ball Court and the temple of Kukulkan, reflect the sophisticated, hybrid culture of a Mexicanized Maya civilization. The architecture seems intended for ceremony and ritual drama. Deducing ritual practices will advance the understanding of a lost civilization, but what took place there is largely unknown. Perhaps acoustical science can add value. Unexpected and unusual acoustical features can be interpreted as intriguing clues or irrelevant accidents. Acoustical advocates believe that, when combined with an understanding of the Maya worldview, acoustical features can provide unique insights into how the Maya designed and used theater spaces. At Chichen Itza's monuments, sound reinforcement features improve rulers' and priests' ability to address large crowds, and Ball Court whispering galleries permit speech communication over unexpectedly large distances. Handclaps at Kukulkan stimulate chirps that mimic a revered bird ("Kukul"), thus reinforcing cultic beliefs. A ball striking the playing field wall stimulates flutter echoes at the Great Ball Court; their strength and duration arguably had dramatic, mythic, and practical significance. Interpretations of the possible mythic, magic, and political significance of sound phenomena at these Maya monuments strongly suggest intentional design.

  11. Acoustic Markers of Prominence Influence Infants' and Adults' Segmentation of Speech Sequences

    ERIC Educational Resources Information Center

    Bion, Ricardo A. H.; Benavides-Varela, Silvia; Nespor, Marina

    2011-01-01

    Two experiments investigated the way acoustic markers of prominence influence the grouping of speech sequences by adults and 7-month-old infants. In the first experiment, adults were familiarized with and asked to memorize sequences of adjacent syllables that alternated in either pitch or duration. During the test phase, participants heard pairs…

  12. Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation

    PubMed Central

    Kong, Ying-Yee; Donaldson, Gail; Somarowthu, Ala

    2015-01-01

    Low-frequency acoustic cues have been shown to improve speech perception in cochlear-implant listeners. However, the mechanisms underlying this benefit are still not well understood. This study investigated the extent to which low-frequency cues can facilitate listeners' use of linguistic knowledge in simulated electric-acoustic stimulation (EAS). Experiment 1 examined differences in the magnitude of EAS benefit at the phoneme, word, and sentence levels. Speech materials were processed via noise-channel vocoding and lowpass (LP) filtering. The amount of spectral degradation in the vocoder speech was varied by applying different numbers of vocoder channels. Normal-hearing listeners were tested on vocoder-alone, LP-alone, and vocoder + LP conditions. Experiment 2 further examined factors that underlie the context effect on EAS benefit at the sentence level by limiting the low-frequency cues to temporal envelope and periodicity (AM + FM). Results showed that EAS benefit was greater for higher-context than for lower-context speech materials even when the LP ear received only low-frequency AM + FM cues. Possible explanations for the greater EAS benefit observed with higher-context materials may lie in the interplay between perceptual and expectation-driven processes for EAS speech recognition, and/or the band-importance functions for different types of speech materials. PMID:25994712

  13. Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation.

    PubMed

    Kong, Ying-Yee; Donaldson, Gail; Somarowthu, Ala

    2015-05-01

    Low-frequency acoustic cues have been shown to improve speech perception in cochlear-implant listeners. However, the mechanisms underlying this benefit are still not well understood. This study investigated the extent to which low-frequency cues can facilitate listeners' use of linguistic knowledge in simulated electric-acoustic stimulation (EAS). Experiment 1 examined differences in the magnitude of EAS benefit at the phoneme, word, and sentence levels. Speech materials were processed via noise-channel vocoding and lowpass (LP) filtering. The amount of spectral degradation in the vocoder speech was varied by applying different numbers of vocoder channels. Normal-hearing listeners were tested on vocoder-alone, LP-alone, and vocoder + LP conditions. Experiment 2 further examined factors that underlie the context effect on EAS benefit at the sentence level by limiting the low-frequency cues to temporal envelope and periodicity (AM + FM). Results showed that EAS benefit was greater for higher-context than for lower-context speech materials even when the LP ear received only low-frequency AM + FM cues. Possible explanations for the greater EAS benefit observed with higher-context materials may lie in the interplay between perceptual and expectation-driven processes for EAS speech recognition, and/or the band-importance functions for different types of speech materials.
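
    A minimal sketch, not the study's implementation, of the two signal-processing steps described above: noise-channel vocoding (envelope-modulated, band-limited noise) and lowpass filtering of the unprocessed speech; the channel spacing, filter orders, and 500 Hz cutoff are illustrative assumptions.

      import numpy as np
      from scipy.signal import butter, sosfiltfilt, hilbert

      def noise_vocode(x, fs, n_channels=8, lo=100.0, hi=7000.0):
          """Replace the fine structure in each analysis band with
          envelope-modulated noise (assumes fs >= 16 kHz)."""
          edges = np.geomspace(lo, hi, n_channels + 1)   # log-spaced band edges
          noise = np.random.randn(len(x))
          out = np.zeros_like(x)
          for f1, f2 in zip(edges[:-1], edges[1:]):
              sos = butter(4, [f1, f2], btype='bandpass', fs=fs, output='sos')
              band_env = np.abs(hilbert(sosfiltfilt(sos, x)))   # band envelope
              carrier = sosfiltfilt(sos, noise)                 # band-limited noise
              out += band_env * carrier
          return out

      def lowpass_speech(x, fs, cutoff=500.0):
          """Simulated acoustic (low-frequency) ear."""
          sos = butter(6, cutoff, btype='lowpass', fs=fs, output='sos')
          return sosfiltfilt(sos, x)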

  14. Contributions of electric and acoustic hearing to bimodal speech and music perception.

    PubMed

    Crew, Joseph D; Galvin, John J; Landsberger, David M; Fu, Qian-Jie

    2015-01-01

    Cochlear implant (CI) users have difficulty understanding speech in noisy listening conditions and perceiving music. Aided residual acoustic hearing in the contralateral ear can mitigate these limitations. The present study examined contributions of electric and acoustic hearing to speech understanding in noise and melodic pitch perception. Data was collected with the CI only, the hearing aid (HA) only, and both devices together (CI+HA). Speech reception thresholds (SRTs) were adaptively measured for simple sentences in speech babble. Melodic contour identification (MCI) was measured with and without a masker instrument; the fundamental frequency of the masker was varied to be overlapping or non-overlapping with the target contour. Results showed that the CI contributes primarily to bimodal speech perception and that the HA contributes primarily to bimodal melodic pitch perception. In general, CI+HA performance was slightly improved relative to the better ear alone (CI-only) for SRTs but not for MCI, with some subjects experiencing a decrease in bimodal MCI performance relative to the better ear alone (HA-only). Individual performance was highly variable, and the contribution of either device to bimodal perception was both subject- and task-dependent. The results suggest that individualized mapping of CIs and HAs may further improve bimodal speech and music perception.

  15. Contributions of Electric and Acoustic Hearing to Bimodal Speech and Music Perception

    PubMed Central

    Crew, Joseph D.; Galvin III, John J.; Landsberger, David M.; Fu, Qian-Jie

    2015-01-01

    Cochlear implant (CI) users have difficulty understanding speech in noisy listening conditions and perceiving music. Aided residual acoustic hearing in the contralateral ear can mitigate these limitations. The present study examined contributions of electric and acoustic hearing to speech understanding in noise and melodic pitch perception. Data was collected with the CI only, the hearing aid (HA) only, and both devices together (CI+HA). Speech reception thresholds (SRTs) were adaptively measured for simple sentences in speech babble. Melodic contour identification (MCI) was measured with and without a masker instrument; the fundamental frequency of the masker was varied to be overlapping or non-overlapping with the target contour. Results showed that the CI contributes primarily to bimodal speech perception and that the HA contributes primarily to bimodal melodic pitch perception. In general, CI+HA performance was slightly improved relative to the better ear alone (CI-only) for SRTs but not for MCI, with some subjects experiencing a decrease in bimodal MCI performance relative to the better ear alone (HA-only). Individual performance was highly variable, and the contribution of either device to bimodal perception was both subject- and task-dependent. The results suggest that individualized mapping of CIs and HAs may further improve bimodal speech and music perception. PMID:25790349

  16. Influence of Architectural Features and Styles on Various Acoustical Measures in Churches

    NASA Astrophysics Data System (ADS)

    Carvalho, Antonio Pedro Oliveira De.

    This work reports on acoustical field measurements made in a major survey of 41 Catholic churches in Portugal that were built in the last 14 centuries. A series of monaural and binaural acoustical measurements was taken at multiple source/receiver positions in each church using the impulse response with noise burst method. The acoustical measures were Reverberation Time (RT), Early Decay Time (EDT), Clarity (C80), Definition (D), Center Time (TS), Loudness (L), Bass Ratios based on the Reverberation Time and Loudness (BR_RT and BR_L), Rapid Speech Transmission Index (RASTI), and the binaural Coherence (COH). The scope of this research is to investigate how the acoustical performance of Catholic churches relates to their architectural features and to determine simple formulas to predict acoustical measures by the use of elementary architectural parameters. Prediction equations were defined among the acoustical measures to estimate values at individual locations within each room as well as the mean values in each church. Best fits with R^2 of about 0.9 were not uncommon among many of the measures. Within- and inter-church differences in the data for the acoustical measures were also analyzed. The variations of RT and EDT were identified as much smaller than the variations of the other measures. The churches tested were grouped in eight architectural styles, and the effect of their evolution through time on these acoustical measures was investigated. Statistically significant differences were found regarding some architectural styles that can be traced to historical changes in Church history, especially to the Reformation period. Prediction equations were defined to estimate mean acoustical measures by the use of fifteen simple architectural parameters. The use of the Sabine and Eyring reverberation time equations was tested. The effect of coupled spaces was analyzed, and a new algorithm for the application of the Sabine equation was developed, achieving an average of
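
    Two of the measures listed above have simple definitions in terms of the measured impulse response; a minimal sketch (independent of the survey) computing Clarity (C80) and Definition (commonly D50), assuming the impulse response array and sampling rate are given.

      import numpy as np

      def clarity_definition(ir, fs):
          """C80 in dB and D50 as a fraction, taking t = 0 at the direct sound."""
          start = np.argmax(np.abs(ir))      # crude direct-sound detection
          p2 = ir[start:] ** 2               # squared pressure
          n80, n50 = int(0.080 * fs), int(0.050 * fs)
          c80 = 10.0 * np.log10(p2[:n80].sum() / p2[n80:].sum())
          d50 = p2[:n50].sum() / p2.sum()
          return c80, d50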

  17. Distinctive-feature analyses of the speech of deaf children.

    PubMed

    Mencke, E O; Ochsner, G J; Testut, E W

    1985-07-01

    Twenty-two children aged 8.5 through 15.5 years with hearing threshold levels of 90 dB or greater in the better ear spoke a carrier phrase before each of 41 monosyllables, each containing an initial and a final consonant (23 consonants were represented). Each subject repeated the 41-word list 10 times. Speech samples were recorded simultaneously but independently in audio-only and audio-visual modes and were transcribed by three judges using each mode separately. Subjects' utterances of target consonants in initial and final word positions were scored for the presence or absence of distinctive features according to the systems of Chomsky and Halle (1968) and of Fisher and Logemann (1971), and percent correct feature usage was computed. Consistently higher correct feature usage was noted for target consonants in the initial rather than the final word position for both systems. Further, higher scores were obtained when transcribers could see as well as hear the speaker, but correct usage of a feature was not uniformly a function of the visibility of that feature. Finally, there was no significant increase in correct feature usage as a function of speaker age.

  18. Classifying acoustic signals into phoneme categories: average and dyslexic readers make use of complex dynamical patterns and multifractal scaling properties of the speech signal

    PubMed Central

    2015-01-01

    Several competing aetiologies of developmental dyslexia suggest that the problems with acquiring literacy skills are causally entailed by low-level auditory and/or speech perception processes. The purpose of this study is to evaluate the diverging claims about the specific deficient perceptual processes under conditions of strong inference. Theoretically relevant acoustic features were extracted from a set of artificial speech stimuli that lie on a /bAk/-/dAk/ continuum. The features were tested on their ability to enable a simple classifier (Quadratic Discriminant Analysis) to reproduce the observed classification performance of average and dyslexic readers in a speech perception experiment. The ‘classical’ features examined were based on component process accounts of developmental dyslexia, such as the supposed deficit in Envelope Rise Time detection and the deficit in the detection of rapid changes in the distribution of energy in the frequency spectrum (formant transitions). Studies examining these temporal processing deficit hypotheses do not employ measures that quantify the temporal dynamics of stimuli. It is shown that measures based on quantification of the dynamics of complex, interaction-dominant systems (Recurrence Quantification Analysis and the multifractal spectrum) enable QDA to classify the stimuli almost identically to the classifications observed for dyslexic and average-reading participants. It seems unlikely that participants used any of the features that are traditionally associated with accounts of (impaired) speech perception. The nature of the variables quantifying the temporal dynamics of the speech stimuli implies that the classification of speech stimuli cannot be regarded as a linear aggregate of component processes that each parse the acoustic signal independent of one another, as is assumed by the ‘classical’ aetiologies of developmental dyslexia. It is suggested that the results imply that the differences in speech perception performance between
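
    A minimal sketch, with placeholder data, of the classification step described above: a Quadratic Discriminant Analysis classifier trained on per-stimulus acoustic features and scored by cross-validation; the feature matrix, labels, and feature names are illustrative, not the study's materials.

      import numpy as np
      from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
      from sklearn.model_selection import cross_val_score

      # Placeholder data: one row of features per stimulus (e.g. rise time,
      # formant-transition slope, RQA measures); labels 0 = /bAk/, 1 = /dAk/.
      rng = np.random.default_rng(0)
      X = rng.normal(size=(40, 4))
      y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

      qda = QuadraticDiscriminantAnalysis()
      print("mean cross-validated accuracy:", cross_val_score(qda, X, y, cv=5).mean())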

  19. Measurements of speech intelligibility in common rooms for older adults as a first step towards acoustical guidelines.

    PubMed

    Reinten, Jikke; van Hout, Nicole; Hak, Constant; Kort, Helianthe

    2015-01-01

    Adapting the built environment to the needs of nursing- or care-home residents has become common practice. Even though hearing loss due to ageing is a naturally occurring biological process, little research has been performed on the effects of room acoustic parameters on speech intelligibility for older adults. This article presents the results of room acoustic measurements in common rooms for older adults and their effect on speech intelligibility. Perceived speech intelligibility amongst the users of the rooms was also investigated. The results have led to ongoing research at Utrecht University of Applied Sciences and Eindhoven University of Technology, aimed at the development of acoustical guidelines for elderly care facilities.

  20. The advantages of sound localization and speech perception of bilateral electric acoustic stimulation

    PubMed Central

    Moteki, Hideaki; Kitoh, Ryosuke; Tsukada, Keita; Iwasaki, Satoshi; Nishio, Shin-Ya

    2015-01-01

    Conclusion: Bilateral electric acoustic stimulation (EAS) effectively improved speech perception in noise and sound localization in patients with high-frequency hearing loss. Objective: To evaluate the efficacy of bilateral EAS for sound localization and speech perception in noise in two cases of high-frequency hearing loss. Methods: Two female patients, aged 38 and 45 years, respectively, received bilateral EAS sequentially. Pure-tone audiometry was performed preoperatively and postoperatively to evaluate hearing preservation in the lower frequencies. Speech perception outcomes in quiet and in noise and sound localization were assessed with unilateral and bilateral EAS. Results: Residual hearing in the lower frequencies was well preserved after insertion of a FLEX24 electrode (24 mm) using the round window approach. After bilateral EAS, speech perception improved in quiet and even more so in noise. In addition, the sound localization ability of both cases with bilateral EAS improved remarkably. PMID:25423260

  1. Vowel Acoustics in Dysarthria: Speech Disorder Diagnosis and Classification

    ERIC Educational Resources Information Center

    Lansford, Kaitlin L.; Liss, Julie M.

    2014-01-01

    Purpose: The purpose of this study was to determine the extent to which vowel metrics are capable of distinguishing healthy from dysarthric speech and among different forms of dysarthria. Method: A variety of vowel metrics were derived from spectral and temporal measurements of vowel tokens embedded in phrases produced by 45 speakers with…

  2. Noise Robust Feature Scheme for Automatic Speech Recognition Based on Auditory Perceptual Mechanisms

    NASA Astrophysics Data System (ADS)

    Cai, Shang; Xiao, Yeming; Pan, Jielin; Zhao, Qingwei; Yan, Yonghong

    Mel Frequency Cepstral Coefficients (MFCC) are the most popular acoustic features used in automatic speech recognition (ASR), mainly because the coefficients capture the most useful information of the speech and fit well with the assumptions used in hidden Markov models. As is well known, MFCCs already employ several principles which have known counterparts in the peripheral properties of human hearing: decoupling across frequency, mel-warping of the frequency axis, log-compression of energy, etc. It is natural to introduce more mechanisms in the auditory periphery to improve the noise robustness of MFCC. In this paper, a k-nearest neighbors based frequency masking filter is proposed to reduce the audibility of spectra valleys which are sensitive to noise. Besides, Moore and Glasberg's critical band equivalent rectangular bandwidth (ERB) expression is utilized to determine the filter bandwidth. Furthermore, a new bandpass infinite impulse response (IIR) filter is proposed to imitate the temporal masking phenomenon of the human auditory system. These three auditory perceptual mechanisms are combined with the standard MFCC algorithm in order to investigate their effects on ASR performance, and a revised MFCC extraction scheme is presented. Recognition performances with the standard MFCC, RASTA perceptual linear prediction (RASTA-PLP) and the proposed feature extraction scheme are evaluated on a medium-vocabulary isolated-word recognition task and a more complex large vocabulary continuous speech recognition (LVCSR) task. Experimental results show that consistent robustness against background noise is achieved on these two tasks, and the proposed method outperforms both the standard MFCC and RASTA-PLP.
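
    The Glasberg and Moore equivalent rectangular bandwidth referred to above has a widely used closed form, ERB(f) = 24.7 (4.37 f/1000 + 1) Hz; a minimal sketch of using it to assign filter bandwidths to a set of centre frequencies (the centre frequencies themselves are illustrative).

      import numpy as np

      def erb_bandwidth(f_hz):
          """Glasberg & Moore (1990) equivalent rectangular bandwidth in Hz."""
          return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

      centres = np.array([250.0, 500.0, 1000.0, 2000.0, 4000.0])
      print(dict(zip(centres, np.round(erb_bandwidth(centres), 1))))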

  3. Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition.

    PubMed

    Schädler, Marc René; Kollmeier, Birger

    2015-04-01

    To test if simultaneous spectral and temporal processing is required to extract robust features for automatic speech recognition (ASR), the robust spectro-temporal two-dimensional-Gabor filter bank (GBFB) front-end from Schädler, Meyer, and Kollmeier [J. Acoust. Soc. Am. 131, 4134-4151 (2012)] was de-composed into a spectral one-dimensional-Gabor filter bank and a temporal one-dimensional-Gabor filter bank. A feature set that is extracted with these separate spectral and temporal modulation filter banks was introduced, the separate Gabor filter bank (SGBFB) features, and evaluated on the CHiME (Computational Hearing in Multisource Environments) keywords-in-noise recognition task. From the perspective of robust ASR, the results showed that spectral and temporal processing can be performed independently and are not required to interact with each other. Using SGBFB features permitted the signal-to-noise ratio (SNR) to be lowered by 1.2 dB while still performing as well as the GBFB-based reference system, which corresponds to a relative improvement of the word error rate by 12.8%. Additionally, the real time factor of the spectro-temporal processing could be reduced by more than an order of magnitude. Compared to human listeners, the SNR needed to be 13 dB higher when using Mel-frequency cepstral coefficient features, 11 dB higher when using GBFB features, and 9 dB higher when using SGBFB features to achieve the same recognition performance.
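
    A minimal sketch of the separability idea described above: a one-dimensional Gabor kernel (Gaussian-windowed complex exponential) applied along the spectral axis and, independently, along the temporal axis of a log-mel spectrogram; the modulation frequencies and widths are illustrative, not the SGBFB front-end's parameters.

      import numpy as np
      from scipy.ndimage import convolve1d

      def gabor_1d(omega, sigma):
          """1-D complex Gabor kernel: modulation frequency omega (rad/sample),
          Gaussian width sigma (samples)."""
          n = np.arange(-int(4 * sigma), int(4 * sigma) + 1)
          return np.exp(1j * omega * n) * np.exp(-0.5 * (n / sigma) ** 2)

      def separable_gabor_features(log_mel, spec_omega=0.5, temp_omega=0.2, sigma=3.0):
          """Filter a (channels x frames) log-mel spectrogram with separate
          spectral and temporal 1-D Gabor filters (real parts only)."""
          spec_out = convolve1d(log_mel, gabor_1d(spec_omega, sigma).real, axis=0, mode='nearest')
          temp_out = convolve1d(log_mel, gabor_1d(temp_omega, sigma).real, axis=1, mode='nearest')
          return spec_out, temp_out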

  4. A Statistical Model-Based Speech Enhancement Using Acoustic Noise Classification for Robust Speech Communication

    NASA Astrophysics Data System (ADS)

    Choi, Jae-Hun; Chang, Joon-Hyuk

    In this paper, we present a speech enhancement technique based on ambient noise classification that incorporates the Gaussian mixture model (GMM). The principal parameters of the statistical model-based speech enhancement algorithm, such as the weighting parameter in the decision-directed (DD) method and the long-term smoothing parameter of the noise estimation, are set according to the classified context to ensure the best performance under each noise condition. For real-time context awareness, the noise classification is performed on a frame-by-frame basis using the GMM with a soft decision framework. The speech absence probability (SAP) is used in detecting speech absence periods and updating the likelihood of the GMM.
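
    A minimal sketch of the general mechanism described above: per-class GMMs score each frame, and enhancement parameters are blended by the resulting soft (posterior) weights; the feature choice, model sizes, equal class priors, and the parameter table are assumptions, not the authors' settings.

      import numpy as np
      from sklearn.mixture import GaussianMixture

      def train_noise_models(training_frames, n_components=4):
          """training_frames: dict mapping class name -> (n_frames, n_features)."""
          return {name: GaussianMixture(n_components=n_components,
                                        covariance_type='diag').fit(frames)
                  for name, frames in training_frames.items()}

      def soft_decision(frame, models, class_params):
          """Posterior-weighted blend of per-class enhancement parameters
          (e.g. the DD weighting factor), assuming equal class priors."""
          names = list(models)
          loglik = np.array([models[n].score_samples(frame[None, :])[0] for n in names])
          post = np.exp(loglik - loglik.max())
          post /= post.sum()
          blended = sum(p * class_params[n] for p, n in zip(post, names))
          return dict(zip(names, post)), blended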

  5. Effects of cognitive workload on speech production: acoustic analyses and perceptual consequences.

    PubMed

    Lively, S E; Pisoni, D B; Van Summers, W; Bernacki, R H

    1993-05-01

    The present investigation examined the effects of cognitive workload on speech production. Workload was manipulated by having talkers perform a compensatory visual tracking task while speaking test sentences of the form "Say hVd again." Acoustic measurements were made to compare utterances produced under workload with the same utterances produced in a control condition. In the workload condition, some talkers produced utterances with increased amplitude and amplitude variability, decreased spectral tilt and F0 variability and increased speaking rate. No changes in F1, F2, or F3 were observed across conditions for any of the talkers. These findings indicate both laryngeal and sublaryngeal adjustments in articulation, as well as modifications in the absolute timing of articulatory gestures. The results of a perceptual identification experiment paralleled the acoustic measurements. Small but significant advantages in intelligibility were observed for utterances produced under workload for talkers who showed robust changes in speech production. Changes in amplitude and amplitude variability for utterances produced under workload appeared to be the major factor controlling intelligibility. The results of the present investigation support the assumptions of Lindblom's ["Explaining phonetic variation: A sketch of the H&H theory," in Speech Production and Speech Modeling (Kluwer Academic, The Netherlands, 1990)] H&H model: Talkers adapt their speech to suit the demands of the environment and these modifications are designed to maximize intelligibility.

  6. Prosodic Features and Speech Naturalness in Individuals with Dysarthria

    ERIC Educational Resources Information Center

    Klopfenstein, Marie I.

    2012-01-01

    Despite the importance of speech naturalness to treatment outcomes, little research has been done on what constitutes speech naturalness and how to best maximize naturalness in relationship to other treatment goals like intelligibility. In addition, previous literature alludes to the relationship between prosodic aspects of speech and speech…

  7. Effects of age, acoustic challenge, and verbal working memory on recall of narrative speech

    PubMed Central

    Ward, Caitlin M.; Rogers, Chad S.; Van Engen, Kristin J.; Peelle, Jonathan E.

    2016-01-01

    Background A common goal during speech comprehension is to remember what we have heard. Encoding speech into long-term memory frequently requires processes such as verbal working memory that may also be involved in processing degraded speech. Here we tested whether young and older adult listeners’ memory for short stories was worse when the stories were acoustically degraded, or whether the additional contextual support provided by a narrative would protect against these effects. Methods We tested 30 young adults (aged 18–28 years) and 30 older adults (aged 65–79 years) with good self-reported hearing. Participants heard short stories that were presented as normal (unprocessed) speech, or acoustically degraded using a noise vocoding algorithm with 24 or 16 channels. The degraded stories were still fully intelligible. Following each story, participants were asked to repeat the story in as much detail as possible. Recall was scored using a modified idea unit scoring approach, which included separately scoring hierarchical levels of narrative detail. Results Memory for acoustically degraded stories was significantly worse than for normal stories at some levels of narrative detail. Older adults’ memory for the stories was significantly worse overall, but there was no interaction between age and acoustic clarity or level of narrative detail. Verbal working memory (assessed by reading span) significantly correlated with recall accuracy for both young and older adults, whereas hearing ability (better ear pure-tone average) did not. Conclusion Our findings are consistent with a framework in which the additional cognitive demands caused by a degraded acoustic signal use resources that would otherwise be available for memory encoding for both young and older adults. Verbal working memory is a likely candidate for supporting both of these processes. PMID:26683044

  8. Acoustic and auditory phonetics: the adaptive design of speech sound systems.

    PubMed

    Diehl, Randy L

    2008-03-12

    Speech perception is remarkably robust. This paper examines how acoustic and auditory properties of vowels and consonants help to ensure intelligibility. First, the source-filter theory of speech production is briefly described, and the relationship between vocal-tract properties and formant patterns is demonstrated for some commonly occurring vowels. Next, two accounts of the structure of preferred sound inventories, quantal theory and dispersion theory, are described and some of their limitations are noted. Finally, it is suggested that certain aspects of quantal and dispersion theories can be unified in a principled way so as to achieve reasonable predictive accuracy.

  9. Subphonetic Acoustic Modeling for Speaker-Independent Continuous Speech Recognition

    DTIC Science & Technology

    1993-12-17

    The report introduces the concept of senone sharing across all hidden Markov models, such as triphones, multi-phones, words, or even phrase models. For instance, training the 50 phone HMMs for English usually requires only 1-2 hours of training data, while sufficiently training syllable models may require 50 hours of speech. Faced with a limited amount of training data, the advantage of the improved structure of the stochastic model may not be

  10. From prosodic structure to acoustic saliency: A fMRI investigation of speech rate, clarity, and emphasis

    NASA Astrophysics Data System (ADS)

    Golfinopoulos, Elisa

    Acoustic variability in fluent speech can arise at many stages in speech production planning and execution. For example, at the phonological encoding stage, the grouping of phonemes into syllables determines which segments are coarticulated and, by consequence, segment-level acoustic variation. Likewise phonetic encoding, which determines the spatiotemporal extent of articulatory gestures, will affect the acoustic detail of segments. Functional magnetic resonance imaging (fMRI) was used to measure brain activity of fluent adult speakers in four speaking conditions: fast, normal, clear, and emphatic (or stressed) speech. These speech manner changes typically result in acoustic variations that do not change the lexical or semantic identity of productions but do affect the acoustic saliency of phonemes, syllables and/or words. Acoustic responses recorded inside the scanner were assessed quantitatively using eight acoustic measures and sentence duration was used as a covariate of non-interest in the neuroimaging analysis. Compared to normal speech, emphatic speech was characterized acoustically by a greater difference between stressed and unstressed vowels in intensity, duration, and fundamental frequency, and neurally by increased activity in right middle premotor cortex and supplementary motor area, and bilateral primary sensorimotor cortex. These findings are consistent with right-lateralized motor planning of prosodic variation in emphatic speech. Clear speech involved an increase in average vowel and sentence durations and average vowel spacing, along with increased activity in left middle premotor cortex and bilateral primary sensorimotor cortex. These findings are consistent with an increased reliance on feedforward control, resulting in hyper-articulation, under clear as compared to normal speech. Fast speech was characterized acoustically by reduced sentence duration and average vowel spacing, and neurally by increased activity in left anterior frontal

  11. Accuracy of perceptual and acoustic methods for the detection of inspiratory loci in spontaneous speech.

    PubMed

    Wang, Yu-Tsai; Nip, Ignatius S B; Green, Jordan R; Kent, Ray D; Kent, Jane Finley; Ullman, Cara

    2012-12-01

    The present study investigates the accuracy of perceptually and acoustically determined inspiratory loci in spontaneous speech for the purpose of identifying breath groups. Sixteen participants were asked to talk about simple topics in daily life at a comfortable speaking rate and loudness while connected to a pneumotach and audio microphone. The locations of inspiratory loci were determined on the basis of the aerodynamic signal, which served as a reference for loci identified perceptually and acoustically. Signal detection theory was used to evaluate the accuracy of the methods. The results showed that the greatest accuracy in pause detection was achieved (1) perceptually, on the basis of agreement between at least two of three judges, and (2) acoustically, using a pause duration threshold of 300 ms. In general, the perceptually based method was more accurate than was the acoustically based method. Inconsistencies among perceptually determined, acoustically determined, and aerodynamically determined inspiratory loci for spontaneous speech should be weighed in selecting a method of breath group determination.
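
    A minimal sketch of an acoustically based pause detector of the kind evaluated above: frame energies are thresholded to find silent stretches, and only silences of at least 300 ms are kept as candidate inspiratory loci; the frame length and the relative energy threshold are assumptions.

      import numpy as np

      def detect_pauses(x, fs, frame_ms=10, rel_floor_db=-40.0, min_pause_ms=300):
          """Return (start_s, end_s) pairs for silent stretches >= min_pause_ms."""
          hop = int(fs * frame_ms / 1000)
          frames = np.array([x[i:i + hop] for i in range(0, len(x) - hop, hop)])
          energy = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
          silent = energy < energy.max() + rel_floor_db   # threshold relative to peak
          pauses, start = [], None
          for i, s in enumerate(np.append(silent, False)):
              if s and start is None:
                  start = i
              elif not s and start is not None:
                  if (i - start) * frame_ms >= min_pause_ms:
                      pauses.append((start * hop / fs, i * hop / fs))
                  start = None
          return pauses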

  12. Speech after Radial Forearm Free Flap Reconstruction of the Tongue: A Longitudinal Acoustic Study of Vowel and Diphthong Sounds

    ERIC Educational Resources Information Center

    Laaksonen, Juha-Pertti; Rieger, Jana; Happonen, Risto-Pekka; Harris, Jeffrey; Seikaly, Hadi

    2010-01-01

    The purpose of this study was to use acoustic analyses to describe speech outcomes over the course of 1 year after radial forearm free flap (RFFF) reconstruction of the tongue. Eighteen Canadian English-speaking females and males with reconstruction for oral cancer had speech samples recorded (pre-operative, and 1 month, 6 months, and 1 year…

  13. Acoustic-Phonetic Differences between Infant- and Adult-Directed Speech: The Role of Stress and Utterance Position

    ERIC Educational Resources Information Center

    Wang, Yuanyuan; Seidl, Amanda; Cristia, Alejandrina

    2015-01-01

    Previous studies have shown that infant-directed speech (IDS) differs from adult-directed speech (ADS) on a variety of dimensions. The aim of the current study was to investigate whether acoustic differences between IDS and ADS in English are modulated by prosodic structure. We compared vowels across the two registers (IDS, ADS) in both stressed…

  14. An eighth-scale speech source for subjective assessments in acoustic models

    NASA Astrophysics Data System (ADS)

    Orlowski, R. J.

    1981-08-01

    The design of a source is described which is suitable for making speech recordings in eighth-scale acoustic models of auditoria. An attempt was made to match the directionality of the source with the directionality of the human voice using data reported in the literature. A narrow aperture was required for the design which was provided by mounting an inverted conical horn over the diaphragm of a high frequency loudspeaker. Resonance problems were encountered with the use of a horn and a description is given of the electronic techniques adopted to minimize the effect of these resonances. Subjective and objective assessments on the completed speech source have proved satisfactory. It has been used in a modelling exercise concerned with the acoustic design of a theatre with a thrust-type stage.

  15. The Acoustic-Modeling Problem in Automatic Speech Recognition.

    DTIC Science & Technology

    1987-12-01

    Systems that use an artificial grammar do so in order to set this uncertainty by fiat, thereby ensuring that their task will not be too difficult. With an artificial grammar, the Pr(W = w) values are known and H_m(W) can, in fact, achieve its lower bound if the system simply uses these probabilities. In a ... finite-state grammar represented by that chain. As Jim Baker points out, the modeling of speech by a hidden Markov model should not be regarded as a

  16. Monaural speech intelligibility and detection in maskers with varying amounts of spectro-temporal speech features.

    PubMed

    Schubotz, Wiebke; Brand, Thomas; Kollmeier, Birger; Ewert, Stephan D

    2016-07-01

    Speech intelligibility is strongly affected by the presence of maskers. Depending on the spectro-temporal structure of the masker and its similarity to the target speech, different masking aspects can occur, which are typically referred to as energetic, amplitude modulation, and informational masking. In this study speech intelligibility and speech detection were measured in maskers that vary systematically in the time-frequency domain from steady-state noise to a single interfering talker. Male and female target speech was used in combination with maskers based on speech of the same or different gender. Observed data were compared to predictions of the speech intelligibility index, extended speech intelligibility index, multi-resolution speech-based envelope-power-spectrum model, and the short-time objective intelligibility measure. The different models served as analysis tools to help distinguish between the different masking aspects. The comparison shows that overall masking can to a large extent be explained by short-term energetic masking. However, the other masking aspects (amplitude modulation and informational masking) influence speech intelligibility as well. Additionally, all models showed considerable deviations from the data. The current study therefore provides a benchmark for further evaluation of speech intelligibility prediction models.

  17. Are you a good mimic? Neuro-acoustic signatures for speech imitation ability.

    PubMed

    Reiterer, Susanne M; Hu, Xiaochen; Sumathi, T A; Singh, Nandini C

    2013-01-01

    We investigated individual differences in speech imitation ability in late bilinguals using a neuro-acoustic approach. One hundred and thirty-eight German-English bilinguals matched on various behavioral measures were tested for "speech imitation ability" in a foreign language, Hindi, and categorized into "high" and "low ability" groups. Brain activations and speech recordings were obtained from 26 participants from the two extreme groups as they performed a functional neuroimaging experiment which required them to "imitate" sentences in three conditions: (A) German, (B) English, and (C) German with a fake English accent. We used a recently developed acoustic analysis, the "articulation space," as a metric to compare the speech imitation abilities of the two groups. Across all three conditions, direct comparisons between the two groups revealed brain activations (FWE corrected, p < 0.05) that were more widespread, with significantly higher peak activity in the left supramarginal gyrus and postcentral areas, for the low ability group. The high ability group, on the other hand, showed a significantly larger articulation space in all three conditions. In addition, articulation space correlated positively with imitation ability (Pearson's r = 0.7, p < 0.01). Our results suggest that an expanded articulation space for high ability individuals allows access to a larger repertoire of sounds, thereby providing skilled imitators greater flexibility in pronunciation and language learning.

  18. Are you a good mimic? Neuro-acoustic signatures for speech imitation ability

    PubMed Central

    Reiterer, Susanne M.; Hu, Xiaochen; Sumathi, T. A.; Singh, Nandini C.

    2013-01-01

    We investigated individual differences in speech imitation ability in late bilinguals using a neuro-acoustic approach. One hundred and thirty-eight German-English bilinguals matched on various behavioral measures were tested for “speech imitation ability” in a foreign language, Hindi, and categorized into “high” and “low ability” groups. Brain activations and speech recordings were obtained from 26 participants from the two extreme groups as they performed a functional neuroimaging experiment which required them to “imitate” sentences in three conditions: (A) German, (B) English, and (C) German with a fake English accent. We used a recently developed acoustic analysis, the “articulation space,” as a metric to compare the speech imitation abilities of the two groups. Across all three conditions, direct comparisons between the two groups revealed brain activations (FWE corrected, p < 0.05) that were more widespread, with significantly higher peak activity in the left supramarginal gyrus and postcentral areas, for the low ability group. The high ability group, on the other hand, showed a significantly larger articulation space in all three conditions. In addition, articulation space correlated positively with imitation ability (Pearson's r = 0.7, p < 0.01). Our results suggest that an expanded articulation space for high ability individuals allows access to a larger repertoire of sounds, thereby providing skilled imitators greater flexibility in pronunciation and language learning. PMID:24155739

  19. Reverberant speech recognition combining deep neural networks and deep autoencoders augmented with a phone-class feature

    NASA Astrophysics Data System (ADS)

    Mimura, Masato; Sakai, Shinsuke; Kawahara, Tatsuya

    2015-12-01

    We propose an approach to reverberant speech recognition adopting deep learning in the front-end as well as back-end of a reverberant speech recognition system, and a novel method to improve the dereverberation performance of the front-end network using phone-class information. At the front-end, we adopt a deep autoencoder (DAE) for enhancing the speech feature parameters, and speech recognition is performed in the back-end using DNN-HMM acoustic models trained on multi-condition data. The system was evaluated through the ASR task in the Reverb Challenge 2014. The DNN-HMM system trained on the multi-condition training set achieved a conspicuously higher word accuracy compared to the MLLR-adapted GMM-HMM system trained on the same data. Furthermore, feature enhancement with the deep autoencoder contributed to the improvement of recognition accuracy, especially in the more adverse conditions. While the mapping between reverberant and clean speech in DAE-based dereverberation is conventionally conducted only with the acoustic information, we presume the mapping is also dependent on the phone information. Therefore, we propose a new scheme (pDAE), which augments a phone-class feature to the standard acoustic features as input. Two types of the phone-class feature are investigated. One is the hard recognition result of monophones, and the other is a soft representation derived from the posterior outputs of monophone DNN. The augmented feature in either type results in a significant improvement (7-8 % relative) from the standard DAE.
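
    A minimal sketch of the pDAE input construction described above: per-frame acoustic features are concatenated with monophone posteriors before being passed through a feed-forward autoencoder that maps reverberant to clean features; the layer sizes and training details are assumptions, not the paper's configuration.

      import torch
      import torch.nn as nn

      class PhoneAugmentedDAE(nn.Module):
          """Feed-forward DAE over augmented features (sizes are illustrative)."""
          def __init__(self, n_acoustic=40, n_phone=40, n_hidden=512):
              super().__init__()
              self.net = nn.Sequential(
                  nn.Linear(n_acoustic + n_phone, n_hidden), nn.ReLU(),
                  nn.Linear(n_hidden, n_hidden), nn.ReLU(),
                  nn.Linear(n_hidden, n_acoustic),
              )

          def forward(self, acoustic, phone_posteriors):
              # Augment each frame with phone-class evidence, then enhance.
              return self.net(torch.cat([acoustic, phone_posteriors], dim=-1))

      # Training would minimise the error to parallel clean features, e.g.
      # loss = nn.functional.mse_loss(model(reverb_feats, phone_post), clean_feats)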

  20. Decoding spectrotemporal features of overt and covert speech from the human cortex

    PubMed Central

    Martin, Stéphanie; Brunner, Peter; Holdgraf, Chris; Heinze, Hans-Jochen; Crone, Nathan E.; Rieger, Jochem; Schalk, Gerwin; Knight, Robert T.; Pasley, Brian N.

    2014-01-01

    Auditory perception and auditory imagery have been shown to activate overlapping brain regions. We hypothesized that these phenomena also share a common underlying neural representation. To assess this, we used electrocorticography intracranial recordings from epileptic patients performing an out loud or a silent reading task. In these tasks, short stories scrolled across a video screen in two conditions: subjects read the same stories both aloud (overt) and silently (covert). In a control condition the subject remained in a resting state. We first built a high gamma (70–150 Hz) neural decoding model to reconstruct spectrotemporal auditory features of self-generated overt speech. We then evaluated whether this same model could reconstruct auditory speech features in the covert speech condition. Two speech models were tested: a spectrogram and a modulation-based feature space. For the overt condition, reconstruction accuracy was evaluated as the correlation between original and predicted speech features, and was significant in each subject (p < 10−5; paired two-sample t-test). For the covert speech condition, dynamic time warping was first used to realign the covert speech reconstruction with the corresponding original speech from the overt condition. Reconstruction accuracy was then evaluated as the correlation between original and reconstructed speech features. Covert reconstruction accuracy was compared to the accuracy obtained from reconstructions in the baseline control condition. Reconstruction accuracy for the covert condition was significantly better than for the control condition (p < 0.005; paired two-sample t-test). The superior temporal gyrus, pre- and post-central gyrus provided the highest reconstruction information. The relationship between overt and covert speech reconstruction depended on anatomy. These results provide evidence that auditory representations of covert speech can be reconstructed from models that are built from an overt speech
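
    A minimal sketch of the covert-condition scoring described above: the reconstructed feature trajectory is realigned to the reference with dynamic time warping and then scored by Pearson correlation; the plain DTW below is a generic textbook implementation, not the authors' pipeline.

      import numpy as np

      def dtw_path(X, Y):
          """DTW over frames (rows) of X and Y with Euclidean frame distance."""
          n, m = len(X), len(Y)
          cost = np.full((n + 1, m + 1), np.inf)
          cost[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  d = np.linalg.norm(X[i - 1] - Y[j - 1])
                  cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
          path, i, j = [], n, m        # backtrack to recover the warping path
          while i > 0 and j > 0:
              path.append((i - 1, j - 1))
              step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
              i, j = (i - 1, j - 1) if step == 0 else (i - 1, j) if step == 1 else (i, j - 1)
          return path[::-1]

      def aligned_correlation(reconstructed, reference):
          """Correlate feature values along the DTW-aligned path."""
          path = dtw_path(reconstructed, reference)
          a = np.concatenate([reconstructed[i] for i, _ in path])
          b = np.concatenate([reference[j] for _, j in path])
          return np.corrcoef(a, b)[0, 1]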

  1. Automatic computational models of acoustical category features: Talking versus singing

    NASA Astrophysics Data System (ADS)

    Gerhard, David

    2003-10-01

    The automatic discrimination between acoustical categories has been an increasingly interesting problem in the fields of computer listening, multimedia databases, and music information retrieval. A system is presented which automatically generates classification models, given a set of destination classes and a set of a priori labeled acoustic events. Computational models are created using comparative probability density estimations. For the specific example presented, the destination classes are talking and singing. Individual feature models are evaluated using two measures: The Kolmogorov-Smirnov distance measures feature separation, and accuracy is measured using absolute and relative metrics. The system automatically segments the event set into a user-defined number (n) of development subsets, and runs a development cycle for each set, generating n separate systems, each of which is evaluated using the above metrics to improve overall system accuracy and to reduce inherent data skew from any one development subset. Multiple features for the same acoustical categories are then compared for underlying feature overlap using cross-correlation. Advantages of automated computational models include improved system development and testing, shortened development cycle, and automation of common system evaluation tasks. Numerical results are presented relating to the talking/singing classification problem.
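
    A minimal sketch of the feature-evaluation step described above: the Kolmogorov-Smirnov statistic between the talking and singing distributions of a candidate feature; the per-segment feature values are placeholders.

      import numpy as np
      from scipy.stats import ks_2samp

      rng = np.random.default_rng(1)
      # Placeholder per-segment values of one candidate feature for each class.
      talking_feature = rng.normal(loc=0.3, scale=0.15, size=200)
      singing_feature = rng.normal(loc=0.7, scale=0.10, size=200)

      stat, p_value = ks_2samp(talking_feature, singing_feature)
      print(f"KS separation = {stat:.2f} (p = {p_value:.1e})")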

  2. Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech

    PubMed Central

    Leong, Victoria; Goswami, Usha

    2015-01-01

    When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72–82% (freely-read CDS) and 90–98% (rhythmically-regular CDS) stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across
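
    A minimal sketch of isolating the three modulation timescales named above (stress ~2 Hz, syllable ~5 Hz, phoneme/onset-rime ~20 Hz) from a wideband amplitude envelope; single-band Hilbert envelope extraction and the band edges are simplifications, not the S-AMPH model's actual filter bank.

      import numpy as np
      from scipy.signal import butter, sosfiltfilt, hilbert

      def modulation_bands(x, fs, bands=((0.9, 2.5), (2.5, 12.0), (12.0, 40.0))):
          """Band-limit the Hilbert amplitude envelope into stress-, syllable-,
          and phoneme-rate bands (edges are illustrative). In practice the
          envelope is usually downsampled before such low-frequency filtering."""
          env = np.abs(hilbert(x))
          out = {}
          for name, (lo, hi) in zip(('stress', 'syllable', 'phoneme'), bands):
              sos = butter(2, [lo, hi], btype='bandpass', fs=fs, output='sos')
              out[name] = sosfiltfilt(sos, env)
          return out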

  3. Feature enhancement of reverberant speech by distribution matching and non-negative matrix factorization

    NASA Astrophysics Data System (ADS)

    Keronen, Sami; Kallasjoki, Heikki; Palomäki, Kalle J.; Brown, Guy J.; Gemmeke, Jort F.

    2015-12-01

    This paper describes a novel two-stage dereverberation feature enhancement method for noise-robust automatic speech recognition. In the first stage, an estimate of the dereverberated speech is generated by matching the distribution of the observed reverberant speech to that of clean speech, in a decorrelated transformation domain that has a long temporal context in order to address the effects of reverberation. The second stage uses this dereverberated signal as an initial estimate within a non-negative matrix factorization framework, which jointly estimates a sparse representation of the clean speech signal and an estimate of the convolutional distortion. The proposed feature enhancement method, when used in conjunction with automatic speech recognizer back-end processing, is shown to improve the recognition performance compared to three other state-of-the-art techniques.
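
    A minimal sketch in the spirit of the second stage described above: a non-negative spectral dictionary is learned from clean speech, and the observation is re-synthesised from its non-negative activations; this off-the-shelf NMF omits the joint estimation of the convolutional distortion, and the dictionary size and iteration count are assumptions.

      import numpy as np
      from sklearn.decomposition import NMF

      def nmf_enhance(observed_spec, clean_spec, n_atoms=64):
          """observed_spec, clean_spec: magnitude spectrograms (frames x bins)."""
          model = NMF(n_components=n_atoms, init='nndsvda', max_iter=400)
          model.fit(clean_spec)                       # dictionary in model.components_
          activations = model.transform(observed_spec)
          return activations @ model.components_      # enhanced magnitude estimate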

  4. Combining acoustic and electric stimulation in the service of speech recognition

    PubMed Central

    Dorman, Michael F.; Gifford, Rene H.

    2010-01-01

    The majority of recently implanted, cochlear implant patients can potentially benefit from a hearing aid in the ear contralateral to the implant. When patients combine electric and acoustic stimulation, word recognition in quiet and sentence recognition in noise increase significantly. Several studies suggest that the acoustic information that leads to the increased level of performance resides mostly in the frequency region of the voice fundamental, e.g. 125 Hz for a male voice. Recent studies suggest that this information aids speech recognition in noise by improving the recognition of lexical boundaries or word onsets. In some noise environments, patients with bilateral implants can achieve similar levels of performance as patients who combine electric and acoustic stimulation. Patients who have undergone hearing preservation surgery, and who have electric stimulation from a cochlear implant and who have low-frequency hearing in both the implanted and not-implanted ears, achieve the best performance in a high noise environment. PMID:20874053

  5. The Effectiveness of Clear Speech as a Masker

    ERIC Educational Resources Information Center

    Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.

    2010-01-01

    Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…

  6. Associations between speech features and phenotypic severity in Treacher Collins syndrome

    PubMed Central

    2014-01-01

    Background Treacher Collins syndrome (TCS, OMIM 154500) is a rare congenital disorder of craniofacial development. Characteristic hypoplastic malformations of the ears, zygomatic arch, mandible and pharynx have been described in detail. However, reports on the impact of these malformations on speech are few. Exploring speech features and investigating if speech function is related to phenotypic severity are essential for optimizing follow-up and treatment. Methods Articulation, nasal resonance, voice and intelligibility were examined in 19 individuals (5–74 years, median 34 years) divided into three groups comprising children 5–10 years (n = 4), adolescents 11–18 years (n = 4) and adults 29 years and older (n = 11). A speech composite score (0–6) was calculated to reflect the variability of speech deviations. TCS severity scores of phenotypic expression and total scores of Nordic Orofacial Test-Screening (NOT-S) measuring orofacial dysfunction were used in analyses of correlation with speech characteristics (speech composite scores). Results Children and adolescents presented with significantly higher speech composite scores (median 4, range 1–6) than adults (median 1, range 0–5). Nearly all children and adolescents (6/8) displayed speech deviations of articulation, nasal resonance and voice, while only three adults were identified with multiple speech aberrations. The variability of speech dysfunction in TCS was exhibited by individual combinations of speech deviations in 13/19 participants. The speech composite scores correlated with TCS severity scores and NOT-S total scores. Speech composite scores higher than 4 were associated with cleft palate. The percent of intelligible words in connected speech was significantly lower in children and adolescents (median 77%, range 31–99) than in adults (98%, range 93–100). Intelligibility of speech among the children was markedly inconsistent and clearly affecting the understandability

  7. Is there a correlation between Japanese L2 learner's perception of English stressed words and acoustic features?

    NASA Astrophysics Data System (ADS)

    Asano, Keiko; Isei-Jakkola, Toshiko

    2003-10-01

    It is well known that Japanese learners have difficulty listening to unstressed words in English, but there are fewer data on their perception of stressed words. Thus, listening tests and acoustic experiments were conducted in terms of (1) the relevance of difficulty depending on part of speech and the learners' English proficiency, (2) the relationship between pitch and intensity of stressed words, and (3) whether there is a correlation between their perception and the experimental data. In the listening test, an English prose passage read by an American male speaker was used. The 150 Japanese L2 learners were asked to mark the primary stressed words. The statistical results showed that there was variance depending on part of speech and, more markedly, that the comparative rating scores of correct words were highly correlated with the learners' English proficiency for every part of speech. In the acoustic experiments, pitch and intensity were measured. It was confirmed that (1) both F0 and dB carried the cue to perceive a stressed word but were not necessarily correlated, and (2) the relationship between F0 and dB might be compared only by relative movement. Further analysis of these acoustic data suggests that the prosodic combination of F0 and dB might be related to the correct ratios by part of speech.

  8. Particle Swarm Optimization Based Feature Enhancement and Feature Selection for Improved Emotion Recognition in Speech and Glottal Signals

    PubMed Central

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features, and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper-based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were utilized to evaluate the proposed method. An extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted, and the results show that the proposed method significantly improves speech emotion recognition performance compared to previous works published in the literature. PMID:25799141

  9. Particle swarm optimization based feature enhancement and feature selection for improved emotion recognition in speech and glottal signals.

    PubMed

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features, and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper-based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were utilized to evaluate the proposed method. An extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted, and the results show that the proposed method significantly improves speech emotion recognition performance compared to previous works published in the literature.
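
    A minimal sketch of wrapper-based binary particle swarm optimization for feature selection, in the spirit of the WPSO step described above; the swarm settings, fitness function, and classifier (a k-nearest-neighbour stand-in for the paper's ELM) are assumptions.

      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.neighbors import KNeighborsClassifier

      def pso_feature_select(X, y, n_particles=12, n_iter=30, seed=0):
          """Binary PSO: each particle is a feature mask scored by
          cross-validated accuracy of a simple classifier."""
          rng = np.random.default_rng(seed)
          n_feat = X.shape[1]
          pos = rng.random((n_particles, n_feat)) < 0.5
          vel = rng.normal(scale=0.1, size=(n_particles, n_feat))

          def fitness(mask):
              if not mask.any():
                  return 0.0
              clf = KNeighborsClassifier(n_neighbors=3)
              return cross_val_score(clf, X[:, mask], y, cv=3).mean()

          pbest = pos.copy()
          pbest_fit = np.array([fitness(m) for m in pos])
          gbest = pbest[np.argmax(pbest_fit)].copy()

          for _ in range(n_iter):
              r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
              vel = (0.7 * vel
                     + 1.5 * r1 * (pbest.astype(float) - pos.astype(float))
                     + 1.5 * r2 * (gbest.astype(float) - pos.astype(float)))
              pos = rng.random(vel.shape) < 1.0 / (1.0 + np.exp(-vel))  # sigmoid update
              fit = np.array([fitness(m) for m in pos])
              improved = fit > pbest_fit
              pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
              gbest = pbest[np.argmax(pbest_fit)].copy()
          return gbest, pbest_fit.max()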

  10. Segment-based acoustic models for continuous speech recognition

    NASA Astrophysics Data System (ADS)

    Ostendorf, Mari; Rohlicek, J. R.

    1993-07-01

    This research aims to develop new and more accurate stochastic models for speaker-independent continuous speech recognition, by extending previous work in segment-based modeling and by introducing a new hierarchical approach to representing intra-utterance statistical dependencies. These techniques, which are more costly than traditional approaches because of the large search space associated with higher order models, are made feasible through rescoring a set of HMM-generated N-best sentence hypotheses. We expect these different modeling techniques to result in improved recognition performance over that achieved by current systems, which handle only frame-based observations and assume that these observations are independent given an underlying state sequence. In the fourth quarter of the project, we have completed the following: (1) ported our recognition system to the Wall Street Journal task, a standard task in the ARPA community; (2) developed an initial dependency-tree model of intra-utterance observation correlation; and (3) implemented baseline language model estimation software. Our initial results on the Wall Street Journal task are quite good and represent significantly improved performance over most HMM systems reporting on the Nov. 1992 5k vocabulary test set.

  11. Control of Spoken Vowel Acoustics and the Influence of Phonetic Context in Human Speech Sensorimotor Cortex

    PubMed Central

    Bouchard, Kristofer E.

    2014-01-01

    Speech production requires the precise control of vocal tract movements to generate individual speech sounds (phonemes) which, in turn, are rapidly organized into complex sequences. Multiple productions of the same phoneme can exhibit substantial variability, some of which is inherent to control of the vocal tract and its biomechanics, and some of which reflects the contextual effects of surrounding phonemes (“coarticulation”). The role of the CNS in these aspects of speech motor control is not well understood. To address these issues, we recorded multielectrode cortical activity directly from human ventral sensory-motor cortex (vSMC) during the production of consonant-vowel syllables. We analyzed the relationship between the acoustic parameters of vowels (pitch and formants) and cortical activity on a single-trial level. We found that vSMC activity robustly predicted acoustic parameters across vowel categories (up to 80% of variance), as well as different renditions of the same vowel (up to 25% of variance). Furthermore, we observed significant contextual effects on vSMC representations of produced phonemes that suggest active control of coarticulation: vSMC representations for vowels were biased toward the representations of the preceding consonant, and conversely, representations for consonants were biased toward upcoming vowels. These results reveal that vSMC activity for phonemes is not invariant and provide insight into the cortical mechanisms of coarticulation. PMID:25232105
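
    The single-trial decoding analysis above amounts to regressing an acoustic parameter on multi-electrode activity and reporting variance explained. A minimal sketch of that idea follows, using ridge regression on random placeholder data; it does not reproduce the paper's actual pipeline or signal preprocessing.

```python
# Sketch: predict an acoustic parameter (e.g., F1) from per-trial neural features.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
n_trials, n_electrodes = 300, 64
neural = rng.normal(size=(n_trials, n_electrodes))          # placeholder high-gamma features
true_weights = rng.normal(size=n_electrodes)
f1_hz = (500
         + 80 * neural @ true_weights / np.sqrt(n_electrodes)
         + rng.normal(scale=40, size=n_trials))              # synthetic F1 values

predicted = cross_val_predict(Ridge(alpha=1.0), neural, f1_hz, cv=5)
print(f"variance explained (R^2): {r2_score(f1_hz, predicted):.2f}")
```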

  12. Teachers and Teaching: Speech Production Accommodations Due to Changes in the Acoustic Environment

    PubMed Central

    Hunter, Eric J.; Bottalico, Pasquale; Graetzer, Simone; Leishman, Timothy W.; Berardi, Mark L.; Eyring, Nathan G.; Jensen, Zachary R.; Rolins, Michael K.; Whiting, Jennifer K.

    2016-01-01

    School teachers have an elevated risk of voice problems due to the vocal demands in the workplace. This manuscript presents the results of three studies investigating teachers’ voice use at work. In the first study, 57 teachers were observed for 2 weeks (waking hours) to compare how they used their voice in the school environment and in non-school environments. In a second study, 45 participants performed a short vocal task in two different rooms: a variable acoustic room and an anechoic chamber. Subjects were taken back and forth between the two rooms. Each time they entered the variable acoustics room, the reverberation time and/or the background noise condition had been modified. In this latter study, subjects responded to questions about their vocal comfort and their perception of changes in the acoustic environment. In a third study, 20 untrained vocalists performed a simple vocal task in the following conditions: with and without background babble and with and without transparent plexiglass shields to increase the first reflection. Relationships were examined between [1] the results for the room acoustic parameters; [2] the subjects’ perception of the room; and [3] the recorded speech acoustics. Several differences between male and female subjects were found; some of those differences held for each room condition (at school vs. not at school, reverberation level, noise level, and early reflection). PMID:26949426

  13. Modification of computational auditory scene analysis (CASA) for noise-robust acoustic feature

    NASA Astrophysics Data System (ADS)

    Kwon, Minseok

    While there have been many attempts to mitigate the interference of background noise, the performance of automatic speech recognition (ASR) can still be degraded easily by various factors. However, normal-hearing listeners can accurately perceive the sounds they are interested in, which is believed to be a result of Auditory Scene Analysis (ASA). As a first attempt, the simulation of human auditory processing, called computational auditory scene analysis (CASA), was carried out through physiological and psychological investigations of ASA. The CASA front end comprised a Zilany-Bruce auditory model, followed by fundamental-frequency tracking for voiced segmentation and detection of onset/offset pairs at each characteristic frequency (CF) for unvoiced segmentation. The resulting Time-Frequency (T-F) representation of the acoustic stimulation was converted into an acoustic feature, gammachirp-tone frequency cepstral coefficients (GFCC). Eleven keywords recorded under various environmental conditions were used, and the robustness of GFCC was evaluated by spectral distance (SD) and dynamic time warping distance (DTW). In "clean" and "noisy" conditions, the application of CASA generally improved the noise robustness of the acoustic feature compared to a conventional method with or without noise suppression using an MMSE estimator. The initial study, however, not only showed a noise-type dependency at low SNR but also called the evaluation methods into question. Some modifications were made to capture better spectral continuity from an acoustic feature matrix, to obtain faster processing speed, and to describe the human auditory system more precisely. The proposed framework includes: 1) multi-scale integration to capture more accurate continuity in feature extraction, 2) contrast enhancement (CE) of each CF by competition with neighboring frequency bands, and 3) auditory model modifications. The model modifications include the introduction of a higher Q factor, a middle ear filter more analogous to the human auditory system
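
    One of the evaluation measures named above, dynamic time warping distance, compares two feature matrices of possibly different lengths. The sketch below is a minimal DTW implementation applied to random placeholder feature matrices; it illustrates the measure, not the paper's exact configuration.

```python
# Minimal dynamic time warping (DTW) distance between two feature matrices.
import numpy as np

def dtw_distance(a, b):
    """a, b: (n_frames, n_coeffs) feature matrices. Returns the cumulative DTW cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])      # local Euclidean distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

rng = np.random.default_rng(2)
clean = rng.normal(size=(80, 13))                            # e.g., 80 frames of 13 coefficients
noisy = clean[::2] + rng.normal(scale=0.1, size=(40, 13))    # shorter, perturbed version
print(f"DTW distance: {dtw_distance(clean, noisy):.2f}")
```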

  14. Acoustic temporal modulation detection and speech perception in cochlear implant listeners.

    PubMed

    Won, Jong Ho; Drennan, Ward R; Nie, Kaibao; Jameyson, Elyse M; Rubinstein, Jay T

    2011-07-01

    The goals of the present study were to measure acoustic temporal modulation transfer functions (TMTFs) in cochlear implant listeners and examine the relationship between modulation detection and speech recognition abilities. The effects of automatic gain control, presentation level and number of channels on modulation detection thresholds (MDTs) were examined using the listeners' clinical sound processor. The general form of the TMTF was low-pass, consistent with previous studies. The operation of automatic gain control had no effect on MDTs when the stimuli were presented at 65 dBA. MDTs were not dependent on the presentation levels (ranging from 50 to 75 dBA) nor on the number of channels. Significant correlations were found between MDTs and speech recognition scores. The rates of decay of the TMTFs were predictive of speech recognition abilities. Spectral-ripple discrimination was evaluated to examine the relationship between temporal and spectral envelope sensitivities. No correlations were found between the two measures, and 56% of the variance in speech recognition was predicted jointly by the two tasks. The present study suggests that temporal modulation detection measured with the sound processor can serve as a useful measure of the ability of clinical sound processing strategies to deliver clinically pertinent temporal information.

  15. The role of metrical information in apraxia of speech. Perceptual and acoustic analyses of word stress.

    PubMed

    Aichert, Ingrid; Späth, Mona; Ziegler, Wolfram

    2016-02-01

    Several factors are known to influence speech accuracy in patients with apraxia of speech (AOS), e.g., syllable structure or word length. However, the impact of word stress has largely been neglected so far. More generally, the role of prosodic information at the phonetic encoding stage of speech production often remains unconsidered in models of speech production. This study aimed to investigate the influence of word stress on error production in AOS. Disyllabic words with stress on the first syllable (trochees) vs. the second syllable (iambs) were compared in 14 patients with AOS, three of them exhibiting pure AOS, and in a control group of six normal speakers. The patients produced significantly more errors on iambic than on trochaic words. The most prominent metrical effect was obtained for segmental errors. Acoustic analyses of word durations revealed a disproportionate advantage of the trochaic meter in the patients relative to the healthy controls. The results indicate that German apraxic speakers are sensitive to metrical information. It is assumed that metrical patterns function as prosodic frames for articulation planning, and that the regular metrical pattern in German, the trochaic form, has a facilitating effect on word production in patients with AOS.

  16. Discriminative analysis of lip motion features for speaker identification and speech-reading.

    PubMed

    Cetingül, H Ertan; Yemez, Yücel; Erzin, Engin; Tekalp, A Murat

    2006-10-01

    There have been several studies that jointly use audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two important issues: 1) Is using explicit lip motion information useful, and, 2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are considered to be those that result in the highest discrimination of individual speakers in a population, whereas for speech-reading, the best features are those providing the highest phoneme/word/phrase recognition rate. Several lip motion feature candidates have been considered, including dense motion features within a bounding box about the lip, lip contour motion features, and combinations of these with lip shape features. Furthermore, a novel two-stage spatial and temporal discrimination analysis is introduced to select the best lip motion features for speaker identification and speech-reading applications. Experimental results using a hidden-Markov-model-based recognition system indicate that using explicit lip motion information provides additional performance gains in both applications, and lip motion features prove more valuable in the speech-reading application.

  17. Emotion in speech: the acoustic attributes of fear, anger, sadness, and joy.

    PubMed

    Sobin, C; Alpert, M

    1999-07-01

    Decoders can detect emotion in voice with much greater accuracy than can be achieved by objective acoustic analysis. Studies that have established this advantage, however, used methods that may have favored decoders and disadvantaged acoustic analysis. In this study, we applied several methodologic modifications for the analysis of the acoustic differentiation of fear, anger, sadness, and joy. Thirty-one female subjects between the ages of 18 and 35 (encoders) were audio-recorded during an emotion-induction procedure and produced a total of 620 emotion-laden sentences. Twelve female judges (decoders), three for each of the four emotions, were assigned to rate the intensity of one emotion each. Their combined ratings were used to select 38 prototype samples per emotion. Past acoustic findings were replicated, and increased acoustic differentiation among the emotions was achieved. Multiple regression analysis suggested that some, although not all, of the acoustic variables were associated with decoders' ratings. Signal detection analysis gave some insight into this disparity. However, the analysis of the classic constellation of acoustic variables may not completely capture the acoustic features that influence decoders' ratings. Future analyses would likely benefit from the parallel assessment of respiration, phonation, and articulation.

  18. The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene.

    PubMed

    Rimmele, Johanna M; Zion Golumbic, Elana; Schröger, Erich; Poeppel, David

    2015-07-01

    Attending to one speaker in multi-speaker situations is challenging. One neural mechanism proposed to underlie the ability to attend to a particular speaker is phase-locking of low-frequency activity in auditory cortex to speech's temporal envelope ("speech-tracking"), which is more precise for attended speech. However, it is not known what brings about this attentional effect, and specifically if it reflects enhanced processing of the fine structure of attended speech. To investigate this question we compared attentional effects on speech-tracking of natural versus vocoded speech which preserves the temporal envelope but removes the fine structure of speech. Pairs of natural and vocoded speech stimuli were presented concurrently and participants attended to one stimulus and performed a detection task while ignoring the other stimulus. We recorded magnetoencephalography (MEG) and compared attentional effects on the speech-tracking response in auditory cortex. Speech-tracking of natural, but not vocoded, speech was enhanced by attention, whereas neural tracking of ignored speech was similar for natural and vocoded speech. These findings suggest that the more precise speech-tracking of attended natural speech is related to processing its fine structure, possibly reflecting the application of higher-order linguistic processes. In contrast, when speech is unattended its fine structure is not processed to the same degree and thus elicits less precise speech-tracking more similar to vocoded speech.

  19. Acoustic measurements through analysis of binaural recordings of speech and music

    NASA Astrophysics Data System (ADS)

    Griesinger, David

    2004-10-01

    This paper will present and demonstrate some recent work on the measurement of acoustic properties from binaural recordings of live performances. It is found that models of the process of stream formation can be used to measure intelligibility, and, when combined with band-limited running cross-correlation, can be used to measure spaciousness and envelopment. Analysis of the running cross correlation during sound onsets can be used to measure the accuracy of azimuth perception. It is additionally found that the ease of detecting fundamental pitch from the upper partials of speech and music can be used as a measure of sound quality, particularly for solo instruments and singers.

  20. Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network

    SciTech Connect

    Wang Xiaojia; Mao Qirong; Zhan Yongzhao

    2008-11-06

    There are many candidate emotion features. If all of these features are employed to recognize emotions, redundant features may exist; moreover, the recognition result can be unsatisfactory and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on the contribution analysis algorithm of a neural network (NN) is presented. The emotion features are selected from the 95 extracted features using the contribution analysis algorithm of the NN. Cluster analysis is applied to assess the effectiveness of the selected features, and the time required for feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and reduce the time of feature extraction.
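
    The following sketch illustrates NN-based feature relevance in the spirit of the contribution analysis described above: a small MLP is trained on a placeholder 95-feature emotion matrix and inputs are ranked by permutation importance, which is used here as a stand-in for the paper's contribution algorithm. Data sizes, labels, and the cut-off of 24 features are illustrative assumptions.

```python
# Sketch: rank emotion features by their contribution to a trained neural network.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 95))            # 95 candidate emotion features (placeholder)
y = rng.integers(0, 6, size=400)          # six emotion classes (placeholder)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X, y)
imp = permutation_importance(clf, X, y, n_repeats=5, random_state=0)
top24 = np.argsort(imp.importances_mean)[::-1][:24]
print("24 highest-contribution features:", sorted(top24.tolist()))
```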

  1. Revisiting Neil Armstrong's Moon-Landing Quote: Implications for Speech Perception, Function Word Reduction, and Acoustic Ambiguity

    PubMed Central

    Baese-Berk, Melissa M.; Dilley, Laura C.; Schmidt, Stephanie; Morrill, Tuuli H.; Pitt, Mark A.

    2016-01-01

    Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said "one small step for a man," instead of "one small step for man." What he said is unclear in part because function words like "a" can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of "for" and "for a" in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of "for" and "for a". The results suggest that the distributions of "for" and "for a" overlap substantially, both in terms of temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word "a" varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate. PMID:27603209

  2. Revisiting Neil Armstrong's Moon-Landing Quote: Implications for Speech Perception, Function Word Reduction, and Acoustic Ambiguity.

    PubMed

    Baese-Berk, Melissa M; Dilley, Laura C; Schmidt, Stephanie; Morrill, Tuuli H; Pitt, Mark A

    2016-01-01

    Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said "one small step for a man," instead of "one small step for man." What he said is unclear in part because function words like "a" can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of "for" and "for a" in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of "for" and "for a". The results suggest that the distributions of "for" and "for a" overlap substantially, both in terms of temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word "a" varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate.

  3. The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene

    PubMed Central

    Rimmele, Johanna M.; Golumbic, Elana Zion; Schröger, Erich; Poeppel, David

    2015-01-01

    Attending to one speaker in multi-speaker situations is challenging. One neural mechanism proposed to underlie the ability to attend to a particular speaker is phase-locking of low-frequency activity in auditory cortex to speech’s temporal envelope (“speech-tracking”), which is more precise for attended speech. However, it is not known what brings about this attentional effect, and specifically if it reflects enhanced processing of the fine structure of attended speech. To investigate this question we compared attentional effects on speech-tracking of natural vs. vocoded speech which preserves the temporal envelope but removes the fine-structure of speech. Pairs of natural and vocoded speech stimuli were presented concurrently and participants attended to one stimulus and performed a detection task while ignoring the other stimulus. We recorded magnetoencephalography (MEG) and compared attentional effects on the speech-tracking response in auditory cortex. Speech-tracking of natural, but not vocoded, speech was enhanced by attention, whereas neural tracking of ignored speech was similar for natural and vocoded speech. These findings suggest that the more precise speech tracking of attended natural speech is related to processing its fine structure, possibly reflecting the application of higher-order linguistic processes. In contrast, when speech is unattended its fine structure is not processed to the same degree and thus elicits less precise speech tracking more similar to vocoded speech. PMID:25650107

  4. A physiologically-inspired model reproducing the speech intelligibility benefit in cochlear implant listeners with residual acoustic hearing.

    PubMed

    Zamaninezhad, Ladan; Hohmann, Volker; Büchner, Andreas; Schädler, Marc René; Jürgens, Tim

    2017-02-01

    This study introduces a speech intelligibility model for cochlear implant users with ipsilateral preserved acoustic hearing that aims at simulating the observed speech-in-noise intelligibility benefit when receiving simultaneous electric and acoustic stimulation (EA-benefit). The model simulates the auditory nerve spiking in response to electric and/or acoustic stimulation. The temporally and spatially integrated spiking patterns were used as the final internal representation of noisy speech. Speech reception thresholds (SRTs) in stationary noise were predicted for a sentence test using an automatic speech recognition framework. The model was employed to systematically investigate the effect of three physiologically relevant model factors on simulated SRTs: (1) the spatial spread of the electric field, which co-varies with the number of electrically stimulated auditory nerves, (2) the "internal" noise simulating the deprivation of the auditory system, and (3) the upper frequency limit of acoustic hearing. The model results show that the simulated SRTs increase monotonically with increasing spatial spread for fixed internal noise, and also increase with increasing internal noise strength for a fixed spatial spread. The predicted EA-benefit does not follow such a systematic trend and depends on the specific combination of model parameters. Beyond 300 Hz, the upper frequency limit for preserved acoustic hearing has less influence on the speech intelligibility of EA listeners in stationary noise. The model-predicted EA-benefits are within the range of EA-benefits shown by 18 out of 21 actual cochlear implant listeners with preserved acoustic hearing.

  5. Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System.

    PubMed

    Partila, Pavol; Voznak, Miroslav; Tovarek, Jaromir

    2015-01-01

    The impact of the classification method and feature selection on speech emotion recognition accuracy is discussed in this paper. Selecting the right parameters in combination with the classifier is an important part of reducing the computational complexity of the system. This step is necessary especially for systems that will be deployed in real-time applications. The motivation for developing and improving speech emotion recognition systems is their wide applicability in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. The classification accuracies of artificial neural networks, k-nearest neighbours, and Gaussian mixture models are measured for different selections of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and groups of features for stress detection in human speech. The research contribution lies in the design of an accurate and efficient speech emotion recognition system.

  6. Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System

    PubMed Central

    Partila, Pavol; Voznak, Miroslav; Tovarek, Jaromir

    2015-01-01

    The impact of the classification method and feature selection on speech emotion recognition accuracy is discussed in this paper. Selecting the right parameters in combination with the classifier is an important part of reducing the computational complexity of the system. This step is necessary especially for systems that will be deployed in real-time applications. The motivation for developing and improving speech emotion recognition systems is their wide applicability in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. The classification accuracies of artificial neural networks, k-nearest neighbours, and Gaussian mixture models are measured for different selections of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and groups of features for stress detection in human speech. The research contribution lies in the design of an accurate and efficient speech emotion recognition system. PMID:26346654
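
    The comparison described in the two records above can be sketched as cross-validated accuracies of an artificial neural network, k-nearest neighbours, and a per-class Gaussian mixture model on a feature matrix. Random data stand in for the Berlin-database features, and all model settings are illustrative assumptions.

```python
# Sketch: compare MLP, k-NN, and a GMM-based classifier on an emotion feature matrix.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 30))            # prosodic + spectral + voice-quality features (placeholder)
y = rng.integers(0, 4, size=300)          # e.g., four emotional states (placeholder)

for name, clf in [("MLP", MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)),
                  ("k-NN", KNeighborsClassifier(n_neighbors=7))]:
    print(f"{name}: {cross_val_score(clf, X, y, cv=5).mean():.2f}")

# GMM classifier: fit one mixture per class, classify by maximum likelihood.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
gmms = {c: GaussianMixture(n_components=2, random_state=0).fit(X_tr[y_tr == c])
        for c in np.unique(y_tr)}
scores = np.column_stack([gmms[c].score_samples(X_te) for c in sorted(gmms)])
pred = np.array(sorted(gmms))[scores.argmax(axis=1)]
print(f"GMM: {(pred == y_te).mean():.2f}")
```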

  7. Feature Characteristics of Spontaneous Speech Production in Young Deaf Children.

    ERIC Educational Resources Information Center

    Geffner, Donna

    1980-01-01

    Sixty-five six-year-old deaf children from state supported schools were given an adaptation of the Goldman Fristoe test of articulation to assess their spontaneous speech production. Journal availability: Elsevier North Holland, Inc., 52 Vanderbilt Avenue, New York, NY 10017. (Author)

  8. Acoustic Context Alters Vowel Categorization in Perception of Noise-Vocoded Speech.

    PubMed

    Stilp, Christian E

    2017-03-09

    Normal-hearing listeners' speech perception is widely influenced by spectral contrast effects (SCEs), where perception of a given sound is biased away from stable spectral properties of preceding sounds. Despite this influence, it is not clear how these contrast effects affect speech perception for cochlear implant (CI) users whose spectral resolution is notoriously poor. This knowledge is important for understanding how CIs might better encode key spectral properties of the listening environment. Here, SCEs were measured in normal-hearing listeners using noise-vocoded speech to simulate poor spectral resolution. Listeners heard a noise-vocoded sentence where low-F1 (100-400 Hz) or high-F1 (550-850 Hz) frequency regions were amplified to encourage "eh" (/ɛ/) or "ih" (/ɪ/) responses to the following target vowel, respectively. This was done by filtering with +20 dB (experiment 1a) or +5 dB gain (experiment 1b) or filtering using 100 % of the difference between spectral envelopes of /ɛ/ and /ɪ/ endpoint vowels (experiment 2a) or only 25 % of this difference (experiment 2b). SCEs influenced identification of noise-vocoded vowels in each experiment at every level of spectral resolution. In every case but one, SCE magnitudes exceeded those reported for full-spectrum speech, particularly when spectral peaks in the preceding sentence were large (+20 dB gain, 100 % of the spectral envelope difference). Even when spectral resolution was insufficient for accurate vowel recognition, SCEs were still evident. Results are suggestive of SCEs influencing CI users' speech perception as well, encouraging further investigation of CI users' sensitivity to acoustic context.
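
    Noise vocoding, the degradation used above to simulate poor spectral resolution, splits speech into frequency bands, extracts each band's temporal envelope, and uses it to modulate band-limited noise. The sketch below shows that general technique on a synthetic signal; it does not reproduce the experiments' exact filters or gain manipulations.

```python
# Minimal noise-vocoder sketch: envelope-modulated band-limited noise per channel.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def noise_vocode(x, fs, band_edges_hz):
    out = np.zeros_like(x)
    rng = np.random.default_rng(0)
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        band = filtfilt(b, a, x)
        envelope = np.abs(hilbert(band))                    # temporal envelope of the band
        carrier = filtfilt(b, a, rng.normal(size=len(x)))   # band-limited noise carrier
        out += envelope * carrier
    return out

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
speech_like = np.sin(2 * np.pi * 150 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))  # toy signal
edges = [100, 400, 800, 1600, 3200, 6000]                   # e.g., a 5-channel vocoder
vocoded = noise_vocode(speech_like, fs, edges)
print(vocoded.shape)
```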

  9. Prosodic Influences on Speech Production in Children with Specific Language Impairment and Speech Deficits: Kinematic, Acoustic, and Transcription Evidence.

    ERIC Educational Resources Information Center

    Goffman, Lisa

    1999-01-01

    In this study, seven children with specific language impairment (SLI) and speech deficits were matched with same age peers and evaluated for iambic (weak-strong) and trochaic (strong-weak) prosodic speech forms. Findings indicated that children with SLI and speech deficits show less mature segmental and speech motor systems, as well as decreased…

  10. Exploiting Acoustic and Syntactic Features for Automatic Prosody Labeling in a Maximum Entropy Framework

    PubMed Central

    Sridhar, Vivek Kumar Rangarajan; Bangalore, Srinivas; Narayanan, Shrikanth S.

    2009-01-01

    In this paper, we describe a maximum entropy-based automatic prosody labeling framework that exploits both language and speech information. We apply the proposed framework to both prominence and phrase structure detection within the Tones and Break Indices (ToBI) annotation scheme. Our framework utilizes novel syntactic features in the form of supertags and a quantized acoustic–prosodic feature representation that is similar to linear parameterizations of the prosodic contour. The proposed model is trained discriminatively and is robust in the selection of appropriate features for the task of prosody detection. The proposed maximum entropy acoustic–syntactic model achieves pitch accent and boundary tone detection accuracies of 86.0% and 93.1% on the Boston University Radio News corpus, and 79.8% and 90.3% on the Boston Directions corpus. The phrase structure detection through prosodic break index labeling provides accuracies of 84% and 87% on the two corpora, respectively. The reported results are significantly better than previously reported results and demonstrate the strength of the maximum entropy model in jointly modeling simple lexical, syntactic, and acoustic features for automatic prosody labeling. PMID:19603083
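
    A maximum entropy classifier over discrete and continuous features is equivalent to multinomial logistic regression. The sketch below shows that style of model on concatenated syntactic and quantized acoustic-prosodic feature vectors; all feature values and labels are random placeholders, not ToBI data.

```python
# Maxent-style prosody labelling sketch: logistic regression on combined features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_words = 1000
syntactic = rng.integers(0, 2, size=(n_words, 50))     # e.g., one-hot supertag indicators
acoustic = rng.normal(size=(n_words, 12))              # quantized f0/energy contour parameters
X = np.hstack([syntactic, acoustic])
y = rng.integers(0, 2, size=n_words)                   # pitch-accented vs. not (placeholder)

maxent = LogisticRegression(max_iter=1000)
print(f"accent detection accuracy: {cross_val_score(maxent, X, y, cv=5).mean():.2f}")
```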

  11. Speech recognition using Kohonen neural networks, dynamic programming, and multi-feature fusion

    NASA Astrophysics Data System (ADS)

    Stowe, Francis S.

    1990-12-01

    The purpose of this thesis was to develop and evaluate the performance of a three-feature speech recognition system. The three features used were the LPC spectrum, formants (F1/F2), and cepstrum. The system uses Kohonen neural networks, dynamic programming, and a rule-based feature-fusion process that integrates the three input features into one output result. The first half of this research involved evaluating the system in a speaker-dependent setting. For this, the 70-word F-16 cockpit command vocabulary was used, and both isolated and connected speech were tested. The results obtained are compared to a two-feature system with the same system configuration. Isolated-speech testing yielded 98.7 percent accuracy. Connected-speech testing yielded 75.0 percent accuracy. The three-feature system performed an average of 1.7 percent better than the two-feature system for isolated speech. The second half of this research was concerned with the speaker-independent performance of the system. First, cross-speaker testing was performed using an updated 86-word library. In general, this testing yielded less than 50 percent accuracy. Then, testing was performed using averaged templates. This testing yielded an overall average in-template recognition rate of approximately 90 percent and an out-of-template recognition rate of approximately 75 percent.
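
    Decision-level fusion of several single-feature recognizers can be as simple as a rule-based vote. The sketch below is a minimal illustration in that spirit, with a majority vote and a fallback stream; the word labels, the fallback choice, and the rule itself are invented examples, not the thesis's actual rules.

```python
# Rule-based fusion sketch: majority vote over three single-feature recognizers.
from collections import Counter

def fuse(lpc_hyp, formant_hyp, cepstrum_hyp, fallback="cepstrum"):
    votes = Counter([lpc_hyp, formant_hyp, cepstrum_hyp])
    word, count = votes.most_common(1)[0]
    if count >= 2:                       # at least two recognizers agree
        return word
    # No agreement: fall back to the stream judged most reliable offline.
    return {"lpc": lpc_hyp, "formant": formant_hyp, "cepstrum": cepstrum_hyp}[fallback]

print(fuse("gear", "gear", "clear"))     # -> "gear"
print(fuse("gear", "steer", "clear"))    # -> "clear" (fallback stream)
```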

  12. Acoustic evaluation of short-term effects of repetitive transcranial magnetic stimulation on motor aspects of speech in Parkinson's disease.

    PubMed

    Eliasova, I; Mekyska, J; Kostalova, M; Marecek, R; Smekal, Z; Rektorova, I

    2013-04-01

    Hypokinetic dysarthria in Parkinson's disease (PD) can be characterized by monotony of pitch and loudness, reduced stress, variable rate, imprecise consonants, and a breathy and harsh voice. Using acoustic analysis, we studied the effects of high-frequency repetitive transcranial magnetic stimulation (rTMS) applied over the primary orofacial sensorimotor area (SM1) and the left dorsolateral prefrontal cortex (DLPFC) on motor aspects of voiced speech in PD. Twelve non-depressed and non-demented men with PD (mean age 64.58 ± 8.04 years, mean PD duration 10.75 ± 7.48 years) and 21 healthy age-matched men (a control group, mean age 64 ± 8.55 years) participated in the speech study. The PD patients underwent two sessions of 10 Hz rTMS over the dominant hemisphere with 2,250 stimuli/day in a random order: (1) over the SM1; (2) over the left DLPFC in the "on" motor state. Speech examination comprised the perceptual rating of global speech performance and an acoustic analysis based upon a standardized speech task. The Mann-Whitney U test was used to compare acoustic speech variables between controls and PD patients. The Wilcoxon test was used to compare data prior to and after each stimulation in the PD group. rTMS applied over the left SM1 was associated with a significant increase in harmonic-to-noise ratio and net speech rate in the sentence tasks. With respect to the vowel task results, increased median values and range of Teager-Kaiser energy operator, increased vowel space area, and significant jitter decrease were observed after the left SM1 stimulation. rTMS over the left DLPFC did not induce any significant effects. The positive results of acoustic analysis were not reflected in a subjective rating of speech performance quality as assessed by a speech therapist. Our pilot results indicate that one session of rTMS applied over the SM1 may lead to measurable improvement in voice quality and intensity and an increase in speech rate and tongue movements
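
    One of the measures used above, the Teager-Kaiser energy operator, has a simple discrete form: psi[n] = x[n]^2 - x[n-1]*x[n+1]. The sketch below applies it to a synthetic vowel-like signal; the study's full acoustic pipeline is not reproduced here.

```python
# Teager-Kaiser energy operator (TKEO) sketch on a synthetic signal.
import numpy as np

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy operator of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

fs = 16000
t = np.arange(0, 0.1, 1 / fs)
vowel_like = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 240 * t)
tkeo = teager_kaiser(vowel_like)
print(f"median TKEO: {np.median(tkeo):.4f}, range: {tkeo.max() - tkeo.min():.4f}")
```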

  13. Features vs. Feelings: Dissociable representations of the acoustic features and valence of aversive sounds

    PubMed Central

    Kumar, Sukhbinder; von Kriegstein, Katharina; Friston, Karl; Griffiths, Timothy D

    2012-01-01

    This study addresses the neuronal representation of aversive sounds that are perceived as unpleasant. Functional magnetic resonance imaging (fMRI) in humans demonstrated responses in the amygdala and auditory cortex to aversive sounds. We show that the amygdala encodes both the acoustic features of a stimulus and its valence (perceived unpleasantness). Dynamic Causal Modelling (DCM) of this system revealed that evoked responses to sounds are relayed to the amygdala via auditory cortex. While acoustic features modulate effective connectivity from auditory cortex to the amygdala, the valence modulates the effective connectivity from amygdala to the auditory cortex. These results support a complex (recurrent) interaction between the auditory cortex and amygdala based on object-level analysis in the auditory cortex that portends the assignment of emotional valence in amygdala that in turn influences the representation of salient information in auditory cortex. PMID:23055488

  14. The effect of intertalker speech rate variation on acoustic vowel space.

    PubMed

    Tsao, Ying-Chiao; Weismer, Gary; Iqbal, Kamran

    2006-02-01

    The present study aimed to examine the size of the acoustic vowel space in talkers who had previously been identified as having slow and fast habitual speaking rates [Tsao, Y.-C. and Weismer, G. (1997) J. Speech Lang. Hear. Res. 40, 858-866]. Within talkers, it is fairly well known that faster speaking rates result in a compression of the vowel space relative to that measured for slower rates, so the current study was completed to determine if the same differences in the size of the vowel space occur across talkers who differ significantly in their habitual speaking rates. Results indicated that there was no difference in the average size of the vowel space for slow vs fast talkers, and no relationship across talkers between vowel duration and formant frequencies. One difference between the slow and fast talkers was in intertalker variability of the vowel spaces, which was clearly greater for the slow talkers, for both speaker sexes. Results are discussed relative to theories of speech production and vowel normalization in speech perception.
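
    Acoustic vowel space area is typically computed as the area of the polygon spanned by corner vowels in the F1-F2 plane, which the shoelace formula gives directly. The formant values below are rough illustrative averages, not data from this study.

```python
# Vowel space area sketch: shoelace formula over corner vowels in the F1-F2 plane.
def vowel_space_area(corners):
    """corners: list of (F1, F2) in Hz, ordered around the polygon. Returns area in Hz^2."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(corners, corners[1:] + corners[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# Quadrilateral /i/, /ae/, /a/, /u/ (approximate adult male values, for illustration).
corners = [(300, 2300), (700, 1800), (750, 1100), (350, 900)]
print(f"vowel space area: {vowel_space_area(corners):.0f} Hz^2")
```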

  15. Discrimination of speech stimuli based on neuronal response phase patterns depends on acoustics but not comprehension.

    PubMed

    Howard, Mary F; Poeppel, David

    2010-11-01

    Speech stimuli give rise to neural activity in the listener that can be observed as waveforms using magnetoencephalography. Although waveforms vary greatly from trial to trial due to activity unrelated to the stimulus, it has been demonstrated that spoken sentences can be discriminated based on theta-band (3-7 Hz) phase patterns in single-trial response waveforms. Furthermore, manipulations of the speech signal envelope and fine structure that reduced intelligibility were found to produce correlated reductions in discrimination performance, suggesting a relationship between theta-band phase patterns and speech comprehension. This study investigates the nature of this relationship, hypothesizing that theta-band phase patterns primarily reflect cortical processing of low-frequency (<40 Hz) modulations present in the acoustic signal and required for intelligibility, rather than processing exclusively related to comprehension (e.g., lexical, syntactic, semantic). Using stimuli that are quite similar to normal spoken sentences in terms of low-frequency modulation characteristics but are unintelligible (i.e., their time-inverted counterparts), we find that discrimination performance based on theta-band phase patterns is equal for both types of stimuli. Consistent with earlier findings, we also observe that whereas theta-band phase patterns differ across stimuli, power patterns do not. We use a simulation model of the single-trial response to spoken sentence stimuli to demonstrate that phase-locked responses to low-frequency modulations of the acoustic signal can account not only for the phase but also for the power results. The simulation offers insight into the interpretation of the empirical results with respect to phase-resetting and power-enhancement models of the evoked response.

  16. Discrimination of Speech Stimuli Based on Neuronal Response Phase Patterns Depends on Acoustics But Not Comprehension

    PubMed Central

    Poeppel, David

    2010-01-01

    Speech stimuli give rise to neural activity in the listener that can be observed as waveforms using magnetoencephalography. Although waveforms vary greatly from trial to trial due to activity unrelated to the stimulus, it has been demonstrated that spoken sentences can be discriminated based on theta-band (3–7 Hz) phase patterns in single-trial response waveforms. Furthermore, manipulations of the speech signal envelope and fine structure that reduced intelligibility were found to produce correlated reductions in discrimination performance, suggesting a relationship between theta-band phase patterns and speech comprehension. This study investigates the nature of this relationship, hypothesizing that theta-band phase patterns primarily reflect cortical processing of low-frequency (<40 Hz) modulations present in the acoustic signal and required for intelligibility, rather than processing exclusively related to comprehension (e.g., lexical, syntactic, semantic). Using stimuli that are quite similar to normal spoken sentences in terms of low-frequency modulation characteristics but are unintelligible (i.e., their time-inverted counterparts), we find that discrimination performance based on theta-band phase patterns is equal for both types of stimuli. Consistent with earlier findings, we also observe that whereas theta-band phase patterns differ across stimuli, power patterns do not. We use a simulation model of the single-trial response to spoken sentence stimuli to demonstrate that phase-locked responses to low-frequency modulations of the acoustic signal can account not only for the phase but also for the power results. The simulation offers insight into the interpretation of the empirical results with respect to phase-resetting and power-enhancement models of the evoked response. PMID:20484530
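
    The theta-band phase analysis described in the two records above can be sketched as: band-pass a single-trial response to 3-7 Hz, take the instantaneous phase via the Hilbert transform, and classify trials by circular distance to per-sentence phase templates. The signals below are synthetic stand-ins for MEG responses, and the classifier is a minimal nearest-template rule rather than the paper's exact procedure.

```python
# Theta-band phase-pattern discrimination sketch on synthetic single-trial data.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 200.0
b, a = butter(4, [3 / (fs / 2), 7 / (fs / 2)], btype="band")

def theta_phase(trial):
    return np.angle(hilbert(filtfilt(b, a, trial)))

def circular_distance(p1, p2):
    return np.mean(1 - np.cos(p1 - p2))          # 0 when phase patterns match exactly

rng = np.random.default_rng(6)
t = np.arange(0, 3.0, 1 / fs)
templates = [np.sin(2 * np.pi * 5 * t + ph) for ph in (0.0, 2.0)]   # two "sentences"
trial = templates[1] + rng.normal(scale=0.8, size=t.size)           # noisy trial of sentence 2

trial_phase = theta_phase(trial)
dists = [circular_distance(trial_phase, theta_phase(tpl)) for tpl in templates]
print("classified as sentence", int(np.argmin(dists)) + 1)
```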

  17. Quantifying the Effect of Compression Hearing Aid Release Time on Speech Acoustics and Intelligibility

    ERIC Educational Resources Information Center

    Jenstad, Lorienne M.; Souza, Pamela E.

    2005-01-01

    Compression hearing aids have the inherent, and often adjustable, feature of release time from compression. Research to date does not provide a consensus on how to choose or set release time. The current study had 2 purposes: (a) a comprehensive evaluation of the acoustic effects of release time for a single-channel compression system in quiet and…

  18. The Effect of Dynamic Acoustical Features on Musical Timbre

    NASA Astrophysics Data System (ADS)

    Hajda, John M.

    Timbre has been an important concept for scientific exploration of music at least since the time of Helmholtz ([1877] 1954). Since Helmholtz's time, a number of studies have defined and investigated acoustical features of musical instrument tones to determine their perceptual importance, or salience (e.g., Grey, 1975, 1977; Kendall, 1986; Kendall et al., 1999; Luce and Clark, 1965; McAdams et al., 1995, 1999; Saldanha and Corso, 1964; Wedin and Goude, 1972). Most of these studies have considered only nonpercussive, or continuant, tones of Western orchestral instruments (or emulations thereof). In the past few years, advances in computing power and programming have made possible and affordable the definition and control of new acoustical variables. This chapter gives an overview of past and current research, with a special emphasis on the time-variant aspects of musical timbre. According to common observation, "music is made of tones in time" (Spaeth, 1933). We will also consider the fact that music is made of "time in tones."

  19. Transient Auditory Storage of Acoustic Details Is Associated with Release of Speech from Informational Masking in Reverberant Conditions

    ERIC Educational Resources Information Center

    Huang, Ying; Huang, Qiang; Chen, Xun; Wu, Xihong; Li, Liang

    2009-01-01

    Perceptual integration of the sound directly emanating from the source with reflections needs both temporal storage and correlation computation of acoustic details. We examined whether the temporal storage is frequency dependent and associated with speech unmasking. In Experiment 1, a break in correlation (BIC) between interaurally correlated…

  20. Effect of Digital Frequency Compression (DFC) on Speech Recognition in Candidates for Combined Electric and Acoustic Stimulation (EAS)

    ERIC Educational Resources Information Center

    Gifford, Rene H.; Dorman, Michael F.; Spahr, Anthony J.; McKarns, Sharon A.

    2007-01-01

    Purpose: To compare the effects of conventional amplification (CA) and digital frequency compression (DFC) amplification on the speech recognition abilities of candidates for a partial-insertion cochlear implant, that is, candidates for combined electric and acoustic stimulation (EAS). Method: The participants were 6 patients whose audiometric…

  1. Acoustic Differences between Humorous and Sincere Communicative Intentions

    ERIC Educational Resources Information Center

    Hoicka, Elena; Gattis, Merideth

    2012-01-01

    Previous studies indicate that the acoustic features of speech discriminate between positive and negative communicative intentions, such as approval and prohibition. Two studies investigated whether acoustic features of speech can discriminate between two positive communicative intentions: humour and sweet-sincerity, where sweet-sincerity involved…

  2. Speech recognition in reverberant and noisy environments employing multiple feature extractors and i-vector speaker adaptation

    NASA Astrophysics Data System (ADS)

    Alam, Md Jahangir; Gupta, Vishwa; Kenny, Patrick; Dumouchel, Pierre

    2015-12-01

    The REVERB challenge provides a common framework for the evaluation of feature extraction techniques in the presence of both reverberation and additive background noise. State-of-the-art speech recognition systems perform well in controlled environments, but their performance degrades in realistic acoustical conditions, especially in real as well as simulated reverberant environments. In this contribution, we utilize multiple feature extractors, including the conventional mel-filterbank, multi-taper spectrum estimation-based mel-filterbank, robust mel and compressive gammachirp filterbank, iterative deconvolution-based dereverberated mel-filterbank, and maximum likelihood inverse filtering-based dereverberated mel-frequency cepstral coefficient features for speech recognition with multi-condition training data. In order to improve speech recognition performance, we combine their results using ROVER (Recognizer Output Voting Error Reduction). For the two- and eight-channel tasks, to benefit from the multi-channel data, we also use ROVER, instead of a multi-microphone signal processing method, to reduce the word error rate by selecting the best scoring word at each channel. As in previous work, we also apply i-vector-based speaker adaptation, which was found to be effective. In the speech recognition task, speaker adaptation tries to reduce the mismatch between the training and test speakers. Speech recognition experiments are conducted on the REVERB challenge 2014 corpora using the Kaldi recognizer. In our experiments, we use both utterance-based batch processing and full batch processing. In the single-channel task, full batch processing reduced the word error rate (WER) from 10.0 to 9.3 % on SimData as compared to utterance-based batch processing. Using full batch processing, we obtained an average WER of 9.0 and 23.4 % on the SimData and RealData, respectively, for the two-channel task, whereas for the eight-channel task on the SimData and RealData, the average WERs found were 8

  3. Comments on "Effects of Noise on Speech Production: Acoustic and Perceptual Analyses" [J. Acoust. Soc. Am. 84, 917-928 (1988)].

    PubMed

    Fitch, H

    1989-11-01

    The effect of background noise on speech production is an important issue, both from the practical standpoint of developing speech recognition algorithms and from the theoretical standpoint of understanding how speech is tuned to the environment in which it is spoken. Summers et al. [J. Acoust. Soc. Am. 84, 917-928 (1988)] address this issue by experimentally manipulating the level of noise delivered through headphones to two talkers and making several kinds of acoustic measurements on the resulting speech. They indicate that they have replicated effects on amplitude, duration, and pitch and have found effects on spectral tilt and first-formant frequency (F1). The authors regard these acoustic changes as effects in themselves rather than as consequences of a change in vocal effort, and thus treat equally the change in spectral tilt and the change in F1. In fact, the change in spectral tilt is a well-documented and understood consequence of the change in the glottal waveform, which is known to occur with increased effort. The situation with F1 is less clear and is made difficult by measurement problems. The bias in linear predictive coding (LPC) techniques related to two of the other changes (fundamental frequency and spectral tilt) is discussed.

  4. The sound of motion in spoken language: visual information conveyed by acoustic properties of speech.

    PubMed

    Shintel, Hadas; Nusbaum, Howard C

    2007-12-01

    Language is generally viewed as conveying information through symbols whose form is arbitrarily related to their meaning. This arbitrary relation is often assumed to also characterize the mental representations underlying language comprehension. We explore the idea that visuo-spatial information can be analogically conveyed through acoustic properties of speech and that such information is integrated into an analog perceptual representation as a natural part of comprehension. Listeners heard sentences describing objects, spoken at varying speaking rates. After each sentence, participants saw a picture of an object and judged whether it had been mentioned in the sentence. Participants were faster to recognize the object when motion implied by speaking rate matched the motion implied by the picture. Results suggest that visuo-spatial referential information can be analogically conveyed and represented.

  5. Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels

    PubMed Central

    Caballero-Morales, Santiago-Omar

    2013-01-01

    An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). Then, estimation of the emotional state from a spoken sentence is performed by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, accuracy of 87–100% was achieved for the recognition of emotional state of Mexican Spanish speech. PMID:23935410

  6. Recognition of emotions in Mexican Spanish speech: an approach based on acoustic modelling of emotion-specific vowels.

    PubMed

    Caballero-Morales, Santiago-Omar

    2013-01-01

    An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). Then, estimation of the emotional state from a spoken sentence is performed by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, accuracy of 87-100% was achieved for the recognition of emotional state of Mexican Spanish speech.
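
    The decision rule described in the two records above reduces to counting emotion-tagged vowel tokens in the recognized phone string and picking the most frequent emotion. The sketch below illustrates that rule with an invented tag format ("vowel_emotion"); the actual ASR front end and phone inventory are not reproduced.

```python
# Sketch: estimate the emotional state by counting emotion-specific vowels in ASR output.
from collections import Counter

def emotion_from_phones(phones):
    """phones: recognized phone labels, with vowels tagged by emotion, e.g. 'a_anger'."""
    counts = Counter(p.split("_")[1] for p in phones if "_" in p)
    return counts.most_common(1)[0][0] if counts else "neutral"

recognized = ["k", "a_anger", "s", "a_anger", "d", "o_neutral", "e_anger"]
print(emotion_from_phones(recognized))   # -> "anger"
```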

  7. Statistical evidence that musical universals derive from the acoustic characteristics of human speech

    NASA Astrophysics Data System (ADS)

    Schwartz, David; Howe, Catharine; Purves, Dale

    2003-04-01

    Listeners of all ages and societies produce a similar consonance ordering of chromatic scale tone combinations. Despite intense interest in this perceptual phenomenon over several millennia, it has no generally accepted explanation in physical, psychological, or physiological terms. Here we show that the musical universal of consonance ordering can be understood in terms of the statistical relationship between a pattern of sound pressure at the ear and the possible generative sources of the acoustic energy pattern. Since human speech is the principal naturally occurring source of tone-evoking (i.e., periodic) sound energy for human listeners, we obtained normalized spectra from more than 100000 recorded speech segments. The probability distribution of amplitude/frequency combinations derived from these spectra predicts both the fundamental frequency ratios that define the chromatic scale intervals and the consonance ordering of chromatic scale tone combinations. We suggest that these observations reveal the statistical character of the perceptual process by which the auditory system guides biologically successful behavior in response to inherently ambiguous sound stimuli.

  8. Correlation of orofacial speeds with voice acoustic measures in the fluent speech of persons who stutter.

    PubMed

    McClean, Michael D; Tasko, Stephen M

    2004-12-01

    Stuttering is often viewed as a problem in coordinating the movements of different muscle systems involved in speech production. From this perspective, it is logical that efforts be made to quantify and compare the strength of neural coupling between muscle systems in persons who stutter (PS) and those who do not stutter (NS). This problem was addressed by correlating the speeds of different orofacial structures with vowel fundamental frequency (F0) and intensity as subjects produced fluent repetitions of a simple nonsense phrase at habitual, high, and low intensity levels. It is assumed that resulting correlations indirectly reflect the strength of neural coupling between particular orofacial structures and the respiratory-laryngeal system. An electromagnetic system was employed to record movements of the upper lip, lower lip, tongue, and jaw in 43 NS and 39 PS. The acoustic speech signal was recorded and used to obtain measures of vowel F0 and intensity. For each subject, correlation measures were obtained relating peak orofacial speeds to F0 and intensity. Correlations were significantly reduced in PS compared to NS for the lower lip and tongue, although the magnitude of these group differences covaried with the correlation levels relating F0 and intensity. It is suggested that the group difference in correlation pattern reflects a reduced strength of neural coupling of the lower lip and tongue systems to the respiratory-laryngeal system in PS. Consideration is given to how this may contribute to temporal discoordination and stuttering.

  9. Objective assessment of tracheoesophageal and esophageal speech using acoustic analysis of voice.

    PubMed

    Sirić, Ljiljana; Sos, Dario; Rosso, Marinela; Stevanović, Sinisa

    2012-11-01

    The aim of this study was to analyze the voice quality of alaryngeal tracheoesophageal and esophageal speech, and to determine which of them is more similar to laryngeal voice production and thus more acceptable as a rehabilitation method for laryngectomized persons. Objective voice evaluation was performed on a sample of 20 totally laryngectomized subjects of both sexes, average age 61.3 years. Subjects were divided into two groups: 10 (50%) with a built-in tracheoesophageal prosthesis and 10 (50%) who had acquired esophageal speech. Testing included 6 variables: 5 parameters of acoustic analysis of voice and one parameter of aerodynamic measurement. The obtained data were statistically analyzed by analysis of variance. Analysis of the data showed a statistically significant difference between the two groups in terms of intensity, fundamental frequency, and maximum phonation time of the vowel at a significance level of 5% and a confidence interval of 95%. A statistically significant difference was not found between the values of jitter, shimmer, and harmonic-to-noise ratio for tracheoesophageal and esophageal voice. There is no ideal method of rehabilitation and each requires an individual approach to the patient, but the results show the advantages of rehabilitation by means of an installed voice prosthesis.
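
    Two of the perturbation measures compared above, jitter and shimmer, can be computed from the sequence of glottal cycle lengths and peak amplitudes. The cycle data in the sketch below are synthetic; real analyses extract them from the recorded voice signal.

```python
# Local jitter and shimmer sketch from per-cycle periods and amplitudes.
import numpy as np

def local_jitter(periods_s):
    """Mean absolute difference of consecutive periods, relative to the mean period (%)."""
    periods_s = np.asarray(periods_s, dtype=float)
    return 100 * np.mean(np.abs(np.diff(periods_s))) / np.mean(periods_s)

def local_shimmer(amplitudes):
    """Mean absolute difference of consecutive peak amplitudes, relative to the mean amplitude (%)."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return 100 * np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

rng = np.random.default_rng(7)
periods = 1 / 110 + rng.normal(scale=5e-5, size=100)     # ~110 Hz voice with small perturbation
amps = 1.0 + rng.normal(scale=0.03, size=100)
print(f"jitter: {local_jitter(periods):.2f} %  shimmer: {local_shimmer(amps):.2f} %")
```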

  10. Elicitation of the Acoustic Change Complex to Long-Duration Speech Stimuli in Four-Month-Old Infants

    PubMed Central

    Chen, Ke Heng; Small, Susan A.

    2015-01-01

    The acoustic change complex (ACC) is an auditory-evoked potential elicited to changes within an ongoing stimulus that indicates discrimination at the level of the auditory cortex. Only a few studies to date have attempted to record ACCs in young infants. The purpose of the present study was to investigate the elicitation of ACCs to long-duration speech stimuli in English-learning 4-month-old infants. ACCs were elicited to consonant contrasts made up of two concatenated speech tokens. The stimuli included native dental-dental /dada/ and dental-labial /daba/ contrasts and a nonnative Hindi dental-retroflex /daDa/ contrast. Each consonant-vowel speech token was 410 ms in duration. Slow cortical responses were recorded to the onset of the stimulus and to the acoustic change from /da/ to either /ba/ or /Da/ within the stimulus with significantly prolonged latencies compared with adults. ACCs were reliably elicited for all stimulus conditions with more robust morphology compared with our previous findings using stimuli that were shorter in duration. The P1 amplitudes elicited to the acoustic change in /daba/ and /daDa/ were significantly larger compared to /dada/ supporting that the brain discriminated between the speech tokens. These findings provide further evidence for the use of ACCs as an index of discrimination ability. PMID:26798343

  12. Acoustics in human communication: evolving ideas about the nature of speech.

    PubMed

    Cooper, F S

    1980-07-01

    This paper discusses changes in attitude toward the nature of speech during the past half century. After reviewing early views on the subject, it considers the role of speech spectrograms, speech articulation, speech perception, messages and computers, and the nature of fluent speech.

  13. Speaker height estimation from speech: Fusing spectral regression and statistical acoustic models.

    PubMed

    Hansen, John H L; Williams, Keri; Bořil, Hynek

    2015-08-01

    Estimating speaker height can assist in voice forensic analysis and provide additional side knowledge to benefit automatic speaker identification or acoustic model selection for automatic speech recognition. In this study, a statistical approach to height estimation that incorporates acoustic models within a non-uniform height bin width Gaussian mixture model structure, as well as a formant analysis approach that employs linear regression on selected phones, are presented. The accuracy and trade-offs of these systems are explored by examining the consistency of the results and the location and causes of error, as well as a combined fusion of the two systems, using data from the TIMIT corpus. Open set testing is also presented using the Multi-session Audio Research Project corpus and publicly available YouTube audio to examine the effect of channel mismatch between training and testing data and to provide a realistic open-domain testing scenario. The proposed algorithms achieve performance highly competitive with previously published results. Although the different data partitioning in the literature and this study may prevent performance comparisons in absolute terms, the mean average error of 4.89 cm for males and 4.55 cm for females provided by the proposed algorithm on TIMIT utterances containing selected phones suggests a considerable decrease in estimation error compared to past efforts.
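
    A minimal sketch of the formant-regression component only, under the assumption that per-speaker mean formant frequencies carry height information; the synthetic data and the simple inverse scaling used to generate it are illustrative stand-ins, not the paper's features or corpus.

      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.metrics import mean_absolute_error

      rng = np.random.default_rng(0)
      n = 200

      # Hypothetical data: speaker heights (cm) and per-speaker mean F1-F4 (Hz) from selected phones.
      # Formants are generated to shrink roughly inversely with height, a loose stand-in for the
      # vocal-tract-length relationship.
      heights = rng.normal(172, 9, size=n)
      formants = np.column_stack([c * (172.0 / heights) + rng.normal(0, s, size=n)
                                  for c, s in [(500, 40), (1500, 90), (2500, 120), (3500, 150)]])

      model = LinearRegression().fit(formants, heights)
      pred = model.predict(formants)
      print("MAE (cm):", round(mean_absolute_error(heights, pred), 2))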

  14. Speech timing and linguistic rhythm: on the acoustic bases of rhythm typologies.

    PubMed

    Rathcke, Tamara V; Smith, Rachel H

    2015-05-01

    Research into linguistic rhythm has been dominated by the idea that languages can be classified according to rhythmic templates, amenable to assessment by acoustic measures of vowel and consonant durations. This study tested predictions of two proposals explaining the bases of rhythmic typologies: the Rhythm Class Hypothesis which assumes that the templates arise from an extensive vs a limited use of durational contrasts, and the Control and Compensation Hypothesis which proposes that the templates are rooted in more vs less flexible speech production strategies. Temporal properties of segments, syllables and rhythmic feet were examined in two accents of British English, a "stress-timed" variety from Leeds, and a "syllable-timed" variety spoken by Panjabi-English bilinguals from Bradford. Rhythm metrics were calculated. A perception study confirmed that the speakers of the two varieties differed in their perceived rhythm. The results revealed that both typologies were informative in that to a certain degree, they predicted temporal patterns of the two varieties. None of the metrics tested was capable of adequately reflecting the temporal complexity found in the durational data. These findings contribute to the critical evaluation of the explanatory adequacy of rhythm metrics. Acoustic bases and limitations of the traditional rhythmic typologies are discussed.
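
    One family of rhythm metrics referred to here can be illustrated with the normalized pairwise variability index (nPVI) over successive vowel durations; the sketch below assumes durations have already been segmented, and the values shown are invented.

      import numpy as np

      def npvi(durations):
          # Normalized Pairwise Variability Index over successive interval durations.
          d = np.asarray(durations, dtype=float)
          pairwise = np.abs(np.diff(d)) / ((d[:-1] + d[1:]) / 2.0)
          return 100.0 * pairwise.mean()

      # Hypothetical vowel durations (ms) from one utterance.
      vowel_durations = [62, 110, 58, 95, 70, 140, 66]
      print(f"vocalic nPVI = {npvi(vowel_durations):.1f}")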

  15. Can acoustic vowel space predict the habitual speech rate of the speaker?

    PubMed

    Tsao, Y-C; Iqbal, K

    2005-01-01

    This study aims to determine whether the acoustic vowel space reflects the habitual speaking rate of the speaker. The vowel space is defined as the area of the quadrilateral formed by the four corner vowels (i.e., /i/, /æ/, /u/, /ɑ/) in the F1-F2 plane. The study compares the acoustic vowel space in the speech of habitually slow and fast talkers and further analyzes it by gender. In addition to measuring vowel duration and the midpoint frequencies of F1 and F2, the F1/F2 vowel space areas were measured and compared across speakers. The results indicate substantial overlap in vowel space area functions between slow and fast talkers, though the slow speakers were found to have larger vowel spaces. Furthermore, large interspeaker variability in vowel space area functions was noted within each group. Both F1 and F2 formant frequencies were found to be gender sensitive, consistent with existing data. No predictive relation between vowel duration and formant frequencies was observed among speakers.
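
    A minimal sketch of the vowel space area computation as defined above: the area of the quadrilateral spanned by the four corner vowels in the F1-F2 plane, via the shoelace formula. The corner-vowel formant values below are hypothetical.

      import numpy as np

      def quadrilateral_area(points):
          # Shoelace formula for the area of a polygon with ordered (F2, F1) vertices.
          pts = np.asarray(points, dtype=float)
          x, y = pts[:, 0], pts[:, 1]
          return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

      # Hypothetical corner-vowel midpoint formants (F2, F1) in Hz, ordered around the
      # quadrilateral: /i/, /ae/, /a/, /u/.
      corners = [(2300, 300), (1800, 750), (1100, 800), (900, 350)]
      print(f"vowel space area: {quadrilateral_area(corners):.0f} Hz^2")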

  16. Speech Prosody Abnormalities and Specific Dimensional Schizotypy Features: Are Relationships Limited to Males?

    PubMed Central

    Bedwell, Jeffrey S.; Cohen, Alex S.; Trachik, Benjamin J.; Deptula, Andrew E.; Mitchell, Jonathan C.

    2014-01-01

    In schizophrenia, diminished vocal expressivity is associated with lower quality of life. Studies using computerized acoustic analysis of speech have found no evidence of diminished vocal prosody related to categorically defined schizotypy, a subclinical analog of schizophrenia. However, existing studies have not examined the interaction between schizotypy and sex in vocal prosody measures. The current study examined 44 young adults (50% male) who were recruited to represent a continuous range of schizotypy. Speech samples were digitally recorded during autobiographical narratives and analyzed for prosody. In male participants, variability of fundamental frequency and variability of intensity were each negatively related to the Schizotypal Personality Questionnaire (SPQ) Ideas of Reference subscale, while SPQ Suspiciousness was related to a greater number of utterances and SPQ Odd Behavior to a greater number of pauses. Because these relationships were restricted to males and not significant in females, the results may explain earlier negative findings with schizotypy. PMID:25198702

  17. Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics

    PubMed Central

    Bradlow, Ann R.; Torretta, Gina M.; Pisoni, David B.

    2011-01-01

    This study used a multi-talker database containing intelligibility scores for 2000 sentences (20 talkers, 100 sentences), to identify talker-related correlates of speech intelligibility. We first investigated “global” talker characteristics (e.g., gender, F0 and speaking rate). Findings showed female talkers to be more intelligible as a group than male talkers. Additionally, we found a tendency for F0 range to correlate positively with higher speech intelligibility scores. However, F0 mean and speaking rate did not correlate with intelligibility. We then examined several fine-grained acoustic-phonetic talker-characteristics as correlates of overall intelligibility. We found that talkers with larger vowel spaces were generally more intelligible than talkers with reduced spaces. In investigating two cases of consistent listener errors (segment deletion and syllable affiliation), we found that these perceptual errors could be traced directly to detailed timing characteristics in the speech signal. Results suggest that a substantial portion of variability in normal speech intelligibility is traceable to specific acoustic-phonetic characteristics of the talker. Knowledge about these factors may be valuable for improving speech synthesis and recognition strategies, and for special populations (e.g., the hearing-impaired and second-language learners) who are particularly sensitive to intelligibility differences among talkers. PMID:21461127

  18. Linguistic Features of Speech in Tense Emotional State.

    ERIC Educational Resources Information Center

    Nosenko, E. E.

    The report contains a comparison of oral expression by the same experimental subjects under normal conditions and in a state of emotional stress. The study permits isolation of linguistic features of the formation of oral expression under emotional tension. This report is a translation of an article originally written in Russian. (NTIS/KM)

  19. The Use of Distinctive Features for Automatic Speech Recognition

    DTIC Science & Technology

    1991-09-01

    pronouncing w, and the feature [LATERAL] from the right context, which associates with the raising of the tongue towards the palatal midline during the... [ROUND] is realized by protruding the lips and drawing them relatively close, resulting in the lowering of the first three formants (especially F2 in most... peaks in some cases, thus making the network unduly sensitive to amplitude variations at formant locations. Furthermore, the synchrony response

  20. Intensity Accents in French 2 Year Olds' Speech.

    ERIC Educational Resources Information Center

    Allen, George D.

    The acoustic features and functions of accentuation in French are discussed, and features of accentuation in the speech of French 2-year-olds are explored. The four major acoustic features used to signal accentual distinctions are fundamental frequency of voicing, duration of segments and syllables, intensity of segments and syllables, and…

  1. Speech transmission index from running speech: A neural network approach

    NASA Astrophysics Data System (ADS)

    Li, F. F.; Cox, T. J.

    2003-04-01

    Speech transmission index (STI) is an important objective parameter concerning speech intelligibility for sound transmission channels. It is normally measured with specific test signals to ensure high accuracy and good repeatability. Measurement with running speech was previously proposed, but accuracy is compromised and hence applications limited. A new approach that uses artificial neural networks to accurately extract the STI from received running speech is developed in this paper. Neural networks are trained on a large set of transmitted speech examples with prior knowledge of the transmission channels' STIs. The networks perform complicated nonlinear function mappings and spectral feature memorization to enable accurate objective parameter extraction from transmitted speech. Validations via simulations demonstrate the feasibility of this new method on a one-net-one-speech extract basis. In this case, accuracy is comparable with normal measurement methods. This provides an alternative to standard measurement techniques, and it is intended that the neural network method can facilitate occupied room acoustic measurements.
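
    The mapping described, from features of received running speech to a known channel STI, can be sketched as a small regression network; the feature vectors, targets, and network size below are placeholders, not the authors' architecture or training data.

      import numpy as np
      from sklearn.neural_network import MLPRegressor
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(1)

      # Hypothetical data: each row is a spectral feature vector of a received running-speech
      # excerpt; each target is the known STI of the channel it passed through.
      X = rng.normal(size=(500, 64))
      w = rng.normal(size=64)
      y = np.clip(0.5 + 0.05 * X @ w, 0.0, 1.0)    # stand-in for channel STIs in [0, 1]

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
      net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X_tr, y_tr)
      print("held-out R^2:", round(net.score(X_te, y_te), 3))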

  2. Voice Modulations in German Ironic Speech

    ERIC Educational Resources Information Center

    Scharrer, Lisa; Christmann, Ursula; Knoll, Monja

    2011-01-01

    Previous research has shown that in different languages ironic speech is acoustically modulated compared to literal speech, and these modulations are assumed to aid the listener in the comprehension process by acting as cues that mark utterances as ironic. The present study was conducted to identify paraverbal features of German "ironic…

  3. Acoustic Source Characteristics, Across-Formant Integration, and Speech Intelligibility Under Competitive Conditions

    PubMed Central

    2015-01-01

    An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics—for example, fundamental frequency (F0) differences between formants promote their segregation. This study explored the role of more radical differences in source characteristics. Three-formant (F1+F2+F3) synthetic speech analogues were derived from natural sentences. In Experiment 1, F1+F3 were generated by passing a harmonic glottal source (F0 = 140 Hz) through second-order resonators (H1+H3); in Experiment 2, F1+F3 were tonal (sine-wave) analogues (T1+T3). F2 could take either form (H2 or T2). In some conditions, the target formants were presented alone, either monaurally or dichotically (left ear = F1+F3; right ear = F2). In others, they were accompanied by a competitor for F2 (F1+F2C+F3; F2), which listeners must reject to optimize recognition. Competitors (H2C or T2C) were created using the time-reversed frequency and amplitude contours of F2. Dichotic presentation of F2 and F2C ensured that the impact of the competitor arose primarily through informational masking. In the absence of F2C, the effect of a source mismatch between F1+F3 and F2 was relatively modest. When F2C was present, intelligibility was lowest when F2 was tonal and F2C was harmonic, irrespective of which type matched F1+F3. This finding suggests that source type and context, rather than similarity, govern the phonetic contribution of a formant. It is proposed that wideband harmonic analogues are more effective informational maskers than narrowband tonal analogues, and so become dominant in across-frequency integration of phonetic information when placed in competition. PMID:25751040

  4. Differential Effects of Visual-Acoustic Biofeedback Intervention for Residual Speech Errors

    PubMed Central

    McAllister Byun, Tara; Campbell, Heather

    2016-01-01

    Recent evidence suggests that the incorporation of visual biofeedback technologies may enhance response to treatment in individuals with residual speech errors. However, there is a need for controlled research systematically comparing biofeedback versus non-biofeedback intervention approaches. This study implemented a single-subject experimental design with a crossover component to investigate the relative efficacy of visual-acoustic biofeedback and traditional articulatory treatment for residual rhotic errors. Eleven child/adolescent participants received ten sessions of visual-acoustic biofeedback and 10 sessions of traditional treatment, with the order of biofeedback and traditional phases counterbalanced across participants. Probe measures eliciting untreated rhotic words were administered in at least three sessions prior to the start of treatment (baseline), between the two treatment phases (midpoint), and after treatment ended (maintenance), as well as before and after each treatment session. Perceptual accuracy of rhotic production was assessed by outside listeners in a blinded, randomized fashion. Results were analyzed using a combination of visual inspection of treatment trajectories, individual effect sizes, and logistic mixed-effects regression. Effect sizes and visual inspection revealed that participants could be divided into categories of strong responders (n = 4), mixed/moderate responders (n = 3), and non-responders (n = 4). Individual results did not reveal a reliable pattern of stronger performance in biofeedback versus traditional blocks, or vice versa. Moreover, biofeedback versus traditional treatment was not a significant predictor of accuracy in the logistic mixed-effects model examining all within-treatment word probes. However, the interaction between treatment condition and treatment order was significant: biofeedback was more effective than traditional treatment in the first phase of treatment, and traditional treatment was more effective

  5. A Speech Endpoint Detection Algorithm Based on BP Neural Network and Multiple Features

    NASA Astrophysics Data System (ADS)

    Shi, Yong-Qiang; Li, Ru-Wei; Zhang, Shuang; Wang, Shuai; Yi, Xiao-Qun

    Focusing on the sharp decline in the performance of endpoint detection algorithms in complicated noise environments, a new speech endpoint detection method based on a BPNN (back-propagation neural network) and multiple features is presented. First, the maximum of the short-time autocorrelation function and the spectral variance of the speech signal are extracted. Second, these feature vectors are used as input to train the BP neural network, and a genetic algorithm is then used to optimize the network. Finally, the signal type is determined according to the output of the network. Experiments show that the correct detection rate of the proposed algorithm is improved, because the method is more robust and adaptive than algorithms based on the maximum of the short-time autocorrelation function or the spectral variance alone.
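
    A minimal sketch of the two frame-level features named above (maximum of the short-time autocorrelation function and spectral variance), assuming frames of windowed speech are already available; the BP network and genetic-algorithm stages are omitted, and the random frames are stand-ins.

      import numpy as np

      def frame_features(frames):
          # Per-frame features: maximum of the normalized autocorrelation (lag >= 1)
          # and variance of the magnitude spectrum.
          feats = []
          for x in frames:
              x = x - x.mean()
              ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # non-negative lags
              ac_max = ac[1:].max() / (ac[0] + 1e-12)             # exclude lag 0, normalize
              spec_var = np.var(np.abs(np.fft.rfft(x)))
              feats.append((ac_max, spec_var))
          return np.array(feats)

      # Hypothetical frames: 100 windows of 256 samples (noise as a stand-in for speech).
      frames = np.random.default_rng(2).normal(size=(100, 256))
      print(frame_features(frames)[:3])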

  6. Production and perception of clear speech

    NASA Astrophysics Data System (ADS)

    Bradlow, Ann R.

    2003-04-01

    When a talker believes that the listener is likely to have speech perception difficulties due to a hearing loss, background noise, or a different native language, she or he will typically adopt a clear speaking style. Previous research has established that, with a simple set of instructions to the talker, ``clear speech'' can be produced by most talkers under laboratory recording conditions. Furthermore, there is reliable evidence that adult listeners with either impaired or normal hearing typically find clear speech more intelligible than conversational speech. Since clear speech production involves listener-oriented articulatory adjustments, a careful examination of the acoustic-phonetic and perceptual consequences of the conversational-to-clear speech transformation can serve as an effective window into talker- and listener-related forces in speech communication. Furthermore, clear speech research has considerable potential for the development of speech enhancement techniques. After reviewing previous and current work on the acoustic properties of clear versus conversational speech, this talk will present recent data from a cross-linguistic study of vowel production in clear speech and a cross-population study of clear speech perception. Findings from these studies contribute to an evolving view of clear speech production and perception as reflecting both universal, auditory and language-specific, phonological contrast enhancement features.

  7. Clinical and acoustical variability in hypokinetic dysarthria

    SciTech Connect

    Metter, E.J.; Hanson, W.R.

    1986-10-01

    Ten male patients with parkinsonism secondary to Parkinson's disease or progressive supranuclear palsy had clinical neurological, speech, and acoustical speech evaluations. In addition, seven of the patients were evaluated by x-ray computed tomography (CT) and (F-18)-fluorodeoxyglucose (FDG) positron emission tomography (PET). Extensive variability of speech features, both clinical and acoustical, was found and appeared to be independent of the severity of any parkinsonian sign, CT, or FDG PET findings. In addition, little relationship existed between the variability of the individual measured speech features. What appeared to be important for the appearance of abnormal acoustic measures was the overall severity of the dysarthria. These observations suggest that a better understanding of hypokinetic dysarthria may result from more extensive examination of the variability between patients. Emphasizing a specific feature such as rapid speaking rate in characterizing hypokinetic dysarthria focuses on a single and inconstant finding in a complex speech pattern.

  8. Gunshot acoustic signature specific features and false alarms reduction

    NASA Astrophysics Data System (ADS)

    Donzier, Alain; Millet, Joel

    2005-05-01

    This paper provides a detailed analysis of the most specific parameters of gunshot signatures through models as well as through real data. Models for the different contributions to the typical gunshot signature (shock wave and muzzle blast) are presented and used to discuss the variation of measured signatures over different environmental conditions and shot configurations. The analysis is followed by a description of the performance requirements for gunshot detection systems, from sniper detection, which was the main concern 10 years ago, to the new and more challenging conditions faced in today's operations. The work presented examines how systems are deployed and used, as well as how the operational environment has changed. The main sources of false alarms and the new threats, such as RPGs and mortars, that acoustic gunshot detection systems face today are also defined and discussed. Finally, different strategies for reducing false alarms are proposed based on the acoustic signatures and presented through various examples of specific missions ranging from vehicle protection to area protection. These strategies not only include recommendations on how to handle acoustic information for the best efficiency of the acoustic detector, but also recommend add-on sensors to enhance overall system performance.

  9. Modeling words with subword units in an articulatorily constrained speech recognition algorithm

    SciTech Connect

    Hogden, J.

    1997-11-20

    The goal of speech recognition is to find the most probable word given the acoustic evidence, i.e. a string of VQ codes or acoustic features. Speech recognition algorithms typically take advantage of the fact that the probability of a word, given a sequence of VQ codes, can be calculated.

  10. Language-Specific Developmental Differences in Speech Production: A Cross-Language Acoustic Study

    ERIC Educational Resources Information Center

    Li, Fangfang

    2012-01-01

    Speech productions of 40 English- and 40 Japanese-speaking children (aged 2-5) were examined and compared with the speech produced by 20 adult speakers (10 speakers per language). Participants were recorded while repeating words that began with "s" and "sh" sounds. Clear language-specific patterns in adults' speech were found,…

  11. Robust Feature Extraction Using Variable Window Function in Autocorrelation Domain for Speech Recognition

    NASA Astrophysics Data System (ADS)

    Lee, Sangho; Ha, Jeonghyun; Hong, Jaekeun

    This paper presents a new feature extraction method for robust speech recognition based on the autocorrelation mel frequency cepstral coefficients (AMFCCs) and a variable window. While the AMFCC feature extraction method uses the fixed double-dynamic-range (DDR) Hamming window for higher-lag autocorrelation coefficients, which are least affected by noise, the proposed method applies a variable window, depending on the frame energy and periodicity. The performance of the proposed method is verified using an Aurora-2 task, and the results confirm a significantly improved performance under noisy conditions.

  12. A neural mechanism for recognizing speech spoken by different speakers.

    PubMed

    Kreitewolf, Jens; Gaudrain, Etienne; von Kriegstein, Katharina

    2014-05-01

    Understanding speech from different speakers is a sophisticated process, particularly because the same acoustic parameters convey important information about both the speech message and the person speaking. How the human brain accomplishes speech recognition under such conditions is unknown. One view is that speaker information is discarded at early processing stages and not used for understanding the speech message. An alternative view is that speaker information is exploited to improve speech recognition. Consistent with the latter view, previous research identified functional interactions between the left- and the right-hemispheric superior temporal sulcus/gyrus, which process speech- and speaker-specific vocal tract parameters, respectively. Vocal tract parameters are one of the two major acoustic features that determine both speaker identity and speech message (phonemes). Here, using functional magnetic resonance imaging (fMRI), we show that a similar interaction exists for glottal fold parameters between the left and right Heschl's gyri. Glottal fold parameters are the other main acoustic feature that determines speaker identity and speech message (linguistic prosody). The findings suggest that interactions between left- and right-hemispheric areas are specific to the processing of different acoustic features of speech and speaker, and that they represent a general neural mechanism when understanding speech from different speakers.

  13. Feature extraction from time domain acoustic signatures of weapons systems fire

    NASA Astrophysics Data System (ADS)

    Yang, Christine; Goldman, Geoffrey H.

    2014-06-01

    The U.S. Army is interested in developing algorithms to classify weapons systems fire based on their acoustic signatures. To support this effort, an algorithm was developed to extract features from acoustic signatures of weapons systems fire and applied to over 1300 signatures. The algorithm filtered the data using standard techniques then estimated the amplitude and time of the first five peaks and troughs and the location of the zero crossing in the waveform. The results were stored in Excel spreadsheets. The results are being used to develop and test acoustic classifier algorithms.
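
    A minimal sketch of the described feature extraction: band-pass filtering followed by the times and amplitudes of the first five peaks and troughs and the first zero crossing. The sampling rate, filter band, and decaying-sinusoid test waveform are assumptions, not the report's settings.

      import numpy as np
      from scipy.signal import butter, sosfiltfilt, find_peaks

      fs = 10000                                            # assumed sampling rate (Hz)
      t = np.arange(0, 0.2, 1.0 / fs)
      x = np.exp(-30 * t) * np.sin(2 * np.pi * 150 * t)     # synthetic stand-in waveform

      # Assumed band-pass filter (20-1000 Hz).
      sos = butter(4, [20, 1000], btype="band", fs=fs, output="sos")
      y = sosfiltfilt(sos, x)

      peaks, _ = find_peaks(y)
      troughs, _ = find_peaks(-y)
      zero_cross = np.flatnonzero(np.diff(np.signbit(y)))[0]

      features = {
          "peak_times": t[peaks[:5]], "peak_amps": y[peaks[:5]],
          "trough_times": t[troughs[:5]], "trough_amps": y[troughs[:5]],
          "first_zero_crossing_s": t[zero_cross],
      }
      print(features)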

  14. The Perception of Telephone-Processed Speech by Combined Electric and Acoustic Stimulation

    PubMed Central

    Tahmina, Qudsia; Runge, Christina; Friedland, David R.

    2013-01-01

    This study assesses the effects of adding low- or high-frequency information to the band-limited telephone-processed speech on bimodal listeners’ telephone speech perception in quiet environments. In the proposed experiments, bimodal users were presented under quiet listening conditions with wideband speech (WB), bandpass-filtered telephone speech (300–3,400 Hz, BP), high-pass filtered speech (f > 300 Hz, HP, i.e., distorted frequency components above 3,400 Hz in telephone speech were restored), and low-pass filtered speech (f < 3,400 Hz, LP, i.e., distorted frequency components below 300 Hz in telephone speech were restored). Results indicated that in quiet environments, for all four types of stimuli, listening with both hearing aid (HA) and cochlear implant (CI) was significantly better than listening with CI alone. For both bimodal and CI-alone modes, there were no statistically significant differences between the LP and BP scores and between the WB and HP scores. However, the HP scores were significantly better than the BP scores. In quiet conditions, both CI alone and bimodal listening achieved the largest benefits when telephone speech was augmented with high rather than low-frequency information. These findings provide support for the design of algorithms that would extend higher frequency information, at least in quiet environments. PMID:24265213

  15. Speech endpoint detection with non-language speech sounds for generic speech processing applications

    NASA Astrophysics Data System (ADS)

    McClain, Matthew; Romanowski, Brian

    2009-05-01

    Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS such as filled pauses will require future research.

  16. The effect of speaking context on spectral- and cepstral-based acoustic features of normal voice.

    PubMed

    Lowell, Soren Y; Hylkema, Jennifer A

    2016-01-01

    The effect of speaking context on four cepstral- and spectral-based acoustic measures was investigated in 20 participants with normal voice. Speakers produced three different continuous speaking tasks that varied in duration and phonemic content. Cepstral and spectral measures that can be validly derived from continuous speech were computed across the three speaking contexts. Cepstral peak prominence (CPP), low/high spectral ratio, and the standard deviation (SD) of the low/high spectral ratio did not significantly differ across speaking contexts, and correlations for the first two measures were strong among the three speaking tasks. The SD of the CPP showed significant task differences, and relationships between the speaking contexts were generally moderate. These findings suggest that in speakers with normal voice, the differing phonemic content across several frequently used speaking stimuli minimally impacted group means for three clinically relevant cepstral- and spectral-based acoustic measures.
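
    A minimal sketch of one of the measures named above, a low/high spectral ratio, computed per frame as the log energy ratio below versus above a cutoff; the 4 kHz cutoff and the noise frame are assumptions for illustration.

      import numpy as np

      def low_high_ratio(frame, fs, cutoff=4000.0):
          # Log ratio (dB) of spectral energy below vs above the cutoff for one windowed frame.
          spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
          freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
          low = spec[freqs < cutoff].sum()
          high = spec[freqs >= cutoff].sum()
          return 10.0 * np.log10((low + 1e-12) / (high + 1e-12))

      # Hypothetical 25 ms frame at 16 kHz (noise as a stand-in for voiced speech).
      fs = 16000
      frame = np.random.default_rng(3).normal(size=int(0.025 * fs))
      print(round(low_high_ratio(frame, fs), 2))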

  17. Designing acoustics for linguistically diverse classrooms: Effects of background noise, reverberation and talker foreign accent on speech comprehension by native and non-native English-speaking listeners

    NASA Astrophysics Data System (ADS)

    Peng, Zhao Ellen

    The current classroom acoustics standard (ANSI S12.60-2010) recommends core learning spaces not to exceed background noise level (BNL) of 35 dBA and reverberation time (RT) of 0.6 second, based on speech intelligibility performance mainly by the native English-speaking population. Existing literature has not correlated these recommended values well with student learning outcomes. With a growing population of non-native English speakers in American classrooms, the special needs for perceiving degraded speech among non-native listeners, either due to realistic room acoustics or talker foreign accent, have not been addressed in the current standard. This research seeks to investigate the effects of BNL and RT on the comprehension of English speech from native English and native Mandarin Chinese talkers as perceived by native and non-native English listeners, and to provide acoustic design guidelines to supplement the existing standard. This dissertation presents two studies on the effects of RT and BNL on more realistic classroom learning experiences. How do native and non-native English-speaking listeners perform on speech comprehension tasks under adverse acoustic conditions, if the English speech is produced by talkers of native English (Study 1) versus native Mandarin Chinese (Study 2)? Speech comprehension materials were played back in a listening chamber to individual listeners: native and non-native English-speaking in Study 1; native English, native Mandarin Chinese, and other non-native English-speaking in Study 2. Each listener was screened for baseline English proficiency level, and completed dual tasks simultaneously involving speech comprehension and adaptive dot-tracing under 15 acoustic conditions, comprised of three BNL conditions (RC-30, 40, and 50) and five RT scenarios (0.4 to 1.2 seconds). The results show that BNL and RT negatively affect both objective performance and subjective perception of speech comprehension, more severely for non

  18. Prosodic Contrasts in Ironic Speech

    ERIC Educational Resources Information Center

    Bryant, Gregory A.

    2010-01-01

    Prosodic features in spontaneous speech help disambiguate implied meaning not explicit in linguistic surface structure, but little research has examined how these signals manifest themselves in real conversations. Spontaneously produced verbal irony utterances generated between familiar speakers in conversational dyads were acoustically analyzed…

  19. Acoustic features to arousal and identity in disturbance calls of tree shrews (Tupaia belangeri).

    PubMed

    Schehka, Simone; Zimmermann, Elke

    2009-11-05

    Across mammalian species, comparable morphological and physiological constraints in the production of airborne vocalisations are suggested to lead to commonalities in the vocal conveyance of acoustic features to specific attributes of callers, such as arousal and individual identity. To explore this hypothesis we examined intra- and interindividual acoustic variation in chatter calls of tree shrews (Tupaia belangeri). The calls were induced experimentally by a disturbance paradigm and related to two defined arousal states of a subject. The arousal state of an animal was primarily operationalised by the habituation of the subject to a new environment and additionally determined by behavioural indicators of stress in tree shrews (tail-position and piloerection). We investigated whether the arousal state and indexical features of the caller, namely individual identity and sex, are conveyed acoustically. Frame-by-frame videographic and multiparametric sound analyses revealed that arousal and identity, but not sex of a caller reliably predicted spectral-temporal variation in sound structure. Furthermore, there was no effect of age or body weight on individual-specific acoustic features. Similar results in another call type of tree shrews and comparable findings in other mammalian lineages provide evidence that comparable physiological and morphological constraints in the production of airborne vocalisations across mammals lead to commonalities in acoustic features conveying arousal and identity, respectively.

  20. Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition.

    PubMed

    Schädler, Marc René; Meyer, Bernd T; Kollmeier, Birger

    2012-05-01

    In an attempt to increase the robustness of automatic speech recognition (ASR) systems, a feature extraction scheme is proposed that takes spectro-temporal modulation frequencies (MF) into account. This physiologically inspired approach uses a two-dimensional filter bank based on Gabor filters, which limits the redundant information between feature components, and also results in physically interpretable features. Robustness against extrinsic variation (different types of additive noise) and intrinsic variability (arising from changes in speaking rate, effort, and style) is quantified in a series of recognition experiments. The results are compared to reference ASR systems using Mel-frequency cepstral coefficients (MFCCs), MFCCs with cepstral mean subtraction (CMS) and RASTA-PLP features, respectively. Gabor features are shown to be more robust against extrinsic variation than the baseline systems without CMS, with relative improvements of 28% and 16% for two training conditions (using only clean training samples or a mixture of noisy and clean utterances, respectively). When used in a state-of-the-art system, improvements of 14% are observed when spectro-temporal features are concatenated with MFCCs, indicating the complementarity of those feature types. An analysis of the importance of specific MF shows that temporal MF up to 25 Hz and spectral MF up to 0.25 cycles/channel are beneficial for ASR.
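
    The spectro-temporal idea can be sketched by filtering a log-mel spectrogram with a single 2-D Gabor kernel tuned to one temporal and one spectral modulation frequency; the kernel parameterization and the random spectrogram below are simplifications, not the published filter bank.

      import numpy as np
      from scipy.signal import convolve2d

      def gabor_kernel(temp_mod_hz, spec_mod_cyc, frame_rate=100.0, size=(15, 23)):
          # 2-D Gabor kernel: cosine carrier over (channel, frame) axes under a Hann envelope.
          n_chan, n_frames = size
          c = np.arange(n_chan) - n_chan // 2
          f = np.arange(n_frames) - n_frames // 2
          C, F = np.meshgrid(c, f, indexing="ij")
          carrier = np.cos(2 * np.pi * (spec_mod_cyc * C + temp_mod_hz * F / frame_rate))
          kernel = carrier * np.outer(np.hanning(n_chan), np.hanning(n_frames))
          return kernel - kernel.mean()                     # zero mean, so DC is not passed

      # Hypothetical log-mel spectrogram: 40 channels x 300 frames of noise.
      logmel = np.random.default_rng(4).normal(size=(40, 300))
      kernel = gabor_kernel(temp_mod_hz=6.0, spec_mod_cyc=0.12)
      features = convolve2d(logmel, kernel, mode="same")
      print(features.shape)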

  1. A Computational Model of Word Segmentation from Continuous Speech Using Transitional Probabilities of Atomic Acoustic Events

    ERIC Educational Resources Information Center

    Rasanen, Okko

    2011-01-01

    Word segmentation from continuous speech is a difficult task that is faced by human infants when they start to learn their native language. Several studies indicate that infants might use several different cues to solve this problem, including intonation, linguistic stress, and transitional probabilities between subsequent speech sounds. In this…

  2. Acoustic Analysis of Clear Versus Conversational Speech in Individuals with Parkinson Disease

    ERIC Educational Resources Information Center

    Goberman, A.M.; Elmer, L.W.

    2005-01-01

    A number of studies have been devoted to the examination of clear versus conversational speech in non-impaired speakers. The purpose of these previous studies has been primarily to help increase speech intelligibility for the benefit of hearing-impaired listeners. The goal of the present study was to examine differences between conversational and…

  3. Effect of Digital Frequency Compression (DFC) on Speech Recognition in Candidates for Combined Electric and Acoustic Stimulation (EAS)

    PubMed Central

    Gifford, René H.; Dorman, Michael F.; Spahr, Anthony J.; McKarns, Sharon A.

    2008-01-01

    Purpose To compare the effects of conventional amplification (CA) and digital frequency compression (DFC) amplification on the speech recognition abilities of candidates for a partial-insertion cochlear implant, that is, candidates for combined electric and acoustic stimulation (EAS). Method The participants were 6 patients whose audiometric thresholds at 500 Hz and below were ≤60 dB HL and whose thresholds at 2000 Hz and above were ≥80 dB HL. Six tests of speech understanding were administered with CA and DFC. The Abbreviated Profile of Hearing Aid Benefit (APHAB) was also administered following use of CA and DFC. Results Group mean scores were not statistically different in the CA and DFC conditions. However, 2 patients received substantial benefit in DFC conditions. APHAB scores suggested increased ease of communication, but also increased aversive sound quality. Conclusion Results suggest that a relatively small proportion of individuals who meet EAS candidacy will receive substantial benefit from a DFC hearing aid and that a larger proportion will receive at least a small benefit when speech is presented against a background of noise. This benefit, however, comes at a cost—aversive sound quality. PMID:17905905

  4. A Model for Speech Processing in Second Language Listening Activities

    ERIC Educational Resources Information Center

    Zoghbor, Wafa Shahada

    2016-01-01

    Teachers' understanding of the process of speech perception could inform practice in listening classrooms. Catford (1950) developed a model for speech perception taking into account the influence of the acoustic features of the linguistic forms used by the speaker, whereby the listener "identifies" and "interprets" these…

  5. Cross-Channel Amplitude Sweeps Are Crucial to Speech Intelligibility

    ERIC Educational Resources Information Center

    Prendergast, Garreth; Green, Gary G. R.

    2012-01-01

    Classical views of speech perception argue that the static and dynamic characteristics of spectral energy peaks (formants) are the acoustic features that underpin phoneme recognition. Here we use representations where the amplitude modulations of sub-band filtered speech are described, precisely, in terms of co-sinusoidal pulses. These pulses are…

  6. Target re-acquisition using acoustic features with an autonomous underwater vehicle-borne sonar

    NASA Astrophysics Data System (ADS)

    Edwards, Joseph; Schmidt, Henrik

    2003-10-01

    Concurrent mapping and localization (CML) is a technique for unsupervised feature-based mapping of unknown environments, and is an essential tool for autonomous robots. For land robots, CML can be applied using video, laser, or acoustic sensors, while for autonomous underwater vehicles (AUVs) the only effective transducer in most situations is sonar. In the Generic Oceanographic Array Technology Sonar (GOATS) experiment series, CML was effectively demonstrated using a single AUV. A further hurdle in the full implementation of AUV minehunting is to re-acquire and identify targets of interest. Target re-acquisition allows other vehicles to be called into a target location to further investigate with adaptive sonar geometries or alternative sensor suites designed for classification. In this work, the features in the CML-generated map are extended from only spatial coordinates to include acoustic features such as spectral response. It is demonstrated that the inclusion of acoustic features aids in the global positioning within the map, although the fine positioning is still accomplished through standard CML. In addition, areas that are sparsely populated with targets, e.g., a sandy coastline, are shown to be more readily navigable using acoustic features.

  7. Pattern analysis of EEG responses to speech and voice: influence of feature grouping.

    PubMed

    Hausfeld, Lars; De Martino, Federico; Bonte, Milene; Formisano, Elia

    2012-02-15

    Pattern recognition algorithms are becoming increasingly used in functional neuroimaging. These algorithms exploit information contained in temporal, spatial, or spatio-temporal patterns of independent variables (features) to detect subtle but reliable differences between brain responses to external stimuli or internal brain states. When applied to the analysis of electroencephalography (EEG) or magnetoencephalography (MEG) data, a choice needs to be made on how the input features to the algorithm are obtained from the signal amplitudes measured at the various channels. In this article, we consider six types of pattern analyses deriving from the combination of three types of feature selection in the temporal domain (predefined windows, shifting window, whole trial) with two approaches to handle the channel dimension (channel wise, multi-channel). We combined these different types of analyses with a Gaussian Naïve Bayes classifier and analyzed a multi-subject EEG data set from a study aimed at understanding the task dependence of the cortical mechanisms for encoding speaker's identity and speech content (vowels) from short speech utterances (Bonte, Valente, & Formisano, 2009). Outcomes of the analyses showed that different grouping of available features helps highlighting complementary (i.e. temporal, topographic) aspects of information content in the data. A shifting window/multi-channel approach proved especially valuable in tracing both the early build up of neural information reflecting speaker or vowel identity and the late and task-dependent maintenance of relevant information reflecting the performance of a working memory task. Because it exploits the high temporal resolution of EEG (and MEG), such a shifting window approach with sequential multi-channel classifications seems the most appropriate choice for tracing the temporal profile of neural information processing.

  8. Nonlinear Statistical Modeling of Speech

    NASA Astrophysics Data System (ADS)

    Srinivasan, S.; Ma, T.; May, D.; Lazarou, G.; Picone, J.

    2009-12-01

    Contemporary approaches to speech and speaker recognition decompose the problem into four components: feature extraction, acoustic modeling, language modeling and search. Statistical signal processing is an integral part of each of these components, and Bayes Rule is used to merge these components into a single optimal choice. Acoustic models typically use hidden Markov models based on Gaussian mixture models for state output probabilities. This popular approach suffers from an inherent assumption of linearity in speech signal dynamics. Language models often employ a variety of maximum entropy techniques, but can employ many of the same statistical techniques used for acoustic models. In this paper, we focus on introducing nonlinear statistical models to the feature extraction and acoustic modeling problems as a first step towards speech and speaker recognition systems based on notions of chaos and strange attractors. Our goal in this work is to improve the generalization and robustness properties of a speech recognition system. Three nonlinear invariants are proposed for feature extraction: Lyapunov exponents, correlation fractal dimension, and correlation entropy. We demonstrate an 11% relative improvement on speech recorded under noise-free conditions, but show a comparable degradation occurs for mismatched training conditions on noisy speech. We conjecture that the degradation is due to difficulties in estimating invariants reliably from noisy data. To circumvent these problems, we introduce two dynamic models to the acoustic modeling problem: (1) a linear dynamic model (LDM) that uses a state space-like formulation to explicitly model the evolution of hidden states using an autoregressive process, and (2) a data-dependent mixture of autoregressive (MixAR) models. Results show that LDM and MixAR models can achieve comparable performance with HMM systems while using significantly fewer parameters. Currently we are developing Bayesian parameter estimation and
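
    One of the invariants mentioned, the correlation fractal dimension, can be sketched via a delay embedding and the Grassberger-Procaccia correlation sum; the embedding parameters, radii, and noisy-sinusoid test signal are assumptions for illustration, not the authors' estimator.

      import numpy as np

      def delay_embed(x, dim=3, tau=5):
          # Delay-coordinate embedding of a 1-D signal.
          n = len(x) - (dim - 1) * tau
          return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])

      def correlation_sum(pts, r):
          # Fraction of distinct point pairs closer than radius r (Grassberger-Procaccia).
          d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
          n = len(pts)
          return (np.count_nonzero(d < r) - n) / (n * (n - 1))

      # Noisy sinusoid as a stand-in for a voiced speech segment.
      t = np.linspace(0, 1, 1000)
      x = np.sin(2 * np.pi * 10 * t) + 0.02 * np.random.default_rng(7).normal(size=t.size)

      pts = delay_embed(x)
      radii = np.logspace(-1.2, 0, 8)
      cs = np.array([correlation_sum(pts, r) for r in radii])
      slope = np.polyfit(np.log(radii), np.log(cs), 1)[0]   # estimated correlation dimension
      print(round(slope, 2))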

  9. Experimental investigation of the effects of the acoustical conditions in a simulated classroom on speech recognition and learning in children.

    PubMed

    Valente, Daniel L; Plevinsky, Hallie M; Franco, John M; Heinrichs-Graham, Elizabeth C; Lewis, Dawna E

    2012-01-01

    The potential effects of acoustical environment on speech understanding are especially important as children enter school where students' ability to hear and understand complex verbal information is critical to learning. However, this ability is compromised because of widely varied and unfavorable classroom acoustics. The extent to which unfavorable classroom acoustics affect children's performance on longer learning tasks is largely unknown as most research has focused on testing children using words, syllables, or sentences as stimuli. In the current study, a simulated classroom environment was used to measure comprehension performance of two classroom learning activities: a discussion and lecture. Comprehension performance was measured for groups of elementary-aged students in one of four environments with varied reverberation times and background noise levels. The reverberation time was either 0.6 or 1.5 s, and the signal-to-noise level was either +10 or +7 dB. Performance is compared to adult subjects as well as to sentence-recognition in the same condition. Significant differences were seen in comprehension scores as a function of age and condition; both increasing background noise and reverberation degraded performance in comprehension tasks compared to minimal differences in measures of sentence-recognition.

  10. Acoustic Variations in Adductor Spasmodic Dysphonia as a Function of Speech Task.

    ERIC Educational Resources Information Center

    Sapienza, Christine M.; Walton, Suzanne; Murry, Thomas

    1999-01-01

    Acoustic phonatory events were identified in 14 women diagnosed with adductor spasmodic dysphonia (ADSD), a focal laryngeal dystonia that disturbs phonatory function, and compared with those of 14 age-matched women with no vocal dysfunction. Findings indicated ADSD subjects produced more aberrant acoustic events than controls during tasks of…

  11. Acoustic classification of battlefield transient events using wavelet sub-band features

    NASA Astrophysics Data System (ADS)

    Azimi-Sadjadi, M. R.; Jiang, Y.; Srinivasan, S.

    2007-04-01

    Detection, localization and classification of battlefield acoustic transient events are of great importance especially for military operations in urban terrain (MOUT). Generally, there can be a wide variety of battlefield acoustic transient events such as different caliber gunshots, artillery fires, and mortar fires. The discrimination of different types of transient sources is plagued by highly non-stationary nature of these signals, which makes the extraction of representative features a challenging task. This is compounded by the variations in the environmental and operating conditions and existence of a wide range of possible interference. This paper presents new approaches for transient signal estimation and feature extraction from acoustic signatures collected by several distributed sensor nodes. A maximum likelihood (ML)-based method is developed to remove noise/interference and fading effects and restore the acoustic transient signals. The estimated transient signals are then represented using wavelets. The multi-resolution property of the wavelets allows for capturing fine details in the transient signals that can be utilized to successfully classify them. Wavelet sub-band higher order moments and energy-based features are used to characterize the transient signals. The discrimination ability of the subband features for transient signal classification has been demonstrated on several different caliber gunshots. Important findings and observations on these results are also presented.
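
    A minimal sketch of the wavelet sub-band features described (per-band energy plus higher-order moments), assuming a Daubechies wavelet and a fixed decomposition depth; the synthetic transient is a stand-in for real gunshot data.

      import numpy as np
      import pywt
      from scipy.stats import skew, kurtosis

      def wavelet_subband_features(signal, wavelet="db4", level=5):
          # Energy, skewness and kurtosis of each wavelet sub-band of a transient signal.
          feats = []
          for band in pywt.wavedec(signal, wavelet, level=level):
              feats.extend([np.sum(band ** 2), skew(band), kurtosis(band)])
          return np.array(feats)

      # Hypothetical transient: exponentially decaying noise burst plus background noise.
      rng = np.random.default_rng(5)
      t = np.linspace(0, 0.05, 2048)
      x = np.exp(-120 * t) * rng.normal(size=t.size) + 0.01 * rng.normal(size=t.size)
      print(wavelet_subband_features(x).shape)              # (level + 1) bands x 3 features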

  12. Fatigue Level Estimation of Bill Based on Acoustic Signal Feature by Supervised SOM

    NASA Astrophysics Data System (ADS)

    Teranishi, Masaru; Omatu, Sigeru; Kosaka, Toshihisa

    Fatigued bills have a harmful influence on the daily operation of Automated Teller Machines (ATMs). To make the classification of fatigued bills more efficient, the development of an automatic classification method is desired. We propose a new method to estimate the bending rigidity of a bill from acoustic signal features recorded in banking machines. The estimated bending rigidities are used as a continuous fatigue level for classifying fatigued bills. Using a supervised Self-Organizing Map (supervised SOM), we effectively estimate the bending rigidity from the acoustic energy pattern alone. Experimental results with real bill samples show the effectiveness of the proposed method.

  13. A human vocal utterance corpus for perceptual and acoustic analysis of speech, singing, and intermediate vocalizations

    NASA Astrophysics Data System (ADS)

    Gerhard, David

    2002-11-01

    In this paper we present the collection and annotation process of a corpus of human utterance vocalizations used for speech and song research. The corpus was collected to fill a void in current research tools, since no corpus currently exists which is useful for the classification of intermediate utterances between speech and monophonic singing. Much work has been done in the domain of speech versus music discrimination, and several corpora exist which can be used for this research. A specific example is the work done by Eric Scheirer and Malcom Slaney [IEEE ICASSP, 1997, pp. 1331-1334]. The collection of the corpus is described including questionnaire design and intended and actual response characteristics, as well as the collection and annotation of pre-existing samples. The annotation of the corpus consisted of a survey tool for a subset of the corpus samples, including ratings of the clips based on a speech-song continuum, and questions on the perceptual qualities of speech and song, both generally and corresponding to particular clips in the corpus.

  14. Neuromagnetic Evidence for a Featural Distinction of English Consonants: Sensor- and Source-Space Data

    ERIC Educational Resources Information Center

    Scharinger, Mathias; Merickel, Jennifer; Riley, Joshua; Idsardi, William J.

    2011-01-01

    Speech sounds can be classified on the basis of their underlying articulators or on the basis of the acoustic characteristics resulting from particular articulatory positions. Research in speech perception suggests that distinctive features are based on both articulatory and acoustic information. In recent years, neuroelectric and neuromagnetic…

  15. Perception of Suprasegmental Speech Features via Bimodal Stimulation: Cochlear Implant on One Ear and Hearing Aid on the Other

    ERIC Educational Resources Information Center

    Most, Tova; Harel, Tamar; Shpak, Talma; Luntz, Michal

    2011-01-01

    Purpose: The purpose of the study was to evaluate the contribution of acoustic hearing to the perception of suprasegmental features by adults who use a cochlear implant (CI) and a hearing aid (HA) in opposite ears. Method: 23 adults participated in this study. Perception of suprasegmental features--intonation, syllable stress, and word…

  16. Aero-acoustic features of internal and external chamfered Hartmann whistles: A comparative study

    NASA Astrophysics Data System (ADS)

    Narayanan, S.; Srinivasan, K.; Sundararajan, T.

    2014-02-01

    The efficiency of chamfering at the mouth of Hartmann whistles in generating higher acoustic emission levels is experimentally demonstrated in this paper. The relevant parameters of the present work comprise internal and external chamfer angles (15°, 30°), cavity length, nozzle-to-cavity distance, and jet pressure ratio. The frequency and amplitude characteristics of internal- and external-chamfered Hartmann whistles are compared in detail to ascertain the role of chamfering in enhancing acoustic radiation. The higher frequencies of the internal-chamfered whistles compared with the external ones indicate that internal chamfering amplifies the resonance. It is observed that the internal-chamfered whistles exhibit higher directivity than the external-chamfered ones. Further, acoustic power and efficiency are also higher for the internal-chamfered whistles. Shadowgraph sequences reveal the variation in flow-shock oscillations as well as the spill-over features at the mouth of internal- and external-chamfered cavities. The larger mass flow, and the subsequent increase in spill-over resulting from the enlarged mouth of internal-chamfered whistles, leads to the generation of higher-intensity acoustic radiation than in the external-chamfered ones. Thus, the internal chamfer proves to be the better passive control device for augmenting sound pressure levels and acoustic efficiency in resonance cavities.

  17. Alarming features: birds use specific acoustic properties to identify heterospecific alarm calls

    PubMed Central

    Fallow, Pamela M.; Pitcher, Benjamin J.; Magrath, Robert D.

    2013-01-01

    Vertebrates that eavesdrop on heterospecific alarm calls must distinguish alarms from sounds that can safely be ignored, but the mechanisms for identifying heterospecific alarm calls are poorly understood. While vertebrates learn to identify heterospecific alarms through experience, some can also respond to unfamiliar alarm calls that are acoustically similar to conspecific alarm calls. We used synthetic calls to test the role of specific acoustic properties in alarm call identification by superb fairy-wrens, Malurus cyaneus. Individuals fled more often in response to synthetic calls with peak frequencies closer to those of conspecific calls, even if other acoustic features were dissimilar to that of fairy-wren calls. Further, they then spent more time in cover following calls that had both peak frequencies and frequency modulation rates closer to natural fairy-wren means. Thus, fairy-wrens use similarity in specific acoustic properties to identify alarms and adjust a two-stage antipredator response. Our study reveals how birds respond to heterospecific alarm calls without experience, and, together with previous work using playback of natural calls, shows that both acoustic similarity and learning are important for interspecific eavesdropping. More generally, this study reconciles contrasting views on the importance of alarm signal structure and learning in recognition of heterospecific alarms. PMID:23303539

  18. Impact of Aberrant Acoustic Properties on the Perception of Sound Quality in Electrolarynx Speech

    ERIC Educational Resources Information Center

    Meltzner, Geoffrey S.; Hillman, Robert E.

    2005-01-01

    A large percentage of patients who have undergone laryngectomy to treat advanced laryngeal cancer rely on an electrolarynx (EL) to communicate verbally. Although serviceable, EL speech is plagued by shortcomings in both sound quality and intelligibility. This study sought to better quantify the relative contributions of previously identified…

  19. The Exploitation of Subphonemic Acoustic Detail in L2 Speech Segmentation

    ERIC Educational Resources Information Center

    Shoemaker, Ellenor

    2014-01-01

    The current study addresses an aspect of second language (L2) phonological acquisition that has received little attention to date--namely, the acquisition of allophonic variation as a word boundary cue. The role of subphonemic variation in the segmentation of speech by native speakers has been indisputably demonstrated; however, the acquisition of…

  20. An acoustic key to eight languages/dialects: Factor analyses of critical-band-filtered speech

    PubMed Central

    Ueda, Kazuo; Nakajima, Yoshitaka

    2017-01-01

    The peripheral auditory system functions like a frequency analyser, often modelled as a bank of non-overlapping band-pass filters called critical bands; 20 bands are necessary for simulating frequency resolution of the ear within an ordinary frequency range of speech (up to 7,000 Hz). A far smaller number of filters seemed sufficient, however, to re-synthesise intelligible speech sentences with power fluctuations of the speech signals passing through them; nevertheless, the number and frequency ranges of the frequency bands for efficient speech communication are yet unknown. We derived four common frequency bands—covering approximately 50–540, 540–1,700, 1,700–3,300, and above 3,300 Hz—from factor analyses of spectral fluctuations in eight different spoken languages/dialects. The analyses robustly led to three factors common to all languages investigated—the low & mid-high factor related to the two separate frequency ranges of 50–540 and 1,700–3,300 Hz, the mid-low factor the range of 540–1,700 Hz, and the high factor the range above 3,300 Hz—in these different languages/dialects, suggesting a language universal. PMID:28198405
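
    A rough sketch of the analysis idea, under simplifying assumptions: band-pass the signal into critical-band-like channels, take power envelopes, and factor-analyze the envelope fluctuations across channels. The band edges, envelope downsampling, and noise input are placeholders, not the paper's procedure.

      import numpy as np
      from scipy.signal import butter, sosfiltfilt, hilbert
      from sklearn.decomposition import FactorAnalysis

      fs = 16000
      speech = np.random.default_rng(6).normal(size=fs * 5)   # noise stand-in for 5 s of speech

      # Assumed band edges (Hz), loosely spaced like critical bands.
      edges = [100, 250, 400, 630, 920, 1270, 1720, 2320, 3150, 4400, 6400]
      envelopes = []
      for lo, hi in zip(edges[:-1], edges[1:]):
          sos = butter(2, [lo, hi], btype="band", fs=fs, output="sos")
          band = sosfiltfilt(sos, speech)
          env = np.abs(hilbert(band)) ** 2                    # power envelope
          envelopes.append(env[::160])                        # downsample to ~100 Hz
      X = np.array(envelopes).T                               # frames x channels

      fa = FactorAnalysis(n_components=3, random_state=0).fit(X)
      print(fa.components_.shape)                             # (n_factors, n_channels)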

  2. Comparison of Acoustic and Kinematic Approaches to Measuring Utterance-Level Speech Variability

    ERIC Educational Resources Information Center

    Howell, Peter; Anderson, Andrew J.; Bartrip, Jon; Bailey, Eleanor

    2009-01-01

    Purpose: The spatiotemporal index (STI) is one measure of variability. As currently implemented, kinematic data are used, requiring equipment that cannot be used with some patient groups or in scanners. An experiment is reported that addressed whether STI can be extended to an audio measure of sound pressure of the speech envelope over time that…

  3. Influences of gender and anthropometric features on inspiratory inhaler acoustics and peak inspiratory flow rate.

    PubMed

    Taylor, Terence E; Holmes, Martin S; Sulaiman, Imran; Costello, Richard W; Reilly, Richard B

    2015-01-01

    Inhalers are hand-held devices used to treat chronic respiratory diseases such as asthma and chronic obstructive pulmonary disease (COPD). Medication is delivered from an inhaler to the user through an inhalation maneuver. It is unclear whether gender and anthropometric features such as age, height, weight and body mass index (BMI) influence the acoustic properties of inspiratory inhaler sounds and peak inspiratory flow rate (PIFR) in inhalers. In this study, healthy male (n=9) and female (n=7) participants were asked to inhale at an inspiratory flow rate (IFR) of 60 L/min in four commonly used inhalers (Turbuhaler™, Diskus™, Ellipta™ and Evohaler™). Ambient inspiratory sounds were recorded from the mouthpiece of each inhaler and over the trachea of each participant. Each participant's PIFR was also recorded for each of the four inhalers. Results showed that gender and anthropometric features have the potential to influence the spectral properties of ambient and tracheal inspiratory inhaler sounds. It was also observed that males achieved statistically significantly higher PIFRs in each inhaler in comparison to females (p < 0.05). Acoustic features were found to be significantly different across inhalers suggesting that acoustic features are modulated by the inhaler design and its internal resistance to airflow.

  4. Do we need STRFs for cocktail parties? On the relevance of physiologically motivated features for human speech perception derived from automatic speech recognition.

    PubMed

    Kollmeier, B; Schädler, M R René; Meyer, A; Anemüller, J; Meyer, B T

    2013-01-01

    Complex auditory features such as spectro-temporal receptive fields (STRFs) derived from the cortical auditory neurons appear to be advantageous in sound processing. However, their physiological and functional relevance is still unclear. To assess the utility of such feature processing for speech reception in noise, automatic speech recognition (ASR) performance using feature sets obtained from physiological and/or psychoacoustical data and models is compared to human performance. Time-frequency representations with a nonlinear compression are compared with standard features such as mel-scaled spectrograms. Both alternatives serve as an input to model estimators that infer spectro-temporal filters (and subsequent nonlinearity) from physiological measurements in auditory brain areas of zebra finches. Alternatively, a filter bank of 2-dimensional Gabor functions is employed, which covers a wide range of modulation frequencies in the time and frequency domain. The results indicate a clear increase in ASR robustness using complex features (modeled by Gabor functions), while the benefit from physiologically derived STRFs is limited. In all cases, the use of power-normalized spectral representations increases performance, indicating that substantial dynamic compression is advantageous for level-independent pattern recognition. The methods employed may help physiologists to look for more relevant STRFs and to better understand specific differences in estimated STRFs.
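
    The Gabor-feature idea above can be illustrated with a single spectro-temporal modulation filter applied to a log-mel spectrogram; the filter parameters and the file name "speech.wav" below are illustrative assumptions, not the authors' settings.

      # Sketch: one complex 2-D Gabor filter applied to a log-mel spectrogram.
      import numpy as np
      import librosa
      from scipy.signal import convolve2d

      def gabor_2d(omega_t, omega_f, size_t=25, size_f=15):
          """Hann-windowed complex plane wave over (time frame, mel channel)."""
          t = np.arange(size_t) - size_t // 2
          f = np.arange(size_f) - size_f // 2
          T, F = np.meshgrid(t, f)                              # shapes (size_f, size_t)
          envelope = np.outer(np.hanning(size_f), np.hanning(size_t))
          return envelope * np.exp(1j * (omega_t * T + omega_f * F))

      x, fs = librosa.load("speech.wav", sr=16000)              # placeholder file name
      mel = librosa.feature.melspectrogram(y=x, sr=fs, n_mels=40, hop_length=160)
      log_mel = np.log(mel + 1e-10)                             # 40 mel channels x frames

      g = gabor_2d(omega_t=2 * np.pi * 0.1, omega_f=2 * np.pi * 0.25)  # illustrative rates
      feature_map = np.abs(convolve2d(log_mel, g, mode="same"))        # one Gabor feature map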

  5. Acoustic correlates of inflectional morphology in the speech of children with specific language impairment and their typically developing peers.

    PubMed

    Owen, Amanda J; Goffman, Lisa

    2007-07-01

    The development of the use of the third-person singular -s in open syllable verbs in children with specific language impairment (SLI) and their typically developing peers was examined. Verbs that included overt productions of the third-person singular -s morpheme (e.g. Bobby plays ball everyday; Bear laughs when mommy buys popcorn) were contrasted with clearly bare stem contexts (e.g. Mommy, buy popcorn; I saw Bobby play ball) on both global and local measures of acoustic duration. A durational signature for verbs inflected with -s was identified separately from factors related to sentence length. These duration measures were also used to identify acoustic changes related to the omission of the -s morpheme. The omitted productions from the children with SLI were significantly longer than their correct third-person singular and bare stem productions. This result was unexpected given that the omitted productions have fewer phonemes than correctly inflected productions. Typically developing children did not show the same pattern, instead producing omitted productions that patterned most closely with bare stem forms. These results are discussed in relation to current theoretical approaches to SLI, with an emphasis on performance and speech-motor accounts.

  6. Abnormal cortical processing of the syllable rate of speech in poor readers

    PubMed Central

    Abrams, Daniel A.; Nicol, Trent; Zecker, Steven; Kraus, Nina

    2009-01-01

    Children with reading impairments have long been associated with impaired perception of rapidly presented acoustic stimuli and have recently been shown to have deficits for slower features. It is not known whether impairments for low-frequency acoustic features negatively impact processing of speech in reading-impaired individuals. Here we provide neurophysiological evidence that poor readers have impaired representation of the speech envelope, the acoustical cue that provides syllable pattern information in speech. We measured cortical-evoked potentials in response to sentence stimuli and found that good readers showed consistent right-hemisphere dominance in auditory cortex for all measures of speech envelope representation, including the precision, timing and magnitude of cortical responses. Poor readers showed abnormal patterns of cerebral asymmetry for all measures of speech envelope representation. Moreover, cortical measures of speech envelope representation predicted up to 44% of the variability in standardized reading scores and 50% in measures of phonological processing across a wide range of abilities. Findings strongly support a relationship between acoustic-level processing and higher-level language abilities, and are the first to link reading ability with cortical processing of low-frequency acoustic features in the speech signal. Results also support the hypothesis that asymmetric routing between cerebral hemispheres represents an important mechanism for temporal encoding in the human auditory system, and the need for an expansion of the temporal processing hypothesis for reading disabilities to encompass impairments for a wider range of speech features than previously acknowledged. PMID:19535580

  7. The effect of different cochlear implant microphones on acoustic hearing individuals’ binaural benefits for speech perception in noise

    PubMed Central

    Aronoff, Justin M.; Freed, Daniel J.; Fisher, Laurel M.; Pal, Ivan; Soli, Sigfrid D.

    2011-01-01

    directional microphone when the speech and masker were spatially separated, emphasizing the importance of measuring binaural benefits separately for each HRTF. Evaluation of binaural benefits indicated that binaural squelch and spatial release from masking were found for all HRTFs and binaural summation was found for all but one HRTF, although binaural summation was less robust than the other types of binaural benefits. Additionally, the results indicated that neither interaural time nor level cues dominated binaural benefits for the normal hearing participants. Conclusions This study provides a means to measure the degree to which cochlear implant microphones affect acoustic hearing with respect to speech perception in noise. It also provides measures that can be used to evaluate the independent contributions of interaural time and level cues. These measures provide tools that can aid researchers in understanding and improving binaural benefits in acoustic hearing individuals listening via cochlear implant microphones. PMID:21412155

  8. Acoustic Correlates of Emphatic Stress in Central Catalan

    ERIC Educational Resources Information Center

    Nadeu, Marianna; Hualde, Jose Ignacio

    2012-01-01

    A common feature of public speech in Catalan is the placement of prominence on lexically unstressed syllables ("emphatic stress"). This paper presents an acoustic study of radio speech data. Instances of emphatic stress were perceptually identified. Within-word comparison between vowels with emphatic stress and vowels with primary lexical stress…

  9. Agonistic sounds in the skunk clownfish Amphiprion akallopisos: size-related variation in acoustic features.

    PubMed

    Colleye, O; Frederich, B; Vandewalle, P; Casadevall, M; Parmentier, E

    2009-09-01

    Fourteen individuals of the skunk clownfish Amphiprion akallopisos of different sizes and of different sexual status (non-breeder, male or female) were analysed for four acoustic features. Dominant frequency and pulse duration were highly correlated with standard length (r = 0.97), and were not related to sex. Both the dominant frequency and pulse duration were signals conveying information related to the size of the emitter, which implies that these sound characteristics could be useful in assessing size of conspecifics.

  10. Music-induced emotions can be predicted from a combination of brain activity and acoustic features.

    PubMed

    Daly, Ian; Williams, Duncan; Hallowell, James; Hwang, Faustina; Kirke, Alexis; Malik, Asad; Weaver, James; Miranda, Eduardo; Nasuto, Slawomir J

    2015-12-01

    It is widely acknowledged that music can communicate and induce a wide range of emotions in the listener. However, music is a highly complex audio signal composed of a wide range of complex time- and frequency-varying components. Additionally, music-induced emotions are known to differ greatly between listeners. Therefore, it is not immediately clear what emotions will be induced in a given individual by a piece of music. We attempt to predict the music-induced emotional response in a listener by measuring the activity in the listener's electroencephalogram (EEG). We combine these measures with acoustic descriptors of the music, an approach that allows us to consider music as a complex set of time-varying acoustic features, independently of any specific music theory. Regression models are found which allow us to predict the music-induced emotions of our participants with a correlation between the actual and predicted responses of up to r = 0.234, p < 0.001. This regression fit suggests that over 20% of the variance of the participants' music-induced emotions can be predicted by their neural activity and the properties of the music. Given the large amount of noise, non-stationarity, and non-linearity in both EEG and music, this is an encouraging result. Additionally, the combination of measures of brain activity and acoustic features describing the music played to our participants allows us to predict music-induced emotions with significantly higher accuracies than either feature type alone (p < 0.01).
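
    A minimal sketch of the combined-feature regression described above, with placeholder arrays standing in for the EEG features, acoustic descriptors, and emotion ratings; ridge regression is used here as a stand-in for the authors' regression models.

      # Sketch: predict emotion ratings from combined EEG + acoustic features.
      import numpy as np
      from scipy.stats import pearsonr
      from sklearn.linear_model import Ridge
      from sklearn.model_selection import cross_val_predict
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(0)
      n_trials = 120
      eeg_feats = rng.standard_normal((n_trials, 32))     # placeholder EEG features
      audio_feats = rng.standard_normal((n_trials, 10))   # placeholder acoustic descriptors
      ratings = rng.standard_normal(n_trials)             # placeholder valence/arousal ratings

      X = np.hstack([eeg_feats, audio_feats])             # combined feature set
      model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
      pred = cross_val_predict(model, X, ratings, cv=5)
      r, p = pearsonr(ratings, pred)
      print(f"actual vs. predicted ratings: r = {r:.3f}, p = {p:.3g}")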

  11. Depression Diagnoses and Fundamental Frequency-Based Acoustic Cues in Maternal Infant-Directed Speech

    ERIC Educational Resources Information Center

    Porritt, Laura L.; Zinser, Michael C.; Bachorowski, Jo-Anne; Kaplan, Peter S.

    2014-01-01

    F0-based acoustic measures were extracted from a brief, sentence-final target word spoken during structured play interactions between mothers and their 3- to 14-month-old infants and were analyzed based on demographic variables and DSM-IV Axis-I clinical diagnoses and their common modifiers. F0 range (ΔF0) was…

  12. Changes in Speech Production in a Child with a Cochlear Implant: Acoustic and Kinematic Evidence.

    ERIC Educational Resources Information Center

    Goffman, Lisa; Ertmer, David J.; Erdle, Christa

    2002-01-01

    A method is presented for examining change in motor patterns used to produce linguistic contrasts. In this case study, the method is applied to a child who experienced hearing loss at age 3 and received a multi-channel cochlear implant at 7. Post-implant, acoustic durations showed a maturational change. (Contains references.) (Author/CR)

  13. Acoustic features of male baboon loud calls: Influences of context, age, and individuality

    NASA Astrophysics Data System (ADS)

    Fischer, Julia; Hammerschmidt, Kurt; Cheney, Dorothy L.; Seyfarth, Robert M.

    2002-03-01

    The acoustic structure of loud calls ("wahoos") recorded from free-ranging male baboons (Papio cynocephalus ursinus) in the Moremi Game Reserve, Botswana, was examined for differences between and within contexts, using calls given in response to predators (alarm wahoos), during male contests (contest wahoos), and when a male had become separated from the group (contact wahoos). Calls were recorded from adolescent, subadult, and adult males. In addition, male alarm calls were compared with those recorded from females. Despite their superficial acoustic similarity, the analysis revealed a number of significant differences between alarm, contest, and contact wahoos. Contest wahoos are given at a much higher rate, exhibit lower frequency characteristics, have a longer "hoo" duration, and a relatively louder "hoo" portion than alarm wahoos. Contact wahoos are acoustically similar to contest wahoos, but are given at a much lower rate. Both alarm and contest wahoos also exhibit significant differences among individuals. Some of the acoustic features that vary in relation to age and sex presumably reflect differences in body size, whereas others are possibly related to male stamina and endurance. The finding that calls serving markedly different functions constitute variants of the same general call type suggests that the vocal production in nonhuman primates is evolutionarily constrained.

  14. No evidence of somatotopic place of articulation feature mapping in motor cortex during passive speech perception.

    PubMed

    Arsenault, Jessica S; Buchsbaum, Bradley R

    2016-08-01

    The motor theory of speech perception has experienced a recent revival due to a number of studies implicating the motor system during speech perception. In a key study, Pulvermüller et al. (2006) showed that premotor/motor cortex differentially responds to the passive auditory perception of lip and tongue speech sounds. However, no study has yet attempted to replicate this important finding from nearly a decade ago. The objective of the current study was to replicate the principal finding of Pulvermüller et al. (2006) and generalize it to a larger set of speech tokens while applying a more powerful statistical approach using multivariate pattern analysis (MVPA). Participants performed an articulatory localizer as well as a speech perception task where they passively listened to a set of eight syllables while undergoing fMRI. Both univariate and multivariate analyses failed to find evidence for somatotopic coding in motor or premotor cortex during speech perception. Positive evidence for the null hypothesis was further confirmed by Bayesian analyses. Results consistently show that while the lip and tongue areas of the motor cortex are sensitive to movements of the articulators, they do not appear to preferentially respond to labial and alveolar speech sounds during passive speech perception.

  15. How Well Can Children Recognize Speech Features in Spectrograms? Comparisons by Age and Hearing Status

    ERIC Educational Resources Information Center

    Ertmer, David J.

    2004-01-01

    Real-time spectrographic displays (SDs) have been used in speech training for more than 30 years with adults and children who have severe and profound hearing impairments. Despite positive outcomes from treatment studies, concerns remain that the complex and abstract nature of spectrograms may make these speech training aids unsuitable for use…

  16. Is the Linguistic Content of Speech Less Salient than Its Perceptual Features in Autism?

    ERIC Educational Resources Information Center

    Jarvinen-Pasley, Anna; Pasley, John; Heaton, Pamela

    2008-01-01

    Open-ended tasks are rarely used to investigate cognition in autism. No known studies have directly examined whether increased attention to the perceptual level of speech in autism might contribute to a reduced tendency to process language meaningfully. The present study investigated linguistic versus perceptual speech processing preferences.…

  17. [Ontogenetic features of psychophysiological mechanisms of perception of speech emotional component in musically gifted children].

    PubMed

    Dmitrieva, E S; Gel'man, V Ia; Zaĭtseva, K A; Orlov, A M

    2004-01-01

    Cerebral mechanisms of musical abilities were explored in musically gifted children. For this purpose, psychophysiological characteristics of the perception of emotional speech information were experimentally studied in samples of gifted and ordinary children. Forty-six schoolchildren and forty-eight musicians in three age groups (7-10, 11-13 and 14-17 years old) participated in the study. In each experimental session, a test sentence was presented to the subject through headphones with two emotional intonations (joy and anger) and without emotional expression, and the subject had to recognize the type of emotion; the answers were recorded. Analysis of variance revealed age- and gender-related features of emotion recognition: boy musicians were 4-6 years ahead of schoolchildren of the same age in the development of emotion-recognition mechanisms, whereas girl musicians were 1-3 years ahead. In girls, musical education shifted the predominant activity for emotional perception toward the left hemisphere; in boys, on the contrary, the initially distinct dominance of the left hemisphere was not retained in the course of further education.

  18. Do acoustic features of lion, Panthera leo, roars reflect sex and male condition?

    PubMed

    Pfefferle, Dana; West, Peyton M; Grinnell, Jon; Packer, Craig; Fischer, Julia

    2007-06-01

    Long-distance calls function to regulate intergroup spacing, attract mating partners, and/or repel competitors. Therefore, they may provide information not only about the sex (if both sexes are calling) but also about the condition of the caller. This paper provides a description of the acoustic features of roars recorded from 18 male and 6 female lions (Panthera leo) living in the Serengeti National Park, Tanzania. After analyzing whether these roars differ between the sexes, we tested whether male roars may function as indicators of fighting ability or condition; call characteristics were therefore tested for relations to anatomical features such as size, mane color, or mane length. The call characteristics included acoustic parameters that had previously been implicated as indicators of size and fighting ability, e.g., call length, fundamental frequency, and peak frequency. The analysis revealed differences in relation to sex, which were entirely explained by variation in body size. No evidence was found that acoustic variables were related to male condition, indicating that sexual selection might only be a weak force modulating the lion's roar. Instead, lion roars may have mainly been selected to effectively advertise territorial boundaries.

  19. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    ERIC Educational Resources Information Center

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  20. Prenatal features of Pena-Shokeir sequence with atypical response to acoustic stimulation.

    PubMed

    Pittyanont, Sirida; Jatavan, Phudit; Suwansirikul, Songkiat; Tongsong, Theera

    2016-09-01

    A fetal sonographic screening examination performed at 23 weeks showed polyhydramnios, micrognathia, fixed postures of all long bones, but no movement and no breathing. The fetus showed fetal heart rate acceleration but no movement when acoustic stimulation was applied with artificial larynx. All these findings persisted on serial examinations. The neonate was stillborn at 37 weeks and a final diagnosis of Pena-Shokeir sequence was made. In addition to typical sonographic features of Pena-Shokeir sequence, fetal heart rate accelerations with no movement in response to acoustic stimulation suggests that peripheral myopathy may possibly play an important role in the pathogenesis of the disease. © 2016 Wiley Periodicals, Inc. J Clin Ultrasound 44:459-462, 2016.

  1. Effect of train type on annoyance and acoustic features of the rolling noise.

    PubMed

    Kasess, Christian H; Noll, Anton; Majdak, Piotr; Waubke, Holger

    2013-08-01

    This study investigated the annoyance associated with the rolling noise of different railway stock. Passbys of nine train types (passenger and freight trains) equipped with different braking systems were recorded. Acoustic features showed a clear distinction of the braking system with the A-weighted energy equivalent sound level (LAeq) showing a difference in the range of 10 dB between cast-iron braked trains and trains with disk or K-block brakes. Further, annoyance was evaluated in a psychoacoustic experiment where listeners rated the relative annoyance of the rolling noise for the different train types. Stimuli with and without the original LAeq differences were tested. For the original LAeq differences, the braking system significantly affected the annoyance with cast-iron brakes being most annoying, most likely as a consequence of the increased wheel roughness causing an increased LAeq. Contribution of the acoustic features to the annoyance was investigated revealing that the LAeq explained up to 94% of the variance. For the stimuli without differences in the LAeq, cast-iron braked train types were significantly less annoying and the spectral features explained up to 60% of the variance in the annoyance. The effect of these spectral features on the annoyance of the rolling noise is discussed.
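
    A toy sketch of the variance-explained analysis mentioned above: regress annoyance ratings on the A-weighted equivalent level and report R²; both arrays below are synthetic placeholders, not the study's data.

      # Sketch: how much annoyance variance a single feature (LAeq) explains.
      import numpy as np
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(1)
      laeq = rng.uniform(60.0, 90.0, size=45)                   # placeholder LAeq values in dB
      annoyance = 0.1 * laeq + rng.normal(scale=0.5, size=45)   # placeholder ratings

      X = laeq.reshape(-1, 1)
      r2 = LinearRegression().fit(X, annoyance).score(X, annoyance)
      print(f"LAeq explains {100 * r2:.1f}% of the annoyance variance in this toy data")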

  2. A Robust Approach For Acoustic Noise Suppression In Speech Using ANFIS

    NASA Astrophysics Data System (ADS)

    Martinek, Radek; Kelnar, Michal; Vanus, Jan; Bilik, Petr; Zidek, Jan

    2015-11-01

    The authors of this article deal with the implementation of a combination of fuzzy-system and artificial-intelligence techniques in the application area of non-linear noise and interference suppression. The structure used is called an Adaptive Neuro Fuzzy Inference System (ANFIS). This system finds practical use mainly in audio telephone (mobile) communication in noisy environments (transport, production halls, sports matches, etc.). Experimental methods based on the two-input adaptive noise cancellation concept are clearly outlined. Within the experiments carried out, the authors created, based on the ANFIS structure, a comprehensive system for adaptive suppression of unwanted background interference that occurs in audio communication and degrades the audio signal. The system designed has been tested on real voice signals. This article presents an investigation and comparison of three distinct approaches to noise cancellation in speech: LMS (least mean squares) and RLS (recursive least squares) adaptive filtering, and ANFIS. A careful review of the literature indicated the importance of non-linear adaptive algorithms over linear ones in noise cancellation. It was concluded that the ANFIS approach had the best overall performance, as it efficiently cancelled noise even in highly noise-degraded speech. Subjective tests were used to analyse the comparative performance of the approaches, while objective tests were used to validate it. Implementation of the algorithms was carried out in Matlab to justify the claims and determine their relative performance.
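
    For reference, the two-input adaptive noise cancellation concept mentioned above looks roughly like the LMS baseline sketched below; the signals, filter length, and step size are placeholders, and the ANFIS structure itself is not reproduced.

      # Sketch: two-input LMS noise canceller (one of the baselines compared with ANFIS).
      # d = primary input (speech + noise), x = correlated noise reference.
      import numpy as np

      def lms_cancel(d, x, n_taps=32, mu=0.01):
          """Return the error signal e, i.e. the enhanced speech estimate."""
          w = np.zeros(n_taps)
          e = np.zeros_like(d, dtype=float)
          for n in range(n_taps, len(d)):
              x_vec = x[n - n_taps:n][::-1]      # most recent reference samples
              y = w @ x_vec                      # estimate of the noise in d[n]
              e[n] = d[n] - y                    # noise-cancelled output
              w += 2 * mu * e[n] * x_vec         # LMS weight update
          return e

      # Toy usage with synthetic signals:
      rng = np.random.default_rng(2)
      noise = rng.standard_normal(8000)
      speech = np.sin(2 * np.pi * 0.01 * np.arange(8000))             # stand-in for speech
      d = speech + np.convolve(noise, [0.6, 0.3, 0.1], mode="same")   # primary microphone
      enhanced = lms_cancel(d, noise)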

  3. Speech acquisition in older nonverbal individuals with autism: a review of features, methods, and prognosis.

    PubMed

    Pickett, Erin; Pullara, Olivia; O'Grady, Jessica; Gordon, Barry

    2009-03-01

    Individuals with autism often fail to develop useful speech. If they have not done so by age 5, the prognosis for future development has been thought to be poor. However, some cases of later development of speech have been reported. To quantify and document the nature of later speech development and the factors that might be important for prognosis, we reviewed the extant literature. We searched both manually and electronically, examining all literature with at least an English-language abstract, through March 2008. The search identified a total of 167 individuals with autism who reportedly acquired speech at age 5 or older. Most of the cases of reported late speech development occurred in the younger age groups; no case older than 13 was reported. Behavioral modification was the most frequently reported training program used, although there was a wide range of interventions reported to be associated with late speech development. Given the underreporting of such cases in the literature, and the likelihood that more intensive and more focused training might be more successful, the prognosis for late development of speech in such individuals may now be better than was historically thought to be the case.

  4. Speech Modification by a Deaf Child through Dynamic Orometric Modeling and Feedback.

    ERIC Educational Resources Information Center

    Fletcher, Samuel G.; Hasegawa, Akira

    1983-01-01

    A three-and-one-half-year-old profoundly deaf girl, whose physiologic, acoustic, and phonetic data indicated poor speech production, rapidly learned goal articulation gestures (positional and timing features of speech) after visual articulatory modeling and feedback on tongue position with a microprocessor-based instrument and video display.…

  5. Is the linguistic content of speech less salient than its perceptual features in autism?

    PubMed

    Järvinen-Pasley, Anna; Pasley, John; Heaton, Pamela

    2008-02-01

    Open-ended tasks are rarely used to investigate cognition in autism. No known studies have directly examined whether increased attention to the perceptual level of speech in autism might contribute to a reduced tendency to process language meaningfully. The present study investigated linguistic versus perceptual speech processing preferences. Children with autism and controls were tested on a quasi-open-format paradigm, in which speech stimuli contained competing linguistic and perceptual information, and could be processed at either level. Relative to controls, children with autism exhibited superior perceptual processing of speech. However, whilst their tendency to preferentially process linguistic rather than perceptual information was weaker than that of controls, it was nevertheless their primary processing mode. Implications for language acquisition in autism are discussed.

  6. Intra- and Inter-database Study for Arabic, English, and German Databases: Do Conventional Speech Features Detect Voice Pathology?

    PubMed

    Ali, Zulfiqar; Alsulaiman, Mansour; Muhammad, Ghulam; Elamvazuthi, Irraivan; Al-Nasheri, Ahmed; Mesallam, Tamer A; Farahat, Mohamed; Malki, Khalid H

    2016-10-10

    A large population around the world has voice complications. Various approaches for subjective and objective evaluations have been suggested in the literature. The subjective approach strongly depends on the experience and area of expertise of a clinician, and human error cannot be neglected. On the other hand, the objective or automatic approach is noninvasive. Automatically developed systems can provide complementary information that may be helpful for a clinician in the early screening of a voice disorder. At the same time, automatic systems can be deployed in remote areas where a general practitioner can use them and may refer the patient to a specialist to avoid complications that may be life threatening. Many automatic systems for disorder detection have been developed by applying different types of conventional speech features such as the linear prediction coefficients, linear prediction cepstral coefficients, and Mel-frequency cepstral coefficients (MFCCs). This study aims to ascertain whether conventional speech features detect voice pathology reliably, and whether they can be correlated with voice quality. To investigate this, an automatic detection system based on MFCC was developed, and three different voice disorder databases were used in this study. The experimental results suggest that the accuracy of the MFCC-based system varies from database to database. The detection rate for the intra-database experiments ranges from 72% to 95%, and that for the inter-database experiments from 47% to 82%. The results suggest that conventional speech features are not correlated with voice quality, and hence are not reliable for pathology detection.
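
    A minimal sketch of an MFCC-plus-classifier pipeline of the kind described; the file names, labels, and SVM settings are placeholders, and librosa's MFCC implementation stands in for whatever front end the authors used.

      # Sketch: MFCC statistics per recording, fed to an SVM disorder detector.
      import numpy as np
      import librosa
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      def mfcc_stats(path, sr=16000, n_mfcc=13):
          """Mean and standard deviation of each MFCC over the whole recording."""
          y, _ = librosa.load(path, sr=sr)
          mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
          return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

      wav_paths = ["normal_01.wav", "patho_01.wav"]   # placeholder file names
      labels = np.array([0, 1])                       # 0 = normal, 1 = pathological (placeholder)

      # X = np.vstack([mfcc_stats(p) for p in wav_paths])
      # clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
      # print(cross_val_score(clf, X, labels, cv=5).mean())   # cross-database runs would differ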

  7. Nonlinear spectro-temporal features based on a cochlear model for automatic speech recognition in a noisy situation.

    PubMed

    Choi, Yong-Sun; Lee, Soo-Young

    2013-09-01

    A nonlinear speech feature extraction algorithm was developed by modeling human cochlear functions, and demonstrated as a noise-robust front-end for speech recognition systems. The algorithm was based on a model of the Organ of Corti in the human cochlea with such features as the basilar membrane (BM), outer hair cells (OHCs), and inner hair cells (IHCs). Frequency-dependent nonlinear compression and amplification of OHCs were modeled by lateral inhibition to enhance spectral contrasts. In particular, the compression coefficients had a frequency dependency based on psychoacoustic evidence. Spectral subtraction and temporal adaptation were applied in the time-frame domain. With long-term and short-term adaptation characteristics, these factors remove stationary or slowly varying components and amplify temporal changes such as onsets or offsets. The proposed features were evaluated with a noisy speech database and showed better performance than the baseline methods such as mel-frequency cepstral coefficients (MFCCs) and RASTA-PLP in unknown noisy conditions.
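
    Of the stages named above, the spectral-subtraction step is the easiest to illustrate; the sketch below estimates noise from the first few frames, which is an assumption of this sketch rather than the authors' procedure, and assumes x and fs are already defined.

      # Sketch: magnitude spectral subtraction in the STFT (time-frame) domain.
      import numpy as np
      from scipy.signal import stft, istft

      def spectral_subtraction(x, fs, n_noise_frames=10, floor=0.05):
          f, t, X = stft(x, fs=fs, nperseg=512)
          mag, phase = np.abs(X), np.angle(X)
          noise_mag = mag[:, :n_noise_frames].mean(axis=1, keepdims=True)   # noise estimate
          clean_mag = np.maximum(mag - noise_mag, floor * noise_mag)        # spectral floor
          _, y = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=512)
          return y

      # y = spectral_subtraction(x, fs)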

  8. Speech discrimination after early exposure to pulsed-noise or speech.

    PubMed

    Ranasinghe, Kamalini G; Carraway, Ryan S; Borland, Michael S; Moreno, Nicole A; Hanacik, Elizabeth A; Miller, Robert S; Kilgard, Michael P

    2012-07-01

    Early experience of structured inputs and complex sound features generate lasting changes in tonotopy and receptive field properties of primary auditory cortex (A1). In this study we tested whether these changes are severe enough to alter neural representations and behavioral discrimination of speech. We exposed two groups of rat pups during the critical period of auditory development to pulsed-noise or speech. Both groups of rats were trained to discriminate speech sounds when they were young adults, and anesthetized neural responses were recorded from A1. The representation of speech in A1 and behavioral discrimination of speech remained robust to altered spectral and temporal characteristics of A1 neurons after pulsed-noise exposure. Exposure to passive speech during early development provided no added advantage in speech sound processing. Speech training increased A1 neuronal firing rate for speech stimuli in naïve rats, but did not increase responses in rats that experienced early exposure to pulsed-noise or speech. Our results suggest that speech sound processing is resistant to changes in simple neural response properties caused by manipulating early acoustic environment.

  9. Real-Time Speech/Music Classification With a Hierarchical Oblique Decision Tree

    DTIC Science & Technology

    2008-04-01

    Jun Wang, Qiong Wu, Haojiang Deng, Qin Yan (Institute of Acoustics). This paper proposes real-time speech/music classification with a hierarchical oblique decision tree. A set of discrimination features in the frequency domain are selected … [approaches that] handle signals without discrimination cannot work properly in the presence of multimedia signals.

  10. Recognition of speech spectrograms.

    PubMed

    Greene, B G; Pisoni, D B; Carrell, T D

    1984-07-01

    The performance of eight naive observers in learning to identify speech spectrograms was studied over a 2-month period. Single tokens from a 50-word phonetically balanced (PB) list were recorded by several talkers and displayed on a Spectraphonics Speech Spectrographic Display system. Identification testing occurred immediately after daily training sessions. After approximately 20 h of training, naive subjects correctly identified the 50 PB words from a single talker over 95% of the time. Generalization tests with the same words were then carried out with different tokens from the original talker, new tokens from another male talker, a female talker, and finally, a synthetic talker. The generalization results for these talkers showed recognition performance at 91%, 76%, 76%, and 48%, respectively. Finally, generalization tests with a novel set of PB words produced by the original talker were also carried out to examine in detail the perceptual strategies and visual features that subjects abstracted from the training set. Our results demonstrate that even without formal training in phonetics or acoustics naive observers can learn to identify visual displays of speech at very high levels of accuracy. Analysis of subjects' performance in a verbal protocol task demonstrated that they rely on salient visual correlates of many phonetic features in speech.

  11. Depression Diagnoses and Fundamental Frequency-Based Acoustic Cues in Maternal Infant-Directed Speech

    PubMed Central

    Porritt, Laura L.; Zinser, Michael C.; Bachorowski, Jo-Anne; Kaplan, Peter S.

    2013-01-01

    F0-based acoustic measures were extracted from a brief, sentence-final target word spoken during structured play interactions between mothers and their 3- to 14-month-old infants, and were analyzed based on demographic variables and DSM-IV Axis-I clinical diagnoses and their common modifiers. F0 range (ΔF0) was negatively correlated with infant age and number of children. ΔF0 was significantly smaller in clinically depressed mothers and mothers diagnosed with depression in partial remission, relative to non-depressed mothers, mothers diagnosed with depression in full remission, and those diagnosed with depressive disorder not otherwise specified. ΔF0 was significantly lower in mothers experiencing their first major depressive episode relative to mothers with recurrent depression. Deficits in ΔF0 were specific to diagnosed clinical depression, and were not well predicted by elevated self-report scores only, or by diagnosed anxiety disorders. Mothers with higher ΔF0 had infants with reportedly larger productive vocabularies, but depression was unrelated to vocabulary development. Implications for cognitive-linguistic development are discussed. PMID:24489521

  12. Temporal modulations in speech and music.

    PubMed

    Ding, Nai; Patel, Aniruddh D; Chen, Lin; Butler, Henry; Luo, Cheng; Poeppel, David

    2017-02-14

    Speech and music have structured rhythms. Here we discuss a major acoustic correlate of spoken and musical rhythms, the slow (0.25-32 Hz) temporal modulations in sound intensity, and compare the modulation properties of speech and music. We analyze these modulations using over 25 h of speech and over 39 h of recordings of Western music. We show that the speech modulation spectrum is highly consistent across 9 languages (including languages with typologically different rhythmic characteristics). A different, but similarly consistent modulation spectrum is observed for music, including classical music played by single instruments of different types, symphonic, jazz, and rock. The temporal modulations of speech and music show broad but well-separated peaks around 5 and 2 Hz, respectively. These acoustically dominant time scales may be intrinsic features of speech and music, a possibility which should be investigated using more culturally diverse samples in each domain. Distinct modulation timescales for speech and music could facilitate their perceptual analysis and its neural processing.
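
    A sketch of one common way to measure these slow modulations: take the broadband Hilbert envelope, downsample it, and examine its power spectrum between 0.25 and 32 Hz; this recipe is an assumption, not necessarily the exact analysis used above, and x and fs are assumed to be defined.

      # Sketch: slow temporal modulation spectrum of a recording via its envelope.
      import numpy as np
      from scipy.signal import hilbert, resample_poly

      def modulation_spectrum(x, fs, env_fs=64):
          env = np.abs(hilbert(x))                   # broadband amplitude envelope
          env = resample_poly(env, env_fs, fs)       # downsample envelope (fs must be an int)
          env = env - env.mean()
          spec = np.abs(np.fft.rfft(env * np.hanning(len(env)))) ** 2
          freqs = np.fft.rfftfreq(len(env), d=1.0 / env_fs)
          keep = (freqs >= 0.25) & (freqs <= 32.0)
          return freqs[keep], spec[keep]

      # freqs, spec = modulation_spectrum(x, fs)
      # Speech should show a broad peak near 5 Hz, music near 2 Hz (see abstract above).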

  13. Feature Migration in Time: Reflection of Selective Attention on Speech Errors

    ERIC Educational Resources Information Center

    Nozari, Nazbanou; Dell, Gary S.

    2012-01-01

    This article describes an initial study of the effect of focused attention on phonological speech errors. In 3 experiments, participants recited 4-word tongue twisters and focused attention on 1 (or none) of the words. The attended word was singled out differently in each experiment; participants were under instructions to avoid errors on the…

  14. Automatic Speech Recognition Based on Electromyographic Biosignals

    NASA Astrophysics Data System (ADS)

    Jou, Szu-Chen Stan; Schultz, Tanja

    This paper presents our studies of automatic speech recognition based on electromyographic biosignals captured from the articulatory muscles in the face using surface electrodes. We develop a phone-based speech recognizer and describe how the performance of this recognizer improves by carefully designing and tailoring the extraction of relevant speech features toward electromyographic signals. Our experimental design includes the collection of audibly spoken speech simultaneously recorded as acoustic data using a close-speaking microphone and as electromyographic signals using electrodes. Our experiments indicate that electromyographic signals precede the acoustic signal by about 0.05-0.06 seconds. Furthermore, we introduce articulatory feature classifiers, which have recently been shown to improve classical speech recognition significantly. We show that the classification accuracy of articulatory features clearly benefits from the tailored feature extraction. Finally, these classifiers are integrated into the overall decoding framework applying a stream architecture. Our final system achieves a word error rate of 29.9% on a 100-word recognition task.

  15. Neural Oscillations Carry Speech Rhythm through to Comprehension

    PubMed Central

    Peelle, Jonathan E.; Davis, Matthew H.

    2012-01-01

    A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners’ processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging – particularly electroencephalography (EEG) and magnetoencephalography (MEG) – point to phase locking by ongoing cortical oscillations to low-frequency information (~4–8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain. PMID:22973251

  16. [A comparative study of pathological voice based on traditional acoustic characteristics and nonlinear features].

    PubMed

    Gan, Deying; Hu, Weiping; Zhao, Bingxin

    2014-10-01

    By analyzing the mechanism of pronunciation, traditional acoustic parameters, including fundamental frequency, Mel frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), frequency perturbation and amplitude perturbation, and nonlinear characteristic parameters, including entropy (sample entropy, fuzzy entropy, multi-scale entropy), box-counting dimension, intercept and the Hurst exponent, are extracted as feature vectors for identification of pathological voice. Seventy-eight normal voice samples and 73 pathological voice samples for /a/, and 78 normal samples and 80 pathological samples for /i/, are recognized based on a support vector machine (SVM). The results showed that, compared with traditional acoustic parameters, nonlinear characteristic parameters could be used effectively to distinguish between healthy and pathological voices, and the recognition rates for /a/ were all higher than those for /i/ except for multi-scale entropy. This is why the /a/ sound is widely used in related research for identification of pathological voices. Adopting multi-scale entropy for /i/ could yield a higher recognition rate than /a/ between healthy and pathological samples, which may provide some useful inspiration for evaluating vocal compensatory function.
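
    Among the nonlinear features listed, sample entropy is simple enough to sketch directly; the embedding dimension m = 2 and tolerance r = 0.2·SD below are conventional defaults, not necessarily the settings used in the study.

      # Sketch: sample entropy of a 1-D signal (higher = less regular).
      import numpy as np

      def sample_entropy(x, m=2, r=None):
          x = np.asarray(x, dtype=float)
          if r is None:
              r = 0.2 * np.std(x)
          N = len(x)

          def count_matches(dim):
              templates = np.array([x[i:i + dim] for i in range(N - m)])
              count = 0
              for i in range(len(templates)):
                  dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
                  count += np.sum(dist <= r)
              return count

          B = count_matches(m)       # template matches of length m
          A = count_matches(m + 1)   # template matches of length m + 1
          return -np.log(A / B) if A > 0 and B > 0 else np.inf

      # Toy check: a pure tone is far more regular than white noise.
      t = np.arange(2000) / 1000.0
      print(sample_entropy(np.sin(2 * np.pi * 5 * t)))                       # low
      print(sample_entropy(np.random.default_rng(3).standard_normal(2000)))  # high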

  17. Neurophysiological evidence that musical training influences the recruitment of right hemispheric homologues for speech perception.

    PubMed

    Jantzen, McNeel G; Howe, Bradley M; Jantzen, Kelly J

    2014-01-01

    Musicians have a more accurate temporal and tonal representation of auditory stimuli than their non-musician counterparts (Musacchia et al., 2007; Parbery-Clark et al., 2009a; Zendel and Alain, 2009; Kraus and Chandrasekaran, 2010). Musicians who are adept at the production and perception of music are also more sensitive to key acoustic features of speech such as voice onset timing and pitch. Together, these data suggest that musical training may enhance the processing of acoustic information for speech sounds. In the current study, we sought to provide neural evidence that musicians process speech and music in a similar way. We hypothesized that for musicians, right hemisphere areas traditionally associated with music are also engaged for the processing of speech sounds. In contrast we predicted that in non-musicians processing of speech sounds would be localized to traditional left hemisphere language areas. Speech stimuli differing in voice onset time were presented using a dichotic listening paradigm. Subjects either indicated aural location for a specified speech sound or identified a specific speech sound from a directed aural location. Musical training effects and organization of acoustic features were reflected by activity in source generators of the P50. This included greater activation of right middle temporal gyrus and superior temporal gyrus in musicians. The findings demonstrate recruitment of right hemisphere in musicians for discriminating speech sounds and a putative broadening of their language network. Musicians appear to have an increased sensitivity to acoustic features and enhanced selective attention to temporal features of speech that is facilitated by musical training and supported, in part, by right hemisphere homologues of established speech processing regions of the brain.

  18. Abnormal pattern of cortical speech feature discrimination in 6-year-old children at risk for dyslexia.

    PubMed

    Lovio, Riikka; Näätänen, Risto; Kujala, Teija

    2010-06-04

    The present study aimed to determine whether speech sound encoding and discrimination are affected in 6-year-old children with an elevated risk for dyslexia. Their event-related potentials (ERPs) for syllables and syllable changes critical in speech perception and language development (vowel, vowel-duration, consonant, frequency (F0), and intensity changes) were compared with those of children without a dyslexia risk. ERPs were recorded with a new linguistic multi-feature paradigm which enables one to assess the discrimination of five features in 20 min. Also, an oddball condition with vowel and vowel-duration deviants was included. The amplitudes of the P1 response elicited by the standard stimuli were smaller in the at-risk group. Furthermore, the amplitudes of the mismatch negativity (MMN) were smaller for the vowel, vowel-duration, consonant, and intensity deviants in children at risk for dyslexia. These results are consistent with earlier studies reporting auditory processing difficulties in children at risk for dyslexia and diagnosed dyslexia. However, the current study, enabling the recording of MMN for multiple sound features, suggests the presence of widespread auditory difficulties in children at risk for dyslexia.

  19. Speech masking and cancelling and voice obscuration

    DOEpatents

    Holzrichter, John F.

    2013-09-10

    A non-acoustic sensor is used to measure a user's speech and then broadcasts an obscuring acoustic signal diminishing the user's vocal acoustic output intensity and/or distorting the voice sounds making them unintelligible to persons nearby. The non-acoustic sensor is positioned proximate or contacting a user's neck or head skin tissue for sensing speech production information.

  20. Early recognition of speech

    PubMed Central

    Remez, Robert E; Thomas, Emily F

    2013-01-01

    Classic research on the perception of speech sought to identify minimal acoustic correlates of each consonant and vowel. In explaining perception, this view designated momentary components of an acoustic spectrum as cues to the recognition of elementary phonemes. This conceptualization of speech perception is untenable given the findings of phonetic sensitivity to modulation independent of the acoustic and auditory form of the carrier. The empirical key is provided by studies of the perceptual organization of speech, a low-level integrative function that finds and follows the sensory effects of speech amid concurrent events. These projects have shown that the perceptual organization of speech is keyed to modulation; fast; unlearned; nonsymbolic; indifferent to short-term auditory properties; and organization requires attention. The ineluctably multisensory nature of speech perception also imposes conditions that distinguish language among cognitive systems. WIREs Cogn Sci 2013, 4:213–223. doi: 10.1002/wcs.1213 PMID:23926454

  1. Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments

    DTIC Science & Technology

    2010-01-01

    … nearing 90% accuracy. The data and tools have been made available to the research community with the goal of enabling richer text analysis of Twitter and related social media data sets.

  2. Tracing the emergence of categorical speech perception in the human auditory system.

    PubMed

    Bidelman, Gavin M; Moreno, Sylvain; Alain, Claude

    2013-10-01

    Speech perception requires the effortless mapping from smooth, seemingly continuous changes in sound features into discrete perceptual units, a conversion exemplified in the phenomenon of categorical perception. Explaining how/when the human brain performs this acoustic-phonetic transformation remains an elusive problem in current models and theories of speech perception. In previous attempts to decipher the neural basis of speech perception, it is often unclear whether the alleged brain correlates reflect an underlying percept or merely changes in neural activity that covary with parameters of the stimulus. Here, we recorded neuroelectric activity generated at both cortical and subcortical levels of the auditory pathway elicited by a speech vowel continuum whose percept varied categorically from /u/ to /a/. This integrative approach allows us to characterize how various auditory structures code, transform, and ultimately render the perception of speech material as well as dissociate brain responses reflecting changes in stimulus acoustics from those that index true internalized percepts. We find that activity from the brainstem mirrors properties of the speech waveform with remarkable fidelity, reflecting progressive changes in speech acoustics but not the discrete phonetic classes reported behaviorally. In comparison, patterns of late cortical evoked activity contain information reflecting distinct perceptual categories and predict the abstract phonetic speech boundaries heard by listeners. Our findings demonstrate a critical transformation in neural speech representations between brainstem and early auditory cortex analogous to an acoustic-phonetic mapping necessary to generate categorical speech percepts. Analytic modeling demonstrates that a simple nonlinearity accounts for the transformation between early (subcortical) brain activity and subsequent cortical/behavioral responses to speech (>150-200 ms) thereby describing a plausible mechanism by which the

  3. Perception of Sentence Stress in Speech Correlates with the Temporal Unpredictability of Prosodic Features

    ERIC Educational Resources Information Center

    Kakouros, Sofoklis; Räsänen, Okko

    2016-01-01

    Numerous studies have examined the acoustic correlates of sentential stress and its underlying linguistic functionality. However, the mechanism that connects stress cues to the listener's attentional processing has remained unclear. Also, the learnability versus innateness of stress perception has not been widely discussed. In this work, we…

  4. A Technique for Estimating Intensity of Emotional Expressions and Speaking Styles in Speech Based on Multiple-Regression HSMM

    NASA Astrophysics Data System (ADS)

    Nose, Takashi; Kobayashi, Takao

    In this paper, we propose a technique for estimating the degree or intensity of emotional expressions and speaking styles appearing in speech. The key idea is based on a style control technique for speech synthesis using a multiple regression hidden semi-Markov model (MRHSMM), and the proposed technique can be viewed as the inverse of the style control. In the proposed technique, the acoustic features of spectrum, power, fundamental frequency, and duration are simultaneously modeled using the MRHSMM. We derive an algorithm for estimating explanatory variables of the MRHSMM, each of which represents the degree or intensity of emotional expressions and speaking styles appearing in acoustic features of speech, based on a maximum likelihood criterion. We show experimental results to demonstrate the ability of the proposed technique using two types of speech data, simulated emotional speech and spontaneous speech with different speaking styles. It is found that the estimated values have correlation with human perception.

  5. An investigation of sex differences in acoustic features in black-capped chickadee (Poecile atricapillus) chick-a-dee calls.

    PubMed

    Campbell, Kimberley A; Hahn, Allison H; Congdon, Jenna V; Sturdy, Christopher B

    2016-09-01

    Sex differences have been identified in a number of black-capped chickadee vocalizations and in the chick-a-dee calls of other chickadee species [i.e., Carolina chickadees (Poecile carolinensis)]. In the current study, 12 acoustic features in black-capped chickadee chick-a-dee calls were investigated, including both frequency and duration measurements. Using permuted discriminant function analyses, these features were examined to determine if any features could be used to identify the sex of the caller. Only one note type (A notes) classified male and female calls at levels approaching significance. In particular, a permuted discriminant function analysis revealed that the start frequency of A notes best allowed for categorization between the sexes compared to any other acoustic parameter. This finding is consistent with previous research on Carolina chickadee chick-a-dee calls that found that the starting frequency differed between male- and female-produced A notes [Freeberg, Lucas, and Clucas (2003). J. Acoust. Soc. Am. 113, 2127-2136]. Taken together, these results and the results of studies with other chickadee species suggest that sex differences likely exist in the chick-a-dee call, specifically acoustic features in A notes, but that more complex features than those addressed here may be associated with the sex of the caller.
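
    A sketch of a permuted discriminant-function-style analysis like the one described: classify caller sex from per-note acoustic measurements with linear discriminant analysis and compare the observed cross-validated accuracy against a label-permutation null; the feature matrix and labels below are random placeholders.

      # Sketch: LDA classification of caller sex with a permutation test.
      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(4)
      note_features = rng.standard_normal((60, 12))   # 12 acoustic measures per A note (placeholder)
      sex_labels = rng.integers(0, 2, size=60)        # 0 = female, 1 = male (placeholder)

      def cv_accuracy(X, y):
          return cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()

      observed = cv_accuracy(note_features, sex_labels)
      null = np.array([cv_accuracy(note_features, rng.permutation(sex_labels))
                       for _ in range(200)])
      p_value = (np.sum(null >= observed) + 1) / (len(null) + 1)
      print(f"observed accuracy {observed:.2f}, permutation p = {p_value:.3f}")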

  6. Hearing speech in music.

    PubMed

    Ekström, Seth-Reino; Borg, Erik

    2011-01-01

    The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.

  7. Acoustics

    NASA Technical Reports Server (NTRS)

    Goodman, Jerry R.; Grosveld, Ferdinand

    2007-01-01

    The acoustics environment in space operations is important to maintain at manageable levels so that the crewperson can remain safe, functional, effective, and reasonably comfortable. High acoustic levels can produce temporary or permanent hearing loss, or cause other physiological symptoms such as auditory pain, headaches, discomfort, strain in the vocal cords, or fatigue. Noise is defined as undesirable sound. Excessive noise may result in psychological effects such as irritability, inability to concentrate, decrease in productivity, annoyance, errors in judgment, and distraction. A noisy environment can also make it difficult to sleep, or to sleep well. Elevated noise levels can affect the ability to communicate, understand what is being said, hear what is going on in the environment, degrade crew performance and operations, and create habitability concerns. Superfluous noise emissions can also prevent the crew from hearing alarms or other important auditory cues, such as an equipment malfunction. Recent space flight experience, evaluations of the requirements in crew habitable areas, and lessons learned (Goodman 2003; Allen and Goodman 2003; Pilkinton 2003; Grosveld et al. 2003) show the importance of maintaining an acceptable acoustics environment. This is best accomplished by having a high-quality set of limits/requirements early in the program, the "designing in" of acoustics in the development of hardware and systems, and by monitoring, testing and verifying the levels to ensure that they are acceptable.

  8. Multilevel Analysis in Analyzing Speech Data

    ERIC Educational Resources Information Center

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  9. Predictive Ensemble Decoding of Acoustical Features Explains Context-Dependent Receptive Fields

    PubMed Central

    Mesgarani, Nima; Deneve, Sophie

    2016-01-01

    A primary goal of auditory neuroscience is to identify the sound features extracted and represented by auditory neurons. Linear encoding models, which describe neural responses as a function of the stimulus, have been primarily used for this purpose. Here, we provide theoretical arguments and experimental evidence in support of an alternative approach, based on decoding the stimulus from the neural response. We used a Bayesian normative approach to predict the responses of neurons detecting relevant auditory features, despite ambiguities and noise. We compared the model predictions to recordings from the primary auditory cortex of ferrets and found that: (1) the decoding filters of auditory neurons resemble the filters learned from the statistics of speech sounds; (2) the decoding model captures the dynamics of responses better than a linear encoding model of similar complexity; and (3) the decoding model accounts for the accuracy with which the stimulus is represented in neural activity, whereas the linear encoding model performs very poorly. Most importantly, our model predicts that neuronal responses are fundamentally shaped by “explaining away,” a divisive competition between alternative interpretations of the auditory scene. SIGNIFICANCE STATEMENT Neural responses in the auditory cortex are dynamic, nonlinear, and hard to predict. Traditionally, encoding models have been used to describe neural responses as a function of the stimulus. However, in addition to external stimulation, neural activity is strongly modulated by the responses of other neurons in the network. We hypothesized that auditory neurons aim to collectively decode their stimulus. In particular, a stimulus feature that is decoded (or explained away) by one neuron is not explained by another. We demonstrated that this novel Bayesian decoding model is better at capturing the dynamic responses of cortical neurons in ferrets. Whereas the linear encoding model poorly reflects selectivity of neurons

  10. The auditory representation of speech sounds in human motor cortex.

    PubMed

    Cheung, Connie; Hamilton, Liberty S; Johnson, Keith; Chang, Edward F

    2016-03-04

    In humans, listening to speech evokes neural responses in the motor cortex. This has been controversially interpreted as evidence that speech sounds are processed as articulatory gestures. However, it is unclear what information is actually encoded by such neural activity. We used high-density direct human cortical recordings while participants spoke and listened to speech sounds. Motor cortex neural patterns during listening were substantially different than during articulation of the same sounds. During listening, we observed neural activity in the superior and inferior regions of ventral motor cortex. During speaking, responses were distributed throughout somatotopic representations of speech articulators in motor cortex. The structure of responses in motor cortex during listening was organized along acoustic features similar to auditory cortex, rather than along articulatory features as during speaking. Motor cortex does not contain articulatory representations of perceived actions in speech, but rather, represents auditory vocal information.

  11. Child directed speech, speech in noise and hyperarticulated speech in the Pacific Northwest

    NASA Astrophysics Data System (ADS)

    Wright, Richard; Carmichael, Lesley; Beckford Wassink, Alicia; Galvin, Lisa

    2004-05-01

    Three types of exaggerated speech are thought to be systematic responses to accommodate the needs of the listener: child-directed speech (CDS), hyperspeech, and the Lombard response. CDS (e.g., Kuhl et al., 1997) occurs in interactions with young children and infants. Hyperspeech (Johnson et al., 1993) is a modification in response to listeners' difficulties in recovering the intended message. The Lombard response (e.g., Lane et al., 1970) is a compensation for increased noise in the signal. While all three result from adaptations to accommodate the needs of the listener, and therefore should share some features, the triggering conditions are quite different, and the styles should therefore exhibit differences in their phonetic outcomes. While CDS has been the subject of a variety of acoustic studies, it has never been studied in the broader context of the other "exaggerated" speech styles. A large crosslinguistic study was undertaken that compares speech produced under four conditions: spontaneous conversations, CDS aimed at 6-9-month-old infants, hyperarticulated speech, and speech in noise. This talk will present some findings for North American English as spoken in the Pacific Northwest. The measures include f0, vowel duration, F1 and F2 at vowel midpoint, and intensity.
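
    For readers who want to reproduce measures of this kind (f0, F1/F2 at the vowel midpoint, intensity), the following sketch uses librosa's pYIN tracker and LPC root-finding; it is not the authors' tooling, the file name is hypothetical, and a production formant tracker would also screen LPC roots by bandwidth.

        import numpy as np
        import librosa

        y, sr = librosa.load("vowel_token.wav", sr=16000)     # hypothetical file

        # f0 track with pYIN; use the median over voiced frames as a crude summary
        f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
        f0_summary = np.nanmedian(f0)

        # F1/F2 near the vowel midpoint from LPC roots on a short centered frame
        mid = len(y) // 2
        frame = y[mid - 256: mid + 256] * np.hamming(512)
        a = librosa.lpc(frame, order=int(sr / 1000) + 2)       # rule-of-thumb order
        roots = [r for r in np.roots(a) if np.imag(r) > 0]
        freqs = sorted(f for f in np.angle(roots) * sr / (2 * np.pi) if f > 90)
        F1, F2 = freqs[0], freqs[1]

        # RMS intensity of the frame in dB (arbitrary reference)
        intensity_db = 20 * np.log10(np.sqrt(np.mean(frame ** 2)) + 1e-12)
        print(f0_summary, F1, F2, intensity_db)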

  12. Design Foundations for Content-Rich Acoustic Interfaces: Investigating Audemes as Referential Non-Speech Audio Cues

    ERIC Educational Resources Information Center

    Ferati, Mexhid Adem

    2012-01-01

    To access interactive systems, blind and visually impaired users can leverage their auditory senses by using non-speech sounds. The current structure of non-speech sounds, however, is geared toward conveying user interface operations (e.g., opening a file) rather than large theme-based information (e.g., a history passage) and, thus, is ill-suited…

  13. Robust Speech Rate Estimation for Spontaneous Speech

    PubMed Central

    Wang, Dagen; Narayanan, Shrikanth S.

    2010-01-01

    In this paper, we propose a direct method for speech rate estimation from acoustic features without requiring any automatic speech transcription. We compare various spectral and temporal signal analysis and smoothing strategies to better characterize the underlying syllable structure to derive speech rate. The proposed algorithm extends the methods of spectral subband correlation by including temporal correlation and the use of prominent spectral subbands for improving the signal correlation essential for syllable detection. Furthermore, to address some of the practical robustness issues in previously proposed methods, we introduce some novel components into the algorithm, such as the use of pitch confidence for filtering spurious syllable envelope peaks, a magnifying window for tackling neighboring syllable smearing, and relative peak measure thresholds for pseudo peak rejection. We also describe an automated approach for learning algorithm parameters from data, and find the optimal settings through Monte Carlo simulations and parameter sensitivity analysis. Final experimental evaluations are conducted on a portion of the Switchboard corpus for which manual phonetic segmentation information and published results are available for direct comparison. The results show a correlation coefficient of 0.745 with respect to the ground truth based on manual segmentation. This is about a 17% improvement over the current best single estimator and an 11% improvement over the multiestimator evaluated on the same Switchboard database. PMID:20428476
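
    The core envelope-peak idea (without the pitch-confidence filtering, magnifying windows, or learned thresholds of the full algorithm) can be sketched as follows; the band edges and peak-picking settings are illustrative assumptions, and a sampling rate above roughly 5 kHz is assumed.

        import numpy as np
        from scipy.signal import butter, filtfilt, hilbert, find_peaks

        def syllable_rate(y, sr, bands=((300, 800), (800, 1500), (1500, 2500))):
            """Crude speech-rate estimate: sum subband envelopes, low-pass them,
            and count prominent peaks per second as syllable nuclei."""
            env = np.zeros(len(y))
            for lo, hi in bands:
                b, a = butter(4, [lo / (sr / 2), hi / (sr / 2)], btype="band")
                env += np.abs(hilbert(filtfilt(b, a, y)))
            b, a = butter(2, 10 / (sr / 2))               # keep syllabic (<10 Hz) modulation
            env = filtfilt(b, a, env)
            peaks, _ = find_peaks(env, prominence=0.3 * np.std(env),
                                  distance=int(0.1 * sr))  # >=100 ms between nuclei
            return len(peaks) / (len(y) / sr)               # syllables per second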

  14. A self-organizing neural network architecture for auditory and speech perception with applications to acoustic and other temporal prediction problems

    NASA Astrophysics Data System (ADS)

    Cohen, Michael; Grossberg, Stephen

    1994-09-01

    This project is developing autonomous neural network models for the real-time perception and production of acoustic and speech signals. Our SPINET pitch model was developed to take real-time acoustic input and to simulate the key pitch data. SPINET was embedded into a model for auditory scene analysis, or how the auditory system separates sound sources in environments with multiple sources. The model groups frequency components based on pitch and spatial location cues and resonantly binds them within different streams. The model simulates psychophysical grouping data, such as how an ascending tone groups with a descending tone even if noise exists at the intersection point, and how a tone before and after a noise burst is perceived to continue through the noise. These resonant streams input to working memories, wherein phonetic percepts adapt to global speech rate. Computer simulations quantitatively generate the experimentally observed category boundary shifts for voiced stop pairs that have the same or different place of articulation, including why the interval to hear a double (geminate) stop is twice as long as that to hear two different stops. This model also uses resonant feedback, here between list categories and working memory.

  15. Classification and clinicoradiologic features of primary progressive aphasia (PPA) and apraxia of speech.

    PubMed

    Botha, Hugo; Duffy, Joseph R; Whitwell, Jennifer L; Strand, Edythe A; Machulda, Mary M; Schwarz, Christopher G; Reid, Robert I; Spychalla, Anthony J; Senjem, Matthew L; Jones, David T; Lowe, Val; Jack, Clifford R; Josephs, Keith A

    2015-08-01

    The consensus criteria for the diagnosis and classification of primary progressive aphasia (PPA) have served as an important tool in studying this group of disorders. However, a large proportion of patients remain unclassifiable whilst others simultaneously meet criteria for multiple subtypes. We prospectively evaluated a large cohort of patients with degenerative aphasia and/or apraxia of speech using multidisciplinary clinical assessments and multimodal imaging. Blinded diagnoses were made using operational definitions with important differences compared to the consensus criteria. Of the 130 included patients, 40 were diagnosed with progressive apraxia of speech (PAOS), 12 with progressive agrammatic aphasia, 9 with semantic dementia, 52 with logopenic progressive aphasia, and 4 with progressive fluent aphasia, while 13 were unclassified. The PAOS and progressive fluent aphasia groups were least impaired. Performance on repetition and sentence comprehension was especially poor in the logopenic group. The semantic and progressive fluent aphasia groups had prominent anomia, but only semantic subjects had loss of word meaning and object knowledge. Distinct patterns of grey matter loss and white matter changes were found in all groups compared to controls. PAOS subjects had bilateral frontal grey matter loss, including the premotor and supplementary motor areas, and bilateral frontal white matter involvement. The agrammatic group had more widespread, predominantly left sided grey matter loss and white matter abnormalities. Semantic subjects had bitemporal grey matter loss and white matter changes, including the uncinate and inferior occipitofrontal fasciculi, whereas progressive fluent subjects only had left sided temporal involvement. Logopenic subjects had diffuse and bilateral grey matter loss and diffusion tensor abnormalities, maximal in the posterior temporal region. A diagnosis of logopenic aphasia was strongly associated with being amyloid positive (46

  16. Acoustic measurement and morphological features of organic sediment deposits in combined sewer networks.

    PubMed

    Carnacina, Iacopo; Larrarte, Frédérique; Leonardi, Nicoletta

    2017-04-01

    The performance of sewer networks has important consequences from an environmental and social point of view. Poor functioning can result in flood risk and pollution at a large scale. Sediment deposits forming in sewer trunks might severely compromise the sewer line by affecting the flow field, reducing cross-sectional areas, and increasing roughness coefficients. In spite of numerous efforts, the morphological features of these depositional environments remain poorly understood. The interface between water and sediment remains difficult to identify, and the estimation of the deposited stock is frequently inaccurate. In part, this is due to technical issues connected to difficulties in collecting accurate field measurements without disrupting existing morphologies. In this paper, results from an extensive field campaign are presented; during the campaign a new survey methodology based on acoustic techniques has been tested. Furthermore, a new algorithm for the detection of the soil-water interface, and therefore for the correct estimation of sediment stocks, is proposed. Finally, results regarding bed topography and morphological features at two different field sites are presented and reveal that a large variability in bed forms is present along sewer networks.

  17. Differences in acoustic features of vocalizations produced by killer whales cross-socialized with bottlenose dolphins.

    PubMed

    Musser, Whitney B; Bowles, Ann E; Grebner, Dawn M; Crance, Jessica L

    2014-10-01

    Limited previous evidence suggests that killer whales (Orcinus orca) are capable of vocal production learning. However, vocal contextual learning has not been studied, nor the factors promoting learning. Vocalizations were collected from three killer whales with a history of exposure to bottlenose dolphins (Tursiops truncatus) and compared with data from seven killer whales held with conspecifics and nine bottlenose dolphins. The three whales' repertoires were distinguishable by a higher proportion of click trains and whistles. Time-domain features of click trains were intermediate between those of whales held with conspecifics and dolphins. These differences provided evidence for contextual learning. One killer whale spontaneously learned to produce artificial chirps taught to dolphins; acoustic features fell within the range of inter-individual differences among the dolphins. This whale also produced whistles similar to a stereotyped whistle produced by one dolphin. Thus, results provide further support for vocal production learning and show that killer whales are capable of contextual learning. That killer whales produce similar repertoires when associated with another species suggests substantial vocal plasticity and motivation for vocal conformity with social associates.

  18. The Auditory-Brainstem Response to Continuous, Non-repetitive Speech Is Modulated by the Speech Envelope and Reflects Speech Processing

    PubMed Central

    Reichenbach, Chagit S.; Braiman, Chananel; Schiff, Nicholas D.; Hudspeth, A. J.; Reichenbach, Tobias

    2016-01-01

    The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the ABR is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function. PMID:27303286
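
    A much-simplified version of the proposed measure can be sketched as below: build a reference waveform at an assumed fixed fundamental frequency, modulate it by the speech envelope, and read the cross-correlation with the EEG at an assumed brainstem latency. The paper's actual measure tracks the time-varying fundamental of the voiced parts of speech; this is only an illustration, and the 9 ms latency is an assumption.

        import numpy as np
        from scipy.signal import hilbert

        def abr_f0_measure(eeg, speech, fs, f0_hz, latency_s=0.009):
            """eeg and speech are equally long arrays sampled at fs. Build a
            reference at the fundamental, modulated by the speech envelope, and
            read the normalized cross-correlation at the assumed latency."""
            env = np.abs(hilbert(speech))                 # broadband envelope
            t = np.arange(len(speech)) / fs
            ref = env * np.cos(2 * np.pi * f0_hz * t)     # envelope-modulated f0 carrier
            lag = int(round(latency_s * fs))
            x, y = ref[:-lag], eeg[lag:]                  # EEG delayed by the latency
            return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12)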

  19. Psychoacoustic cues to emotion in speech prosody and music.

    PubMed

    Coutinho, Eduardo; Dibben, Nicola

    2013-01-01

    There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music close to the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain.
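
    Two of the seven features named above (spectral centroid and spectral flux), plus an RMS loudness proxy, are easy to compute with standard tools; the sketch below uses librosa on a hypothetical file. Proper psychoacoustic loudness, sharpness, and roughness models are beyond this snippet.

        import numpy as np
        import librosa

        y, sr = librosa.load("excerpt.wav", sr=None)            # hypothetical file

        S = np.abs(librosa.stft(y, n_fft=1024, hop_length=512))

        # Spectral centroid: brightness-related cue, one value per frame
        centroid = librosa.feature.spectral_centroid(S=S, sr=sr)[0]

        # Spectral flux: frame-to-frame increase in magnitude, a rough onset cue
        flux = np.sum(np.maximum(0.0, np.diff(S, axis=1)) ** 2, axis=0)

        # Loudness proxy: RMS energy per frame
        rms = librosa.feature.rms(S=S)[0]

        print(centroid.mean(), flux.mean(), rms.mean())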

  20. Critical Issues in Airborne Applications of Speech Recognition.

    DTIC Science & Technology

    1979-01-01

    human’s tongue, lips, and other articulators to get ready for the next vowel or consonant to be spoken, and to gradually move away from the...acoustic tube, so that formants and other interesting features of the speech signal could be more readily and accurately detected). Of particular

  1. How well can children recognize speech features in spectrograms? Comparisons by age and hearing status.

    PubMed

    Ertmer, David J

    2004-06-01

    Real-time spectrographic displays (SDs) have been used in speech training for more than 30 years with adults and children who have severe and profound hearing impairments. Despite positive outcomes from treatment studies, concerns remain that the complex and abstract nature of spectrograms may make these speech training aids unsuitable for use with children. This investigation examined how well children with normal hearing sensitivity and children with impaired hearing can recognize spectrographic cues for vowels and consonants, and the ages at which these visual cues are distinguished. Sixty children (30 with normal hearing sensitivity, 30 with hearing impairments) in 3 age groups (6-7, 8-9, and 10-11 years) were familiarized with the spectrographic characteristics of selected vowels and consonants. The children were then tested on their ability to select a match for a model spectrogram from among 3 choices. Overall scores indicated that spectrographic cues were recognized with greater-than-chance accuracy by all age groups. Formant contrasts were recognized with greater accuracy than consonant manner contrasts. Children with normal hearing sensitivity and those with hearing impairment performed equally well.

  2. Acoustic features contributing to the individuality of wild agile gibbon (Hylobates agilis agilis) songs.

    PubMed

    Oyakawa, Chisako; Koda, Hiroki; Sugiura, Hideki

    2007-07-01

    We examined acoustic individuality in the wild agile gibbon Hylobates agilis agilis and determined the acoustic variables that contribute to individual discrimination using multivariate analyses. We recorded 125 female-specific songs (great calls) from six groups in west Sumatra and measured 58 acoustic variables for each great call. We performed principal component analysis to summarize the 58 variables into six acoustic principal components (PCs). Generally, each PC corresponded to a part of the great call. Significant individual differences were found across six individual gibbons in each of the six PCs. Moreover, strong acoustic individuality was found in the introductory and climax parts of the great call. In contrast, the terminal part contributed little to individual identification. Discriminant analysis showed that these PCs contributed to individual discrimination with high repeatability. Although we cannot conclude that agile gibbons use these acoustic components for individual discrimination, they are potential candidates for individual recognition.
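
    The PCA-plus-discriminant-analysis pipeline described above maps naturally onto scikit-learn; the sketch below uses placeholder data with the study's dimensions (125 calls, 58 variables, 6 individuals) purely for illustration.

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.decomposition import PCA
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_score, StratifiedKFold

        # X: (n_calls, 58) acoustic variables per great call; y: individual identity
        rng = np.random.default_rng(1)
        X = rng.normal(size=(125, 58))        # placeholder for the measured variables
        y = rng.integers(0, 6, size=125)      # six individuals

        clf = make_pipeline(StandardScaler(), PCA(n_components=6),
                            LinearDiscriminantAnalysis())
        acc = cross_val_score(clf, X, y,
                              cv=StratifiedKFold(5, shuffle=True, random_state=1))
        print("cross-validated identification accuracy:", acc.mean())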

  3. Experimental investigation of the effects of the acoustical conditions in a simulated classroom on speech recognition and learning in children a

    PubMed Central

    Valente, Daniel L.; Plevinsky, Hallie M.; Franco, John M.; Heinrichs-Graham, Elizabeth C.; Lewis, Dawna E.

    2012-01-01

    The potential effects of acoustical environment on speech understanding are especially important as children enter school where students’ ability to hear and understand complex verbal information is critical to learning. However, this ability is compromised because of widely varied and unfavorable classroom acoustics. The extent to which unfavorable classroom acoustics affect children’s performance on longer learning tasks is largely unknown as most research has focused on testing children using words, syllables, or sentences as stimuli. In the current study, a simulated classroom environment was used to measure comprehension performance of two classroom learning activities: a discussion and lecture. Comprehension performance was measured for groups of elementary-aged students in one of four environments with varied reverberation times and background noise levels. The reverberation time was either 0.6 or 1.5 s, and the signal-to-noise level was either +10 or +7 dB. Performance is compared to adult subjects as well as to sentence-recognition in the same condition. Significant differences were seen in comprehension scores as a function of age and condition; both increasing background noise and reverberation degraded performance in comprehension tasks compared to minimal differences in measures of sentence-recognition. PMID:22280587

  4. Motor representations of articulators contribute to categorical perception of speech sounds.

    PubMed

    Möttönen, Riikka; Watkins, Kate E

    2009-08-05

    Listening to speech modulates activity in human motor cortex. It is unclear, however, whether the motor cortex has an essential role in speech perception. Here, we aimed to determine whether the motor representations of articulators contribute to categorical perception of speech sounds. Categorization of continuously variable acoustic signals into discrete phonemes is a fundamental feature of speech communication. We used repetitive transcranial magnetic stimulation (rTMS) to temporarily disrupt the lip representation in the left primary motor cortex. This disruption impaired categorical perception of artificial acoustic continua ranging between two speech sounds that differed in place of articulation, in that the vocal tract is opened and closed rapidly either with the lips or the tip of the tongue (/ba/-/da/ and /pa/-/ta/). In contrast, it did not impair categorical perception of continua ranging between speech sounds that do not involve the lips in their articulation (/ka/-/ga/ and /da/-/ga/). Furthermore, an rTMS-induced disruption of the hand representation had no effect on categorical perception of either of the tested continua (/ba/-/da/ and /ka/-/ga/). These findings indicate that motor circuits controlling production of speech sounds also contribute to their perception. Mapping acoustically highly variable speech sounds onto less variable motor representations may facilitate their phonemic categorization and be important for robust speech perception.

  5. A gearbox fault diagnosis scheme based on near-field acoustic holography and spatial distribution features of sound field

    NASA Astrophysics Data System (ADS)

    Lu, Wenbo; Jiang, Weikang; Yuan, Guoqing; Yan, Li

    2013-05-01

    Vibration signal analysis is the main technique in machine condition monitoring and fault diagnosis, but in some cases vibration-based diagnosis is restrained because of its contact measurement. Acoustic-based diagnosis (ABD) with non-contact measurement has received little attention, although the sound field may contain abundant information related to the fault pattern. A new scheme of ABD for gearboxes based on near-field acoustic holography (NAH) and spatial distribution features of the sound field is presented in this paper. It focuses on applying the spatial distribution information of the sound field to gearbox fault diagnosis. A two-stage industrial helical gearbox is experimentally studied in a semi-anechoic chamber and a lab workshop, respectively. Firstly, multi-class faults (mild pitting, moderate pitting, severe pitting and tooth breakage) are simulated. Secondly, sound fields and corresponding acoustic images in different gearbox running conditions are obtained by fast Fourier transform (FFT) based NAH. Thirdly, by introducing texture analysis to fault diagnosis, spatial distribution features are extracted from the acoustic images to capture the fault patterns underlying the sound field. Finally, the features are fed into a multi-class support vector machine for fault pattern identification. The feasibility and effectiveness of the proposed scheme are demonstrated by good experimental results and by comparison with a traditional ABD method. Even with strong noise interference, spatial distribution features of the sound field can reliably reveal the fault patterns of the gearbox, and thus satisfactory accuracy can be obtained. The combination of histogram features and gray level gradient co-occurrence matrix features is suggested for good diagnosis accuracy and low time cost.
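
    A minimal sketch of the texture-feature idea, assuming the acoustic images are available as 2-D arrays: gray-level co-occurrence statistics (scikit-image) fed to a support vector machine (scikit-learn). The paper additionally uses histogram and gray level gradient co-occurrence features; the variable names and settings here are hypothetical.

        import numpy as np
        from skimage.feature import graycomatrix, graycoprops   # 'greycomatrix' in older scikit-image
        from sklearn.svm import SVC

        def glcm_features(acoustic_image, levels=32):
            """Texture descriptors from one quantized acoustic (NAH) image."""
            bins = np.linspace(acoustic_image.min(), acoustic_image.max(), levels)
            img = (np.digitize(acoustic_image, bins) - 1).astype(np.uint8)
            glcm = graycomatrix(img, distances=[1], angles=[0, np.pi / 2],
                                levels=levels, symmetric=True, normed=True)
            props = ["contrast", "homogeneity", "energy", "correlation"]
            return np.hstack([graycoprops(glcm, p).ravel() for p in props])

        # images: list of 2-D acoustic pressure maps; labels: fault classes
        # X = np.vstack([glcm_features(im) for im in images])
        # clf = SVC(kernel="rbf", C=10).fit(X, labels)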

  6. Fatigue features study on the crankshaft material of 42CrMo steel using acoustic emission

    NASA Astrophysics Data System (ADS)

    Shi, Yue; Dong, Lihong; Wang, Haidou; Li, Guolu; Liu, Shenshui

    2016-09-01

    The crankshaft is an important engine component and an important application of remanufacturing because of its high added value. However, research on the fatigue failure of remanufactured crankshafts is still at an early stage, so monitoring and investigating the fatigue failure of the remanufactured crankshaft is crucial. In this paper, acoustic emission (AE) technology and machine vision are used to monitor the four-point bending fatigue of 42CrMo, the crankshaft material. The specimens are divided into two categories, with and without pre-existing cracks, simulating the crankshaft and the crankshaft blank, respectively. Parameter-based AE techniques, wavelet transform (WT) and SEM analysis are combined to identify the stages of fatigue failure; identifying these stages is the basis for using AE technology in the field of remanufacturing crankshafts. The experimental results show that the fatigue crack propagation style is a transgranular fracture and that the fracture is brittle; the difference mainly depends on the form of crack initiation. Various AE signals are detected by the parameter analysis method. Wavelet threshold denoising and the WT are combined to extract the spectral features of AE signals at different fatigue failure stages.
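
    Wavelet threshold denoising of the kind mentioned above is commonly implemented as below (PyWavelets, soft thresholding with the universal threshold); the wavelet choice and decomposition level are assumptions, not the authors' settings.

        import numpy as np
        import pywt

        def wavelet_denoise(signal, wavelet="db4", level=5):
            """Soft-threshold wavelet denoising, a common pre-step before extracting
            spectral features from acoustic-emission bursts."""
            coeffs = pywt.wavedec(signal, wavelet, level=level)
            # Universal threshold estimated from the finest detail coefficients
            sigma = np.median(np.abs(coeffs[-1])) / 0.6745
            thr = sigma * np.sqrt(2 * np.log(len(signal)))
            coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft")
                                    for c in coeffs[1:]]
            return pywt.waverec(coeffs, wavelet)[: len(signal)]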

  7. Perception of Speech Features by French-Speaking Children with Cochlear Implants

    ERIC Educational Resources Information Center

    Bouton, Sophie; Serniclaes, Willy; Bertoncini, Josiane; Cole, Pascale

    2012-01-01

    Purpose: The present study investigates the perception of phonological features in French-speaking children with cochlear implants (CIs) compared with normal-hearing (NH) children matched for listening age. Method: Scores for discrimination and identification of minimal pairs for all features defining consonants (e.g., place, voicing, manner,…

  8. The Effect of Residual Acoustic Hearing and Adaptation to Uncertainty on Speech Perception in Cochlear Implant Users: Evidence from Eye-Tracking

    PubMed Central

    McMurray, Bob; Farris-Trimble, Ashley; Seedorff, Michael; Rigler, Hannah

    2015-01-01

    Objectives While outcomes with cochlear implants (CIs) are generally good, performance can be fragile. The authors examined two factors that are crucial for good CI performance. First, while there is a clear benefit for adding residual acoustic hearing to CI stimulation (typically in low frequencies), it is unclear whether this contributes directly to phonetic categorization. Thus, the authors examined perception of voicing (which uses low-frequency acoustic cues) and fricative place of articulation (s/ʃ, which does not) in CI users with and without residual acoustic hearing. Second, in speech categorization experiments, CI users typically show shallower identification functions. These are typically interpreted as deriving from noisy encoding of the signal. However, psycholinguistic work suggests shallow slopes may also be a useful way to adapt to uncertainty. The authors thus employed an eye-tracking paradigm to examine this in CI users. Design Participants were 30 CI users (with a variety of configurations) and 22 age-matched normal hearing (NH) controls. Participants heard tokens from six b/p and six s/ʃ continua (eight steps) spanning real words (e.g., beach/peach, sip/ship). Participants selected the picture corresponding to the word they heard from a screen containing four items (a b-, p-, s- and ʃ-initial item). Eye movements to each object were monitored as a measure of how strongly they were considering each interpretation in the moments leading up to their final percept. Results Mouse-click results (analogous to phoneme identification) for voicing showed a shallower slope for CI users than NH listeners, but no differences between CI users with and without residual acoustic hearing. For fricatives, CI users also showed a shallower slope, but unexpectedly, acoustic + electric listeners showed an even shallower slope. Eye movements showed a gradient response to fine-grained acoustic differences for all listeners. Even considering only trials in which a

  9. Measurement of acoustical characteristics of mosques in Saudi Arabia

    NASA Astrophysics Data System (ADS)

    Abdou, Adel A.

    2003-03-01

    The study of mosque acoustics, with regard to acoustical characteristics, sound quality for speech intelligibility, and other applicable acoustic criteria, has been largely neglected. In this study a background as to why mosques are designed as they are and how mosque design is influenced by worship considerations is given. In the study the acoustical characteristics of typically constructed contemporary mosques in Saudi Arabia have been investigated, employing a well-known impulse response. Extensive field measurements were taken in 21 representative mosques of different sizes and architectural features in order to characterize their acoustical quality and to identify the impact of air conditioning, ceiling fans, and sound reinforcement systems on their acoustics. Objective room-acoustic indicators such as reverberation time (RT) and clarity (C50) were measured. Background noise (BN) was assessed with and without the operation of air conditioning and fans. The speech transmission index (STI) was also evaluated with and without the operation of existing sound reinforcement systems. The existence of acoustical deficiencies was confirmed and quantified. The study, in addition to describing mosque acoustics, compares design goals to results obtained in practice and suggests acoustical target values for mosque design. The results show that acoustical quality in the investigated mosques deviates from optimum conditions when unoccupied, but is much better in the occupied condition.

  10. Measurement of acoustical characteristics of mosques in Saudi Arabia.

    PubMed

    Abdou, Adel A

    2003-03-01

    The study of mosque acoustics, with regard to acoustical characteristics, sound quality for speech intelligibility, and other applicable acoustic criteria, has been largely neglected. In this study a background as to why mosques are designed as they are and how mosque design is influenced by worship considerations is given. In the study the acoustical characteristics of typically constructed contemporary mosques in Saudi Arabia have been investigated, employing a well-known impulse response. Extensive field measurements were taken in 21 representative mosques of different sizes and architectural features in order to characterize their acoustical quality and to identify the impact of air conditioning, ceiling fans, and sound reinforcement systems on their acoustics. Objective room-acoustic indicators such as reverberation time (RT) and clarity (C50) were measured. Background noise (BN) was assessed with and without the operation of air conditioning and fans. The speech transmission index (STI) was also evaluated with and without the operation of existing sound reinforcement systems. The existence of acoustical deficiencies was confirmed and quantified. The study, in addition to describing mosque acoustics, compares design goals to results obtained in practice and suggests acoustical target values for mosque design. The results show that acoustical quality in the investigated mosques deviates from optimum conditions when unoccupied, but is much better in the occupied condition.
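
    The two room-acoustic indicators reported in these records, reverberation time and clarity C50, can be computed from a measured impulse response roughly as follows; this is a generic textbook sketch (Schroeder backward integration and an early/late energy ratio), not the instrumentation used in the study, and it assumes the decay actually reaches -25 dB.

        import numpy as np

        def rt_and_c50(h, fs):
            """Reverberation time (T20 extrapolated to RT60) and clarity C50 from a
            measured room impulse response h sampled at fs."""
            energy = h.astype(float) ** 2
            # Schroeder backward integration, in dB relative to total energy
            edc = np.cumsum(energy[::-1])[::-1]
            edc_db = 10 * np.log10(edc / edc[0] + 1e-12)

            # Fit a line over the -5 dB .. -25 dB span and extrapolate to -60 dB
            i5 = np.argmax(edc_db <= -5)
            i25 = np.argmax(edc_db <= -25)
            t = np.arange(len(h)) / fs
            slope, _ = np.polyfit(t[i5:i25], edc_db[i5:i25], 1)
            rt60 = -60.0 / slope

            # C50: early (first 50 ms) to late energy ratio in dB
            n50 = int(0.050 * fs)
            c50 = 10 * np.log10(energy[:n50].sum() / (energy[n50:].sum() + 1e-12))
            return rt60, c50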

  11. Differential Diagnosis of Severe Speech Disorders Using Speech Gestures

    ERIC Educational Resources Information Center

    Bahr, Ruth Huntley

    2005-01-01

    The differentiation of childhood apraxia of speech from severe phonological disorder is a common clinical problem. This article reports on an attempt to describe speech errors in children with childhood apraxia of speech on the basis of gesture use and acoustic analyses of articulatory gestures. The focus was on the movement of articulators and…

  12. SPEECH COMMUNICATION RESEARCH.

    DTIC Science & Technology

    studies of the dynamics of speech production through cineradiographic techniques and through acoustic analysis of formant motions in vowels in various...particular, the activity of the vocal cords and the dynamics of tongue motion. Research on speech perception has included experiments on vowel

  13. Effect of Acoustic Spectrographic Instruction on Production of English /i/ and /I/ by Spanish Pre-Service English Teachers

    ERIC Educational Resources Information Center

    Quintana-Lara, Marcela

    2014-01-01

    This study investigates the effects of Acoustic Spectrographic Instruction on the production of the English phonological contrast /i/ and / I /. Acoustic Spectrographic Instruction is based on the assumption that physical representations of speech sounds and spectrography allow learners to objectively see and modify those non-accurate features in…

  14. Effects of computer-based intervention through acoustically modified speech (Fast ForWord) in severe mixed receptive-expressive language impairment: outcomes from a randomized controlled trial.

    PubMed

    Cohen, Wendy; Hodson, Ann; O'Hare, Anne; Boyle, James; Durrani, Tariq; McCartney, Elspeth; Mattey, Mike; Naftalin, Lionel; Watson, Jocelynne

    2005-06-01

    Seventy-seven children between the ages of 6 and 10 years, with severe mixed receptive-expressive specific language impairment (SLI), participated in a randomized controlled trial (RCT) of Fast ForWord (FFW; Scientific Learning Corporation, 1997, 2001). FFW is a computer-based intervention for treating SLI using acoustically enhanced speech stimuli. These stimuli are modified to exaggerate their time and intensity properties as part of an adaptive training process. All children who participated in the RCT maintained their regular speech and language therapy and school regime throughout the trial. Standardized measures of receptive and expressive language were used to assess performance at baseline and to measure outcome from treatment at 9 weeks and 6 months. Children were allocated to 1 of 3 groups. Group A (n = 23) received the FFW intervention as a home-based therapy for 6 weeks. Group B (n = 27) received commercially available computer-based activities designed to promote language as a control for computer games exposure. Group C (n = 27) received no additional study intervention. Each group made significant gains in language scores, but there was no additional effect for either computer intervention. Thus, the findings from this RCT do not support the efficacy of FFW as an intervention for children with severe mixed receptive-expressive SLI.

  15. Room Acoustics

    NASA Astrophysics Data System (ADS)

    Kuttruff, Heinrich; Mommertz, Eckard

    The traditional task of room acoustics is to create or formulate conditions which ensure the best possible propagation of sound in a room from a sound source to a listener. Thus, objects of room acoustics are in particular assembly halls of all kinds, such as auditoria and lecture halls, conference rooms, theaters, concert halls or churches. It must be pointed out that these conditions depend essentially on whether speech or music is to be transmitted: in the first case, the criterion for transmission quality is good speech intelligibility; in the second, the success of room-acoustical efforts depends on factors that cannot be quantified as easily, not least the listening habits of the audience. In any case, there is no such thing as absolutely "good acoustics" of a room.

  16. Rapid tuning shifts in human auditory cortex enhance speech intelligibility

    PubMed Central

    Holdgraf, Christopher R.; de Heer, Wendy; Pasley, Brian; Rieger, Jochem; Crone, Nathan; Lin, Jack J.; Knight, Robert T.; Theunissen, Frédéric E.

    2016-01-01

    Experience shapes our perception of the world on a moment-to-moment basis. This robust perceptual effect of experience parallels a change in the neural representation of stimulus features, though the nature of this representation and its plasticity are not well-understood. Spectrotemporal receptive field (STRF) mapping describes the neural response to acoustic features, and has been used to study contextual effects on auditory receptive fields in animal models. We performed a STRF plasticity analysis on electrophysiological data from recordings obtained directly from the human auditory cortex. Here, we report rapid, automatic plasticity of the spectrotemporal response of recorded neural ensembles, driven by previous experience with acoustic and linguistic information, and with a neurophysiological effect in the sub-second range. This plasticity reflects increased sensitivity to spectrotemporal features, enhancing the extraction of more speech-like features from a degraded stimulus and providing the physiological basis for the observed ‘perceptual enhancement' in understanding speech. PMID:27996965
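
    STRF mapping of the kind referred to above is often implemented as a regularized linear regression from lagged spectrogram frames to the neural response; a minimal ridge-regression sketch with hypothetical array shapes follows (it is not the analysis code of the paper).

        import numpy as np

        def fit_strf(spec, resp, n_lags=30, alpha=1.0):
            """Ridge-regression STRF: predict the neural response at time t from the
            preceding n_lags frames of the spectrogram.
            spec: (F, T) log-spectrogram; resp: (T,) firing rate or high-gamma power."""
            F, T = spec.shape
            rows = [spec[:, t - n_lags:t].ravel() for t in range(n_lags, T)]
            X = np.asarray(rows)                            # (T - n_lags, F * n_lags)
            y = resp[n_lags:]
            w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
            return w.reshape(F, n_lags)                     # the estimated STRF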

  17. Emotional communication in speech and music: the role of melodic and rhythmic contrasts.

    PubMed

    Quinto, Lena; Thompson, William Forde; Keating, Felicity Louise

    2013-01-01

    Many acoustic features convey emotion similarly in speech and music. Researchers have established that acoustic features such as pitch height, tempo, and intensity carry important emotional information in both domains. In this investigation, we examined the emotional significance of melodic and rhythmic contrasts between successive syllables or tones in speech and music, referred to as Melodic Interval Variability (MIV) and the normalized Pairwise Variability Index (nPVI). The spoken stimuli were 96 tokens expressing the emotions of irritation, fear, happiness, sadness, tenderness, or no emotion. The music stimuli were 96 phrases, played with or without performance expression and composed with the intention of communicating the same emotions. Results showed that nPVI, but not MIV, operates similarly in music and speech. Spoken stimuli, but not musical stimuli, were characterized by changes in MIV as a function of intended emotion. The results suggest that these measures may signal emotional intentions differently in speech and music.

  18. Emotional Communication in Speech and Music: The Role of Melodic and Rhythmic Contrasts

    PubMed Central

    Quinto, Lena; Thompson, William Forde; Keating, Felicity Louise

    2013-01-01

    Many acoustic features convey emotion similarly in speech and music. Researchers have established that acoustic features such as pitch height, tempo, and intensity carry important emotional information in both domains. In this investigation, we examined the emotional significance of melodic and rhythmic contrasts between successive syllables or tones in speech and music, referred to as Melodic Interval Variability (MIV) and the normalized Pairwise Variability Index (nPVI). The spoken stimuli were 96 tokens expressing the emotions of irritation, fear, happiness, sadness, tenderness, or no emotion. The music stimuli were 96 phrases, played with or without performance expression and composed with the intention of communicating the same emotions. Results showed that nPVI, but not MIV, operates similarly in music and speech. Spoken stimuli, but not musical stimuli, were characterized by changes in MIV as a function of intended emotion. The results suggest that these measures may signal emotional intentions differently in speech and music. PMID:23630507
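
    For reference, the two rhythm/melody measures used in these records can be computed as below; the nPVI formula is the standard one, while the MIV implementation here assumes the common definition as the coefficient of variation (times 100) of absolute interval sizes.

        import numpy as np

        def npvi(durations):
            """Normalized Pairwise Variability Index over successive durations
            (syllable/vowel durations in speech, tone durations in music)."""
            d = np.asarray(durations, dtype=float)
            pairs = np.abs(d[:-1] - d[1:]) / ((d[:-1] + d[1:]) / 2.0)
            return 100.0 * pairs.mean()

        def miv(intervals_semitones):
            """Melodic Interval Variability: coefficient of variation (x100) of the
            absolute pitch-interval sizes between successive tones/syllables."""
            i = np.abs(np.asarray(intervals_semitones, dtype=float))
            return 100.0 * i.std() / i.mean()

        print(npvi([120, 80, 150, 90]), miv([2, -1, 3, -4]))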

  19. A study of acoustic-to-articulatory inversion of speech by analysis-by-synthesis using chain matrices and the Maeda articulatory model.

    PubMed

    Panchapagesan, Sankaran; Alwan, Abeer

    2011-04-01

    In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model is performed. For chain matrix calculation of vocal tract (VT) acoustics, the chain matrix derivatives with respect to area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between natural and synthesized first three formants, and parameter regularization and continuity terms. Calibration of the Maeda model to two speakers, one male and one female, from the University of Wisconsin x-ray microbeam (XRMB) database, using a cost function, is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. Good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs for the male speaker, with average pellet-VT outline distances around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants.
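
    The analysis-by-synthesis loop described above can be caricatured with a generic optimizer: a cost combining a formant distance with regularization and continuity terms, minimized over articulatory parameters. In the sketch below `synth_formants` is a placeholder forward model, and the optimizer is SciPy's BFGS rather than the paper's quasi-Newton method with analytic chain-matrix derivatives.

        import numpy as np
        from scipy.optimize import minimize

        def invert_frame(target_formants, synth_formants, p_prev,
                         lam_reg=0.1, lam_cont=1.0):
            """One frame of analysis-by-synthesis inversion. `synth_formants(p)` is a
            placeholder forward model (e.g., an articulatory synthesizer returning
            [F1, F2, F3] in Hz for parameter vector p)."""
            target = np.asarray(target_formants, dtype=float)

            def cost(p):
                f = np.asarray(synth_formants(p), dtype=float)
                e_formant = np.sum(((f - target) / target) ** 2)   # relative formant error
                e_reg = lam_reg * np.sum(p ** 2)                   # stay near neutral posture
                e_cont = lam_cont * np.sum((p - p_prev) ** 2)      # smooth trajectories
                return e_formant + e_reg + e_cont

            return minimize(cost, x0=p_prev, method="BFGS").x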

  20. ON THE NATURE OF SPEECH SCIENCE.

    ERIC Educational Resources Information Center

    Peterson, Gordon E.

    In this article the nature of the discipline of speech science is considered and the various basic and applied areas of the discipline are discussed. The basic areas encompass the various processes of the physiology of speech production, the acoustical characteristics of speech, including the speech wave types and the information-bearing acoustic…

  1. Analysis of False Starts in Spontaneous Speech.

    ERIC Educational Resources Information Center

    O'Shaughnessy, Douglas

    A primary difference between spontaneous speech and read speech concerns the use of false starts, where a speaker interrupts the flow of speech to restart his or her utterance. A study examined the acoustic aspects of such restarts in a widely-used speech database, examining approximately 1000 utterances, about 10% of which contained a restart.…

  2. Tutorial on architectural acoustics

    NASA Astrophysics Data System (ADS)

    Shaw, Neil; Talaske, Rick; Bistafa, Sylvio

    2002-11-01

    This tutorial is intended to provide an overview of current knowledge and practice in architectural acoustics. Topics covered will include basic concepts and history, acoustics of small rooms (small rooms for speech such as classrooms and meeting rooms, music studios, small critical listening spaces such as home theatres) and the acoustics of large rooms (larger assembly halls, auditoria, and performance halls).

  3. [Factorial structure of discriminating speech perception in binaural electro-acoustic correction in patients with impaired hearing of various etiology].

    PubMed

    Tokarev, O P; Bagriantseva, M N

    1990-01-01

    The authors examined 260 patients with hypoacusis of various etiologies who needed hearing aids. From the hearing measurements, they identified the basic factors that may influence speech intelligibility under binaural correction and the optimal type of hearing aid. For each group of patients with hypoacusis of a given etiology, regression curves of integrated parameters were plotted, which helped predict the effectiveness of hearing aids on an individual basis.

  4. Two-microphone spatial filtering provides speech reception benefits for cochlear implant users in difficult acoustic environments

    PubMed Central

    Goldsworthy, Raymond L.; Delhorne, Lorraine A.; Desloge, Joseph G.; Braida, Louis D.

    2014-01-01

    This article introduces and provides an assessment of a spatial-filtering algorithm based on two closely-spaced (∼1 cm) microphones in a behind-the-ear shell. The evaluated spatial-filtering algorithm used fast (∼10 ms) temporal-spectral analysis to determine the location of incoming sounds and to enhance sounds arriving from straight ahead of the listener. Speech reception thresholds (SRTs) were measured for eight cochlear implant (CI) users using consonant and vowel materials under three processing conditions: An omni-directional response, a dipole-directional response, and the spatial-filtering algorithm. The background noise condition used three simultaneous time-reversed speech signals as interferers located at 90°, 180°, and 270°. Results indicated that the spatial-filtering algorithm can provide speech reception benefits of 5.8 to 10.7 dB SRT compared to an omni-directional response in a reverberant room with multiple noise sources. Given the observed SRT benefits, coupled with an efficient design, the proposed algorithm is promising as a CI noise-reduction solution. PMID:25096120
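
    A crude stand-in for the evaluated algorithm, assuming an endfire two-microphone pair: keep time-frequency bins whose inter-microphone phase difference is consistent with a frontal arrival and attenuate the rest. The spacing, thresholds, and mask floor below are illustrative assumptions; the published algorithm is considerably more refined.

        import numpy as np
        from scipy.signal import stft, istft

        def front_enhance(x_front, x_rear, fs, d=0.01, c=343.0, nperseg=256):
            """Two-microphone spatial filter sketch: attenuate time-frequency bins
            whose phase difference is inconsistent with a source straight ahead."""
            f, _, X1 = stft(x_front, fs, nperseg=nperseg)
            _, _, X2 = stft(x_rear, fs, nperseg=nperseg)
            # For a frontal source the front mic leads by d/c seconds
            expected = 2 * np.pi * f[:, None] * d / c
            phase_diff = np.angle(X1 * np.conj(X2))
            keep = np.abs(phase_diff - expected) < (expected + 0.05)
            mask = 0.1 + 0.9 * keep.astype(float)       # floor to limit musical noise
            _, y = istft(X1 * mask, fs, nperseg=nperseg)
            return y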

  5. The neural basis of sublexical speech and corresponding nonspeech processing: a combined EEG-MEG study.

    PubMed

    Kuuluvainen, Soila; Nevalainen, Päivi; Sorokin, Alexander; Mittag, Maria; Partanen, Eino; Putkinen, Vesa; Seppänen, Miia; Kähkönen, Seppo; Kujala, Teija

    2014-03-01

    We addressed the neural organization of speech versus nonspeech sound processing by investigating preattentive cortical auditory processing of changes in five features of a consonant-vowel syllable (consonant, vowel, sound duration, frequency, and intensity) and their acoustically matched nonspeech counterparts in a simultaneous EEG-MEG recording of mismatch negativity (MMN/MMNm). Overall, speech-sound processing was enhanced compared to nonspeech sound processing. This effect was strongest for changes that affect word meaning (consonant, vowel, and vowel duration) in the left hemisphere, and for the vowel identity change in the right hemisphere as well. Furthermore, in the right hemisphere, speech-sound frequency and intensity changes were processed faster than their nonspeech counterparts, and there was a trend toward speech enhancement in frequency processing. In summary, the results support the proposed existence of long-term memory traces for speech sounds in the auditory cortices, and indicate at least partly distinct neural substrates for speech and nonspeech sound processing.

  6. Acoustic and Perceptual Measurement of Expressive Prosody in High-Functioning Autism: Increased Pitch Range and What it Means to Listeners

    ERIC Educational Resources Information Center

    Nadig, Aparna; Shaw, Holly

    2012-01-01

    Are there consistent markers of atypical prosody in speakers with high functioning autism (HFA) compared to typically-developing speakers? We examined: (1) acoustic measurements of pitch range, mean pitch and speech rate in conversation, (2) perceptual ratings of conversation for these features and overall prosody, and (3) acoustic measurements of…

  7. A preliminary study on improving the recognition of esophageal speech using a hybrid system based on statistical voice conversion.

    PubMed

    Lachhab, Othman; Di Martino, Joseph; Elhaj, Elhassane Ibn; Hammouch, Ahmed

    2015-01-01

    In this paper, we propose a hybrid system based on a modified statistical GMM voice conversion algorithm for improving the recognition of esophageal speech. This hybrid system aims to compensate for the distorted information present in the esophageal acoustic features by using a voice conversion method. The esophageal speech is converted into "target" laryngeal speech using an iterative statistical estimation of a transformation function. We did not apply a speech synthesizer for reconstructing the converted speech signal, given that the converted Mel cepstral vectors are used directly as input to our speech recognition system. Furthermore, the feature vectors are linearly transformed by heteroscedastic linear discriminant analysis (HLDA) to project them into a smaller space with good discriminative properties. The experimental results demonstrate that the proposed system improves phone recognition accuracy by an absolute 3.40% compared with the accuracy obtained with neither HLDA nor voice conversion.
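
    The statistical GMM mapping underlying such systems is commonly implemented as a joint-density GMM whose conditional means are weighted by source-frame posteriors; a sketch with scikit-learn and SciPy follows. Dimensions and component counts are assumptions, and the paper's modified, iterative variant is not reproduced here.

        import numpy as np
        from scipy.stats import multivariate_normal
        from sklearn.mixture import GaussianMixture

        def train_joint_gmm(X_src, Y_tgt, n_components=32):
            """Fit a joint GMM on time-aligned source/target cepstral frames."""
            Z = np.hstack([X_src, Y_tgt])                     # (T, Dx + Dy)
            return GaussianMixture(n_components, covariance_type="full",
                                   reg_covar=1e-4).fit(Z)

        def convert(gmm, X_src, dx):
            """Standard GMM mapping: posterior-weighted conditional means E[y | x]."""
            means, covs, w = gmm.means_, gmm.covariances_, gmm.weights_
            # Responsibilities of each component given the source part only
            lik = np.stack([w[m] * multivariate_normal(means[m, :dx],
                                                       covs[m, :dx, :dx]).pdf(X_src)
                            for m in range(len(w))], axis=1)  # (T, M)
            post = lik / lik.sum(axis=1, keepdims=True)
            Y = np.zeros((len(X_src), means.shape[1] - dx))
            for m in range(len(w)):
                A = covs[m, dx:, :dx] @ np.linalg.inv(covs[m, :dx, :dx])
                cond = means[m, dx:] + (X_src - means[m, :dx]) @ A.T
                Y += post[:, [m]] * cond
            return Y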

  8. Emotional recognition from the speech signal for a virtual education agent

    NASA Astrophysics Data System (ADS)

    Tickle, A.; Raghu, S.; Elshaw, M.

    2013-06-01

    This paper explores the extraction of features from the speech wave to perform intelligent emotion recognition. A feature extraction tool (openSMILE) was used to obtain a baseline set of 998 acoustic features from a set of emotional speech recordings made with a microphone. The initial features were reduced to the most important ones so that recognition of emotions could be performed with a supervised neural network. Because the future of virtual education agents lies in making them more interactive, developing agents that can recognise and adapt to the emotional state of humans is an important step.
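
    The reduce-then-classify step can be sketched with scikit-learn as below; the feature count matches the 998 openSMILE functionals mentioned above, but the data, the number of emotion classes, and the network size are placeholders, not the study's configuration.

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.neural_network import MLPClassifier
        from sklearn.model_selection import cross_val_score

        # X: (n_clips, 998) openSMILE-style functionals per recording; y: emotion labels
        rng = np.random.default_rng(2)
        X = rng.normal(size=(200, 998))        # placeholder feature matrix
        y = rng.integers(0, 5, size=200)       # five hypothetical emotion classes

        clf = make_pipeline(StandardScaler(),
                            SelectKBest(f_classif, k=50),   # keep the most informative features
                            MLPClassifier(hidden_layer_sizes=(64,), max_iter=500))
        print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())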

  9. Speech rate and rhythm in Parkinson's disease.

    PubMed

    Skodda, Sabine; Schlegel, Uwe

    2008-05-15

    Articulatory rate and pause time were analyzed in a standardized reading task performed by Parkinson's disease (PD) patients, in relation to disease duration and severity, and compared with healthy controls. In 121 PD patients and 70 healthy controls, an acoustical analysis was performed on the first and last sentences of a standardized 170-syllable text, using commercial audio software. Articulatory rate and speech-to-pause ratios were calculated by measuring the length of each syllable and each pause, both at the end of words and within polysyllabic words. No significant difference in overall articulatory rate was found between PD patients and controls. Both groups showed an accelerated speech rate in the last sentence compared to the first; however, PD patients showed greater speech acceleration than controls. PD patients exhibited a significantly reduced percentage pause duration relative to total speech time in the first sentence and a reduced percentage pause time within polysyllabic words. PD patients made significantly fewer but longer pauses at the end of words and fewer pauses within polysyllabic words. UPDRS III showed an inverse relation to the number and rate of intraword pauses, and disease duration was negatively correlated with articulatory rate. Parkinsonian speech was thus characterized not only by a stronger acceleration of articulation rate in the course of speaking but also by a significant reduction in the total number of pauses, indicating impaired speech rhythm and timing organization.

  10. Segmenting Words from Natural Speech: Subsegmental Variation in Segmental Cues

    ERIC Educational Resources Information Center

    Rytting, C. Anton; Brew, Chris; Fosler-Lussier, Eric

    2010-01-01

    Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption, and preserves acoustic variation present in speech. We…

  11. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of the signal (i.e., speech) and the noise that exist in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will improve both crewmember usability and operational efficiency: it offers a fast rate of data/text entry in a small, lightweight package, and it frees the hands and eyes of a suited crewmember. The system components and steps include beamforming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select appropriate tasks in the face of constraints on computational resources.
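
    The "speech feature extraction" and "feature transformation and normalization" steps listed above typically amount to MFCCs with dynamic features and per-utterance cepstral mean/variance normalization; a generic sketch (not the flight software, file name hypothetical) follows.

        import numpy as np
        import librosa

        y, sr = librosa.load("utterance.wav", sr=16000)         # hypothetical recording

        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)       # static cepstral features
        delta = librosa.feature.delta(mfcc)                      # first-order dynamics
        delta2 = librosa.feature.delta(mfcc, order=2)            # second-order dynamics
        feats = np.vstack([mfcc, delta, delta2])                 # (39, n_frames)

        # Per-utterance cepstral mean/variance normalization to reduce
        # channel and noise mismatch before HMM training or decoding
        feats = (feats - feats.mean(axis=1, keepdims=True)) / \
                (feats.std(axis=1, keepdims=True) + 1e-8)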

  12. Investigations of High Pressure Acoustic Waves in Resonators with Seal-like Features

    NASA Technical Reports Server (NTRS)

    Daniels, Christopher; Steinetz, Bruce; Finkbeiner, Joshua

    2003-01-01

    A conical resonator (having a dissonant acoustic design) was tested in four configurations: (1) baseline resonator with closed ends and no blockage, (2) closed resonator with internal blockage, (3) ventilated resonator with no blockage, and (4) ventilated resonator with an applied pressure differential. These tests were conducted to investigate the effects of blockage and ventilation holes on dynamic pressurization. Additionally, the investigation was to determine the ability of acoustic pressurization to impede flow through the resonator. In each of the configurations studied, the entire resonator was oscillated at the gas resonant frequency while dynamic pressure, static pressure, and temperature of the fluid were measured. In the final configuration, flow through the resonator was recorded for three oscillation conditions. Ambient condition air was used as the working fluid.

  13. A Deep Ensemble Learning Method for Monaural Speech Separation

    PubMed Central

    Zhang, Xiao-Lei; Wang, DeLiang

    2016-01-01

    Monaural speech separation is a fundamental problem in robust speech processing. Recently, deep neural network (DNN)-based speech separation methods, which predict either clean speech or an ideal time-frequency mask, have demonstrated remarkable performance improvement. However, a single DNN with a given window length does not leverage contextual information sufficiently, and the differences between the two optimization objectives are not well understood. In this paper, we propose a deep ensemble method, named multicontext networks, to address monaural speech separation. The first multicontext network averages the outputs of multiple DNNs whose inputs employ different window lengths. The second multicontext network is a stack of multiple DNNs. Each DNN in a module of the stack takes the concatenation of original acoustic features and expansion of the soft output of the lower module as its input, and predicts the ratio mask of the target speaker; the DNNs in the same module employ different contexts. We have conducted extensive experiments with three speech corpora. The results demonstrate the effectiveness of the proposed method. We have also compared the two optimization objectives systematically and found that predicting the ideal time-frequency mask is more efficient in utilizing clean training speech, while predicting clean speech is less sensitive to SNR variations. PMID:27917394
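
    A minimal PyTorch-style sketch of the first multicontext idea, averaging ratio-mask estimates from several DNNs that see different context window lengths, is given below. The layer sizes, feature dimension, and context lengths are placeholders rather than the configuration used in the paper.

        import torch
        import torch.nn as nn

        def expand_context(frames, c):
            """Concatenate each frame with c frames of left/right context (edges padded by replication).
            frames: (T, F) -> (T, (2*c + 1) * F)"""
            padded = torch.cat([frames[:1].repeat(c, 1), frames, frames[-1:].repeat(c, 1)], dim=0)
            return torch.cat([padded[i:i + frames.shape[0]] for i in range(2 * c + 1)], dim=1)

        class MaskDNN(nn.Module):
            def __init__(self, feat_dim, context):
                super().__init__()
                self.context = context
                self.net = nn.Sequential(
                    nn.Linear((2 * context + 1) * feat_dim, 256), nn.ReLU(),
                    nn.Linear(256, feat_dim), nn.Sigmoid())     # ratio-mask output in [0, 1]

            def forward(self, frames):
                return self.net(expand_context(frames, self.context))

        class MulticontextAverage(nn.Module):
            """Average the mask estimates of DNNs that use different context lengths."""
            def __init__(self, feat_dim, contexts=(1, 3, 5)):
                super().__init__()
                self.members = nn.ModuleList(MaskDNN(feat_dim, c) for c in contexts)

            def forward(self, frames):
                return torch.stack([m(frames) for m in self.members]).mean(dim=0)

        masks = MulticontextAverage(feat_dim=64)(torch.randn(100, 64))   # toy (T=100, F=64) features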

  14. Processing changes when listening to foreign-accented speech

    PubMed Central

    Romero-Rivas, Carlos; Martin, Clara D.; Costa, Albert

    2015-01-01

    This study investigates the mechanisms responsible for fast changes in processing foreign-accented speech. Event Related brain Potentials (ERPs) were obtained while native speakers of Spanish listened to native and foreign-accented speakers of Spanish. We observed a less positive P200 component for foreign-accented speech relative to native speech comprehension. This suggests that the extraction of spectral information and other important acoustic features was hampered during foreign-accented speech comprehension. However, the amplitude of the N400 component for foreign-accented speech comprehension decreased across the experiment, suggesting the use of a higher level, lexical mechanism. Furthermore, during native speech comprehension, semantic violations in the critical words elicited an N400 effect followed by a late positivity. During foreign-accented speech comprehension, semantic violations only elicited an N400 effect. Overall, our results suggest that, despite a lack of improvement in phonetic discrimination, native listeners experience changes at lexical-semantic levels of processing after brief exposure to foreign-accented speech. Moreover, these results suggest that lexical access, semantic integration and linguistic re-analysis processes are permeable to external factors, such as the accent of the speaker. PMID:25859209

  15. The vocal repertoire of the domesticated zebra finch: a data-driven approach to decipher the information-bearing acoustic features of communication signals.

    PubMed

    Elie, Julie E; Theunissen, Frédéric E

    2016-03-01

    Although a universal code for the acoustic features of animal vocal communication calls may not exist, the thorough analysis of the distinctive acoustical features of vocalization categories is important not only to decipher the acoustical code for a specific species but also to understand the evolution of communication signals and the mechanisms used to produce and understand them. Here, we recorded more than 8000 examples of almost all the vocalizations of the domesticated zebra finch, Taeniopygia guttata: vocalizations produced to establish contact, to form and maintain pair bonds, to sound an alarm, to communicate distress or to advertise hunger or aggressive intents. We characterized each vocalization type using complete representations that avoided any a priori assumptions on the acoustic code, as well as classical bioacoustics measures that could provide more intuitive interpretations. We then used these acoustical features to rigorously determine the potential information-bearing acoustical features for each vocalization type using both a novel regularized classifier and an unsupervised clustering algorithm. Vocalization categories are discriminated by the shape of their frequency spectrum and by their pitch saliency (noisy to tonal vocalizations) but not particularly by their fundamental frequency. Notably, the spectral shape of zebra finch vocalizations contains peaks or formants that vary systematically across categories and that would be generated by active control of both the vocal organ (source) and the upper vocal tract (filter).

  16. Increasing diversity of neural responses to speech sounds across the central auditory pathway.

    PubMed

    Ranasinghe, K G; Vrana, W A; Matney, C J; Kilgard, M P

    2013-11-12

    Neurons at higher stations of each sensory system are responsive to feature combinations not present at lower levels. As a result, the activity of these neurons becomes less redundant than that at lower levels. We recorded responses to speech sounds from inferior colliculus and primary auditory cortex neurons of rats, and tested the hypothesis that primary auditory cortex neurons are more sensitive to combinations of multiple acoustic parameters than inferior colliculus neurons. We independently eliminated periodicity information, spectral information and temporal information in each consonant and vowel sound using a noise vocoder. This technique made it possible to test several key hypotheses about speech sound processing. Our results demonstrate that inferior colliculus responses are spatially arranged and primarily determined by the spectral energy and the fundamental frequency of speech, whereas primary auditory cortex neurons generate widely distributed responses to multiple acoustic parameters, and are not strongly influenced by the fundamental frequency of speech. We found no evidence that the inferior colliculus or primary auditory cortex was specialized for speech features such as voice onset time or formants. The greater diversity of responses in primary auditory cortex compared to inferior colliculus may help explain how the auditory system can identify a wide range of speech sounds across a wide range of conditions without relying on any single acoustic cue.
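
    A noise vocoder of the general kind used here band-limits the signal, extracts per-band temporal envelopes, and uses them to modulate band-limited noise. The sketch below is a minimal illustration with arbitrary band edges and filter orders, not the study's stimulus-generation parameters.

        import numpy as np
        from scipy.signal import butter, sosfiltfilt, hilbert

        def noise_vocode(x, fs, band_edges=(100, 400, 1000, 2400, 6000)):
            """Replace fine structure with noise while keeping per-band temporal envelopes."""
            rng = np.random.default_rng(0)
            out = np.zeros_like(x, dtype=float)
            for lo, hi in zip(band_edges[:-1], band_edges[1:]):
                sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
                band = sosfiltfilt(sos, x)
                envelope = np.abs(hilbert(band))                          # temporal envelope of the band
                carrier = sosfiltfilt(sos, rng.standard_normal(len(x)))   # band-limited noise carrier
                out += envelope * carrier
            return out

        fs = 16000
        t = np.arange(fs) / fs
        vocoded = noise_vocode(np.sin(2 * np.pi * 220 * t), fs)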

  17. Simulation study and guidelines to generate Laser-induced Surface Acoustic Waves for human skin feature detection

    NASA Astrophysics Data System (ADS)

    Li, Tingting; Fu, Xing; Chen, Kun; Dorantes-Gonzalez, Dante J.; Li, Yanning; Wu, Sen; Hu, Xiaotang

    2015-12-01

    Despite the seriously increasing number of people contracting skin cancer every year, limited attention has been given to the investigation of human skin tissues. To this regard, Laser-induced Surface Acoustic Wave (LSAW) technology, with its accurate, non-invasive and rapid testing characteristics, has recently shown promising results in biological and biomedical tissues. In order to improve the measurement accuracy and efficiency of detecting important features in highly opaque and soft surfaces such as human skin, this paper identifies the most important parameters of a pulse laser source, as well as provides practical guidelines to recommended proper ranges to generate Surface Acoustic Waves (SAWs) for characterization purposes. Considering that melanoma is a serious type of skin cancer, we conducted a finite element simulation-based research on the generation and propagation of surface waves in human skin containing a melanoma-like feature, determine best pulse laser parameter ranges of variation, simulation mesh size and time step, working bandwidth, and minimal size of detectable melanoma.

  18. Acoustic analysis in Mudejar-Gothic churches: Experimental results

    NASA Astrophysics Data System (ADS)

    Galindo, Miguel; Zamarreño, Teófilo; Girón, Sara

    2005-05-01

    This paper describes the preliminary results of research work in acoustics, conducted in a set of 12 Mudejar-Gothic churches in the city of Seville in the south of Spain. Despite common architectural style, the churches feature individual characteristics and have volumes ranging from 3947 to 10 708 m3. Acoustic parameters were measured in unoccupied churches according to the ISO-3382 standard. An extensive experimental study was carried out using impulse response analysis through a maximum length sequence measurement system in each church. It covered aspects such as reverberation (reverberation times, early decay times), distribution of sound levels (sound strength); early to late sound energy parameters derived from the impulse responses (center time, clarity for speech, clarity, definition, lateral energy fraction), and speech intelligibility (rapid speech transmission index), which all take both spectral and spatial distribution into account. Background noise was also measured to obtain the NR indices. The study describes the acoustic field inside each temple and establishes a discussion for each one of the acoustic descriptors mentioned by using the theoretical models available and the principles of architectural acoustics. Analysis of the quality of the spaces for music and speech is carried out according to the most widespread criteria for auditoria.

  19. Acoustic analysis in Mudejar-Gothic churches: experimental results.

    PubMed

    Galindo, Miguel; Zamarreño, Teófilo; Girón, Sara

    2005-05-01

    This paper describes the preliminary results of research work in acoustics, conducted in a set of 12 Mudejar-Gothic churches in the city of Seville in the south of Spain. Despite common architectural style, the churches feature individual characteristics and have volumes ranging from 3947 to 10 708 m3. Acoustic parameters were measured in unoccupied churches according to the ISO-3382 standard. An extensive experimental study was carried out using impulse response analysis through a maximum length sequence measurement system in each church. It covered aspects such as reverberation (reverberation times, early decay times), distribution of sound levels (sound strength); early to late sound energy parameters derived from the impulse responses (center time, clarity for speech, clarity, definition, lateral energy fraction), and speech intelligibility (rapid speech transmission index), which all take both spectral and spatial distribution into account. Background noise was also measured to obtain the NR indices. The study describes the acoustic field inside each temple and establishes a discussion for each one of the acoustic descriptors mentioned by using the theoretical models available and the principles of architectural acoustics. Analysis of the quality of the spaces for music and speech is carried out according to the most widespread criteria for auditoria.
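
    Two of the early-to-late energy descriptors listed in these church measurements, clarity for speech (C50) and definition (D50), follow directly from the ISO 3382 definitions applied to a measured impulse response. The sketch below applies those textbook definitions to a toy impulse response; it is not the authors' measurement software.

        import numpy as np

        def clarity_definition(ir, fs, split_ms=50.0):
            """C50 (dB) and D50 (fraction) from a room impulse response starting at the direct sound."""
            n_split = int(round(split_ms * 1e-3 * fs))
            energy = ir.astype(float) ** 2
            early, late = energy[:n_split].sum(), energy[n_split:].sum()
            c50 = 10.0 * np.log10(early / late)   # early-to-late energy ratio in dB
            d50 = early / energy.sum()            # early-to-total energy fraction
            return c50, d50

        fs = 48000
        t = np.arange(int(0.8 * fs)) / fs
        ir = np.exp(-t / 0.25) * np.random.default_rng(1).standard_normal(t.size)  # toy decaying IR
        print(clarity_definition(ir, fs))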

  20. Features of Propagation of the Acoustic-Gravity Waves Generated by High-Power Periodic Radiation

    NASA Astrophysics Data System (ADS)

    Chernogor, L. F.; Frolov, V. L.

    2013-09-01

    We present the results of bandpass filtering of temporal variations of the Doppler frequency shift of radio signals from a vertical-sounding Doppler radar located near the city of Kharkov when the ionosphere was heated by high-power periodic (10- and 15-min period) radiation from the Sura facility. The filtering was done in ranges of periods close to the acoustic cutoff period and the Brunt-Väisälä period (4-6, 8-12, and 13-17 min). Oscillations with periods of 4-6 min and amplitudes of 50-100 mHz were essentially not recorded. Oscillations with periods of 8-12 and 13-17 min and amplitudes of 60-100 mHz were detected in almost all sessions. The delays of these oscillations with respect to the heater switch-on were close to 100 min and about 40-50 min, respectively, corresponding to group propagation velocities of about 160 and 320-400 m/s. The Doppler shift oscillations were caused by acoustic-gravity waves, which led to periodic variations in the electron number density with a relative amplitude of about 0.1-1.0%. The acoustic-gravity waves were not recorded when the effective power of the Sura facility was 50 MW, but were confidently observed when the effective power was increased to 130 MW. It is shown that the period of the wave processes was determined by the period of the heating-pause cycles, and that the duration of the wave trains did not depend on the duration of the series of heating-pause cycles. The data suggest that the generation mechanism of the recorded wave disturbances differs from the mechanism proposed in 1970-1990.
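
    The band-pass filtering by period range described above can be illustrated by converting a period band (e.g., 8-12 min) into a frequency band and applying a zero-phase Butterworth filter; the sampling interval and filter order below are assumed values, not those of the Kharkov radar processing.

        import numpy as np
        from scipy.signal import butter, sosfiltfilt

        def bandpass_by_period(x, dt_s, period_min_lo, period_min_hi, order=4):
            """Keep oscillations whose periods lie between period_min_lo and period_min_hi (minutes)."""
            f_lo = 1.0 / (period_min_hi * 60.0)   # longest period -> lowest frequency (Hz)
            f_hi = 1.0 / (period_min_lo * 60.0)
            sos = butter(order, [f_lo, f_hi], btype="bandpass", fs=1.0 / dt_s, output="sos")
            return sosfiltfilt(sos, x)

        dt = 10.0                                 # assumed 10 s sampling of the Doppler shift series
        t = np.arange(0, 6 * 3600, dt)
        doppler = 0.08 * np.sin(2 * np.pi * t / (10 * 60)) \
                  + 0.02 * np.random.default_rng(2).standard_normal(t.size)
        filtered_8_12 = bandpass_by_period(doppler, dt, 8, 12)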

  1. Speech Research

    NASA Astrophysics Data System (ADS)

    Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: a biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; phonetic factors in letter detection; categorical perception; short-term recall by deaf signers of American Sign Language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaries; and vowel information in postvocalic frictions.

  2. Detection of Clinical Depression in Adolescents’ Speech During Family Interactions

    PubMed Central

    Low, Lu-Shih Alex; Maddage, Namunu C.; Lech, Margaret; Sheeber, Lisa B.; Allen, Nicholas B.

    2013-01-01

    The properties of acoustic speech have previously been investigated as possible cues for depression in adults. However, these studies were restricted to small populations of patients, and the speech recordings were made during patients' clinical interviews or fixed-text reading sessions. Symptoms of depression often first appear during adolescence at a time when the voice is changing, in both males and females, suggesting that specific studies of these phenomena in adolescent populations are warranted. This study investigated acoustic correlates of depression in a large sample of 139 adolescents (68 clinically depressed and 71 controls). Speech recordings were made during naturalistic interactions between adolescents and their parents. Prosodic, cepstral, spectral, and glottal features, as well as features derived from the Teager energy operator (TEO), were tested within a binary classification framework. Strong gender differences in classification accuracy were observed. The TEO-based features clearly outperformed all other features and feature combinations, providing classification accuracies ranging from 81% to 87% for males and from 72% to 79% for females. Close, but slightly less accurate, results were obtained by combining glottal features with prosodic and spectral features (67%-69% for males and 70%-75% for females). These findings indicate the importance of nonlinear mechanisms associated with glottal flow formation as cues for clinical depression. PMID:21075715
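
    The Teager energy operator behind the best-performing feature set has a simple discrete form, psi[n] = x[n]^2 - x[n-1]*x[n+1]; the sketch below computes that raw operator, whereas the TEO-based features of the study are a more elaborate derived set.

        import numpy as np

        def teager_energy(x):
            """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
            x = np.asarray(x, dtype=float)
            psi = np.empty_like(x)
            psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
            psi[0], psi[-1] = psi[1], psi[-2]     # simple edge handling
            return psi

        fs = 16000
        t = np.arange(fs) / fs
        psi = teager_energy(np.sin(2 * np.pi * 150 * t))   # roughly constant for a pure tone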

  3. Some acoustic features of nasal and nasalized vowels: a target for vowel nasalization.

    PubMed

    Feng, G; Castelli, E

    1996-06-01

    In order to characterize acoustic properties of nasal and nasalized vowels, these sounds will be considered as a dynamic trend from an oral configuration toward an [n]-like configuration. The latter can be viewed as a target for vowel nasalization. This target corresponds to the pharyngonasal tract and it can be modeled, with some simplifications, by a single tract without any parallel paths. Thus the first two resonance frequencies (at about 300 and 1000 Hz) characterize this target well. A series of measurements has been carried out in order to describe the acoustic characteristics of the target. Measured transfer functions confirm the resonator nature of the low-frequency peak. The introduction of such a target allows the conception of the nasal vowels as a trend beginning with a simple configuration, which is terminated in the same manner, so allowing the complex nasal phenomena to be bounded. A complete study of pole-zero evolutions for the nasalization of the 11 French vowels is presented. It allows the proposition of a common strategy for the nasalization of all vowels, so a true nasal vowel can be placed in this nasalization frame. The measured transfer functions for several French nasal vowels are also given.

  4. Lip Kinematics for /p/ and /b/ Production during Whispered and Voiced Speech

    PubMed Central

    Higashikawa, Masahiko; Green, Jordan R.; Moore, Christopher A.; Minifie, Fred D.

    2014-01-01

    In the absence of voicing, the discrimination of ‘voiced’ and ‘voiceless’ stop consonants in whispered speech relies on such acoustic cues as burst duration and amplitude, and formant transition characteristics. The articulatory processes that generate these features of whispered speech remain speculative. This preliminary investigation examines the articulatory kinematics differences between whispered /p/ and /b/, which may underlie the acoustic differences previously reported for these sounds. Computerized video-tracking methods were used to evaluate kinematic differences between voiced and voiceless stops. Seven subjects produced the target utterances ‘my papa puppy’ and ‘my baba puppy’ in voiced and whispered speech modes. The results revealed that mean peak opening and closing velocities for /b/ were significantly greater than those for /p/ during whispered speech. No differences in peak velocity for either oral closing or opening were observed during voiced speech. The maximum distance between the lips for oral opening for /b/ was significantly greater than for /p/ during whisper, whereas no difference was observed during voiced speech. These data supported the suggestion that whispered speech and voiced speech rely on distinct motor control processes. PMID:12566763

  5. Thai Automatic Speech Recognition

    DTIC Science & Technology

    2005-01-01

    reported elsewhere. 1. Introduction This research was performed as part of the DARPA-Babylon program aimed at rapidly developing multilingual speech-to...used in an external DARPA evaluation involving medical scenarios between an American Doctor and a naïve monolingual Thai patient. 2. Thai Language...To create more general acoustic models we collected read speech data from native speakers based on the concepts of our multilingual data collection

  6. Department of Cybernetic Acoustics

    NASA Astrophysics Data System (ADS)

    The development of the theory, instrumentation and applications of methods and systems for the measurement, analysis, processing and synthesis of acoustic signals within the audio frequency range is discussed, particularly for the speech signal and the vibro-acoustic signals emitted by technical and industrial equipment treated as noise and vibration sources. The research work, both theoretical and experimental, aims at applications in various branches of science and medicine, such as: acoustical diagnostics and phoniatric rehabilitation of pathological and postoperative states of the speech organ; bilateral "man-machine" speech communication based on the analysis, recognition and synthesis of the speech signal; and vibro-acoustical diagnostics and continuous monitoring of the state of machines, technical equipment and technological processes.

  7. A phonetic investigation of single word versus connected speech production in children with persisting speech difficulties relating to cleft palate.

    PubMed

    Howard, Sara

    2013-03-01

    Objective: To investigate the phonetic and phonological parameters of speech production associated with cleft palate in single words and in sentence repetition in order to explore the impact of connected speech processes, prosody, and word juncture on word production across contexts. Participants: Two boys (aged 9 years 5 months and 11 years 0 months) with persisting speech impairments related to a history of unilateral cleft lip and palate formed the main focus of the study; three typical adult male speakers provided control data. Method: Audio, video, and electropalatographic recordings were made of the participants producing single words and repeating two sets of sentences. The data were transcribed and the electropalatographic recordings were analyzed to explore lingual-palatal contact patterns across the different speech conditions. Acoustic analysis was used to further inform the perceptual analysis and to make specific durational measurements. Results: The two boys' speech production differed across the speech conditions. Both boys showed typical and atypical phonetic features in their connected speech production. One boy, although often unintelligible, resembled the adult speakers more closely prosodically and in his specific connected speech behaviors at word boundaries. The second boy produced developmentally atypical phonetic adjustments at word boundaries that appeared to promote intelligibility at the expense of naturalness. Conclusion: For older children with persisting speech impairments, it is particularly important to examine specific features of connected speech production, including word juncture and prosody. Sentence repetition data provide useful information to this end, but further investigations encompassing detailed perceptual and instrumental analysis of real conversational data are warranted.

  8. A Screening Approach for Classroom Acoustics Using Web-Based Listening Tests and Subjective Ratings

    PubMed Central

    Persson Waye, Kerstin; Magnusson, Lennart; Fredriksson, Sofie; Croy, Ilona

    2015-01-01

    Background: Perception of speech is crucial in school, where speech is the main mode of communication. The aim of the study was to evaluate whether a web-based approach including listening tests and questionnaires could be used as a screening tool for poor classroom acoustics. The prime focus was the relation between pupils' comprehension of speech, the classroom acoustics and their description of the acoustic qualities of the classroom. Methodology/Principal Findings: In total, 1106 pupils aged 13-19, from 59 classes and 38 schools in Sweden, participated in a listening study using Hagerman's sentences administered via the Internet. Four listening conditions were applied: high and low background noise level and positions close to and far away from the loudspeaker. The pupils described the acoustic quality of the classroom, and teachers provided information on the physical features of the classroom using questionnaires. Conclusions/Significance: In 69% of the classes, at least three pupils described the sound environment as adverse, and in 88% of the classes one or more pupils reported often having difficulties concentrating due to noise. The pupils' comprehension of speech was strongly influenced by the background noise level (p<0.001) and distance to the loudspeakers (p<0.001). Of the physical classroom features, presence of suspended acoustic panels (p<0.05) and length of the classroom (p<0.01) predicted speech comprehension. Of the pupils' descriptions of acoustic qualities, "clattery" significantly (p<0.05) predicted speech comprehension. "Clattery" was furthermore associated with difficulties understanding each other, while the description "noisy" was associated with concentration difficulties. The majority of classrooms do not seem to have an optimal sound environment. The pupils' descriptions of acoustic qualities and listening tests can be one way of predicting sound conditions in the classroom. PMID:25615692

  9. Production of Syntactic Stress in Alaryngeal Speech.

    ERIC Educational Resources Information Center

    Gandour, Jack; Weinberg, Bernd

    1985-01-01

    Reports on an acoustical investigation of syntactic stress in alaryngeal speech. Measurements were made of fundamental frequency, relative intensity, vowel duration, and intersyllable duration. Findings suggest that stress contrasts in alaryngeal speech are based on a complex of acoustic cues which are influenced by linguistic structure.…

  10. Linear degrees of freedom in speech production: analysis of cineradio- and labio-film data and articulatory-acoustic modeling.

    PubMed

    Beautemps, D; Badin, P; Bailly, G

    2001-05-01

    The following contribution addresses several issues concerning speech degrees of freedom in French oral vowels, stop consonants, and fricative consonants, based on an analysis of tongue and lip shapes extracted from cineradio- and labio-films. The midsagittal tongue shapes were submitted to a linear decomposition in which some loading factors, such as jaw and larynx position, were selected directly, while four other components were derived from principal component analysis (PCA). For the lips, in addition to the more traditional protrusion and opening components, a supplementary component was extracted to explain the upward movement of both the upper and lower lips in [v] production. A linear articulatory model was developed; the six tongue degrees of freedom were used as the articulatory control parameters of the midsagittal tongue contours and explained 96% of the tongue data variance. These control parameters were also used to specify the frontal lip width dimension derived from the labio-film front views. Finally, this model was complemented by a conversion model going from the midsagittal contour to the area function, based on a fitting of the midsagittal distances and the formant frequencies for both vowels and consonants.
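
    A minimal sketch of the guided decomposition idea, removing the variance explained by a selected factor such as jaw position and then applying PCA to the residual contours, is shown below. The data shapes, the single regressed-out factor, and the number of components are illustrative assumptions, not the paper's full procedure.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LinearRegression

        # Assumed toy data: 200 frames, tongue contour sampled at 30 points (x, y) -> 60 coordinates,
        # plus one controlled factor per frame (e.g., jaw opening).
        rng = np.random.default_rng(3)
        contours = rng.standard_normal((200, 60))
        jaw = rng.standard_normal((200, 1))

        # Remove the variance explained by the selected factor, then apply PCA to the residual.
        residual = contours - LinearRegression().fit(jaw, contours).predict(jaw)
        pca = PCA(n_components=4).fit(residual)
        scores = pca.transform(residual)          # four data-driven tongue components per frame
        print(pca.explained_variance_ratio_)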

  11. Features of CO2 fracturing deduced from acoustic emission and microscopy in laboratory experiments

    NASA Astrophysics Data System (ADS)

    Ishida, Tsuyoshi; Chen, Youqing; Bennour, Ziad; Yamashita, Hiroto; Inui, Shuhei; Nagaya, Yuya; Naoi, Makoto; Chen, Qu; Nakayama, Yoshiki; Nagano, Yu

    2016-11-01

    We conducted hydraulic fracturing (HF) experiments on 170 mm cubic granite specimens with a 20 mm diameter central hole to investigate how fluid viscosity affects the HF process and crack properties. In experiments using supercritical carbon dioxide (SC-CO2), liquid carbon dioxide (L-CO2), water, and viscous oil with viscosities of 0.051-336.6 mPa · s, we compared the results for breakdown pressure, the distribution and fracturing mechanism of acoustic emission, and the microstructure of induced cracks revealed by using an acrylic resin containing a fluorescent compound. Fracturing with low-viscosity fluid induced three-dimensionally sinuous cracks with many secondary branches, which seem to be desirable pathways for enhanced geothermal systems, shale gas recovery, and other processes.

  12. Nonlinear ion-acoustic double-layers in electronegative plasmas with electrons featuring Tsallis distribution

    NASA Astrophysics Data System (ADS)

    Ghebache, Siham; Tribeche, Mouloud

    2016-04-01

    Weakly nonlinear ion-acoustic (IA) double layers (DLs) in electronegative plasmas composed of positive ions, negative ions, and nonextensive electrons are investigated. A generalized Korteweg-de Vries equation with a cubic nonlinearity is derived using a reductive perturbation method. Different types of electronegative plasmas inspired by the experimental studies of Ichiki et al. (2001) are discussed. It is shown that the IA wave phase velocity, in different mixtures of negative and positive ions, decreases as the nonextensive parameter q increases, before levelling off at a constant value for larger q. Moreover, a relative increase of Q leads to an enhancement of the IA phase velocity. Existence domains of either solitary waves or double layers are then presented and their parametric dependence is determined. Owing to the electron nonextensivity, our present plasma model can admit compressive as well as rarefactive IA DLs.

  13. Nonlinear features of ion acoustic shock waves in dissipative magnetized dusty plasma

    NASA Astrophysics Data System (ADS)

    Sahu, Biswajit; Sinha, Anjana; Roychoudhury, Rajkumar

    2014-10-01

    The nonlinear propagation of small as well as arbitrary amplitude shocks is investigated in a magnetized dusty plasma consisting of inertia-less Boltzmann-distributed electrons, inertial viscous cold ions, and stationary dust grains without dust-charge fluctuations. The effects of dissipation due to ion viscosity and of the external magnetic field on the properties of the ion acoustic shock structure are investigated. It is found that for small amplitude waves the Korteweg-de Vries-Burgers (KdVB) equation, derived using the reductive perturbation method, gives the qualitative behaviour of the transition from oscillatory wave to shock structure. The exact numerical solution for arbitrary amplitude waves differs somewhat in detail from the results obtained from the KdVB equation. However, the qualitative nature of the two solutions is similar, in the sense that a gradual transition from KdV oscillation to shock structure is observed as the dissipative parameter increases.
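
    For reference, the KdVB equation referred to above has the generic one-dimensional form (written here with schematic coefficients A, B, and C, which depend on the plasma parameters and are not taken from the paper):

        \frac{\partial \phi}{\partial \tau}
          + A\,\phi\,\frac{\partial \phi}{\partial \xi}
          + B\,\frac{\partial^{3} \phi}{\partial \xi^{3}}
          = C\,\frac{\partial^{2} \phi}{\partial \xi^{2}}

    The dispersive term (B) balances the nonlinearity to produce the oscillatory structure, while the Burgers term (C), here set by ion viscosity, supplies the dissipation that drives the transition to a shock.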

  14. Features of the Acoustic Mechanism of Core-Collapse Supernova Explosions

    NASA Astrophysics Data System (ADS)

    Burrows, A.; Livne, E.; Dessart, L.; Ott, C. D.; Murphy, J.

    2007-01-01

    In the context of 2D, axisymmetric, multigroup, radiation/hydrodynamic simulations of core-collapse supernovae over the full 180° domain, we present an exploration of the progenitor dependence of the acoustic mechanism of explosion. All progenitor models we have tested with our Newtonian code explode. However, some of the cores left behind in our simulations, particularly for the more massive progenitors, have baryon masses that are larger than the canonical ~1.5 solar masses of well-measured pulsars. We investigate the roles of the standing accretion shock instability (SASI), the excitation of core g-modes, the generation of core acoustic power, the ejection of matter with r-process potential, the windlike character of the explosion, and the fundamental anisotropy of the blasts. We find that the breaking of spherical symmetry is central to the supernova phenomenon, the delays to explosion can be long, and the blasts, when top-bottom asymmetric, are self-collimating. We see indications that the initial explosion energies are larger for the more massive progenitors and smaller for the less massive progenitors and that the neutrino contribution to the explosion energy may be an increasing function of progenitor mass. However, the explosion energy is still accumulating by the end of our simulations and has not converged to final values. The degree of explosion asymmetry we obtain is completely consistent with that inferred from the polarization measurements of Type Ic supernovae. Furthermore, we calculate for the first time the magnitude and sign of the net impulse on the core due to anisotropic neutrino emission and suggest that hydrodynamic and neutrino recoils in the context of our asymmetric explosions afford a natural mechanism for observed pulsar proper motions.

  15. Distinct Cortical Pathways for Music and Speech Revealed by Hypothesis-Free Voxel Decomposition.

    PubMed

    Norman-Haignere, Sam; Kanwisher, Nancy G; McDermott, Josh H

    2015-12-16

    The organization of human auditory cortex remains unresolved, due in part to the small stimulus sets common to fMRI studies and the overlap of neural populations within voxels. To address these challenges, we measured fMRI responses to 165 natural sounds and inferred canonical response profiles ("components") whose weighted combinations explained voxel responses throughout auditory cortex. This analysis revealed six components, each with interpretable response characteristics despite being unconstrained by prior functional hypotheses. Four components embodied selectivity for particular acoustic features (frequency, spectrotemporal modulation, pitch). Two others exhibited pronounced selectivity for music and speech, respectively, and were not explainable by standard acoustic features. Anatomically, music and speech selectivity concentrated in distinct regions of non-primary auditory cortex. However, music selectivity was weak in raw voxel responses, and its detection required a decomposition method. Voxel decomposition identifies primary dimensions of response variation across natural sounds, revealing distinct cortical pathways for music and speech.
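
    The analysis described here factorizes a sound-by-voxel response matrix into a small number of response profiles and voxel weights. The paper's decomposition algorithm is its own hypothesis-free method; the off-the-shelf non-negative factorization below is only meant to convey the structure of the problem, with toy data shapes.

        import numpy as np
        from sklearn.decomposition import NMF

        # Assumed toy data: responses of 5000 voxels to 165 natural sounds (made non-negative for NMF).
        rng = np.random.default_rng(6)
        responses = np.abs(rng.standard_normal((165, 5000)))

        model = NMF(n_components=6, init="nndsvda", max_iter=500, random_state=0)
        components = model.fit_transform(responses)   # (165 sounds x 6 component response profiles)
        voxel_weights = model.components_             # (6 components x 5000 voxel weights)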

  16. Speech Analysis Systems: An Evaluation.

    ERIC Educational Resources Information Center

    Read, Charles; And Others

    1992-01-01

    Performance characteristics are reviewed for seven computerized systems marketed for acoustic speech analysis: CSpeech, CSRE, ILS-PC, Kay Elemetrics model 550 Sona-Graph, MacSpeech Lab II, MSL, and Signalyze. Characteristics reviewed include system components, basic capabilities, documentation, user interface, data formats and journaling, and…

  17. A Study in Speech Recognition Using a Kohonen Neural Network Dynamic Programming and Multi-Feature Fusion

    DTIC Science & Technology

    1989-12-01

    [Report front matter: table-of-contents and list-of-figures entries only, including "Digitized Speech Processing," "Formant Processing," "Phonetic and Orthographic Representation of American English Phonemes," "Average Vowel Formant Frequencies," "Standard SPIRE...," "Sample .COM File for Batch Processing," and "Formant Processing Using Energy Gate for Utterance 'SELECT GUN...'"; the sample utterance 'STRAFE CHARLIE' is also referenced.]

  18. Automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Espy-Wilson, Carol

    2005-04-01

    Great strides have been made in the development of automatic speech recognition (ASR) technology over the past thirty years. Most of this effort has been centered around the extension and improvement of Hidden Markov Model (HMM) approaches to ASR. Current commercially available and industry systems based on HMMs can perform well for certain situational tasks that restrict variability, such as phone dialing or limited voice commands. However, the holy grail of ASR systems is performance comparable to humans; in other words, the ability to automatically transcribe unrestricted conversational speech spoken by an infinite number of speakers under varying acoustic environments. This goal is far from being reached. Key to the success of ASR is effective modeling of variability in the speech signal. This tutorial will review the basics of ASR and the various ways in which our current knowledge of speech production, speech perception and prosody can be exploited to improve robustness at every level of the system.

  19. Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar.

    PubMed

    Shin, Young Hoon; Seo, Jiwon

    2016-10-29

    People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker's vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing.

  20. Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar

    PubMed Central

    Shin, Young Hoon; Seo, Jiwon

    2016-01-01

    People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker’s vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing. PMID:27801867

  1. A keyword spotting model using perceptually significant energy features

    NASA Astrophysics Data System (ADS)

    Umakanthan, Padmalochini

    The task of a keyword recognition system is to detect the presence of certain words in a conversation based on the linguistic information present in human speech. Such keyword spotting systems have applications in homeland security, telephone surveillance and human-computer interfacing. The general procedure of a keyword spotting system involves feature generation and matching. In this work, a new set of features based on the psychoacoustic masking nature of human speech is proposed. After developing these features, a time-aligned pattern matching process was implemented to locate the target words within a set of unknown words. A word boundary detection technique based on frame classification using the nonlinear characteristics of speech is also addressed in this work. Validation of this keyword spotting model was done using widely used cepstral features. The experimental results indicate the viability of using these perceptually significant features as an augmented feature set in keyword spotting.
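
    Time-aligned pattern matching of the kind described is commonly implemented with dynamic time warping (DTW). The sketch below computes a plain, length-normalized DTW distance between a keyword template and a test segment using Euclidean frame distances; the study's perceptually motivated energy features are not reproduced here.

        import numpy as np

        def dtw_distance(template, test):
            """Dynamic time warping distance between two feature sequences (frames x dims)."""
            n, m = len(template), len(test)
            cost = np.full((n + 1, m + 1), np.inf)
            cost[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    d = np.linalg.norm(template[i - 1] - test[j - 1])   # local frame distance
                    cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
            return cost[n, m] / (n + m)                                  # length-normalized path cost

        rng = np.random.default_rng(4)
        keyword_template = rng.standard_normal((40, 13))
        test_segment = rng.standard_normal((55, 13))
        print(dtw_distance(keyword_template, test_segment))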

  2. Automatic detection of wheezes by evaluation of multiple acoustic feature extraction methods and C-weighted SVM

    NASA Astrophysics Data System (ADS)

    Sosa, Germán. D.; Cruz-Roa, Angel; González, Fabio A.

    2015-01-01

    This work addresses the problem of lung sound classification, in particular the problem of distinguishing between wheeze and normal sounds. Wheezing sound detection is an important step in associating lung sounds with an abnormal state of the respiratory system, usually related to tuberculosis or other chronic obstructive pulmonary diseases (COPD). The paper presents an approach for automatic lung sound classification which uses different state-of-the-art sound features in combination with a C-weighted support vector machine (SVM) classifier that works better for unbalanced data. The feature extraction methods used here are commonly applied in speech recognition and related problems because they capture the most informative spectral content from the original signals. The evaluated methods were: the Fourier transform (FT), wavelet decomposition using a Wavelet Packet Transform (WPT) filter bank, and Mel Frequency Cepstral Coefficients (MFCC). For comparison, we evaluated and contrasted the proposed approach against previous works using different combinations of features and/or classifiers. The different methods were evaluated on a set of lung sounds including normal and wheezing sounds. A leave-two-out per-case cross-validation approach was used, which, in each fold, chooses as the validation set a pair of cases, one including normal sounds and the other including wheezing sounds. Experimental results are reported in terms of traditional classification performance measures: sensitivity, specificity and balanced accuracy. Our best results using the suggested approach, C-weighted SVM and MFCC, achieve 82.1% balanced accuracy, the best result reported for this problem to date. These results suggest that supervised classifiers based on kernel methods are able to learn better models for this challenging classification problem, even using the same feature extraction methods.
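
    A compact sketch of the best-performing combination reported above, MFCC summary features with a class-weighted SVM, is given below. The MFCC configuration, SVM hyperparameters, and the toy unbalanced labels are placeholders rather than the study's settings.

        import numpy as np
        import librosa
        from sklearn.svm import SVC
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        def mfcc_summary(y, sr, n_mfcc=13):
            """Mean and standard deviation of MFCCs as a fixed-length descriptor per sound."""
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
            return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

        # Toy dataset: random signals standing in for normal (0) and wheezing (1) lung sounds.
        sr = 4000
        rng = np.random.default_rng(5)
        sounds = [rng.standard_normal(2 * sr) for _ in range(20)]
        labels = np.array([0] * 14 + [1] * 6)                        # deliberately unbalanced

        X = np.vstack([mfcc_summary(y, sr) for y in sounds])
        clf = make_pipeline(StandardScaler(),
                            SVC(kernel="rbf", class_weight="balanced"))  # class-weighted SVM
        clf.fit(X, labels)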

  3. Effect of body position on vocal tract acoustics: Acoustic pharyngometry and vowel formants.

    PubMed

    Vorperian, Houri K; Kurtzweil, Sara L; Fourakis, Marios; Kent, Ray D; Tillman, Katelyn K; Austin, Diane

    2015-08-01

    The anatomic basis and articulatory features of speech production are often studied with imaging studies that are typically acquired in the supine body position. It is important to determine if changes in body orientation to the gravitational field alter vocal tract dimensions and speech acoustics. The purpose of this study was to assess the effect of body position (upright versus supine) on (1) oral and pharyngeal measurements derived from acoustic pharyngometry and (2) acoustic measurements of fundamental frequency (F0) and the first four formant frequencies (F1-F4) for the quadrilateral point vowels. Data were obtained for 27 male and female participants, aged 17 to 35 yrs. Acoustic pharyngometry showed a statistically significant effect of body position on volumetric measurements, with smaller values in the supine than upright position, but no changes in length measurements. Acoustic analyses of vowels showed significantly larger values in the supine than upright position for the variables of F0, F3, and the Euclidean distance from the centroid to each corner vowel in the F1-F2-F3 space. Changes in body position affected measurements of vocal tract volume but not length. Body position also affected the aforementioned acoustic variables, but the main vowel formants were preserved.

  4. Effect of body position on vocal tract acoustics: Acoustic pharyngometry and vowel formants

    PubMed Central

    Vorperian, Houri K.; Kurtzweil, Sara L.; Fourakis, Marios; Kent, Ray D.; Tillman, Katelyn K.; Austin, Diane

    2015-01-01

    The anatomic basis and articulatory features of speech production are often studied with imaging studies that are typically acquired in the supine body position. It is important to determine if changes in body orientation to the gravitational field alter vocal tract dimensions and speech acoustics. The purpose of this study was to assess the effect of body position (upright versus supine) on (1) oral and pharyngeal measurements derived from acoustic pharyngometry and (2) acoustic measurements of fundamental frequency (F0) and the first four formant frequencies (F1–F4) for the quadrilateral point vowels. Data were obtained for 27 male and female participants, aged 17 to 35 yrs. Acoustic pharyngometry showed a statistically significant effect of body position on volumetric measurements, with smaller values in the supine than upright position, but no changes in length measurements. Acoustic analyses of vowels showed significantly larger values in the supine than upright position for the variables of F0, F3, and the Euclidean distance from the centroid to each corner vowel in the F1-F2-F3 space. Changes in body position affected measurements of vocal tract volume but not length. Body position also affected the aforementioned acoustic variables, but the main vowel formants were preserved. PMID:26328699

  5. Extraction of features from ultrasound acoustic emissions: a tool to assess the hydraulic vulnerability of Norway spruce trunkwood?

    PubMed Central

    Rosner, Sabine; Klein, Andrea; Wimmer, Rupert; Karlsson, Bo

    2011-01-01

    Summary
    • The aim of this study was to assess the hydraulic vulnerability of Norway spruce (Picea abies) trunkwood by extraction of selected features of acoustic emissions (AEs) detected during dehydration of standard size samples.
    • The hydraulic method was used as the reference method to assess the hydraulic vulnerability of trunkwood of different cambial ages. Vulnerability curves were constructed by plotting the percentage loss of conductivity vs an overpressure of compressed air.
    • Differences in hydraulic vulnerability were very pronounced between juvenile and mature wood samples; therefore, useful AE features, such as peak amplitude, duration and relative energy, could be filtered out. The AE rates of signals clustered by amplitude and duration ranges and the AE energies differed greatly between juvenile and mature wood at identical relative water losses.
    • Vulnerability curves could be constructed by relating the cumulated amount of relative AE energy to the relative loss of water and to xylem tension. AE testing in combination with feature extraction offers a readily automated and easy to use alternative to the hydraulic method. PMID:16771986

  6. System and method for investigating sub-surface features of a rock formation using compressional acoustic sources

    DOEpatents

    Vu, Cung Khac; Skelt, Christopher; Nihei, Kurt; Johnson, Paul A.; Guyer, Robert; Ten Cate, James A.; Le Bas, Pierre-Yves; Larmat, Carene S.

    2016-09-27

    A system and method for investigating rock formations outside a borehole are provided. The method includes generating a first compressional acoustic wave at a first frequency by a first acoustic source, and generating a second compressional acoustic wave at a second frequency by a second acoustic source. The first and the second acoustic sources are arranged within a localized area of the borehole. The first and the second acoustic waves intersect in an intersection volume outside the borehole. The method further includes receiving, at a receiver arranged in the borehole, a third shear acoustic wave at a third frequency, the third shear acoustic wave returning to the borehole due to a non-linear mixing process in a non-linear mixing zone within the intersection volume. The third frequency is equal to the difference between the first frequency and the second frequency.

  7. Contributions of speech science to the technology of man-machine voice interactions

    NASA Technical Reports Server (NTRS)

    Lea, Wayne A.

    1977-01-01

    Research in speech understanding was reviewed. Plans which include prosodics research, phonological rules for speech understanding systems, and continued interdisciplinary phonetics research are discussed. Improved acoustic phonetic analysis capabilities in speech recognizers are suggested.

  8. Speaker independent acoustic-to-articulatory inversion

    NASA Astrophysics Data System (ADS)

    Ji, An

    Acoustic-to-articulatory inversion, the determination of articulatory parameters from acoustic signals, is a difficult but important problem for many speech processing applications, such as automatic speech recognition (ASR) and computer aided pronunciation training (CAPT). In recent years, several approaches have been successfully implemented for speaker dependent models with parallel acoustic and kinematic training data. However, in many practical applications inversion is needed for new speakers for whom no articulatory data is available. In order to address this problem, this dissertation introduces a novel speaker adaptation approach called Parallel Reference Speaker Weighting (PRSW), based on parallel acoustic and articulatory Hidden Markov Models (HMM). This approach uses a robust normalized articulatory space and palate referenced articulatory features combined with speaker-weighted adaptation to form an inversion mapping for new speakers that can accurately estimate articulatory trajectories. The proposed PRSW method is evaluated on the newly collected Marquette electromagnetic articulography -- Mandarin Accented English (EMA-MAE) corpus using 20 native English speakers. Cross-speaker inversion results show that given a good selection of reference speakers with consistent acoustic and articulatory patterns, the PRSW approach gives good speaker independent inversion performance even without kinematic training data.
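
    The core of reference speaker weighting is building the new speaker's model as a convex combination of reference speakers' model parameters. The sketch below applies that idea with similarity-based softmax weights over stacked model parameters; the shapes, the similarity measure, and the weighting scheme are illustrative assumptions, not the dissertation's PRSW estimation procedure.

        import numpy as np

        def reference_speaker_weights(target_acoustic, reference_acoustics):
            """Softmax weights from negative Euclidean distance between acoustic summary vectors."""
            d = np.array([np.linalg.norm(target_acoustic - r) for r in reference_acoustics])
            w = np.exp(-d)
            return w / w.sum()

        def weighted_model(reference_models, weights):
            """Convex combination of per-speaker model parameters (e.g., stacked HMM state means)."""
            return np.tensordot(weights, reference_models, axes=1)

        rng = np.random.default_rng(7)
        refs_acoustic = rng.standard_normal((20, 39))      # 20 reference speakers, 39-dim summaries
        refs_models = rng.standard_normal((20, 500, 12))   # per-speaker articulatory model parameters
        target = rng.standard_normal(39)

        w = reference_speaker_weights(target, refs_acoustic)
        adapted = weighted_model(refs_models, w)           # articulatory model for the unseen speaker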

  9. Musical ability and non-native speech-sound processing are linked through sensitivity to pitch and spectral information.

    PubMed

    Kempe, Vera; Bublitz, Dennis; Brooks, Patricia J

    2015-05-01

    Is the observed link between musical ability and non-native speech-sound processing due to enhanced sensitivity to acoustic features underlying both musical and linguistic processing? To address this question, native English speakers (N = 118) discriminated Norwegian tonal contrasts and Norwegian vowels. Short tones differing in temporal, pitch, and spectral characteristics were used to measure sensitivity to the various acoustic features implicated in musical and speech processing. Musical ability was measured using Gordon's Advanced Measures of Musical Audiation. Results showed that sensitivity to specific acoustic features played a role in non-native speech-sound processing: Controlling for non-verbal intelligence, prior foreign language-learning experience, and sex, sensitivity to pitch and spectral information partially mediated the link between musical ability and discrimination of non-native vowels and lexical tones. The findings suggest that while sensitivity to certain acoustic features partially mediates the relationship between musical ability and non-native speech-sound processing, complex tests of musical ability also tap into other shared mechanisms.

  10. Language Specific Speech Perception and the Onset of Reading.

    ERIC Educational Resources Information Center

    Burnham, Denis

    2003-01-01

    Investigates the degree to which native speech perception is superior to non-native speech perception. Shows that language specific speech perception is a linguistic rather than an acoustic phenomenon. Discusses results in terms of early speech perception abilities, experience with oral communication, cognitive ability, alphabetic versus…

  11. Experiment in Learning to Discriminate Frequency Transposed Speech.

    ERIC Educational Resources Information Center

    Ahlstrom, K.G.; And Others

    In order to improve speech perception by transposing the speech signals to lower frequencies, to determine which aspects of the information in the acoustic speech signals were influenced by transposition, and to compare two different methods of training speech perception, 44 subjects were trained to discriminate between transposed words or…

  12. Lexical and Acoustic Features of Maternal Utterances Addressing Preverbal Infants in Picture Book Reading Link to 5-Year-Old Children's Language Development

    ERIC Educational Resources Information Center

    Liu, Huei-Mei

    2014-01-01

    Research Findings: I examined the long-term association between the lexical and acoustic features of maternal utterances during book reading and the language skills of infants and children. Maternal utterances were collected from 22 mother-child dyads in picture book-reading episodes when children were ages 6-12 months and 5 years. Two aspects of…

  13. Investigation of auditory processing disorder and language impairment using the speech-evoked auditory brainstem response.

    PubMed

    Rocha-Muniz, Caroline N; Befi-Lopes, Debora M; Schochat, Eliane

    2012-12-01

    This study investigated whether there are differences in the Speech-Evoked Auditory Brainstem Response among children with Typical Development (TD), (Central) Auditory Processing Disorder ((C)APD), and Language Impairment (LI). The speech-evoked auditory brainstem response was tested in 57 children (ages 6-12). The children were placed into three groups: TD (n = 18), (C)APD (n = 18) and LI (n = 21). Speech-evoked ABRs were elicited using the five-formant syllable /da/. Three dimensions were defined for analysis: timing, harmonics, and pitch. A comparative analysis of the responses between the typically developing children and the children with (C)APD and LI revealed abnormal encoding of the speech acoustic features that are characteristic of speech perception in children with (C)APD and LI, although the two groups differed in their abnormalities. While the children with (C)APD may have had greater difficulty distinguishing stimuli based on timing cues, the children with LI had the additional difficulty of distinguishing speech harmonics, which are important to the identification of speech sounds. These data suggest that an inefficient representation of crucial components of speech sounds may contribute to the difficulties with language processing found in children with LI. Furthermore, these findings may indicate that the neural processes mediated by the auditory brainstem differ among children with auditory processing and speech-language disorders.

  14. Inconsistency of speech in children with childhood apraxia of speech, phonological disorders, and typical speech

    NASA Astrophysics Data System (ADS)

    Iuzzini, Jenya

    There is a lack of agreement on the features used to differentiate Childhood Apraxia of Speech (CAS) from Phonological Disorders (PD). One criterion which has gained consensus is lexical inconsistency of speech (ASHA, 2007); however, no accepted measure of this feature has been defined. Although lexical assessment provides information about consistency of an item across repeated trials, it may not capture the magnitude of inconsistency within an item. In contrast, segmental analysis provides more extensive information about consistency of phoneme usage across multiple contexts and word positions. The current research compared segmental and lexical inconsistency metrics in preschool-aged children with PD, CAS, and typical development (TD) to determine how inconsistency varies with age in typical and disordered speakers, and whether CAS and PD were differentiated equally well by both assessment levels. Whereas lexical and segmental analyses may be influenced by listener characteristics or speaker intelligibility, the acoustic signal is less vulnerable to these factors. In addition, the acoustic signal may reveal information which is not evident in the perceptual signal. A second focus of the current research was motivated by Blumstein et al.'s (1980) classic study on voice onset time (VOT) in adults with acquired apraxia of speech (AOS), which demonstrated a motor impairment underlying AOS. In the current study, VOT analyses were conducted to determine the relationship between age and group with the voicing distribution for bilabial and alveolar plosives. Findings revealed that 3-year-olds evidenced significantly higher inconsistency than 5-year-olds; segmental inconsistency approached 0% in 5-year-olds with TD, whereas it persisted in children with PD and CAS, suggesting that for children in this age range, inconsistency is a feature of speech disorder rather than typical development (Holm et al., 2007). Likewise, whereas segmental and lexical inconsistency were

  15. Linking Speech Perception and Neurophysiology: Speech Decoding Guided by Cascaded Oscillators Locked to the Input Rhythm

    PubMed Central

    Ghitza, Oded

    2011-01-01

    The premise of this study is that current models of speech perception, which are driven by acoustic features alone, are incomplete, and that the role of decoding time during memory access must be incorporated to account for the patterns of observed recognition phenomena. It is postulated that decoding time is governed by a cascade of neuronal oscillators, which guide template-matching operations at a hierarchy of temporal scales. Cascaded cortical oscillations in the theta, beta, and gamma frequency bands are argued to be crucial for speech intelligibility. Intelligibility is high so long as these oscillations remain phase locked to the auditory input rhythm. A model (Tempo) is presented which is capable of emulating recent psychophysical data on the intelligibility of speech sentences as a function of “packaging” rate (Ghitza and Greenberg, 2009). The data show that intelligibility of speech that is time-compressed by a factor of 3 (i.e., a high syllabic rate) is poor (above 50% word error rate), but is substantially restored when the information stream is re-packaged by the insertion of silent gaps in between successive compressed-signal intervals – a counterintuitive finding, difficult to explain using classical models of speech perception, but emerging naturally from the Tempo architecture. PMID:21743809

  16. Acoustic Longitudinal Field NIF Optic Feature Detection Map Using Time-Reversal & MUSIC

    SciTech Connect

    Lehman, S K

    2006-02-09

    We developed an ultrasonic longitudinal field time-reversal and MUltiple SIgnal Classification (MUSIC) based detection algorithm for identifying and mapping flaws in fused silica NIF optics. The algorithm requires a fully multistatic data set, that is, one with multiple, independently operated, spatially diverse transducers: each transmitter, in succession, launches a pulse into the optic, and the scattered signal is measured and recorded at every receiver. We have successfully localized engineered "defects" larger than 1 mm in an optic. We confirmed detection and localization of 3 mm and 5 mm features in experimental data, and of a 0.5 mm feature in simulated data with sufficiently high signal-to-noise ratio. We present the theory, experimental results, and simulated results.
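
    The record above names time-reversal plus MUSIC as the detection machinery. As a rough illustration of the MUSIC half of that pipeline, the sketch below builds a generic narrowband MUSIC pseudospectrum from a multistatic response matrix; it is not the NIF longitudinal-field algorithm itself, and the steering function, array geometry, and assumed number of scatterers are placeholders the reader would supply.

```python
import numpy as np

def music_map(K, steering, n_sources):
    """Generic narrowband MUSIC imaging sketch (not the NIF-specific algorithm).

    K         : (n_rx, n_tx) multistatic response matrix at one frequency
    steering  : function mapping a candidate position r to an (n_rx,) steering vector
    n_sources : assumed number of scatterers (signal-subspace dimension)
    """
    # Sample covariance, treating each transmitter firing as a snapshot
    R = K @ K.conj().T / K.shape[1]
    # Hermitian eigendecomposition; numpy returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(R)
    noise_subspace = eigvecs[:, : R.shape[0] - n_sources]

    def pseudospectrum(r):
        a = steering(r)
        a = a / np.linalg.norm(a)
        proj = noise_subspace.conj().T @ a          # component in the noise subspace
        return 1.0 / np.real(np.vdot(proj, proj))   # large where a(r) is orthogonal to it

    return pseudospectrum
```

    Evaluating the returned pseudospectrum over a grid of candidate positions inside the optic yields a map whose peaks mark likely flaw locations.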

  17. Deep bottleneck features for spoken language identification.

    PubMed

    Jiang, Bing; Song, Yan; Wei, Si; Liu, Jun-Hua; McLoughlin, Ian Vince; Dai, Li-Rong

    2014-01-01

    A key problem in spoken language identification (LID) is to design effective representations which are specific to language information. For example, in recent years, representations based on both phonotactic and acoustic features have proven their effectiveness for LID. Although advances in machine learning have led to significant improvements, LID performance is still lacking, especially for short duration speech utterances. With the hypothesis that language information is weak and represented only latently in speech, and is largely dependent on the statistical properties of the speech content, existing representations may be insufficient. Furthermore they may be susceptible to the variations caused by different speakers, specific content of the speech segments, and background noise. To address this, we propose using Deep Bottleneck Features (DBF) for spoken LID, motivated by the success of Deep Neural Networks (DNN) in speech recognition. We show that DBFs can form a low-dimensional compact representation of the original inputs with a powerful descriptive and discriminative capability. To evaluate the effectiveness of this, we design two acoustic models, termed DBF-TV and parallel DBF-TV (PDBF-TV), using a DBF based i-vector representation for each speech utterance. Results on NIST language recognition evaluation 2009 (LRE09) show significant improvements over state-of-the-art systems. By fusing the output of phonotactic and acoustic approaches, we achieve an EER of 1.08%, 1.89% and 7.01% for 30 s, 10 s and 3 s test utterances respectively. Furthermore, various DBF configurations have been extensively evaluated, and an optimal system proposed.
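
    The core of the DBF idea is a supervised DNN with one deliberately narrow hidden layer whose activations become the frame-level features. The sketch below is a minimal PyTorch rendering of that idea under assumed layer sizes and training targets (e.g., frame-level phonetic classes); the published DBF-TV/PDBF-TV systems then feed such features into an i-vector back-end, which is not reproduced here.

```python
import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    """DNN with a narrow bottleneck layer; sizes and targets are illustrative."""
    def __init__(self, n_in=440, n_hidden=1024, n_bottleneck=40, n_targets=3000):
        super().__init__()
        self.front = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_hidden), nn.Sigmoid(),
        )
        self.bottleneck = nn.Linear(n_hidden, n_bottleneck)   # DBF layer
        self.back = nn.Sequential(
            nn.Sigmoid(),
            nn.Linear(n_bottleneck, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_targets),                   # e.g. phonetic-state logits
        )

    def forward(self, frames):
        return self.back(self.bottleneck(self.front(frames)))

    def extract_dbf(self, frames):
        """Deep bottleneck features: activations of the narrow layer, per frame."""
        with torch.no_grad():
            return self.bottleneck(self.front(frames))
```

    After supervised training, extract_dbf replaces raw spectral features as the per-frame input to the utterance-level (e.g., i-vector) language identification back-end.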

  18. Features and machine learning classification of connected speech samples from patients with autopsy proven Alzheimer's disease with and without additional vascular pathology.

    PubMed

    Rentoumi, Vassiliki; Raoufian, Ladan; Ahmed, Samrah; de Jager, Celeste A; Garrard, Peter

    2014-01-01

    Mixed vascular and Alzheimer-type dementia and pure Alzheimer's disease are both associated with changes in spoken language. These changes have, however, seldom been subjected to systematic comparison. In the present study, we analyzed language samples obtained during the course of a longitudinal clinical study from patients in whom one or other pathology was verified at post mortem. The aims of the study were twofold: first, to confirm the presence of differences in language produced by members of the two groups using quantitative methods of evaluation; and secondly to ascertain the most informative sources of variation between the groups. We adopted a computational approach to evaluate digitized transcripts of connected speech along a range of language-related dimensions. We then used machine learning text classification to assign the samples to one of the two pathological groups on the basis of these features. The classifiers' accuracies were tested using simple lexical features, syntactic features, and more complex statistical and information theory characteristics. Maximum accuracy was achieved when word occurrences and frequencies alone were used. Features based on syntactic and lexical complexity yielded lower discrimination scores, but all combinations of features showed significantly better performance than a baseline condition in which every transcript was assigned randomly to one of the two classes. The classification results illustrate the word content specific differences in the spoken language of the two groups. In addition, those with mixed pathology were found to exhibit a marked reduction in lexical variation and complexity compared to their pure AD counterparts.
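
    The best-performing configuration reported above uses word occurrences and frequencies alone. A minimal sketch of that kind of bag-of-words classification with scikit-learn is shown below; the logistic-regression classifier and 5-fold cross-validation are illustrative assumptions rather than the study's exact protocol.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def transcript_classification_accuracy(transcripts, labels):
    """transcripts: list of connected-speech transcripts (strings);
    labels: pathology group per transcript (e.g. 0 = pure AD, 1 = mixed)."""
    pipeline = make_pipeline(
        CountVectorizer(lowercase=True),       # word occurrences / frequencies
        LogisticRegression(max_iter=1000),
    )
    # Cross-validated accuracy; compare against a random-assignment baseline
    return cross_val_score(pipeline, transcripts, labels, cv=5).mean()
```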

  19. Speech perception as categorization

    PubMed Central

    Holt, Lori L.; Lotto, Andrew J.

    2010-01-01

    Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has resulted from these challenges. We focus here on issues and experiments that define open research questions relevant to phoneme categorization, arguing that SP is best understood as perceptual categorization, a position that places SP in direct contact with research from other areas of perception and cognition. PMID:20601702

  20. Reverberant speech recognition exploiting clarity index estimation

    NASA Astrophysics Data System (ADS)

    Parada, Pablo Peso; Sharma, Dushyant; Naylor, Patrick A.; Waterschoot, Toon van

    2015-12-01

    We present single-channel approaches to robust automatic speech recognition (ASR) in reverberant environments based on non-intrusive estimation of the clarity index (C50). Our best performing method includes the estimated value of C50 in the ASR feature vector and also uses C50 to select the most suitable ASR acoustic model according to the reverberation level. We evaluate our method on the REVERB Challenge database employing two different C50 estimators and show that our method outperforms the best baseline of the challenge achieved without unsupervised acoustic model adaptation, i.e. using multi-condition hidden Markov models (HMMs). Our approach achieves a 22.4% relative word error rate reduction in comparison to the best baseline of the challenge.
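
    A minimal sketch of the two uses of C50 described above is given below, assuming a per-utterance scalar C50 estimate is already available from a non-intrusive estimator: the value is appended to every frame of the feature matrix, and it also drives the choice among reverberation-matched acoustic models. The thresholds and model names are illustrative, not those of the REVERB systems.

```python
import numpy as np

def augment_with_c50(features, c50_db):
    """Append the estimated clarity index C50 (dB) to every frame vector.

    features : (n_frames, n_dims) acoustic feature matrix for one utterance
    """
    c50_column = np.full((features.shape[0], 1), c50_db)
    return np.hstack([features, c50_column])

def select_acoustic_model(c50_db, models):
    """Pick a reverberation-matched model set; thresholds are illustrative."""
    if c50_db > 15.0:          # mild reverberation
        return models["low_reverb"]
    if c50_db > 5.0:           # moderate reverberation
        return models["mid_reverb"]
    return models["high_reverb"]
```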

  1. Inherent emotional quality of human speech sounds.

    PubMed

    Myers-Schulz, Blake; Pujara, Maia; Wolf, Richard C; Koenigs, Michael

    2013-01-01

    During much of the past century, it was widely believed that phonemes-the human speech sounds that constitute words-have no inherent semantic meaning, and that the relationship between a combination of phonemes (a word) and its referent is simply arbitrary. Although recent work has challenged this picture by revealing psychological associations between certain phonemes and particular semantic contents, the precise mechanisms underlying these associations have not been fully elucidated. Here we provide novel evidence that certain phonemes have an inherent, non-arbitrary emotional quality. Moreover, we show that the perceived emotional valence of certain phoneme combinations depends on a specific acoustic feature-namely, the dynamic shift within the phonemes' first two frequency components. These data suggest a phoneme-relevant acoustic property influencing the communication of emotion in humans, and provide further evidence against previously held assumptions regarding the structure of human language. This finding has potential applications for a variety of social, educational, clinical, and marketing contexts.

  2. Fluid Dynamics of Human Phonation and Speech

    NASA Astrophysics Data System (ADS)

    Mittal, Rajat; Erath, Byron D.; Plesniak, Michael W.

    2013-01-01

    This article presents a review of the fluid dynamics, flow-structure interactions, and acoustics associated with human phonation and speech. Our voice is produced through the process of phonation in the larynx, and an improved understanding of the underlying physics of this process is essential to advancing the treatment of voice disorders. Insights into the physics of phonation and speech can also contribute to improved vocal training and the development of new speech compression and synthesis schemes. This article introduces the key biomechanical features of the laryngeal physiology, reviews the basic principles of voice production, and summarizes the progress made over the past half-century in understanding the flow physics of phonation and speech. Laryngeal pathologies, which significantly enhance the complexity of phonatory dynamics, are discussed. After a thorough examination of the state of the art in computational modeling and experimental investigations of phonatory biomechanics, we present a synopsis of the pacing issues in this arena and an outlook for research in this fascinating subject.

  3. Classification of Benign and Malignant Breast Tumors in Ultrasound Images with Posterior Acoustic Shadowing Using Half-Contour Features.

    PubMed

    Zhou, Zhuhuang; Wu, Shuicai; Chang, King-Jen; Chen, Wei-Ren; Chen, Yung-Sheng; Kuo, Wen-Hung; Lin, Chung-Chih; Tsui, Po-Hsiang

    Posterior acoustic shadowing (PAS) can bias breast tumor segmentation and classification in ultrasound images. In this paper, half-contour features are proposed to classify benign and malignant breast tumors with PAS, considering the fact that the upper half of the tumor contour is less affected by PAS. Adaptive thresholding and disk expansion are employed to detect tumor contours. Based on the detected full contour, the upper half contour is extracted. For breast tumor classification, six quantitative feature parameters are analyzed for both full contours and half contours, including standard deviation of degree (SDD), which is proposed to describe tumor irregularity. Fifty clinical cases (40 with PAS and 10 without PAS) were used. Tumor circularity (TC) and SDD were both effective full- and half-contour parameters in classifying images without PAS. Half-contour TC [74 % accuracy, 72 % sensitivity, 76 % specificity, 0.78 area under the receiver operating characteristic curve (AUC), p > 0.05] significantly improved the classification of breast tumors with PAS compared to that with full-contour TC (54 % accuracy, 56 % sensitivity, 52 % specificity, 0.52 AUC, p > 0.05). Half-contour SDD (72 % accuracy, 76 % sensitivity, 68 % specificity, 0.81 AUC, p < 0.05) improved the classification of breast tumors with PAS compared to that with full-contour SDD (62 % accuracy, 80 % sensitivity, 44 % specificity, 0.61 AUC, p > 0.05). The proposed half-contour TC and SDD may be useful in classifying benign and malignant breast tumors in ultrasound images affected by PAS.
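
    Two ingredients of the method above are easy to sketch: extracting the upper half of a detected contour (the part least affected by posterior acoustic shadowing) and computing tumor circularity, TC = 4πA/P². The abstract does not define the standard deviation of degree (SDD) precisely enough to reproduce, so it is omitted here; the image coordinate convention is an assumption.

```python
import numpy as np

def upper_half_contour(contour):
    """Keep contour points above the centroid (image convention: y grows downward),
    i.e. the half least affected by posterior acoustic shadowing."""
    return contour[contour[:, 1] <= contour[:, 1].mean()]

def circularity(contour):
    """Tumor circularity TC = 4*pi*Area / Perimeter**2 (equals 1.0 for a circle).

    contour: (N, 2) array of ordered (x, y) boundary points.
    """
    x, y = contour[:, 0], contour[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))  # shoelace
    closed = np.vstack([contour, contour[:1]])
    perimeter = np.sqrt((np.diff(closed, axis=0) ** 2).sum(axis=1)).sum()
    return 4.0 * np.pi * area / perimeter ** 2
```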

  4. The Natural Statistics of Audiovisual Speech

    PubMed Central

    Chandrasekaran, Chandramouli; Trubanova, Andrea; Stillittano, Sébastien; Caplier, Alice; Ghazanfar, Asif A.

    2009-01-01

    Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain where it can guide the selection of appropriate actions. To simplify this process, it's been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input space characterization is unavailable is human speech. We do not understand what signals our brain has to actively piece together from an audiovisual speech stream to arrive at a percept versus what is already embedded in the signal structure of the stream itself. In essence, we do not have a clear understanding of the natural statistics of audiovisual speech. In the present study, we identified the following major statistical features of audiovisual speech. First, we observed robust correlations and close temporal correspondence between the area of the mouth opening and the acoustic envelope. Second, we found the strongest correlation between the area of the mouth opening and vocal tract resonances. Third, we observed that both area of the mouth opening and the voice envelope are temporally modulated in the 2–7 Hz frequency range. Finally, we show that the timing of mouth movements relative to the onset of the voice is consistently between 100 and 300 ms. We interpret these data in the context of recent neural theories of speech which suggest that speech communication is a reciprocally coupled, multisensory event, whereby the outputs of the signaler are matched to the neural processes of the receiver. PMID:19609344

  5. The natural statistics of audiovisual speech.

    PubMed

    Chandrasekaran, Chandramouli; Trubanova, Andrea; Stillittano, Sébastien; Caplier, Alice; Ghazanfar, Asif A

    2009-07-01

    Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain where it can guide the selection of appropriate actions. To simplify this process, it's been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input space characterization is unavailable is human speech. We do not understand what signals our brain has to actively piece together from an audiovisual speech stream to arrive at a percept versus what is already embedded in the signal structure of the stream itself. In essence, we do not have a clear understanding of the natural statistics of audiovisual speech. In the present study, we identified the following major statistical features of audiovisual speech. First, we observed robust correlations and close temporal correspondence between the area of the mouth opening and the acoustic envelope. Second, we found the strongest correlation between the area of the mouth opening and vocal tract resonances. Third, we observed that both area of the mouth opening and the voice envelope are temporally modulated in the 2-7 Hz frequency range. Finally, we show that the timing of mouth movements relative to the onset of the voice is consistently between 100 and 300 ms. We interpret these data in the context of recent neural theories of speech which suggest that speech communication is a reciprocally coupled, multisensory event, whereby the outputs of the signaler are matched to the neural processes of the receiver.
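
    The central measurement above is a correlation between the mouth-opening area and the acoustic envelope, restricted to the 2-7 Hz range where both are modulated. The sketch below, assuming a mouth-area time series sampled at the video frame rate and an audio waveform, computes a Hilbert envelope, resamples it to the video rate, band-passes both series, and reports the Pearson correlation; frequencies and filter order are illustrative.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, resample

def audiovisual_correlation(audio, mouth_area, video_fps):
    """Correlate 2-7 Hz modulations of the speech envelope and mouth-opening area."""
    envelope = np.abs(hilbert(audio))                 # acoustic amplitude envelope
    envelope = resample(envelope, len(mouth_area))    # align to the video frame rate

    nyquist = video_fps / 2.0
    b, a = butter(3, [2.0 / nyquist, 7.0 / nyquist], btype="band")
    env_bp = filtfilt(b, a, envelope - envelope.mean())
    mouth_bp = filtfilt(b, a, mouth_area - np.mean(mouth_area))

    return np.corrcoef(env_bp, mouth_bp)[0, 1]
```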

  6. Speech prosody in cerebellar ataxia

    NASA Astrophysics Data System (ADS)

    Casper, Maureen

    The present study sought an acoustic signature for the speech disturbance recognized in cerebellar degeneration. Magnetic resonance imaging was used for a radiological rating of cerebellar involvement in six cerebellar ataxic dysarthric speakers. Acoustic measures of the [pap] syllables in contrastive prosodic conditions and of normal vs. brain-damaged patients were used to further our understanding both of the speech degeneration that accompanies cerebellar pathology and of speech motor control and movement in general. Pair-wise comparisons of the prosodic conditions within the normal group showed statistically significant differences for four prosodic contrasts. For three of the four contrasts analyzed, the normal speakers showed both longer durations and higher formant and fundamental frequency values in the more prominent first condition of the contrast. The acoustic measures of the normal prosodic contrast values were then used as a model to measure the degree of speech deterioration for individual cerebellar subjects. This estimate of speech deterioration as determined by individual differences between cerebellar and normal subjects' acoustic values of the four prosodic contrasts was used in correlation analyses with MRI ratings. Moderate correlations between speech deterioration and cerebellar atrophy were found in the measures of syllable duration and f0. A strong negative correlation was found for F1. Moreover, the normal model presented by these acoustic data allows for a description of the flexibility of task-oriented behavior in normal speech motor control. These data challenge spatio-temporal theory which explains movement as an artifact of time wherein longer durations predict more extreme movements and give further evidence for gestural internal dynamics of movement in which time emerges from articulatory events rather than dictating those events. This model provides a sensitive index of cerebellar pathology with quantitative acoustic

  7. System and method for investigating sub-surface features of a rock formation with acoustic sources generating coded signals

    SciTech Connect

    Vu, Cung Khac; Nihei, Kurt; Johnson, Paul A; Guyer, Robert; Ten Cate, James A; Le Bas, Pierre-Yves; Larmat, Carene S

    2014-12-30

    A system and a method for investigating rock formations includes generating, by a first acoustic source, a first acoustic signal comprising a first plurality of pulses, each pulse including a first modulated signal at a central frequency; and generating, by a second acoustic source, a second acoustic signal comprising a second plurality of pulses. A receiver arranged within the borehole receives a detected signal including a signal generated by a non-linear mixing process from the first and second acoustic signals in a non-linear mixing zone within the intersection volume. The method also includes processing the received signal to extract the signal generated by the non-linear mixing process over noise or over signals generated by a linear interaction process, or both.

  8. Speech impairment (adult)

    MedlinePlus

    Encyclopedia entry: //medlineplus.gov/ency/article/003204.htm

  9. Speech disorders - children

    MedlinePlus

    Encyclopedia entry: //medlineplus.gov/ency/article/001430.htm

  10. The auditory representation of speech sounds in human motor cortex

    PubMed Central

    Cheung, Connie; Hamilton, Liberty S; Johnson, Keith; Chang, Edward F

    2016-01-01

    In humans, listening to speech evokes neural responses in the motor cortex. This has been controversially interpreted as evidence that speech sounds are processed as articulatory gestures. However, it is unclear what information is actually encoded by such neural activity. We used high-density direct human cortical recordings while participants spoke and listened to speech sounds. Motor cortex neural patterns during listening were substantially different than during articulation of the same sounds. During listening, we observed neural activity in the superior and inferior regions of ventral motor cortex. During speaking, responses were distributed throughout somatotopic representations of speech articulators in motor cortex. The structure of responses in motor cortex during listening was organized along acoustic features similar to auditory cortex, rather than along articulatory features as during speaking. Motor cortex does not contain articulatory representations of perceived actions in speech, but rather, represents auditory vocal information. DOI: http://dx.doi.org/10.7554/eLife.12577.001 PMID:26943778

  11. Multiple levels of linguistic and paralinguistic features contribute to voice recognition

    PubMed Central

    Mary Zarate, Jean; Tian, Xing; Woods, Kevin J. P.; Poeppel, David

    2015-01-01

    Voice or speaker recognition is critical in a wide variety of social contexts. In this study, we investigated the contributions of acoustic, phonological, lexical, and semantic information toward voice recognition. Native English speaking participants were trained to recognize five speakers in five conditions: non-speech, Mandarin, German, pseudo-English, and English. We showed that voice recognition significantly improved as more information became available, from purely acoustic features in non-speech to additional phonological information varying in familiarity. Moreover, we found that the recognition performance is transferable between training and testing in phonologically familiar conditions (German, pseudo-English, and English), but not in unfamiliar (Mandarin) or non-speech conditions. These results provide evidence suggesting that bottom-up acoustic analysis and top-down influence from phonological processing collaboratively govern voice recognition. PMID:26088739

  12. Why would Musical Training Benefit the Neural Encoding of Speech? The OPERA Hypothesis

    PubMed Central

    Patel, Aniruddh D.

    2011-01-01

    Mounting evidence suggests that musical training benefits the neural encoding of speech. This paper offers a hypothesis specifying why such benefits occur. The “OPERA” hypothesis proposes that such benefits are driven by adaptive plasticity in speech-processing networks, and that this plasticity occurs when five conditions are met. These are: (1) Overlap: there is anatomical overlap in the brain networks that process an acoustic feature used in both music and speech (e.g., waveform periodicity, amplitude envelope), (2) Precision: music places higher demands on these shared networks than does speech, in terms of the precision of processing, (3) Emotion: the musical activities that engage this network elicit strong positive emotion, (4) Repetition: the musical activities that engage this network are frequently repeated, and (5) Attention: the musical activities that engage this network are associated with focused attention. According to the OPERA hypothesis, when these conditions are met neural plasticity drives the networks in question to function with higher precision than needed for ordinary speech communication. Yet since speech shares these networks with music, speech processing benefits. The OPERA hypothesis is used to account for the observed superior subcortical encoding of speech in musically trained individuals, and to suggest mechanisms by which musical training might improve linguistic reading abilities. PMID:21747773

  13. Why would Musical Training Benefit the Neural Encoding of Speech? The OPERA Hypothesis.

    PubMed

    Patel, Aniruddh D

    2011-01-01

    Mounting evidence suggests that musical training benefits the neural encoding of speech. This paper offers a hypothesis specifying why such benefits occur. The "OPERA" hypothesis proposes that such benefits are driven by adaptive plasticity in speech-processing networks, and that this plasticity occurs when five conditions are met. These are: (1) Overlap: there is anatomical overlap in the brain networks that process an acoustic feature used in both music and speech (e.g., waveform periodicity, amplitude envelope), (2) Precision: music places higher demands on these shared networks than does speech, in terms of the precision of processing, (3) Emotion: the musical activities that engage this network elicit strong positive emotion, (4) Repetition: the musical activities that engage this network are frequently repeated, and (5) Attention: the musical activities that engage this network are associated with focused attention. According to the OPERA hypothesis, when these conditions are met neural plasticity drives the networks in question to function with higher precision than needed for ordinary speech communication. Yet since speech shares these networks with music, speech processing benefits. The OPERA hypothesis is used to account for the observed superior subcortical encoding of speech in musically trained individuals, and to suggest mechanisms by which musical training might improve linguistic reading abilities.

  14. Using the Speech Transmission Index for predicting non-native speech intelligibility

    NASA Astrophysics Data System (ADS)

    van Wijngaarden, Sander J.; Bronkhorst, Adelbert W.; Houtgast, Tammo; Steeneken, Herman J. M.

    2004-03-01

    While the Speech Transmission Index (STI) is widely applied for prediction of speech intelligibility in room acoustics and telecommunication engineering, it is unclear how to interpret STI values when non-native talkers or listeners are involved. Based on subjectively measured psychometric functions for sentence intelligibility in noise, for populations of native and non-native communicators, a correction function for the interpretation of the STI is derived. This function is applied to determine the appropriate STI ranges with qualification labels ("bad"-"excellent"), for specific populations of non-natives. The correction function is derived by relating the non-native psychometric function to the native psychometric function by a single parameter (ν). For listeners, the ν parameter is found to be highly correlated with linguistic entropy. It is shown that the proposed correction function is also valid for conditions featuring bandwidth limiting and reverberation.

  15. Using the Speech Transmission Index for predicting non-native speech intelligibility.

    PubMed

    van Wijngaarden, Sander J; Bronkhorst, Adelbert W; Houtgast, Tammo; Steeneken, Herman J M

    2004-03-01

    While the Speech Transmission Index (STI) is widely applied for prediction of speech intelligibility in room acoustics and telecommunication engineering, it is unclear how to interpret STI values when non-native talkers or listeners are involved. Based on subjectively measured psychometric functions for sentence intelligibility in noise, for populations of native and non-native communicators, a correction function for the interpretation of the STI is derived. This function is applied to determine the appropriate STI ranges with qualification labels ("bad"-"excellent"), for specific populations of non-natives. The correction function is derived by relating the non-native psychometric function to the native psychometric function by a single parameter (nu). For listeners, the nu parameter is found to be highly correlated with linguistic entropy. It is shown that the proposed correction function is also valid for conditions featuring bandwidth limiting and reverberation.

  16. A computer model of auditory efferent suppression: implications for the recognition of speech in noise.

    PubMed

    Brown, Guy J; Ferry, Robert T; Meddis, Ray

    2010-02-01

    The neural mechanisms underlying the ability of human listeners to recognize speech in the presence of background noise are still imperfectly understood. However, there is mounting evidence that the medial olivocochlear system plays an important role, via efferents that exert a suppressive effect on the response of the basilar membrane. The current paper presents a computer modeling study that investigates the possible role of this activity on speech intelligibility in noise. A model of auditory efferent processing [Ferry, R. T., and Meddis, R. (2007). J. Acoust. Soc. Am. 122, 3519-3526] is used to provide acoustic features for a statistical automatic speech recognition system, thus allowing the effects of efferent activity on speech intelligibility to be quantified. Performance of the "basic" model (without efferent activity) on a connected digit recognition task is good when the speech is uncorrupted by noise but falls when noise is present. However, recognition performance is much improved when efferent activity is applied. Furthermore, optimal performance is obtained when the amount of efferent activity is proportional to the noise level. The results obtained are consistent with the suggestion that efferent suppression causes a "release from adaptation" in the auditory-nerve response to noisy speech, which enhances its intelligibility.

  17. Acoustic markers of sarcasm in Cantonese and English.

    PubMed

    Cheang, Henry S; Pell, Marc D

    2009-09-01

    The goal of this study was to identify acoustic parameters associated with the expression of sarcasm by Cantonese speakers, and to compare the observed features to similar data on English [Cheang, H. S. and Pell, M. D. (2008). Speech Commun. 50, 366-381]. Six native Cantonese speakers produced utterances to express sarcasm, humorous irony, sincerity, and neutrality. Each utterance was analyzed to determine the mean fundamental frequency (F0), F0-range, mean amplitude, amplitude-range, speech rate, and harmonics-to-noise ratio (HNR) (to probe voice quality changes). Results showed that sarcastic utterances in Cantonese were produced with an elevated mean F0, and reductions in amplitude- and F0-range, which differentiated them most from sincere utterances. Sarcasm was also spoken with a slower speech rate and a higher HNR (i.e., less vocal noise) than the other attitudes in certain linguistic contexts. Direct Cantonese-English comparisons revealed one major distinction in the acoustic pattern for communicating sarcasm across the two languages: Cantonese speakers raised mean F0 to mark sarcasm, whereas English speakers lowered mean F0 in this context. These findings emphasize that prosody is instrumental for marking non-literal intentions in speech such as sarcasm in Cantonese as well as in other languages. However, the specific acoustic conventions for communicating sarcasm seem to vary among languages.
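
    The acoustic markers reported above are standard prosodic summaries. A minimal extraction sketch using librosa is given below for mean F0, F0 range, and intensity statistics; speech rate requires a transcript and HNR is usually computed with Praat-style tooling, so both are omitted. The pitch search range is an assumption.

```python
import numpy as np
import librosa

def prosodic_profile(wav_path):
    """Mean F0, F0 range, and intensity statistics for one utterance."""
    y, sr = librosa.load(wav_path, sr=None)
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    f0 = f0[~np.isnan(f0)]                            # voiced frames only
    rms_db = librosa.amplitude_to_db(librosa.feature.rms(y=y)[0], ref=np.max)
    return {
        "mean_f0_hz": float(np.mean(f0)),
        "f0_range_hz": float(np.ptp(f0)),
        "mean_intensity_db": float(np.mean(rms_db)),
        "intensity_range_db": float(np.ptp(rms_db)),
    }
```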

  18. Ion acoustic solitary waves and double layers in a plasma with two temperature electrons featuring Tsallis distribution

    SciTech Connect

    Shalini, Saini, N. S.

    2014-10-15

    The propagation properties of large amplitude ion acoustic solitary waves (IASWs) are studied in a plasma containing cold fluid ions and multi-temperature electrons (cool and hot electrons) with nonextensive distribution. Employing the Sagdeev pseudopotential method, an energy balance equation has been derived, and from the expression for the Sagdeev potential function, ion acoustic solitary waves and double layers are investigated numerically. The Mach number (lower and upper limits) for the existence of solitary structures is determined. Positive as well as negative polarity solitary structures are observed. Further, conditions for the existence of ion acoustic double layers (IADLs) are also determined numerically in the form of the critical values of q_c, f and the Mach number (M). It is observed that the nonextensivity of electrons (via q_{c,h}), concentration of electrons (via f) and temperature ratio of cold to hot electrons (via β) significantly influence the characteristics of ion acoustic solitary waves as well as double layers.

  19. Predicting Speech Intelligibility with A Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    PubMed Central

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystem approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method Nine acoustic variables reflecting different subsystems, and speech intelligibility, were measured in 22 children with CP. These children included 13 with a clinical diagnosis of dysarthria (SMI), and nine judged to be free of dysarthria (NSMI). Data from children with CP were compared to data from age-matched typically developing children (TD). Results Multiple acoustic variables reflecting the articulatory subsystem were different in the SMI group, compared to the NSMI and TD groups. A significant speech intelligibility prediction model was obtained with all variables entered into the model (Adjusted R-squared = .801). The articulatory subsystem showed the most substantial independent contribution (58%) to speech intelligibility. Incremental R-squared analyses revealed that any single variable explained less than 9% of speech intelligibility variability. Conclusions Children in the SMI group have articulatory subsystem problems as indexed by acoustic measures. As in the adult literature, the articulatory subsystem makes the primary contribution to speech intelligibility variance in dysarthria, with minimal or no contribution from other systems. PMID:24824584
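
    The prediction model described above is a multiple regression from acoustic subsystem measures to intelligibility scores, summarized by adjusted R-squared. A minimal scikit-learn sketch is shown below; the variable set and any incremental (single-variable) analysis would be built on top of it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R-squared for a fitted multiple regression."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

def intelligibility_model(acoustic_vars, intelligibility):
    """acoustic_vars: (n_children, n_measures); intelligibility: (n_children,)."""
    model = LinearRegression().fit(acoustic_vars, intelligibility)
    r2 = model.score(acoustic_vars, intelligibility)
    n, p = acoustic_vars.shape
    return model, adjusted_r2(r2, n, p)
```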

  20. System and method for investigating sub-surface features of a rock formation with acoustic sources generating conical broadcast signals

    DOEpatents

    Vu, Cung Khac; Skelt, Christopher; Nihei, Kurt; Johnson, Paul A.; Guyer, Robert; Ten Cate, James A.; Le Bas, Pierre -Yves; Larmat, Carene S.

    2015-08-18

    A method of interrogating a formation includes generating a first conical acoustic signal at a first frequency and a second conical acoustic signal at a second frequency, each between approximately 500 Hz and 500 kHz, such that the signals intersect in a desired intersection volume outside the borehole. The method further includes receiving a difference signal returning to the borehole resulting from a non-linear mixing of the signals in a mixing zone within the intersection volume.

  1. Selective cortical representation of attended speaker in multi-talker speech perception.

    PubMed

    Mesgarani, Nima; Chang, Edward F

    2012-05-10

    Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker background. How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented. Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener's intended goal.
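
    The spectrogram reconstructions described above are commonly obtained with a linear, time-lagged mapping from multi-electrode responses to spectrogram channels. The sketch below is a generic ridge-regression version of that idea, not the authors' exact decoder; lag count and regularization strength are placeholders.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lag_matrix(responses, max_lag):
    """Stack time-lagged copies of neural responses: (T, n_electrodes * (max_lag + 1))."""
    lagged = [np.roll(responses, lag, axis=0) for lag in range(max_lag + 1)]
    X = np.hstack(lagged)
    X[:max_lag, :] = 0.0               # discard samples wrapped around by np.roll
    return X

def fit_reconstruction(responses, spectrogram, max_lag=10, alpha=1.0):
    """Linear map from lagged cortical responses to spectrogram channels."""
    return Ridge(alpha=alpha).fit(lag_matrix(responses, max_lag), spectrogram)

# Reconstruction for held-out data:
# reconstructed = fit_reconstruction(train_resp, train_spec).predict(lag_matrix(test_resp, 10))
```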

  2. A novel microdeletion syndrome at 9q21.13 characterised by mental retardation, speech delay, epilepsy and characteristic facial features.

    PubMed

    Boudry-Labis, Elise; Demeer, Bénédicte; Le Caignec, Cédric; Isidor, Bertrand; Mathieu-Dramard, Michèle; Plessis, Ghislaine; George, Alice M; Taylor, Juliet; Aftimos, Salim; Wiemer-Kruel, Adelheid; Kohlhase, Jürgen; Annerén, Göran; Firth, Helen; Simonic, Ingrid; Vermeesch, Joris; Thuresson, Ann-Charlotte; Copin, Henri; Love, Donald R; Andrieux, Joris

    2013-03-01

    The increased use of array-CGH and SNP-arrays for genetic diagnosis has led to the identification of new microdeletion/microduplication syndromes and enabled genotype-phenotype correlations to be made. In this study, nine patients with 9q21 deletions were investigated and compared with four patients previously reported in Decipher. Genotype-phenotype comparisons of 13 patients revealed several common major characteristics including significant developmental delay, epilepsy, neuro-behavioural disorders and recognizable facial features including hypertelorism, feature-less philtrum, and a thin upper lip. The molecular investigation identified deletions with different breakpoints and of variable lengths, but the 750 kb smallest overlapping deleted region includes four genes. Among these genes, RORB is a strong candidate for a neurological phenotype. To our knowledge, this is the first published report of 9q21 microdeletions, and our observations strongly suggest that these deletions are responsible for a new genetic syndrome characterised by mental retardation with speech delay, epilepsy, autistic behaviour and moderate facial dysmorphy.

  3. Dynamic Encoding of Speech Sequence Probability in Human Temporal Cortex

    PubMed Central

    Leonard, Matthew K.; Bouchard, Kristofer E.; Tang, Claire

    2015-01-01

    Sensory processing involves identification of stimulus features, but also integration with the surrounding sensory and cognitive context. Previous work in animals and humans has shown fine-scale sensitivity to context in the form of learned knowledge about the statistics of the sensory environment, including relative probabilities of discrete units in a stream of sequential auditory input. These statistics are a defining characteristic of one of the most important sequential signals humans encounter: speech. For speech, extensive exposure to a language tunes listeners to the statistics of sound sequences. To address how speech sequence statistics are neurally encoded, we used high-resolution direct cortical recordings from human lateral superior temporal cortex as subjects listened to words and nonwords with varying transition probabilities between sound segments. In addition to their sensitivity to acoustic features (including contextual features, such as coarticulation), we found that neural responses dynamically encoded the language-level probability of both preceding and upcoming speech sounds. Transition probability first negatively modulated neural responses, followed by positive modulation of neural responses, consistent with coordinated predictive and retrospective recognition processes, respectively. Furthermore, transition probability encoding was different for real English words compared with nonwords, providing evidence for online interactions with high-order linguistic knowledge. These results demonstrate that sensory processing of deeply learned stimuli involves integrating physical stimulus features with their contextual sequential structure. Despite not being consciously aware of phoneme sequence statistics, listeners use this information to process spoken input and to link low-level acoustic representations with linguistic information about word identity and meaning. PMID:25948269

  4. Dynamic encoding of speech sequence probability in human temporal cortex.

    PubMed

    Leonard, Matthew K; Bouchard, Kristofer E; Tang, Claire; Chang, Edward F

    2015-05-06

    Sensory processing involves identification of stimulus features, but also integration with the surrounding sensory and cognitive context. Previous work in animals and humans has shown fine-scale sensitivity to context in the form of learned knowledge about the statistics of the sensory environment, including relative probabilities of discrete units in a stream of sequential auditory input. These statistics are a defining characteristic of one of the most important sequential signals humans encounter: speech. For speech, extensive exposure to a language tunes listeners to the statistics of sound sequences. To address how speech sequence statistics are neurally encoded, we used high-resolution direct cortical recordings from human lateral superior temporal cortex as subjects listened to words and nonwords with varying transition probabilities between sound segments. In addition to their sensitivity to acoustic features (including contextual features, such as coarticulation), we found that neural responses dynamically encoded the language-level probability of both preceding and upcoming speech sounds. Transition probability first negatively modulated neural responses, followed by positive modulation of neural responses, consistent with coordinated predictive and retrospective recognition processes, respectively. Furthermore, transition probability encoding was different for real English words compared with nonwords, providing evidence for online interactions with high-order linguistic knowledge. These results demonstrate that sensory processing of deeply learned stimuli involves integrating physical stimulus features with their contextual sequential structure. Despite not being consciously aware of phoneme sequence statistics, listeners use this information to process spoken input and to link low-level acoustic representations with linguistic information about word identity and meaning.

  5. Acoustic differences between humorous and sincere communicative intentions.

    PubMed

    Hoicka, Elena; Gattis, Merideth

    2012-11-01

    Previous studies indicate that the acoustic features of speech discriminate between positive and negative communicative intentions, such as approval and prohibition. Two studies investigated whether acoustic features of speech can discriminate between two positive communicative intentions: humour and sweet-sincerity, where sweet-sincerity involved being sincere in a positive, warm-hearted way. In Study 1, 22 mothers read a book containing humorous, sweet-sincere, and neutral-sincere images to their 19- to 24-month-olds. In Study 2, 41 mothers read a book containing humorous or sweet-sincere sentences and images to their 18- to 24-month-olds. Mothers used a higher mean F0 to communicate visual humour as compared to visual sincerity. Mothers used greater F0 mean, range, and standard deviation; greater intensity mean, range, and standard deviation; and a slower speech rate to communicate verbal humour as compared to verbal sweet-sincerity. Mothers used a rising linear contour to communicate verbal humour, but used no specific contour to express verbal sweet-sincerity. We conclude that speakers provide acoustic cues enabling listeners to distinguish between positive communicative intentions.

  6. Interactions between distal speech rate, linguistic knowledge, and speech environment.

    PubMed

    Morrill, Tuuli; Baese-Berk, Melissa; Heffner, Christopher; Dilley, Laura

    2015-10-01

    During lexical access, listeners use both signal-based and knowledge-based cues, and information from the linguistic context can affect the perception of acoustic speech information. Recent findings suggest that the various cues used in lexical access are implemented with flexibility and may be affected by information from the larger speech context. We conducted 2 experiments to examine effects of a signal-based cue (distal speech rate) and a knowledge-based cue (linguistic structure) on lexical perception. In Experiment 1, we manipulated distal speech rate in utterances where an acoustically ambiguous critical word was either obligatory for the utterance to be syntactically well formed (e.g., Conner knew that bread and butter (are) both in the pantry) or optional (e.g., Don must see the harbor (or) boats). In Experiment 2, we examined identical target utterances as in Experiment 1 but changed the distribution of linguistic structures in the fillers. The results of the 2 experiments demonstrate that speech rate and linguistic knowledge about critical word obligatoriness can both influence speech perception. In addition, it is possible to alter the strength of a signal-based cue by changing information in the speech environment. These results provide support for models of word segmentation that include flexible weighting of signal-based and knowledge-based cues.

  7. [Estimation of age-related features of acoustic density and biometric relations of lens based on combined ultrasound scanning].

    PubMed

    Avetisov, K S; Markosian, A G

    2013-01-01

    Results of combined ultrasound scanning for estimating acoustic lens density and the biometric relations between the lens and other eye structures are presented. A group of 124 patients (189 eyes) was studied, subdivided by age and by the length of the anteroposterior axis of the eye. An examination algorithm was developed that allows selective estimation of the acoustic density of different lens zones as well as biometric measurements, including volumetric ones. An age-related increase in the acoustic density of different lens zones was revealed, which indirectly supports the efficiency of the method. Biometric studies showed nearly identical volumetric lens measurements in "normal" and "short" eyes, despite the significantly thicker central zone of the latter. The correlation between anterior chamber volume and the width of its angle was significantly lower in "short" eyes than in "normal" and "long" eyes (correlation coefficients 0.37, 0.68 and 0.63, respectively).

  8. Why Impromptu Speech Is Easy To Understand.

    ERIC Educational Resources Information Center

    Le Feal, K. Dejean

    Impromptu speech is characterized by the simultaneous processes of ideation (the elaboration and structuring of reasoning by the speaker as he improvises) and expression in the speaker. Other elements accompany this characteristic: division of speech flow into short segments, acoustic relief in the form of word stress following a pause, and both…

  9. Perception of Silent Pauses in Continuous Speech.

    ERIC Educational Resources Information Center

    Duez, Danielle

    1985-01-01

    Investigates the silent pauses in continuous speech in three genres: political speeches, political interviews, and casual interviews in order to see how the semantic-syntactic information of the message, the duration of silent pauses, and the acoustic environment of these pauses interact to produce the listener's perception of pauses. (Author/SED)

  10. Integrated speech enhancement for functional MRI environment.

    PubMed

    Pathak, Nishank; Milani, Ali A; Panahi, Issa; Briggs, Richard

    2009-01-01

    This paper presents an integrated speech enhancement (SE) method for the noisy MRI environment. We show that the performance of SE system improves considerably when the speech signal dominated by MRI acoustic noise at very low SNR is enhanced in two successive stages using two-channel SE methods followed by a single-channel post processing SE algorithm. Actual MRI noisy speech data are used in our experiments showing the improved performance of the proposed SE method.

  11. REGARDING THE LINE-OF-SIGHT BARYONIC ACOUSTIC FEATURE IN THE SLOAN DIGITAL SKY SURVEY AND BARYON OSCILLATION SPECTROSCOPIC SURVEY LUMINOUS RED GALAXY SAMPLES

    SciTech Connect

    Kazin, Eyal A.; Blanton, Michael R.; Scoccimarro, Roman; McBride, Cameron K.; Berlind, Andreas A.

    2010-08-20

    We analyze the line-of-sight baryonic acoustic feature in the two-point correlation function ξ of the Sloan Digital Sky Survey luminous red galaxy (LRG) sample (0.16 < z < 0.47). By defining a narrow line-of-sight region, r_p < 5.5 h^-1 Mpc, where r_p is the transverse separation component, we measure a strong excess of clustering at ~110 h^-1 Mpc, as previously reported in the literature. We also test these results in an alternative coordinate system, by defining the line of sight as θ < 3°, where θ is the opening angle. This clustering excess appears much stronger than the feature in the better-measured monopole. A fiducial ΛCDM nonlinear model in redshift space predicts a much weaker signature. We use realistic mock catalogs to model the expected signal and noise. We find that the line-of-sight measurements can be explained well by our mocks as well as by a featureless ξ = 0. We conclude that there is no convincing evidence that the strong clustering measurement is the line-of-sight baryonic acoustic feature. We also evaluate how detectable such a signal would be in the upcoming Baryon Oscillation Spectroscopic Survey (BOSS) LRG volume. Mock LRG catalogs (z < 0.6) suggest that (1) the narrow line-of-sight cylinder and cone defined above probably will not reveal a detectable acoustic feature in BOSS; (2) a clustering measurement as high as that in the current sample can be ruled out (or confirmed) at a high confidence level using a BOSS-sized data set; (3) an analysis with wider angular cuts, which provide better signal-to-noise ratios, can nevertheless be used to compare line-of-sight and transverse distances, and thereby constrain the expansion rate H(z) and diameter distance D_A(z).

  12. Speech for the Deaf Child: Knowledge and Use.

    ERIC Educational Resources Information Center

    Connor, Leo E., Ed.

    Presented is a collection of 16 papers on speech development, handicaps, teaching methods, and educational trends for the aurally handicapped child. Arthur Boothroyd relates acoustic phonetics to speech teaching, and Jean Utley Lehman investigates a scheme of linguistic organization. Differences in speech production by deaf and normal hearing…

  13. Breathing-Impaired Speech after Brain Haemorrhage: A Case Study

    ERIC Educational Resources Information Center

    Heselwood, Barry

    2007-01-01

    Results are presented from an auditory and acoustic analysis of the speech of an adult male with impaired prosody and articulation due to brain haemorrhage. They show marked effects on phonation, speech rate and articulator velocity, and a speech rhythm disrupted by "intrusive" stresses. These effects are discussed in relation to the speaker's…

  14. Detection and Classification of Whale Acoustic Signals

    NASA Astrophysics Data System (ADS)

    Xian, Yin

    vocalization data set. The word error rate of the DCTNet feature is similar to the MFSC in speech recognition tasks, suggesting that the convolutional network is able to reveal acoustic content of speech signals.

  15. Brainstem transcription of speech is disrupted in children with autism spectrum disorders.

    PubMed

    Russo, Nicole; Nicol, Trent; Trommer, Barbara; Zecker, Steve; Kraus, Nina

    2009-07-01

    Language impairment is a hallmark of autism spectrum disorders (ASD). The origin of the deficit is poorly understood although deficiencies in auditory processing have been detected in both perception and cortical encoding of speech sounds. Little is known about the processing and transcription of speech sounds at earlier (brainstem) levels or about how background noise may impact this transcription process. Unlike cortical encoding of sounds, brainstem representation preserves stimulus features with a degree of fidelity that enables a direct link between acoustic components of the speech syllable (e.g. onsets) to specific aspects of neural encoding (e.g. waves V and A). We measured brainstem responses to the syllable /da/, in quiet and background noise, in children with and without ASD. Children with ASD exhibited deficits in both the neural synchrony (timing) and phase locking (frequency encoding) of speech sounds, despite normal click-evoked brainstem responses. They also exhibited reduced magnitude and fidelity of speech-evoked responses and inordinate degradation of responses by background noise in comparison to typically developing controls. Neural synchrony in noise was significantly related to measures of core and receptive language ability. These data support the idea that abnormalities in the brainstem processing of speech contribute to the language impairment in ASD. Because it is both passively elicited and malleable, the speech-evoked brainstem response may serve as a clinical tool to assess auditory processing as well as the effects of auditory training in the ASD population.

  16. Classroom Acoustics: Understanding Barriers to Learning.

    ERIC Educational Resources Information Center

    Crandell, Carl C., Ed.; Smaldino, Joseph J., Ed.

    2001-01-01

    This booklet explores classroom acoustics and their importance on the learning potential of children with hearing loss and related disabilities. The booklet also reviews research on classroom acoustics and the need for the development of classroom acoustics standards. Chapters examine: 1) a speech-perception model demonstrating the linkage between…

  17. Environment-dependent denoising autoencoder for distant-talking speech recognition

    NASA Astrophysics Data System (ADS)

    Ueda, Yuma; Wang, Longbiao; Kai, Atsuhiko; Ren, Bo

    2015-12-01

    In this paper, we propose an environment-dependent denoising autoencoder (DAE) and automatic environment identification based on a deep neural network (DNN) with blind reverberation estimation for robust distant-talking speech recognition. Recently, DAEs have been shown to be effective in many noise reduction and reverberation suppression applications because higher-level representations and increased flexibility of the feature mapping function can be learned. However, a DAE is not adequate in mismatched training and test environments. In a conventional DAE, parameters are trained using pairs of reverberant speech and clean speech under various acoustic conditions (that is, an environment-independent DAE). To address the above problem, we propose two environment-dependent DAEs to reduce the influence of mismatches between training and test environments. In the first approach, we train various DAEs using speech from different acoustic environments, and the DAE for the condition that best matches the test condition is automatically selected (that is, a two-step environment-dependent DAE). To improve environment identification performance, we propose a DNN that uses both reverberant speech and estimated reverberation. In the second approach, we add estimated reverberation features to the input of the DAE (that is, a one-step environment-dependent DAE or a reverberation-aware DAE). The proposed method is evaluated using speech in simulated and real reverberant environments. Experimental results show that the environment-dependent DAE outperforms the environment-independent one in both simulated and real reverberant environments. For two-step environment-dependent DAE, the performance of environment identification based on the proposed DNN approach is also better than that of the conventional DNN approach, in which only reverberant speech is used and reverberation is not blindly estimated. And, the one-step environment-dependent DAE significantly outperforms the two
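
    At its core, each DAE above maps reverberant feature frames to their clean counterparts; the environment-dependent variants train one such network per acoustic condition, and the reverberation-aware variant appends estimated reverberation features to the input. The PyTorch sketch below shows only the basic mapping, with illustrative layer sizes.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Maps reverberant feature frames to clean ones (illustrative sizes)."""
    def __init__(self, n_feat=40, n_hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_feat, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_feat),
        )

    def forward(self, reverberant_frames):
        return self.net(reverberant_frames)

def train_step(dae, optimizer, reverberant, clean):
    """One gradient step on paired reverberant/clean feature frames."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(dae(reverberant), clean)
    loss.backward()
    optimizer.step()
    return loss.item()
```

    A reverberation-aware ("one-step") variant would concatenate estimated reverberation features to the input frames and widen n_feat accordingly.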

  18. Musical melody and speech intonation: singing a different tune.

    PubMed

    Zatorre, Robert J; Baum, Shari R

    2012-01-01

    Music and speech are often cited as characteristically human forms of communication. Both share the features of hierarchical structure, complex sound systems, and sensorimotor sequencing demands, and both are used to convey and influence emotions, among other functions [1]. Both music and speech also prominently use acoustical frequency modulations, perceived as variations in pitch, as part of their communicative repertoire. Given these similarities, and the fact that pitch perception and production involve the same peripheral transduction system (cochlea) and the same production mechanism (vocal tract), it might be natural to assume that pitch processing in speech and music would also depend on the same underlying cognitive and neural mechanisms. In this essay we argue that the processing of pitch information differs significantly for speech and music; specifically, we suggest that there are two pitch-related processing systems, one for more coarse-grained, approximate analysis and one for more fine-grained accurate representation, and that the latter is unique to music. More broadly, this dissociation offers clues about the interface between sensory and motor systems, and highlights the idea that multiple processing streams are a ubiquitous feature of neuro-cognitive architectures.

  19. Multivoxel patterns reveal functionally differentiated networks underlying auditory feedback processing of speech.

    PubMed

    Zheng, Zane Z; Vicente-Grabovetsky, Alejandro; MacDonald, Ewen N; Munhall, Kevin G; Cusack, Rhodri; Johnsrude, Ingrid S

    2013-03-06

    The everyday act of speaking involves the complex processes of speech motor control. An important component of control is monitoring, detection, and processing of errors when auditory feedback does not correspond to the intended motor gesture. Here we show, using fMRI and converging operations within a multivoxel pattern analysis framework, that this sensorimotor process is supported by functionally differentiated brain networks. During scanning, a real-time speech-tracking system was used to deliver two acoustically different types of distorted auditory feedback or unaltered feedback while human participants were vocalizing monosyllabic words, and to present the same auditory stimuli while participants were passively listening. Whole-brain analysis of neural-pattern similarity revealed three functional networks that were differentially sensitive to distorted auditory feedback during vocalization, compared with during passive listening. One network of regions appears to encode an "error signal" regardless of acoustic features of the error: this network, including right angular gyrus, right supplementary motor area, and bilateral cerebellum, yielded consistent neural patterns across acoustically different, distorted feedback types, only during articulation (not during passive listening). In contrast, a frontotemporal network appears sensitive to the speech features of auditory stimuli during passive listening; this preference for speech features was diminished when the same stimuli were presented as auditory concomitants of vocalization. A third network, showing a distinct functional pattern from the other two, appears to capture aspects of both neural response profiles. Together, our findings suggest that auditory feedback processing during speech motor control may rely on multiple, interactive, functionally differentiated neural systems.

  20. Speech perception at the interface of neurobiology and linguistics.

    PubMed

    Poeppel, David; Idsardi, William J; van Wassenhove, Virginie

    2008-03-12

    Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate discrete representations that make contact with the lexical representations stored in long-term memory as output. Because the perceptual objects that are recognized by the speech perception enter into subsequent linguistic computation, the format that is used for lexical representation and processing fundamentally constrains the speech perceptual processes. Consequently, theories of speech perception must, at some level, be tightly linked to theories of lexical representation. Minimally, speech perception must yield representations that smoothly and rapidly interface with stored lexical items. Adopting the perspective of Marr, we argue and provide neurobiological and psychophysical evidence for the following research programme. First, at the implementational level, speech perception is a multi-time resolution process, with perceptual analyses occurring concurrently on at least two time scales (approx. 20-80 ms, approx. 150-300 ms), commensurate with (sub)segmental and syllabic analyses, respectively. Second, at the algorithmic level, we suggest that perception proceeds on the basis of internal forward models, or uses an 'analysis-by-synthesis' approach. Third, at the computational level (in the sense of Marr), the theory of lexical representation that we adopt is principally informed by phonological research and assumes that words are represented in the mental lexicon in terms of sequences of discrete segments composed of distinctive features. One important goal of the research programme is to develop linking hypotheses between putative neurobiological primitives (e.g. temporal primitives) and those primitives derived from linguistic inquiry, to arrive ultimately at a biologically sensible and theoretically satisfying model of representation and computation in speech.

  1. Proposal for Classifying the Severity of Speech Disorder Using a Fuzzy Model in Accordance with the Implicational Model of Feature Complexity

    ERIC Educational Resources Information Center

    Brancalioni, Ana Rita; Magnago, Karine Faverzani; Keske-Soares, Marcia

    2012-01-01

    The objective of this study is to create a new proposal for classifying the severity of speech disorders using a fuzzy model in accordance with a linguistic model that represents the speech acquisition of Brazilian Portuguese. The fuzzy linguistic model was run in the MATLAB software fuzzy toolbox from a set of fuzzy rules, and it encompassed…
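
    As a minimal illustration of the kind of fuzzy classification described (the study's actual rule base, membership functions, and severity classes are not reproduced here), the sketch below maps a single hypothetical "acquired feature complexity" score to a severity label using triangular membership functions and a weighted-average defuzzification.

        def trimf(x, a, b, c):
            """Triangular membership function with peak at b."""
            if x <= a or x >= c:
                return 0.0
            return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

        def classify_severity(complexity):
            """Fuzzy severity from a hypothetical 0-10 'acquired feature complexity'
            score; all membership functions and centroids are illustrative only."""
            memberships = {
                "severe":   trimf(complexity, -1, 0, 4),   # little complexity acquired
                "moderate": trimf(complexity, 2, 5, 8),
                "mild":     trimf(complexity, 6, 10, 11),  # most complexity acquired
            }
            centroids = {"severe": 1.0, "moderate": 5.0, "mild": 9.0}
            num = sum(m * centroids[k] for k, m in memberships.items())
            den = sum(memberships.values()) or 1.0
            label = max(memberships, key=memberships.get)
            return label, num / den

        print(classify_severity(3.2))  # -> ('moderate', ...) with these illustrative functions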

  2. Speech disorders in Parkinson's disease: early diagnostics and effects of medication and brain stimulation.

    PubMed

    Brabenec, L; Mekyska, J; Galaz, Z; Rektorova, Irena

    2017-03-01

    Hypokinetic dysarthria (HD) occurs in 90% of Parkinson's disease (PD) patients. It manifests specifically in the areas of articulation, phonation, prosody, speech fluency, and faciokinesis. We aimed to systematically review papers on HD in PD with a special focus on (1) early PD diagnosis and monitoring of the disease progression using acoustic voice and speech analysis, and (2) functional imaging studies exploring neural correlates of HD in PD, and (3) clinical studies using acoustic analysis to evaluate effects of dopaminergic medication and brain stimulation. A systematic literature search of articles written in English before March 2016 was conducted in the Web of Science, PubMed, SpringerLink, and IEEE Xplore databases using and combining specific relevant keywords. Articles were categorized into three groups: (1) articles focused on neural correlates of HD in PD using functional imaging (n = 13); (2) articles dealing with the acoustic analysis of HD in PD (n = 52); and (3) articles concerning specifically dopaminergic and brain stimulation-related effects as assessed by acoustic analysis (n = 31); the groups were then reviewed. We identified 14 combinations of speech tasks and acoustic features that can be recommended for use in describing the main features of HD in PD. While only a few acoustic parameters correlate with limb motor symptoms and can be partially relieved by dopaminergic medication, HD in PD seems to be mainly related to non-dopaminergic deficits and associated particularly with non-motor symptoms. Future studies should combine non-invasive brain stimulation with voice behavior approaches to achieve the best treatment effects by enhancing auditory-motor integration.

  3. Perception of Speech Reflects Optimal Use of Probabilistic Speech Cues

    ERIC Educational Resources Information Center

    Clayards, Meghan; Tanenhaus, Michael K.; Aslin, Richard N.; Jacobs, Robert A.

    2008-01-01

    Listeners are exquisitely sensitive to fine-grained acoustic detail within phonetic categories for sounds and words. Here we show that this sensitivity is optimal given the probabilistic nature of speech cues. We manipulated the probability distribution of one probabilistic cue, voice onset time (VOT), which differentiates word initial labial…
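
    A simple ideal-observer sketch of what "optimal use of a probabilistic cue" means for VOT: given Gaussian VOT distributions for two categories (the means, standard deviation, and priors below are placeholders, not the study's values), the posterior probability of each category follows from Bayes' rule, and wider category distributions predict shallower, more graded identification functions.

        import math

        def gaussian_pdf(x, mu, sigma):
            return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

        def posterior_b(vot_ms, mu_b=0.0, mu_p=50.0, sigma=12.0, prior_b=0.5):
            """Ideal-observer posterior P(/b/ | VOT) under two Gaussian VOT
            distributions; parameter values are illustrative placeholders."""
            like_b = gaussian_pdf(vot_ms, mu_b, sigma) * prior_b
            like_p = gaussian_pdf(vot_ms, mu_p, sigma) * (1.0 - prior_b)
            return like_b / (like_b + like_p)

        # Posterior falls gradually from /b/ toward /p/ as VOT increases.
        for vot in (0, 15, 25, 35, 50):
            print(vot, round(posterior_b(vot), 3))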

  4. Automated analysis of connected speech reveals early biomarkers of Parkinson's disease in patients with rapid eye movement sleep behaviour disorder.

    PubMed

    Hlavnička, Jan; Čmejla, Roman; Tykalová, Tereza; Šonka, Karel; Růžička, Evžen; Rusz, Jan

    2017-12-01

    For generations, the evaluation of speech abnormalities in neurodegenerative disorders such as Parkinson's disease (PD) has been limited to perceptual tests or user-controlled laboratory analysis based upon rather small samples of human vocalizations. Our study introduces a fully automated method that yields significant features related to respiratory deficits, dysphonia, imprecise articulation and dysrhythmia from acoustic microphone data of natural connected speech for predicting early and distinctive patterns of neurodegeneration. We compared speech recordings of 50 subjects with rapid eye movement sleep behaviour disorder (RBD), 30 newly diagnosed, untreated PD patients and 50 healthy controls, and showed that subliminal parkinsonian speech deficits can be reliably captured even in RBD patients, which are at high risk of developing PD or other synucleinopathies. Thus, automated vocal analysis should soon be able to contribute to screening and diagnostic procedures for prodromal parkinsonian neurodegeneration in natural environments.
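
    The full pipeline cannot be reproduced from the abstract, but a crude stand-in for one ingredient, energy-based speech/pause segmentation of connected speech (relevant to respiratory and dysrhythmia measures), can be sketched as follows; frame sizes and the threshold are illustrative choices.

        import numpy as np

        def pause_features(signal, sr, frame_ms=25, hop_ms=10, thresh_db=-35):
            """Crude energy-based segmentation into speech/pause frames; returns the
            pause ratio and mean pause duration. A simplified stand-in, not the
            authors' automated method."""
            frame = int(sr * frame_ms / 1000)
            hop = int(sr * hop_ms / 1000)
            frames = np.lib.stride_tricks.sliding_window_view(signal, frame)[::hop]
            rms = np.sqrt(np.mean(frames ** 2, axis=1) + 1e-12)
            level_db = 20 * np.log10(rms / (np.max(rms) + 1e-12))
            is_pause = level_db < thresh_db
            # Mean pause duration: average length of runs of consecutive pause frames.
            runs, count = [], 0
            for p in is_pause:
                if p:
                    count += 1
                elif count:
                    runs.append(count)
                    count = 0
            if count:
                runs.append(count)
            mean_pause_s = (np.mean(runs) * hop / sr) if runs else 0.0
            return float(np.mean(is_pause)), float(mean_pause_s)

        # Example on a synthetic signal: bursts of noise separated by silences.
        sr = 16000
        sig = np.concatenate([np.random.randn(sr) * 0.3, np.zeros(sr // 2)] * 3)
        print(pause_features(sig, sr))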

  5. Automatic speech recognition in cocktail-party situations: a specific training for separated speech.

    PubMed

    Marti, Amparo; Cobos, Maximo; Lopez, Jose J

    2012-02-01

    Automatic speech recognition (ASR) refers to the task of extracting a transcription of the linguistic content of an acoustical speech signal automatically. Despite several decades of research in this important area of acoustic signal processing, the accuracy of ASR systems is still far behind human performance, especially in adverse acoustic scenarios. In this context, one of the most challenging situations is the one concerning simultaneous speech in cocktail-party environments. Although source separation methods have already been investigated to deal with this problem, the separation process is not perfect and the resulting artifacts pose an additional problem to ASR performance. In this paper, a specific training to improve the percentage of recognized words in real simultaneous speech cases is proposed. The combination of source separation and this specific training is explored and evaluated under different acoustical conditions, leading to improvements of up to a 35% in ASR performance.

  6. Utilizing computer models for optimizing classroom acoustics

    NASA Astrophysics Data System (ADS)

    Hinckley, Jennifer M.; Rosenberg, Carl J.

    2002-05-01

    The acoustical conditions in a classroom play an integral role in establishing an ideal learning environment. Speech intelligibility is dependent on many factors, including speech loudness, room finishes, and background noise levels. The goal of this investigation was to use computer modeling techniques to study the effect of acoustical conditions on speech intelligibility in a classroom. This study focused on a simulated classroom which was generated using the CATT-acoustic computer modeling program. The computer was utilized as an analytical tool in an effort to optimize speech intelligibility in a typical classroom environment. The factors that were focused on were reverberation time, location of absorptive materials, and background noise levels. Speech intelligibility was measured with the Rapid Speech Transmission Index (RASTI) method.

  7. Acoustic analysis of voice in patients treated by reconstructive subtotal laryngectomy. Evaluation and critical review

    PubMed Central

    Di Nicola, V; Fiorella, ML; Spinelli, DA; Fiorella, R

    2006-01-01

    Summary Aim of this investigation was to analyse the voice in a group of 20 patients submitted to supracricoid partial laryngectomy (cricohyoidopexy, sparing two arytenoids) by the Multi Dimensional Voice Programme acoustic analysis system. Results revealed the following sound characteristics: high rate of noise, lack of periodic component of the signal, high rate of segments with no sound signal, vocal segments with marked air-turbulent flow, variation amplitude and frequency coefficients doubled compared to normal values, average fundamental frequency, if present, extremely variable and unsteady. These results show that the phonatory ability of the residual larynx, due to the altered anatomo-physiology of the structure after surgery, has to be completely re-estimated. In fact, the residual larynx determines a definitely reduced periodic acoustic signal, rich in noise and which can not be modulated. Good phonatory results of this treatment are basically due to preservation of a still understandable (but not perfect!) speech which, by ensuring the subjects’ speech ability, overcomes and has little influence on the really poor quality of the vocal signal in these patients. However, the patient obtains a “new voice” as far as concerns acoustic features and this is very important for communication and social life. Moreover, the possibility of objectively estimating acoustic vocal function ability allows monitoring of the trend and results of possible speech therapy and/or phonosurgical rehabilitation treatment which should start from new anatomical and physiological bases, as well as from the new physical acoustic mechanism of signal production. PMID:16886848

  8. Acoustic analysis of voice in patients treated by reconstructive subtotal laryngectomy. Evaluation and critical review.

    PubMed

    Di Nicola, V; Fiorella, M L; Spinelli, D A; Fiorella, R

    2006-04-01

    Aim of this investigation was to analyse the voice in a group of 20 patients submitted to supracricoid partial laryngectomy (cricohyoidopexy, sparing two arytenoids) by the Multi Dimensional Voice Programme acoustic analysis system. Results revealed the following sound characteristics: high rate of noise, lack of periodic component of the signal, high rate of segments with no sound signal, vocal segments with marked air-turbulent flow, variation amplitude and frequency coefficients doubled compared to normal values, average fundamental frequency, if present, extremely variable and unsteady. These results show that the phonatory ability of the residual larynx, due to the altered anatomo-physiology of the structure after surgery, has to be completely re-estimated. In fact, the residual larynx determines a definitely reduced periodic acoustic signal, rich in noise and which can not be modulated. Good phonatory results of this treatment are basically due to preservation of a still understandable (but not perfect!) speech which, by ensuring the subjects' speech ability, overcomes and has little influence on the really poor quality of the vocal signal in these patients. However, the patient obtains a "new voice" as far as concerns acoustic features and this is very important for communication and social life. Moreover, the possibility of objectively estimating acoustic vocal function ability allows monitoring of the trend and results of possible speech therapy and/or phonosurgical rehabilitation treatment which should start from new anatomical and physiological bases, as well as from the new physical acoustic mechanism of signal production.

  9. Speech Problems

    MedlinePlus

    ... and the respiratory system. The ability to understand language and produce speech is coordinated by the brain. So a person with brain damage from an accident, stroke, or birth defect may have speech and language problems. Some people with speech problems, particularly articulation ...

  10. Recognizing articulatory gestures from speech for robust speech recognition.

    PubMed

    Mitra, Vikramjit; Nam, Hosung; Espy-Wilson, Carol; Saltzman, Elliot; Goldstein, Louis

    2012-03-01

    Studies have shown that supplementary articulatory information can help to improve the recognition rate of automatic speech recognition systems. Unfortunately, articulatory information is not directly observable, necessitating its estimation from the speech signal. This study describes a system that recognizes articulatory gestures from speech, and uses the recognized gestures in a speech recognition system. Recognizing gestures for a given utterance involves recovering the set of underlying gestural activations and their associated dynamic parameters. This paper proposes a neural network architecture for recognizing articulatory gestures from speech and presents ways to incorporate articulatory gestures for a digit recognition task. The lack of natural speech database containing gestural information prompted us to use three stages of evaluation. First, the proposed gestural annotation architecture was tested on a synthetic speech dataset, which showed that the use of estimated tract-variable-time-functions improved gesture recognition performance. In the second stage, gesture-recognition models were applied to natural speech waveforms and word recognition experiments revealed that the recognized gestures can improve the noise-robustness of a word recognition system. In the final stage, a gesture-based Dynamic Bayesian Network was trained and the results indicate that incorporating gestural information can improve word recognition performance compared to acoustic-only systems.

  11. Embedding speech into virtual realities

    NASA Technical Reports Server (NTRS)

    Bohn, Christian-Arved; Krueger, Wolfgang

    1993-01-01

    In this work a speaker-independent speech recognition system is presented that is suitable for implementation in Virtual Reality applications. The use of an artificial neural network in connection with a special compression of the acoustic input leads to a system that is robust, fast, and easy to use, and that needs no additional hardware besides common VR equipment.

  12. High-speed imaging, acoustic features, and aeroacoustic computations of jet noise from Strombolian (and Vulcanian) explosions

    NASA Astrophysics Data System (ADS)

    Taddeucci, J.; Sesterhenn, J.; Scarlato, P.; Stampka, K.; Del Bello, E.; Pena Fernandez, J. J.; Gaudin, D.

    2014-05-01

    High-speed imaging of explosive eruptions at Stromboli (Italy), Fuego (Guatemala), and Yasur (Vanuatu) volcanoes allowed visualization of pressure waves from seconds-long explosions. From the explosion jets, waves radiate with variable geometry, timing, and apparent direction and velocity. Both the explosion jets and their wave fields are replicated well by numerical simulations of supersonic jets impulsively released from a pressurized vessel. The scaled acoustic signal from one explosion at Stromboli displays a frequency pattern with an excellent match to those from the simulated jets. We conclude that both the observed waves and the audible sound from the explosions are jet noise, i.e., the typical acoustic field radiating from high-velocity jets. Volcanic jet noise was previously quantified only in the infrasonic emissions from large, sub-Plinian to Plinian eruptions. Our combined approach allows us to define the spatial and temporal evolution of audible jet noise from supersonic jets in small-scale volcanic eruptions.

  13. Cross-modal interactions during perception of audiovisual speech and nonspeech signals: an fMRI study.

    PubMed

    Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

    2011-01-01

    During speech communication, visual information may interact with the auditory system at various processing stages. Most noteworthy, recent magnetoencephalography (MEG) data provided first evidence for early and preattentive phonetic/phonological encoding of the visual data stream--prior to its fusion with auditory phonological features [Hertrich, I., Mathiak, K., Lutzenberger, W., & Ackermann, H. Time course of early audiovisual interactions during speech and non-speech central-auditory processing: An MEG study. Journal of Cognitive Neuroscience, 21, 259-274, 2009]. Using functional magnetic resonance imaging, the present follow-up study aims to further elucidate the topographic distribution of visual-phonological operations and audiovisual (AV) interactions during speech perception. Ambiguous acoustic syllables--disambiguated to /pa/ or /ta/ by the visual channel (speaking face)--served as test materials, concomitant with various control conditions (nonspeech AV signals, visual-only and acoustic-only speech, and nonspeech stimuli). (i) Visual speech yielded an AV-subadditive activation of primary auditory cortex and the anterior superior temporal gyrus (STG), whereas the posterior STG responded both to speech and nonspeech motion. (ii) The inferior frontal and the fusiform gyrus of the right hemisphere showed a strong phonetic/phonological impact (differential effects of visual /pa/ vs. /ta/) upon hemodynamic activation during presentation of speaking faces. Taken together with the previous MEG data, these results point at a dual-pathway model of visual speech information processing: On the one hand, access to the auditory system via the anterior supratemporal “what" path may give rise to direct activation of "auditory objects." On the other hand, visual speech information seems to be represented in a right-hemisphere visual working memory, providing a potential basis for later interactions with auditory information such as the McGurk effect.

  14. Effects of Computer-Based Intervention through Acoustically Modified Speech (Fast ForWord) in Severe Mixed Receptive-Expressive Language Impairment: Outcomes from a Randomized Controlled Trial

    ERIC Educational Resources Information Center

    Cohen, Wendy; Hodson, Ann; O'Hare, Anne; Boyle, James; Durrani, Tariq; McCartney, Elspeth; Mattey, Mike; Naftalin, Lionel; Watson, Jocelynne

    2005-01-01

    Seventy-seven children between the ages of 6 and 10 years, with severe mixed receptive-expressive specific language impairment (SLI), participated in a randomized controlled trial (RCT) of Fast ForWord (FFW; Scientific Learning Corporation, 1997, 2001). FFW is a computer-based intervention for treating SLI using acoustically enhanced speech…

  15. Intelligibility of laryngectomees' substitute speech: automatic speech recognition and subjective rating.

    PubMed

    Schuster, Maria; Haderlein, Tino; Nöth, Elmar; Lohscheller, Jörg; Eysholdt, Ulrich; Rosanowski, Frank

    2006-02-01

    Substitute speech after laryngectomy is characterized by restricted aero-acoustic properties in comparison with laryngeal speech and therefore has lower intelligibility. Until now, an objective means to determine and quantify intelligibility has not existed, although intelligibility can serve as a global outcome parameter of voice restoration after laryngectomy. An automatic speech recognition system was applied to recordings of a standard text read by 18 German male laryngectomees with tracheoesophageal substitute speech. The system was trained on normal laryngeal speakers and not adapted to severely disturbed voices. Substitute speech was compared to laryngeal speech of a control group. Subjective evaluation of intelligibility was performed by a panel of five experts and compared to the automatic speech evaluation. Substitute speech showed a lower syllable rate (syllables/s) and lower word accuracy than laryngeal speech. Automatic speech recognition for substitute speech yielded word accuracies between 10.0% and 50.0% (28.7 ± 12.1%) with sufficient discrimination, and it was consistent with the experts' subjective evaluations of intelligibility. The multi-rater kappa of the experts alone did not differ from the multi-rater kappa of the experts and the recognizer. Automatic speech recognition serves as a good means to objectify and quantify the global speech outcome of laryngectomees. For clinical use, the speech recognition system will be adapted to disturbed voices and can also be applied in other languages.

  16. Virtual acoustics displays

    NASA Technical Reports Server (NTRS)

    Wenzel, Elizabeth M.; Fisher, Scott S.; Stone, Philip K.; Foster, Scott H.

    1991-01-01

    The real time acoustic display capabilities are described which were developed for the Virtual Environment Workstation (VIEW) Project at NASA-Ames. The acoustic display is capable of generating localized acoustic cues in real time over headphones. An auditory symbology, a related collection of representational auditory 'objects' or 'icons', can be designed using ACE (Auditory Cue Editor), which links both discrete and continuously varying acoustic parameters with information or events in the display. During a given display scenario, the symbology can be dynamically coordinated in real time with 3-D visual objects, speech, and gestural displays. The types of displays feasible with the system range from simple warnings and alarms to the acoustic representation of multidimensional data or events.

  17. Virtual acoustics displays

    NASA Astrophysics Data System (ADS)

    Wenzel, Elizabeth M.; Fisher, Scott S.; Stone, Philip K.; Foster, Scott H.

    1991-03-01

    The real time acoustic display capabilities are described which were developed for the Virtual Environment Workstation (VIEW) Project at NASA-Ames. The acoustic display is capable of generating localized acoustic cues in real time over headphones. An auditory symbology, a related collection of representational auditory 'objects' or 'icons', can be designed using ACE (Auditory Cue Editor), which links both discrete and continuously varying acoustic parameters with information or events in the display. During a given display scenario, the symbology can be dynamically coordinated in real time with 3-D visual objects, speech, and gestural displays. The types of displays feasible with the system range from simple warnings and alarms to the acoustic representation of multidimensional data or events.

  18. Neural pathways for visual speech perception

    PubMed Central

    Bernstein, Lynne E.; Liebenthal, Einat

    2014-01-01

    This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA) has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA. PMID:25520611

  19. Applications for Subvocal Speech

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles; Betts, Bradley

    2007-01-01

    A research and development effort now underway is directed toward the use of subvocal speech for communication in settings in which (1) acoustic noise could interfere excessively with ordinary vocal communication and/or (2) acoustic silence or secrecy of communication is required. By "subvocal speech" is meant sub-audible electromyographic (EMG) signals, associated with speech, that are acquired from the surface of the larynx and lingual areas of the throat. Topics addressed in this effort include recognition of the sub-vocal EMG signals that represent specific original words or phrases; transformation (including encoding and/or enciphering) of the signals into forms that are less vulnerable to distortion, degradation, and/or interception; and reconstruction of the original words or phrases at the receiving end of a communication link. Potential applications include ordinary verbal communications among hazardous- material-cleanup workers in protective suits, workers in noisy environments, divers, and firefighters, and secret communications among law-enforcement officers and military personnel in combat and other confrontational situations.

  20. Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

    PubMed

    Larm, Petra; Hongisto, Valtteri

    2006-02-01

    During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.
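
    The shared recipe mentioned in the abstract, converting per-band apparent speech-to-noise ratios into a single index, can be sketched as follows; the clipping range and 0-1 rescaling follow the usual STI formulation, while the band weights here are placeholders (the standardized weights and redundancy corrections are specified in IEC 60268-16).

        import numpy as np

        def band_snr_to_sti(apparent_snr_db, weights=None):
            """Map per-octave-band apparent speech-to-noise ratios (dB) to a single
            index: clip each band SNR to +/-15 dB, rescale to a 0-1 transmission
            index, then form a weighted sum. Equal weights below are illustrative
            placeholders, not the standardized values."""
            snr = np.clip(np.asarray(apparent_snr_db, dtype=float), -15.0, 15.0)
            ti = (snr + 15.0) / 30.0                       # 0 (poor) .. 1 (clear)
            if weights is None:
                weights = np.full(len(ti), 1.0 / len(ti))  # placeholder equal weighting
            return float(np.dot(weights, ti))

        # Seven octave bands, 125 Hz - 8 kHz: a noisy low end, cleaner mid/high bands.
        print(band_snr_to_sti([-12, -3, 4, 9, 12, 15, 18]))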

  1. Abnormal laughter-like vocalisations replacing speech in primary progressive aphasia.

    PubMed

    Rohrer, Jonathan D; Warren, Jason D; Rossor, Martin N

    2009-09-15

    We describe ten patients with a clinical diagnosis of primary progressive aphasia (PPA) (pathologically confirmed in three cases) who developed abnormal laughter-like vocalisations in the context of progressive speech output impairment leading to mutism. Failure of speech output was accompanied by increasing frequency of the abnormal vocalisations until ultimately they constituted the patient's only extended utterance. The laughter-like vocalisations did not show contextual sensitivity but occurred as an automatic vocal output that replaced speech. Acoustic analysis of the vocalisations in two patients revealed abnormal motor features including variable note duration and inter-note interval, loss of temporal symmetry of laugh notes and loss of the normal decrescendo. Abnormal laughter-like vocalisations may be a hallmark of a subgroup in the PPA spectrum with impaired control and production of nonverbal vocal behaviour due to disruption of fronto-temporal networks mediating vocalisation.

  2. Watch what you say, your computer might be listening: A review of automated speech recognition

    NASA Technical Reports Server (NTRS)

    Degennaro, Stephen V.

    1991-01-01

    Spoken language is the most convenient and natural means by which people interact with each other and is, therefore, a promising candidate for human-machine interactions. Speech also offers an additional channel for hands-busy applications, complementing the use of motor output channels for control. Current speech recognition systems vary considerably across a number of important characteristics, including vocabulary size, speaking mode, training requirements for new speakers, robustness to acoustic environments, and accuracy. Algorithmically, these systems range from rule-based techniques through more probabilistic or self-learning approaches such as hidden Markov modeling and neural networks. This tutorial begins with a brief summary of the relevant features of current speech recognition systems and the strengths and weaknesses of the various algorithmic approaches.

  3. Acoustic neuroma

    MedlinePlus

    Vestibular schwannoma; Tumor - acoustic; Cerebellopontine angle tumor; Angle tumor; Hearing loss - acoustic; Tinnitus - acoustic ... Acoustic neuromas have been linked with the genetic disorder neurofibromatosis type 2 (NF2). Acoustic neuromas are uncommon.

  4. Face configuration affects speech perception: Evidence from a McGurk mismatch negativity study.

    PubMed

    Eskelund, Kasper; MacDonald, Ewen N; Andersen, Tobias S

    2015-01-01

    We perceive identity, expression and speech from faces. While perception of identity and expression depends crucially on the configuration of facial features it is less clear whether this holds for visual speech perception. Facial configuration is poorly perceived for upside-down faces as demonstrated by the Thatcher illusion in which the orientation of the eyes and mouth with respect to the face is inverted (Thatcherization). This gives the face a grotesque appearance but this is only seen when the face is upright. Thatcherization can likewise disrupt visual speech perception but only when the face is upright indicating that facial configuration can be important for visual speech perception. This effect can propagate to auditory speech perception through audiovisual integration so that Thatcherization disrupts the McGurk illusion in which visual speech perception alters perception of an incongruent acoustic phoneme. This is known as the McThatcher effect. Here we show that the McThatcher effect is reflected in the McGurk mismatch negativity (MMN). The MMN is an event-related potential elicited by a change in auditory perception. The McGurk-MMN can be elicited by a change in auditory perception due to the McGurk illusion without any change in the acoustic stimulus. We found that Thatcherization disrupted a strong McGurk illusion and a correspondingly strong McGurk-MMN only for upright faces. This confirms that facial configuration can be important for audiovisual speech perception. For inverted faces we found a weaker McGurk illusion but, surprisingly, no MMN. We also found no correlation between the strength of the McGurk illusion and the amplitude of the McGurk-MMN. We suggest that this may be due to a threshold effect so that a strong McGurk illusion is required to elicit the McGurk-MMN.

  5. Auditory-neurophysiological responses to speech during early childhood: Effects of background noise.

    PubMed

    White-Schwoch, Travis; Davies, Evan C; Thompson, Elaine C; Woodruff Carr, Kali; Nicol, Trent; Bradlow, Ann R; Kraus, Nina

    2015-10-01

    Early childhood is a critical period of auditory learning, during which children are constantly mapping sounds to meaning. But this auditory learning rarely occurs in ideal listening conditions-children are forced to listen against a relentless din. This background noise degrades the neural coding of these critical sounds, in turn interfering with auditory learning. Despite the importance of robust and reliable auditory processing during early childhood, little is known about the neurophysiology underlying speech processing in children so young. To better understand the physiological constraints these adverse listening scenarios impose on speech sound coding during early childhood, auditory-neurophysiological responses were elicited to a consonant-vowel syllable in quiet and background noise in a cohort of typically-developing preschoolers (ages 3-5 yr). Overall, responses were degraded in noise: they were smaller, less stable across trials, slower, and there was poorer coding of spectral content and the temporal envelope. These effects were exacerbated in response to the consonant transition relative to the vowel, suggesting that the neural coding of spectrotemporally-dynamic speech features is more tenuous in noise than the coding of static features-even in children this young. Neural coding of speech temporal fine structure, however, was more resilient to the addition of background noise than coding of temporal envelope information. Taken together, these results demonstrate that noise places a neurophysiological constraint on speech processing during early childhood by causing a breakdown in neural processing of speech acoustics. These results may explain why some listeners have inordinate difficulties understanding speech in noise. Speech-elicited auditory-neurophysiological responses offer objective insight into listening skills during early childhood by reflecting the integrity of neural coding in quiet and noise; this paper documents typical response

  6. Flow and Acoustic Features of a Mach 0.9 Free Jet Using High-Frequency Excitation

    NASA Astrophysics Data System (ADS)

    Upadhyay, Puja; Alvi, Farrukh

    2016-11-01

    This study focuses on active control of a Mach 0.9 (ReD = 6 × 10^5) free jet using high-frequency excitation for noise reduction. Eight resonance-enhanced microjet actuators with nominal frequencies of 25 kHz (StD ≈ 2.2) are used to excite the shear layer at frequencies that are approximately an order of magnitude higher than the jet preferred frequency. The influence of control on mean and turbulent characteristics of the jet is studied using Particle Image Velocimetry. Additionally, far-field acoustic measurements are acquired to estimate the effect of pulsed injection on noise characteristics of the jet. Flow field measurements revealed that strong streamwise vortex pairs, formed as a result of control, result in a significantly thicker initial shear layer. This excited shear layer is also prominently undulated, resulting in a modified initial velocity profile. Also, the distribution of turbulent kinetic energy revealed that forcing results in increased turbulence levels for near-injection regions, followed by a global reduction for all downstream locations. Far-field acoustic measurements showed noise reductions at low to moderate frequencies. Additionally, an increase in high-frequency noise, mostly dominated by the actuators' resonant noise, was observed. AFOSR and ARO.

  7. Articulatory-to-Acoustic Relations in Response to Speaking Rate and Loudness Manipulations

    ERIC Educational Resources Information Center

    Mefferd, Antje S.; Green, Jordan R.

    2010-01-01

    Purpose: In this investigation, the authors determined the strength of association between tongue kinematic and speech acoustics changes in response to speaking rate and loudness manipulations. Performance changes in the kinematic and acoustic domains were measured using two aspects of speech production presumably affecting speech clarity:…

  8. An Acoustic Study of the Relationships among Neurologic Disease, Dysarthria Type, and Severity of Dysarthria

    ERIC Educational Resources Information Center

    Kim, Yunjung; Kent, Raymond D.; Weismer, Gary

    2011-01-01

    Purpose: This study examined acoustic predictors of speech intelligibility in speakers with several types of dysarthria secondary to different diseases and conducted classification analysis solely by acoustic measures according to 3 variables (disease, speech severity, and dysarthria type). Method: Speech recordings from 107 speakers with…

  9. Prediction and constraint in audiovisual speech perception.

    PubMed

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration

  10. Headphone localization of speech

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.

    1993-01-01

    Three-dimensional acoustic display systems have recently been developed that synthesize virtual sound sources over headphones based on filtering by head-related transfer functions (HRTFs), the direction-dependent spectral changes caused primarily by the pinnae. In this study, 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with nonindividualized HRTFs. About half of the subjects 'pulled' their judgments toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgments; 15 to 46 percent of stimuli were heard inside the head, with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by nonindividualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.
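
    The underlying rendering step, filtering a mono speech signal with left- and right-ear head-related impulse responses for the desired direction, can be sketched as below; the HRIRs here are random placeholders standing in for a measured (nonindividualized) catalog.

        import numpy as np
        from scipy.signal import fftconvolve

        def spatialize(mono, hrir_left, hrir_right):
            """Render a mono signal at a virtual direction by convolving it with the
            left- and right-ear head-related impulse responses (HRIRs) measured for
            that direction; this is the basic filtering operation behind such
            headphone displays."""
            left = fftconvolve(mono, hrir_left, mode="full")
            right = fftconvolve(mono, hrir_right, mode="full")
            return np.stack([left, right], axis=1)   # samples x 2 (stereo)

        # Placeholder signals: 1 s of noise "speech" and toy 128-tap HRIRs.
        sr = 44100
        mono = np.random.randn(sr) * 0.1
        hrir_l = np.random.randn(128) * np.hanning(128)  # stand-ins for measured HRIRs
        hrir_r = np.random.randn(128) * np.hanning(128)
        stereo = spatialize(mono, hrir_l, hrir_r)
        print(stereo.shape)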

  11. Headphone localization of speech.

    PubMed

    Begault, D R; Wenzel, E M

    1993-06-01

    Three-dimensional acoustic display systems have recently been developed that synthesize virtual sound sources over headphones based on filtering by head-related transfer functions (HRTFs), the direction-dependent spectral changes caused primarily by the pinnae. In this study 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with non-individualized HRTFs. About half of the subjects "pulled" their judgments toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgments; 15% to 46% of stimuli were heard inside the head, with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by nonindividualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.

  12. A Joint Feature Extraction and Data Compression Method for Low Bit Rate Transmission in Distributed Acoustic Sensor Environments

    DTIC Science & Technology

    2004-12-01

    ... target classification. In this Phase I research, a subband-based joint detection, feature extraction, and data compression/encoding system for low bit-rate transmission ... as well as data compression/encoding without incurring degradation in the overall performance. New methods for the formation of optimal sparse sensor arrays ...

  13. The Rhythm of Perception: Entrainment to Acoustic Rhythms Induces Subsequent Perceptual Oscillation.

    PubMed

    Hickok, Gregory; Farahbod, Haleh; Saberi, Kourosh

    2015-07-01

    Acoustic rhythms are pervasive in speech, music, and environmental sounds. Recent evidence for neural codes representing periodic information suggests that they may be a neural basis for the ability to detect rhythm. Further, rhythmic information has been found to modulate auditory-system excitability, which provides a potential mechanism for parsing the acoustic stream. Here, we explored the effects of a rhythmic stimulus on subsequent auditory perception. We found that a low-frequency (3 Hz), amplitude-modulated signal induces a subsequent oscillation of the perceptual detectability of a brief nonperiodic acoustic stimulus (1-kHz tone); the frequency but not the phase of the perceptual oscillation matches the entrained stimulus-driven rhythmic oscillation. This provides evidence that rhythmic contexts have a direct influence on subsequent auditory perception of discrete acoustic events. Rhythm coding is likely a fundamental feature of auditory-system design that predates the development of explicit human enjoyment of rhythm in music or poetry.

  14. Variability in English vowels is comparable in articulation and acoustics.

    PubMed

    Noiray, Aude; Iskarous, Khalil; Whalen, D H

    2014-05-01

    The nature of the links between speech production and perception has been the subject of longstanding debate. The present study investigated the articulatory parameter of tongue height and the acoustic F1-F0 difference for the phonological distinction of vowel height in American English front vowels. Multiple repetitions of /i, ɪ, e, ε, æ/ in [(h)Vd] sequences were recorded in seven adult speakers. Articulatory (ultrasound) and acoustic data were collected simultaneously to provide a direct comparison of variability in vowel production in both domains. Results showed idiosyncratic patterns of articulation for contrasting the three front vowel pairs /i-ɪ/, /e-ε/ and /ε-æ/ across subjects, with the degree of variability in vowel articulation comparable to that observed in the acoustics for all seven participants. However, contrary to what was expected, some speakers showed reversals for tongue height for /ɪ/-/e/ that was also reflected in acoustics with F1 higher for /ɪ/ than for /e/. The data suggest the phonological distinction of height is conveyed via speaker-specific articulatory-acoustic patterns that do not strictly match features descriptions. However, the acoustic signal is faithful to the articulatory configuration that generated it, carrying the crucial information for perceptual contrast.

  15. Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding

    SciTech Connect

    Hogden, J.

    1996-11-05

    The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation may decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.

  16. Acoustic biosensors

    PubMed Central

    Fogel, Ronen; Seshia, Ashwin A.

    2016-01-01

    Resonant and acoustic wave devices have been researched for several decades for application in the gravimetric sensing of a variety of biological and chemical analytes. These devices operate by coupling the measurand (e.g. analyte adsorption) as a modulation in the physical properties of the acoustic wave (e.g. resonant frequency, acoustic velocity, dissipation) that can then be correlated with the amount of adsorbed analyte. These devices can also be miniaturized with advantages in terms of cost, size and scalability, as well as potential additional features including integration with microfluidics and electronics, scaled sensitivities associated with smaller dimensions and higher operational frequencies, the ability to multiplex detection across arrays of hundreds of devices embedded in a single chip, increased throughput and the ability to interrogate a wider range of modes including within the same device. Additionally, device fabrication is often compatible with semiconductor volume batch manufacturing techniques enabling cost scalability and a high degree of precision and reproducibility in the manufacturing process. Integration with microfluidics handling also enables suitable sample pre-processing/separation/purification/amplification steps that could improve selectivity and the overall signal-to-noise ratio. Three device types are reviewed here: (i) bulk acoustic wave sensors, (ii) surface acoustic wave sensors, and (iii) micro/nano-electromechanical system (MEMS/NEMS) sensors. PMID:27365040

  17. Acoustic biosensors.

    PubMed

    Fogel, Ronen; Limson, Janice; Seshia, Ashwin A

    2016-06-30

    Resonant and acoustic wave devices have been researched for several decades for application in the gravimetric sensing of a variety of biological and chemical analytes. These devices operate by coupling the measurand (e.g. analyte adsorption) as a modulation in the physical properties of the acoustic wave (e.g. resonant frequency, acoustic velocity, dissipation) that can then be correlated with the amount of adsorbed analyte. These devices can also be miniaturized with advantages in terms of cost, size and scalability, as well as potential additional features including integration with microfluidics and electronics, scaled sensitivities associated with smaller dimensions and higher operational frequencies, the ability to multiplex detection across arrays of hundreds of devices embedded in a single chip, increased throughput and the ability to interrogate a wider range of modes including within the same device. Additionally, device fabrication is often compatible with semiconductor volume batch manufacturing techniques enabling cost scalability and a high degree of precision and reproducibility in the manufacturing process. Integration with microfluidics handling also enables suitable sample pre-processing/separation/purification/amplification steps that could improve selectivity and the overall signal-to-noise ratio. Three device types are reviewed here: (i) bulk acoustic wave sensors, (ii) surface acoustic wave sensors, and (iii) micro/nano-electromechanical system (MEMS/NEMS) sensors.

  18. Some behavioral and neurobiological constraints on theories of audiovisual speech integration: a review and suggestions for new directions.

    PubMed

    Altieri, Nicholas; Pisoni, David B; Townsend, James T

    2011-01-01

    Summerfield (1987) proposed several accounts of audiovisual speech perception, a field of research that has burgeoned in recent years. The proposed accounts included the integration of discrete phonetic features, vectors describing the values of independent acoustical and optical parameters, the filter function of the vocal tract, and articulatory dynamics of the vocal tract. The latter two accounts assume that the representations of audiovisual speech perception are based on abstract gestures, while the former two assume that the representations consist of symbolic or featural information obtained from visual and auditory modalities. Recent converging evidence from several different disciplines reveals that the general framework of Summerfield's feature-based theories should be expanded. An updated framework building upon the feature-based theories is presented. We propose a processing model arguing that auditory and visual brain circuits provide facilitatory information when the inputs are correctly timed, and that auditory and visual speech representations do not necessarily undergo translation into a common code during information processing. Future research on multisensory processing in speech perception should investigate the connections between auditory and visual brain regions, and utilize dynamic modeling tools to further understand the timing and information processing mechanisms involved in audiovisual speech integration.

  19. Auditory-perceptual learning improves speech motor adaptation in children.

    PubMed

    Shiller, Douglas M; Rochon, Marie-Lyne

    2014-08-01

    Auditory feedback plays an important role in children's speech development by providing the child with information about speech outcomes that is used to learn and fine-tune speech motor plans. The use of auditory feedback in speech motor learning has been extensively studied in adults by examining oral motor responses to manipulations of auditory feedback during speech production. Children are also capable of adapting speech motor patterns to perceived changes in auditory feedback; however, it is not known whether their capacity for motor learning is limited by immature auditory-perceptual abilities. Here, the link between speech perceptual ability and the capacity for motor learning was explored in two groups of 5- to 7-year-old children who underwent a period of auditory perceptual training followed by tests of speech motor adaptation to altered auditory feedback. One group received perceptual training on a speech acoustic property relevant to the motor task while a control group received perceptual training on an irrelevant speech contrast. Learned perceptual improvements led to an enhancement in speech motor adaptation (proportional to the perceptual change) only for the experimental group. The results indicate that children's ability to perceive relevant speech acoustic properties has a direct influence on their capacity for sensory-based speech motor adaptation.

  20. Automatic Speech Recognition

    NASA Astrophysics Data System (ADS)

    Potamianos, Gerasimos; Lamel, Lori; Wölfel, Matthias; Huang, Jing; Marcheret, Etienne; Barras, Claude; Zhu, Xuan; McDonough, John; Hernando, Javier; Macho, Dusan; Nadeu, Climent

    Automatic speech recognition (ASR) is a critical component for CHIL services. For example, it provides the input to higher-level technologies, such as summarization and question answering, as discussed in Chapter 8. In the spirit of ubiquitous computing, the goal of ASR in CHIL is to achieve a high performance using far-field sensors (networks of microphone arrays and distributed far-field microphones). However, close-talking microphones are also of interest, as they are used to benchmark ASR system development by providing a best-case acoustic channel scenario to compare against.

  1. Multiclassifier fusion of an ultrasonic lip reader in automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Jennnings, David L.

    1994-12-01

    This thesis investigates the use of two active ultrasonic devices in collecting lip information for performing and enhancing automatic speech recognition. The two devices explored are called the 'Ultrasonic Mike' and the 'Lip Lock Loop.' The devices are tested in a speaker dependent isolated word recognition task with a vocabulary consisting of the spoken digits from zero to nine. Two automatic lip readers are designed and tested based on the output of the ultrasonic devices. The automatic lip readers use template matching and dynamic time warping to determine the best candidate for a given test utterance. The automatic lip readers alone achieve accuracies of 65-89%, depending on the number of reference templates used. Next the automatic lip reader is combined with a conventional automatic speech recognizer. Both classifier level fusion and feature level fusion are investigated. Feature fusion is based on combining the feature vectors prior to dynamic time warping. Classifier fusion is based on a pseudo probability mass function derived from the dynamic time warping distances. The combined systems are tested with various levels of acoustic noise added. In one typical test, at a signal to noise ratio of 0dB, the acoustic recognizer's accuracy alone was 78%, the automatic lip reader's accuracy was 69%, but the combined accuracy was 93%. This experiment demonstrates that a simple ultrasonic lip motion detector, that has an output data rate 12,500 times less than a typical video camera, can significantly improve the accuracy of automatic speech recognition in noise.
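
    A minimal version of the template-matching core, dynamic time warping between a test utterance and stored word templates, is sketched below; the softmax mapping from DTW distances to pseudo-probabilities is one plausible choice for the fusion step, not necessarily the one used in the thesis.

        import numpy as np

        def dtw_distance(a, b):
            """Dynamic time warping distance between two feature sequences
            (frames x dims), as used for template matching."""
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = np.linalg.norm(a[i - 1] - b[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]

        def pseudo_probabilities(distances):
            """Turn per-template DTW distances into pseudo-probabilities for fusion
            with an acoustic recognizer (an assumed, illustrative mapping)."""
            s = np.exp(-np.asarray(distances, dtype=float))
            return s / s.sum()

        # Toy example: a test utterance compared against two word templates.
        test = np.random.randn(30, 2)                        # 30 frames of 2-D lip features
        templates = [np.random.randn(25, 2), np.random.randn(40, 2)]
        d = [dtw_distance(test, t) for t in templates]
        print(d, pseudo_probabilities(d))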

  2. Multiple-input multiple-output (MIMO) analog-to-feature converter chipsets for sub-wavelength acoustic source localization and bearing estimation

    NASA Astrophysics Data System (ADS)

    Chakrabartty, Shantanu

    2010-04-01

    Localization of acoustic sources using miniature microphone arrays poses a significant challenge due to fundamental limitations imposed by the physics of sound propagation. With sub-wavelength distances between the microphones, resolving localization cues becomes difficult due to precision artifacts. In this work, we present the design of a miniature microphone-array sensor based on a patented multiple-input multiple-output (MIMO) analog-to-feature converter (AFC) chip set that overcomes the limitations due to precision artifacts. Measured results from fabricated prototypes demonstrate a bearing range of 0 to 90 degrees with a resolution of less than 2 degrees. The power dissipation of the MIMO-AFC chip set for this task was measured to be less than 75 microwatts, making it well suited to portable, battery-powered sniper and gunshot detection applications.
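
    As a point of reference for the bearing-estimation task, a conventional digital approach computes the time difference of arrival between two microphones by cross-correlation and converts it to an angle under a far-field assumption. The snippet below is that generic baseline, not the analog MIMO-AFC circuit described in the record; the sampling rate, spacing, and signals are illustrative assumptions.

    ```python
    import numpy as np

    def estimate_bearing(x1, x2, fs, mic_spacing, c=343.0):
        """Estimate the bearing of a far-field source from two microphone signals
        using the cross-correlation time delay (generic digital TDOA sketch)."""
        corr = np.correlate(x1, x2, mode="full")
        lag = np.argmax(corr) - (len(x2) - 1)        # delay in samples
        tau = lag / fs                               # delay in seconds
        # Far-field geometry: tau = (d / c) * sin(theta); clip for numerical safety.
        sin_theta = np.clip(tau * c / mic_spacing, -1.0, 1.0)
        return np.degrees(np.arcsin(sin_theta))

    # Example with a synthetic signal delayed by 3 samples at 48 kHz.
    fs = 48_000
    rng = np.random.default_rng(0)
    s = rng.standard_normal(4096)
    x1 = s
    x2 = np.roll(s, -3)
    print(estimate_bearing(x1, x2, fs, mic_spacing=0.03))
    ```

    Note how the sample-quantized delay limits angular resolution at sub-wavelength spacing; this is precisely the precision artifact the analog front end is designed to overcome.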

  3. Speech Aids

    NASA Technical Reports Server (NTRS)

    1987-01-01

    Designed to assist deaf and hearing-impaired persons in achieving better speech, Resnick Worldwide Inc.'s device provides a visual means of cuing the deaf as a speech-improvement measure. This is done by electronically processing the subjects' sounds and comparing them with optimum values, which are displayed for reference.

  4. Speech Communication.

    ERIC Educational Resources Information Center

    Brooks, William D.

    Presented in this book is a view of speech communication which enables an individual to become fully aware of his or her role as both initiator and recipient of messages. Communication is treated broadly with emphasis on the understanding and skills relating to various types of speech communication across the broad spectrum of human communication.…

  5. Symbolic Speech

    ERIC Educational Resources Information Center

    Podgor, Ellen S.

    1976-01-01

    The concept of symbolic speech emanates from the 1967 case of United States v. O'Brien. These discussions of flag desecration, grooming and dress codes, nude entertainment, buttons and badges, and musical expression show that the courts place symbolic speech in different strata from verbal communication. (LBH)

  6. Speech coding

    SciTech Connect

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage that the speech signal was corrupted by noise, cross-talk, and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. Digital transmission, on the other hand, is relatively immune to noise, cross-talk, and distortion, primarily because the digital signal can be faithfully regenerated at each repeater purely on the basis of a binary decision. Hence the end-to-end performance of a digital link is essentially independent of the length and operating frequency bands of the link, and from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech also became extremely important from a service-provision point of view. Modern requirements have introduced the need for robust, flexible, and secure services that can carry a multitude of signal types (such as voice, data, and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding usually refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the
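
    Waveform coding, the simpler of the two families mentioned above, can be illustrated with mu-law companding of the kind standardized in ITU-T G.711. The snippet below is a minimal sketch of the compress-quantize-expand round trip, not a complete codec; the tone used as input is an illustrative stand-in for a speech frame.

    ```python
    import numpy as np

    MU = 255.0  # mu-law constant used in G.711

    def mu_law_encode(x, bits=8):
        """Waveform-style speech coding: compress samples in [-1, 1] with mu-law
        companding and quantize to 'bits' bits."""
        compressed = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
        levels = 2 ** (bits - 1)
        return np.round(compressed * (levels - 1)).astype(np.int16)

    def mu_law_decode(codes, bits=8):
        """Expand the quantized codes back to an approximate waveform."""
        levels = 2 ** (bits - 1)
        compressed = codes.astype(np.float64) / (levels - 1)
        return np.sign(compressed) * np.expm1(np.abs(compressed) * np.log1p(MU)) / MU

    # Round trip on a synthetic signal: 20 ms of a 200 Hz tone at 8 kHz.
    t = np.linspace(0, 0.02, 160)
    x = 0.5 * np.sin(2 * np.pi * 200 * t)
    x_hat = mu_law_decode(mu_law_encode(x))
    print("max reconstruction error:", np.max(np.abs(x - x_hat)))
    ```

    Parametric coders, by contrast, transmit analysis parameters (for example, linear-prediction coefficients, pitch, and gain) rather than the quantized waveform itself, trading bit rate for modeling complexity.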

  7. Improving the speech intelligibility in classrooms

    NASA Astrophysics Data System (ADS)

    Lam, Choi Ling Coriolanus

    One of the major acoustical concerns in classrooms is the establishment of effective verbal communication between teachers and students. Non-optimal acoustical conditions, resulting in reduced verbal communication, can cause two main problems. First, they can lead to reduced learning efficiency. Second, they can cause fatigue, stress, vocal strain, and health problems, such as headaches and sore throats, among teachers who are forced to compensate for poor acoustical conditions by raising their voices. In addition, inadequate acoustical conditions can encourage the use of public address systems, and improper use of such amplifiers or loudspeakers can impair students' hearing. The social costs of poor classroom acoustics, in terms of impaired learning in children, are large; this largely invisible problem has far-reaching implications for learning, yet it is readily solved. Much research has been carried out that accurately and concisely summarizes the findings on classroom acoustics, but a number of challenging questions remain unanswered. Most objective indices of speech intelligibility are based on studies of Western languages; although several studies of tonal languages such as Mandarin have been conducted, there is much less work on Cantonese. In this research, measurements were made in unoccupied rooms to investigate the acoustical parameters and characteristics of the classrooms. Speech intelligibility tests based on English, Mandarin, and Cantonese, together with a survey, were carried out on students aged from 5 to 22 years. The aim is to investigate the differences in intelligibility among English, Mandarin, and Cantonese in Hong Kong classrooms. The relationship between the speech transmission index (STI) and Phonetically Balanced (PB) word scores is further developed, together with an empirical relationship between the speech intelligibility in classrooms and the variations
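
    The empirical STI-to-word-score relationship described above is typically summarized by fitting a monotonic, S-shaped curve to measured pairs. The sketch below shows one such fit; both the data points and the logistic form are illustrative assumptions, not the study's measurements.

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(sti, a, b):
        """Illustrative S-shaped mapping from speech transmission index (STI)
        to word-recognition score in percent."""
        return 100.0 / (1.0 + np.exp(-a * (sti - b)))

    # Hypothetical (made-up) measurement pairs: STI vs. PB word score in percent.
    sti = np.array([0.30, 0.40, 0.50, 0.60, 0.70, 0.80])
    score = np.array([35.0, 55.0, 72.0, 85.0, 92.0, 96.0])

    params, _ = curve_fit(logistic, sti, score, p0=[10.0, 0.45])
    print("fitted slope and midpoint:", params)
    ```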

  8. Phrase-level speech simulation with an airway modulation model of speech production

    PubMed Central

    Story, Brad H.

    2012-01-01

    Artificial talkers and speech synthesis systems have long been used as a means of understanding both speech production and speech perception. The development of an airway modulation model is described that simulates the time-varying changes of the glottis and vocal tract, as well as acoustic wave propagation, during speech production. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by humans and how that sound is perceived by a listener. The primary components of the model are introduced, and simulations of words and phrases are demonstrated. PMID:23503742
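
    The airway modulation model itself is not reproduced here; as a rough illustration of the same underlying idea (a time-varying glottal source exciting a vocal-tract filter), the sketch below uses a classic source-filter approximation with fixed formant resonances. All parameter values are illustrative assumptions.

    ```python
    import numpy as np
    from scipy.signal import lfilter

    fs = 16_000
    dur = 0.5
    t = np.arange(int(fs * dur)) / fs

    # Glottal source: impulse train at a 120 Hz fundamental (very crude glottal model).
    f0 = 120
    source = np.zeros_like(t)
    source[::int(fs / f0)] = 1.0

    def formant_filter(x, freq, bw, fs):
        """Apply one formant resonance as a two-pole IIR section."""
        r = np.exp(-np.pi * bw / fs)
        theta = 2 * np.pi * freq / fs
        a = [1.0, -2 * r * np.cos(theta), r * r]
        return lfilter([1.0 - r], a, x)

    # Vocal-tract filter: cascade of resonators at rough /a/-like formant values.
    y = source
    for freq, bw in [(700, 80), (1200, 90), (2600, 120)]:
        y = formant_filter(y, freq, bw, fs)
    y /= np.max(np.abs(y))   # normalize for playback
    ```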

  9. Contemporary Issues in Phoneme Production by Hearing-Impaired Persons: Physiological and Acoustic Aspects.

    ERIC Educational Resources Information Center

    McGarr, Nancy S.; Whitehead, Robert

    1992-01-01

    This paper on physiologic correlates of speech production in children and youth with hearing impairments focuses specifically on the production of phonemes and includes data on respiration for speech production, phonation, speech aerodynamics, articulation, and acoustic analyses of speech by hearing-impaired persons. (Author/DB)

  10. Some articulatory details of emotional speech

    NASA Astrophysics Data System (ADS)

    Lee, Sungbok; Yildirim, Serdar; Bulut, Murtaza; Kazemzadeh, Abe; Narayanan, Shrikanth

    2005-09-01

    Differences in speech articulation among four emotion types (neutral, anger, sadness, and happiness) are investigated by analyzing tongue tip, jaw, and lip movement data collected from one male and one female speaker of American English. The data were collected using an electromagnetic articulography (EMA) system while subjects produced simulated emotional speech. Pitch, root-mean-square (rms) energy, and the first three formants were estimated for vowel segments. For both speakers, angry speech exhibited the largest rms energy and the largest articulatory activity in terms of displacement range and movement speed. Happy speech is characterized by the largest pitch variability; it has higher rms energy than neutral speech, but its articulatory activity is comparable to, or less than, that of neutral speech. That is, happy speech is more prominent in voicing activity than in articulation. Sad speech exhibits the longest sentence durations and lower rms energy, yet its articulatory activity is no less than that of neutral speech. Interestingly, for the male speaker, articulation of vowels in sad speech is consistently more peripheral (i.e., more forward displacements) compared to the other emotions; however, this does not hold for the female subject. These and other results will be discussed in detail along with the associated acoustics and perceived emotional qualities. [Work supported by NIH.]
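
    Segment-level measures of the kind reported here (rms energy and pitch) can be extracted with standard short-time analysis. The sketch below is a generic version using frame-wise rms and an autocorrelation pitch estimate, not the authors' analysis pipeline; its frame sizes assume 16 kHz audio.

    ```python
    import numpy as np

    def short_time_rms(x, frame=400, hop=160):
        """Frame-wise root-mean-square energy (25 ms frames, 10 ms hop at 16 kHz)."""
        frames = [x[i:i + frame] for i in range(0, len(x) - frame, hop)]
        return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])

    def autocorr_pitch(frame, fs, fmin=75, fmax=400):
        """Estimate F0 of one voiced frame from the autocorrelation peak."""
        frame = frame - np.mean(frame)
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        lo, hi = int(fs / fmax), int(fs / fmin)
        lag = lo + np.argmax(ac[lo:hi])
        return fs / lag
    ```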

  11. Talking while chewing: speaker response to natural perturbation of speech.

    PubMed

    Mayer, Connor; Gick, Bryan

    2012-01-01

    This study looks at how the conflicting goals of chewing and speech production are reconciled by examining the acoustic and articulatory output of talking while chewing. We consider chewing to be a type of perturbation with regard to speech production, but with some important differences. Ultrasound and acoustic measurements were made while participants chewed gum and produced various utterances containing the sounds /s/, /ʃ/, and /r/. Results show a great deal of individual variation in articulation and acoustics between speakers, but consistent productions and maintenance of relative acoustic distances within speakers. Although chewing interfered with speech production, and this interference manifested itself in a variety of ways across speakers, the objectives of speech production were indirectly achieved within the constraints and variability introduced by individual chewing strategies.

  12. Psychophysics of Complex Auditory and Speech Stimuli.

    DTIC Science & Technology

    1996-10-01

    Traditional psychophysical procedures were employed to systematically investigate the perception of classes of complex acoustic stimuli, including speech and music, with applications in health, industry, and human factors. Manipulations included making one musical voice more distinctive (presented in a different instrument timbre than the other musical voices) or less distinctive.

  13. Spectral identification of sperm whales from Littoral Acoustic Demonstration Center passive acoustic recordings

    NASA Astrophysics Data System (ADS)

    Sidorovskaia, Natalia A.; Richard, Blake; Ioup, George E.; Ioup, Juliette W.

    2005-09-01

    The Littoral Acoustic Demonstration Center (LADC) made a series of passive broadband acoustic recordings in the Gulf of Mexico and Ligurian Sea to study noise and marine mammal phonations. The collected data contain a large number of various types of sperm whale phonations, such as isolated clicks and communication codas. It was previously reported that the spectrograms of the extracted clicks and codas contain well-defined null patterns that appear to be unique to individuals. The null pattern is formed by individual features of an animal's sound production organs. These observations motivated the present studies of adapting human speech identification techniques to deep-diving marine mammal phonations. A three-state trained hidden Markov model (HMM) was used with the phonation spectra of sperm whales. The HMM algorithm gave 75% accuracy in identifying individuals when initially tested on the acoustic data set correlated with visual observations of sperm whales. A comparison of the identification accuracy based on null-pattern similarity analysis and the HMM algorithm is presented. The results can establish the foundation for developing an acoustic identification database for sperm whales and possibly other deep-diving marine mammals that would be difficult to observe visually. [Research supported by ONR.]
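
    The identification scheme described (one trained HMM per individual, with a test phonation assigned to the best-scoring model) can be sketched with an off-the-shelf HMM library. The feature handling below is an illustrative assumption, not the LADC processing chain; it presumes spectral feature frames have already been extracted per phonation.

    ```python
    import numpy as np
    from hmmlearn import hmm

    def train_whale_model(spectra, lengths, n_states=3):
        """Fit a three-state Gaussian HMM to the click/coda spectral features of
        one individual. 'spectra' is (frames x features); 'lengths' gives the
        frame count of each phonation in the training set."""
        model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                                n_iter=100, random_state=0)
        model.fit(spectra, lengths)
        return model

    def identify(test_spectra, models):
        """Score a test phonation against every individual's HMM and return the
        identity with the highest log-likelihood."""
        scores = {whale_id: m.score(test_spectra) for whale_id, m in models.items()}
        return max(scores, key=scores.get), scores
    ```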

  14. Levels of Processing of Speech and Non-Speech

    DTIC Science & Technology

    1991-05-10

    Timbre: a better musical analogy to speech? Presented to the Acoustical Society of America, Anaheim. A. Samuel (Fall 1987). The studies of listener-based factors include studies of perceptual restoration of deleted sounds (phonemes or musical notes) and studies of attention to speech and music. The attentional investigations demonstrate rather fine-tuned attentional control under high-predictability conditions.

  15. Speech entrainment enables patients with Broca's aphasia to produce fluent speech.

    PubMed

    Fridriksson, Julius; Hubbard, H Isabel; Hudspeth, Sarah Grace; Holland, Audrey L; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

    2012-12-01

    A distinguishing feature of Broca's aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect 'speech entrainment' and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca's aphasia. In Experiment 1, 13 patients with Broca's aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca's area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production. Behavioural and

  16. Acoustic Event Detection and Classification

    NASA Astrophysics Data System (ADS)

    Temko, Andrey; Nadeu, Climent; Macho, Dušan; Malkin, Robert; Zieger, Christian; Omologo, Maurizio

    The human activity that takes place in meeting rooms or classrooms is reflected in a rich variety of acoustic events (AE), produced either by the human body or by objects handled by humans, so the determination of both the identity of sounds and their position in time may help to detect and describe that human activity. Indeed, speech is usually the most informative sound, but other kinds of AEs may also carry useful information, for example, clapping or laughing inside a speech, a strong yawn in the middle of a lecture, a chair moving or a door slam when the meeting has just started. Additionally, detection and classification of sounds other than speech may be useful to enhance the robustness of speech technologies like automatic speech recognition.
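
    A minimal acoustic event classifier along these lines can be built from clip-level spectral features and a standard classifier. The sketch below (mean MFCCs plus an SVM) is a generic baseline under assumed file paths and labels, not the detection system developed in CHIL.

    ```python
    import numpy as np
    import librosa
    from sklearn.svm import SVC

    def clip_features(path, sr=16_000, n_mfcc=13):
        """Mean MFCC vector for one audio clip: a simple clip-level representation
        for acoustic event classification."""
        y, _ = librosa.load(path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        return mfcc.mean(axis=1)

    def train_event_classifier(paths, labels):
        """Train a small SVM to label clips, e.g. 'speech', 'applause', 'door slam'."""
        X = np.vstack([clip_features(p) for p in paths])
        clf = SVC(kernel="rbf", C=10.0)
        clf.fit(X, labels)
        return clf
    ```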

  17. The Human Neural Alpha Response to Speech is a Proxy of Attentional Control.

    PubMed

    Wöstmann, Malte; Lim, Sung-Joo; Obleser, Jonas

    2017-03-18

    Human alpha (~10 Hz) oscillatory power is a prominent neural marker of cognitive effort. When listeners attempt to process and retain acoustically degraded speech, alpha power increases. It is unclear whether these alpha modulations reflect the degree of acoustic degradation per se or the degradation-driven demand on a listener's attentional control. Using an irrelevant-speech paradigm and measuring the electroencephalogram (EEG), the current experiment demonstrates that the neural alpha response to speech is a surprisingly clear proxy of top-down control, driven entirely by the listening goal of attending versus ignoring degraded speech. While listeners (n = 23) retained the serial order of nine to-be-recalled digits, one to-be-ignored sentence was presented. The distractibility of the to-be-ignored sentence was varied parametrically in acoustic detail (noise-vocoding), with more acoustic detail of the distracting speech increasingly disrupting listeners' serial memory recall. Whereas previous studies had observed decreases in parietal and auditory alpha power with more acoustic detail (of target speech), alpha power here showed the opposite pattern and increased with more acoustic detail in the speech distractor. In sum, the neural alpha response reflects almost exclusively a listener's goal, which is decisive for whether more acoustic detail facilitates comprehension (of attended speech) or enhances distraction (of ignored speech).
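
    The noise-vocoding manipulation used to vary acoustic detail can be sketched in a few lines: split the signal into frequency bands, extract each band's amplitude envelope, and use the envelopes to modulate band-limited noise, with the number of bands controlling the amount of spectral detail. The implementation below is a generic version of this manipulation, not the authors' exact stimuli, and assumes a sampling rate of at least 16 kHz.

    ```python
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def noise_vocode(x, fs, n_bands=4, fmin=100.0, fmax=7000.0):
        """Noise-vocode a speech signal: band-split, take each band's envelope,
        and modulate band-limited noise with it. Fewer bands -> less acoustic
        detail. Requires fs high enough that fmax stays below Nyquist."""
        edges = np.geomspace(fmin, fmax, n_bands + 1)
        rng = np.random.default_rng(0)
        out = np.zeros_like(x, dtype=float)
        for lo, hi in zip(edges[:-1], edges[1:]):
            sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
            band = sosfiltfilt(sos, x)
            envelope = np.abs(hilbert(band))
            carrier = sosfiltfilt(sos, rng.standard_normal(len(x)))
            out += envelope * carrier
        return out / np.max(np.abs(out))
    ```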

  18. Acoustic Emphasis in Four Year Olds

    ERIC Educational Resources Information Center

    Wonnacott, Elizabeth; Watson, Duane G.

    2008-01-01

    Acoustic emphasis may convey a range of subtle discourse distinctions, yet little is known about how this complex ability develops in children. This paper presents a first investigation of the factors which influence the production of acoustic prominence in young children's spontaneous speech. In a production experiment, SVO sentences were…

  19. Articulatory Mediation of Speech Perception: A Causal Analysis of Multi-Modal Imaging Data

    ERIC Educational Resources Information Center

    Gow, David W., Jr.; Segawa, Jennifer A.

    2009-01-01

    The inherent confound between the organization of articulation and the acoustic-phonetic structure of the speech signal makes it exceptionally difficult to evaluate the competing claims of motor and acoustic-phonetic accounts of how listeners recognize coarticulated speech. Here we use Granger causation analysis of high spatiotemporal resolution…
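
    Granger-causal analysis of the kind named in the title asks whether past values of one signal improve prediction of another beyond that signal's own past. The toy bivariate example below (using statsmodels) illustrates the test itself; the study's actual analysis operates on source-localized neuroimaging time series and is considerably more elaborate.

    ```python
    import numpy as np
    from statsmodels.tsa.stattools import grangercausalitytests

    # Toy bivariate example: y depends on lagged x, so x should Granger-cause y.
    rng = np.random.default_rng(0)
    n = 500
    x = rng.standard_normal(n)
    y = np.zeros(n)
    for t in range(2, n):
        y[t] = 0.6 * x[t - 1] + 0.2 * y[t - 1] + 0.1 * rng.standard_normal()

    # Column convention: the test asks whether column 2 Granger-causes column 1.
    data = np.column_stack([y, x])
    results = grangercausalitytests(data, maxlag=2)
    ```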

  20. Compensation for Coarticulation: Disentangling Auditory and Gestural Theories of Perception of Coarticulatory Effects in Speech

    ERIC Educational Resources Information Center

    Viswanathan, Navin; Magnuson, James S.; Fowler, Carol A.

    2010-01-01

    According to one approach to speech perception, listeners perceive speech by applying general pattern matching mechanisms to the acoustic signal (e.g., Diehl, Lotto, & Holt, 2004). An alternative is that listeners perceive the phonetic gestures that structured the acoustic signal (e.g., Fowler, 1986). The two accounts have offered different…