Science.gov

Sample records for acoustically modified speech

  1. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-02-14

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  2. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2004-03-23

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  3. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-08-08

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  4. Speech acoustics: How much science?

    PubMed

    Tiwari, Manjul

    2012-01-01

    Human vocalizations are sounds made exclusively by a human vocal tract. Among other vocalizations, for example, laughs or screams, speech is the most important. Speech is the primary medium of that supremely human symbolic communication system called language. One of the functions of a voice, perhaps the main one, is to realize language by conveying some of the speaker's thoughts in linguistic form. Speech is language made audible. Moreover, when phoneticians compare and describe voices, they usually do so with respect to linguistic units, especially speech sounds such as vowels or consonants. It is therefore necessary to understand the structure as well as the nature of speech sounds and how they are described. To understand and evaluate speech, it is important to have at least a basic understanding of the science of speech acoustics: how the acoustics of speech are produced, how they are described, and how differences, both between speakers and within speakers, arise in the acoustic output. One of the aims of this article is to try to facilitate this understanding.

  5. System And Method For Characterizing Voiced Excitations Of Speech And Acoustic Signals, Removing Acoustic Noise From Speech, And Synthesizing Speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-04-25

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  6. Mapping acoustics to kinematics in speech

    NASA Astrophysics Data System (ADS)

    Bali, Rohan

    An accurate mapping from speech acoustics to speech articulator movements has many practical applications, as well as theoretical implications for speech planning and perception science. This work can be divided into two parts. In the first part, we show that a simple codebook can be used to map acoustics to speech articulator movements in natural, conversational speech. In the second part, we incorporate cost optimization principles that have been shown to be relevant in motor control tasks into the codebook approach. These cost optimizations are defined as minimization of the integrals of the magnitudes of velocity, acceleration, and jerk of the speech articulators, and are implemented using a dynamic programming technique. Results show that incorporating cost minimization of speech articulator movements can significantly improve the mapping from acoustics to speech articulator movements. This suggests that underlying physiological or neural planning principles are used by the speech articulators during speech production.
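
    A minimal sketch of the dynamic-programming idea described in this abstract: given several codebook-matched articulator candidates per acoustic frame, pick the candidate sequence that minimizes an accumulated smoothness cost (here, summed squared frame-to-frame velocity only). Function and variable names are illustrative, not taken from the work.

        # Illustrative sketch (not the author's code): Viterbi-style selection of the
        # smoothest articulator trajectory from per-frame codebook candidates.
        import numpy as np

        def smoothest_path(candidates):
            """candidates: array of shape (T, K, D) -- K articulator candidates per frame."""
            T, K, D = candidates.shape
            cost = np.zeros((T, K))              # best cumulative cost ending in candidate k at frame t
            back = np.zeros((T, K), dtype=int)   # backpointers for path recovery
            for t in range(1, T):
                # squared "velocity" between every candidate at frame t-1 and at frame t
                step = ((candidates[t][None, :, :] - candidates[t - 1][:, None, :]) ** 2).sum(axis=2)
                total = cost[t - 1][:, None] + step      # shape (K, K)
                back[t] = total.argmin(axis=0)
                cost[t] = total.min(axis=0)
            path = [int(cost[-1].argmin())]
            for t in range(T - 1, 0, -1):
                path.append(int(back[t, path[-1]]))
            path.reverse()
            return np.array([candidates[t, k] for t, k in enumerate(path)])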

  7. Acoustics of Clear Speech: Effect of Instruction

    ERIC Educational Resources Information Center

    Lam, Jennifer; Tjaden, Kris; Wilding, Greg

    2012-01-01

    Purpose: This study investigated how different instructions for eliciting clear speech affected selected acoustic measures of speech. Method: Twelve speakers were audio-recorded reading 18 different sentences from the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1984). Sentences were produced in habitual, clear,…

  8. Effects of computer-based intervention through acoustically modified speech (Fast ForWord) in severe mixed receptive-expressive language impairment: outcomes from a randomized controlled trial.

    PubMed

    Cohen, Wendy; Hodson, Ann; O'Hare, Anne; Boyle, James; Durrani, Tariq; McCartney, Elspeth; Mattey, Mike; Naftalin, Lionel; Watson, Jocelynne

    2005-06-01

    Seventy-seven children between the ages of 6 and 10 years, with severe mixed receptive-expressive specific language impairment (SLI), participated in a randomized controlled trial (RCT) of Fast ForWord (FFW; Scientific Learning Corporation, 1997, 2001). FFW is a computer-based intervention for treating SLI using acoustically enhanced speech stimuli. These stimuli are modified to exaggerate their time and intensity properties as part of an adaptive training process. All children who participated in the RCT maintained their regular speech and language therapy and school regime throughout the trial. Standardized measures of receptive and expressive language were used to assess performance at baseline and to measure outcome from treatment at 9 weeks and 6 months. Children were allocated to 1 of 3 groups. Group A (n = 23) received the FFW intervention as a home-based therapy for 6 weeks. Group B (n = 27) received commercially available computer-based activities designed to promote language as a control for computer games exposure. Group C (n = 27) received no additional study intervention. Each group made significant gains in language scores, but there was no additional effect for either computer intervention. Thus, the findings from this RCT do not support the efficacy of FFW as an intervention for children with severe mixed receptive-expressive SLI.

  9. Analog Acoustic Expression in Speech Communication

    ERIC Educational Resources Information Center

    Shintel, Hadas; Nusbaum, Howard C.; Okrent, Arika

    2006-01-01

    We present the first experimental evidence of a phenomenon in speech communication we call "analog acoustic expression." Speech is generally thought of as conveying information in two distinct ways: discrete linguistic-symbolic units such as words and sentences represent linguistic meaning, and continuous prosodic forms convey information about…

  10. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.
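
    The deconvolution step described above, separating the measured excitation from the acoustic output to recover a per-frame transfer function, reduces in the frequency domain to a stabilized spectral division. The short sketch below illustrates that relation only; it is a simplified assumption-based illustration, not the patented method.

        # Hedged sketch: if s[n] is (approximately) the excitation e[n] convolved with the
        # vocal-tract response h[n], then H(f) ~= S(f) / E(f) for each analysis frame.
        import numpy as np

        def frame_transfer_function(acoustic_frame, excitation_frame, eps=1e-8):
            S = np.fft.rfft(acoustic_frame)
            E = np.fft.rfft(excitation_frame)
            return S / (E + eps)   # eps keeps the division stable where E(f) is near zero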

  11. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

  12. Optimizing acoustical conditions for speech intelligibility in classrooms

    NASA Astrophysics Data System (ADS)

    Yang, Wonyoung

    High speech intelligibility is imperative in classrooms where verbal communication is critical. However, the optimal acoustical conditions to achieve a high degree of speech intelligibility have previously been investigated with inconsistent results, and practical room-acoustical solutions to optimize the acoustical conditions for speech intelligibility have not been developed. This experimental study validated auralization for speech-intelligibility testing, investigated the optimal reverberation for speech intelligibility for both normal and hearing-impaired listeners using more realistic room-acoustical models, and proposed an optimal sound-control design for speech intelligibility based on the findings. The auralization technique was used to perform subjective speech-intelligibility tests. The validation study, comparing auralization results with those of real classroom speech-intelligibility tests, found that if the room to be auralized is not very absorptive or noisy, speech-intelligibility tests using auralization are valid. The speech-intelligibility tests were done in two different auralized sound fields---approximately diffuse and non-diffuse---using the Modified Rhyme Test and both normal and hearing-impaired listeners. A hybrid room-acoustical prediction program was used throughout the work, and it and a 1/8 scale-model classroom were used to evaluate the effects of ceiling barriers and reflectors. For both subject groups, in approximately diffuse sound fields, when the speech source was closer to the listener than the noise source, the optimal reverberation time was zero. When the noise source was closer to the listener than the speech source, the optimal reverberation time was 0.4 s (with another peak at 0.0 s) with relative output power levels of the speech and noise sources SNS = 5 dB, and 0.8 s with SNS = 0 dB. In non-diffuse sound fields, when the noise source was between the speaker and the listener, the optimal reverberation time was 0.6 s with

  13. Speech Intelligibility Advantages using an Acoustic Beamformer Display

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Sunder, Kaushik; Godfroy, Martine; Otto, Peter

    2015-01-01

    A speech intelligibility test conforming to the Modified Rhyme Test of ANSI S3.2 "Method for Measuring the Intelligibility of Speech Over Communication Systems" was conducted using a prototype 12-channel acoustic beamformer system. The target speech material (signal) was identified against speech babble (noise), with calculated signal-noise ratios of 0, 5 and 10 dB. The signal was delivered at a fixed beam orientation of 135 deg (re 90 deg as the frontal direction of the array) and the noise at 135 deg (co-located) and 0 deg (separated). A significant improvement in intelligibility from 57% to 73% was found for spatial separation for the same signal-noise ratio (0 dB). Significant effects for improved intelligibility due to spatial separation were also found for higher signal-noise ratios (5 and 10 dB).

  14. Acoustic Evidence for Phonologically Mismatched Speech Errors

    ERIC Educational Resources Information Center

    Gormley, Andrea

    2015-01-01

    Speech errors are generally said to accommodate to their new phonological context. This accommodation has been validated by several transcription studies. The transcription methodology is not the best choice for detecting errors at this level, however, as this type of error can be difficult to perceive. This paper presents an acoustic analysis of…

  15. Perceptual centres in speech - an acoustic analysis

    NASA Astrophysics Data System (ADS)

    Scott, Sophie Kerttu

    Perceptual centres, or P-centres, represent the perceptual moments of occurrence of acoustic signals - the 'beat' of a sound. P-centres underlie the perception and production of rhythm in perceptually regular speech sequences. P-centres have been modelled in both speech and non-speech (music) domains. The three aims of this thesis were (a) to test current P-centre models to determine which best accounted for the experimental data, (b) to identify a candidate parameter onto which P-centres could be mapped (a local approach), as opposed to the previous global models which rely upon the whole signal to determine the P-centre, and (c) to develop a model of P-centre location which could be applied to speech and non-speech signals. The first aim was investigated by a series of experiments examining (a) whether different models could account for variation between speakers, (b) whether rendering the amplitude-time plot of a speech signal affects the P-centre of the signal, and (c) whether increasing the amplitude at the offset of a speech signal alters P-centres in the production and perception of speech. The second aim was pursued by (a) manipulating the rise time of different speech signals to determine whether the P-centre was affected and whether the type of speech sound ramped affected the P-centre shift, (b) manipulating the rise time and decay time of a synthetic vowel to determine whether the onset alteration had more effect on the P-centre than the offset manipulation, and (c) testing whether the duration of a vowel affected the P-centre when other attributes (amplitude, spectral content) were held constant. The third aim, modelling P-centres, was based on these results. The Frequency-dependent Amplitude Increase Model of P-centre location (FAIM) was developed using a modelling protocol, the APU GammaTone Filterbank, and the speech from different speakers. The P-centres of the stimulus corpus were highly predicted by attributes of

  16. Acoustic Analysis of PD Speech

    PubMed Central

    Chenausky, Karen; MacAuslan, Joel; Goldhor, Richard

    2011-01-01

    According to the U.S. National Institutes of Health, approximately 500,000 Americans have Parkinson's disease (PD), with roughly another 50,000 receiving new diagnoses each year. 70%–90% of these people also have the hypokinetic dysarthria associated with PD. Deep brain stimulation (DBS) substantially relieves motor symptoms in advanced-stage patients for whom medication produces disabling dyskinesias. This study investigated speech changes as a result of DBS settings chosen to maximize motor performance. The speech of 10 PD patients and 12 normal controls was analyzed for syllable rate and variability, syllable length patterning, vowel fraction, voice-onset time variability, and spirantization. These were normalized by the controls' standard deviation to represent distance from normal and combined into a composite measure. Results show that DBS settings relieving motor symptoms can improve speech, making it up to three standard deviations closer to normal. However, the clinically motivated settings evaluated here show greater capacity to impair, rather than improve, speech. A feedback device developed from these findings could be useful to clinicians adjusting DBS parameters, as a means for ensuring they do not unwittingly choose DBS settings which impair patients' communication. PMID:21977333
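
    The abstract states that each speech measure was normalized by the controls' standard deviation to represent distance from normal and then combined into a composite. A minimal sketch of that kind of normalization is given below; averaging absolute z-scores is used here as the combination rule and is an assumption, since the abstract does not specify one.

        # Sketch: express each measure in control-group standard deviations from the
        # control mean, then average the absolute distances into one composite score.
        import numpy as np

        def composite_distance(patient_measures, control_measures):
            """patient_measures: (n_measures,); control_measures: (n_controls, n_measures)."""
            mu = control_measures.mean(axis=0)
            sd = control_measures.std(axis=0, ddof=1)
            z = (patient_measures - mu) / sd
            return float(np.abs(z).mean())       # combination rule assumed, not from the paper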

  17. Gender difference in speech intelligibility using speech intelligibility tests and acoustic analyses

    PubMed Central

    2010-01-01

    PURPOSE The purpose of this study was to compare men with women in terms of speech intelligibility, to investigate the validity of objective acoustic parameters related to speech intelligibility, and to establish standard data for future studies in various fields of prosthodontics. MATERIALS AND METHODS Twenty men and women served as subjects in the present study. After recording of sample sounds, speech intelligibility tests by three speech pathologists and acoustic analyses were performed. Comparisons of the speech intelligibility test scores and acoustic parameters such as fundamental frequency, fundamental frequency range, formant frequency, formant ranges, vowel working space area, and vowel dispersion were made between men and women. In addition, the correlations between the speech intelligibility values and acoustic variables were analyzed. RESULTS Women showed significantly higher speech intelligibility scores than men, and there were significant differences between men and women in most of the acoustic parameters used in the present study. However, the correlations between the speech intelligibility scores and acoustic parameters were low. CONCLUSION The speech intelligibility test and acoustic parameters used in the present study were effective in differentiating male voice from female voice, and their values might be used in future studies of patients involved with maxillofacial prosthodontics. However, further studies are needed on the correlation between speech intelligibility tests and objective acoustic parameters. PMID:21165272
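
    Among the parameters listed above is vowel working space area. A common way to compute such an area, not necessarily the exact procedure of this study, is as the area of the polygon spanned by corner-vowel (F1, F2) points, via the shoelace formula:

        # Vowel space area from corner-vowel (F1, F2) points using the shoelace formula.
        import numpy as np

        def vowel_space_area(formants):
            """formants: (F1, F2) pairs in Hz, ordered around the polygon, e.g. /i/, /a/, /u/."""
            pts = np.asarray(formants, dtype=float)
            x, y = pts[:, 0], pts[:, 1]
            return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

        # e.g. vowel_space_area([(270, 2290), (730, 1090), (300, 870)])  # rough /i/, /a/, /u/ values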

  18. Acoustic assessment of erygmophonic speech of Moroccan laryngectomized patients

    PubMed Central

    Ouattassi, Naouar; Benmansour, Najib; Ridal, Mohammed; Zaki, Zouheir; Bendahhou, Karima; Nejjari, Chakib; Cherkaoui, Abdeljabbar; El Alami, Mohammed Nouredine El Amine

    2015-01-01

    Introduction Acoustic evaluation of alaryngeal voices is among the most prominent issues in the speech analysis field. In fact, many methods have been developed to date to substitute for classic perceptual evaluation. The aim of this study is to present our experience with objective assessment of erygmophonic speech and to discuss the most widely used methods of acoustic speech appraisal. Through a prospective case-control study, we measured acoustic parameters of speech quality during one year of erygmophonic rehabilitation therapy of Moroccan laryngectomized patients. Methods We assessed acoustic parameters of erygmophonic speech samples from eleven laryngectomized patients throughout their speech rehabilitation therapy. Acoustic parameters were obtained by the perturbation analysis method and linear predictive coding algorithms, as well as from the broadband spectrogram. Results Using perturbation analysis methods, we found erygmophonic voice to be significantly poorer than normal speech, with higher formant frequency values. However, erygmophonic voice also shows higher and extremely variable error values that were greater than the acceptable level, which casts doubt on the reliability of the results of those analytic methods. Conclusion Acoustic parameters for objective evaluation of alaryngeal voices should allow a reliable representation of the perceptual evaluation of speech quality. This requirement has not been fulfilled by the common methods used so far. Therefore, acoustical assessment of erygmophonic speech needs further investigation. PMID:26587121

  19. Acoustic Study of Acted Emotions in Speech

    NASA Astrophysics Data System (ADS)

    Wang, Rong

    An extensive set of carefully recorded utterances provided a speech database for investigating acoustic correlates among eight emotional states. Four professional actors and four professional actresses simulated the emotional states of joy, conversation, nervousness, anger, sadness, hate, fear, and depression. The values of 14 acoustic parameters were extracted from analyses of the simulated portrayals. The parameters were normalized to reduce talker dependence. Correlates of emotion were investigated by means of principal components analysis. Sadness and depression were found to be "ambiguous" with respect to each other, but "unique" with respect to joy and anger in the principal components space. Joy, conversation, nervousness, anger, hate, and fear did not separate well in the space and so exhibited ambiguity with respect to one another. The different talkers expressed joy, anger, sadness, and depression more consistently than the other four emotions. The analysis results were compared with the results of a subjective study using the same speech database, and considerable consistency between the two was found.
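
    A minimal sketch of the analysis style described above, normalizing acoustic parameters to reduce talker dependence and projecting them onto principal components, is shown below. It assumes a simple z-score normalization and an SVD-based PCA; the study's exact procedure may differ.

        # Z-score each acoustic parameter, then project utterances onto the first
        # principal components to inspect how emotion categories separate.
        import numpy as np

        def pca_project(X, n_components=2):
            """X: (n_utterances, n_parameters) matrix of acoustic measurements."""
            Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
            U, s, Vt = np.linalg.svd(Z, full_matrices=False)
            return Z @ Vt[:n_components].T        # scores in principal-component space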

  20. Acoustics in Halls for Speech and Music

    NASA Astrophysics Data System (ADS)

    Gade, Anders C.

    This chapter deals specifically with concepts, tools, and architectural variables of importance when designing auditoria for speech and music. The focus will be on cultivating the useful components of the sound in the room rather than on avoiding noise from outside or from installations, which is dealt with in Chap. 11. The chapter starts by presenting the subjective aspects of the room acoustic experience according to consensus at the time of writing. Then follows a description of their objective counterparts, the objective room acoustic parameters, among which the classical reverberation time measure is only one of many, but still of fundamental value. After explanations on how these parameters can be measured and predicted during the design phase, the remainder of the chapter deals with how the acoustic properties can be controlled by the architectural design of auditoria. This is done by presenting the influence of individual design elements as well as brief descriptions of halls designed for specific purposes, such as drama, opera, and symphonic concerts. Finally, some important aspects of loudspeaker installations in auditoria are briefly touched upon.

  1. School cafeteria noise-The impact of room acoustics and speech intelligibility on children's voice levels

    NASA Astrophysics Data System (ADS)

    Bridger, Joseph F.

    2002-05-01

    The impact of room acoustics and speech intelligibility conditions of different school cafeterias on the voice levels of children is examined. Methods of evaluating cafeteria designs and predicting noise levels are discussed. Children are shown to modify their voice levels with changes in speech intelligibility as adults do. Reverberation and signal-to-noise ratio are the important acoustical factors affecting speech intelligibility. Children have much more difficulty than adults in conditions where noise and reverberation are present. To evaluate the relationship of voice level and speech intelligibility, a database of real sound levels and room acoustics data was generated from measurements and data recorded during visits to a variety of existing cafeterias under different occupancy conditions. The effects of speech intelligibility and room acoustics on children's voice levels are demonstrated. A new method is presented for predicting speech intelligibility conditions and resulting noise levels for the design of new cafeterias and renovation of existing facilities. Measurements are provided for an existing school cafeteria before and after new room acoustics treatments were added. This will be helpful for acousticians, architects, school systems, regulatory agencies, and Parent Teacher Associations to create less noisy cafeteria environments.

  2. Acoustic markers of prosodic boundaries in Spanish spontaneous alaryngeal speech.

    PubMed

    Cuenca, M H; Barrio, M M

    2010-11-01

    Prosodic information aids segmentation of the continuous speech signal and thereby facilitates auditory speech processing. Durational and pitch variations are prosodic cues especially necessary to convey prosodic boundaries, but alaryngeal speakers have inconsistent control over acoustic parameters such as F0 and duration, with the result that their speech is noisier and less intelligible than normal speech. This case study investigated whether one Spanish alaryngeal speaker proficient in both oesophageal and tracheoesophageal speech modes used the same acoustic cues for prosodic boundaries in both types of voicing. Pre-boundary lengthening, F0 excursions and pausing (number of pauses and position) were measured in spontaneous speech samples, using Praat. The acoustic analysis revealed that the subject relied on a different combination of cues in each type of voicing to convey the presence of prosodic boundaries.

  3. Age-Related Changes in Acoustic Characteristics of Adult Speech

    ERIC Educational Resources Information Center

    Torre, Peter, III; Barlow, Jessica A.

    2009-01-01

    This paper addresses effects of age and sex on certain acoustic properties of speech, given conflicting findings on such effects reported in prior research. The speech of 27 younger adults (15 women, 12 men; mean age 25.5 years) and 59 older adults (32 women, 27 men; mean age 75.2 years) was evaluated for identification of differences for sex and…

  4. Acoustic Markers of Prosodic Boundaries in Spanish Spontaneous Alaryngeal Speech

    ERIC Educational Resources Information Center

    Cuenca, M. H.; Barrio, M. M.

    2010-01-01

    Prosodic information aids segmentation of the continuous speech signal and thereby facilitates auditory speech processing. Durational and pitch variations are prosodic cues especially necessary to convey prosodic boundaries, but alaryngeal speakers have inconsistent control over acoustic parameters such as F0 and duration, being as a result noisy…

  5. Clear Speech Variants: An Acoustic Study in Parkinson's Disease

    ERIC Educational Resources Information Center

    Lam, Jennifer; Tjaden, Kris

    2016-01-01

    Purpose: The authors investigated how different variants of clear speech affect segmental and suprasegmental acoustic measures of speech in speakers with Parkinson's disease and a healthy control group. Method: A total of 14 participants with Parkinson's disease and 14 control participants served as speakers. Each speaker produced 18 different…

  6. Detecting suspicious behaviour using speech: acoustic correlates of deceptive speech -- an exploratory investigation.

    PubMed

    Kirchhübel, Christin; Howard, David M

    2013-09-01

    The current work intended to enhance our knowledge of changes or lack of changes in the speech signal when people were being deceptive. In particular, the study attempted to investigate the appropriateness of using speech cues in detecting deception. Truthful, deceptive and control speech were elicited from ten speakers in an interview setting. The data were subjected to acoustic analysis and results are presented on a range of speech parameters including fundamental frequency (f0), overall amplitude and mean vowel formants F1, F2 and F3. A significant correlation could not be established between deceptiveness/truthfulness and any of the acoustic features examined. Directions for future work are highlighted.

  7. Acoustic richness modulates the neural networks supporting intelligible speech processing.

    PubMed

    Lee, Yune-Sang; Min, Nam Eun; Wingfield, Arthur; Grossman, Murray; Peelle, Jonathan E

    2016-03-01

    The information contained in a sensory signal plays a critical role in determining what neural processes are engaged. Here we used interleaved silent steady-state (ISSS) functional magnetic resonance imaging (fMRI) to explore how human listeners cope with different degrees of acoustic richness during auditory sentence comprehension. Twenty-six healthy young adults underwent scanning while hearing sentences that varied in acoustic richness (high vs. low spectral detail) and syntactic complexity (subject-relative vs. object-relative center-embedded clause structures). We manipulated acoustic richness by presenting the stimuli as unprocessed full-spectrum speech, or noise-vocoded with 24 channels. Importantly, although the vocoded sentences were spectrally impoverished, all sentences were highly intelligible. These manipulations allowed us to test how intelligible speech processing was affected by orthogonal linguistic and acoustic demands. Acoustically rich speech showed stronger activation than acoustically less-detailed speech in a bilateral temporoparietal network with more pronounced activity in the right hemisphere. By contrast, listening to sentences with greater syntactic complexity resulted in increased activation of a left-lateralized network including left posterior lateral temporal cortex, left inferior frontal gyrus, and left dorsolateral prefrontal cortex. Significant interactions between acoustic richness and syntactic complexity occurred in left supramarginal gyrus, right superior temporal gyrus, and right inferior frontal gyrus, indicating that the regions recruited for syntactic challenge differed as a function of acoustic properties of the speech. Our findings suggest that the neural systems involved in speech perception are finely tuned to the type of information available, and that reducing the richness of the acoustic signal dramatically alters the brain's response to spoken language, even when intelligibility is high.
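
    The stimuli above were either full-spectrum speech or noise-vocoded speech with 24 channels. A minimal noise-vocoder sketch under common assumptions (log-spaced band-pass filter bank, Hilbert-envelope extraction, envelope-modulated noise carriers) is given below; it is a textbook-style construction, not the authors' processing chain.

        # Minimal noise vocoder: filter speech into bands, extract each band's envelope,
        # and use the envelope to modulate band-limited noise in the same band.
        # Assumes a sampling rate above 16 kHz for the default band edges.
        import numpy as np
        from scipy.signal import butter, sosfiltfilt, hilbert

        def noise_vocode(x, fs, n_channels=24, lo=100.0, hi=8000.0):
            edges = np.geomspace(lo, hi, n_channels + 1)       # log-spaced band edges (assumed)
            out = np.zeros(len(x))
            rng = np.random.default_rng(0)
            for i in range(n_channels):
                sos = butter(4, [edges[i], edges[i + 1]], btype="bandpass", fs=fs, output="sos")
                band = sosfiltfilt(sos, x)
                env = np.abs(hilbert(band))                    # amplitude envelope of the band
                carrier = sosfiltfilt(sos, rng.standard_normal(len(x)))
                out += env * carrier
            return out / (np.max(np.abs(out)) + 1e-12)         # rough level normalization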

  8. Evaluation of disfluent speech by means of automatic acoustic measurements.

    PubMed

    Lustyk, Tomas; Bergl, Petr; Cmejla, Roman

    2014-03-01

    An experiment was carried out to determine whether the level of the speech fluency disorder can be estimated by means of automatic acoustic measurements. These measures analyze, for example, the amount of silence in a recording or the number of abrupt spectral changes in a speech signal. All the measures were designed to take into account symptoms of stuttering. In the experiment, 118 audio recordings of read speech by Czech native speakers were employed. The results indicate that the human-made rating of the speech fluency disorder in read speech can be predicted on the basis of automatic measurements. The number of abrupt spectral changes in the speech segments turns out to be the most appropriate measure to describe the overall speech performance. The results also imply that there are measures with good results describing partial symptoms (especially fixed postures without audible airflow).
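
    One of the measures mentioned above, the number of abrupt spectral changes, can be illustrated with a spectral-flux style count: frame the signal, compute magnitude spectra, and count frames whose spectral difference from the previous frame exceeds a threshold. This is an illustration of the general idea; the paper's exact measure may differ.

        # Count frames whose frame-to-frame spectral difference (spectral flux), relative
        # to the mean flux, exceeds a threshold.
        import numpy as np

        def count_abrupt_spectral_changes(x, fs, frame_ms=25, hop_ms=10, threshold=2.0):
            frame, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
            win = np.hanning(frame)
            spectra = [np.abs(np.fft.rfft(x[s:s + frame] * win))
                       for s in range(0, len(x) - frame, hop)]
            S = np.array(spectra)
            flux = np.sqrt((np.diff(S, axis=0) ** 2).sum(axis=1))
            flux /= flux.mean() + 1e-12
            return int((flux > threshold).sum())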

  9. Study of acoustic correlates associate with emotional speech

    NASA Astrophysics Data System (ADS)

    Yildirim, Serdar; Lee, Sungbok; Lee, Chul Min; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Ebrahim; Narayanan, Shrikanth

    2004-10-01

    This study investigates the acoustic characteristics of four different emotions expressed in speech. The aim is to obtain detailed acoustic knowledge on how a speech signal is modulated by changes from neutral to a certain emotional state. Such knowledge is necessary for automatic emotion recognition and classification and for emotional speech synthesis. Speech data obtained from two semi-professional actresses are analyzed and compared. Each subject produces 211 sentences with four different emotions: neutral, sad, angry, and happy. We analyze changes in temporal and acoustic parameters such as magnitude and variability of segmental duration, fundamental frequency, and the first three formant frequencies as a function of emotion. Acoustic differences among the emotions are also explored with mutual information computation, multidimensional scaling, and acoustic likelihood comparison with normal speech. Results indicate that speech associated with anger and happiness is characterized by longer duration, shorter interword silence, and higher pitch and rms energy with wider ranges. Sadness is distinguished from other emotions by lower rms energy and longer interword silence. Interestingly, the difference in formant pattern between [happiness/anger] and [neutral/sadness] is better reflected in back vowels such as /a/ (as in "father") than in front vowels. Detailed results on intra- and interspeaker variability will be reported.

  10. Acoustic assessment of speech privacy curtains in two nursing units.

    PubMed

    Pope, Diana S; Miller-Klein, Erik T

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s' standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption and more compact, fragmented nursing unit floor plate shapes should be considered.

  11. Acoustic assessment of speech privacy curtains in two nursing units

    PubMed Central

    Pope, Diana S.; Miller-Klein, Erik T.

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s’ standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption and more compact, fragmented nursing unit floor plate shapes should be considered. PMID:26780959

  12. Effects of Computer-Based Intervention through Acoustically Modified Speech (Fast ForWord) in Severe Mixed Receptive-Expressive Language Impairment: Outcomes from a Randomized Controlled Trial

    ERIC Educational Resources Information Center

    Cohen, Wendy; Hodson, Ann; O'Hare, Anne; Boyle, James; Durrani, Tariq; McCartney, Elspeth; Mattey, Mike; Naftalin, Lionel; Watson, Jocelynne

    2005-01-01

    Seventy-seven children between the ages of 6 and 10 years, with severe mixed receptive-expressive specific language impairment (SLI), participated in a randomized controlled trial (RCT) of Fast ForWord (FFW; Scientific Learning Corporation, 1997, 2001). FFW is a computer-based intervention for treating SLI using acoustically enhanced speech…

  13. The acoustic-modeling problem in automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Brown, Peter F.

    1987-12-01

    This thesis examines the acoustic-modeling problem in automatic speech recognition from an information-theoretic point of view. This problem is to design a speech-recognition system which can extract from the speech waveform as much information as possible about the corresponding word sequence. The information extraction process is broken down into two steps: a signal processing step which converts a speech waveform into a sequence of information-bearing acoustic feature vectors, and a step which models such a sequence. This thesis is primarily concerned with the use of hidden Markov models to model sequences of feature vectors which lie in a continuous space such as R^N. It explores the trade-off between packing a lot of information into such sequences and being able to model them accurately. The difficulty of developing accurate models of continuous parameter sequences is addressed by investigating a method of parameter estimation which is specifically designed to cope with inaccurate modeling assumptions.
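
    The thesis above concerns hidden Markov models over continuous acoustic feature vectors. As a toy illustration of that model class (placeholder parameters, diagonal-covariance Gaussian emissions, log-domain forward pass), not the thesis's actual models:

        # Log-likelihood of a feature-vector sequence under a Gaussian-emission HMM.
        import numpy as np

        def log_gauss_diag(x, mean, var):
            return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var).sum(axis=-1)

        def forward_loglik(obs, log_pi, log_A, means, variances):
            """obs: (T, D); log_pi: (S,); log_A: (S, S); means, variances: (S, D)."""
            n_states = len(log_pi)
            log_b = np.stack([log_gauss_diag(obs, means[s], variances[s])
                              for s in range(n_states)], axis=1)          # (T, S)
            alpha = log_pi + log_b[0]
            for t in range(1, obs.shape[0]):
                alpha = log_b[t] + np.logaddexp.reduce(alpha[:, None] + log_A, axis=0)
            return float(np.logaddexp.reduce(alpha))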

  14. Methods and apparatus for non-acoustic speech characterization and recognition

    DOEpatents

    Holzrichter, John F.

    1999-01-01

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  15. Methods and apparatus for non-acoustic speech characterization and recognition

    SciTech Connect

    Holzrichter, J.F.

    1999-12-21

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  16. Relationship between acoustic measures and speech naturalness ratings in Parkinson's disease: A within-speaker approach.

    PubMed

    Klopfenstein, Marie

    2015-01-01

    This study investigated the acoustic basis of across-utterance, within-speaker variation in speech naturalness for four speakers with dysarthria secondary to Parkinson's disease (PD). Speakers read sentences and produced spontaneous speech. Acoustic measures of fundamental frequency, phrase-final syllable lengthening, intensity and speech rate were obtained. A group of listeners judged speech naturalness using a nine-point Likert scale. Relationships between judgements of speech naturalness and acoustic measures were determined for individual speakers with PD. Relationships among acoustic measures also were quantified. Despite variability between speakers, measures of mean F0, intensity range, articulation rate, average syllable duration, duration of final syllables, vocalic nucleus length of final unstressed syllables and pitch accent of final syllables emerged as possible acoustic variables contributing to within-speaker variations in speech naturalness. Results suggest that acoustic measures correlate with speech naturalness, but in dysarthric speech they depend on the speaker due to the within-speaker variation in speech impairment.

  17. Clear Speech Variants: An Acoustic Study in Parkinson's Disease

    PubMed Central

    Tjaden, Kris

    2016-01-01

    Purpose The authors investigated how different variants of clear speech affect segmental and suprasegmental acoustic measures of speech in speakers with Parkinson's disease and a healthy control group. Method A total of 14 participants with Parkinson's disease and 14 control participants served as speakers. Each speaker produced 18 different sentences selected from the Sentence Intelligibility Test (Yorkston & Beukelman, 1996). All speakers produced stimuli in 4 speaking conditions (habitual, clear, overenunciate, and hearing impaired). Segmental acoustic measures included vowel space area and first moment (M1) coefficient difference measures for consonant pairs. Second formant slope of diphthongs and measures of vowel and fricative durations were also obtained. Suprasegmental measures included fundamental frequency, sound pressure level, and articulation rate. Results For the majority of adjustments, all variants of clear speech instruction differed from the habitual condition. The overenunciate condition elicited the greatest magnitude of change for segmental measures (vowel space area, vowel durations) and the slowest articulation rates. The hearing impaired condition elicited the greatest fricative durations and suprasegmental adjustments (fundamental frequency, sound pressure level). Conclusions Findings have implications for a model of speech production for healthy speakers as well as for speakers with dysarthria. Findings also suggest that particular clear speech instructions may target distinct speech subsystems. PMID:27355431

  18. An Acoustic Measure for Word Prominence in Spontaneous Speech

    PubMed Central

    Wang, Dagen; Narayanan, Shrikanth

    2010-01-01

    An algorithm for automatic speech prominence detection is reported in this paper. We describe a comparative analysis on various acoustic features for word prominence detection and report results using a spoken dialog corpus with manually assigned prominence labels. The focus is on features such as spectral intensity and speech rate that are directly extracted from speech based on a correlation-based approach without requiring explicit linguistic or phonetic knowledge. Additionally, various pitch-based measures are studied with respect to their discriminating ability for prominence detection. A parametric scheme for modeling pitch plateau is proposed, and this feature alone is found to outperform the traditional local pitch statistics. Two sets of experiments are used to explore the usefulness of the acoustic score generated using these features. The first set focuses on a more traditional way of word prominence detection based on a manually-tagged corpus. A 76.8% classification accuracy was achieved on a corpus of role-playing spoken dialogs. Due to difficulties in manually tagging speech prominence into discrete levels (categories), the second set of experiments focuses on evaluating the score indirectly. Specifically, through experiments on the Switchboard corpus, it is shown that the proposed acoustic score can discriminate between content words and function words in a statistically significant way. The relation between speech prominence and content/function words is also explored. Since prominent words tend to be predominantly content words, and since content words can be automatically marked from text-derived part-of-speech (POS) information, it is shown that the proposed acoustic score can be indirectly cross-validated through POS information. PMID:20454538

  19. Multiexpert automatic speech recognition using acoustic and myoelectric signals.

    PubMed

    Chan, Adrian D C; Englehart, Kevin B; Hudgins, Bernard; Lovely, Dennis F

    2006-04-01

    Classification accuracy of conventional automatic speech recognition (ASR) systems can decrease dramatically under acoustically noisy conditions. To improve classification accuracy and increase system robustness, a multiexpert ASR system is implemented. In this system, acoustic speech information is supplemented with information from facial myoelectric signals (MES). A new method of combining experts, known as the plausibility method, is employed to combine an acoustic ASR expert and an MES ASR expert. The plausibility method of combining multiple experts, which is based on the mathematical framework of evidence theory, is compared to the Borda count and score-based methods of combination. Acoustic and facial MES data were collected from 5 subjects, using a 10-word vocabulary across an 18-dB range of acoustic noise. As expected, the performance of the acoustic expert decreases with increasing acoustic noise; classification accuracies of the acoustic ASR expert are as low as 11.5%. The effect of noise is significantly reduced with the addition of the MES ASR expert. Classification accuracies remain above 78.8% across the 18-dB range of acoustic noise when the plausibility method is used to combine the opinions of multiple experts. In addition, the plausibility method produced classification accuracies higher than any individual expert at all noise levels, as well as the highest classification accuracies, except at the 9-dB noise level. Using the Borda count and score-based multiexpert systems, classification accuracies are improved relative to the acoustic ASR expert but are as low as 51.5% and 59.5%, respectively.
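
    Of the three combination schemes compared above, the Borda count is the simplest to state: each expert ranks the vocabulary words, ranks are converted to points, and the word with the highest summed points wins. The sketch below shows only that baseline (the evidence-theoretic plausibility method is not reproduced here), with illustrative names.

        # Borda-count combination of per-expert word scores (higher score = more likely).
        import numpy as np

        def borda_combine(expert_scores):
            """expert_scores: (n_experts, n_words) array; returns index of the winning word."""
            points = np.zeros(expert_scores.shape[1])
            for scores in expert_scores:
                points += scores.argsort().argsort()   # 0 points for lowest score, n-1 for highest
            return int(points.argmax())

        # e.g. borda_combine(np.array([[0.1, 0.7, 0.2],     # acoustic expert
        #                              [0.4, 0.3, 0.3]]))   # myoelectric expert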

  20. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2002-01-01

    Low power EM waves are used to detect motions of vocal tract tissues of the human speech system before, during, and after voiced speech. A voiced excitation function is derived. The excitation function provides speech production information to enhance speech characterization and to enable noise removal from human speech.

  1. Acoustic Characteristics of Ataxic Speech in Japanese Patients with Spinocerebellar Degeneration (SCD)

    ERIC Educational Resources Information Center

    Ikui, Yukiko; Tsukuda, Mamoru; Kuroiwa, Yoshiyuki; Koyano, Shigeru; Hirose, Hajime; Taguchi, Takahide

    2012-01-01

    Background: In English- and German-speaking countries, ataxic speech is often described as showing scanning based on acoustic impressions. Although the term "scanning" is generally considered to represent abnormal speech features including prosodic excess or insufficiency, any precise acoustic analysis of ataxic speech has not been…

  2. Adaptation to Room Acoustics Using the Modified Rhyme Test

    PubMed Central

    Brandewie, Eugene; Zahorik, Pavel

    2012-01-01

    The negative effect of reverberant sound energy on speech intelligibility is well documented. Recently, however, prior exposure to room acoustics has been shown to increase intelligibility for a number of listeners in simulated room environments. This room adaptation effect, a possible extension of dynamic echo suppression, has been shown to be specific to reverberant rooms and requires binaural input. Because this effect has been demonstrated only using the Coordinated Response Measure (CRM) corpus, it is important to determine whether the increase in intelligibility scores reported previously was due to the specific nature of the CRM task. Here we demonstrate a comparable room-acoustic effect using the Modified Rhyme Test (MRT) corpus in multiple room environments. The results are consistent with the idea that the room adaptation effect may be a natural phenomenon of listening in reverberant environments. PMID:23437415

  3. Speech intelligibility in complex acoustic environments in young children

    NASA Astrophysics Data System (ADS)

    Litovsky, Ruth

    2003-04-01

    While the auditory system undergoes tremendous maturation during the first few years of life, it has become clear that in complex scenarios, when multiple sounds occur and when echoes are present, children's performance is significantly worse than that of their adult counterparts. The ability of children (3-7 years of age) to understand speech in a simulated multi-talker environment and to benefit from spatial separation of the target and competing sounds was investigated. In these studies, competing sources vary in number, location, and content (speech, modulated or unmodulated speech-shaped noise and time-reversed speech). The acoustic spaces were also varied in size and amount of reverberation. Finally, children with chronic otitis media who received binaural training were tested pre- and post-training on a subset of conditions. Results indicated the following. (1) Children experienced significantly more masking than adults, even in the simplest conditions tested. (2) When the target and competing sounds were spatially separated, speech intelligibility improved, but the amount varied with age, type of competing sound, and number of competitors. (3) In a large reverberant classroom there was no benefit of spatial separation. (4) Binaural training improved speech intelligibility performance in children with otitis media. Future work includes similar studies in children with unilateral and bilateral cochlear implants. [Work supported by NIDCD, DRF, and NOHR.]

  4. Effects of age, acoustic challenge, and verbal working memory on recall of narrative speech

    PubMed Central

    Ward, Caitlin M.; Rogers, Chad S.; Van Engen, Kristin J.; Peelle, Jonathan E.

    2016-01-01

    Background A common goal during speech comprehension is to remember what we have heard. Encoding speech into long-term memory frequently requires processes such as verbal working memory that may also be involved in processing degraded speech. Here we tested whether young and older adult listeners’ memory for short stories was worse when the stories were acoustically degraded, or whether the additional contextual support provided by a narrative would protect against these effects. Methods We tested 30 young adults (aged 18–28 years) and 30 older adults (aged 65–79 years) with good self-reported hearing. Participants heard short stories that were presented as normal (unprocessed) speech, or acoustically degraded using a noise vocoding algorithm with 24 or 16 channels. The degraded stories were still fully intelligible. Following each story, participants were asked to repeat the story in as much detail as possible. Recall was scored using a modified idea unit scoring approach, which included separately scoring hierarchical levels of narrative detail. Results Memory for acoustically degraded stories was significantly worse than for normal stories at some levels of narrative detail. Older adults’ memory for the stories was significantly worse overall, but there was no interaction between age and acoustic clarity or level of narrative detail. Verbal working memory (assessed by reading span) significantly correlated with recall accuracy for both young and older adults, whereas hearing ability (better ear pure-tone average) did not. Conclusion Our findings are consistent with a framework in which the additional cognitive demands caused by a degraded acoustic signal use resources that would otherwise be available for memory encoding for both young and older adults. Verbal working memory is a likely candidate for supporting both of these processes. PMID:26683044

  5. Automatic speech segmentation using throat-acoustic correlation coefficients

    NASA Astrophysics Data System (ADS)

    Mussabayev, Rustam Rafikovich; Kalimoldayev, Maksat N.; Amirgaliyev, Yedilkhan N.; Mussabayev, Timur R.

    2016-11-01

    This work considers one approach to the task of automatic segmentation of a discrete speech signal. The aim is to construct an algorithm that meets the following requirements: segmentation of a signal into acoustically homogeneous segments, high accuracy and segmentation speed, unambiguity and reproducibility of segmentation results, and no need for preliminary training on a special set of manually segmented signals. Development of an algorithm meeting these requirements was motivated by the need to build large automatically segmented speech databases. One of the new approaches to this task is presented in this article. For this purpose we use a new type of informative feature, the TAC-coefficients (Throat-Acoustic Correlation coefficients), which provide sufficient segmentation accuracy and efficiency.
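
    The abstract does not spell out how TAC-coefficients are computed, so the sketch below is only a plausible illustration of the underlying ingredient: a frame-wise correlation profile between simultaneously recorded throat and acoustic channels, on which acoustically homogeneous segments could then be delimited.

        # Frame-wise Pearson correlation between a throat-microphone signal and an
        # acoustic-microphone signal (illustrative; not the published TAC definition).
        import numpy as np

        def framewise_correlation(throat, acoustic, fs, frame_ms=20, hop_ms=10):
            frame, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
            n = min(len(throat), len(acoustic))
            coeffs = []
            for start in range(0, n - frame, hop):
                a = throat[start:start + frame]
                b = acoustic[start:start + frame]
                if a.std() == 0 or b.std() == 0:
                    coeffs.append(0.0)             # avoid undefined correlation on silent frames
                else:
                    coeffs.append(float(np.corrcoef(a, b)[0, 1]))
            return np.array(coeffs)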

  6. Tongue-Palate Contact Pressure, Oral Air Pressure, and Acoustics of Clear Speech

    ERIC Educational Resources Information Center

    Searl, Jeff; Evitts, Paul M.

    2013-01-01

    Purpose: The authors compared articulatory contact pressure (ACP), oral air pressure (Po), and speech acoustics for conversational versus clear speech. They also assessed the relationship of these measures to listener perception. Method: Twelve adults with normal speech produced monosyllables in a phrase using conversational and clear speech.…

  7. Adding articulatory features to acoustic features for automatic speech recognition

    SciTech Connect

    Zlokarnik, I.

    1995-05-01

    A hidden-Markov-model (HMM) based speech recognition system was evaluated that makes use of simultaneously recorded acoustic and articulatory data. The articulatory measurements were gathered by means of electromagnetic articulography and describe the movement of small coils fixed to the speakers' tongue and jaw during the production of German V1CV2 sequences [P. Hoole and S. Gfoerer, J. Acoust. Soc. Am. Suppl. 1 87, S123 (1990)]. Using the coordinates of the coil positions as an articulatory representation, acoustic and articulatory features were combined to make up an acoustic-articulatory feature vector. The discriminant power of this combined representation was evaluated for two subjects on a speaker-dependent isolated word recognition task. When the articulatory measurements were used both for training and testing the HMMs, the articulatory representation was capable of reducing the error rate of comparable acoustic-based HMMs by a relative percentage of more than 60%. In a separate experiment, the articulatory movements during the testing phase were estimated using a multilayer perceptron that performed an acoustic-to-articulatory mapping. Under these more realistic conditions, when articulatory measurements are only available during the training, the error rate could be reduced by a relative percentage of 18% to 25%.
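
    A minimal sketch of the core idea, combining the acoustic and articulatory streams into one observation vector per frame before HMM training; MFCCs and raw coil coordinates stand in for the paper's feature set, and the per-stream normalization is an added assumption.

```python
# Sketch of building one acoustic-articulatory observation vector per frame before
# HMM training (assumed features: MFCCs and EMA coil x/y coordinates; normalization
# is an added assumption, not taken from the paper).
import numpy as np

def combine_features(mfcc, coil_xy):
    """mfcc: (n_frames, n_ceps) acoustic features; coil_xy: (n_frames, 2 * n_coils)
    articulatory coordinates. Returns the concatenated feature matrix."""
    assert len(mfcc) == len(coil_xy), "streams must be frame-aligned"
    def zscore(m):
        return (m - m.mean(axis=0)) / (m.std(axis=0) + 1e-12)
    # z-score each stream so neither dominates the Gaussian observation densities
    return np.hstack([zscore(mfcc), zscore(coil_xy)])
```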

  8. Effects and modeling of phonetic and acoustic confusions in accented speech

    NASA Astrophysics Data System (ADS)

    Fung, Pascale; Liu, Yi

    2005-11-01

    Accented speech recognition is more challenging than standard speech recognition due to the effects of phonetic and acoustic confusions. Phonetic confusion in accented speech occurs when an expected phone is pronounced as a different one, which leads to erroneous recognition. Acoustic confusion occurs when the pronounced phone is found to lie acoustically between two baseform models and can be equally recognized as either one. We propose that it is necessary to analyze and model these confusions separately in order to improve accented speech recognition without degrading standard speech recognition. Since low phonetic confusion units in accented speech do not give rise to automatic speech recognition errors, we focus on analyzing and reducing phonetic and acoustic confusability under high phonetic confusion conditions. We propose using a likelihood ratio test to measure phonetic confusion and an asymmetric acoustic distance to measure acoustic confusion. Only accent-specific phonetic units with low acoustic confusion are used in an augmented pronunciation dictionary, while phonetic units with high acoustic confusion are reconstructed using decision tree merging. Experimental results show that our approach is effective and superior to methods modeling phonetic confusion or acoustic confusion alone in accented speech, with a significant 5.7% absolute WER reduction, without degrading standard speech recognition.

  9. Acoustic Detail But Not Predictability of Task-Irrelevant Speech Disrupts Working Memory

    PubMed Central

    Wöstmann, Malte; Obleser, Jonas

    2016-01-01

    Attended speech is comprehended better not only if more acoustic detail is available, but also if it is semantically highly predictable. But can more acoustic detail or higher predictability turn into disadvantages and distract a listener if the speech signal is to be ignored? Also, does the degree of distraction increase for older listeners who typically show a decline in attentional control ability? Adopting the irrelevant-speech paradigm, we tested whether younger (age 23–33 years) and older (60–78 years) listeners’ working memory for the serial order of spoken digits would be disrupted by the presentation of task-irrelevant speech varying in its acoustic detail (using noise-vocoding) and its semantic predictability (of sentence endings). More acoustic detail, but not higher predictability, of task-irrelevant speech aggravated memory interference. This pattern of results did not differ between younger and older listeners, despite generally lower performance in older listeners. Our findings suggest that the focus of attention determines how acoustics and predictability affect the processing of speech: first, as more acoustic detail is known to enhance speech comprehension and memory for speech, we here demonstrate that more acoustic detail of ignored speech enhances the degree of distraction. Second, while higher predictability of attended speech is known to also enhance speech comprehension under acoustically adverse conditions, higher predictability of ignored speech is unable to exert any distracting effect upon working memory performance in younger or older listeners. These findings suggest that features that make attended speech easier to comprehend do not necessarily enhance distraction by ignored speech. PMID:27826235

  10. Influences of noise-interruption and information-bearing acoustic changes on understanding simulated electric-acoustic speech.

    PubMed

    Stilp, Christian; Donaldson, Gail; Oh, Soohee; Kong, Ying-Yee

    2016-11-01

    In simulations of electrical-acoustic stimulation (EAS), vocoded speech intelligibility is aided by preservation of low-frequency acoustic cues. However, the speech signal is often interrupted in everyday listening conditions, and effects of interruption on hybrid speech intelligibility are poorly understood. Additionally, listeners rely on information-bearing acoustic changes to understand full-spectrum speech (as measured by cochlea-scaled entropy [CSE]) and vocoded speech (CSECI), but how listeners utilize these informational changes to understand EAS speech is unclear. Here, normal-hearing participants heard noise-vocoded sentences with three to six spectral channels in two conditions: vocoder-only (80-8000 Hz) and simulated hybrid EAS (vocoded above 500 Hz; original acoustic signal below 500 Hz). In each sentence, four 80-ms intervals containing high-CSECI or low-CSECI acoustic changes were replaced with speech-shaped noise. As expected, performance improved with the preservation of low-frequency fine-structure cues (EAS). This improvement decreased for continuous EAS sentences as more spectral channels were added, but increased as more channels were added to noise-interrupted EAS sentences. Performance was impaired more when high-CSECI intervals were replaced by noise than when low-CSECI intervals were replaced, but this pattern did not differ across listening modes. Utilizing information-bearing acoustic changes to understand speech is predicted to generalize to cochlear implant users who receive EAS inputs.
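
    A minimal sketch of the simulated hybrid EAS condition described above: the original signal is low-pass filtered at the 500 Hz crossover and summed with a noise-vocoded version of the region above 500 Hz. Filter orders, channel spacing, and envelope extraction are assumptions; the study's exact processing chain is not reproduced.

```python
# Sketch of the simulated hybrid EAS condition: unprocessed acoustics below the
# 500 Hz crossover plus noise-vocoded speech above it. Filter orders and channel
# spacing are assumptions; the study's exact processing chain is not reproduced.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def simulate_eas(x, fs, crossover=500.0, n_channels=4, f_hi=8000.0):
    low_sos = butter(4, crossover, btype="lowpass", fs=fs, output="sos")
    acoustic_part = sosfiltfilt(low_sos, x)                # original signal below 500 Hz
    edges = np.logspace(np.log10(crossover), np.log10(f_hi), n_channels + 1)
    noise = np.random.randn(len(x))
    vocoded_part = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        envelope = np.abs(hilbert(sosfiltfilt(sos, x)))    # channel envelope
        vocoded_part += envelope * sosfiltfilt(sos, noise) # envelope on noise carrier
    return acoustic_part + vocoded_part
```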

  11. Talker Differences in Clear and Conversational Speech: Acoustic Characteristics of Vowels

    ERIC Educational Resources Information Center

    Ferguson, Sarah Hargus; Kewley-Port, Diane

    2007-01-01

    Purpose: To determine the specific acoustic changes that underlie improved vowel intelligibility in clear speech. Method: Seven acoustic metrics were measured for conversational and clear vowels produced by 12 talkers--6 who previously were found (S. H. Ferguson, 2004) to produce a large clear speech vowel intelligibility effect for listeners with…

  12. Denoising of human speech using combined acoustic and em sensor signal processing

    SciTech Connect

    Ng, L C; Burnett, G C; Holzrichter, J F; Gable, T J

    1999-11-29

    Low-power EM radar-like sensors have made it possible to measure properties of the human speech production system in real time, without acoustic interference. This greatly enhances the quality and quantity of information for many speech-related applications. See Holzrichter, Burnett, Ng, and Lea, J. Acoust. Soc. Am. 103 (1), 622 (1998). By using combined glottal-EM-sensor and acoustic signals, segments of voiced speech, unvoiced speech, and non-speech can be reliably defined. Real-time denoising filters can then be constructed to remove noise from the user's speech signal.

  13. Acoustic Analysis of the Voiced-Voiceless Distinction in Dutch Tracheoesophageal Speech

    ERIC Educational Resources Information Center

    Jongmans, Petra; Wempe, Ton G.; van Tinteren, Harm; Hilgers, Frans J. M.; Pols, Louis C. W.; van As-Brooks, Corina J.

    2010-01-01

    Purpose: Confusions between voiced and voiceless plosives and voiced and voiceless fricatives are common in Dutch tracheoesophageal (TE) speech. This study investigates (a) which acoustic measures are found to convey a correct voicing contrast in TE speech and (b) whether different measures are found in TE speech than in normal laryngeal (NL)…

  14. Perceptual and Acoustic Reliability Estimates for the Speech Disorders Classification System (SDCS)

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.

    2010-01-01

    A companion paper describes three extensions to a classification system for paediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). The SDCS uses perceptual and acoustic data reduction methods to obtain information on a speaker's speech, prosody, and voice. The present paper provides reliability estimates for…

  15. Acoustic Predictors of Intelligibility for Segmentally Interrupted Speech: Temporal Envelope, Voicing, and Duration

    ERIC Educational Resources Information Center

    Fogerty, Daniel

    2013-01-01

    Purpose: Temporal interruption limits the perception of speech to isolated temporal glimpses. An analysis was conducted to determine the acoustic parameter that best predicts speech recognition from temporal fragments that preserve different types of speech information--namely, consonants and vowels. Method: Young listeners with normal hearing…

  16. Teachers and Teaching: Speech Production Accommodations Due to Changes in the Acoustic Environment

    PubMed Central

    Hunter, Eric J.; Bottalico, Pasquale; Graetzer, Simone; Leishman, Timothy W.; Berardi, Mark L.; Eyring, Nathan G.; Jensen, Zachary R.; Rolins, Michael K.; Whiting, Jennifer K.

    2016-01-01

    School teachers have an elevated risk of voice problems due to the vocal demands in the workplace. This manuscript presents the results of three studies investigating teachers’ voice use at work. In the first study, 57 teachers were observed for 2 weeks (waking hours) to compare how they used their voice in the school environment and in non-school environments. In a second study, 45 participants performed a short vocal task in two different rooms: a variable acoustic room and an anechoic chamber. Subjects were taken back and forth between the two rooms. Each time they entered the variable acoustics room, the reverberation time and/or the background noise condition had been modified. In this latter study, subjects responded to questions about their vocal comfort and their perception of changes in the acoustic environment. In a third study, 20 untrained vocalists performed a simple vocal task in the following conditions: with and without background babble and with and without transparent plexiglass shields to increase the first reflection. Relationships were examined between [1] the results for the room acoustic parameters; [2] the subjects’ perception of the room; and [3] the recorded speech acoustics. Several differences between male and female subjects were found; some of those differences held for each room condition (at school vs. not at school, reverberation level, noise level, and early reflection). PMID:26949426

  17. Acoustic scattering by a modified Werner method

    PubMed

    Ravel; Trad

    2000-02-01

    A modified integral Werner method is used to calculate pressure scattered by an axisymmetric body immersed in a perfect and compressible fluid subject to a harmonic acoustic field. This integral representation is built as the sum of a potential of a simple layer and a potential of volume. It is equivalent to the exterior Helmholtz problem with Neumann boundary condition for all real wave numbers of the incident acoustic field. For elastic structure scattering problems, the modified Werner method is coupled with an elastodynamic integral formulation in order to account for the elastic contribution of the displacement field at the fluid/structure interface. The resulting system of integral equations is solved by the collocation method with a quadratic interpolation. The introduction of a weighting factor in the modified Werner method decreases the number of volume elements necessary for a good convergence of results. This approach becomes very competitive when it is compared with other integral methods that are valid for all wave numbers. A numerical comparison with an experiment on a tungsten carbide end-capped cylinder allows a glimpse of the interesting possibilities for using the coupling of the modified Werner method and the integral elastodynamic equation used in this research.

  18. Frequency overlap between electric and acoustic stimulation and speech-perception benefit in patients with combined electric and acoustic stimulation

    PubMed Central

    Zhang, Ting; Spahr, Anthony J.; Dorman, Michael F.

    2010-01-01

    Objectives Our aim was to assess, for patients with a cochlear implant in one ear and low-frequency acoustic hearing in the contralateral ear, whether reducing the overlap in frequencies conveyed in the acoustic signal and those analyzed by the cochlear implant speech processor would improve speech recognition. Design The recognition of monosyllabic words in quiet and sentences in noise was evaluated in three listening configurations: electric stimulation alone, acoustic stimulation alone, and combined electric and acoustic stimulation. The acoustic stimuli were either unfiltered or low-pass (LP) filtered at 250 Hz, 500 Hz, or 750 Hz. The electric stimuli were either unfiltered or high-pass (HP) filtered at 250 Hz, 500 Hz, or 750 Hz. In the combined conditions, the unfiltered acoustic signal was paired with the unfiltered electric signal, the 250 Hz LP acoustic signal with the 250 Hz HP electric signal, the 500 Hz LP acoustic signal with the 500 Hz HP electric signal, and the 750 Hz LP acoustic signal with the 750 Hz HP electric signal. Results For both acoustic and electric signals, performance increased as the bandwidth increased. The highest level of performance in the combined condition was observed in the unfiltered acoustic plus unfiltered electric condition. Conclusions Reducing the overlap in frequency representation between acoustic and electric stimulation does not increase speech understanding scores for patients who have residual hearing in the ear contralateral to the implant. We find that acoustic information below 250 Hz significantly improves performance for patients who combine electric and acoustic stimulation and accounts for the majority of the speech-perception benefit when acoustic stimulation is combined with electric stimulation. PMID:19915474

  19. Speech Compensation for Time-Scale-Modified Auditory Feedback

    ERIC Educational Resources Information Center

    Ogane, Rintaro; Honda, Masaaki

    2014-01-01

    Purpose: The purpose of this study was to examine speech compensation in response to time-scale-modified auditory feedback during the transition of the semivowel for a target utterance of /ija/. Method: Each utterance session consisted of 10 control trials in the normal feedback condition followed by 20 perturbed trials in the modified auditory…

  20. Location and acoustic scale cues in concurrent speech recognition

    PubMed Central

    Ives, D. Timothy; Vestergaard, Martin D.; Kistler, Doris J.; Patterson, Roy D.

    2010-01-01

    Location and acoustic scale cues have both been shown to have an effect on the recognition of speech in multi-speaker environments. This study examines the interaction of these variables. Subjects were presented with concurrent triplets of syllables from a target voice and a distracting voice, and asked to recognize a specific target syllable. The task was made more or less difficult by changing (a) the location of the distracting speaker, (b) the scale difference between the two speakers, and/or (c) the relative level of the two speakers. Scale differences were produced by changing the vocal tract length and glottal pulse rate during syllable synthesis: 32 acoustic scale differences were used. Location cues were produced by convolving head-related transfer functions with the stimulus. The angle between the target speaker and the distracter was 0°, 4°, 8°, 16°, or 32° on the 0° horizontal plane. The relative level of the target to the distracter was 0 or −6 dB. The results show that location and scale difference interact, and the interaction is greatest when one of these cues is small. Increasing either the acoustic scale or the angle between target and distracter speakers quickly elevates performance to ceiling levels. PMID:20550271

  1. Mandarin Speech Perception in Combined Electric and Acoustic Stimulation

    PubMed Central

    Li, Yongxin; Zhang, Guoping; Galvin, John J.; Fu, Qian-Jie

    2014-01-01

    For deaf individuals with residual low-frequency acoustic hearing, combined use of a cochlear implant (CI) and hearing aid (HA) typically provides better speech understanding than with either device alone. Because of coarse spectral resolution, CIs do not provide fundamental frequency (F0) information that contributes to understanding of tonal languages such as Mandarin Chinese. The HA can provide good representation of F0 and, depending on the range of aided acoustic hearing, first and second formant (F1 and F2) information. In this study, Mandarin tone, vowel, and consonant recognition in quiet and noise was measured in 12 adult Mandarin-speaking bimodal listeners with the CI-only and with the CI+HA. Tone recognition was significantly better with the CI+HA in noise, but not in quiet. Vowel recognition was significantly better with the CI+HA in quiet, but not in noise. There was no significant difference in consonant recognition between the CI-only and the CI+HA in quiet or in noise. There was a wide range in bimodal benefit, with improvements often greater than 20 percentage points in some tests and conditions. The bimodal benefit was compared to CI subjects’ HA-aided pure-tone average (PTA) thresholds between 250 and 2000 Hz; subjects were divided into two groups: “better” PTA (<50 dB HL) or “poorer” PTA (>50 dB HL). The bimodal benefit differed significantly between groups only for consonant recognition. The bimodal benefit for tone recognition in quiet was significantly correlated with CI experience, suggesting that bimodal CI users learn to better combine low-frequency spectro-temporal information from acoustic hearing with temporal envelope information from electric hearing. Given the small number of subjects in this study (n = 12), further research with Chinese bimodal listeners may provide more information regarding the contribution of acoustic and electric hearing to tonal language perception. PMID:25386962

  2. Acoustical Characteristics of Mastication Sounds: Application of Speech Analysis Techniques

    NASA Astrophysics Data System (ADS)

    Brochetti, Denise

    Food scientists have used acoustical methods to study characteristics of mastication sounds in relation to food texture. However, a model for analysis of the sounds has not been identified, and reliability of the methods has not been reported. Therefore, speech analysis techniques were applied to mastication sounds, and variation in measures of the sounds was examined. To meet these objectives, two experiments were conducted. In the first experiment, a digital sound spectrograph generated waveforms and wideband spectrograms of sounds by 3 adult subjects (1 male, 2 females) for initial chews of food samples differing in hardness and fracturability. Acoustical characteristics were described and compared. For all sounds, formants appeared in the spectrograms, and energy occurred across a 0 to 8000-Hz range of frequencies. Bursts characterized waveforms for peanut, almond, raw carrot, ginger snap, and hard candy. Duration and amplitude of the sounds varied with the subjects. In the second experiment, the spectrograph was used to measure the duration, amplitude, and formants of sounds for the initial 2 chews of cylindrical food samples (raw carrot, teething toast) differing in diameter (1.27, 1.90, 2.54 cm). Six adult subjects (3 males, 3 females) having normal occlusions and temporomandibular joints chewed the samples between the molar teeth and with the mouth open. Ten repetitions per subject were examined for each food sample. Analysis of estimates of variation indicated an inconsistent intrasubject variation in the acoustical measures. Food type and sample diameter also affected the estimates, indicating the variable nature of mastication. Generally, intrasubject variation was greater than intersubject variation. Analysis of ranks of the data indicated that the effect of sample diameter on the acoustical measures was inconsistent and depended on the subject and type of food. If inferences are to be made concerning food texture from acoustical measures of mastication

  3. Automatic speech recognition using articulatory features from subject-independent acoustic-to-articulatory inversion.

    PubMed

    Ghosh, Prasanta Kumar; Narayanan, Shrikanth

    2011-10-01

    An automatic speech recognition approach is presented which uses articulatory features estimated by a subject-independent acoustic-to-articulatory inversion. The inversion allows estimation of articulatory features from any talker's speech acoustics using only an exemplary subject's articulatory-to-acoustic map. Results are reported on a broad class phonetic classification experiment on speech from English talkers using data from three distinct English talkers as exemplars for inversion. Results indicate that the inclusion of the articulatory information improves classification accuracy but the improvement is more significant when the speaking style of the exemplar and the talker are matched compared to when they are mismatched.

  4. Speech enhancement using the modified phase-opponency model.

    PubMed

    Deshmukh, Om D; Espy-Wilson, Carol Y; Carney, Laurel H

    2007-06-01

    In this paper we present a model called the Modified Phase-Opponency (MPO) model for single-channel speech enhancement when the speech is corrupted by additive noise. The MPO model is based on the auditory PO model, proposed for detection of tones in noise. The PO model includes a physiologically realistic mechanism for processing the information in neural discharge times and exploits the frequency-dependent phase properties of the tuned filters in the auditory periphery by using a cross-auditory-nerve-fiber coincidence detection for extracting temporal cues. The MPO model alters the components of the PO model such that the basic functionality of the PO model is maintained but the properties of the model can be analyzed and modified independently. The MPO-based speech enhancement scheme does not need to estimate the noise characteristics nor does it assume that the noise satisfies any statistical model. The MPO technique leads to the lowest value of the LPC-based objective measures and the highest value of the perceptual evaluation of speech quality measure compared to other methods when the speech signals are corrupted by fluctuating noise. Combining the MPO speech enhancement technique with our aperiodicity, periodicity, and pitch detector further improves its performance.

  5. Acoustic properties of naturally produced clear speech at normal speaking rates

    NASA Astrophysics Data System (ADS)

    Krause, Jean C.; Braida, Louis D.

    2004-01-01

    Sentences spoken "clearly" are significantly more intelligible than those spoken "conversationally" for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.
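
    The two global-level properties named above can be illustrated with simple measurements, sketched below: the band energy of the long-term spectrum between 1000 and 3000 Hz, and a crude modulation index of the low-frequency intensity envelope. The analysis settings (Welch window, 10 Hz envelope cutoff) are assumptions rather than the study's procedure.

```python
# Sketch of the two global-level measures named above. The Welch settings and the
# 10 Hz envelope cutoff are assumptions; the study's analysis parameters are not given.
import numpy as np
from scipy.signal import welch, butter, sosfiltfilt

def clear_speech_metrics(x, fs):
    # Energy of the long-term spectrum in the 1000-3000 Hz region
    f, pxx = welch(x, fs=fs, nperseg=2048)
    band = (f >= 1000) & (f <= 3000)
    band_energy = np.trapz(pxx[band], f[band])
    # Crude modulation index of the low-frequency intensity envelope
    sos = butter(4, 10.0, btype="lowpass", fs=fs, output="sos")
    envelope = sosfiltfilt(sos, np.abs(x))
    mod_depth = (envelope.max() - envelope.min()) / (envelope.max() + envelope.min() + 1e-12)
    return band_energy, mod_depth
```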

  6. Effect of Reflective Practice on Student Recall of Acoustics for Speech Science

    ERIC Educational Resources Information Center

    Walden, Patrick R.; Bell-Berti, Fredericka

    2013-01-01

    Researchers have developed models of learning through experience; however, these models are rarely named as a conceptual frame for educational research in the sciences. This study examined the effect of reflective learning responses on student recall of speech acoustics concepts. Two groups of undergraduate students enrolled in a speech science…

  7. Formant Centralization Ratio: A Proposal for a New Acoustic Measure of Dysarthric Speech

    ERIC Educational Resources Information Center

    Sapir, Shimon; Ramig, Lorraine O.; Spielman, Jennifer L.; Fox, Cynthia

    2010-01-01

    Purpose: The vowel space area (VSA) has been used as an acoustic metric of dysarthric speech, but with varying degrees of success. In this study, the authors aimed to test an alternative metric to the VSA--the "formant centralization ratio" (FCR), which is hypothesized to more effectively differentiate dysarthric from healthy speech and register…
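
    The truncated abstract does not state the formula, but the formant centralization ratio is commonly computed from the corner vowels /i/, /a/, and /u/ as FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a). The sketch below uses that common formulation and hypothetical formant values, so treat it as illustrative rather than the authors' exact definition.

```python
# Sketch of the formant centralization ratio using the formulation common in the
# clinical literature; the truncated abstract does not state the formula, so this
# is illustrative, and the example formant values are hypothetical.
def formant_centralization_ratio(f1, f2):
    """f1, f2: dicts with first/second formant frequencies (Hz) for the corner
    vowels 'i', 'a', and 'u'. Larger values indicate more centralized vowels."""
    return (f2['u'] + f2['a'] + f1['i'] + f1['u']) / (f2['i'] + f1['a'])

# Hypothetical example (Hz); the ratio rises as the corner vowels centralize
fcr = formant_centralization_ratio(f1={'i': 300, 'a': 750, 'u': 350},
                                   f2={'i': 2300, 'a': 1300, 'u': 900})
```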

  8. The Use of Artificial Neural Networks to Estimate Speech Intelligibility from Acoustic Variables: A Preliminary Analysis.

    ERIC Educational Resources Information Center

    Metz, Dale Evan; And Others

    1992-01-01

    A preliminary scheme for estimating the speech intelligibility of hearing-impaired speakers from acoustic parameters, using a computerized artificial neural network to process mathematically the acoustic input variables, is outlined. Tests with 60 hearing-impaired speakers found the scheme to be highly accurate in identifying speakers separated by…

  9. Speech intelligibility and speech quality of modified loudspeaker announcements examined in a simulated aircraft cabin.

    PubMed

    Pennig, Sibylle; Quehl, Julia; Wittkowski, Martin

    2014-01-01

    Acoustic modifications of loudspeaker announcements were investigated in a simulated aircraft cabin to improve passengers' speech intelligibility and quality of communication in this specific setting. Four experiments with 278 participants in total were conducted in an acoustic laboratory using a standardised speech test and subjective rating scales. In experiments 1 and 2 the sound pressure level (SPL) of the announcements was varied (ranging from 70 to 85 dB(A)). Experiments 3 and 4 focused on frequency modification (octave bands) of the announcements. All studies used a background noise with the same SPL (74 dB(A)), but recorded at different seat positions in the aircraft cabin (front, rear). The results quantify speech intelligibility improvements with increasing signal-to-noise ratio and amplification of particular octave bands, especially the 2 kHz and the 4 kHz band. Thus, loudspeaker power in an aircraft cabin can be reduced by using appropriate filter settings in the loudspeaker system.

  10. A maximum likelihood approach to estimating articulator positions from speech acoustics

    SciTech Connect

    Hogden, J.

    1996-09-23

    This proposal presents an algorithm called maximum likelihood continuity mapping (MALCOM) which recovers the positions of the tongue, jaw, lips, and other speech articulators from measurements of the sound-pressure waveform of speech. MALCOM differs from other techniques for recovering articulator positions from speech in three critical respects: it does not require training on measured or modeled articulator positions, it does not rely on any particular model of sound propagation through the vocal tract, and it recovers a mapping from acoustics to articulator positions that is linearly, not topographically, related to the actual mapping from acoustics to articulation. The approach categorizes short-time windows of speech into a finite number of sound types, and assumes the probability of using any articulator position to produce a given sound type can be described by a parameterized probability density function. MALCOM then uses maximum likelihood estimation techniques to: (1) find the most likely smooth articulator path given a speech sample and a set of distribution functions (one distribution function for each sound type), and (2) change the parameters of the distribution functions to better account for the data. Using this technique improves the accuracy of articulator position estimates compared to continuity mapping -- the only other technique that learns the relationship between acoustics and articulation solely from acoustics. The technique has potential application to computer speech recognition, speech synthesis and coding, teaching the hearing impaired to speak, improving foreign language instruction, and teaching dyslexics to read. 34 refs., 7 figs.
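
    A toy illustration of the path-estimation step described above: given a sound-type label per analysis window and a Gaussian density over articulator position for each sound type, the most likely smooth path trades the per-frame likelihoods off against a continuity penalty. The 1-D articulator, quadratic smoothness term, and fixed densities are simplifying assumptions; MALCOM additionally re-estimates the density parameters, which is omitted here.

```python
# Toy illustration of MALCOM's path-estimation step: per-frame sound types, a Gaussian
# density over (here 1-D) articulator position for each type, and a smoothness penalty.
# The density re-estimation step of the full algorithm is omitted.
import numpy as np
from scipy.optimize import minimize

def smooth_ml_path(sound_types, means, variances, smoothness=10.0):
    """sound_types: per-frame category indices; means/variances: per-category Gaussian
    parameters. Returns the smooth articulator path with maximum penalized likelihood."""
    mu = means[sound_types]
    var = variances[sound_types]

    def neg_log_posterior(path):
        data_term = np.sum((path - mu) ** 2 / (2.0 * var))     # Gaussian log-likelihoods
        smooth_term = smoothness * np.sum(np.diff(path) ** 2)  # continuity constraint
        return data_term + smooth_term

    return minimize(neg_log_posterior, x0=mu.copy(), method="L-BFGS-B").x

# Hypothetical example: three sound types observed over eight analysis windows
path = smooth_ml_path(np.array([0, 0, 1, 1, 2, 2, 1, 0]),
                      means=np.array([0.0, 1.0, 2.0]),
                      variances=np.array([0.1, 0.1, 0.1]))
```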

  11. Language-specific developmental differences in speech production: A cross-language acoustic study

    PubMed Central

    Li, Fangfang

    2013-01-01

    Speech productions of 40 English- and 40 Japanese-speaking children (aged 2 to 5) were examined and compared with the speech produced by 20 adult speakers (10 speakers per language). Participants were recorded while repeating words that began with “s” and “sh” sounds. Clear language-specific patterns in adults’ speech were found, with English speakers differentiating “s” and “sh” in one acoustic dimension (i.e., spectral mean) and Japanese speakers differentiating the two categories in three acoustic dimensions (i.e., spectral mean, standard deviation, and onset F2 frequency). For both language groups, children’s speech exhibited a gradual change from an early undifferentiated form to later differentiated categories. The separation processes, however, occurred only in those acoustic dimensions used by adults in the corresponding languages. PMID:22540834
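
    The spectral mean and standard deviation mentioned above are the first two spectral moments of the fricative noise; a minimal computation is sketched below, with the windowing and normalization treated as assumptions rather than the study's exact analysis.

```python
# Sketch of the first two spectral moments ("spectral mean" and standard deviation)
# of a fricative segment; windowing and normalization details are assumptions.
import numpy as np

def spectral_moments(fricative, fs):
    """fricative: samples from an 's' or 'sh' noise segment. Returns (mean, sd) in Hz."""
    spectrum = np.abs(np.fft.rfft(fricative * np.hanning(len(fricative))))
    freqs = np.fft.rfftfreq(len(fricative), d=1.0 / fs)
    p = spectrum / spectrum.sum()                    # treat the spectrum as a distribution
    mean = np.sum(freqs * p)                         # first moment: spectral mean
    sd = np.sqrt(np.sum(p * (freqs - mean) ** 2))    # second moment: spectral SD
    return mean, sd
```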

  12. Intelligibility and acoustic characteristics of clear and conversational speech in telugu (a South Indian dravidian language).

    PubMed

    Durisala, Naresh; Prakash, S G R; Nambi, Arivudai; Batra, Ridhima

    2011-04-01

    The overall goal of this study is to examine the intelligibility differences of clear and conversational speech and also to objectively analyze the acoustic properties contributing to these differences. Seventeen listeners with stable post-lingual sensorineural hearing impairment, with an age range of 17-40 years, were recruited for the study. Forty Telugu sentences spoken by a female Telugu speaker in both clear and conversational speech styles were used as stimuli for the subjects. Results revealed that mean scores for clear speech were higher (mean = 84.5) than those for conversational speech (mean = 61.4), an advantage of 23.1 percentage points. Acoustic properties revealed greater fundamental frequency (f0) and intensity, longer duration, higher consonant-vowel ratio (CVR), and greater temporal energy in clear speech.

  13. A Hybrid Acoustic and Pronunciation Model Adaptation Approach for Non-native Speech Recognition

    NASA Astrophysics Data System (ADS)

    Oh, Yoo Rhee; Kim, Hong Kook

    In this paper, we propose a hybrid model adaptation approach in which pronunciation and acoustic models are adapted by incorporating the pronunciation and acoustic variabilities of non-native speech in order to improve the performance of non-native automatic speech recognition (ASR). Specifically, the proposed hybrid model adaptation can be performed at either the state-tying or triphone-modeling level, depending on the level at which acoustic model adaptation is performed. In both methods, we first analyze the pronunciation variant rules of non-native speakers and then classify each rule as either a pronunciation variant or an acoustic variant. The state-tying level hybrid method then adapts pronunciation models and acoustic models by accommodating the pronunciation variants in the pronunciation dictionary and by clustering the states of triphone acoustic models using the acoustic variants, respectively. On the other hand, the triphone-modeling level hybrid method initially adapts pronunciation models in the same way as in the state-tying level hybrid method; however, for the acoustic model adaptation, the triphone acoustic models are then re-estimated based on the adapted pronunciation models and the states of the re-estimated triphone acoustic models are clustered using the acoustic variants. From the Korean-spoken English speech recognition experiments, it is shown that ASR systems employing the state-tying and triphone-modeling level adaptation methods can relatively reduce the average word error rates (WERs) by 17.1% and 22.1% for non-native speech, respectively, when compared to a baseline ASR system.

  14. The acoustics for speech of eight auditoriums in the city of Sao Paulo

    NASA Astrophysics Data System (ADS)

    Bistafa, Sylvio R.

    2002-11-01

    Eight auditoriums with a proscenium type of stage, which usually operate as dramatic theaters in the city of Sao Paulo, were acoustically surveyed in terms of their adequacy for unassisted speech. Reverberation times, early decay times, and speech levels were measured in different positions, together with objective measures of speech intelligibility. The measurements revealed reverberation time values rather uniform throughout the rooms, whereas significant variations were found in the values of the other acoustical measures with position. The early decay time was found to be better correlated with the objective measures of speech intelligibility than the reverberation time. The results from the objective measurements of speech intelligibility revealed that the speech transmission index STI, and its simplified version RaSTI, are strongly correlated with the early-to-late sound ratio C50 (1 kHz). However, it was found that the criterion value of acceptability of the latter is more easily met than that of the former. The results from these measurements make it possible to understand how the characteristics of the architectural design determine the acoustical quality for speech. Measurements of ST1 (Gade) were made in an attempt to validate it as an objective measure of "support" for the actor. Preliminary diagnostic results from ray-tracing simulations will also be presented.
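
    The early-to-late sound ratio C50 used above has a standard definition: the ratio, in decibels, of squared-pressure energy arriving within 50 ms of the direct sound to the energy arriving later. The sketch below computes it from a measured impulse response; onset detection by peak picking is a simplification, and for the C50 (1 kHz) value reported above the impulse response would first be band-limited to the 1 kHz octave band.

```python
# Sketch of the early-to-late sound ratio C50 from a measured room impulse response,
# using the standard 50 ms split; onset detection by peak picking is a simplification.
import numpy as np

def c50(impulse_response, fs):
    onset = np.argmax(np.abs(impulse_response))    # arrival of the direct sound
    split = onset + int(0.050 * fs)                # 50 ms after the direct sound
    early = np.sum(impulse_response[onset:split] ** 2)
    late = np.sum(impulse_response[split:] ** 2)
    return 10.0 * np.log10(early / late)           # dB; higher values favor speech clarity
```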

  15. A magnetic resonance imaging study on the articulatory and acoustic speech parameters of Malay vowels.

    PubMed

    Zourmand, Alireza; Mirhassani, Seyed Mostafa; Ting, Hua-Nong; Bux, Shaik Ismail; Ng, Kwan Hoong; Bilgen, Mehmet; Jalaludin, Mohd Amin

    2014-07-25

    The phonetic properties of six Malay vowels are investigated using magnetic resonance imaging (MRI) to visualize the vocal tract in order to obtain dynamic articulatory parameters during speech production. To resolve image blurring due to the tongue movement during the scanning process, a method based on active contour extraction is used to track tongue contours. The proposed method efficiently tracks tongue contours despite the partial blurring of MRI images. Consequently, the articulatory parameters are effectively measured as tongue movement is observed, and the specific shape and position of the tongue are determined for all six uttered Malay vowels. Speech rehabilitation procedures demand some kind of visually perceivable prototype of speech articulation. To investigate the validity of the measured articulatory parameters against the acoustic theory of speech production, an acoustic analysis of the vowels uttered by the subjects was performed. When the acoustic and articulatory parameters of the uttered speech were examined, a correlation between formant frequencies and articulatory parameters was observed. The experiments reported a positive correlation between the constriction location of the tongue body and the first formant frequency, as well as a negative correlation between the constriction location of the tongue tip and the second formant frequency. The results demonstrate that the proposed method is an effective tool for the dynamic study of speech production.

  16. Speech and melody recognition in binaurally combined acoustic and electric hearing

    NASA Astrophysics Data System (ADS)

    Kong, Ying-Yee; Stickney, Ginger S.; Zeng, Fan-Gang

    2005-03-01

    Speech recognition in noise and music perception are especially challenging for current cochlear implant users. The present study utilizes the residual acoustic hearing in the nonimplanted ear in five cochlear implant users to elucidate the role of temporal fine structure at low frequencies in auditory perception and to test the hypothesis that combined acoustic and electric hearing produces better performance than either mode alone. The first experiment measured speech recognition in the presence of competing noise. It was found that, although the residual low-frequency (<1000 Hz) acoustic hearing produced essentially no speech recognition in noise, it significantly enhanced performance when combined with the electric hearing. The second experiment measured melody recognition in the same group of subjects and found that, contrary to the speech recognition result, the low-frequency acoustic hearing produced significantly better performance than the electric hearing. It is hypothesized that listeners with combined acoustic and electric hearing might use the correlation between the salient pitch in low-frequency acoustic hearing and the weak pitch in the envelope to enhance segregation between signal and noise. The present study suggests the importance and urgency of accurately encoding the fine-structure cue in cochlear implants.

  17. Changes in speech production in a child with a cochlear implant: acoustic and kinematic evidence.

    PubMed

    Goffman, Lisa; Ertmer, David J; Erdle, Christa

    2002-10-01

    A method is presented for examining change in motor patterns used to produce linguistic contrasts. In this case study, the method is applied to a child receiving new auditory input following cochlear implantation. This child experienced hearing loss at age 3 years and received a multichannel cochlear implant at age 7 years. Data collection points occurred both pre- and postimplant and included acoustic and kinematic analyses. Overall, this child's speech output was transcribed as accurate across the pre- and postimplant periods. Postimplant, with the onset of new auditory experience, acoustic durations showed a predictable maturational change, usually decreasing in duration. Conversely, the spatiotemporal stability of speech movements initially became more variable postimplantation. The auditory perturbations experienced by this child during development led to changes in the physiological underpinnings of speech production, even when speech output was perceived as accurate.

  18. Dynamic Encoding of Acoustic Features in Neural Responses to Continuous Speech.

    PubMed

    Khalighinejad, Bahar; Cruzatto da Silva, Guilherme; Mesgarani, Nima

    2017-02-22

    Humans are unique in their ability to communicate using spoken language. However, it remains unclear how the speech signal is transformed and represented in the brain at different stages of the auditory pathway. In this study, we characterized electroencephalography responses to continuous speech by obtaining the time-locked responses to phoneme instances (phoneme-related potential). We showed that responses to different phoneme categories are organized by phonetic features. We found that each instance of a phoneme in continuous speech produces multiple distinguishable neural responses occurring as early as 50 ms and as late as 400 ms after the phoneme onset. Comparing the patterns of phoneme similarity in the neural responses and the acoustic signals confirms a repetitive appearance of acoustic distinctions of phonemes in the neural data. Analysis of the phonetic and speaker information in neural activations revealed that different time intervals jointly encode the acoustic similarity of both phonetic and speaker categories. These findings provide evidence for a dynamic neural transformation of low-level speech features as they propagate along the auditory pathway, and form an empirical framework to study the representational changes in learning, attention, and speech disorders.SIGNIFICANCE STATEMENT We characterized the properties of evoked neural responses to phoneme instances in continuous speech. We show that each instance of a phoneme in continuous speech produces several observable neural responses at different times occurring as early as 50 ms and as late as 400 ms after the phoneme onset. Each temporal event explicitly encodes the acoustic similarity of phonemes, and linguistic and nonlinguistic information are best represented at different time intervals. Finally, we show a joint encoding of phonetic and speaker information, where the neural representation of speakers is dependent on phoneme category. These findings provide compelling new evidence for

  19. Dynamic Encoding of Acoustic Features in Neural Responses to Continuous Speech

    PubMed Central

    Khalighinejad, Bahar; Cruzatto da Silva, Guilherme

    2017-01-01

    Humans are unique in their ability to communicate using spoken language. However, it remains unclear how the speech signal is transformed and represented in the brain at different stages of the auditory pathway. In this study, we characterized electroencephalography responses to continuous speech by obtaining the time-locked responses to phoneme instances (phoneme-related potential). We showed that responses to different phoneme categories are organized by phonetic features. We found that each instance of a phoneme in continuous speech produces multiple distinguishable neural responses occurring as early as 50 ms and as late as 400 ms after the phoneme onset. Comparing the patterns of phoneme similarity in the neural responses and the acoustic signals confirms a repetitive appearance of acoustic distinctions of phonemes in the neural data. Analysis of the phonetic and speaker information in neural activations revealed that different time intervals jointly encode the acoustic similarity of both phonetic and speaker categories. These findings provide evidence for a dynamic neural transformation of low-level speech features as they propagate along the auditory pathway, and form an empirical framework to study the representational changes in learning, attention, and speech disorders. SIGNIFICANCE STATEMENT We characterized the properties of evoked neural responses to phoneme instances in continuous speech. We show that each instance of a phoneme in continuous speech produces several observable neural responses at different times occurring as early as 50 ms and as late as 400 ms after the phoneme onset. Each temporal event explicitly encodes the acoustic similarity of phonemes, and linguistic and nonlinguistic information are best represented at different time intervals. Finally, we show a joint encoding of phonetic and speaker information, where the neural representation of speakers is dependent on phoneme category. These findings provide compelling new evidence for

  20. Mathematical model of acoustic speech production with mobile walls of the vocal tract

    NASA Astrophysics Data System (ADS)

    Lyubimov, N. A.; Zakharov, E. V.

    2016-03-01

    A mathematical speech production model is considered that describes acoustic oscillation propagation in a vocal tract with mobile walls. The wave field function satisfies the Helmholtz equation with boundary conditions of the third kind (impedance type). The impedance model corresponds to a three-parameter pendulum oscillation model. The experimental research demonstrates the nonlinear character of the influence of vocal tract wall mobility on the spectral envelope of a speech signal.
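
    In the notation suggested by the abstract (and hedged accordingly, since the paper's exact parameterization of the wall impedance is not given here), the boundary-value problem for the acoustic pressure p at wavenumber k can be written as

    $$\Delta p + k^2 p = 0 \ \text{in the vocal tract}, \qquad \frac{\partial p}{\partial n} + i k \beta(\mathbf{x})\, p = 0 \ \text{on the walls},$$

    where beta(x) is the normalized wall admittance; in the model above, this admittance would be governed by the three-parameter pendulum oscillation of the mobile walls.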

  1. Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training

    PubMed Central

    Narayanan, Arun; Wang, DeLiang

    2015-01-01

    Although deep neural network (DNN) acoustic models are known to be inherently noise robust, especially with matched training and testing data, the use of speech separation as a frontend and for deriving alternative feature representations has been shown to improve performance in challenging environments. We first present a supervised speech separation system that significantly improves automatic speech recognition (ASR) performance in realistic noise conditions. The system performs separation via ratio time-frequency masking; the ideal ratio mask (IRM) is estimated using DNNs. We then propose a framework that unifies separation and acoustic modeling via joint adaptive training. Since the modules for acoustic modeling and speech separation are implemented using DNNs, unification is done by introducing additional hidden layers with fixed weights and appropriate network architecture. On the CHiME-2 medium-large vocabulary ASR task, and with log mel spectral features as input to the acoustic model, an independently trained ratio masking frontend improves word error rates by 10.9% (relative) compared to the noisy baseline. In comparison, the jointly trained system improves performance by 14.4%. We also experiment with alternative feature representations to augment the standard log mel features, like the noise and speech estimates obtained from the separation module, and the standard feature set used for IRM estimation. Our best system obtains a word error rate of 15.4% (absolute), an improvement of 4.6 percentage points over the next best result on this corpus. PMID:26973851
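
    A minimal sketch of the ideal ratio mask used as the training target above: with parallel clean-speech and noise recordings, the mask at each time-frequency unit is the ratio of speech energy to total energy, and applying it to the noisy spectrogram suppresses noise-dominated units. The STFT settings are assumptions, and the DNN that predicts the mask from noisy features is omitted.

```python
# Sketch of the ideal ratio mask (IRM) training target and its application. The STFT
# settings are assumptions, and the DNN that predicts the mask from noisy features
# (the system's actual frontend) is omitted; clean, noise, and noisy must be aligned.
import numpy as np
from scipy.signal import stft, istft

def ideal_ratio_mask(clean, noise, fs, nperseg=512):
    _, _, S = stft(clean, fs=fs, nperseg=nperseg)
    _, _, N = stft(noise, fs=fs, nperseg=nperseg)
    return np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + 1e-12)

def apply_mask(noisy, mask, fs, nperseg=512):
    _, _, Y = stft(noisy, fs=fs, nperseg=nperseg)
    _, enhanced = istft(Y * mask, fs=fs, nperseg=nperseg)
    return enhanced
```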

  2. Speech privacy and annoyance considerations in the acoustic environment of passenger cars of high-speed trains.

    PubMed

    Jeon, Jin Yong; Hong, Joo Young; Jang, Hyung Suk; Kim, Jae Hyeon

    2015-12-01

    It is necessary to consider not only annoyance of interior noises but also speech privacy to achieve acoustic comfort in a passenger car of a high-speed train because speech from other passengers can be annoying. This study aimed to explore an optimal acoustic environment to satisfy speech privacy and reduce annoyance in a passenger car. Two experiments were conducted using speech sources and compartment noise of a high speed train with varying speech-to-noise ratios (SNRA) and background noise levels (BNL). Speech intelligibility was tested in experiment I, and in experiment II, perceived speech privacy, annoyance, and acoustic comfort of combined sounds with speech and background noise were assessed. The results show that speech privacy and annoyance were significantly influenced by the SNRA. In particular, the acoustic comfort was evaluated as acceptable when the SNRA was less than -6 dB for both speech privacy and noise annoyance. In addition, annoyance increased significantly as the BNL exceeded 63 dBA, whereas the effect of the background-noise level on the speech privacy was not significant. These findings suggest that an optimal level of interior noise in a passenger car might exist between 59 and 63 dBA, taking normal speech levels into account.

  3. Prosodic influences on speech production in children with specific language impairment and speech deficits: kinematic, acoustic, and transcription evidence.

    PubMed

    Goffman, L

    1999-12-01

    It is often hypothesized that young children's difficulties with producing weak-strong (iambic) prosodic forms arise from perceptual or linguistically based production factors. A third possible contributor to errors in the iambic form may be biological constraints, or biases, of the motor system. In the present study, 7 children with specific language impairment (SLI) and speech deficits were matched to same-age peers. Multiple levels of analysis, including kinematic (modulation and stability of movement), acoustic, and transcription, were applied to children's productions of iambic (weak-strong) and trochaic (strong-weak) prosodic forms. Findings suggest that a motor bias toward producing unmodulated rhythmic articulatory movements, similar to that observed in canonical babbling, contributes to children's acquisition of metrical forms. Children with SLI and speech deficits show less mature segmental and speech motor systems, as well as decreased modulation of movement in later developing iambic forms. Further, components of prosodic and segmental acquisition develop independently and at different rates.

  4. Precategorical Acoustic Storage and the Perception of Speech

    ERIC Educational Resources Information Center

    Frankish, Clive

    2008-01-01

    Theoretical accounts of both speech perception and of short term memory must consider the extent to which perceptual representations of speech sounds might survive in relatively unprocessed form. This paper describes a novel version of the serial recall task that can be used to explore this area of shared interest. In immediate recall of digit…

  5. Vowel Acoustics in Adults with Apraxia of Speech

    ERIC Educational Resources Information Center

    Jacks, Adam; Mathes, Katey A.; Marquardt, Thomas P.

    2010-01-01

    Purpose: To investigate the hypothesis that vowel production is more variable in adults with acquired apraxia of speech (AOS) relative to healthy individuals with unimpaired speech. Vowel formant frequency measures were selected as the specific target of focus. Method: Seven adults with AOS and aphasia produced 15 repetitions of 6 American English…

  6. Perceiving speech in context: Compensation for contextual variability during acoustic cue encoding and categorization

    NASA Astrophysics Data System (ADS)

    Toscano, Joseph Christopher

    Several fundamental questions about speech perception concern how listeners understand spoken language despite considerable variability in speech sounds across different contexts (the problem of lack of invariance in speech). This contextual variability is caused by several factors, including differences between individual talkers' voices, variation in speaking rate, and effects of coarticulatory context. A number of models have been proposed to describe how the speech system handles differences across contexts. Critically, these models make different predictions about (1) whether contextual variability is handled at the level of acoustic cue encoding or categorization, (2) whether it is driven by feedback from category-level processes or interactions between cues, and (3) whether listeners discard fine-grained acoustic information to compensate for contextual variability. Separating the effects of cue- and category-level processing has been difficult because behavioral measures tap processes that occur well after initial cue encoding and are influenced by task demands and linguistic information. Recently, we have used the event-related brain potential (ERP) technique to examine cue encoding and online categorization. Specifically, we have looked at differences in the auditory N1 as a measure of acoustic cue encoding and the P3 as a measure of categorization. This allows us to examine multiple levels of processing during speech perception and can provide a useful tool for studying effects of contextual variability. Here, I apply this approach to determine the point in processing at which context has an effect on speech perception and to examine whether acoustic cues are encoded continuously. Several types of contextual variability (talker gender, speaking rate, and coarticulation), as well as several acoustic cues (voice onset time, formant frequencies, and bandwidths), are examined in a series of experiments. The results suggest that (1) at early stages of speech

  7. Aging Affects Neural Synchronization to Speech-Related Acoustic Modulations

    PubMed Central

    Goossens, Tine; Vercammen, Charlotte; Wouters, Jan; van Wieringen, Astrid

    2016-01-01

    As people age, speech perception problems become highly prevalent, especially in noisy situations. In addition to peripheral hearing and cognition, temporal processing plays a key role in speech perception. Temporal processing of speech features is mediated by synchronized activity of neural oscillations in the central auditory system. Previous studies indicate that both the degree and hemispheric lateralization of synchronized neural activity relate to speech perception performance. Based on these results, we hypothesize that impaired speech perception in older persons may, in part, originate from deviances in neural synchronization. In this study, auditory steady-state responses that reflect synchronized activity of theta, beta, low and high gamma oscillations (i.e., 4, 20, 40, and 80 Hz ASSR, respectively) were recorded in young, middle-aged, and older persons. As all participants had normal audiometric thresholds and were screened for (mild) cognitive impairment, differences in synchronized neural activity across the three age groups were likely to be attributed to age. Our data yield novel findings regarding theta and high gamma oscillations in the aging auditory system. At an older age, synchronized activity of theta oscillations is increased, whereas high gamma synchronization is decreased. In contrast to young persons who exhibit a right hemispheric dominance for processing of high gamma range modulations, older adults show a symmetrical processing pattern. These age-related changes in neural synchronization may very well underlie the speech perception problems in aging persons. PMID:27378906
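
    As a rough illustration of how synchronized activity at a given modulation rate is quantified, the sketch below reads the spectral amplitude of an averaged EEG epoch at the stimulus modulation frequency (e.g., 4, 20, 40, or 80 Hz); the study's statistical response-detection and lateralization analyses are not reproduced.

```python
# Rough illustration of reading steady-state response strength at a modulation rate:
# the spectral amplitude of the epoch-averaged EEG at the stimulus modulation frequency.
# Statistical detection and hemispheric analyses from the study are not reproduced.
import numpy as np

def assr_amplitude(epoch_mean, fs, mod_freq):
    """epoch_mean: time-domain average across EEG epochs for one channel.
    Returns the spectral amplitude at mod_freq (e.g., 4, 20, 40, or 80 Hz)."""
    spectrum = np.abs(np.fft.rfft(epoch_mean)) / len(epoch_mean)
    freqs = np.fft.rfftfreq(len(epoch_mean), d=1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - mod_freq))]   # amplitude at nearest bin
```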

  8. Acoustic Markers of Prominence Influence Infants' and Adults' Segmentation of Speech Sequences

    ERIC Educational Resources Information Center

    Bion, Ricardo A. H.; Benavides-Varela, Silvia; Nespor, Marina

    2011-01-01

    Two experiments investigated the way acoustic markers of prominence influence the grouping of speech sequences by adults and 7-month-old infants. In the first experiment, adults were familiarized with and asked to memorize sequences of adjacent syllables that alternated in either pitch or duration. During the test phase, participants heard pairs…

  9. Acoustic and Articulatory Features of Diphthong Production: A Speech Clarity Study

    ERIC Educational Resources Information Center

    Tasko, Stephen M.; Greilick, Kristin

    2010-01-01

    Purpose: The purpose of this study was to evaluate how speaking clearly influences selected acoustic and orofacial kinematic measures associated with diphthong production. Method: Forty-nine speakers, drawn from the University of Wisconsin X-Ray Microbeam Speech Production Database (J. R. Westbury, 1994), served as participants. Samples of clear…

  10. Acoustic sleepiness detection: framework and validation of a speech-adapted pattern recognition approach.

    PubMed

    Krajewski, Jarek; Batliner, Anton; Golz, Martin

    2009-08-01

    This article describes a general framework for detecting sleepiness states on the basis of prosody, articulation, and speech-quality-related speech characteristics. The advantages of this automatic real-time approach are that obtaining speech data is nonobtrusive and free from sensor application and calibration efforts. Different types of acoustic features derived from speech, speaker, and emotion recognition were employed (frame-level-based speech features). Combining these features with high-level contour descriptors, which capture the temporal information of frame-level descriptor contours, results in 45,088 features per speech sample. In general, the measurement process follows the speech-adapted steps of pattern recognition: (1) recording speech, (2) preprocessing, (3) feature computation (using perceptual and signal-processing-related features such as fundamental frequency, intensity, pause patterns, formants, and cepstral coefficients), (4) dimensionality reduction, (5) classification, and (6) evaluation. After a correlation-filter-based feature subset selection was applied to the feature space to find the most relevant features, different classification models were trained. The best model, a support vector machine, achieved 86.1% classification accuracy in predicting sleepiness in a sleep deprivation study (two-class problem, N=12; 01.00-08.00 a.m.).
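
    The pattern-recognition chain described above can be illustrated compactly. The following is a minimal sketch assuming toy frame-level descriptors (energy and zero-crossing rate), filter-based feature selection, and an SVM classifier; it is not the authors' 45,088-feature system, and all data, shapes, and labels are hypothetical placeholders.

```python
# Minimal sketch of a speech-adapted pattern recognition chain: frame-level
# feature computation, filter-based feature selection, and SVM classification.
# NOT the article's 45,088-feature system; descriptors and data are placeholders.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def frame_features(signal, sr, frame_s=0.025, hop_s=0.010):
    """Frame-level energy and zero-crossing contours, summarized by functionals."""
    n, h = int(frame_s * sr), int(hop_s * sr)
    frames = [signal[i:i + n] for i in range(0, len(signal) - n, h)]
    energy = np.array([np.sum(f ** 2) for f in frames])
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(f)))) / 2 for f in frames])
    feats = []
    for contour in (energy, zcr):
        feats += [contour.mean(), contour.std(), contour.max() - contour.min()]
    return np.array(feats)

# Hypothetical corpus: 40 one-second samples, half "alert" (0), half "sleepy" (1)
rng = np.random.default_rng(0)
X = np.stack([frame_features(rng.standard_normal(16000), 16000) for _ in range(40)])
y = np.repeat([0, 1], 20)

model = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=4), SVC(kernel="rbf"))
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```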

  11. Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation

    PubMed Central

    Kong, Ying-Yee; Donaldson, Gail; Somarowthu, Ala

    2015-01-01

    Low-frequency acoustic cues have been shown to improve speech perception in cochlear-implant listeners. However, the mechanisms underlying this benefit are still not well understood. This study investigated the extent to which low-frequency cues can facilitate listeners' use of linguistic knowledge in simulated electric-acoustic stimulation (EAS). Experiment 1 examined differences in the magnitude of EAS benefit at the phoneme, word, and sentence levels. Speech materials were processed via noise-channel vocoding and lowpass (LP) filtering. The amount of spectral degradation in the vocoder speech was varied by applying different numbers of vocoder channels. Normal-hearing listeners were tested on vocoder-alone, LP-alone, and vocoder + LP conditions. Experiment 2 further examined factors that underlie the context effect on EAS benefit at the sentence level by limiting the low-frequency cues to temporal envelope and periodicity (AM + FM). Results showed that EAS benefit was greater for higher-context than for lower-context speech materials even when the LP ear received only low-frequency AM + FM cues. Possible explanations for the greater EAS benefit observed with higher-context materials may lie in the interplay between perceptual and expectation-driven processes for EAS speech recognition, and/or the band-importance functions for different types of speech materials. PMID:25994712

  12. Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation.

    PubMed

    Kong, Ying-Yee; Donaldson, Gail; Somarowthu, Ala

    2015-05-01

    Low-frequency acoustic cues have been shown to improve speech perception in cochlear-implant listeners. However, the mechanisms underlying this benefit are still not well understood. This study investigated the extent to which low-frequency cues can facilitate listeners' use of linguistic knowledge in simulated electric-acoustic stimulation (EAS). Experiment 1 examined differences in the magnitude of EAS benefit at the phoneme, word, and sentence levels. Speech materials were processed via noise-channel vocoding and lowpass (LP) filtering. The amount of spectral degradation in the vocoder speech was varied by applying different numbers of vocoder channels. Normal-hearing listeners were tested on vocoder-alone, LP-alone, and vocoder + LP conditions. Experiment 2 further examined factors that underlie the context effect on EAS benefit at the sentence level by limiting the low-frequency cues to temporal envelope and periodicity (AM + FM). Results showed that EAS benefit was greater for higher-context than for lower-context speech materials even when the LP ear received only low-frequency AM + FM cues. Possible explanations for the greater EAS benefit observed with higher-context materials may lie in the interplay between perceptual and expectation-driven processes for EAS speech recognition, and/or the band-importance functions for different types of speech materials.

  13. Contributions of electric and acoustic hearing to bimodal speech and music perception.

    PubMed

    Crew, Joseph D; Galvin, John J; Landsberger, David M; Fu, Qian-Jie

    2015-01-01

    Cochlear implant (CI) users have difficulty understanding speech in noisy listening conditions and perceiving music. Aided residual acoustic hearing in the contralateral ear can mitigate these limitations. The present study examined contributions of electric and acoustic hearing to speech understanding in noise and melodic pitch perception. Data was collected with the CI only, the hearing aid (HA) only, and both devices together (CI+HA). Speech reception thresholds (SRTs) were adaptively measured for simple sentences in speech babble. Melodic contour identification (MCI) was measured with and without a masker instrument; the fundamental frequency of the masker was varied to be overlapping or non-overlapping with the target contour. Results showed that the CI contributes primarily to bimodal speech perception and that the HA contributes primarily to bimodal melodic pitch perception. In general, CI+HA performance was slightly improved relative to the better ear alone (CI-only) for SRTs but not for MCI, with some subjects experiencing a decrease in bimodal MCI performance relative to the better ear alone (HA-only). Individual performance was highly variable, and the contribution of either device to bimodal perception was both subject- and task-dependent. The results suggest that individualized mapping of CIs and HAs may further improve bimodal speech and music perception.

  14. Contributions of Electric and Acoustic Hearing to Bimodal Speech and Music Perception

    PubMed Central

    Crew, Joseph D.; Galvin III, John J.; Landsberger, David M.; Fu, Qian-Jie

    2015-01-01

    Cochlear implant (CI) users have difficulty understanding speech in noisy listening conditions and perceiving music. Aided residual acoustic hearing in the contralateral ear can mitigate these limitations. The present study examined contributions of electric and acoustic hearing to speech understanding in noise and melodic pitch perception. Data was collected with the CI only, the hearing aid (HA) only, and both devices together (CI+HA). Speech reception thresholds (SRTs) were adaptively measured for simple sentences in speech babble. Melodic contour identification (MCI) was measured with and without a masker instrument; the fundamental frequency of the masker was varied to be overlapping or non-overlapping with the target contour. Results showed that the CI contributes primarily to bimodal speech perception and that the HA contributes primarily to bimodal melodic pitch perception. In general, CI+HA performance was slightly improved relative to the better ear alone (CI-only) for SRTs but not for MCI, with some subjects experiencing a decrease in bimodal MCI performance relative to the better ear alone (HA-only). Individual performance was highly variable, and the contribution of either device to bimodal perception was both subject- and task-dependent. The results suggest that individualized mapping of CIs and HAs may further improve bimodal speech and music perception. PMID:25790349

  15. Primary Progressive Apraxia of Speech: Clinical Features and Acoustic and Neurologic Correlates

    PubMed Central

    Strand, Edythe A.; Clark, Heather; Machulda, Mary; Whitwell, Jennifer L.; Josephs, Keith A.

    2015-01-01

    Purpose This study summarizes 2 illustrative cases of a neurodegenerative speech disorder, primary progressive apraxia of speech (AOS), as a vehicle for providing an overview of the disorder and an approach to describing and quantifying its perceptual features and some of its temporal acoustic attributes. Method Two individuals with primary progressive AOS underwent speech-language and neurologic evaluations on 2 occasions, ranging from 2.0 to 7.5 years postonset. Performance on several tests, tasks, and rating scales, as well as several acoustic measures, were compared over time within and between cases. Acoustic measures were compared with performance of control speakers. Results Both patients initially presented with AOS as the only or predominant sign of disease and without aphasia or dysarthria. The presenting features and temporal progression were captured in an AOS Rating Scale, an Articulation Error Score, and temporal acoustic measures of utterance duration, syllable rates per second, rates of speechlike alternating motion and sequential motion, and a pairwise variability index measure. Conclusions AOS can be the predominant manifestation of neurodegenerative disease. Clinical ratings of its attributes and acoustic measures of some of its temporal characteristics can support its diagnosis and help quantify its salient characteristics and progression over time. PMID:25654422

  16. Measurements of speech intelligibility in common rooms for older adults as a first step towards acoustical guidelines.

    PubMed

    Reinten, Jikke; van Hout, Nicole; Hak, Constant; Kort, Helianthe

    2015-01-01

    Adapting the built environment to the needs of nursing- or care-home residents has become common practice. Even though hearing loss due to ageing is a normally occurring biological process, little research has been performed on the effects of room acoustic parameters on speech intelligibility for older adults. This article presents the results of room acoustic measurements in common rooms for older adults and their effect on speech intelligibility. Perceived speech intelligibility amongst the users of the rooms was also investigated. The results have led to ongoing research at Utrecht University of Applied Sciences and Eindhoven University of Technology, aimed at the development of acoustical guidelines for elderly care facilities.

  17. The advantages of sound localization and speech perception of bilateral electric acoustic stimulation

    PubMed Central

    Moteki, Hideaki; Kitoh, Ryosuke; Tsukada, Keita; Iwasaki, Satoshi; Nishio, Shin-Ya

    2015-01-01

    Conclusion: Bilateral electric acoustic stimulation (EAS) effectively improved speech perception in noise and sound localization in patients with high-frequency hearing loss. Objective: To evaluate bilateral EAS efficacy of sound localization detection and speech perception in noise in two cases of high-frequency hearing loss. Methods: Two female patients, aged 38 and 45 years, respectively, received bilateral EAS sequentially. Pure-tone audiometry was performed preoperatively and postoperatively to evaluate the hearing preservation in the lower frequencies. Speech perception outcomes in quiet and noise and sound localization were assessed with unilateral and bilateral EAS. Results: Residual hearing in the lower frequencies was well preserved after insertion of a FLEX24 electrode (24 mm) using the round window approach. After bilateral EAS, speech perception improved in quiet and even more so in noise. In addition, the sound localization ability of both cases with bilateral EAS improved remarkably. PMID:25423260

  18. Estimation of glottal source features from the spectral envelope of the acoustic speech signal

    NASA Astrophysics Data System (ADS)

    Torres, Juan Felix

    Speech communication encompasses diverse types of information, including phonetics, affective state, voice quality, and speaker identity. From a speech production standpoint, the acoustic speech signal can be mainly divided into glottal source and vocal tract components, which play distinct roles in rendering the various types of information it contains. Most deployed speech analysis systems, however, do not explicitly represent these two components as distinct entities, as their joint estimation from the acoustic speech signal becomes an ill-defined blind deconvolution problem. Nevertheless, because of the desire to understand glottal behavior and how it relates to perceived voice quality, there has been continued interest in explicitly estimating the glottal component of the speech signal. To this end, several inverse filtering (IF) algorithms have been proposed, but they are unreliable in practice because of the blind formulation of the separation problem. In an effort to develop a method that can bypass the challenging IF process, this thesis proposes a new glottal source information extraction method that relies on supervised machine learning to transform smoothed spectral representations of speech, which are already used in some of the most widely deployed and successful speech analysis applications, into a set of glottal source features. A transformation method based on Gaussian mixture regression (GMR) is presented and compared to current IF methods in terms of feature similarity, reliability, and speaker discrimination capability on a large speech corpus, and potential representations of the spectral envelope of speech are investigated for their ability to represent glottal source variation in a predictable manner. The proposed system was found to produce glottal source features that reasonably matched their IF counterparts in many cases, while being less susceptible to spurious errors. The development of the proposed method entailed a study into the aspects
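
    The core of the approach is Gaussian mixture regression: fit a GMM on the joint space of envelope and glottal features, then take the conditional expectation of the glottal features given an envelope observation. Below is a minimal sketch of that idea under stated assumptions; the feature definitions, dimensionalities, and data are hypothetical, not the thesis' actual configuration.

```python
# Sketch of Gaussian mixture regression (GMR): fit a GMM on the joint vector
# [spectral-envelope features, glottal-source features], then predict the
# conditional mean of the glottal features given an envelope observation.
# Dimensions and training data below are illustrative assumptions.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def gmr_fit(X, Y, n_components=4):
    """Fit a full-covariance GMM on the joint space [X, Y]."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=0)
    gmm.fit(np.hstack([X, Y]))
    return gmm

def gmr_predict(gmm, X, dx):
    """Conditional expectation E[Y | X] under the joint GMM (dx = dim of X)."""
    preds = []
    for x in X:
        weights, conds = [], []
        for k in range(gmm.n_components):
            mu, cov = gmm.means_[k], gmm.covariances_[k]
            mu_x, mu_y = mu[:dx], mu[dx:]
            Sxx, Sxy = cov[:dx, :dx], cov[:dx, dx:]
            # responsibility of component k for this envelope observation
            weights.append(gmm.weights_[k] *
                           multivariate_normal.pdf(x, mean=mu_x, cov=Sxx))
            # component-wise conditional mean of the glottal features
            conds.append(mu_y + Sxy.T @ np.linalg.solve(Sxx, x - mu_x))
        weights = np.array(weights) / np.sum(weights)
        preds.append(np.sum(weights[:, None] * np.array(conds), axis=0))
    return np.array(preds)

# Hypothetical training data: 10-dim envelope features -> 3 glottal-source features
rng = np.random.default_rng(1)
X_env, Y_glot = rng.standard_normal((200, 10)), rng.standard_normal((200, 3))
gmm = gmr_fit(X_env, Y_glot)
print(gmr_predict(gmm, X_env[:2], dx=10).shape)  # (2, 3)
```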

  19. Vowel Acoustics in Dysarthria: Speech Disorder Diagnosis and Classification

    ERIC Educational Resources Information Center

    Lansford, Kaitlin L.; Liss, Julie M.

    2014-01-01

    Purpose: The purpose of this study was to determine the extent to which vowel metrics are capable of distinguishing healthy from dysarthric speech and among different forms of dysarthria. Method: A variety of vowel metrics were derived from spectral and temporal measurements of vowel tokens embedded in phrases produced by 45 speakers with…

  20. Acoustical properties of speech as indicators of depression and suicidal risk.

    PubMed

    France, D J; Shiavi, R G; Silverman, S; Silverman, M; Wilkes, D M

    2000-07-01

    Acoustic properties of speech have previously been identified as possible cues to depression, and there is evidence that certain vocal parameters may be used further to objectively discriminate between depressed and suicidal speech. Studies were performed to analyze and compare the speech acoustics of separate male and female samples comprised of normal individuals and individuals carrying diagnoses of depression and high-risk, near-term suicidality. The female sample consisted of ten control subjects, 17 dysthymic patients, and 21 major depressed patients. The male sample contained 24 control subjects, 21 major depressed patients, and 22 high-risk suicidal patients. Acoustic analyses of voice fundamental frequency (Fo), amplitude modulation (AM), formants, and power distribution were performed on speech samples extracted from audio recordings collected from the sample members. Multivariate feature and discriminant analyses were performed on feature vectors representing the members of the control and disordered classes. Features derived from the formant and power spectral density measurements were found to be the best discriminators of class membership in both the male and female studies. AM features emerged as strong class discriminators of the male classes. Features describing Fo were generally ineffective discriminators in both studies. The results support theories that identify psychomotor disturbances as central elements in depression and suicidality.

  1. A Statistical Model-Based Speech Enhancement Using Acoustic Noise Classification for Robust Speech Communication

    NASA Astrophysics Data System (ADS)

    Choi, Jae-Hun; Chang, Joon-Hyuk

    In this paper, we present a speech enhancement technique based on ambient noise classification that incorporates a Gaussian mixture model (GMM). The principal parameters of the statistical model-based speech enhancement algorithm, such as the weighting parameter in the decision-directed (DD) method and the long-term smoothing parameter of the noise estimation, are set according to the classified context to ensure the best performance for each noise type. For real-time context awareness, the noise classification is performed on a frame-by-frame basis using the GMM within a soft decision framework. The speech absence probability (SAP) is used to detect speech absence periods and to update the likelihood of the GMM.
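
    A minimal sketch of the noise-classification step is given below: one GMM per noise context, a frame-by-frame soft decision, and posterior-weighted blending of per-context enhancement parameters. The feature choice, parameter table, and training data are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of frame-by-frame noise-context classification with GMMs and a soft
# decision, used to select enhancement parameters (e.g., the DD weighting and
# the noise-estimation smoothing constant). All values here are placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
contexts = ["babble", "car", "white"]
gmms = {}
for i, name in enumerate(contexts):
    # one GMM per noise context, trained offline on noise-only frames (toy data)
    train_frames = rng.standard_normal((500, 20)) + i
    gmms[name] = GaussianMixture(n_components=8, random_state=0).fit(train_frames)

# Per-context enhancement parameters: (DD weighting alpha, noise smoothing)
params = {"babble": (0.95, 0.98), "car": (0.98, 0.99), "white": (0.92, 0.95)}

def classify_frame(feat):
    """Soft decision: posterior probability of each noise context for one frame."""
    logliks = np.array([gmms[c].score_samples(feat[None, :])[0] for c in contexts])
    post = np.exp(logliks - logliks.max())
    return post / post.sum()

def select_params(feat):
    """Blend per-context parameters by the soft-decision posteriors."""
    post = classify_frame(feat)
    alpha = sum(p * params[c][0] for p, c in zip(post, contexts))
    smooth = sum(p * params[c][1] for p, c in zip(post, contexts))
    return alpha, smooth

print(select_params(rng.standard_normal(20) + 1))
```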

  2. Effects of cognitive workload on speech production: acoustic analyses and perceptual consequences.

    PubMed

    Lively, S E; Pisoni, D B; Van Summers, W; Bernacki, R H

    1993-05-01

    The present investigation examined the effects of cognitive workload on speech production. Workload was manipulated by having talkers perform a compensatory visual tracking task while speaking test sentences of the form "Say hVd again." Acoustic measurements were made to compare utterances produced under workload with the same utterances produced in a control condition. In the workload condition, some talkers produced utterances with increased amplitude and amplitude variability, decreased spectral tilt and F0 variability, and increased speaking rate. No changes in F1, F2, or F3 were observed across conditions for any of the talkers. These findings indicate both laryngeal and sublaryngeal adjustments in articulation, as well as modifications in the absolute timing of articulatory gestures. The results of a perceptual identification experiment paralleled the acoustic measurements. Small but significant advantages in intelligibility were observed for utterances produced under workload for talkers who showed robust changes in speech production. Changes in amplitude and amplitude variability for utterances produced under workload appeared to be the major factor controlling intelligibility. The results of the present investigation support the assumptions of Lindblom's ["Explaining phonetic variation: A sketch of the H&H theory," in Speech Production and Speech Modeling (Kluwer Academic, The Netherlands, 1990)] H&H model: Talkers adapt their speech to suit the demands of the environment and these modifications are designed to maximize intelligibility.

  3. Acoustic and auditory phonetics: the adaptive design of speech sound systems.

    PubMed

    Diehl, Randy L

    2008-03-12

    Speech perception is remarkably robust. This paper examines how acoustic and auditory properties of vowels and consonants help to ensure intelligibility. First, the source-filter theory of speech production is briefly described, and the relationship between vocal-tract properties and formant patterns is demonstrated for some commonly occurring vowels. Next, two accounts of the structure of preferred sound inventories, quantal theory and dispersion theory, are described and some of their limitations are noted. Finally, it is suggested that certain aspects of quantal and dispersion theories can be unified in a principled way so as to achieve reasonable predictive accuracy.

  4. Subphonetic Acoustic Modeling for Speaker-Independent Continuous Speech Recognition

    DTIC Science & Technology

    1993-12-17

    …concept of senone sharing across all hidden Markov models, such as triphones, multi-phones, words, or even phrase models… For instance, training the 50 phone HMMs for English usually requires only 1-2 hours of training data, while to sufficiently train syllable models may require 50 hours of speech. Faced with a limited amount of training data, the advantage of the improved structure of the stochastic model may not be…

  5. From prosodic structure to acoustic saliency: A fMRI investigation of speech rate, clarity, and emphasis

    NASA Astrophysics Data System (ADS)

    Golfinopoulos, Elisa

    Acoustic variability in fluent speech can arise at many stages in speech production planning and execution. For example, at the phonological encoding stage, the grouping of phonemes into syllables determines which segments are coarticulated and, by consequence, segment-level acoustic variation. Likewise phonetic encoding, which determines the spatiotemporal extent of articulatory gestures, will affect the acoustic detail of segments. Functional magnetic resonance imaging (fMRI) was used to measure brain activity of fluent adult speakers in four speaking conditions: fast, normal, clear, and emphatic (or stressed) speech. These speech manner changes typically result in acoustic variations that do not change the lexical or semantic identity of productions but do affect the acoustic saliency of phonemes, syllables and/or words. Acoustic responses recorded inside the scanner were assessed quantitatively using eight acoustic measures and sentence duration was used as a covariate of non-interest in the neuroimaging analysis. Compared to normal speech, emphatic speech was characterized acoustically by a greater difference between stressed and unstressed vowels in intensity, duration, and fundamental frequency, and neurally by increased activity in right middle premotor cortex and supplementary motor area, and bilateral primary sensorimotor cortex. These findings are consistent with right-lateralized motor planning of prosodic variation in emphatic speech. Clear speech involved an increase in average vowel and sentence durations and average vowel spacing, along with increased activity in left middle premotor cortex and bilateral primary sensorimotor cortex. These findings are consistent with an increased reliance on feedforward control, resulting in hyper-articulation, under clear as compared to normal speech. Fast speech was characterized acoustically by reduced sentence duration and average vowel spacing, and neurally by increased activity in left anterior frontal

  6. A modified diffusion equation for room-acoustic prediction.

    PubMed

    Jing, Yun; Xiang, Ning

    2007-06-01

    This letter presents a modified diffusion model using an Eyring absorption coefficient to predict the reverberation time and sound pressure distributions in enclosures. While the original diffusion model [Ollendorff, Acustica 21, 236-245 (1969); J. Picaut et al., Acustica 83, 614-621 (1997); Valeau et al., J. Acoust. Soc. Am. 119, 1504-1513 (2006)] usually has good performance for low absorption, the modified diffusion model yields more satisfactory results for both low and high absorption. Comparisons among the modified model, the original model, a geometrical-acoustics model, and several well-established theories in terms of reverberation times and sound pressure level distributions, indicate significantly improved prediction accuracy by the modification.

  7. Speech Recognition and Acoustic Features in Combined Electric and Acoustic Stimulation

    ERIC Educational Resources Information Center

    Yoon, Yang-soo; Li, Yongxin; Fu, Qian-Jie

    2012-01-01

    Purpose: In this study, the authors aimed to identify speech information processed by a hearing aid (HA) that is additive to information processed by a cochlear implant (CI) as a function of signal-to-noise ratio (SNR). Method: Speech recognition was measured with CI alone, HA alone, and CI + HA. Ten participants were separated into 2 groups; good…

  8. Accuracy of perceptual and acoustic methods for the detection of inspiratory loci in spontaneous speech.

    PubMed

    Wang, Yu-Tsai; Nip, Ignatius S B; Green, Jordan R; Kent, Ray D; Kent, Jane Finley; Ullman, Cara

    2012-12-01

    The present study investigates the accuracy of perceptually and acoustically determined inspiratory loci in spontaneous speech for the purpose of identifying breath groups. Sixteen participants were asked to talk about simple topics in daily life at a comfortable speaking rate and loudness while connected to a pneumotach and audio microphone. The locations of inspiratory loci were determined on the basis of the aerodynamic signal, which served as a reference for loci identified perceptually and acoustically. Signal detection theory was used to evaluate the accuracy of the methods. The results showed that the greatest accuracy in pause detection was achieved (1) perceptually, on the basis of agreement between at least two of three judges, and (2) acoustically, using a pause duration threshold of 300 ms. In general, the perceptually based method was more accurate than was the acoustically based method. Inconsistencies among perceptually determined, acoustically determined, and aerodynamically determined inspiratory loci for spontaneous speech should be weighed in selecting a method of breath group determination.

  9. Speech after Radial Forearm Free Flap Reconstruction of the Tongue: A Longitudinal Acoustic Study of Vowel and Diphthong Sounds

    ERIC Educational Resources Information Center

    Laaksonen, Juha-Pertti; Rieger, Jana; Happonen, Risto-Pekka; Harris, Jeffrey; Seikaly, Hadi

    2010-01-01

    The purpose of this study was to use acoustic analyses to describe speech outcomes over the course of 1 year after radial forearm free flap (RFFF) reconstruction of the tongue. Eighteen Canadian English-speaking females and males with reconstruction for oral cancer had speech samples recorded (pre-operative, and 1 month, 6 months, and 1 year…

  10. Acoustic-Phonetic Differences between Infant- and Adult-Directed Speech: The Role of Stress and Utterance Position

    ERIC Educational Resources Information Center

    Wang, Yuanyuan; Seidl, Amanda; Cristia, Alejandrina

    2015-01-01

    Previous studies have shown that infant-directed speech (IDS) differs from adult-directed speech (ADS) on a variety of dimensions. The aim of the current study was to investigate whether acoustic differences between IDS and ADS in English are modulated by prosodic structure. We compared vowels across the two registers (IDS, ADS) in both stressed…

  11. [The acoustic aspect of the speech development in children during the third year of life].

    PubMed

    Liakso, E E; Gromova, A D; Frolova, O V; Romanova, O D

    2004-01-01

    This part of a longitudinal study of Russian language acquisition, based on auditory, phonetic, and instrumental analysis, is devoted to the third year of a child's life. We examined the development of the supplementary acoustic and phonetic features of children's speech that make it possible for that speech to be recognized. Instrumental analysis and statistical processing of vowel formant dynamics, as well as of stress, palatalization, and VOT development, were performed for the first time in Russian children. We showed that the high probability of children's words being recognized by listeners was due to the establishment of a system of acoustically stable features which, in combination with each other, provide the informative sufficiency of a message.

  12. An eighth-scale speech source for subjective assessments in acoustic models

    NASA Astrophysics Data System (ADS)

    Orlowski, R. J.

    1981-08-01

    The design of a source suitable for making speech recordings in eighth-scale acoustic models of auditoria is described. An attempt was made to match the directionality of the source with that of the human voice using data reported in the literature. The narrow aperture required by the design was provided by mounting an inverted conical horn over the diaphragm of a high-frequency loudspeaker. Resonance problems were encountered with the use of a horn, and the electronic techniques adopted to minimize the effect of these resonances are described. Subjective and objective assessments of the completed speech source have proved satisfactory. It has been used in a modelling exercise concerned with the acoustic design of a theatre with a thrust-type stage.

  13. The Acoustic-Modeling Problem in Automatic Speech Recognition.

    DTIC Science & Technology

    1987-12-01

    …systems that use an artificial grammar do so in order to set this uncertainty by fiat, thereby ensuring that their task will not be too difficult… With an artificial grammar, the Pr(W = w) values are known and Hm(W) can, in fact, achieve its lower bound if the system simply uses these probabilities… In a … finite-state grammar represented by that chain. As Jim Baker points out, the modeling of speech by a hidden Markov model should not be regarded as a…

  14. Are you a good mimic? Neuro-acoustic signatures for speech imitation ability.

    PubMed

    Reiterer, Susanne M; Hu, Xiaochen; Sumathi, T A; Singh, Nandini C

    2013-01-01

    We investigated individual differences in speech imitation ability in late bilinguals using a neuro-acoustic approach. One hundred and thirty-eight German-English bilinguals matched on various behavioral measures were tested for "speech imitation ability" in a foreign language, Hindi, and categorized into "high" and "low ability" groups. Brain activations and speech recordings were obtained from 26 participants from the two extreme groups as they performed a functional neuroimaging experiment which required them to "imitate" sentences in three conditions: (A) German, (B) English, and (C) German with a fake English accent. We used a recently developed acoustic analysis, the "articulation space," as a metric to compare speech imitation abilities of the two groups. Across all three conditions, direct comparisons between the two groups revealed brain activations (FWE corrected, p < 0.05) that were more widespread with significantly higher peak activity in the left supramarginal gyrus and postcentral areas for the low ability group. The high ability group, on the other hand, showed significantly larger articulation space in all three conditions. In addition, articulation space also correlated positively with imitation ability (Pearson's r = 0.7, p < 0.01). Our results suggest that an expanded articulation space for high ability individuals allows access to a larger repertoire of sounds, thereby providing skilled imitators greater flexibility in pronunciation and language learning.

  15. Are you a good mimic? Neuro-acoustic signatures for speech imitation ability

    PubMed Central

    Reiterer, Susanne M.; Hu, Xiaochen; Sumathi, T. A.; Singh, Nandini C.

    2013-01-01

    We investigated individual differences in speech imitation ability in late bilinguals using a neuro-acoustic approach. One hundred and thirty-eight German-English bilinguals matched on various behavioral measures were tested for “speech imitation ability” in a foreign language, Hindi, and categorized into “high” and “low ability” groups. Brain activations and speech recordings were obtained from 26 participants from the two extreme groups as they performed a functional neuroimaging experiment which required them to “imitate” sentences in three conditions: (A) German, (B) English, and (C) German with a fake English accent. We used a recently developed acoustic analysis, the “articulation space,” as a metric to compare speech imitation abilities of the two groups. Across all three conditions, direct comparisons between the two groups revealed brain activations (FWE corrected, p < 0.05) that were more widespread with significantly higher peak activity in the left supramarginal gyrus and postcentral areas for the low ability group. The high ability group, on the other hand, showed significantly larger articulation space in all three conditions. In addition, articulation space also correlated positively with imitation ability (Pearson's r = 0.7, p < 0.01). Our results suggest that an expanded articulation space for high ability individuals allows access to a larger repertoire of sounds, thereby providing skilled imitators greater flexibility in pronunciation and language learning. PMID:24155739

  16. Quantifying the effect of compression hearing aid release time on speech acoustics and intelligibility.

    PubMed

    Jenstad, Lorienne M; Souza, Pamela E

    2005-06-01

    Compression hearing aids have the inherent, and often adjustable, feature of release time from compression. Research to date does not provide a consensus on how to choose or set release time. The current study had 2 purposes: (a) a comprehensive evaluation of the acoustic effects of release time for a single-channel compression system in quiet and (b) an evaluation of the relation between the acoustic changes and speech recognition. The release times under study were 12, 100, and 800 ms. All of the stimuli were VC syllables from the Nonsense Syllable Task spoken by a female talker. The stimuli were processed through a hearing aid simulator at 3 input levels. Two acoustic measures were made on individual syllables: the envelope-difference index and CV ratio. These measurements allowed for quantification of the short-term amplitude characteristics of the speech signal and the changes to these amplitude characteristics caused by compression. The acoustic analyses revealed statistically significant effects among the 3 release times. The size of the effect was dependent on characteristics of the phoneme. Twelve listeners with moderate sensorineural hearing loss were tested for their speech recognition for the same stimuli. Although release time for this single-channel, 3:1 compression ratio system did not directly predict overall intelligibility for these nonsense syllables in quiet, the acoustic measurements reflecting the changes due to release time were significant predictors of phoneme recognition. Increased temporal-envelope distortion was predictive of reduced recognition for some individual phonemes, which is consistent with previous research on the importance of relative amplitude as a cue to syllable recognition for some phonemes.
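
    The two syllable-level measures named above can be sketched in a few lines of code. The envelope extraction (Hilbert magnitude plus low-pass filtering) and the normalization details below are assumptions and may differ from the study's exact procedure; the stimuli are synthetic placeholders.

```python
# Sketch of an envelope-difference index (EDI) between an unprocessed and a
# compressed syllable, and a consonant-vowel (CV) amplitude ratio in dB.
# Envelope extraction and normalization details are illustrative assumptions.
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

def envelope(x, sr, cutoff_hz=50.0):
    """Amplitude envelope: magnitude of the analytic signal, low-pass filtered."""
    sos = butter(4, cutoff_hz / (sr / 2), btype="low", output="sos")
    return sosfiltfilt(sos, np.abs(hilbert(x)))

def envelope_difference_index(x_ref, x_proc, sr):
    """Mean absolute difference of mean-normalized envelopes (0 = identical)."""
    e1, e2 = envelope(x_ref, sr), envelope(x_proc, sr)
    e1, e2 = e1 / e1.mean(), e2 / e2.mean()
    return np.sum(np.abs(e1 - e2)) / (2 * len(e1))

def cv_ratio_db(consonant_seg, vowel_seg):
    """Consonant-to-vowel amplitude ratio in dB, from segment RMS levels."""
    rms = lambda seg: np.sqrt(np.mean(seg ** 2))
    return 20 * np.log10(rms(consonant_seg) / rms(vowel_seg))

# Hypothetical VC-like syllable and a crudely compressed copy
sr = 16000
t = np.arange(0, 0.4, 1 / sr)
amp = np.concatenate([np.linspace(1.0, 0.8, len(t) // 2),             # vowel portion
                      np.linspace(0.4, 0.1, len(t) - len(t) // 2)])   # consonant portion
syllable = amp * np.sin(2 * np.pi * 150 * t)
compressed = np.sign(syllable) * np.abs(syllable) ** 0.5              # instantaneous compression
print("EDI:", envelope_difference_index(syllable, compressed, sr))
print("CV ratio (dB):", cv_ratio_db(syllable[len(t) // 2:], syllable[:len(t) // 2]))
```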

  17. Suppressed Alpha Oscillations Predict Intelligibility of Speech and its Acoustic Details

    PubMed Central

    Weisz, Nathan

    2012-01-01

    Modulations of human alpha oscillations (8–13 Hz) accompany many cognitive processes, but their functional role in auditory perception has proven elusive: Do oscillatory dynamics of alpha reflect acoustic details of the speech signal and are they indicative of comprehension success? Acoustically presented words were degraded in acoustic envelope and spectrum in an orthogonal design, and electroencephalogram responses in the frequency domain were analyzed in 24 participants, who rated word comprehensibility after each trial. First, the alpha power suppression during and after a degraded word depended monotonically on spectral and, to a lesser extent, envelope detail. The magnitude of this alpha suppression exhibited an additional and independent influence on later comprehension ratings. Second, source localization of alpha suppression yielded superior parietal, prefrontal, as well as anterior temporal brain areas. Third, multivariate classification of the time–frequency pattern across participants showed that patterns of late posterior alpha power allowed best for above-chance classification of word intelligibility. Results suggest that both magnitude and topography of late alpha suppression in response to single words can indicate a listener's sensitivity to acoustic features and the ability to comprehend speech under adverse listening conditions. PMID:22100354

  18. Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech

    PubMed Central

    Leong, Victoria; Goswami, Usha

    2015-01-01

    When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72–82% (freely-read CDS) and 90–98% (rhythmically-regular CDS) stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across
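
    The timescale decomposition at the heart of this idea can be illustrated simply. The sketch below band-pass filters a broadband amplitude envelope around roughly 2, 5, and 20 Hz; it is a simplification of the S-AMPH approach, which derives its modulation bands from PCA over a spectral filterbank, and the band edges and test signal are illustrative assumptions.

```python
# Simplified sketch of isolating stress- (~2 Hz), syllable- (~5 Hz), and
# onset-rime/phoneme-rate (~20 Hz) amplitude modulations from a speech envelope.
# Band edges and filters are illustrative assumptions, not the S-AMPH model.
import numpy as np
from scipy.signal import butter, hilbert, resample_poly, sosfiltfilt

def am_hierarchy(speech, sr, env_sr=100):
    """Return band-limited AM components of the broadband amplitude envelope."""
    env = np.abs(hilbert(speech))
    env = resample_poly(env, up=1, down=sr // env_sr)      # envelope at ~100 Hz
    bands = {"stress": (0.9, 2.5), "syllable": (2.5, 12.0), "phoneme": (12.0, 40.0)}
    nyq = env_sr / 2
    return {name: sosfiltfilt(butter(2, [lo / nyq, hi / nyq], btype="band",
                                     output="sos"), env)
            for name, (lo, hi) in bands.items()}

# Hypothetical test signal: a 300 Hz carrier with a 5 Hz "syllabic" modulation
sr = 16000
t = np.arange(0, 2.0, 1 / sr)
signal = (1 + 0.8 * np.sin(2 * np.pi * 5 * t)) * np.sin(2 * np.pi * 300 * t)
for name, am in am_hierarchy(signal, sr).items():
    print(name, "AM RMS:", float(np.sqrt(np.mean(am ** 2))))
```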

  19. Combining acoustic and electric stimulation in the service of speech recognition

    PubMed Central

    Dorman, Michael F.; Gifford, Rene H.

    2010-01-01

    The majority of recently implanted, cochlear implant patients can potentially benefit from a hearing aid in the ear contralateral to the implant. When patients combine electric and acoustic stimulation, word recognition in quiet and sentence recognition in noise increase significantly. Several studies suggest that the acoustic information that leads to the increased level of performance resides mostly in the frequency region of the voice fundamental, e.g. 125 Hz for a male voice. Recent studies suggest that this information aids speech recognition in noise by improving the recognition of lexical boundaries or word onsets. In some noise environments, patients with bilateral implants can achieve similar levels of performance as patients who combine electric and acoustic stimulation. Patients who have undergone hearing preservation surgery, and who have electric stimulation from a cochlear implant and who have low-frequency hearing in both the implanted and not-implanted ears, achieve the best performance in a high noise environment. PMID:20874053

  20. Speech Recognition and Acoustic Features in Combined Electric and Acoustic Stimulation

    PubMed Central

    Yoon, Yang-soo; Li, Yongxin; Fu, Qian-Jie

    2011-01-01

    Purpose This study aimed to identify speech information processed by a hearing aid (HA) that is additive to information processed by a cochlear implant (CI) as a function of signal-to-noise ratio (SNR). Method Speech recognition was measured with CI alone, HA alone, and CI+HA. Ten participants were separated into two groups: good (aided pure-tone average (PTA) < 55 dB) and poor (aided PTA ≥ 55 dB) at audiometric frequencies ≤ 1 kHz in HA. Results Results showed that the good aided PTA group derived a clear bimodal benefit (performance difference between CI+HA and CI alone) for vowel and sentence recognition in noise, while the poor aided PTA group received little benefit across speech tests and SNRs. Results also showed that a better aided PTA helped in processing cues embedded in both low and high frequencies; none of these cues were significantly perceived by the poor aided PTA group. Conclusions The aided PTA is an important indicator for bimodal advantage in speech perception. The lack of bimodal benefits in the poor group may be attributed to the non-optimal HA fitting. Bimodal listening provides a synergistic effect for cues in both low and high frequency components in speech. PMID:22199183

  1. On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common

    PubMed Central

    Weninger, Felix; Eyben, Florian; Schuller, Björn W.; Mortillaro, Marcello; Scherer, Klaus R.

    2013-01-01

    Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning either of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow’s pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of “the sound that something makes,” in order to evaluate the system’s auditory environment and its own audio output. This article aims at a first step toward a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selection of appropriate descriptors, cross-domain arousal, and valence regression is feasible achieving significant correlations with the observer annotations of up to 0.78 for arousal (training on sound and testing on enacted speech) and 0.60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects. PMID:23750144

  2. Segment-based acoustic models for continuous speech recognition

    NASA Astrophysics Data System (ADS)

    Ostendorf, Mari; Rohlicek, J. R.

    1993-07-01

    This research aims to develop new and more accurate stochastic models for speaker-independent continuous speech recognition, by extending previous work in segment-based modeling and by introducing a new hierarchical approach to representing intra-utterance statistical dependencies. These techniques, which are more costly than traditional approaches because of the large search space associated with higher order models, are made feasible through rescoring a set of HMM-generated N-best sentence hypotheses. We expect these different modeling techniques to result in improved recognition performance over that achieved by current systems, which handle only frame-based observations and assume that these observations are independent given an underlying state sequence. In the fourth quarter of the project, we have completed the following: (1) ported our recognition system to the Wall Street Journal task, a standard task in the ARPA community; (2) developed an initial dependency-tree model of intra-utterance observation correlation; and (3) implemented baseline language model estimation software. Our initial results on the Wall Street Journal task are quite good and represent significantly improved performance over most HMM systems reporting on the Nov. 1992 5k vocabulary test set.

  3. Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing.

    PubMed

    Doelling, Keith B; Arnal, Luc H; Ghitza, Oded; Poeppel, David

    2014-01-15

    A growing body of research suggests that intrinsic neuronal slow (<10 Hz) oscillations in auditory cortex appear to track incoming speech and other spectro-temporally complex auditory signals. Within this framework, several recent studies have identified critical-band temporal envelopes as the specific acoustic feature being reflected by the phase of these oscillations. However, how this alignment between speech acoustics and neural oscillations might underpin intelligibility is unclear. Here we test the hypothesis that the 'sharpness' of temporal fluctuations in the critical band envelope acts as a temporal cue to speech syllabic rate, driving delta-theta rhythms to track the stimulus and facilitate intelligibility. We interpret our findings as evidence that sharp events in the stimulus cause cortical rhythms to re-align and parse the stimulus into syllable-sized chunks for further decoding. Using magnetoencephalographic recordings, we show that by removing temporal fluctuations that occur at the syllabic rate, envelope-tracking activity is reduced. By artificially reinstating these temporal fluctuations, envelope-tracking activity is regained. These changes in tracking correlate with intelligibility of the stimulus. Together, the results suggest that the sharpness of fluctuations in the stimulus, as reflected in the cochlear output, drive oscillatory activity to track and entrain to the stimulus, at its syllabic rate. This process likely facilitates parsing of the stimulus into meaningful chunks appropriate for subsequent decoding, enhancing perception and intelligibility.
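
    A rough sense of the "sharp events" referred to here can be obtained by looking for steep rises in a smoothed amplitude envelope. The sketch below is a minimal illustration under stated assumptions (smoothing cutoff, peak-picking threshold, and test signal are all hypothetical), not the study's stimulus manipulation.

```python
# Sketch of locating sharp rises ("acoustic edges") in the speech amplitude
# envelope. Smoothing cutoff and peak-picking threshold are assumptions.
import numpy as np
from scipy.signal import butter, find_peaks, hilbert, sosfiltfilt

def envelope_edges(x, sr, lp_hz=10.0, min_gap_s=0.1):
    """Sample indices of steep positive slopes in the low-passed envelope."""
    sos = butter(2, lp_hz / (sr / 2), btype="low", output="sos")
    env = sosfiltfilt(sos, np.abs(hilbert(x)))
    slope = np.clip(np.gradient(env) * sr, 0.0, None)      # keep rising edges only
    peaks, _ = find_peaks(slope, height=0.3 * slope.max(),
                          distance=int(min_gap_s * sr))
    return peaks

# Hypothetical signal with four syllable-like bursts per second
sr = 16000
t = np.arange(0, 1.0, 1 / sr)
burst_env = (np.sin(2 * np.pi * 4 * t - np.pi / 2) + 1) / 2
signal = burst_env * np.sin(2 * np.pi * 300 * t)
print("edge times (s):", envelope_edges(signal, sr) / sr)
```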

  4. Control of Spoken Vowel Acoustics and the Influence of Phonetic Context in Human Speech Sensorimotor Cortex

    PubMed Central

    Bouchard, Kristofer E.

    2014-01-01

    Speech production requires the precise control of vocal tract movements to generate individual speech sounds (phonemes) which, in turn, are rapidly organized into complex sequences. Multiple productions of the same phoneme can exhibit substantial variability, some of which is inherent to control of the vocal tract and its biomechanics, and some of which reflects the contextual effects of surrounding phonemes (“coarticulation”). The role of the CNS in these aspects of speech motor control is not well understood. To address these issues, we recorded multielectrode cortical activity directly from human ventral sensory-motor cortex (vSMC) during the production of consonant-vowel syllables. We analyzed the relationship between the acoustic parameters of vowels (pitch and formants) and cortical activity on a single-trial level. We found that vSMC activity robustly predicted acoustic parameters across vowel categories (up to 80% of variance), as well as different renditions of the same vowel (up to 25% of variance). Furthermore, we observed significant contextual effects on vSMC representations of produced phonemes that suggest active control of coarticulation: vSMC representations for vowels were biased toward the representations of the preceding consonant, and conversely, representations for consonants were biased toward upcoming vowels. These results reveal that vSMC activity for phonemes are not invariant and provide insight into the cortical mechanisms of coarticulation. PMID:25232105

  5. Transient noise reduction in speech signal with a modified long-term predictor

    NASA Astrophysics Data System (ADS)

    Choi, Min-Seok; Kang, Hong-Goo

    2011-12-01

    This article proposes an efficient median-filter-based algorithm to remove transient noise from a speech signal. The proposed algorithm adopts a modified long-term predictor (LTP) as the pre-processor of the noise reduction process to reduce speech distortion caused by the nonlinear nature of the median filter. The article shows that LTP analysis does not alter the characteristics of transient noise during the speech modeling process. In contrast, if a short-term linear prediction (STP) filter is employed as a pre-processor, the enhanced output includes residual noise because the STP analysis and synthesis process preserves and restores transient noise components. To minimize residual noise and speech distortion after transient noise reduction, a modified LTP method is proposed that estimates the characteristics of speech more accurately. By ignoring regions where transient noise is present during the pitch lag detection step, the modified LTP avoids being affected by transient noise. A backward pitch prediction algorithm is also adopted to reduce speech distortion in onset regions. Experimental results verify that the proposed system efficiently eliminates transient noise while preserving the desired speech signal.
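
    The basic idea of median-filter-based transient suppression can be sketched as follows. This omits the paper's modified LTP pre-processing entirely; the detection threshold, window sizes, and test signal are illustrative assumptions.

```python
# Sketch of transient (click) noise suppression: detect short high-energy
# bursts and replace them with median-filtered samples. Thresholds and window
# sizes are illustrative assumptions; the paper's LTP pre-processing is omitted.
import numpy as np
from scipy.signal import medfilt

def suppress_transients(x, sr, win_ms=5.0, thresh=6.0):
    """Replace samples in detected transient regions with a median-filtered copy."""
    win = int(sr * win_ms / 1000) | 1                      # odd-length median window
    smoothed = medfilt(x, kernel_size=win)
    frame = int(0.005 * sr)
    energy = np.array([np.sum(x[i:i + frame] ** 2)
                       for i in range(0, len(x) - frame, frame)])
    med_energy = np.median(energy) + 1e-12
    out = x.copy()
    for i, e in enumerate(energy):
        if e > thresh * med_energy:                        # short energy burst => transient
            start = i * frame
            out[start:start + frame] = smoothed[start:start + frame]
    return out

# Hypothetical test: tone "speech" with an added click
sr = 16000
t = np.arange(0, 0.5, 1 / sr)
clean = 0.3 * np.sin(2 * np.pi * 200 * t)
noisy = clean.copy()
noisy[4000:4040] += 1.0                                    # ~2.5 ms click
cleaned = suppress_transients(noisy, sr)
print("residual click energy:", np.sum((cleaned[4000:4040] - clean[4000:4040]) ** 2))
```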

  6. Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech

    NASA Astrophysics Data System (ADS)

    Ge, Fengpei; Liu, Changliang; Shao, Jian; Pan, Fuping; Dong, Bin; Yan, Yonghong

    In this paper we present our investigation into improving the performance of our computer-assisted language learning (CALL) system through exploiting the acoustic model and features within the speech recognition framework. First, to alleviate channel distortion, speaker-dependent cepstrum mean normalization (CMN) is adopted and the average correlation coefficient (average CC) between machine and expert scores is improved from 78.00% to 84.14%. Second, heteroscedastic linear discriminant analysis (HLDA) is adopted to enhance the discriminability of the acoustic model, which successfully increases the average CC from 84.14% to 84.62%. Additionally, HLDA causes the scoring accuracy to be more stable at various pronunciation proficiency levels, and thus leads to an increase in the speaker correct-rank rate from 85.59% to 90.99%. Finally, we use maximum a posteriori (MAP) estimation to tune the acoustic model to fit strongly accented test speech. As a result, the average CC is improved from 84.62% to 86.57%. These three novel techniques improve the accuracy of evaluating pronunciation quality.
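
    Of the three techniques, speaker-dependent CMN is the simplest to illustrate: subtract each speaker's mean cepstral vector from all of that speaker's frames to reduce stationary channel differences. The sketch below shows only this step, with hypothetical feature shapes and speaker labels.

```python
# Sketch of speaker-dependent cepstral mean normalization (CMN). Feature
# shapes and speaker IDs are illustrative assumptions.
import numpy as np

def speaker_cmn(features, speaker_ids):
    """features: (n_frames, n_ceps); speaker_ids: per-frame speaker labels."""
    normalized = np.empty_like(features)
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        normalized[mask] = features[mask] - features[mask].mean(axis=0)
    return normalized

# Hypothetical MFCC-like frames from two speakers with different channel offsets
rng = np.random.default_rng(3)
frames = np.vstack([rng.standard_normal((100, 13)) + 2.0,    # speaker A, offset +2
                    rng.standard_normal((100, 13)) - 1.0])   # speaker B, offset -1
speakers = np.array(["A"] * 100 + ["B"] * 100)
cmn_frames = speaker_cmn(frames, speakers)
print(np.abs(cmn_frames[speakers == "A"].mean(axis=0)).max())  # ~0 after CMN
```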

  7. Logopenic and Nonfluent Variants of Primary Progressive Aphasia Are Differentiated by Acoustic Measures of Speech Production

    PubMed Central

    Ballard, Kirrie J.; Savage, Sharon; Leyton, Cristian E.; Vogel, Adam P.; Hornberger, Michael; Hodges, John R.

    2014-01-01

    Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging since it hinges on expert-based evaluation of speech and language production. In this study, acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold-standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r2 = 0.47) with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, only affected in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy. Findings suggest that a measure of relative vowel duration from a polysyllable word repetition task

  8. Logopenic and nonfluent variants of primary progressive aphasia are differentiated by acoustic measures of speech production.

    PubMed

    Ballard, Kirrie J; Savage, Sharon; Leyton, Cristian E; Vogel, Adam P; Hornberger, Michael; Hodges, John R

    2014-01-01

    Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging since it hinges on expert-based evaluation of speech and language production. In this study, acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold-standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r2 = 0.47) with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, only affected in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy. Findings suggest that a measure of relative vowel duration from a polysyllable word repetition task

  9. Acoustic temporal modulation detection and speech perception in cochlear implant listeners.

    PubMed

    Won, Jong Ho; Drennan, Ward R; Nie, Kaibao; Jameyson, Elyse M; Rubinstein, Jay T

    2011-07-01

    The goals of the present study were to measure acoustic temporal modulation transfer functions (TMTFs) in cochlear implant listeners and examine the relationship between modulation detection and speech recognition abilities. The effects of automatic gain control, presentation level and number of channels on modulation detection thresholds (MDTs) were examined using the listeners' clinical sound processor. The general form of the TMTF was low-pass, consistent with previous studies. The operation of automatic gain control had no effect on MDTs when the stimuli were presented at 65 dBA. MDTs were not dependent on the presentation levels (ranging from 50 to 75 dBA) nor on the number of channels. Significant correlations were found between MDTs and speech recognition scores. The rates of decay of the TMTFs were predictive of speech recognition abilities. Spectral-ripple discrimination was evaluated to examine the relationship between temporal and spectral envelope sensitivities. No correlations were found between the two measures, and 56% of the variance in speech recognition was predicted jointly by the two tasks. The present study suggests that temporal modulation detection measured with the sound processor can serve as a useful measure of the ability of clinical sound processing strategies to deliver clinically pertinent temporal information.
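
    Modulation detection thresholds of this kind are typically measured with sinusoidally amplitude-modulated (SAM) noise. The following sketch only generates such a stimulus; the carrier, duration, level, and modulation depths are illustrative assumptions, not the study's clinical stimuli.

```python
# Sketch of generating a sinusoidally amplitude-modulated (SAM) noise stimulus
# of the kind used to measure modulation detection thresholds (MDTs).
# Carrier, duration, and depths are illustrative assumptions.
import numpy as np

def sam_noise(duration_s, sr, mod_rate_hz, mod_depth, seed=0):
    """Wideband noise carrier with sinusoidal amplitude modulation of depth m."""
    rng = np.random.default_rng(seed)
    n = int(duration_s * sr)
    t = np.arange(n) / sr
    stim = rng.standard_normal(n) * (1.0 + mod_depth * np.sin(2 * np.pi * mod_rate_hz * t))
    return stim / np.max(np.abs(stim))          # normalize to avoid clipping

# Depths are often expressed in dB relative to 100% modulation: 20*log10(m)
for rate_hz in (4, 8, 16, 32, 64):
    stim = sam_noise(1.0, 22050, rate_hz, mod_depth=0.25)
    print(f"{rate_hz:2d} Hz SAM, depth {20 * np.log10(0.25):.1f} dB re 100%, rms {stim.std():.3f}")
```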

  10. The role of metrical information in apraxia of speech. Perceptual and acoustic analyses of word stress.

    PubMed

    Aichert, Ingrid; Späth, Mona; Ziegler, Wolfram

    2016-02-01

    Several factors are known to influence speech accuracy in patients with apraxia of speech (AOS), e.g., syllable structure or word length. However, the impact of word stress has largely been neglected so far. More generally, the role of prosodic information at the phonetic encoding stage of speech production often remains unconsidered in models of speech production. This study aimed to investigate the influence of word stress on error production in AOS. Disyllabic words with stress on the first (trochees) vs. the second syllable (iambs) were compared in 14 patients with AOS, three of them exhibiting pure AOS, and in a control group of six normal speakers. The patients produced significantly more errors on iambic than on trochaic words. The most prominent metrical effect was obtained for segmental errors. Acoustic analyses of word durations revealed a disproportionate advantage of the trochaic meter in the patients relative to the healthy controls. The results indicate that German apraxic speakers are sensitive to metrical information. It is assumed that metrical patterns function as prosodic frames for articulation planning, and that the regular metrical pattern in German, the trochaic form, has a facilitating effect on word production in patients with AOS.

  11. A Bayesian view on acoustic model-based techniques for robust speech recognition

    NASA Astrophysics Data System (ADS)

    Maas, Roland; Huemmer, Christian; Sehr, Armin; Kellermann, Walter

    2015-12-01

    This article provides a unifying Bayesian view on various approaches for acoustic model adaptation, missing feature, and uncertainty decoding that are well-known in the literature of robust automatic speech recognition. The representatives of these classes can often be deduced from a Bayesian network that extends the conventional hidden Markov models used in speech recognition. These extensions, in turn, can in many cases be motivated from an underlying observation model that relates clean and distorted feature vectors. By identifying and converting the observation models into a Bayesian network representation, we formulate the corresponding compensation rules. We thus summarize the various approaches as approximations or modifications of the same Bayesian decoding rule leading to a unified view on known derivations as well as to new formulations for certain approaches.
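
    A generic way to write the observation-model integration that this class of techniques shares (a standard uncertainty-decoding form, sketched here for context rather than quoted from the article) is:

```latex
% Clean features y_t are marginalized out given the observed (distorted)
% features x_t and the HMM state q_t; p(x_t | y_t) is the observation model
% relating clean and distorted feature vectors.
p(x_t \mid q_t) = \int p(x_t \mid y_t)\, p(y_t \mid q_t)\, \mathrm{d}y_t
```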

  12. The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene.

    PubMed

    Rimmele, Johanna M; Zion Golumbic, Elana; Schröger, Erich; Poeppel, David

    2015-07-01

    Attending to one speaker in multi-speaker situations is challenging. One neural mechanism proposed to underlie the ability to attend to a particular speaker is phase-locking of low-frequency activity in auditory cortex to speech's temporal envelope ("speech-tracking"), which is more precise for attended speech. However, it is not known what brings about this attentional effect, and specifically if it reflects enhanced processing of the fine structure of attended speech. To investigate this question we compared attentional effects on speech-tracking of natural versus vocoded speech which preserves the temporal envelope but removes the fine structure of speech. Pairs of natural and vocoded speech stimuli were presented concurrently and participants attended to one stimulus and performed a detection task while ignoring the other stimulus. We recorded magnetoencephalography (MEG) and compared attentional effects on the speech-tracking response in auditory cortex. Speech-tracking of natural, but not vocoded, speech was enhanced by attention, whereas neural tracking of ignored speech was similar for natural and vocoded speech. These findings suggest that the more precise speech-tracking of attended natural speech is related to processing its fine structure, possibly reflecting the application of higher-order linguistic processes. In contrast, when speech is unattended its fine structure is not processed to the same degree and thus elicits less precise speech-tracking more similar to vocoded speech.

  13. Acoustic measurements through analysis of binaural recordings of speech and music

    NASA Astrophysics Data System (ADS)

    Griesinger, David

    2004-10-01

    This paper will present and demonstrate some recent work on the measurement of acoustic properties from binaural recordings of live performances. It is found that models of the process of stream formation can be used to measure intelligibility, and, when combined with band-limited running cross-correlation, can be used to measure spaciousness and envelopment. Analysis of the running cross correlation during sound onsets can be used to measure the accuracy of azimuth perception. It is additionally found that the ease of detecting fundamental pitch from the upper partials of speech and music can be used as a measure of sound quality, particularly for solo instruments and singers.
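
    A minimal sketch of one ingredient described above, a band-limited running interaural cross-correlation computed frame by frame from the two channels of a binaural recording; the band edges, frame length, and lag range are assumptions.

```python
# Hypothetical sketch: frame-wise, band-limited interaural cross-correlation
# of a binaural recording. Band edges, frame length, and lag range are
# illustrative choices, not the author's parameters.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def running_iacc(left, right, fs, band=(300.0, 2000.0), frame_ms=50, max_lag_ms=1.0):
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    l, r = sosfiltfilt(sos, left), sosfiltfilt(sos, right)
    frame = int(fs * frame_ms / 1000)
    max_lag = int(fs * max_lag_ms / 1000)
    values = []
    for start in range(0, len(l) - frame, frame):
        lw, rw = l[start:start + frame], r[start:start + frame]
        best = 0.0
        for lag in range(-max_lag, max_lag + 1):
            rs = np.roll(rw, lag)  # circular shift, acceptable for a short sketch
            denom = np.sqrt(np.sum(lw ** 2) * np.sum(rs ** 2)) + 1e-12
            best = max(best, abs(np.sum(lw * rs)) / denom)
        values.append(best)
    return np.array(values)  # one normalized IACC value per frame

fs = 16000
noise = np.random.randn(2, fs)  # stand-in for a one-second binaural recording
print(running_iacc(noise[0], noise[1], fs)[:5])
```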

  14. Temporal acoustic measures distinguish primary progressive apraxia of speech from primary progressive aphasia.

    PubMed

    Duffy, Joseph R; Hanley, Holly; Utianski, Rene; Clark, Heather; Strand, Edythe; Josephs, Keith A; Whitwell, Jennifer L

    2017-02-07

    The purpose of this study was to determine if acoustic measures of duration and syllable rate during word and sentence repetition, and a measure of within-word lexical stress, distinguish speakers with primary progressive apraxia of speech (PPAOS) from nonapraxic speakers with the agrammatic or logopenic variants of primary progressive aphasia (PPA), and control speakers. Results revealed that the PPAOS group had longer durations and reduced rate of syllable production for most words and sentences, and the measure of lexical stress. Sensitivity and specificity indices for the PPAOS versus the other groups were highest for longer multisyllabic words and sentences. For the PPAOS group, correlations between acoustic measures and perceptual ratings of AOS were moderately high to high. Several temporal measures used in this study may aid differential diagnosis and help quantify features of PPAOS that are distinct from those associated with PPA in which AOS is not present.

  15. The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene

    PubMed Central

    Rimmele, Johanna M.; Golumbic, Elana Zion; Schröger, Erich; Poeppel, David

    2015-01-01

    Attending to one speaker in multi-speaker situations is challenging. One neural mechanism proposed to underlie the ability to attend to a particular speaker is phase-locking of low-frequency activity in auditory cortex to speech’s temporal envelope (“speech-tracking”), which is more precise for attended speech. However, it is not known what brings about this attentional effect, and specifically if it reflects enhanced processing of the fine structure of attended speech. To investigate this question we compared attentional effects on speech-tracking of natural vs. vocoded speech which preserves the temporal envelope but removes the fine-structure of speech. Pairs of natural and vocoded speech stimuli were presented concurrently and participants attended to one stimulus and performed a detection task while ignoring the other stimulus. We recorded magnetoencephalography (MEG) and compared attentional effects on the speech-tracking response in auditory cortex. Speech-tracking of natural, but not vocoded, speech was enhanced by attention, whereas neural tracking of ignored speech was similar for natural and vocoded speech. These findings suggest that the more precise speech tracking of attended natural speech is related to processing its fine structure, possibly reflecting the application of higher-order linguistic processes. In contrast, when speech is unattended its fine structure is not processed to the same degree and thus elicits less precise speech tracking more similar to vocoded speech. PMID:25650107

  16. A physiologically-inspired model reproducing the speech intelligibility benefit in cochlear implant listeners with residual acoustic hearing.

    PubMed

    Zamaninezhad, Ladan; Hohmann, Volker; Büchner, Andreas; Schädler, Marc René; Jürgens, Tim

    2017-02-01

    This study introduces a speech intelligibility model for cochlear implant users with ipsilateral preserved acoustic hearing that aims at simulating the observed speech-in-noise intelligibility benefit when receiving simultaneous electric and acoustic stimulation (EA-benefit). The model simulates the auditory nerve spiking in response to electric and/or acoustic stimulation. The temporally and spatially integrated spiking patterns were used as the final internal representation of noisy speech. Speech reception thresholds (SRTs) in stationary noise were predicted for a sentence test using an automatic speech recognition framework. The model was employed to systematically investigate the effect of three physiologically relevant model factors on simulated SRTs: (1) the spatial spread of the electric field, which co-varies with the number of electrically stimulated auditory nerves, (2) the "internal" noise simulating deprivation of the auditory system, and (3) the upper-bound frequency limit of acoustic hearing. The model results show that the simulated SRTs increase monotonically with increasing spatial spread for fixed internal noise, and also increase with increasing internal noise strength for a fixed spatial spread. The predicted EA-benefit does not follow such a systematic trend and depends on the specific combination of the model parameters. Beyond 300 Hz, the upper-bound limit for preserved acoustic hearing is less influential on the speech intelligibility of EA-listeners in stationary noise. The proposed model's predicted EA-benefits are within the range of EA-benefits shown by 18 out of 21 actual cochlear implant listeners with preserved acoustic hearing.

  17. Depression Diagnoses and Fundamental Frequency-Based Acoustic Cues in Maternal Infant-Directed Speech

    ERIC Educational Resources Information Center

    Porritt, Laura L.; Zinser, Michael C.; Bachorowski, Jo-Anne; Kaplan, Peter S.

    2014-01-01

    F0-based acoustic measures were extracted from a brief, sentence-final target word spoken during structured play interactions between mothers and their 3- to 14-month-old infants and were analyzed based on demographic variables and DSM-IV Axis-I clinical diagnoses and their common modifiers. F0 range (ΔF0) was…

  18. Acoustic Context Alters Vowel Categorization in Perception of Noise-Vocoded Speech.

    PubMed

    Stilp, Christian E

    2017-03-09

    Normal-hearing listeners' speech perception is widely influenced by spectral contrast effects (SCEs), where perception of a given sound is biased away from stable spectral properties of preceding sounds. Despite this influence, it is not clear how these contrast effects affect speech perception for cochlear implant (CI) users whose spectral resolution is notoriously poor. This knowledge is important for understanding how CIs might better encode key spectral properties of the listening environment. Here, SCEs were measured in normal-hearing listeners using noise-vocoded speech to simulate poor spectral resolution. Listeners heard a noise-vocoded sentence where low-F1 (100-400 Hz) or high-F1 (550-850 Hz) frequency regions were amplified to encourage "eh" (/ɛ/) or "ih" (/ɪ/) responses to the following target vowel, respectively. This was done by filtering with +20 dB (experiment 1a) or +5 dB gain (experiment 1b) or filtering using 100 % of the difference between spectral envelopes of /ɛ/ and /ɪ/ endpoint vowels (experiment 2a) or only 25 % of this difference (experiment 2b). SCEs influenced identification of noise-vocoded vowels in each experiment at every level of spectral resolution. In every case but one, SCE magnitudes exceeded those reported for full-spectrum speech, particularly when spectral peaks in the preceding sentence were large (+20 dB gain, 100 % of the spectral envelope difference). Even when spectral resolution was insufficient for accurate vowel recognition, SCEs were still evident. Results are suggestive of SCEs influencing CI users' speech perception as well, encouraging further investigation of CI users' sensitivity to acoustic context.
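
    A rough sketch of the band-specific gain manipulation described (boosting a low-F1 region of a precursor sentence by +20 dB), implemented here with a simple FFT-domain gain as an illustrative stand-in for the study's filtering.

```python
# Hypothetical sketch: boost the 100-400 Hz (low-F1) region of a precursor
# signal by +20 dB with an FFT-domain gain. An illustrative stand-in only.
import numpy as np

def boost_band(signal, fs, lo=100.0, hi=400.0, gain_db=20.0):
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= lo) & (freqs <= hi)
    spectrum[mask] *= 10 ** (gain_db / 20.0)  # +20 dB amplitude gain in the band
    return np.fft.irfft(spectrum, n=len(signal))

fs = 16000
precursor = np.random.randn(fs)        # placeholder for a noise-vocoded sentence
boosted = boost_band(precursor, fs)
```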

  19. Prosodic Influences on Speech Production in Children with Specific Language Impairment and Speech Deficits: Kinematic, Acoustic, and Transcription Evidence.

    ERIC Educational Resources Information Center

    Goffman, Lisa

    1999-01-01

    In this study, seven children with specific language impairment (SLI) and speech deficits were matched with same age peers and evaluated for iambic (weak-strong) and trochaic (strong-weak) prosodic speech forms. Findings indicated that children with SLI and speech deficits show less mature segmental and speech motor systems, as well as decreased…

  20. Modified acoustic transmission tube apparatus incorporating an active downstream termination.

    PubMed

    Machuca-Tzili, F Arturo; Orduña-Bustamante, Felipe; Pérez-López, Antonio; Pérez-Ruiz, Santiago J; Pérez-Matzumoto, Andrés E

    2017-02-01

    Current techniques for measuring normal incidence sound transmission loss with a modified impedance tube, or transmission tube, require setting up two different absorbing termination loads at the end of the downstream tube [ASTM E2611-09, Standard Test Method for Measurement of Normal Incidence Sound Transmission of Acoustical Materials Based on the Transfer Matrix Method (American Society for Testing and Materials, West Conshohocken, 2009)]. The process of physically handling the two required passive absorbing loads is a possible source of measurement errors, which are mainly due to changes in sample test position, or in test setup re-assembly, between measurements. In this paper, a modified transmission tube apparatus is proposed for non-intrusively changing the downstream acoustic load by means of a combined passive-active termination. It provides a controlled variable sound absorption which simplifies the setup of standard two-load techniques, without the need of physically handling the apparatus during the tests. This virtually eliminates the risk of errors associated with the physical manipulation of the two passive terminations. Transmission loss measurements in some representative test conditions are reported, showing improvements over current implementations, in reducing by approximately 50% the measurement variations associated with the setup of the two required absorbing terminations. Measurement results agree within 0.4 dB (maximum difference in high resolution broadband), and 0.04 dB (mean difference in 1/3-octave bands), with those obtained using standard passive two-load methods.
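
    For context, the normal-incidence transmission loss obtained from a sample's transfer matrix T is commonly written in the form below (a textbook expression for an anechoically terminated tube, quoted for orientation rather than from the paper):

```latex
% Normal-incidence transmission loss from the sample's 2x2 transfer matrix T,
% with \rho c the characteristic impedance of the medium in the tube.
\mathrm{TL} = 20 \log_{10} \left| \frac{T_{11} + T_{12}/(\rho c) + \rho c \, T_{21} + T_{22}}{2} \right|
```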

  1. Acoustic evaluation of short-term effects of repetitive transcranial magnetic stimulation on motor aspects of speech in Parkinson's disease.

    PubMed

    Eliasova, I; Mekyska, J; Kostalova, M; Marecek, R; Smekal, Z; Rektorova, I

    2013-04-01

    Hypokinetic dysarthria in Parkinson's disease (PD) can be characterized by monotony of pitch and loudness, reduced stress, variable rate, imprecise consonants, and a breathy and harsh voice. Using acoustic analysis, we studied the effects of high-frequency repetitive transcranial magnetic stimulation (rTMS) applied over the primary orofacial sensorimotor area (SM1) and the left dorsolateral prefrontal cortex (DLPFC) on motor aspects of voiced speech in PD. Twelve non-depressed and non-demented men with PD (mean age 64.58 ± 8.04 years, mean PD duration 10.75 ± 7.48 years) and 21 healthy age-matched men (a control group, mean age 64 ± 8.55 years) participated in the speech study. The PD patients underwent two sessions of 10 Hz rTMS over the dominant hemisphere with 2,250 stimuli/day in a random order: (1) over the SM1; (2) over the left DLPFC in the "on" motor state. Speech examination comprised the perceptual rating of global speech performance and an acoustic analysis based upon a standardized speech task. The Mann-Whitney U test was used to compare acoustic speech variables between controls and PD patients. The Wilcoxon test was used to compare data prior to and after each stimulation in the PD group. rTMS applied over the left SM1 was associated with a significant increase in harmonic-to-noise ratio and net speech rate in the sentence tasks. With respect to the vowel task results, increased median values and range of Teager-Kaiser energy operator, increased vowel space area, and significant jitter decrease were observed after the left SM1 stimulation. rTMS over the left DLPFC did not induce any significant effects. The positive results of acoustic analysis were not reflected in a subjective rating of speech performance quality as assessed by a speech therapist. Our pilot results indicate that one session of rTMS applied over the SM1 may lead to measurable improvement in voice quality and intensity and an increase in speech rate and tongue movements

  2. The effect of different cochlear implant microphones on acoustic hearing individuals’ binaural benefits for speech perception in noise

    PubMed Central

    Aronoff, Justin M.; Freed, Daniel J.; Fisher, Laurel M.; Pal, Ivan; Soli, Sigfrid D.

    2011-01-01

    Objectives Cochlear implant microphones differ in placement, frequency response, and other characteristics such as whether they are directional. Although normal hearing individuals are often used as controls in studies examining cochlear implant users’ binaural benefits, the considerable differences across cochlear implant microphones make such comparisons potentially misleading. The goal of this study was to examine binaural benefits for speech perception in noise for normal hearing individuals using stimuli processed by head-related transfer functions (HRTFs) based on the different cochlear implant microphones. Design HRTFs were created for different cochlear implant microphones and used to test participants on the Hearing in Noise Test. Experiment 1 tested cochlear implant users and normal hearing individuals with HRTF-processed stimuli and with sound field testing to determine whether the HRTFs adequately simulated sound field testing. Experiment 2 determined the measurement error and performance-intensity function for the Hearing in Noise Test with normal hearing individuals listening to stimuli processed with the various HRTFs. Experiment 3 compared normal hearing listeners’ performance across HRTFs to determine how the HRTFs affected performance. Experiment 4 evaluated binaural benefits for normal hearing listeners using the various HRTFs, including ones that were modified to investigate the contributions of interaural time and level cues. Results The results indicated that the HRTFs adequately simulated sound field testing for the Hearing in Noise Test. They also demonstrated that the test-retest reliability and performance-intensity function were consistent across HRTFs, and that the measurement error for the test was 1.3 dB, with a change in signal-to-noise ratio of 1 dB reflecting a 10% change in intelligibility. There were significant differences in performance when using the various HRTFs, with particularly good thresholds for the HRTF based on the

  3. A Frame-Based Context-Dependent Acoustic Modeling for Speech Recognition

    NASA Astrophysics Data System (ADS)

    Terashima, Ryuta; Zen, Heiga; Nankaku, Yoshihiko; Tokuda, Keiichi

    We propose a novel acoustic model for speech recognition, named the FCD (Frame-based Context Dependent) model. It obtains a probability distribution by using a top-down clustering technique to simultaneously consider the local frame position in the phoneme, the phoneme duration, and the phoneme context. The model topology is derived by connecting left-to-right HMMs without self-loop transitions for each phoneme duration. Because the FCD model can vary the probability distribution across the frame sequence corresponding to one phoneme duration, it can generate a smooth trajectory of speech feature vectors. We also performed an experiment to evaluate the speech recognition performance of the model. In the experiment, 132 questions for frame position, 66 questions for phoneme duration and 134 questions for phoneme context were used to train the sub-phoneme FCD model. In order to compare performance, left-to-right HMM and two types of HSMM models with almost the same number of states were also trained. As a result, an 18% relative improvement in tri-phone accuracy was achieved by the FCD model.

  4. The effect of intertalker speech rate variation on acoustic vowel space.

    PubMed

    Tsao, Ying-Chiao; Weismer, Gary; Iqbal, Kamran

    2006-02-01

    The present study aimed to examine the size of the acoustic vowel space in talkers who had previously been identified as having slow and fast habitual speaking rates [Tsao, Y.-C. and Weismer, G. (1997) J. Speech Lang. Hear. Res. 40, 858-866]. Within talkers, it is fairly well known that faster speaking rates result in a compression of the vowel space relative to that measured for slower rates, so the current study was completed to determine if the same differences in the size of the vowel space occur across talkers who differ significantly in their habitual speaking rates. Results indicated that there was no difference in the average size of the vowel space for slow vs fast talkers, and no relationship across talkers between vowel duration and formant frequencies. One difference between the slow and fast talkers was in intertalker variability of the vowel spaces, which was clearly greater for the slow talkers, for both speaker sexes. Results are discussed relative to theories of speech production and vowel normalization in speech perception.

  5. Discrimination of speech stimuli based on neuronal response phase patterns depends on acoustics but not comprehension.

    PubMed

    Howard, Mary F; Poeppel, David

    2010-11-01

    Speech stimuli give rise to neural activity in the listener that can be observed as waveforms using magnetoencephalography. Although waveforms vary greatly from trial to trial due to activity unrelated to the stimulus, it has been demonstrated that spoken sentences can be discriminated based on theta-band (3-7 Hz) phase patterns in single-trial response waveforms. Furthermore, manipulations of the speech signal envelope and fine structure that reduced intelligibility were found to produce correlated reductions in discrimination performance, suggesting a relationship between theta-band phase patterns and speech comprehension. This study investigates the nature of this relationship, hypothesizing that theta-band phase patterns primarily reflect cortical processing of low-frequency (<40 Hz) modulations present in the acoustic signal and required for intelligibility, rather than processing exclusively related to comprehension (e.g., lexical, syntactic, semantic). Using stimuli that are quite similar to normal spoken sentences in terms of low-frequency modulation characteristics but are unintelligible (i.e., their time-inverted counterparts), we find that discrimination performance based on theta-band phase patterns is equal for both types of stimuli. Consistent with earlier findings, we also observe that whereas theta-band phase patterns differ across stimuli, power patterns do not. We use a simulation model of the single-trial response to spoken sentence stimuli to demonstrate that phase-locked responses to low-frequency modulations of the acoustic signal can account not only for the phase but also for the power results. The simulation offers insight into the interpretation of the empirical results with respect to phase-resetting and power-enhancement models of the evoked response.

  6. Discrimination of Speech Stimuli Based on Neuronal Response Phase Patterns Depends on Acoustics But Not Comprehension

    PubMed Central

    Poeppel, David

    2010-01-01

    Speech stimuli give rise to neural activity in the listener that can be observed as waveforms using magnetoencephalography. Although waveforms vary greatly from trial to trial due to activity unrelated to the stimulus, it has been demonstrated that spoken sentences can be discriminated based on theta-band (3–7 Hz) phase patterns in single-trial response waveforms. Furthermore, manipulations of the speech signal envelope and fine structure that reduced intelligibility were found to produce correlated reductions in discrimination performance, suggesting a relationship between theta-band phase patterns and speech comprehension. This study investigates the nature of this relationship, hypothesizing that theta-band phase patterns primarily reflect cortical processing of low-frequency (<40 Hz) modulations present in the acoustic signal and required for intelligibility, rather than processing exclusively related to comprehension (e.g., lexical, syntactic, semantic). Using stimuli that are quite similar to normal spoken sentences in terms of low-frequency modulation characteristics but are unintelligible (i.e., their time-inverted counterparts), we find that discrimination performance based on theta-band phase patterns is equal for both types of stimuli. Consistent with earlier findings, we also observe that whereas theta-band phase patterns differ across stimuli, power patterns do not. We use a simulation model of the single-trial response to spoken sentence stimuli to demonstrate that phase-locked responses to low-frequency modulations of the acoustic signal can account not only for the phase but also for the power results. The simulation offers insight into the interpretation of the empirical results with respect to phase-resetting and power-enhancement models of the evoked response. PMID:20484530

  7. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians

    PubMed Central

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

    2016-01-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% and 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and could be used to screen for PH and encourage earlier specialist referral. PMID:27609672
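
    A minimal sketch of the general MFCC-plus-Gaussian-mixture recipe the abstract describes; librosa and scikit-learn are used purely for illustration, and the file names, sampling rate, and mixture sizes are assumptions.

```python
# Hypothetical sketch: MFCC features plus one Gaussian mixture model per class
# for labeling heart-sound recordings as PH vs. normal. File names, sampling
# rate, and mixture sizes are illustrative assumptions.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(path, sr=2000, n_mfcc=13):
    audio, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc).T  # frames x coeffs

ph_files = ["ph_001.wav", "ph_002.wav"]          # placeholder paths
normal_files = ["nl_001.wav", "nl_002.wav"]      # placeholder paths

# One GMM per class, trained on pooled frames from the labeled recordings.
gmm_ph = GaussianMixture(n_components=8).fit(np.vstack([mfcc_features(p) for p in ph_files]))
gmm_nl = GaussianMixture(n_components=8).fit(np.vstack([mfcc_features(p) for p in normal_files]))

def classify(path):
    feats = mfcc_features(path)
    # Compare average log-likelihood under the two class models.
    return "PH" if gmm_ph.score(feats) > gmm_nl.score(feats) else "normal"
```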

  8. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians

    NASA Astrophysics Data System (ADS)

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y.; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

    2016-09-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% and 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and could be used to screen for PH and encourage earlier specialist referral.

  9. Transient Auditory Storage of Acoustic Details Is Associated with Release of Speech from Informational Masking in Reverberant Conditions

    ERIC Educational Resources Information Center

    Huang, Ying; Huang, Qiang; Chen, Xun; Wu, Xihong; Li, Liang

    2009-01-01

    Perceptual integration of the sound directly emanating from the source with reflections needs both temporal storage and correlation computation of acoustic details. We examined whether the temporal storage is frequency dependent and associated with speech unmasking. In Experiment 1, a break in correlation (BIC) between interaurally correlated…

  10. Effect of Digital Frequency Compression (DFC) on Speech Recognition in Candidates for Combined Electric and Acoustic Stimulation (EAS)

    ERIC Educational Resources Information Center

    Gifford, Rene H.; Dorman, Michael F.; Spahr, Anthony J.; McKarns, Sharon A.

    2007-01-01

    Purpose: To compare the effects of conventional amplification (CA) and digital frequency compression (DFC) amplification on the speech recognition abilities of candidates for a partial-insertion cochlear implant, that is, candidates for combined electric and acoustic stimulation (EAS). Method: The participants were 6 patients whose audiometric…

  11. Comments on "Effects of Noise on Speech Production: Acoustic and Perceptual Analyses" [J. Acoust. Soc. Am. 84, 917-928 (1988)].

    PubMed

    Fitch, H

    1989-11-01

    The effect of background noise on speech production is an important issue, both from the practical standpoint of developing speech recognition algorithms and from the theoretical standpoint of understanding how speech is tuned to the environment in which it is spoken. Summers et al. [J. Acoust. Soc. Am. 84, 917-928 (1988)] address this issue by experimentally manipulating the level of noise delivered through headphones to two talkers and making several kinds of acoustic measurements on the resulting speech. They indicate that they have replicated effects on amplitude, duration, and pitch and have found effects on spectral tilt and first-formant frequency (F1). The authors regard these acoustic changes as effects in themselves rather than as consequences of a change in vocal effort, and thus treat equally the change in spectral tilt and the change in F1. In fact, the change in spectral tilt is a well-documented and understood consequence of the change in the glottal waveform, which is known to occur with increased effort. The situation with F1 is less clear and is made difficult by measurement problems. The bias in linear predictive coding (LPC) techniques related to two of the other changes-fundamental frequency and spectral tilt-is discussed.

  12. The sound of motion in spoken language: visual information conveyed by acoustic properties of speech.

    PubMed

    Shintel, Hadas; Nusbaum, Howard C

    2007-12-01

    Language is generally viewed as conveying information through symbols whose form is arbitrarily related to their meaning. This arbitrary relation is often assumed to also characterize the mental representations underlying language comprehension. We explore the idea that visuo-spatial information can be analogically conveyed through acoustic properties of speech and that such information is integrated into an analog perceptual representation as a natural part of comprehension. Listeners heard sentences describing objects, spoken at varying speaking rates. After each sentence, participants saw a picture of an object and judged whether it had been mentioned in the sentence. Participants were faster to recognize the object when motion implied by speaking rate matched the motion implied by the picture. Results suggest that visuo-spatial referential information can be analogically conveyed and represented.

  13. Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels

    PubMed Central

    Caballero-Morales, Santiago-Omar

    2013-01-01

    An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). Then, estimation of the emotional state from a spoken sentence is performed by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, accuracy of 87–100% was achieved for the recognition of emotional state of Mexican Spanish speech. PMID:23935410

  14. Recognition of emotions in Mexican Spanish speech: an approach based on acoustic modelling of emotion-specific vowels.

    PubMed

    Caballero-Morales, Santiago-Omar

    2013-01-01

    An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). Then, estimation of the emotional state from a spoken sentence is performed by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, accuracy of 87-100% was achieved for the recognition of emotional state of Mexican Spanish speech.

  15. Statistical evidence that musical universals derive from the acoustic characteristics of human speech

    NASA Astrophysics Data System (ADS)

    Schwartz, David; Howe, Catharine; Purves, Dale

    2003-04-01

    Listeners of all ages and societies produce a similar consonance ordering of chromatic scale tone combinations. Despite intense interest in this perceptual phenomenon over several millennia, it has no generally accepted explanation in physical, psychological, or physiological terms. Here we show that the musical universal of consonance ordering can be understood in terms of the statistical relationship between a pattern of sound pressure at the ear and the possible generative sources of the acoustic energy pattern. Since human speech is the principal naturally occurring source of tone-evoking (i.e., periodic) sound energy for human listeners, we obtained normalized spectra from more than 100000 recorded speech segments. The probability distribution of amplitude/frequency combinations derived from these spectra predicts both the fundamental frequency ratios that define the chromatic scale intervals and the consonance ordering of chromatic scale tone combinations. We suggest that these observations reveal the statistical character of the perceptual process by which the auditory system guides biologically successful behavior in response to inherently ambiguous sound stimuli.

  16. Correlation of orofacial speeds with voice acoustic measures in the fluent speech of persons who stutter.

    PubMed

    McClean, Michael D; Tasko, Stephen M

    2004-12-01

    Stuttering is often viewed as a problem in coordinating the movements of different muscle systems involved in speech production. From this perspective, it is logical that efforts be made to quantify and compare the strength of neural coupling between muscle systems in persons who stutter (PS) and those who do not stutter (NS). This problem was addressed by correlating the speeds of different orofacial structures with vowel fundamental frequency (F0) and intensity as subjects produced fluent repetitions of a simple nonsense phrase at habitual, high, and low intensity levels. It is assumed that resulting correlations indirectly reflect the strength of neural coupling between particular orofacial structures and the respiratory-laryngeal system. An electromagnetic system was employed to record movements of the upper lip, lower lip, tongue, and jaw in 43 NS and 39 PS. The acoustic speech signal was recorded and used to obtain measures of vowel F0 and intensity. For each subject, correlation measures were obtained relating peak orofacial speeds to F0 and intensity. Correlations were significantly reduced in PS compared to NS for the lower lip and tongue, although the magnitude of these group differences covaried with the correlation levels relating F0 and intensity. It is suggested that the group difference in correlation pattern reflects a reduced strength of neural coupling of the lower lip and tongue systems to the respiratory-laryngeal system in PS. Consideration is given to how this may contribute to temporal discoordination and stuttering.

  17. Objective assessment of tracheoesophageal and esophageal speech using acoustic analysis of voice.

    PubMed

    Sirić, Ljiljana; Sos, Dario; Rosso, Marinela; Stevanović, Sinisa

    2012-11-01

    The aim of this study was to analyze the voice quality of alaryngeal tracheoesophageal and esophageal speech, and to determine which of them is more similar to laryngeal voice production, and thus more acceptable as a rehabilitation method for laryngectomized persons. Objective voice evaluation was performed on a sample of 20 totally laryngectomized subjects of both sexes, average age 61.3 years. Subjects were divided into two groups: 10 (50%) respondents with a tracheoesophageal prosthesis and 10 (50%) who acquired esophageal speech. Testing included 6 variables: 5 parameters of acoustic analysis of voice and one parameter of aerodynamic measurements. The obtained data were statistically analyzed by analysis of variance. Analysis of the data showed a statistically significant difference between the two groups in terms of intensity, fundamental frequency, and maximum phonation time of vowels at a significance level of 5% and a confidence interval of 95%. A statistically significant difference was not found between the values of jitter, shimmer, and harmonic-to-noise ratio for tracheoesophageal and esophageal voice. There is no ideal method of rehabilitation and every one of them requires an individual approach to the patient, but the results show the advantages of rehabilitation by means of installing a voice prosthesis.

  18. Assessing the Treatment Effects in Apraxia of Speech: Introduction and Evaluation of the Modified Diadochokinesis Test

    ERIC Educational Resources Information Center

    Hurkmans, Joost; Jonkers, Roel; Boonstra, Anne M.; Stewart, Roy E.; Reinders-Messelink, Heleen A.

    2012-01-01

    Background: The number of reliable and valid instruments to measure the effects of therapy in apraxia of speech (AoS) is limited. Aims: To evaluate the newly developed Modified Diadochokinesis Test (MDT), which is a task to assess the effects of rate and rhythm therapies for AoS in a multiple baseline across behaviours design. Methods: The…

  19. Elicitation of the Acoustic Change Complex to Long-Duration Speech Stimuli in Four-Month-Old Infants

    PubMed Central

    Chen, Ke Heng; Small, Susan A.

    2015-01-01

    The acoustic change complex (ACC) is an auditory-evoked potential elicited to changes within an ongoing stimulus that indicates discrimination at the level of the auditory cortex. Only a few studies to date have attempted to record ACCs in young infants. The purpose of the present study was to investigate the elicitation of ACCs to long-duration speech stimuli in English-learning 4-month-old infants. ACCs were elicited to consonant contrasts made up of two concatenated speech tokens. The stimuli included native dental-dental /dada/ and dental-labial /daba/ contrasts and a nonnative Hindi dental-retroflex /daDa/ contrast. Each consonant-vowel speech token was 410 ms in duration. Slow cortical responses were recorded to the onset of the stimulus and to the acoustic change from /da/ to either /ba/ or /Da/ within the stimulus with significantly prolonged latencies compared with adults. ACCs were reliably elicited for all stimulus conditions with more robust morphology compared with our previous findings using stimuli that were shorter in duration. The P1 amplitudes elicited to the acoustic change in /daba/ and /daDa/ were significantly larger compared to /dada/ supporting that the brain discriminated between the speech tokens. These findings provide further evidence for the use of ACCs as an index of discrimination ability. PMID:26798343

  20. Elicitation of the Acoustic Change Complex to Long-Duration Speech Stimuli in Four-Month-Old Infants.

    PubMed

    Chen, Ke Heng; Small, Susan A

    2015-01-01

    The acoustic change complex (ACC) is an auditory-evoked potential elicited to changes within an ongoing stimulus that indicates discrimination at the level of the auditory cortex. Only a few studies to date have attempted to record ACCs in young infants. The purpose of the present study was to investigate the elicitation of ACCs to long-duration speech stimuli in English-learning 4-month-old infants. ACCs were elicited to consonant contrasts made up of two concatenated speech tokens. The stimuli included native dental-dental /dada/ and dental-labial /daba/ contrasts and a nonnative Hindi dental-retroflex /daDa/ contrast. Each consonant-vowel speech token was 410 ms in duration. Slow cortical responses were recorded to the onset of the stimulus and to the acoustic change from /da/ to either /ba/ or /Da/ within the stimulus with significantly prolonged latencies compared with adults. ACCs were reliably elicited for all stimulus conditions with more robust morphology compared with our previous findings using stimuli that were shorter in duration. The P1 amplitudes elicited to the acoustic change in /daba/ and /daDa/ were significantly larger compared to /dada/ supporting that the brain discriminated between the speech tokens. These findings provide further evidence for the use of ACCs as an index of discrimination ability.

  1. Acoustics in human communication: evolving ideas about the nature of speech.

    PubMed

    Cooper, F S

    1980-07-01

    This paper discusses changes in attitude toward the nature of speech during the past half century. After reviewing early views on the subject, it considers the role of speech spectrograms, speech articulation, speech perception, messages and computers, and the nature of fluent speech.

  2. Speaker height estimation from speech: Fusing spectral regression and statistical acoustic models.

    PubMed

    Hansen, John H L; Williams, Keri; Bořil, Hynek

    2015-08-01

    Estimating speaker height can assist in voice forensic analysis and provide additional side knowledge to benefit automatic speaker identification or acoustic model selection for automatic speech recognition. In this study, a statistical approach to height estimation that incorporates acoustic models within a non-uniform height bin width Gaussian mixture model structure as well as a formant analysis approach that employs linear regression on selected phones are presented. The accuracy and trade-offs of these systems are explored by examining the consistency of the results, location, and causes of error, as well as a combined fusion of the two systems, using data from the TIMIT corpus. Open set testing is also presented using the Multi-session Audio Research Project corpus and publicly available YouTube audio to examine the effect of channel mismatch between training and testing data and provide a realistic open domain testing scenario. The proposed algorithms achieve highly competitive performance relative to previously published literature. Although the different data partitioning in the literature and this study may prevent performance comparisons in absolute terms, the mean average error of 4.89 cm for males and 4.55 cm for females provided by the proposed algorithm on TIMIT utterances containing selected phones suggests a considerable estimation error decrease compared to past efforts.
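
    A toy sketch of the formant-regression half of such a system, regressing speaker height on per-phone formant features with ordinary linear regression; the formant values, phone choice, and heights are invented for illustration.

```python
# Hypothetical sketch: ordinary linear regression of speaker height on
# formant features of a selected phone. All numbers are invented placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression

# columns: mean F1 and F2 (Hz) of the selected phone; rows: speakers
X = np.array([[560, 1700], [520, 1650], [610, 1850], [590, 1800], [540, 1680]], dtype=float)
heights_cm = np.array([183, 188, 165, 170, 180], dtype=float)

model = LinearRegression().fit(X, heights_cm)
pred = model.predict(X)
print("mean absolute error: %.2f cm" % np.mean(np.abs(pred - heights_cm)))
```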

  3. Speech timing and linguistic rhythm: on the acoustic bases of rhythm typologies.

    PubMed

    Rathcke, Tamara V; Smith, Rachel H

    2015-05-01

    Research into linguistic rhythm has been dominated by the idea that languages can be classified according to rhythmic templates, amenable to assessment by acoustic measures of vowel and consonant durations. This study tested predictions of two proposals explaining the bases of rhythmic typologies: the Rhythm Class Hypothesis which assumes that the templates arise from an extensive vs a limited use of durational contrasts, and the Control and Compensation Hypothesis which proposes that the templates are rooted in more vs less flexible speech production strategies. Temporal properties of segments, syllables and rhythmic feet were examined in two accents of British English, a "stress-timed" variety from Leeds, and a "syllable-timed" variety spoken by Panjabi-English bilinguals from Bradford. Rhythm metrics were calculated. A perception study confirmed that the speakers of the two varieties differed in their perceived rhythm. The results revealed that both typologies were informative in that to a certain degree, they predicted temporal patterns of the two varieties. None of the metrics tested was capable of adequately reflecting the temporal complexity found in the durational data. These findings contribute to the critical evaluation of the explanatory adequacy of rhythm metrics. Acoustic bases and limitations of the traditional rhythmic typologies are discussed.
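
    The abstract does not name the rhythm metrics computed; for context, two widely used ones (%V and the normalized pairwise variability index, nPVI) can be computed from segment durations as in this sketch, with placeholder durations.

```python
# Hypothetical sketch: two common rhythm metrics computed from segment
# durations (ms). The duration values are placeholders, not study data.
import numpy as np

vowel_durs = np.array([62, 85, 40, 120, 55], dtype=float)
consonant_durs = np.array([70, 45, 90, 38, 66], dtype=float)

percent_v = 100 * vowel_durs.sum() / (vowel_durs.sum() + consonant_durs.sum())

def npvi(durs):
    d = np.asarray(durs, dtype=float)
    pairwise = np.abs(np.diff(d)) / ((d[:-1] + d[1:]) / 2)
    return 100 * pairwise.mean()

print("%%V = %.1f, vocalic nPVI = %.1f" % (percent_v, npvi(vowel_durs)))
```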

  4. Can acoustic vowel space predict the habitual speech rate of the speaker?

    PubMed

    Tsao, Y-C; Iqbal, K

    2005-01-01

    This study aims to determine whether the acoustic vowel space reflects the habitual speaking rate of the speaker. The vowel space is defined as the area of the quadrilateral formed by the four corner vowels (i.e., /i/, /æ/, /u/, /ɑ/) in the F1-F2 plane. The study compares the acoustic vowel space in the speech of habitually slow and fast talkers and further analyzes them by gender. In addition to the measurement of vowel duration and midpoint frequencies of F1 and F2, the F1/F2 vowel space areas were measured and compared across speakers. The results indicate substantial overlap in vowel space area functions between slow and fast talkers, though the slow speakers were found to have larger vowel spaces. Furthermore, large interspeaker variability in vowel space area was noted within each group. Both F1 and F2 formant frequencies were found to be gender sensitive, consistent with existing data. No predictive relation between vowel duration and formant frequencies was observed among speakers.
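
    A minimal sketch of the vowel space area computation described (area of the quadrilateral spanned by the corner vowels in the F1-F2 plane), using the shoelace formula; the formant values are placeholders.

```python
# Hypothetical sketch: quadrilateral vowel space area in the F1-F2 plane via
# the shoelace formula. Corner-vowel formant values (Hz) are placeholders.
import numpy as np

def polygon_area(points):
    x, y = np.asarray(points, dtype=float).T
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Corner vowels ordered around the quadrilateral, as (F1, F2): /i/, /ae/, /a/, /u/
corners = [(300, 2300), (700, 1800), (750, 1100), (350, 900)]
print("vowel space area: %.0f Hz^2" % polygon_area(corners))
```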

  5. Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics

    PubMed Central

    Bradlow, Ann R.; Torretta, Gina M.; Pisoni, David B.

    2011-01-01

    This study used a multi-talker database containing intelligibility scores for 2000 sentences (20 talkers, 100 sentences), to identify talker-related correlates of speech intelligibility. We first investigated “global” talker characteristics (e.g., gender, F0 and speaking rate). Findings showed female talkers to be more intelligible as a group than male talkers. Additionally, we found a tendency for F0 range to correlate positively with higher speech intelligibility scores. However, F0 mean and speaking rate did not correlate with intelligibility. We then examined several fine-grained acoustic-phonetic talker-characteristics as correlates of overall intelligibility. We found that talkers with larger vowel spaces were generally more intelligible than talkers with reduced spaces. In investigating two cases of consistent listener errors (segment deletion and syllable affiliation), we found that these perceptual errors could be traced directly to detailed timing characteristics in the speech signal. Results suggest that a substantial portion of variability in normal speech intelligibility is traceable to specific acoustic-phonetic characteristics of the talker. Knowledge about these factors may be valuable for improving speech synthesis and recognition strategies, and for special populations (e.g., the hearing-impaired and second-language learners) who are particularly sensitive to intelligibility differences among talkers. PMID:21461127

  6. Design of acoustic beam aperture modifier using gradient-index phononic crystals

    NASA Astrophysics Data System (ADS)

    Lin, Sz-Chin Steven; Tittmann, Bernhard R.; Huang, Tony Jun

    2012-06-01

    This article reports the design concept of a novel acoustic beam aperture modifier using butt-jointed gradient-index phononic crystals (GRIN PCs) consisting of steel cylinders embedded in a homogeneous epoxy background. By gradually tuning the period of a GRIN PC, the propagating direction of acoustic waves can be continuously bent to follow a sinusoidal trajectory in the structure. The aperture of an acoustic beam can therefore be shrunk or expanded through change of the gradient refractive index profiles of the butt-jointed GRIN PCs. Our computational results elucidate the effectiveness of the proposed acoustic beam aperture modifier. Such an acoustic device can be fabricated through a simple process and will be valuable in applications, such as biomedical imaging and surgery, nondestructive evaluation, communication, and acoustic absorbers.

  7. Modified ion-acoustic solitary waves in plasmas with field-aligned shear flows

    SciTech Connect

    Saleem, H.; Haque, Q.

    2015-08-15

    The nonlinear dynamics of ion-acoustic waves is investigated in a plasma having field-aligned shear flow. A Korteweg-deVries-type nonlinear equation for a modified ion-acoustic wave is obtained which admits a single pulse soliton solution. The theoretical result has been applied to solar wind plasma at 1 AU for illustration.
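
    For context, a Korteweg-de Vries-type equation and its single-pulse soliton take the standard form below; the coefficients A and B depend on the plasma parameters and the field-aligned shear flow and are not reproduced from the article.

```latex
% Generic KdV form and its single-pulse soliton; the amplitude \phi_0 and width W
% are set by the pulse speed u and the coefficients A and B.
\frac{\partial \phi}{\partial \tau} + A \phi \frac{\partial \phi}{\partial \xi}
  + B \frac{\partial^{3} \phi}{\partial \xi^{3}} = 0, \qquad
\phi = \phi_{0}\, \mathrm{sech}^{2}\!\left(\frac{\xi - u\tau}{W}\right), \quad
\phi_{0} = \frac{3u}{A}, \quad W = \sqrt{\frac{4B}{u}}
```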

  8. Acoustic Source Characteristics, Across-Formant Integration, and Speech Intelligibility Under Competitive Conditions

    PubMed Central

    2015-01-01

    An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics—for example, fundamental frequency (F0) differences between formants promote their segregation. This study explored the role of more radical differences in source characteristics. Three-formant (F1+F2+F3) synthetic speech analogues were derived from natural sentences. In Experiment 1, F1+F3 were generated by passing a harmonic glottal source (F0 = 140 Hz) through second-order resonators (H1+H3); in Experiment 2, F1+F3 were tonal (sine-wave) analogues (T1+T3). F2 could take either form (H2 or T2). In some conditions, the target formants were presented alone, either monaurally or dichotically (left ear = F1+F3; right ear = F2). In others, they were accompanied by a competitor for F2 (F1+F2C+F3; F2), which listeners must reject to optimize recognition. Competitors (H2C or T2C) were created using the time-reversed frequency and amplitude contours of F2. Dichotic presentation of F2 and F2C ensured that the impact of the competitor arose primarily through informational masking. In the absence of F2C, the effect of a source mismatch between F1+F3 and F2 was relatively modest. When F2C was present, intelligibility was lowest when F2 was tonal and F2C was harmonic, irrespective of which type matched F1+F3. This finding suggests that source type and context, rather than similarity, govern the phonetic contribution of a formant. It is proposed that wideband harmonic analogues are more effective informational maskers than narrowband tonal analogues, and so become dominant in across-frequency integration of phonetic information when placed in competition. PMID:25751040

  9. Differential Effects of Visual-Acoustic Biofeedback Intervention for Residual Speech Errors

    PubMed Central

    McAllister Byun, Tara; Campbell, Heather

    2016-01-01

    Recent evidence suggests that the incorporation of visual biofeedback technologies may enhance response to treatment in individuals with residual speech errors. However, there is a need for controlled research systematically comparing biofeedback versus non-biofeedback intervention approaches. This study implemented a single-subject experimental design with a crossover component to investigate the relative efficacy of visual-acoustic biofeedback and traditional articulatory treatment for residual rhotic errors. Eleven child/adolescent participants received ten sessions of visual-acoustic biofeedback and 10 sessions of traditional treatment, with the order of biofeedback and traditional phases counterbalanced across participants. Probe measures eliciting untreated rhotic words were administered in at least three sessions prior to the start of treatment (baseline), between the two treatment phases (midpoint), and after treatment ended (maintenance), as well as before and after each treatment session. Perceptual accuracy of rhotic production was assessed by outside listeners in a blinded, randomized fashion. Results were analyzed using a combination of visual inspection of treatment trajectories, individual effect sizes, and logistic mixed-effects regression. Effect sizes and visual inspection revealed that participants could be divided into categories of strong responders (n = 4), mixed/moderate responders (n = 3), and non-responders (n = 4). Individual results did not reveal a reliable pattern of stronger performance in biofeedback versus traditional blocks, or vice versa. Moreover, biofeedback versus traditional treatment was not a significant predictor of accuracy in the logistic mixed-effects model examining all within-treatment word probes. However, the interaction between treatment condition and treatment order was significant: biofeedback was more effective than traditional treatment in the first phase of treatment, and traditional treatment was more effective

  10. Modifying the acoustic impedance of polyurea-based composites

    NASA Astrophysics Data System (ADS)

    Nantasetphong, Wiroj; Amirkhizi, Alireza V.; Jia, Zhanzhan; Nemat-Nasser, Sia

    2013-04-01

    Acoustic impedance is a material property that depends on mass density and acoustic wave speed. An impedance mismatch between two media leads to the partial reflection of an acoustic wave sent from one medium to another. Active sonar is one example of a useful application of this phenomenon, where reflected and scattered acoustic waves enable the detection of objects. If the impedance of an object is matched to that of the surrounding medium, however, the object may be hidden from observation (at least directly) by sonar. In this study, polyurea composites are developed to facilitate such impedance matching. Polyurea is used due to its excellent blast-mitigating properties, easy casting, corrosion protection, abrasion resistance, and various uses in current military technology. Since pure polyurea has impedance higher than that of water (the current medium of interest), low mass density phenolic microballoon particles are added to create composite materials with reduced effective impedances. The volume fraction of particles is varied to study the effect of filler quantity on the acoustic impedance of the resulting composite. The composites are experimentally characterized via ultrasonic measurements. Computational models based on the method of dilute-randomly-distributed inclusions are developed and compared with the experimental results. These experiments and models will facilitate the design of new elastomeric composites with desirable acoustic impedances.
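
    As a back-of-the-envelope illustration (not the article's dilute-inclusion model), characteristic acoustic impedance is Z = ρc, so adding low-density filler lowers the composite's effective density and, other things being equal, its impedance; the material values below are rough assumed figures.

```python
# Hypothetical sketch: characteristic acoustic impedance Z = rho * c, with a
# crude volume-weighted density for a filled composite. All values are assumed
# representative figures, not measurements from the article.
def impedance(density_kg_m3, speed_m_s):
    return density_kg_m3 * speed_m_s  # Rayl (kg m^-2 s^-1)

z_water = impedance(1000.0, 1480.0)      # ~1.48 MRayl reference value for water
z_matrix = impedance(1100.0, 1700.0)     # assumed elastomer density and wave speed

# Volume fraction phi of light microballoons lowers the effective density;
# the wave speed is held fixed here, a deliberate oversimplification.
phi = 0.2
rho_mix = (1 - phi) * 1100.0 + phi * 250.0
z_composite = impedance(rho_mix, 1700.0)

print("Z_water=%.2e  Z_matrix=%.2e  Z_composite=%.2e Rayl" % (z_water, z_matrix, z_composite))
```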

  11. Language-Specific Developmental Differences in Speech Production: A Cross-Language Acoustic Study

    ERIC Educational Resources Information Center

    Li, Fangfang

    2012-01-01

    Speech productions of 40 English- and 40 Japanese-speaking children (aged 2-5) were examined and compared with the speech produced by 20 adult speakers (10 speakers per language). Participants were recorded while repeating words that began with "s" and "sh" sounds. Clear language-specific patterns in adults' speech were found,…

  12. Fast multi-feature paradigm for recording several mismatch negativities (MMNs) to phonetic and acoustic changes in speech sounds.

    PubMed

    Pakarinen, Satu; Lovio, Riikka; Huotilainen, Minna; Alku, Paavo; Näätänen, Risto; Kujala, Teija

    2009-12-01

    In this study, we addressed whether a new fast multi-feature mismatch negativity (MMN) paradigm can be used for determining the central auditory discrimination accuracy for several acoustic and phonetic changes in speech sounds. We recorded the MMNs in the multi-feature paradigm to changes in syllable intensity, frequency, and vowel length, as well as for consonant and vowel change, and compared these MMNs to those obtained with the traditional oddball paradigm. In addition, we examined the reliability of the multi-feature paradigm by repeating the recordings with the same subjects 1-7 days after the first recordings. The MMNs recorded with the multi-feature paradigm were similar to those obtained with the oddball paradigm. Furthermore, only minor differences were observed in the MMN amplitudes across the two recording sessions. Thus, this new multi-feature paradigm with speech stimuli provides similar results as the oddball paradigm, and the MMNs recorded with the new paradigm were reproducible.

  13. A study of acoustic-to-articulatory inversion of speech by analysis-by-synthesis using chain matrices and the Maeda articulatory model.

    PubMed

    Panchapagesan, Sankaran; Alwan, Abeer

    2011-04-01

    In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model is performed. For chain matrix calculation of vocal tract (VT) acoustics, the chain matrix derivatives with respect to area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between natural and synthesized first three formants, and parameter regularization and continuity terms. Calibration of the Maeda model to two speakers, one male and one female, from the University of Wisconsin x-ray microbeam (XRMB) database, using a cost function, is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. Good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs for the male speaker, with average pellet-VT outline distances around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants.
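
    A minimal sketch of an analysis-by-synthesis cost of the kind described above, combining a formant distance term with parameter regularization and continuity penalties. The articulatory-to-formant synthesizer and the weights are stand-ins; the paper's chain-matrix computation and its derivatives are not reproduced.

        # Sketch of an analysis-by-synthesis inversion cost: formant distance plus
        # regularization and continuity terms. `synth` maps one frame of
        # articulatory parameters to its first three formants (Hz) and is a
        # stand-in for the chain-matrix vocal tract model; weights are assumptions.
        import numpy as np

        def inversion_cost(traj, target_formants, synth, lam_reg=0.1, lam_cont=0.1):
            """traj: (T, P) articulatory parameters; target_formants: (T, 3) in Hz."""
            cost = 0.0
            for t in range(traj.shape[0]):
                f_synth = synth(traj[t])
                cost += np.sum(((f_synth - target_formants[t]) / target_formants[t]) ** 2)
                cost += lam_reg * np.sum(traj[t] ** 2)          # keep parameters near neutral
                if t > 0:                                       # smooth articulatory trajectories
                    cost += lam_cont * np.sum((traj[t] - traj[t - 1]) ** 2)
            return cost

    In practice such a cost would be minimized with a quasi-Newton optimizer, initialized from an articulatory codebook search as described above.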

  14. The Perception of Telephone-Processed Speech by Combined Electric and Acoustic Stimulation

    PubMed Central

    Tahmina, Qudsia; Runge, Christina; Friedland, David R.

    2013-01-01

    This study assesses the effects of adding low- or high-frequency information to the band-limited telephone-processed speech on bimodal listeners’ telephone speech perception in quiet environments. In the proposed experiments, bimodal users were presented under quiet listening conditions with wideband speech (WB), bandpass-filtered telephone speech (300–3,400 Hz, BP), high-pass filtered speech (f > 300 Hz, HP, i.e., distorted frequency components above 3,400 Hz in telephone speech were restored), and low-pass filtered speech (f < 3,400 Hz, LP, i.e., distorted frequency components below 300 Hz in telephone speech were restored). Results indicated that in quiet environments, for all four types of stimuli, listening with both hearing aid (HA) and cochlear implant (CI) was significantly better than listening with CI alone. For both bimodal and CI-alone modes, there were no statistically significant differences between the LP and BP scores and between the WB and HP scores. However, the HP scores were significantly better than the BP scores. In quiet conditions, both CI alone and bimodal listening achieved the largest benefits when telephone speech was augmented with high rather than low-frequency information. These findings provide support for the design of algorithms that would extend higher frequency information, at least in quiet environments. PMID:24265213
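
    A sketch of how the four stimulus types could be generated from a wideband recording, assuming Butterworth filters; the study's exact processing chain is not specified here, and the file name is hypothetical.

        # Generate WB, BP (300-3,400 Hz), HP (> 300 Hz), and LP (< 3,400 Hz)
        # versions of a wideband recording. Filter order and design are assumptions.
        import numpy as np
        from scipy.io import wavfile
        from scipy.signal import butter, sosfiltfilt

        fs, wb = wavfile.read("sentence.wav")          # hypothetical wideband original
        wb = wb.astype(np.float64)

        bp_sos = butter(4, [300, 3400], btype="bandpass", fs=fs, output="sos")
        hp_sos = butter(4, 300, btype="highpass", fs=fs, output="sos")
        lp_sos = butter(4, 3400, btype="lowpass", fs=fs, output="sos")

        stimuli = {
            "WB": wb,                                  # full bandwidth
            "BP": sosfiltfilt(bp_sos, wb),             # simulated telephone band
            "HP": sosfiltfilt(hp_sos, wb),             # telephone band plus restored highs
            "LP": sosfiltfilt(lp_sos, wb),             # telephone band plus restored lows
        }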

  15. An acoustic feature-based similarity scoring system for speech rehabilitation assistance.

    PubMed

    Syauqy, Dahnial; Wu, Chao-Min; Setyawati, Onny

    2016-08-01

    The purpose of this study is to develop a tool to assist speech therapy and rehabilitation, focused on automatic scoring based on the comparison of the patient's speech with normal speech on several aspects including pitch, vowel, voiced-unvoiced segments, strident fricatives and sound intensity. Pitch estimation employed a cepstrum-based algorithm for its robustness; vowel classification used a multilayer perceptron (MLP) to classify vowels from pitch and formants; and strident fricative detection was based on the major spectral peak intensity and location and the presence of pitch in the segment. In order to evaluate the performance of the system, this study analyzed eight patients' speech recordings (four males, four females; aged 4-58 years), which had been recorded in a previous study in cooperation with Taipei Veterans General Hospital and Taoyuan General Hospital. Experimental results for the pitch algorithm showed that the cepstrum method had 5.3% gross pitch error over a total of 2086 frames. For vowel classification, the MLP provided 93% accuracy (men), 87% (women) and 84% (children). Overall, 156 of the tool's grading results (81%) were consistent with the 192 audio and visual observations made by four experienced respondents. Implications for Rehabilitation: Difficulties in communication may limit the ability of a person to transfer and exchange information. The fact that speech is one of the primary means of communication has encouraged the need for speech diagnosis and rehabilitation. Advances in computer-assisted speech therapy (CAST) technology improve the quality and time efficiency of the diagnosis and treatment of these disorders. The present study attempted to develop a tool to assist speech therapy and rehabilitation, providing a simple interface to let the assessment be done even by the patient himself, without the need for particular knowledge of speech processing, while at the
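
    A minimal cepstrum-based F0 estimator for a single voiced frame, in the spirit of the pitch module described above; the window, zero-padding, and 60-400 Hz search range are assumptions rather than the system's actual settings.

        # Cepstrum-based pitch estimate for one voiced frame (a few hundred samples).
        # The quefrency peak within the search band gives the period in samples.
        import numpy as np

        def cepstral_f0(frame, fs, fmin=60.0, fmax=400.0):
            frame = frame * np.hamming(len(frame))
            spectrum = np.fft.rfft(frame, n=2 * len(frame))
            log_mag = np.log(np.abs(spectrum) + 1e-12)
            cepstrum = np.fft.irfft(log_mag)
            qmin, qmax = int(fs / fmax), int(fs / fmin)   # quefrency search band (samples)
            peak = qmin + np.argmax(cepstrum[qmin:qmax])
            return fs / peak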

  16. Designing acoustics for linguistically diverse classrooms: Effects of background noise, reverberation and talker foreign accent on speech comprehension by native and non-native English-speaking listeners

    NASA Astrophysics Data System (ADS)

    Peng, Zhao Ellen

    The current classroom acoustics standard (ANSI S12.60-2010) recommends core learning spaces not to exceed background noise level (BNL) of 35 dBA and reverberation time (RT) of 0.6 second, based on speech intelligibility performance mainly by the native English-speaking population. Existing literature has not correlated these recommended values well with student learning outcomes. With a growing population of non-native English speakers in American classrooms, the special needs for perceiving degraded speech among non-native listeners, either due to realistic room acoustics or talker foreign accent, have not been addressed in the current standard. This research seeks to investigate the effects of BNL and RT on the comprehension of English speech from native English and native Mandarin Chinese talkers as perceived by native and non-native English listeners, and to provide acoustic design guidelines to supplement the existing standard. This dissertation presents two studies on the effects of RT and BNL on more realistic classroom learning experiences. How do native and non-native English-speaking listeners perform on speech comprehension tasks under adverse acoustic conditions, if the English speech is produced by talkers of native English (Study 1) versus native Mandarin Chinese (Study 2)? Speech comprehension materials were played back in a listening chamber to individual listeners: native and non-native English-speaking in Study 1; native English, native Mandarin Chinese, and other non-native English-speaking in Study 2. Each listener was screened for baseline English proficiency level, and completed dual tasks simultaneously involving speech comprehension and adaptive dot-tracing under 15 acoustic conditions, comprised of three BNL conditions (RC-30, 40, and 50) and five RT scenarios (0.4 to 1.2 seconds). The results show that BNL and RT negatively affect both objective performance and subjective perception of speech comprehension, more severely for non

  17. A Computational Model of Word Segmentation from Continuous Speech Using Transitional Probabilities of Atomic Acoustic Events

    ERIC Educational Resources Information Center

    Rasanen, Okko

    2011-01-01

    Word segmentation from continuous speech is a difficult task that is faced by human infants when they start to learn their native language. Several studies indicate that infants might use several different cues to solve this problem, including intonation, linguistic stress, and transitional probabilities between subsequent speech sounds. In this…

  18. Acoustic Analysis of Clear Versus Conversational Speech in Individuals with Parkinson Disease

    ERIC Educational Resources Information Center

    Goberman, A.M.; Elmer, L.W.

    2005-01-01

    A number of studies have been devoted to the examination of clear versus conversational speech in non-impaired speakers. The purpose of these previous studies has been primarily to help increase speech intelligibility for the benefit of hearing-impaired listeners. The goal of the present study was to examine differences between conversational and…

  19. Systematic Studies of Modified Vocalization: The Effect of Speech Rate on Speech Production Measures During Metronome-Paced Speech in Persons who Stutter

    PubMed Central

    Davidow, Jason H.

    2013-01-01

    Background Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control speech rate between conditions limits our ability to determine if the changes were necessary for fluency. Aims This study examined the effect of speech rate on several speech production variables during one-syllable-per-beat metronomic speech, in order to determine changes that may be important for fluency during this fluency-inducing condition. Methods and Procedures Thirteen persons who stutter (PWS), aged 18–62 years, completed a series of speaking tasks. Several speech production variables were compared between conditions produced at different metronome beat rates, and between a control condition and a metronome-paced speech condition produced at a rate equal to the control condition. Outcomes & Results Vowel duration, voice onset time, pressure rise time, and phonated intervals were significantly impacted by metronome beat rate. Voice onset time and the percentage of short (30–100 ms) phonated intervals significantly decreased from the control condition to the equivalent rate metronome-paced speech condition. Conclusions & Implications A reduction in the percentage of short phonated intervals may be important for fluency during syllable-based metronome-paced speech for PWS. Future studies should continue examining the necessity of this reduction. In addition, speech rate must be controlled in future fluency-inducing condition studies, including neuroimaging investigations, in order for this research to make a substantial contribution to finding the fluency-inducing mechanism of fluency-inducing conditions. PMID:24372888

  20. Echo-acoustic flow dynamically modifies the cortical map of target range in bats

    NASA Astrophysics Data System (ADS)

    Bartenstein, Sophia K.; Gerstenberg, Nadine; Vanderelst, Dieter; Peremans, Herbert; Firzlaff, Uwe

    2014-08-01

    Echolocating bats use the delay between their sonar emissions and the reflected echoes to measure target range, a crucial parameter for avoiding collisions or capturing prey. In many bat species, target range is represented as an orderly organized map of echo delay in the auditory cortex. Here we show that the map of target range in bats is dynamically modified by the continuously changing flow of acoustic information perceived during flight (‘echo-acoustic flow’). Combining dynamic acoustic stimulation in virtual space with extracellular recordings, we found that neurons in the auditory cortex of the bat Phyllostomus discolor encode echo-acoustic flow information on the geometric relation between targets and the bat’s flight trajectory, rather than echo delay per se. Specifically, the cortical representation of close-range targets is enlarged when the lateral passing distance of the target decreases. This flow-dependent enlargement of target representation may trigger adaptive behaviours such as vocal control or flight manoeuvres.
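
    As a point of reference for the delay-to-range mapping described above (a standard sonar relation, not a result of this study), target range R follows from the measured echo delay tau and the speed of sound c via the two-way travel time:

        R = c * tau / 2    (e.g., c = 343 m/s and tau = 5 ms give R of roughly 0.86 m)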

  1. Depression Diagnoses and Fundamental Frequency-Based Acoustic Cues in Maternal Infant-Directed Speech

    PubMed Central

    Porritt, Laura L.; Zinser, Michael C.; Bachorowski, Jo-Anne; Kaplan, Peter S.

    2013-01-01

    F0-based acoustic measures were extracted from a brief, sentence-final target word spoken during structured play interactions between mothers and their 3- to 14-month-old infants, and were analyzed based on demographic variables and DSM-IV Axis-I clinical diagnoses and their common modifiers. F0 range (ΔF0) was negatively correlated with infant age and number of children. ΔF0 was significantly smaller in clinically depressed mothers and mothers diagnosed with depression in partial remission, relative to non-depressed mothers, mothers diagnosed with depression in full remission, and those diagnosed with depressive disorder not otherwise specified. ΔF0 was significantly lower in mothers experiencing their first major depressive episode relative to mothers with recurrent depression. Deficits in ΔF0 were specific to diagnosed clinical depression, and were not well predicted by elevated self-report scores only, or by diagnosed anxiety disorders. Mothers with higher ΔF0 had infants with reportedly larger productive vocabularies, but depression was unrelated to vocabulary development. Implications for cognitive-linguistic development are discussed. PMID:24489521

  2. Effect of Digital Frequency Compression (DFC) on Speech Recognition in Candidates for Combined Electric and Acoustic Stimulation (EAS)

    PubMed Central

    Gifford, René H.; Dorman, Michael F.; Spahr, Anthony J.; McKarns, Sharon A.

    2008-01-01

    Purpose To compare the effects of conventional amplification (CA) and digital frequency compression (DFC) amplification on the speech recognition abilities of candidates for a partial-insertion cochlear implant, that is, candidates for combined electric and acoustic stimulation (EAS). Method The participants were 6 patients whose audiometric thresholds at 500 Hz and below were ≤60 dB HL and whose thresholds at 2000 Hz and above were ≥80 dB HL. Six tests of speech understanding were administered with CA and DFC. The Abbreviated Profile of Hearing Aid Benefit (APHAB) was also administered following use of CA and DFC. Results Group mean scores were not statistically different in the CA and DFC conditions. However, 2 patients received substantial benefit in DFC conditions. APHAB scores suggested increased ease of communication, but also increased aversive sound quality. Conclusion Results suggest that a relatively small proportion of individuals who meet EAS candidacy will receive substantial benefit from a DFC hearing aid and that a larger proportion will receive at least a small benefit when speech is presented against a background of noise. This benefit, however, comes at a cost—aversive sound quality. PMID:17905905

  3. Experimental investigation of the effects of the acoustical conditions in a simulated classroom on speech recognition and learning in children.

    PubMed

    Valente, Daniel L; Plevinsky, Hallie M; Franco, John M; Heinrichs-Graham, Elizabeth C; Lewis, Dawna E

    2012-01-01

    The potential effects of acoustical environment on speech understanding are especially important as children enter school where students' ability to hear and understand complex verbal information is critical to learning. However, this ability is compromised because of widely varied and unfavorable classroom acoustics. The extent to which unfavorable classroom acoustics affect children's performance on longer learning tasks is largely unknown as most research has focused on testing children using words, syllables, or sentences as stimuli. In the current study, a simulated classroom environment was used to measure comprehension performance of two classroom learning activities: a discussion and lecture. Comprehension performance was measured for groups of elementary-aged students in one of four environments with varied reverberation times and background noise levels. The reverberation time was either 0.6 or 1.5 s, and the signal-to-noise level was either +10 or +7 dB. Performance is compared to adult subjects as well as to sentence-recognition in the same condition. Significant differences were seen in comprehension scores as a function of age and condition; both increasing background noise and reverberation degraded performance in comprehension tasks compared to minimal differences in measures of sentence-recognition.

  4. Acoustic Variations in Adductor Spasmodic Dysphonia as a Function of Speech Task.

    ERIC Educational Resources Information Center

    Sapienza, Christine M.; Walton, Suzanne; Murry, Thomas

    1999-01-01

    Acoustic phonatory events were identified in 14 women diagnosed with adductor spasmodic dysphonia (ADSD), a focal laryngeal dystonia that disturbs phonatory function, and compared with those of 14 age-matched women with no vocal dysfunction. Findings indicated ADSD subjects produced more aberrant acoustic events than controls during tasks of…

  5. Emotion in speech: the acoustic attributes of fear, anger, sadness, and joy.

    PubMed

    Sobin, C; Alpert, M

    1999-07-01

    Decoders can detect emotion in voice with much greater accuracy than can be achieved by objective acoustic analysis. Studies that have established this advantage, however, used methods that may have favored decoders and disadvantaged acoustic analysis. In this study, we applied several methodologic modifications for the analysis of the acoustic differentiation of fear, anger, sadness, and joy. Thirty-one female subjects between the ages of 18 and 35 (encoders) were audio-recorded during an emotion-induction procedure and produced a total of 620 emotion-laden sentences. Twelve female judges (decoders), three for each of the four emotions, were assigned to rate the intensity of one emotion each. Their combined ratings were used to select 38 prototype samples per emotion. Past acoustic findings were replicated, and increased acoustic differentiation among the emotions was achieved. Multiple regression analysis suggested that some, although not all, of the acoustic variables were associated with decoders' ratings. Signal detection analysis gave some insight into this disparity. However, the analysis of the classic constellation of acoustic variables may not completely capture the acoustic features that influence decoders' ratings. Future analyses would likely benefit from the parallel assessment of respiration, phonation, and articulation.

  6. A human vocal utterance corpus for perceptual and acoustic analysis of speech, singing, and intermediate vocalizations

    NASA Astrophysics Data System (ADS)

    Gerhard, David

    2002-11-01

    In this paper we present the collection and annotation process of a corpus of human utterance vocalizations used for speech and song research. The corpus was collected to fill a void in current research tools, since no corpus currently exists which is useful for the classification of intermediate utterances between speech and monophonic singing. Much work has been done in the domain of speech versus music discrimination, and several corpora exist which can be used for this research. A specific example is the work done by Eric Scheirer and Malcolm Slaney [IEEE ICASSP, 1997, pp. 1331-1334]. The collection of the corpus is described including questionnaire design and intended and actual response characteristics, as well as the collection and annotation of pre-existing samples. The annotation of the corpus consisted of a survey tool for a subset of the corpus samples, including ratings of the clips based on a speech-song continuum, and questions on the perceptual qualities of speech and song, both generally and corresponding to particular clips in the corpus.

  7. Applied Self-Statement Modification and Applied Modified Desensitization in the Treatment of Speech Anxiety: The Synergy Hypothesis.

    ERIC Educational Resources Information Center

    Melanson, Diane C.

    A study examined the relative effectiveness and synergistic effect of two treatments for reducing speech anxiety--Self-Statement Modification (SSM), a therapy focused on modification of cognitive behavior; and Modified Desensitization (MD), a therapy focused on physiological variables, utilizing relaxation training, group hierarchy construction,…

  8. Revisiting Neil Armstrong's Moon-Landing Quote: Implications for Speech Perception, Function Word Reduction, and Acoustic Ambiguity

    PubMed Central

    Baese-Berk, Melissa M.; Dilley, Laura C.; Schmidt, Stephanie; Morrill, Tuuli H.; Pitt, Mark A.

    2016-01-01

    Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said "one small step for a man," instead of "one small step for man." What he said is unclear in part because function words like "a" can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of "for" and "for a" in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of "for" and "for a." The results suggest that the distributions of "for" and "for a" overlap substantially, both in terms of temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word "a" varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate. PMID:27603209

  9. Revisiting Neil Armstrong's Moon-Landing Quote: Implications for Speech Perception, Function Word Reduction, and Acoustic Ambiguity.

    PubMed

    Baese-Berk, Melissa M; Dilley, Laura C; Schmidt, Stephanie; Morrill, Tuuli H; Pitt, Mark A

    2016-01-01

    Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said "one small step for a man," instead of "one small step for man." What he said is unclear in part because function words like "a" can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of "for" and "for a" in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of "for" and "for a." The results suggest that the distributions of "for" and "for a" overlap substantially, both in terms of temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word "a" varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate.

  10. Impact of Aberrant Acoustic Properties on the Perception of Sound Quality in Electrolarynx Speech

    ERIC Educational Resources Information Center

    Meltzner, Geoffrey S.; Hillman, Robert E.

    2005-01-01

    A large percentage of patients who have undergone laryngectomy to treat advanced laryngeal cancer rely on an electrolarynx (EL) to communicate verbally. Although serviceable, EL speech is plagued by shortcomings in both sound quality and intelligibility. This study sought to better quantify the relative contributions of previously identified…

  11. The Exploitation of Subphonemic Acoustic Detail in L2 Speech Segmentation

    ERIC Educational Resources Information Center

    Shoemaker, Ellenor

    2014-01-01

    The current study addresses an aspect of second language (L2) phonological acquisition that has received little attention to date--namely, the acquisition of allophonic variation as a word boundary cue. The role of subphonemic variation in the segmentation of speech by native speakers has been indisputably demonstrated; however, the acquisition of…

  12. An acoustic key to eight languages/dialects: Factor analyses of critical-band-filtered speech

    PubMed Central

    Ueda, Kazuo; Nakajima, Yoshitaka

    2017-01-01

    The peripheral auditory system functions like a frequency analyser, often modelled as a bank of non-overlapping band-pass filters called critical bands; 20 bands are necessary for simulating frequency resolution of the ear within an ordinary frequency range of speech (up to 7,000 Hz). A far smaller number of filters seemed sufficient, however, to re-synthesise intelligible speech sentences with power fluctuations of the speech signals passing through them; nevertheless, the number and frequency ranges of the frequency bands for efficient speech communication are yet unknown. We derived four common frequency bands—covering approximately 50–540, 540–1,700, 1,700–3,300, and above 3,300 Hz—from factor analyses of spectral fluctuations in eight different spoken languages/dialects. The analyses robustly led to three factors common to all languages investigated—the low & mid-high factor related to the two separate frequency ranges of 50–540 and 1,700–3,300 Hz, the mid-low factor the range of 540–1,700 Hz, and the high factor the range above 3,300 Hz—in these different languages/dialects, suggesting a language universal. PMID:28198405

  13. An acoustic key to eight languages/dialects: Factor analyses of critical-band-filtered speech.

    PubMed

    Ueda, Kazuo; Nakajima, Yoshitaka

    2017-02-15

    The peripheral auditory system functions like a frequency analyser, often modelled as a bank of non-overlapping band-pass filters called critical bands; 20 bands are necessary for simulating frequency resolution of the ear within an ordinary frequency range of speech (up to 7,000 Hz). A far smaller number of filters seemed sufficient, however, to re-synthesise intelligible speech sentences with power fluctuations of the speech signals passing through them; nevertheless, the number and frequency ranges of the frequency bands for efficient speech communication are yet unknown. We derived four common frequency bands-covering approximately 50-540, 540-1,700, 1,700-3,300, and above 3,300 Hz-from factor analyses of spectral fluctuations in eight different spoken languages/dialects. The analyses robustly led to three factors common to all languages investigated-the low & mid-high factor related to the two separate frequency ranges of 50-540 and 1,700-3,300 Hz, the mid-low factor the range of 540-1,700 Hz, and the high factor the range above 3,300 Hz-in these different languages/dialects, suggesting a language universal.
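
    A rough sketch of the kind of band-power-fluctuation analysis described in the two records above: filter speech into the reported frequency bands, take each band's power envelope, and look for common factors. For brevity the sketch filters directly into the four derived bands (the original analysis factor-analysed 20 critical-band envelopes) and substitutes PCA for a rotated factor analysis; the file name is hypothetical.

        # Band-filter speech, compute band power fluctuations, and inspect how much
        # variance a few common components explain. Assumes fs of at least 16 kHz
        # so the top band stays below Nyquist.
        import numpy as np
        from scipy.io import wavfile
        from scipy.signal import butter, sosfiltfilt, hilbert
        from sklearn.decomposition import PCA

        BANDS_HZ = [(50, 540), (540, 1700), (1700, 3300), (3300, 7000)]

        fs, x = wavfile.read("sentence.wav")
        x = x.astype(np.float64)

        envelopes = []
        for lo, hi in BANDS_HZ:
            sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
            band = sosfiltfilt(sos, x)
            envelopes.append(np.abs(hilbert(band)) ** 2)     # band power fluctuation

        X = np.log(np.array(envelopes).T + 1e-12)            # frames x bands
        pca = PCA(n_components=3).fit(X)                     # proxy for three common factors
        print(pca.explained_variance_ratio_)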

  14. Comparison of Acoustic and Kinematic Approaches to Measuring Utterance-Level Speech Variability

    ERIC Educational Resources Information Center

    Howell, Peter; Anderson, Andrew J.; Bartrip, Jon; Bailey, Eleanor

    2009-01-01

    Purpose: The spatiotemporal index (STI) is one measure of variability. As currently implemented, kinematic data are used, requiring equipment that cannot be used with some patient groups or in scanners. An experiment is reported that addressed whether STI can be extended to an audio measure of sound pressure of the speech envelope over time that…

  15. Acoustic correlates of inflectional morphology in the speech of children with specific language impairment and their typically developing peers.

    PubMed

    Owen, Amanda J; Goffman, Lisa

    2007-07-01

    The development of the use of the third-person singular -s in open syllable verbs in children with specific language impairment (SLI) and their typically developing peers was examined. Verbs that included overt productions of the third-person singular -s morpheme (e.g. Bobby plays ball everyday; Bear laughs when mommy buys popcorn) were contrasted with clearly bare stem contexts (e.g. Mommy, buy popcorn; I saw Bobby play ball) on both global and local measures of acoustic duration. A durational signature for verbs inflected with -s was identified separately from factors related to sentence length. These duration measures were also used to identify acoustic changes related to the omission of the -s morpheme. The omitted productions from the children with SLI were significantly longer than their correct third-person singular and bare stem productions. This result was unexpected given that the omitted productions have fewer phonemes than correctly inflected productions. Typically developing children did not show the same pattern, instead producing omitted productions that patterned most closely with bare stem forms. These results are discussed in relation to current theoretical approaches to SLI, with an emphasis on performance and speech-motor accounts.

  16. Speech production knowledge in automatic speech recognition.

    PubMed

    King, Simon; Frankel, Joe; Livescu, Karen; McDermott, Erik; Richmond, Korin; Wester, Mirjam

    2007-02-01

    Although much is known about how speech is produced, and research into speech production has resulted in measured articulatory data, feature systems of different kinds, and numerous models, speech production knowledge is almost totally ignored in current mainstream approaches to automatic speech recognition. Representations of speech production allow simple explanations for many phenomena observed in speech which cannot be easily analyzed from either acoustic signal or phonetic transcription alone. In this article, a survey of a growing body of work in which such representations are used to improve automatic speech recognition is provided.

  17. Classifying acoustic signals into phoneme categories: average and dyslexic readers make use of complex dynamical patterns and multifractal scaling properties of the speech signal

    PubMed Central

    2015-01-01

    Several competing aetiologies of developmental dyslexia suggest that the problems with acquiring literacy skills are causally entailed by low-level auditory and/or speech perception processes. The purpose of this study is to evaluate the diverging claims about the specific deficient perceptual processes under conditions of strong inference. Theoretically relevant acoustic features were extracted from a set of artificial speech stimuli that lie on a /bAk/-/dAk/ continuum. The features were tested on their ability to enable a simple classifier (Quadratic Discriminant Analysis) to reproduce the observed classification performance of average and dyslexic readers in a speech perception experiment. The ‘classical’ features examined were based on component process accounts of developmental dyslexia such as the supposed deficit in Envelope Rise Time detection and the deficit in the detection of rapid changes in the distribution of energy in the frequency spectrum (formant transitions). Studies examining these temporal processing deficit hypotheses do not employ measures that quantify the temporal dynamics of stimuli. It is shown that measures based on quantification of the dynamics of complex, interaction-dominant systems (Recurrence Quantification Analysis and the multifractal spectrum) enable QDA to classify the stimuli almost identically to what was observed in dyslexic and average-reading participants. It seems unlikely that participants used any of the features that are traditionally associated with accounts of (impaired) speech perception. The nature of the variables quantifying the temporal dynamics of the speech stimuli implies that the classification of speech stimuli cannot be regarded as a linear aggregate of component processes that each parse the acoustic signal independent of one another, as is assumed by the ‘classical’ aetiologies of developmental dyslexia. It is suggested that the results imply that the differences in speech perception performance between
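
    A sketch of the classification test described above: given a candidate feature set, ask whether Quadratic Discriminant Analysis can reproduce listeners' /bAk/-/dAk/ labels. Feature extraction is left abstract, and the input files are hypothetical.

        # Test how well QDA, trained on some chosen feature set, agrees with
        # listeners' labels. `stimulus_features.npy` and `listener_labels.npy`
        # are hypothetical placeholders for extracted features and majority labels.
        import numpy as np
        from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
        from sklearn.model_selection import cross_val_score

        # features: (n_stimuli, n_features), e.g. envelope rise time, formant-transition
        #           slopes, or RQA / multifractal measures, depending on the hypothesis
        # labels  : listeners' classification of each stimulus (0 = /bAk/, 1 = /dAk/)
        features = np.load("stimulus_features.npy")
        labels = np.load("listener_labels.npy")

        qda = QuadraticDiscriminantAnalysis()
        scores = cross_val_score(qda, features, labels, cv=5)
        print(f"mean cross-validated agreement with listeners: {scores.mean():.2f}")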

  18. Quantifying the Effect of Compression Hearing Aid Release Time on Speech Acoustics and Intelligibility

    ERIC Educational Resources Information Center

    Jenstad, Lorienne M.; Souza, Pamela E.

    2005-01-01

    Compression hearing aids have the inherent, and often adjustable, feature of release time from compression. Research to date does not provide a consensus on how to choose or set release time. The current study had 2 purposes: (a) a comprehensive evaluation of the acoustic effects of release time for a single-channel compression system in quiet and…
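
    To make the release-time parameter concrete, the sketch below implements a bare-bones single-channel compressor in which the release constant governs how quickly gain recovers after the level falls below threshold. The threshold, ratio, and time constants are illustrative, not values from the study.

        # Minimal feed-forward compressor: a dB-domain level detector with separate
        # attack and release smoothing, followed by a static gain computer.
        import numpy as np

        def compress(x, fs, threshold_db=-30.0, ratio=3.0, attack_ms=5.0, release_ms=100.0):
            att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
            rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
            level_db = -100.0
            out = np.empty(len(x), dtype=np.float64)
            for i, sample in enumerate(x):
                inst_db = 20.0 * np.log10(abs(sample) + 1e-9)
                coeff = att if inst_db > level_db else rel    # release governs gain recovery
                level_db = coeff * level_db + (1.0 - coeff) * inst_db
                over = max(level_db - threshold_db, 0.0)
                gain_db = -over * (1.0 - 1.0 / ratio)
                out[i] = sample * 10.0 ** (gain_db / 20.0)
            return out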

  19. Exploration of Acoustic Features for Automatic Vowel Discrimination in Spontaneous Speech

    ERIC Educational Resources Information Center

    Tyson, Na'im R.

    2012-01-01

    In an attempt to understand what acoustic/auditory feature sets motivated transcribers towards certain labeling decisions, I built machine learning models that were capable of discriminating between canonical and non-canonical vowels excised from the Buckeye Corpus. Specifically, I wanted to model when the dictionary form and the transcribed-form…

  20. Changes in Speech Production in a Child with a Cochlear Implant: Acoustic and Kinematic Evidence.

    ERIC Educational Resources Information Center

    Goffman, Lisa; Ertmer, David J.; Erdle, Christa

    2002-01-01

    A method is presented for examining change in motor patterns used to produce linguistic contrasts. In this case study, the method is applied to a child who experienced hearing loss at age 3 and received a multi-channel cochlear implant at 7. Post-implant, acoustic durations showed a maturational change. (Contains references.) (Author/CR)

  1. Filled pause refinement based on the pronunciation probability for lecture speech.

    PubMed

    Long, Yan-Hua; Ye, Hong

    2015-01-01

    Nowadays, although automatic speech recognition has become quite proficient in recognizing or transcribing well-prepared fluent speech, the transcription of speech that contains many disfluencies remains problematic, such as spontaneous conversational and lecture speech. Filled pauses (FPs) are the most frequently occurring disfluencies in this type of speech. Most recent studies have shown that FPs are widely believed to increase the error rates for state-of-the-art speech transcription, primarily because most FPs are not well annotated or provided in training data transcriptions and because of the similarities in acoustic characteristics between FPs and some common non-content words. To enhance the speech transcription system, we propose a new automatic refinement approach to detect FPs in British English lecture speech transcription. This approach combines the pronunciation probabilities for each word in the dictionary and acoustic language model scores for FP refinement through a modified speech recognition forced-alignment framework. We evaluate the proposed approach on the Reith Lectures speech transcription task, in which only imperfect training transcriptions are available. Successful results are achieved for both the development and evaluation datasets. Acoustic models trained on different styles of speech genres have been investigated with respect to FP refinement. To further validate the effectiveness of the proposed approach, speech transcription performance has also been examined using systems built on training data transcriptions with and without FP refinement.

  2. Protein-modified shear mode film bulk acoustic resonator for bio-sensing applications

    NASA Astrophysics Data System (ADS)

    Wang, Jingjing; Liu, Weihui; Xu, Yan; Chen, Da; Li, Dehua; Zhang, Luyin

    2014-09-01

    In this paper, we present a shear mode film bulk acoustic biosensor based on micro-electromechanical technology. The film bulk acoustic biosensor is a diaphragmatic structure consisting of a lateral-field-excited ZnO piezoelectric stack built on an Si3N4 membrane. The device operates near 1.6 GHz with Q factors of 579 in water and 428 in glycerol. A frequency shift of 5.4 MHz and a small decline in the amplitude are found for the measurements in glycerol compared with those in water because of viscous damping from the adjacent glycerol. For bio-sensing demonstration, the resonator was modified with biotin molecules to detect protein-ligand interactions in real time and in situ. The resonant frequency of the biotin-modified device drops rapidly and gradually reaches equilibrium when exposed to the streptavidin solution due to the biotin-streptavidin interaction. The proposed film bulk acoustic biosensor shows promising applications for disease diagnostics, prognosis, and drug discovery.

  3. Acoustic and optoelectronic nature and interfacial durability of modified CNT and GnP-PVDF composites with nanostructural control

    NASA Astrophysics Data System (ADS)

    Park, Joung-Man; Kwon, Dong-Jun; Wang, Zuo-Jia; DeVries, Lawrence

    2014-03-01

    Nano- and hetero-structures of modified carbon nanotube (CNT) and graphene nanoplatelet (GnP) can significantly control piezoresistive and optoelectronic properties in microelectromechanical systems (MEMS) used as acoustic actuators. The interfacial durability and electrical properties of modified CNT and GnP embedded in poly (vinylidene fluoride) (PVDF) nanocomposites were investigated for use in acoustic actuator applications. The modified-GnP-coated PVDF nanocomposite exhibited better electrical conductivity than the neat and modified CNT cases, due to the unique electrical nature of GnP. The modified GnP coating also exhibited good acoustical properties. Contact angle, surface energy, work of adhesion, and spreading coefficient measurements were used to explore the interfacial adhesion durability between neat or plasma-treated CNT and plasma-treated PVDF. The acoustic actuation performance of modified-GnP-coated PVDF nanocomposites was investigated for different radii of curvature and different coating conditions, using a sound level meter. Modified GnP can be a more appropriate acoustic actuator material than CNT because of its improved electrical properties. The optimum radius of curvature and coating thickness were also obtained for the best sound pressure level (SPL) performance. This study provides practical manufacturing parameters for good-quality transparent sound actuators.

  4. Measurement of Trained Speech Patterns in Stuttering: Interjudge and Intrajudge Agreement of Experts by Means of Modified Time-Interval Analysis

    ERIC Educational Resources Information Center

    Alpermann, Anke; Huber, Walter; Natke, Ulrich; Willmes, Klaus

    2010-01-01

    Improved fluency after stuttering therapy is usually measured by the percentage of stuttered syllables. However, outcome studies rarely evaluate the use of trained speech patterns that speakers use to manage stuttering. This study investigated whether the modified time interval analysis can distinguish between trained speech patterns, fluent…

  5. Streptavidin Modified ZnO Film Bulk Acoustic Resonator for Detection of Tumor Marker Mucin 1

    NASA Astrophysics Data System (ADS)

    Zheng, Dan; Guo, Peng; Xiong, Juan; Wang, Shengfu

    2016-09-01

    A ZnO-based film bulk acoustic resonator has been fabricated using magnetron sputtering technology and employed as a biosensor for detection of mucin 1. The resonant frequency of the thin-film bulk acoustic resonator was located near 1503.3 MHz. The average electromechanical coupling factor K_eff^2 and quality factor Q were 2.39% and 224, respectively. Using the specific binding system of avidin-biotin, streptavidin was self-assembled on the top gold electrode as the sensitive layer to indirectly test the MUC1 molecules. The resonant frequency of the biosensor decreases in response to the mass loading in the range of 20-500 nM. The streptavidin-modified sensor exhibits a high sensitivity of 4642.6 Hz/nM and good selectivity.
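
    A back-of-the-envelope use of the reported sensitivity, assuming the 4642.6 Hz/nM response is linear over the stated 20-500 nM range: a measured downshift of the resonance converts directly to an estimated MUC1 concentration.

        # Convert a measured resonance downshift into an estimated concentration,
        # assuming the reported sensitivity holds linearly across 20-500 nM.
        SENSITIVITY_HZ_PER_NM = 4642.6

        def concentration_nM(delta_f_hz):
            """Estimate concentration from the magnitude of the frequency downshift."""
            return abs(delta_f_hz) / SENSITIVITY_HZ_PER_NM

        print(f"{concentration_nM(-250_000):.0f} nM")   # a 250 kHz downshift -> ~54 nM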

  6. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    ERIC Educational Resources Information Center

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  7. A Robust Approach For Acoustic Noise Suppression In Speech Using ANFIS

    NASA Astrophysics Data System (ADS)

    Martinek, Radek; Kelnar, Michal; Vanus, Jan; Bilik, Petr; Zidek, Jan

    2015-11-01

    The authors of this article deal with the implementation of a combination of fuzzy-system and artificial-intelligence techniques in the application area of non-linear noise and interference suppression. The structure used is called an Adaptive Neuro-Fuzzy Inference System (ANFIS). This system finds practical use mainly in audio telephone (mobile) communication in noisy environments (transport, production halls, sports matches, etc.). Experimental methods based on the two-input adaptive noise cancellation concept are clearly outlined. Within the experiments carried out, the authors created, based on the ANFIS structure, a comprehensive system for adaptive suppression of unwanted background interference that occurs in audio communication and degrades the audio signal. The system has been tested on real voice signals. This article presents an investigation and comparison of three distinct approaches to noise cancellation in speech: LMS (least mean squares) adaptive filtering, RLS (recursive least squares) adaptive filtering, and ANFIS. A careful review of the literature indicated the importance of non-linear adaptive algorithms over linear ones in noise cancellation. It was concluded that the ANFIS approach had the best overall performance, as it efficiently cancelled noise even in highly noise-degraded speech. Subjective tests were used to analyse the comparative performance of the approaches, while objective tests were used to validate them. The algorithms were implemented in Matlab to justify the claims and determine their relative performance.
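
    For reference, the linear baseline that ANFIS is compared against can be sketched as a classic two-input LMS noise canceller: the primary input carries speech plus noise, the reference input carries correlated noise only, and the adaptive filter's noise estimate is subtracted from the primary channel. The filter length and step size below are assumptions.

        # Two-input LMS adaptive noise cancellation: the error signal (primary minus
        # the filter's noise estimate) is the cleaned speech and also drives adaptation.
        import numpy as np

        def lms_cancel(primary, reference, n_taps=32, mu=0.01):
            w = np.zeros(n_taps)
            out = np.zeros(len(primary))
            for n in range(n_taps, len(primary)):
                x = reference[n - n_taps:n][::-1]      # most recent reference samples
                noise_est = w @ x
                out[n] = primary[n] - noise_est        # error signal = cleaned speech
                w += 2 * mu * out[n] * x               # LMS weight update
            return out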

  8. [A modified speech enhancement algorithm for electronic cochlear implant and its digital signal processing realization].

    PubMed

    Wang, Yulin; Tian, Xuelong

    2014-08-01

    In order to improve the speech quality and auditory perception of electronic cochlear implants under strong background noise, a speech enhancement system for the electronic cochlear implant front-end was constructed. With digital signal processing (DSP) as the core, the system combines its multi-channel buffered serial port (McBSP) data transmission channel with the extended audio interface chip TLV320AIC10, realizing high-speed speech signal acquisition and output. Meanwhile, because the traditional speech enhancement method suffers from poor adaptability, slow convergence and large steady-state error, a versiera function and a de-correlation principle were used to improve the existing adaptive filtering algorithm, which effectively enhanced the quality of voice communication. Test results verified the stability of the system and the de-noising performance of the algorithm, and also showed that it can provide clearer speech signals for deaf or tinnitus patients.

  9. Modified Particle Filtering Algorithm for Single Acoustic Vector Sensor DOA Tracking

    PubMed Central

    Li, Xinbo; Sun, Haixin; Jiang, Liangxu; Shi, Yaowu; Wu, Yue

    2015-01-01

    Conventional direction of arrival (DOA) estimation algorithms that assume static sources usually estimate the source angles of two adjacent moments independently, so the correlation between moments is not considered. In this article, we focus on DOA estimation for moving sources, and a modified particle filtering (MPF) algorithm is proposed with a state space model for a single acoustic vector sensor. Although the particle filtering (PF) algorithm has been introduced for acoustic vector sensor applications, it is not suitable when one of the source angles is estimated with a large deviation, because the two angles (pitch and azimuth) cannot then be simultaneously employed to update the state through the resampling step of the PF algorithm. To solve these problems, the MPF algorithm is proposed, in which the state estimate of the previous moment is introduced into the particle sampling of the present moment to improve the importance function. Moreover, the independence of the pitch and azimuth angles is taken into account, and the two angles are sampled and evaluated separately. Then, the MUSIC spectrum function is used as the “likelihood” function of the MPF algorithm, and the modified PF-MUSIC (MPF-MUSIC) algorithm is proposed to improve the root mean square error (RMSE) and the probability of convergence. The theoretical analysis and the simulation results validate the effectiveness and feasibility of the two proposed algorithms. PMID:26501280
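
    A minimal bootstrap particle-filter step for a single slowly moving angle, of the kind the MPF algorithm builds on; the modifications described above (reusing the previous state estimate in the proposal and handling pitch and azimuth separately) are not shown, and the likelihood would in practice be a MUSIC-spectrum evaluation at each candidate angle.

        # One propagate-weight-resample step of a bootstrap particle filter tracking
        # a single angle (degrees). `likelihood` is any vectorized function scoring
        # candidate angles against the current sensor snapshot.
        import numpy as np

        def pf_step(particles, weights, likelihood, process_std=2.0):
            # propagate: random-walk motion model for the angle
            particles = particles + np.random.normal(0.0, process_std, size=particles.shape)
            # weight by how well each candidate angle explains the current data
            weights = weights * likelihood(particles)
            weights /= weights.sum()
            # resample when the effective sample size collapses
            if 1.0 / np.sum(weights ** 2) < 0.5 * len(particles):
                idx = np.random.choice(len(particles), size=len(particles), p=weights)
                particles = particles[idx]
                weights = np.full(len(particles), 1.0 / len(particles))
            return particles, weights, np.sum(particles * weights)   # posterior-mean estimate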

  10. Modified symplectic schemes with nearly-analytic discrete operators for acoustic wave simulations

    NASA Astrophysics Data System (ADS)

    Liu, Shaolin; Yang, Dinghui; Lang, Chao; Wang, Wenshuai; Pan, Zhide

    2017-04-01

    Using a structure-preserving algorithm significantly increases the computational efficiency of solving wave equations. However, only a few explicit symplectic schemes are available in the literature, and the capabilities of these symplectic schemes have not been sufficiently exploited. Here, we propose a modified strategy to construct explicit symplectic schemes for time advance. The acoustic wave equation is transformed into a Hamiltonian system. The classical symplectic partitioned Runge-Kutta (PRK) method is used for the temporal discretization. Additional spatial differential terms are added to the PRK schemes to form the modified symplectic methods, and two modified time-advancing symplectic methods, both with all-positive symplectic coefficients, are then constructed. The spatial differential operators are approximated by nearly-analytic discrete (NAD) operators, and we call the fully discretized scheme the modified symplectic nearly-analytic discrete (MSNAD) method. Theoretical analyses show that the MSNAD methods exhibit less numerical dispersion and higher stability limits than conventional methods. Three numerical experiments are conducted to verify the advantages of the MSNAD methods, such as their numerical accuracy, computational cost, stability, and long-term calculation capability.
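
    A sketch of symplectic time stepping for the 1-D acoustic wave equation written as a Hamiltonian system (u_t = v, v_t = c^2 u_xx), using the classical Störmer-Verlet scheme, one member of the symplectic partitioned Runge-Kutta family. Plain central differences stand in for the nearly-analytic discrete operators, and all grid parameters are illustrative.

        # Störmer-Verlet (kick-drift-kick) time stepping for u_tt = c^2 u_xx on a
        # periodic 1-D grid; the scheme is symplectic, and the chosen dt satisfies
        # the CFL condition c*dt/dx < 1 for this discretization.
        import numpy as np

        nx, dx, c, dt, nsteps = 400, 1.0, 1500.0, 2.5e-4, 2000
        x = np.arange(nx) * dx
        u = np.exp(-((x - nx * dx / 2) / (10 * dx)) ** 2)   # initial pressure pulse
        v = np.zeros(nx)                                    # initial time derivative u_t

        def laplacian(f):
            return (np.roll(f, -1) - 2 * f + np.roll(f, 1)) / dx**2   # periodic boundaries

        for _ in range(nsteps):
            v_half = v + 0.5 * dt * c**2 * laplacian(u)   # half kick
            u = u + dt * v_half                           # drift
            v = v_half + 0.5 * dt * c**2 * laplacian(u)   # half kick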

  11. Systematic Studies of Modified Vocalization: Effects of Speech Rate and Instatement Style during Metronome Stimulation

    ERIC Educational Resources Information Center

    Davidow, Jason H.; Bothe, Anne K.; Richardson, Jessica D.; Andreatta, Richard D.

    2010-01-01

    Purpose: This study introduces a series of systematic investigations intended to clarify the parameters of the fluency-inducing conditions (FICs) in stuttering. Method: Participants included 11 adults, aged 20-63 years, with typical speech-production skills. A repeated measures design was used to examine the relationships between several speech…

  12. Speech masking and cancelling and voice obscuration

    DOEpatents

    Holzrichter, John F.

    2013-09-10

    A non-acoustic sensor is used to measure a user's speech and then broadcasts an obscuring acoustic signal diminishing the user's vocal acoustic output intensity and/or distorting the voice sounds making them unintelligible to persons nearby. The non-acoustic sensor is positioned proximate or contacting a user's neck or head skin tissue for sensing speech production information.

  13. Early recognition of speech

    PubMed Central

    Remez, Robert E; Thomas, Emily F

    2013-01-01

    Classic research on the perception of speech sought to identify minimal acoustic correlates of each consonant and vowel. In explaining perception, this view designated momentary components of an acoustic spectrum as cues to the recognition of elementary phonemes. This conceptualization of speech perception is untenable given the findings of phonetic sensitivity to modulation independent of the acoustic and auditory form of the carrier. The empirical key is provided by studies of the perceptual organization of speech, a low-level integrative function that finds and follows the sensory effects of speech amid concurrent events. These projects have shown that the perceptual organization of speech is keyed to modulation; fast; unlearned; nonsymbolic; indifferent to short-term auditory properties; and organization requires attention. The ineluctably multisensory nature of speech perception also imposes conditions that distinguish language among cognitive systems. WIREs Cogn Sci 2013, 4:213–223. doi: 10.1002/wcs.1213 PMID:23926454

  14. Acoustics

    NASA Technical Reports Server (NTRS)

    Goodman, Jerry R.; Grosveld, Ferdinand

    2007-01-01

    The acoustics environment in space operations is important to maintain at manageable levels so that the crewperson can remain safe, functional, effective, and reasonably comfortable. High acoustic levels can produce temporary or permanent hearing loss, or cause other physiological symptoms such as auditory pain, headaches, discomfort, strain in the vocal cords, or fatigue. Noise is defined as undesirable sound. Excessive noise may result in psychological effects such as irritability, inability to concentrate, decrease in productivity, annoyance, errors in judgment, and distraction. A noisy environment can also result in the inability to sleep, or sleep well. Elevated noise levels can affect the ability to communicate, understand what is being said, hear what is going on in the environment, degrade crew performance and operations, and create habitability concerns. Superfluous noise emissions can also create the inability to hear alarms or other important auditory cues such as an equipment malfunctioning. Recent space flight experience, evaluations of the requirements in crew habitable areas, and lessons learned (Goodman 2003; Allen and Goodman 2003; Pilkinton 2003; Grosveld et al. 2003) show the importance of maintaining an acceptable acoustics environment. This is best accomplished by having a high-quality set of limits/requirements early in the program, the "designing in" of acoustics in the development of hardware and systems, and by monitoring, testing and verifying the levels to ensure that they are acceptable.

  15. Effect of Acoustic Spectrographic Instruction on Production of English /i/ and /I/ by Spanish Pre-Service English Teachers

    ERIC Educational Resources Information Center

    Quintana-Lara, Marcela

    2014-01-01

    This study investigates the effects of Acoustic Spectrographic Instruction on the production of the English phonological contrast /i/ and / I /. Acoustic Spectrographic Instruction is based on the assumption that physical representations of speech sounds and spectrography allow learners to objectively see and modify those non-accurate features in…

  16. Multilevel Analysis in Analyzing Speech Data

    ERIC Educational Resources Information Center

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  17. Enhanced Acoustic Black Hole effect in beams with a modified thickness profile and extended platform

    NASA Astrophysics Data System (ADS)

    Tang, Liling; Cheng, Li

    2017-03-01

    The Acoustic Black Hole (ABH) phenomenon exploits the bending-wave propagation properties of a thin-walled structure with power-law thickness variation to achieve zero reflection when the structural thickness approaches zero in the ideal scenario. However, an ideally tailored power-law profile with an embedded ABH feature can hardly be manufactured in practice. Past research showed that the inevitable truncation at the wedge tip of the structure can significantly weaken the expected ABH effect by creating wave reflections. Given the minimum truncation thickness achievable with current manufacturing technology, exploring ways to ensure a better ABH effect becomes important. In this paper, we investigate this issue by using a previously developed wavelet-decomposed semi-analytical model on an Euler-Bernoulli beam with a modified power-law profile and an extended platform of constant thickness. Through comparisons with the conventional ABH profile in terms of system loss factor and energy distribution, numerical results show that the modified thickness profile brings about a systematic increase in the ABH effect at mid-to-high frequencies, especially when the truncation thickness is small and the profile parameter m is large. The use of an extended platform further increases the ABH effect over a broader frequency band, whilst providing room to cater for particular low-frequency applications.
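
    The thickness profiles being compared can be sketched numerically: an ideal power-law wedge h(x) = eps * x^m, the same wedge truncated at a minimum manufacturable thickness, and a truncated wedge preceded by a constant-thickness platform at the thin end. The dimensions below are illustrative, and the paper's specific profile modification is not reproduced.

        # Ideal, truncated, and platform-extended ABH thickness profiles along the
        # taper coordinate x (metres); all parameter values are illustrative.
        import numpy as np

        def abh_profile(x, eps=2e-4, m=2.2, h_trunc=2e-4, platform_len=0.0):
            """Beam thickness (m) as a function of distance x from the ideal tip."""
            h = np.maximum(eps * x**m, h_trunc)           # truncation at the wedge tip
            if platform_len > 0.0:                        # constant-thickness platform
                h = np.where(x < platform_len, h_trunc, eps * (x - platform_len) ** m)
                h = np.maximum(h, h_trunc)
            return h

        x = np.linspace(0.0, 0.1, 500)
        h_ideal = abh_profile(x, h_trunc=0.0)
        h_truncated = abh_profile(x)
        h_platform = abh_profile(x, platform_len=0.02)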

  18. Design Foundations for Content-Rich Acoustic Interfaces: Investigating Audemes as Referential Non-Speech Audio Cues

    ERIC Educational Resources Information Center

    Ferati, Mexhid Adem

    2012-01-01

    To access interactive systems, blind and visually impaired users can leverage their auditory senses by using non-speech sounds. The current structure of non-speech sounds, however, is geared toward conveying user interface operations (e.g., opening a file) rather than large theme-based information (e.g., a history passage) and, thus, is ill-suited…

  19. A self-organizing neural network architecture for auditory and speech perception with applications to acoustic and other temporal prediction problems

    NASA Astrophysics Data System (ADS)

    Cohen, Michael; Grossberg, Stephen

    1994-09-01

    This project is developing autonomous neural network models for the real-time perception and production of acoustic and speech signals. Our SPINET pitch model was developed to take real-time acoustic input and to simulate the key pitch data. SPINET was embedded into a model for auditory scene analysis, or how the auditory system separates sound sources in environments with multiple sources. The model groups frequency components based on pitch and spatial location cues and resonantly binds them within different streams. The model simulates psychophysical grouping data, such as how an ascending tone groups with a descending tone even if noise exists at the intersection point, and how a tone before and after a noise burst is perceived to continue through the noise. These resonant streams input to working memories, wherein phonetic percepts adapt to global speech rate. Computer simulations quantitatively generate the experimentally observed category boundary shifts for voiced stop pairs that have the same or different place of articulation, including why the interval to hear a double (geminate) stop is twice as long as that to hear two different stops. This model also uses resonant feedback, here between list categories and working memory.

  20. Systematic Studies of Modified Vocalization: Speech Production Changes During a Variation of Metronomic Speech in Persons Who Do and Do Not Stutter

    PubMed Central

    Davidow, Jason H.; Bothe, Anne K.; Ye, Jun

    2011-01-01

    The most common way to induce fluency using rhythm requires persons who stutter to speak one syllable or one word to each beat of a metronome, but stuttering can also be eliminated when the stimulus is of a particular duration (e.g., 1 s). The present study examined stuttering frequency, speech production changes, and speech naturalness during rhythmic speech that alternated 1 s of reading with 1 s of silence. A repeated-measures design was used to compare data obtained during a control reading condition and during rhythmic reading in 10 persons who stutter (PWS) and 10 normally fluent controls. Ratings for speech naturalness were also gathered from naïve listeners. Results showed that mean vowel duration increased significantly, and the percentage of short phonated intervals decreased significantly, for both groups from the control to the experimental condition. Mean phonated interval length increased significantly for the fluent controls. Mean speech naturalness ratings during the experimental condition were approximately 7 on a 1–9 scale (1 = highly natural; 9 = highly unnatural), and these ratings were significantly correlated with vowel duration and phonated intervals for PWS. The findings indicate that PWS may be altering vocal fold vibration duration to obtain fluency during this rhythmic speech style, and that vocal fold vibration duration may have an impact on speech naturalness during rhythmic speech. Future investigations should examine speech production changes and speech naturalness during variations of this rhythmic condition. Educational Objectives The reader will be able to: (1) describe changes (from a control reading condition) in speech production variables when alternating between 1 s of reading and 1 s of silence, (2) describe which rhythmic conditions have been found to sound and feel the most natural, (3) describe methodological issues for studies about alterations in speech production variables during fluency-inducing conditions, and (4

  1. Correlation of subjective and objective measures of speech intelligibility

    NASA Astrophysics Data System (ADS)

    Bowden, Erica E.; Wang, Lily M.; Palahanska, Milena S.

    2003-10-01

    Currently there are a number of objective evaluation methods used to quantify the speech intelligibility in a built environment, including the Speech Transmission Index (STI), Rapid Speech Transmission Index (RASTI), Articulation Index (AI), and the Percentage Articulation Loss of Consonants (%ALcons). Many of these have been used for years; however, questions remain about their accuracy in predicting the acoustics of a space. Current widely used software programs can quickly evaluate STI, RASTI, and %ALcons from a measured impulse response. This project compares subjective human performance on modified rhyme and phonetically balanced word tests with objective results calculated from impulse response measurements in four different spaces. The results of these tests aid in understanding performance of various methods of speech intelligibility evaluation. [Work supported by the Univ. of Nebraska Center for Building Integration.] For Speech Communication Best Student Paper Award.

  2. Experimental investigation of the effects of the acoustical conditions in a simulated classroom on speech recognition and learning in children a

    PubMed Central

    Valente, Daniel L.; Plevinsky, Hallie M.; Franco, John M.; Heinrichs-Graham, Elizabeth C.; Lewis, Dawna E.

    2012-01-01

    The potential effects of acoustical environment on speech understanding are especially important as children enter school where students’ ability to hear and understand complex verbal information is critical to learning. However, this ability is compromised because of widely varied and unfavorable classroom acoustics. The extent to which unfavorable classroom acoustics affect children’s performance on longer learning tasks is largely unknown as most research has focused on testing children using words, syllables, or sentences as stimuli. In the current study, a simulated classroom environment was used to measure comprehension performance of two classroom learning activities: a discussion and lecture. Comprehension performance was measured for groups of elementary-aged students in one of four environments with varied reverberation times and background noise levels. The reverberation time was either 0.6 or 1.5 s, and the signal-to-noise level was either +10 or +7 dB. Performance is compared to adult subjects as well as to sentence-recognition in the same condition. Significant differences were seen in comprehension scores as a function of age and condition; both increasing background noise and reverberation degraded performance in comprehension tasks compared to minimal differences in measures of sentence-recognition. PMID:22280587

  3. Acoustic wave propagation in bovine cancellous bone: Application of the Modified Biot-Attenborough model

    NASA Astrophysics Data System (ADS)

    Lee, Kang Il; Roh, Heui-Seol; Yoon, Suk Wang

    2003-10-01

    Acoustic wave propagation in bovine cancellous bone is experimentally and theoretically investigated in the frequency range of 0.5-1 MHz. The phase velocity, attenuation coefficient, and broadband ultrasonic attenuation (BUA) of bovine cancellous bone are measured as functions of frequency and porosity. For theoretical estimation, the Modified Biot-Attenborough (MBA) model is employed with three new phenomenological parameters: the boundary condition, phase velocity, and impedance parameters. The MBA model is based on the idealization of cancellous bone as a nonrigid porous medium with circular cylindrical pores oriented normal to the surface. It is experimentally observed that the phase velocity is approximately nondispersive and the attenuation coefficient linearly increases with frequency. The MBA model predicts a slightly negative dispersion of phase velocity linearly with frequency and the nonlinear relationships of attenuation and BUA with porosity. The experimental results are in good agreement with the theoretical results estimated with the MBA model. It is expected that the MBA model can be usefully employed in the field of clinical bone assessment for the diagnosis of osteoporosis.
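
    Since the abstract notes that the attenuation coefficient rises roughly linearly with frequency, the broadband ultrasonic attenuation (BUA) it refers to is conventionally reported as the slope of a straight-line fit of attenuation against frequency over the measurement band. The sketch below illustrates that fit on synthetic values; the attenuation numbers are placeholders, not data from the study.

    ```python
    import numpy as np

    # Synthetic attenuation values standing in for measured data over the 0.5-1 MHz band.
    freq_mhz = np.linspace(0.5, 1.0, 26)
    atten_db_cm = 5.0 + 18.0 * freq_mhz                   # assumed roughly linear attenuation (dB/cm)
    atten_db_cm += np.random.default_rng(0).normal(0, 0.3, freq_mhz.size)  # measurement noise

    # BUA is taken as the slope of the linear fit of attenuation versus frequency.
    slope, intercept = np.polyfit(freq_mhz, atten_db_cm, 1)
    print(f"BUA (slope of attenuation vs. frequency): {slope:.1f} dB/cm/MHz")
    ```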

  4. The Effect of Residual Acoustic Hearing and Adaptation to Uncertainty on Speech Perception in Cochlear Implant Users: Evidence from Eye-Tracking

    PubMed Central

    McMurray, Bob; Farris-Trimble, Ashley; Seedorff, Michael; Rigler, Hannah

    2015-01-01

    Objectives While outcomes with cochlear implants (CIs) are generally good, performance can be fragile. The authors examined two factors that are crucial for good CI performance. First, while there is a clear benefit for adding residual acoustic hearing to CI stimulation (typically in low frequencies), it is unclear whether this contributes directly to phonetic categorization. Thus, the authors examined perception of voicing (which uses low-frequency acoustic cues) and fricative place of articulation (s/ʃ, which does not) in CI users with and without residual acoustic hearing. Second, in speech categorization experiments, CI users typically show shallower identification functions. These are typically interpreted as deriving from noisy encoding of the signal. However, psycholinguistic work suggests shallow slopes may also be a useful way to adapt to uncertainty. The authors thus employed an eye-tracking paradigm to examine this in CI users. Design Participants were 30 CI users (with a variety of configurations) and 22 age-matched normal hearing (NH) controls. Participants heard tokens from six b/p and six s/ʃ continua (eight steps) spanning real words (e.g., beach/peach, sip/ship). Participants selected the picture corresponding to the word they heard from a screen containing four items (a b-, p-, s- and ʃ-initial item). Eye movements to each object were monitored as a measure of how strongly they were considering each interpretation in the moments leading up to their final percept. Results Mouse-click results (analogous to phoneme identification) for voicing showed a shallower slope for CI users than NH listeners, but no differences between CI users with and without residual acoustic hearing. For fricatives, CI users also showed a shallower slope, but unexpectedly, acoustic + electric listeners showed an even shallower slope. Eye movements showed a gradient response to fine-grained acoustic differences for all listeners. Even considering only trials in which a
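
    The "shallower identification functions" mentioned above are usually quantified by fitting a logistic psychometric function to the proportion of one response category across the continuum steps; the slope parameter of that fit is what differs between groups. A minimal sketch with made-up response proportions (not data from the study):

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    steps = np.arange(1, 9)                                               # 8-step continuum
    prop_p = np.array([0.02, 0.05, 0.10, 0.35, 0.70, 0.90, 0.95, 0.98])   # hypothetical /p/-response rates

    def logistic(x, x0, k):
        """x0 = category boundary, k = slope; a larger k means a steeper identification function."""
        return 1.0 / (1.0 + np.exp(-k * (x - x0)))

    (x0, k), _ = curve_fit(logistic, steps, prop_p, p0=[4.5, 1.0])
    print(f"boundary = {x0:.2f} steps, slope = {k:.2f}")
    ```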

  5. Modified impulse method for the measurement of the frequency response of acoustic filters to weakly nonlinear transient excitations

    PubMed

    Payri; Desantes; Broatch

    2000-02-01

    In this paper, a modified impulse method is proposed which allows the determination of the influence of the excitation characteristics on acoustic filter performance. Issues related to nonlinear propagation, namely wave steepening and wave interactions, have been addressed in an approximate way, validated against one-dimensional unsteady nonlinear flow calculations. The results obtained for expansion chambers and extended duct resonators indicate that the amplitude threshold for the onset of nonlinear phenomena is related to the geometry considered.

  6. Differential Diagnosis of Severe Speech Disorders Using Speech Gestures

    ERIC Educational Resources Information Center

    Bahr, Ruth Huntley

    2005-01-01

    The differentiation of childhood apraxia of speech from severe phonological disorder is a common clinical problem. This article reports on an attempt to describe speech errors in children with childhood apraxia of speech on the basis of gesture use and acoustic analyses of articulatory gestures. The focus was on the movement of articulators and…

  7. SPEECH COMMUNICATION RESEARCH.

    DTIC Science & Technology

    studies of the dynamics of speech production through cineradiographic techniques and through acoustic analysis of formant motions in vowels in various...particular, the activity of the vocal cords and the dynamics of tongue motion. Research on speech perception has included experiments on vowel

  8. Speech research: A report on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications

    NASA Astrophysics Data System (ADS)

    Liberman, A. M.

    1980-06-01

    This report (1 April - 30 June) is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: The perceptual equivalence of two acoustic cues for a speech contrast is specific to phonetic perception; Duplex perception of acoustic patterns as speech and nonspeech; Evidence for phonetic processing of cues to place of articulation: Perceived manner affects perceived place; Some articulatory correlates of perceptual isochrony; Effects of utterance continuity on phonetic judgments; Laryngeal adjustments in stuttering: A glottographic observation using a modified reaction paradigm; Missing -ing in reading: Letter detection errors on word endings; Speaking rate, syllable stress, and vowel identity; Sonority and syllabicity: Acoustic correlates of perception; Influence of vocalic context on perception of the (S)-(s) distinction.

  9. Room Acoustics

    NASA Astrophysics Data System (ADS)

    Kuttruff, Heinrich; Mommertz, Eckard

    The traditional task of room acoustics is to create or formulate conditions which ensure the best possible propagation of sound in a room from a sound source to a listener. Room acoustics is therefore concerned in particular with assembly halls of all kinds, such as auditoria and lecture halls, conference rooms, theaters, concert halls, and churches. It must be pointed out at the outset that these conditions depend essentially on whether speech or music is to be transmitted: in the first case, the criterion for transmission quality is good speech intelligibility; in the second, the success of room-acoustical efforts depends on other factors that cannot be quantified as easily, not least the hearing habits of the listeners. In any case, absolutely "good acoustics" of a room do not exist.

  10. Development of an analytical solution of modified Biot's equations for the optimization of lightweight acoustic protection.

    PubMed

    Kanfoud, Jamil; Ali Hamdi, Mohamed; Becot, François-Xavier; Jaouen, Luc

    2009-02-01

    During lift-off, space launchers are subjected to high levels of acoustic loads, which may damage sensitive equipment. A special acoustic absorber has previously been integrated inside the fairing of space launchers to protect the payload. A new research project has been launched to develop a low-cost fairing acoustic protection system using optimized layers of porous materials covered by a thin layer of fabric. An analytical model is used for the analysis of acoustic wave propagation within the multilayer porous media. Results have been validated by impedance tube measurements. A parametric study has been conducted to determine optimal mechanical and acoustical properties of the acoustic protection under dimensional thickness constraints. The effect of the mounting conditions has been studied. Results reveal the importance of the lateral constraints on the absorption coefficient, particularly in the low frequency range. A transmission study has been carried out, where the fairing structure has been simulated by a limp mass layer. The transmission loss and noise reduction factors have been computed using Biot's theory and the local acoustic impedance approximation to represent the porous layer effect. Comparisons between the two models show the frequency domains for which the local impedance model is valid.

  11. ON THE NATURE OF SPEECH SCIENCE.

    ERIC Educational Resources Information Center

    PETERSON, GORDON E.

    In this article the nature of the discipline of speech science is considered and the various basic and applied areas of the discipline are discussed. The basic areas encompass the various processes of the physiology of speech production, the acoustical characteristics of speech, including the speech wave types and the information-bearing acoustic…

  12. Analysis of False Starts in Spontaneous Speech.

    ERIC Educational Resources Information Center

    O'Shaughnessy, Douglas

    A primary difference between spontaneous speech and read speech concerns the use of false starts, where a speaker interrupts the flow of speech to restart his or her utterance. A study examined the acoustic aspects of such restarts in a widely-used speech database, examining approximately 1000 utterances, about 10% of which contained a restart.…

  13. a Wavelet Model for Vocalic Speech Coarticulation

    NASA Astrophysics Data System (ADS)

    Lange, Robert Charles

    A known aspect of human speech is that a vowel produced in isolation (for example, "ee") is acoustically different from a production of the same vowel in the company of two consonants ("deed"). This phenomenon, natural to the speech of any language, is known as consonant-vowel-consonant coarticulation. The effect of coarticulation results when a speech segment ("d") dynamically influences the articulation of an adjacent segment ("ee" within "deed"). A recent development in the theory of wavelet signal processing is wavelet system characterization. In wavelet system theory, the wavelet transform is used to describe the time-frequency behavior of a transmission channel, by virtue of its ability to describe the time-frequency content of the system's input and output signals. The present research proposes a wavelet-system model for speech coarticulation, wherein the system is the process of transformation from a control speech state (input) to an effected speech state (output). Specifically, a vowel produced in isolation is transformed into an effected version of the same vowel produced in consonant-vowel-consonant, via the "coarticulation channel". Quantitatively, the channel is determined by the wavelet transform of the effected vowel's signal, using the control vowel's signal as the mother wavelet. A practical experiment is conducted to evaluate the coarticulation channel using samples of real speech. The results show that the model is capable of depicting coarticulation effects associated with certain vowel-consonant combinations. They suggest that elements of the vowel's acoustic composition are continuously present, in a modified form, throughout the consonant-vowel transition. For other phonetic combinations, however, the model does not respond to instances of segmental transition in a characteristic way. The conclusions drawn from the study are that the wavelet techniques employed here are effective tools for the general analysis of speech sounds, and can

  14. Tutorial on architectural acoustics

    NASA Astrophysics Data System (ADS)

    Shaw, Neil; Talaske, Rick; Bistafa, Sylvio

    2002-11-01

    This tutorial is intended to provide an overview of current knowledge and practice in architectural acoustics. Topics covered will include basic concepts and history, acoustics of small rooms (small rooms for speech such as classrooms and meeting rooms, music studios, small critical listening spaces such as home theatres) and the acoustics of large rooms (larger assembly halls, auditoria, and performance halls).

  15. [Factorial structure of discriminating speech perception in binaural electro-acoustic correction in patients with impaired hearing of various etiology].

    PubMed

    Tokarev, O P; Bagriantseva, M N

    1990-01-01

    These authors examined 260 patients with hypoacusis of various etiology who needed hearing aids. When measuring their hearing, the authors identified the basic factors that may influence speech intelligibility in the case of binaural correction and optimal type of hearing aids. For every group of patients with hypoacusis of various etiology regression curves of integrated parameters were plotted which helped predict the effectiveness of hearing aids on an individual basis.

  16. Two-microphone spatial filtering provides speech reception benefits for cochlear implant users in difficult acoustic environments

    PubMed Central

    Goldsworthy, Raymond L.; Delhorne, Lorraine A.; Desloge, Joseph G.; Braida, Louis D.

    2014-01-01

    This article introduces and provides an assessment of a spatial-filtering algorithm based on two closely-spaced (∼1 cm) microphones in a behind-the-ear shell. The evaluated spatial-filtering algorithm used fast (∼10 ms) temporal-spectral analysis to determine the location of incoming sounds and to enhance sounds arriving from straight ahead of the listener. Speech reception thresholds (SRTs) were measured for eight cochlear implant (CI) users using consonant and vowel materials under three processing conditions: An omni-directional response, a dipole-directional response, and the spatial-filtering algorithm. The background noise condition used three simultaneous time-reversed speech signals as interferers located at 90°, 180°, and 270°. Results indicated that the spatial-filtering algorithm can provide speech reception benefits of 5.8 to 10.7 dB SRT compared to an omni-directional response in a reverberant room with multiple noise sources. Given the observed SRT benefits, coupled with an efficient design, the proposed algorithm is promising as a CI noise-reduction solution. PMID:25096120
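
    For orientation, the dipole-directional baseline mentioned above can be approximated with two closely spaced omnidirectional microphones simply by subtracting one signal from the other, which nulls sound arriving broadside to the microphone axis. The sketch below simulates that on noise standing in for speech; the geometry, sampling rate, and signals are assumptions, and the article's actual spatial-filtering algorithm (fast temporal-spectral analysis) is not reproduced here.

    ```python
    import numpy as np

    fs = 16000        # sampling rate in Hz (assumed)
    d = 0.01          # microphone spacing of about 1 cm, as in the article
    c = 343.0         # speed of sound in m/s

    def mic_pair(signal, angle_deg):
        """Front and rear microphone signals for a plane wave arriving from angle_deg
        (0 deg = along the microphone axis), using a crude fractional-sample delay."""
        delay = d * np.cos(np.deg2rad(angle_deg)) / c
        n = np.arange(len(signal), dtype=float)
        rear = np.interp(n - delay * fs, n, signal)
        return signal, rear

    rng = np.random.default_rng(0)
    target = rng.standard_normal(fs)          # 1 s of white noise standing in for speech

    for angle in (0.0, 90.0):
        front, rear = mic_pair(target, angle)
        omni = 0.5 * (front + rear)           # omnidirectional reference
        dipole = front - rear                 # simple dipole (figure-of-eight) response
        print(f"{angle:5.1f} deg  omni power {np.var(omni):.3f}  dipole power {np.var(dipole):.4f}")
    ```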

  17. Effects of Speech Intensity on the Callsign Acquisition Test (CAT) and Modified Rhyme Test (MRT) Presented in Noise

    DTIC Science & Technology

    2012-01-01

    Department of Industrial and Systems Engineering and Department of Management, North Carolina Agricultural & Technical State University, Greensboro, North Carolina, USA. … Speech intelligibility (SI) is defined as the percentage of speech units (i.e., phonemes, syllables, words, phrases…

  18. Segmenting Words from Natural Speech: Subsegmental Variation in Segmental Cues

    ERIC Educational Resources Information Center

    Rytting, C. Anton; Brew, Chris; Fosler-Lussier, Eric

    2010-01-01

    Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption, and preserves acoustic variation present in speech. We…

  19. Development of a test battery for evaluating speech perception in complex listening environments.

    PubMed

    Brungart, Douglas S; Sheffield, Benjamin M; Kubli, Lina R

    2014-08-01

    In the real world, spoken communication occurs in complex environments that involve audiovisual speech cues, spatially separated sound sources, reverberant listening spaces, and other complicating factors that influence speech understanding. However, most clinical tools for assessing speech perception are based on simplified listening environments that do not reflect the complexities of real-world listening. In this study, speech materials from the QuickSIN speech-in-noise test by Killion, Niquette, Gudmundsen, Revit, and Banerjee [J. Acoust. Soc. Am. 116, 2395-2405 (2004)] were modified to simulate eight listening conditions spanning the range of auditory environments listeners encounter in everyday life. The standard QuickSIN test method was used to estimate 50% speech reception thresholds (SRT50) in each condition. A method of adjustment procedure was also used to obtain subjective estimates of the lowest signal-to-noise ratio (SNR) where the listeners were able to understand 100% of the speech (SRT100) and the highest SNR where they could detect the speech but could not understand any of the words (SRT0). The results show that the modified materials maintained most of the efficiency of the QuickSIN test procedure while capturing performance differences across listening conditions comparable to those reported in previous studies that have examined the effects of audiovisual cues, binaural cues, room reverberation, and time compression on the intelligibility of speech.

  20. Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit.

    PubMed

    Arnold, Denis; Tomaschek, Fabian; Sering, Konstantin; Lopez, Florence; Baayen, R Harald

    2017-01-01

    Sound units play a pivotal role in cognitive models of auditory comprehension. The general consensus is that during perception listeners break down speech into auditory words and subsequently phones. Indeed, cognitive speech recognition is typically taken to be computationally intractable without phones. Here we present a computational model trained on 20 hours of conversational speech that recognizes word meanings within the range of human performance (model 25%, native speakers 20-44%), without making use of phone or word form representations. Our model also successfully generates predictions about the speed and accuracy of human auditory comprehension. At the heart of the model is a 'wide' yet sparse two-layer artificial neural network with some hundred thousand input units representing summaries of changes in acoustic frequency bands, and proxies for lexical meanings as output units. We believe that our model holds promise for resolving longstanding theoretical problems surrounding the notion of the phone in linguistic theory.
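
    The "error-driven learning algorithm" in the title is, in this line of work, typically the Rescorla-Wagner (equivalently Widrow-Hoff) rule applied to a wide two-layer network from acoustic cues to lexical outcomes. The toy sketch below shows that update on a hypothetical cue and outcome inventory; the real model uses some hundred thousand acoustic summary cues.

    ```python
    import numpy as np

    cues = ["band1_rise", "band1_fall", "band2_rise", "band2_fall"]   # hypothetical acoustic cues
    outcomes = ["CAT", "DOG"]                                         # hypothetical lexical outcomes
    W = np.zeros((len(cues), len(outcomes)))                          # cue-to-outcome association weights
    eta, lam = 0.1, 1.0                                               # learning rate, max associative strength

    # Each learning event pairs a set of active cue indices with the outcome that occurred.
    events = [([0, 2], 0), ([1, 3], 1), ([0, 2], 0), ([0, 3], 1)] * 50

    for active, outcome in events:
        prediction = W[active].sum(axis=0)        # summed support for every outcome from the active cues
        target = np.zeros(len(outcomes))
        target[outcome] = lam
        W[active] += eta * (target - prediction)  # Rescorla-Wagner: adjust only the cues that were present

    print(np.round(W, 2))
    ```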

  1. A preliminary study on improving the recognition of esophageal speech using a hybrid system based on statistical voice conversion.

    PubMed

    Lachhab, Othman; Di Martino, Joseph; Elhaj, Elhassane Ibn; Hammouch, Ahmed

    2015-01-01

    In this paper, we propose a hybrid system based on a modified statistical GMM voice conversion algorithm for improving the recognition of esophageal speech. This hybrid system aims to compensate for the distorted information present in the esophageal acoustic features by using a voice conversion method. The esophageal speech is converted into a "target" laryngeal speech using an iterative statistical estimation of a transformation function. We did not apply a speech synthesizer for reconstructing the converted speech signal, given that the converted Mel cepstral vectors are used directly as input of our speech recognition system. Furthermore the feature vectors are linearly transformed by the HLDA (heteroscedastic linear discriminant analysis) method to reduce their size in a smaller space having good discriminative properties. The experimental results demonstrate that our proposed system provides an improvement of the phone recognition accuracy with an absolute increase of 3.40 % when compared with the phone recognition accuracy obtained with neither HLDA nor voice conversion.
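
    The statistical GMM conversion this hybrid system builds on is usually the joint-density formulation: a GMM is fitted to stacked source/target cepstral vectors from aligned parallel utterances, and each source frame is mapped with the per-mixture conditional means weighted by posterior probabilities. The sketch below shows that core mapping on random stand-in data; it does not include the iterative estimation or the HLDA step described in the paper.

    ```python
    import numpy as np
    from scipy.stats import multivariate_normal
    from sklearn.mixture import GaussianMixture

    D, N, M = 13, 2000, 8                              # cepstral order, frame count, mixtures (assumed)
    rng = np.random.default_rng(0)
    X = rng.standard_normal((N, D))                    # stand-in "esophageal" source features
    Y = 0.7 * X + 0.3 * rng.standard_normal((N, D))    # stand-in aligned "laryngeal" target features

    gmm = GaussianMixture(n_components=M, covariance_type="full", random_state=0)
    gmm.fit(np.hstack([X, Y]))                         # joint model over (source, target) vectors

    def convert(x):
        """Map one source frame x (shape (D,)) to the target feature space (MMSE mapping)."""
        mu_x, mu_y = gmm.means_[:, :D], gmm.means_[:, D:]
        Sxx = gmm.covariances_[:, :D, :D]
        Syx = gmm.covariances_[:, D:, :D]
        dens = np.array([multivariate_normal.pdf(x, mu_x[m], Sxx[m]) for m in range(M)])
        post = gmm.weights_ * dens
        post /= post.sum()                             # P(mixture | source frame)
        y_hat = np.zeros(D)
        for m in range(M):
            y_hat += post[m] * (mu_y[m] + Syx[m] @ np.linalg.solve(Sxx[m], x - mu_x[m]))
        return y_hat

    print(np.round(convert(X[0]), 2))
    ```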

  2. Thai Automatic Speech Recognition

    DTIC Science & Technology

    2005-01-01

    This research was performed as part of the DARPA-Babylon program aimed at rapidly developing multilingual speech-to… used in an external DARPA evaluation involving medical scenarios between an American doctor and a naïve monolingual Thai patient. … To create more general acoustic models we collected read speech data from native speakers based on the concepts of our multilingual data collection

  3. Department of Cybernetic Acoustics

    NASA Astrophysics Data System (ADS)

    The development of the theory, instrumentation and applications of methods and systems for the measurement, analysis, processing and synthesis of acoustic signals within the audio frequency range, particularly of the speech signal and the vibro-acoustic signal emitted by technical and industrial equipment treated as noise and vibration sources, was discussed. The research work, both theoretical and experimental, aims at applications in various branches of science and medicine, such as: acoustical diagnostics and phoniatric rehabilitation of pathological and postoperative states of the speech organ; bilateral "man-machine" speech communication based on the analysis, recognition and synthesis of the speech signal; vibro-acoustical diagnostics and continuous monitoring of the state of machines, technical equipment and technological processes.

  4. The effect of boundary shape to acoustic parameters

    NASA Astrophysics Data System (ADS)

    Prawirasasra, M. S.; Sampurna, R.; Suwandi

    2016-11-01

    When designing a room acoustically, many variables need to be considered, such as volume, the acoustic characteristics and surface area of materials, and the boundary shape. Modifying each variable can change the character of the sound field. To find the impact of boundary shape, the required properties were simulated with acoustic prediction software. The simulation used three models with different geometries (asymmetric and symmetric) to produce the objective parameters. By applying the just noticeable difference (JND), the size of the effect can be judged. In addition, listening tests were used to obtain a subjective parameter: participants heard recorded speech convolved with the room impulse response of each model. The results indicate that 84% of participants could not distinguish the speech emitted from the different geometries, even though the JND value of T30 exceeded 5%; for D50, in contrast, every model had a JND below 5%.

  5. Speech and respiration.

    PubMed

    Conrad, B; Schönle, P

    1979-04-12

    This investigation deals with the temporal aspects of air volume changes during speech. Speech respiration differs fundamentally from resting respiration. In resting respiration the duration and velocity of inspiration (air flow or lung volume change) are in a range similar to that of expiration. In speech respiration the duration of inspiration decreases and its velocity increases; conversely, the duration of expiration increases and the volume of air flow decreases dramatically. The following questions arise: are these two respiration types different entities, or do they represent the end points of a continuum from resting to speech respiration? How does articulation without the generation of speech sound affect breathing? Does (verbalized?) thinking without articulation or speech modify the breathing pattern? The main test battery included four tasks (spontaneous speech, reading, serial speech, arithmetic) performed under three conditions (speaking aloud, articulating subvocally, quiet performance by trying to exclusively 'think' the tasks). Respiratory movements were measured with a chest pneumograph and evaluated in comparison with a phonogram and the identified spoken text. For quiet performance the resulting respiratory time ratio (relation of duration of inspiration versus expiration) showed a gradual shift in the direction of speech respiration--the least for reading, the most for arithmetic. This change was even more apparent for the subvocal tasks. It is concluded that (a) there is a gradual automatic change from resting to speech respiration and (b) the degree of internal verbalization (activation of motor speech areas) defines the degree of activation of the speech respiratory pattern.

  6. Production of Syntactic Stress in Alaryngeal Speech.

    ERIC Educational Resources Information Center

    Gandour, Jack; Weinberg, Bernd

    1985-01-01

    Reports on an acoustical investigation of syntactic stress in alaryngeal speech. Measurements were made of fundamental frequency, relative intensity, vowel duration, and intersyllable duration. Findings suggest that stress contrasts in alaryngeal speech are based on a complex of acoustic cues which are influenced by linguistic structure.…

  7. Cylindrical and spherical dust-ion-acoustic modified Gardner solitons in dusty plasmas with two-temperature superthermal electrons

    SciTech Connect

    Alam, M. S.; Masud, M. M.; Mamun, A. A.

    2013-12-15

    A rigorous theoretical investigation has been performed on the propagation of cylindrical and spherical Gardner solitons (GSs) associated with dust-ion-acoustic (DIA) waves in a dusty plasma consisting of inertial ions, negatively charged immobile dust, and two populations of kappa distributed electrons having two distinct temperatures. The well-known reductive perturbation method has been used to derive the modified Gardner (mG) equation. The basic features (amplitude, width, polarity, etc.) of nonplanar DIA modified Gardner solitons (mGSs) have been thoroughly examined by the numerical analysis of the mG equation. It has been found that the characteristics of the nonplanar DIA mGSs significantly differ from those of planar ones. It has also been observed that kappa distributed electrons with two distinct temperatures significantly modify the basic properties of the DIA solitary waves and that the plasma system under consideration supports both compressive and rarefactive DIA mGSs. The present investigation should play an important role in understanding localized electrostatic disturbances in space and laboratory dusty plasmas where stationary negatively charged dust, inertial ions, and superthermal electrons with two distinct temperatures are omnipresent ingredients.

  8. Linear degrees of freedom in speech production: analysis of cineradio- and labio-film data and articulatory-acoustic modeling.

    PubMed

    Beautemps, D; Badin, P; Bailly, G

    2001-05-01

    The following contribution addresses several issues concerning speech degrees of freedom in French oral vowels, stop, and fricative consonants based on an analysis of tongue and lip shapes extracted from cineradio- and labio-films. The midsagittal tongue shapes have been submitted to a linear decomposition where some of the loading factors were selected such as jaw and larynx position while four other components were derived from principal component analysis (PCA). For the lips, in addition to the more traditional protrusion and opening components, a supplementary component was extracted to explain the upward movement of both the upper and lower lips in [v] production. A linear articulatory model was developed; the six tongue degrees of freedom were used as the articulatory control parameters of the midsagittal tongue contours and explained 96% of the tongue data variance. These control parameters were also used to specify the frontal lip width dimension derived from the labio-film front views. Finally, this model was complemented by a conversion model going from the midsagittal to the area function, based on a fitting of the midsagittal distances and the formant frequencies for both vowels and consonants.
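
    A minimal sketch of the "guided" linear decomposition described above: a measured articulatory parameter (here a stand-in for jaw position) is first regressed out of the midsagittal contours, and PCA is then applied to the residual to obtain the remaining components; the explained-variance bookkeeping mirrors the 96% figure reported for the six tongue degrees of freedom. All data below are synthetic placeholders.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    n_frames, n_points = 500, 40                      # contours sampled at 40 midsagittal points (assumed)
    jaw = rng.standard_normal((n_frames, 1))          # stand-in for the measured jaw parameter
    contours = jaw @ rng.standard_normal((1, n_points)) + 0.3 * rng.standard_normal((n_frames, n_points))

    centered = contours - contours.mean(axis=0)
    # 1) regress the jaw parameter out of the contours (the "guided" loading factor)
    jaw_loading, *_ = np.linalg.lstsq(jaw, centered, rcond=None)
    residual = centered - jaw @ jaw_loading

    # 2) PCA on the residual contours gives the remaining data-driven components
    pca = PCA(n_components=4).fit(residual)

    total_var = (centered ** 2).sum()
    jaw_var = ((jaw @ jaw_loading) ** 2).sum()
    pca_var = pca.explained_variance_ratio_.sum() * (residual ** 2).sum()
    print(f"variance explained by jaw + 4 PCs: {100 * (jaw_var + pca_var) / total_var:.1f}%")
    ```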

  9. Speech Analysis Systems: An Evaluation.

    ERIC Educational Resources Information Center

    Read, Charles; And Others

    1992-01-01

    Performance characteristics are reviewed for seven computerized systems marketed for acoustic speech analysis: CSpeech, CSRE, ILS-PC, Kay Elemetrics model 550 Sona-Graph, MacSpeech Lab II, MSL, and Signalyze. Characteristics reviewed include system components, basic capabilities, documentation, user interface, data formats and journaling, and…

  10. Automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Espy-Wilson, Carol

    2005-04-01

    Great strides have been made in the development of automatic speech recognition (ASR) technology over the past thirty years. Most of this effort has been centered around the extension and improvement of Hidden Markov Model (HMM) approaches to ASR. Current commercially-available and industry systems based on HMMs can perform well for certain situational tasks that restrict variability such as phone dialing or limited voice commands. However, the holy grail of ASR systems is performance comparable to humans; in other words, the ability to automatically transcribe unrestricted conversational speech spoken by an infinite number of speakers under varying acoustic environments. This goal is far from being reached. Key to the success of ASR is effective modeling of variability in the speech signal. This tutorial will review the basics of ASR and the various ways in which our current knowledge of speech production, speech perception and prosody can be exploited to improve robustness at every level of the system.

  11. Application of modified integration rule to time-domain finite-element acoustic simulation of rooms.

    PubMed

    Okuzono, Takeshi; Otsuru, Toru; Tomiku, Reiji; Okamoto, Noriko

    2012-08-01

    The applicability of the modified integration rule for time-domain finite-element analysis is tested in sound field analysis of rooms involving rectangular elements, distorted elements, and finite impedance boundary conditions. Dispersion error analysis in three dimensions is conducted to evaluate the dispersion error in time-domain finite-element analysis using eight-node hexahedral elements. The results of analysis confirmed that fourth-order accuracy with respect to dispersion error is obtainable using the Fox-Goodwin method (FG) with a modified integration rule, even for rectangular elements. The stability condition in three-dimensional analysis using the modified integration rule is also presented. Numerical experiments demonstrate that FG with a modified integration rule performs much better than FG with the conventional integration rule for problems with rectangular elements, distorted elements, and with finite impedance boundary conditions. Further, as another advantage, numerical results revealed that the use of modified integration rule engenders faster convergence of the iterative solver than a conventional rule for problems with the same degrees of freedom.

  12. Contributions of speech science to the technology of man-machine voice interactions

    NASA Technical Reports Server (NTRS)

    Lea, Wayne A.

    1977-01-01

    Research in speech understanding was reviewed. Plans which include prosodics research, phonological rules for speech understanding systems, and continued interdisciplinary phonetics research are discussed. Improved acoustic phonetic analysis capabilities in speech recognizers are suggested.

  13. Analogy instruction and speech performance under psychological stress.

    PubMed

    Tse, Andy C Y; Wong, Andus W-K; Whitehill, Tara L; Ma, Estella P-M; Masters, Rich S W

    2014-03-01

    To examine the efficacy of explicit and implicit forms of instruction for speech motor performance under conditions of psychological stress. In experiment 1, 20 participants were asked to deliver a formal presentation to validate the modified Trier Social Stress Test (TSST). In experiment 2, 40 participants were instructed explicitly by verbal explanation or implicitly by analogy to speak with minimum pitch variation and were subjected to psychological stress using the modified TSST. Acoustic correlates of pitch height (mean fundamental frequency) and pitch variation (standard deviation of fundamental frequency) significantly increased in experiment 1 when participants delivered a speech under modified TSST condition. In experiment 2, explicitly instructed participants were unable to maintain minimum pitch variation under psychological pressure caused by the modified TSST, whereas analogy-instructed participants maintained minimal pitch variation. The findings are consistent with existing evidence that analogy instructions may result in characteristics of implicit motor learning, such as greater stability of performance under pressure. Analogy instructions may therefore benefit speech motor performance and might provide a useful clinical tool for treatment of speech-disordered populations.

  14. Speech transmission index from running speech: A neural network approach

    NASA Astrophysics Data System (ADS)

    Li, F. F.; Cox, T. J.

    2003-04-01

    Speech transmission index (STI) is an important objective parameter concerning speech intelligibility for sound transmission channels. It is normally measured with specific test signals to ensure high accuracy and good repeatability. Measurement with running speech was previously proposed, but accuracy is compromised and hence applications limited. A new approach that uses artificial neural networks to accurately extract the STI from received running speech is developed in this paper. Neural networks are trained on a large set of transmitted speech examples with prior knowledge of the transmission channels' STIs. The networks perform complicated nonlinear function mappings and spectral feature memorization to enable accurate objective parameter extraction from transmitted speech. Validations via simulations demonstrate the feasibility of this new method on a one-net-one-speech extract basis. In this case, accuracy is comparable with normal measurement methods. This provides an alternative to standard measurement techniques, and it is intended that the neural network method can facilitate occupied room acoustic measurements.
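
    The general idea, stripped of the authors' specific network and feature set, is a supervised regression from features of received running speech to the known STI of the transmission channel. The sketch below uses a small scikit-learn multilayer perceptron and random placeholder features standing in for, e.g., long-term spectral or modulation statistics of transmitted speech.

    ```python
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    n_examples, n_features = 400, 24
    X = rng.standard_normal((n_examples, n_features))                    # stand-in speech features
    true_sti = np.clip(0.5 + 0.1 * X[:, :4].sum(axis=1), 0.0, 1.0)       # synthetic channel STIs

    X_tr, X_te, y_tr, y_te = train_test_split(X, true_sti, random_state=0)
    net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
    net.fit(X_tr, y_tr)
    print(f"test RMSE on synthetic data: {np.sqrt(np.mean((net.predict(X_te) - y_te) ** 2)):.3f}")
    ```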

  15. Language Specific Speech Perception and the Onset of Reading.

    ERIC Educational Resources Information Center

    Burnham, Denis

    2003-01-01

    Investigates the degree to which native speech perception is superior to non-native speech perception. Shows that language specific speech perception is a linguistic rather than an acoustic phenomenon. Discusses results in terms of early speech perception abilities, experience with oral communication, cognitive ability, alphabetic versus…

  16. The Effectiveness of Clear Speech as a Masker

    ERIC Educational Resources Information Center

    Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.

    2010-01-01

    Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…

  17. Experiment in Learning to Discriminate Frequency Transposed Speech.

    ERIC Educational Resources Information Center

    Ahlstrom, K.G.; And Others

    In order to improve speech perception by transposing the speech signals to lower frequencies, to determine which aspects of the information in the acoustic speech signals were influenced by transposition, and to compare two different methods of training speech perception, 44 subjects were trained to discriminate between transposed words or…

  18. Speech perception as categorization

    PubMed Central

    Holt, Lori L.; Lotto, Andrew J.

    2010-01-01

    Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has resulted from these challenges. We focus here on issues and experiments that define open research questions relevant to phoneme categorization, arguing that SP is best understood as perceptual categorization, a position that places SP in direct contact with research from other areas of perception and cognition. PMID:20601702

  19. A modified beam-to-earth transformation to measure short-wavelength internal waves with an acoustic Doppler current profiler

    USGS Publications Warehouse

    Scotti, A.; Butman, B.; Beardsley, R.C.; Alexander, P.S.; Anderson, S.

    2005-01-01

    The algorithm used to transform velocity signals from beam coordinates to earth coordinates in an acoustic Doppler current profiler (ADCP) relies on the assumption that the currents are uniform over the horizontal distance separating the beams. This condition may be violated by (nonlinear) internal waves, which can have wavelengths as small as 100-200 m. In this case, the standard algorithm combines velocities measured at different phases of a wave and produces horizontal velocities that increasingly differ from true velocities with distance from the ADCP. Observations made in Massachusetts Bay show that currents measured with a bottom-mounted upward-looking ADCP during periods when short-wavelength internal waves are present differ significantly from currents measured by point current meters, except very close to the instrument. These periods are flagged with high error velocities by the standard ADCP algorithm. In this paper measurements from the four spatially diverging beams and the backscatter intensity signal are used to calculate the propagation direction and celerity of the internal waves. Once this information is known, a modified beam-to-earth transformation that combines appropriately lagged beam measurements can be used to obtain current estimates in earth coordinates that compare well with pointwise measurements. © 2005 American Meteorological Society.

  20. Inverse Material Identification in Coupled Acoustic-Structure Interaction using a Modified Error in Constitutive Equation Functional

    PubMed Central

    Warner, James E.; Diaz, Manuel I.; Aquino, Wilkins; Bonnet, Marc

    2014-01-01

    This work focuses on the identification of heterogeneous linear elastic moduli in the context of frequency-domain, coupled acoustic-structure interaction (ASI), using either solid displacement or fluid pressure measurement data. The approach postulates the inverse problem as an optimization problem where the solution is obtained by minimizing a modified error in constitutive equation (MECE) functional. The latter measures the discrepancy in the constitutive equations that connect kinematically admissible strains and dynamically admissible stresses, while incorporating the measurement data as additional quadratic error terms. We demonstrate two strategies for selecting the MECE weighting coefficient to produce regularized solutions to the ill-posed identification problem: 1) the discrepancy principle of Morozov, and 2) an error-balance approach that selects the weight parameter as the minimizer of another functional involving the ECE and the data misfit. Numerical results demonstrate that the proposed methodology can successfully recover elastic parameters in 2D and 3D ASI systems from response measurements taken in either the solid or fluid subdomains. Furthermore, both regularization strategies are shown to produce accurate reconstructions when the measurement data is polluted with noise. The discrepancy principle is shown to produce nearly optimal solutions, while the error-balance approach, although not optimal, remains effective and does not need a priori information on the noise level. PMID:25339790

  1. Speech prosody in cerebellar ataxia

    NASA Astrophysics Data System (ADS)

    Casper, Maureen

    The present study sought an acoustic signature for the speech disturbance recognized in cerebellar degeneration. Magnetic resonance imaging was used for a radiological rating of cerebellar involvement in six cerebellar ataxic dysarthric speakers. Acoustic measures of the [pap] syllables in contrastive prosodic conditions and of normal vs. brain-damaged patients were used to further our understanding both of the speech degeneration that accompanies cerebellar pathology and of speech motor control and movement in general. Pair-wise comparisons of the prosodic conditions within the normal group showed statistically significant differences for four prosodic contrasts. For three of the four contrasts analyzed, the normal speakers showed both longer durations and higher formant and fundamental frequency values in the more prominent first condition of the contrast. The acoustic measures of the normal prosodic contrast values were then used as a model to measure the degree of speech deterioration for individual cerebellar subjects. This estimate of speech deterioration as determined by individual differences between cerebellar and normal subjects' acoustic values of the four prosodic contrasts was used in correlation analyses with MRI ratings. Moderate correlations between speech deterioration and cerebellar atrophy were found in the measures of syllable duration and f0. A strong negative correlation was found for F1. Moreover, the normal model presented by these acoustic data allows for a description of the flexibility of task- oriented behavior in normal speech motor control. These data challenge spatio-temporal theory which explains movement as an artifact of time wherein longer durations predict more extreme movements and give further evidence for gestural internal dynamics of movement in which time emerges from articulatory events rather than dictating those events. This model provides a sensitive index of cerebellar pathology with quantitative acoustic

  2. Production and perception of clear speech

    NASA Astrophysics Data System (ADS)

    Bradlow, Ann R.

    2003-04-01

    When a talker believes that the listener is likely to have speech perception difficulties due to a hearing loss, background noise, or a different native language, she or he will typically adopt a clear speaking style. Previous research has established that, with a simple set of instructions to the talker, "clear speech" can be produced by most talkers under laboratory recording conditions. Furthermore, there is reliable evidence that adult listeners with either impaired or normal hearing typically find clear speech more intelligible than conversational speech. Since clear speech production involves listener-oriented articulatory adjustments, a careful examination of the acoustic-phonetic and perceptual consequences of the conversational-to-clear speech transformation can serve as an effective window into talker- and listener-related forces in speech communication. Furthermore, clear speech research has considerable potential for the development of speech enhancement techniques. After reviewing previous and current work on the acoustic properties of clear versus conversational speech, this talk will present recent data from a cross-linguistic study of vowel production in clear speech and a cross-population study of clear speech perception. Findings from these studies contribute to an evolving view of clear speech production and perception as reflecting both universal, auditory and language-specific, phonological contrast enhancement features.

  3. An assessment of computer model techniques to predict quantitative and qualitative measures of speech perception in university classrooms for varying room sizes and noise levels

    NASA Astrophysics Data System (ADS)

    Kim, Hyeong-Seok

    The objective of this dissertation was to assess the use of computer modeling techniques to predict quantitative and qualitative measures of speech perception in classrooms under realistic conditions of background noise and reverberation. Secondary objectives included (1) finding relationships between acoustical measurements made in actual classrooms and in the computer models of the actual rooms as a prediction tool for 15 acoustic parameters at the design stage of projects and (2) finding relationships between speech perception scores and the 15 acoustic parameters to determine the best predictors of speech perception in actual classroom conditions. Fifteen types of acoustical measurements were made in three actual classrooms with reverberation times of 0.5, 1.3, and 5.1 seconds. Speech perception tests using a Modified Rhyme Test list were also given to 22 subjects in each room under five noise conditions with signal-to-noise ratios of 31, 24, 15, 0, and -10 dB. Computer models of the rooms were constructed using a commercially available computer modeling software program. The 15 acoustical measurements were made at 6 or 9 locations in the model rooms. Impulse responses obtained in the computer models of the rooms were convolved with the anechoically recorded speech tests used in the full-size rooms to produce a compact disk with the MRT lists carrying the acoustical response of the computer model rooms. Speech perception tests using this as source material were given to the subjects over a loudspeaker in an acoustic test booth. The results of the study showed correlations (R2) between acoustical measures made in the full-size classrooms and in the computer models of the classrooms of 0.92 to 0.99, with standard errors of 0.033 to 7.311. Comparisons between speech perception scores tested in the rooms and acoustical measurements made in the rooms and in the computer models of the classrooms showed that the measures have prediction accuracy similar to other studies in the literature. The
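
    The auralization step described above (convolving model impulse responses with anechoically recorded test words before presenting them to listeners) amounts to a single convolution per word. A sketch under obvious assumptions, with a toy exponentially decaying impulse response standing in for one exported from the room-modeling software:

    ```python
    import numpy as np
    from scipy.signal import fftconvolve

    fs = 44100
    rng = np.random.default_rng(0)
    anechoic_word = rng.standard_normal(fs)                      # 1 s placeholder for an anechoic MRT word
    t = np.arange(int(1.3 * fs)) / fs
    impulse_response = rng.standard_normal(t.size) * np.exp(-6.9 * t / 1.3)  # toy decay, RT of about 1.3 s

    auralized = fftconvolve(anechoic_word, impulse_response)     # word heard "through" the modeled room
    auralized /= np.max(np.abs(auralized))                       # normalize before writing to the test CD
    print(auralized.shape)
    ```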

  4. Localization of Sublexical Speech Perception Components

    PubMed Central

    Turkeltaub, Peter E; Coslett, H. Branch

    2010-01-01

    Models of speech perception are in general agreement with respect to the major cortical regions involved, but lack precision with regard to localization and lateralization of processing units. To refine these models we conducted two Activation Likelihood Estimation (ALE) meta-analyses of the neuroimaging literature on sublexical speech perception. Based on foci reported in 23 fMRI experiments, we identified significant activation likelihoods in left and right superior temporal cortex and the left posterior middle frontal gyrus. Subanalyses examining phonetic and phonological processes revealed only left mid-posterior superior temporal sulcus activation likelihood. A lateralization analysis demonstrated temporal lobe left lateralization in terms of magnitude, extent, and consistency of activity. Experiments requiring explicit attention to phonology drove this lateralization. An ALE analysis of eight fMRI studies on categorical phoneme perception revealed significant activation likelihood in the left supramarginal gyrus and angular gyrus. These results are consistent with a speech processing network in which the bilateral superior temporal cortices perform acoustic analysis of speech and nonspeech auditory stimuli, the left mid-posterior superior temporal sulcus performs phonetic and phonological analysis, and the left inferior parietal lobule is involved in detection of differences between phoneme categories. These results modify current speech perception models in three ways: 1) specifying the most likely locations of dorsal stream processing units, 2) clarifying that phonetic and phonological superior temporal sulcus processing is left lateralized and localized to the mid-posterior portion, and 3) suggesting that both the supramarginal gyrus and angular gyrus may be involved in phoneme discrimination. PMID:20413149

  5. Predicting Speech Intelligibility with A Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    PubMed Central

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystem approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method Nine acoustic variables reflecting different subsystems, and speech intelligibility, were measured in 22 children with CP. These children included 13 with a clinical diagnosis of dysarthria (SMI), and nine judged to be free of dysarthria (NSMI). Data from children with CP were compared to data from age-matched typically developing children (TD). Results Multiple acoustic variables reflecting the articulatory subsystem were different in the SMI group, compared to the NSMI and TD groups. A significant speech intelligibility prediction model was obtained with all variables entered into the model (Adjusted R-squared = .801). The articulatory subsystem showed the most substantial independent contribution (58%) to speech intelligibility. Incremental R-squared analyses revealed that any single variable explained less than 9% of speech intelligibility variability. Conclusions Children in the SMI group have articulatory subsystem problems as indexed by acoustic measures. As in the adult literature, the articulatory subsystem makes the primary contribution to speech intelligibility variance in dysarthria, with minimal or no contribution from other systems. PMID:24824584
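
    The prediction model above is a multiple linear regression of intelligibility scores on acoustic variables, summarized by adjusted R-squared. A small sketch with synthetic data (22 children and nine predictors, as in the study; the values themselves are made up):

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    n_children, n_vars = 22, 9
    acoustic = rng.standard_normal((n_children, n_vars))            # placeholder acoustic predictors
    intelligibility = 70 + 8 * acoustic[:, 0] - 5 * acoustic[:, 1] + 3 * rng.standard_normal(n_children)

    model = LinearRegression().fit(acoustic, intelligibility)
    r2 = model.score(acoustic, intelligibility)
    adj_r2 = 1 - (1 - r2) * (n_children - 1) / (n_children - n_vars - 1)
    print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
    ```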

  6. Perception of acoustic scale and size in musical instrument sounds

    PubMed Central

    van Dinther, Ralph; Patterson, Roy D.

    2010-01-01

    There is size information in natural sounds. For example, as humans grow in height, their vocal tracts increase in length, producing a predictable decrease in the formant frequencies of speech sounds. Recent studies have shown that listeners can make fine discriminations about which of two speakers has the longer vocal tract, supporting the view that the auditory system discriminates changes on the acoustic-scale dimension. Listeners can also recognize vowels scaled well beyond the range of vocal tracts normally experienced, indicating that perception is robust to changes in acoustic scale. This paper reports two perceptual experiments designed to extend research on acoustic scale and size perception to the domain of musical sounds: The first study shows that listeners can discriminate the scale of musical instrument sounds reliably, although not quite as well as for voices. The second experiment shows that listeners can recognize the family of an instrument sound which has been modified in pitch and scale beyond the range of normal experience. We conclude that processing of acoustic scale in music perception is very similar to processing of acoustic scale in speech perception. PMID:17069313

  7. Speech endpoint detection with non-language speech sounds for generic speech processing applications

    NASA Astrophysics Data System (ADS)

    McClain, Matthew; Romanowski, Brian

    2009-05-01

    Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden-Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS such as filled pauses will require future research.
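
    One plausible realization of the detector described above is a pair of hidden Markov models, one trained on acoustic feature frames from language speech sounds (LSS) and one on non-language speech sounds (NLSS), with an unlabeled segment assigned to whichever model scores it higher. The sketch below uses the hmmlearn package and random placeholder feature frames in place of real acoustic features.

    ```python
    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    rng = np.random.default_rng(0)
    lss_train = rng.standard_normal((2000, 13)) + 1.0     # stand-in LSS feature frames
    nlss_train = rng.standard_normal((1500, 13)) - 1.0    # stand-in NLSS frames (breaths, clicks, ...)

    lss_hmm = GaussianHMM(n_components=3, covariance_type="diag", random_state=0).fit(lss_train)
    nlss_hmm = GaussianHMM(n_components=3, covariance_type="diag", random_state=0).fit(nlss_train)

    segment = rng.standard_normal((120, 13)) - 1.0        # an unlabeled segment of feature frames
    label = "LSS" if lss_hmm.score(segment) > nlss_hmm.score(segment) else "NLSS"
    print(label)
    ```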

  8. Interactions between distal speech rate, linguistic knowledge, and speech environment.

    PubMed

    Morrill, Tuuli; Baese-Berk, Melissa; Heffner, Christopher; Dilley, Laura

    2015-10-01

    During lexical access, listeners use both signal-based and knowledge-based cues, and information from the linguistic context can affect the perception of acoustic speech information. Recent findings suggest that the various cues used in lexical access are implemented with flexibility and may be affected by information from the larger speech context. We conducted 2 experiments to examine effects of a signal-based cue (distal speech rate) and a knowledge-based cue (linguistic structure) on lexical perception. In Experiment 1, we manipulated distal speech rate in utterances where an acoustically ambiguous critical word was either obligatory for the utterance to be syntactically well formed (e.g., Conner knew that bread and butter (are) both in the pantry) or optional (e.g., Don must see the harbor (or) boats). In Experiment 2, we examined identical target utterances as in Experiment 1 but changed the distribution of linguistic structures in the fillers. The results of the 2 experiments demonstrate that speech rate and linguistic knowledge about critical word obligatoriness can both influence speech perception. In addition, it is possible to alter the strength of a signal-based cue by changing information in the speech environment. These results provide support for models of word segmentation that include flexible weighting of signal-based and knowledge-based cues.

  9. Why Impromptu Speech Is Easy To Understand.

    ERIC Educational Resources Information Center

    Le Feal, K. Dejean

    Impromptu speech is characterized by the simultaneous processes of ideation (the elaboration and structuring of reasoning by the speaker as he improvises) and expression in the speaker. Other elements accompany this characteristic: division of speech flow into short segments, acoustic relief in the form of word stress following a pause, and both…

  10. Perception of Silent Pauses in Continuous Speech.

    ERIC Educational Resources Information Center

    Duez, Danielle

    1985-01-01

    Investigates the silent pauses in continuous speech in three genres: political speeches, political interviews, and casual interviews in order to see how the semantic-syntactic information of the message, the duration of silent pauses, and the acoustic environment of these pauses interact to produce the listener's perception of pauses. (Author/SED)

  11. Voice Modulations in German Ironic Speech

    ERIC Educational Resources Information Center

    Scharrer, Lisa; Christmann, Ursula; Knoll, Monja

    2011-01-01

    Previous research has shown that in different languages ironic speech is acoustically modulated compared to literal speech, and these modulations are assumed to aid the listener in the comprehension process by acting as cues that mark utterances as ironic. The present study was conducted to identify paraverbal features of German "ironic…

  12. Integrated speech enhancement for functional MRI environment.

    PubMed

    Pathak, Nishank; Milani, Ali A; Panahi, Issa; Briggs, Richard

    2009-01-01

    This paper presents an integrated speech enhancement (SE) method for the noisy MRI environment. We show that the performance of the SE system improves considerably when the speech signal dominated by MRI acoustic noise at very low SNR is enhanced in two successive stages, using two-channel SE methods followed by a single-channel post-processing SE algorithm. Actual noisy MRI speech data are used in our experiments, demonstrating the improved performance of the proposed SE method.

  13. Speech Music Discrimination Using Class-Specific Features

    DTIC Science & Technology

    2004-08-01

    Speech Music Discrimination Using Class-Specific Features, Thomas Beierholm. ... between speech and music. Feature extraction is class-specific and can therefore be tailored to each class, meaning that segment size, model orders ... interest. Some of the applications of audio signal classification are speech/music classification [1], acoustical environmental classification [2][3] ...

  14. Speech for the Deaf Child: Knowledge and Use.

    ERIC Educational Resources Information Center

    Connor, Leo E., Ed.

    Presented is a collection of 16 papers on speech development, handicaps, teaching methods, and educational trends for the aurally handicapped child. Arthur Boothroyd relates acoustic phonetics to speech teaching, and Jean Utley Lehman investigates a scheme of linguistic organization. Differences in speech production by deaf and normal hearing…

  15. Breathing-Impaired Speech after Brain Haemorrhage: A Case Study

    ERIC Educational Resources Information Center

    Heselwood, Barry

    2007-01-01

    Results are presented from an auditory and acoustic analysis of the speech of an adult male with impaired prosody and articulation due to brain haemorrhage. They show marked effects on phonation, speech rate and articulator velocity, and a speech rhythm disrupted by "intrusive" stresses. These effects are discussed in relation to the speaker's…

  16. Nonlinear Statistical Modeling of Speech

    NASA Astrophysics Data System (ADS)

    Srinivasan, S.; Ma, T.; May, D.; Lazarou, G.; Picone, J.

    2009-12-01

    Contemporary approaches to speech and speaker recognition decompose the problem into four components: feature extraction, acoustic modeling, language modeling and search. Statistical signal processing is an integral part of each of these components, and Bayes Rule is used to merge these components into a single optimal choice. Acoustic models typically use hidden Markov models based on Gaussian mixture models for state output probabilities. This popular approach suffers from an inherent assumption of linearity in speech signal dynamics. Language models often employ a variety of maximum entropy techniques, but can employ many of the same statistical techniques used for acoustic models. In this paper, we focus on introducing nonlinear statistical models to the feature extraction and acoustic modeling problems as a first step towards speech and speaker recognition systems based on notions of chaos and strange attractors. Our goal in this work is to improve the generalization and robustness properties of a speech recognition system. Three nonlinear invariants are proposed for feature extraction: Lyapunov exponents, correlation fractal dimension, and correlation entropy. We demonstrate an 11% relative improvement on speech recorded under noise-free conditions, but show a comparable degradation occurs for mismatched training conditions on noisy speech. We conjecture that the degradation is due to difficulties in estimating invariants reliably from noisy data. To circumvent these problems, we introduce two dynamic models to the acoustic modeling problem: (1) a linear dynamic model (LDM) that uses a state space-like formulation to explicitly model the evolution of hidden states using an autoregressive process, and (2) a data-dependent mixture of autoregressive (MixAR) models. Results show that LDM and MixAR models can achieve comparable performance with HMM systems while using significantly fewer parameters. Currently we are developing Bayesian parameter estimation and
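
    As an illustration of one of the nonlinear invariants named above, the correlation fractal dimension is commonly estimated from the Grassberger-Procaccia correlation sum over a delay-embedded signal; a minimal sketch (embedding dimension and delay are placeholder values):

    ```python
    import numpy as np

    def delay_embed(x, dim=3, tau=5):
        """Delay-embed a 1-D signal into `dim`-dimensional state vectors."""
        n = len(x) - (dim - 1) * tau
        return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

    def correlation_sum(x, radius, dim=3, tau=5):
        """Grassberger-Procaccia correlation sum C(r): the fraction of state-vector
        pairs closer than `radius`.  The correlation dimension is the slope of
        log C(r) versus log r over small radii."""
        Y = delay_embed(np.asarray(x, dtype=float), dim, tau)
        dists = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
        pairs = dists[np.triu_indices(len(Y), k=1)]
        return float(np.mean(pairs < radius))
    ```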

  17. Classroom Acoustics: Understanding Barriers to Learning.

    ERIC Educational Resources Information Center

    Crandell, Carl C., Ed.; Smaldino, Joseph J., Ed.

    2001-01-01

    This booklet explores classroom acoustics and their importance on the learning potential of children with hearing loss and related disabilities. The booklet also reviews research on classroom acoustics and the need for the development of classroom acoustics standards. Chapters examine: 1) a speech-perception model demonstrating the linkage between…

  18. Perception of Speech Reflects Optimal Use of Probabilistic Speech Cues

    ERIC Educational Resources Information Center

    Clayards, Meghan; Tanenhaus, Michael K.; Aslin, Richard N.; Jacobs, Robert A.

    2008-01-01

    Listeners are exquisitely sensitive to fine-grained acoustic detail within phonetic categories for sounds and words. Here we show that this sensitivity is optimal given the probabilistic nature of speech cues. We manipulated the probability distribution of one probabilistic cue, voice onset time (VOT), which differentiates word initial labial…

  19. Automatic speech recognition in cocktail-party situations: a specific training for separated speech.

    PubMed

    Marti, Amparo; Cobos, Maximo; Lopez, Jose J

    2012-02-01

    Automatic speech recognition (ASR) refers to the task of extracting a transcription of the linguistic content of an acoustical speech signal automatically. Despite several decades of research in this important area of acoustic signal processing, the accuracy of ASR systems is still far behind human performance, especially in adverse acoustic scenarios. In this context, one of the most challenging situations is the one concerning simultaneous speech in cocktail-party environments. Although source separation methods have already been investigated to deal with this problem, the separation process is not perfect and the resulting artifacts pose an additional problem to ASR performance. In this paper, a specific training to improve the percentage of recognized words in real simultaneous speech cases is proposed. The combination of source separation and this specific training is explored and evaluated under different acoustical conditions, leading to improvements of up to 35% in ASR performance.

  20. Modified Ion-Acoustic Shock Waves and Double Layers in a Degenerate Electron-Positron-Ion Plasma in Presence of Heavy Negative Ions

    NASA Astrophysics Data System (ADS)

    Hossen, M. A.; Hossen, M. R.; Mamun, A. A.

    2014-12-01

    A general theory for nonlinear propagation of one dimensional modified ion-acoustic waves in an unmagnetized electron-positron-ion (e-p-i) degenerate plasma is investigated. This plasma system is assumed to contain relativistic electron and positron fluids, non-degenerate viscous positive ions, and negatively charged static heavy ions. The modified Burgers and Gardner equations have been derived by employing the reductive perturbation method and analyzed in order to identify the basic features (polarity, width, speed, etc.) of shock and double layer (DL) structures. It is observed that the basic features of these shock and DL structures obtained from this analysis are significantly different from those obtained from the analysis of standard Gardner or Burgers equations. The implications of these results in space and interstellar compact objects (viz. non-rotating white dwarfs, neutron stars, etc.) are also briefly mentioned.
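
    For reference, the standard Burgers and Gardner equations obtained by reductive perturbation take the forms below; the modified equations analyzed in the paper differ in how the coefficients arise from the degenerate-fluid and heavy-ion terms, so A, B, C, and D here are generic placeholders:

    ```latex
    % Burgers equation for the perturbed potential \phi(\xi,\tau):
    \frac{\partial \phi}{\partial \tau}
      + A\,\phi\,\frac{\partial \phi}{\partial \xi}
      = C\,\frac{\partial^{2} \phi}{\partial \xi^{2}}

    % Gardner equation (quadratic and cubic nonlinearity with dispersion):
    \frac{\partial \phi}{\partial \tau}
      + \left(A\,\phi + B\,\phi^{2}\right)\frac{\partial \phi}{\partial \xi}
      + D\,\frac{\partial^{3} \phi}{\partial \xi^{3}} = 0
    ```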

  1. Results of tests performed on the Acoustic Quiet Flow Facility Three-Dimensional Model Tunnel: Report on the Modified D.S.M.A. Design

    NASA Technical Reports Server (NTRS)

    Barna, P. S.

    1996-01-01

    Numerous tests were performed on the original Acoustic Quiet Flow Facility Three-Dimensional Model Tunnel, scaled down from the full-scale plans. Results of tests performed on the original scale model tunnel were reported in April 1995, which clearly showed that this model was lacking in performance. Subsequently, this scale model was modified in an attempt to improve the tunnel performance. The modifications included: (a) a redesigned diffuser; (b) addition of a collector; (c) addition of a nozzle-diffuser; (d) changes in the location of vent air. Tests performed on the modified tunnel showed a marked improvement in performance, amounting to a nominal increase of pressure recovery in the diffuser from 34 percent to 54 percent. Results obtained in the tests have wider application: they may also be applied to other tunnels operating with an open test section, not necessarily having geometry similar to the model under consideration.

  2. Utilizing computer models for optimizing classroom acoustics

    NASA Astrophysics Data System (ADS)

    Hinckley, Jennifer M.; Rosenberg, Carl J.

    2002-05-01

    The acoustical conditions in a classroom play an integral role in establishing an ideal learning environment. Speech intelligibility is dependent on many factors, including speech loudness, room finishes, and background noise levels. The goal of this investigation was to use computer modeling techniques to study the effect of acoustical conditions on speech intelligibility in a classroom. This study focused on a simulated classroom which was generated using the CATT-acoustic computer modeling program. The computer was utilized as an analytical tool in an effort to optimize speech intelligibility in a typical classroom environment. The factors that were focused on were reverberation time, location of absorptive materials, and background noise levels. Speech intelligibility was measured with the Rapid Speech Transmission Index (RASTI) method.

  3. Speech Problems

    MedlinePlus

    ... and the respiratory system. The ability to understand language and produce speech is coordinated by the brain. So a person with brain damage from an accident, stroke, or birth defect may have speech and language problems. Some people with speech problems, particularly articulation ...

  4. Recognizing articulatory gestures from speech for robust speech recognition.

    PubMed

    Mitra, Vikramjit; Nam, Hosung; Espy-Wilson, Carol; Saltzman, Elliot; Goldstein, Louis

    2012-03-01

    Studies have shown that supplementary articulatory information can help to improve the recognition rate of automatic speech recognition systems. Unfortunately, articulatory information is not directly observable, necessitating its estimation from the speech signal. This study describes a system that recognizes articulatory gestures from speech, and uses the recognized gestures in a speech recognition system. Recognizing gestures for a given utterance involves recovering the set of underlying gestural activations and their associated dynamic parameters. This paper proposes a neural network architecture for recognizing articulatory gestures from speech and presents ways to incorporate articulatory gestures for a digit recognition task. The lack of a natural speech database containing gestural information prompted us to use three stages of evaluation. First, the proposed gestural annotation architecture was tested on a synthetic speech dataset, which showed that the use of estimated tract-variable time functions improved gesture recognition performance. In the second stage, gesture-recognition models were applied to natural speech waveforms and word recognition experiments revealed that the recognized gestures can improve the noise-robustness of a word recognition system. In the final stage, a gesture-based Dynamic Bayesian Network was trained and the results indicate that incorporating gestural information can improve word recognition performance compared to acoustic-only systems.

  5. Embedding speech into virtual realities

    NASA Technical Reports Server (NTRS)

    Bohn, Christian-Arved; Krueger, Wolfgang

    1993-01-01

    In this work a speaker-independent speech recognition system is presented, which is suitable for implementation in Virtual Reality applications. The use of an artificial neural network in connection with a special compression of the acoustic input leads to a system, which is robust, fast, easy to use and needs no additional hardware, beside a common VR-equipment.

  6. Prosodic Contrasts in Ironic Speech

    ERIC Educational Resources Information Center

    Bryant, Gregory A.

    2010-01-01

    Prosodic features in spontaneous speech help disambiguate implied meaning not explicit in linguistic surface structure, but little research has examined how these signals manifest themselves in real conversations. Spontaneously produced verbal irony utterances generated between familiar speakers in conversational dyads were acoustically analyzed…

  7. Intelligibility of laryngectomees' substitute speech: automatic speech recognition and subjective rating.

    PubMed

    Schuster, Maria; Haderlein, Tino; Nöth, Elmar; Lohscheller, Jörg; Eysholdt, Ulrich; Rosanowski, Frank

    2006-02-01

    Substitute speech after laryngectomy is characterized by restricted aero-acoustic properties in comparison with laryngeal speech and therefore has lower intelligibility. Until now, an objective means to determine and quantify the intelligibility has not existed, although intelligibility can serve as a global outcome parameter of voice restoration after laryngectomy. An automatic speech recognition system was applied to recordings of a standard text read by 18 German male laryngectomees with tracheoesophageal substitute speech. The system was trained with normal laryngeal speakers and not adapted to severely disturbed voices. Substitute speech was compared to laryngeal speech of a control group. Subjective evaluation of intelligibility was performed by a panel of five experts and compared to automatic speech evaluation. Substitute speech showed lower syllables/s and lower word accuracy than laryngeal speech. Automatic speech recognition for substitute speech yielded word accuracy between 10.0% and 50% (28.7 +/- 12.1%) with sufficient discrimination. It was consistent with the experts' subjective evaluations of intelligibility. The multi-rater kappa of the experts alone did not differ from the multi-rater kappa of experts and the recognizer. Automatic speech recognition serves as a good means to objectify and quantify the global speech outcome of laryngectomees. For clinical use, the speech recognition system will be adapted to disturbed voices and can also be applied in other languages.
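
    Word accuracy, as reported above, is conventionally one minus the word error rate obtained from a Levenshtein (edit-distance) alignment between the recognizer output and the reference transcript; a minimal sketch of that computation:

    ```python
    def word_accuracy(reference, hypothesis):
        """Word accuracy = 1 - (substitutions + deletions + insertions) / reference length,
        via the standard Levenshtein dynamic program over word sequences."""
        ref, hyp = reference.split(), hypothesis.split()
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution or match
        return 1.0 - d[len(ref)][len(hyp)] / len(ref)

    print(word_accuracy("the quick brown fox", "the quick brown box"))  # 0.75
    ```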

  8. Virtual acoustics displays

    NASA Technical Reports Server (NTRS)

    Wenzel, Elizabeth M.; Fisher, Scott S.; Stone, Philip K.; Foster, Scott H.

    1991-01-01

    The real time acoustic display capabilities are described which were developed for the Virtual Environment Workstation (VIEW) Project at NASA-Ames. The acoustic display is capable of generating localized acoustic cues in real time over headphones. An auditory symbology, a related collection of representational auditory 'objects' or 'icons', can be designed using ACE (Auditory Cue Editor), which links both discrete and continuously varying acoustic parameters with information or events in the display. During a given display scenario, the symbology can be dynamically coordinated in real time with 3-D visual objects, speech, and gestural displays. The types of displays feasible with the system range from simple warnings and alarms to the acoustic representation of multidimensional data or events.

  9. Virtual acoustics displays

    NASA Astrophysics Data System (ADS)

    Wenzel, Elizabeth M.; Fisher, Scott S.; Stone, Philip K.; Foster, Scott H.

    1991-03-01

    The real time acoustic display capabilities are described which were developed for the Virtual Environment Workstation (VIEW) Project at NASA-Ames. The acoustic display is capable of generating localized acoustic cues in real time over headphones. An auditory symbology, a related collection of representational auditory 'objects' or 'icons', can be designed using ACE (Auditory Cue Editor), which links both discrete and continuously varying acoustic parameters with information or events in the display. During a given display scenario, the symbology can be dynamically coordinated in real time with 3-D visual objects, speech, and gestural displays. The types of displays feasible with the system range from simple warnings and alarms to the acoustic representation of multidimensional data or events.

  10. Applications for Subvocal Speech

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles; Betts, Bradley

    2007-01-01

    A research and development effort now underway is directed toward the use of subvocal speech for communication in settings in which (1) acoustic noise could interfere excessively with ordinary vocal communication and/or (2) acoustic silence or secrecy of communication is required. By "subvocal speech" is meant sub-audible electromyographic (EMG) signals, associated with speech, that are acquired from the surface of the larynx and lingual areas of the throat. Topics addressed in this effort include recognition of the sub-vocal EMG signals that represent specific original words or phrases; transformation (including encoding and/or enciphering) of the signals into forms that are less vulnerable to distortion, degradation, and/or interception; and reconstruction of the original words or phrases at the receiving end of a communication link. Potential applications include ordinary verbal communications among hazardous- material-cleanup workers in protective suits, workers in noisy environments, divers, and firefighters, and secret communications among law-enforcement officers and military personnel in combat and other confrontational situations.

  11. Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

    PubMed

    Larm, Petra; Hongisto, Valtteri

    2006-02-01

    During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.
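
    In its simplest SNR-based form, an STI-style index clips the apparent speech-to-noise ratio in each octave band to +/-15 dB, rescales it to a 0-1 transmission index, and combines the bands with fixed weights; a simplified sketch (the weights shown are illustrative, not the standardized values):

    ```python
    import numpy as np

    # Octave-band centre frequencies (Hz) and illustrative weights (normalised below).
    BANDS   = [125, 250, 500, 1000, 2000, 4000, 8000]
    WEIGHTS = np.array([0.09, 0.13, 0.23, 0.23, 0.31, 0.22, 0.17])
    WEIGHTS = WEIGHTS / WEIGHTS.sum()

    def sti_from_snr(band_snr_db):
        """Map per-band apparent SNR (dB) to a single 0-1 intelligibility index."""
        snr = np.clip(np.asarray(band_snr_db, dtype=float), -15.0, 15.0)
        transmission_index = (snr + 15.0) / 30.0
        return float(np.sum(WEIGHTS * transmission_index))

    print(sti_from_snr([12, 10, 8, 6, 4, 2, 0]))  # a moderately good channel
    ```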

  12. Robust Speech Rate Estimation for Spontaneous Speech

    PubMed Central

    Wang, Dagen; Narayanan, Shrikanth S.

    2010-01-01

    In this paper, we propose a direct method for speech rate estimation from acoustic features without requiring any automatic speech transcription. We compare various spectral and temporal signal analysis and smoothing strategies to better characterize the underlying syllable structure to derive speech rate. The proposed algorithm extends the methods of spectral subband correlation by including temporal correlation and the use of prominent spectral subbands for improving the signal correlation essential for syllable detection. Furthermore, to address some of the practical robustness issues in previously proposed methods, we introduce some novel components into the algorithm such as the use of pitch confidence for filtering spurious syllable envelope peaks, a magnifying window for tackling neighboring syllable smearing, and relative peak measure thresholds for pseudo-peak rejection. We also describe an automated approach for learning algorithm parameters from data, and find the optimal settings through Monte Carlo simulations and parameter sensitivity analysis. Final experimental evaluations are conducted based on a portion of the Switchboard corpus for which manual phonetic segmentation information and published results for direct comparison are available. The results show a correlation coefficient of 0.745 with respect to the ground truth based on manual segmentation. This result is about a 17% improvement compared to the current best single estimator and an 11% improvement over the multiestimator evaluated on the same Switchboard database. PMID:20428476
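
    The envelope-peak idea underlying such estimators can be sketched as bandpass filtering, rectification, envelope smoothing, and prominent-peak counting; the sketch below is this generic baseline only (cutoffs and thresholds are assumptions), not the paper's full subband-correlation algorithm:

    ```python
    import numpy as np
    from scipy.signal import butter, filtfilt, find_peaks

    def speech_rate_estimate(x, fs):
        """Rough syllables-per-second estimate from the amplitude envelope."""
        # Keep the main speech band, rectify, then low-pass the envelope
        # around the syllable rate (below ~10 Hz).
        b, a = butter(4, [300 / (fs / 2), 3000 / (fs / 2)], btype="band")
        env = np.abs(filtfilt(b, a, x))
        b, a = butter(2, 10 / (fs / 2), btype="low")
        env = filtfilt(b, a, env)
        # Count prominent envelope peaks at least 100 ms apart.
        peaks, _ = find_peaks(env, prominence=0.3 * env.max(), distance=int(0.1 * fs))
        return len(peaks) / (len(x) / fs)
    ```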

  13. Acoustic neuroma

    MedlinePlus

    Vestibular schwannoma; Tumor - acoustic; Cerebellopontine angle tumor; Angle tumor; Hearing loss - acoustic; Tinnitus - acoustic ... Acoustic neuromas have been linked with the genetic disorder neurofibromatosis type 2 (NF2). Acoustic neuromas are uncommon.

  14. Filtered back-projection reconstruction of photo-acoustic imaging based on a modified wavelet threshold function

    NASA Astrophysics Data System (ADS)

    Ren, Zhong; Liu, Guodong; Huang, Zhen

    2016-10-01

    In this study, the filtered back-projection algorithm was used to reconstruct the photoacoustic image. To improve the quality of the reconstructed image, a wavelet threshold denoising method was combined with the filtered back-projection reconstruction algorithm. To improve the reconstruction effect of the photoacoustic imaging, a modified wavelet threshold function was proposed. To verify the feasibility of the modified wavelet threshold function, simulation experiments on a standard test phantom were performed using three different wavelet threshold functions. Compared with the soft- and hard-threshold functions, the modified wavelet threshold function achieves better denoising and reconstruction. Moreover, the peak signal-to-noise ratio (PSNR) value of the modified function is the largest, and its mean root square error (MRSE) value is less than that of the other two. Therefore, the filtered back-projection reconstruction algorithm combined with the modified wavelet threshold function has potential value in the reconstruction of photoacoustic imaging.
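
    The paper's specific threshold function is not reproduced here; the sketch below only illustrates the general idea of a shrinkage rule that interpolates between hard and soft thresholding, applied to wavelet detail coefficients with the PyWavelets package (threshold and blending parameter are placeholders):

    ```python
    import numpy as np
    import pywt  # PyWavelets

    def blended_threshold(c, thr, alpha=0.5):
        """Generic shrinkage between hard (alpha=0) and soft (alpha=1) thresholding;
        a stand-in for, not a copy of, the modified function proposed in the paper."""
        return np.where(np.abs(c) <= thr, 0.0, c - alpha * thr * np.sign(c))

    def wavelet_denoise(signal, wavelet="db4", level=4, thr=0.1, alpha=0.5):
        coeffs = pywt.wavedec(signal, wavelet, level=level)
        coeffs = [coeffs[0]] + [blended_threshold(c, thr, alpha) for c in coeffs[1:]]
        return pywt.waverec(coeffs, wavelet)
    ```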

  15. Articulatory-to-Acoustic Relations in Response to Speaking Rate and Loudness Manipulations

    ERIC Educational Resources Information Center

    Mefferd, Antje S.; Green, Jordan R.

    2010-01-01

    Purpose: In this investigation, the authors determined the strength of association between tongue kinematic and speech acoustics changes in response to speaking rate and loudness manipulations. Performance changes in the kinematic and acoustic domains were measured using two aspects of speech production presumably affecting speech clarity:…

  16. An Acoustic Study of the Relationships among Neurologic Disease, Dysarthria Type, and Severity of Dysarthria

    ERIC Educational Resources Information Center

    Kim, Yunjung; Kent, Raymond D.; Weismer, Gary

    2011-01-01

    Purpose: This study examined acoustic predictors of speech intelligibility in speakers with several types of dysarthria secondary to different diseases and conducted classification analysis solely by acoustic measures according to 3 variables (disease, speech severity, and dysarthria type). Method: Speech recordings from 107 speakers with…

  17. A neural mechanism for recognizing speech spoken by different speakers.

    PubMed

    Kreitewolf, Jens; Gaudrain, Etienne; von Kriegstein, Katharina

    2014-05-01

    Understanding speech from different speakers is a sophisticated process, particularly because the same acoustic parameters convey important information about both the speech message and the person speaking. How the human brain accomplishes speech recognition under such conditions is unknown. One view is that speaker information is discarded at early processing stages and not used for understanding the speech message. An alternative view is that speaker information is exploited to improve speech recognition. Consistent with the latter view, previous research identified functional interactions between the left- and the right-hemispheric superior temporal sulcus/gyrus, which process speech- and speaker-specific vocal tract parameters, respectively. Vocal tract parameters are one of the two major acoustic features that determine both speaker identity and speech message (phonemes). Here, using functional magnetic resonance imaging (fMRI), we show that a similar interaction exists for glottal fold parameters between the left and right Heschl's gyri. Glottal fold parameters are the other main acoustic feature that determines speaker identity and speech message (linguistic prosody). The findings suggest that interactions between left- and right-hemispheric areas are specific to the processing of different acoustic features of speech and speaker, and that they represent a general neural mechanism when understanding speech from different speakers.

  18. Prediction and constraint in audiovisual speech perception.

    PubMed

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration

  19. Recognition of speech spectrograms.

    PubMed

    Greene, B G; Pisoni, D B; Carrell, T D

    1984-07-01

    The performance of eight naive observers in learning to identify speech spectrograms was studied over a 2-month period. Single tokens from a 50-word phonetically balanced (PB) list were recorded by several talkers and displayed on a Spectraphonics Speech Spectrographic Display system. Identification testing occurred immediately after daily training sessions. After approximately 20 h of training, naive subjects correctly identified the 50 PB words from a single talker over 95% of the time. Generalization tests with the same words were then carried out with different tokens from the original talker, new tokens from another male talker, a female talker, and finally, a synthetic talker. The generalization results for these talkers showed recognition performance at 91%, 76%, 76%, and 48%, respectively. Finally, generalization tests with a novel set of PB words produced by the original talker were also carried out to examine in detail the perceptual strategies and visual features that subjects abstracted from the training set. Our results demonstrate that even without formal training in phonetics or acoustics naive observers can learn to identify visual displays of speech at very high levels of accuracy. Analysis of subjects' performance in a verbal protocol task demonstrated that they rely on salient visual correlates of many phonetic features in speech.

  20. Hearing speech in music.

    PubMed

    Ekström, Seth-Reino; Borg, Erik

    2011-01-01

    The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.

  1. Single-shot analytical assay based on graphene-oxide-modified surface acoustic wave biosensor for detection of single-nucleotide polymorphisms.

    PubMed

    Liu, Xiang; Wang, Jia-Ying; Mao, Xiao-Bing; Ning, Yong; Zhang, Guo-Jun

    2015-09-15

    The combination of a surface acoustic wave (SAW) biosensor with graphene oxide (GO) provides a promising perspective for detecting DNA mutation. The GO-modified SAW biosensor was prepared by conjugating GO onto the SAW chip surface via electrostatic interaction. Afterward, the probe was immobilized on the GO surface, and detection of DNA mutation was realized by hybridization. The hybridization with a variety of targets would yield different mass and conformational changes on the chip surface, causing the different SAW signals in real time. A total of 137 clinical samples were detected by a single-shot analytical assay based on GO-modified SAW biosensor and direct sequencing in parallel. The diagnostic performance (both sensitivity and specificity) of the assay was evaluated with the direct sequencing as a reference testing method. The phase-shift value of three genotypes in 137 clinical samples was significantly different (p < 0.001). Furthermore, testing of diagnostic performance yielded diagnostic sensitivity and specificity of 100% and 88.6% for identifying CT and CC genotype, 98.0% and 96.2% for identifying CT and TT genotype, respectively. The single-shot analytical assay based on the GO-modified SAW biosensor could be exploited as a potential useful tool to identify CYP2D6*10 polymorphisms in clinical practice of personalized medicine.

  2. Headphone localization of speech

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.

    1993-01-01

    Three-dimensional acoustic display systems have recently been developed that synthesize virtual sound sources over headphones based on filtering by head-related transfer functions (HRTFs), the direction-dependent spectral changes caused primarily by the pinnae. In this study, 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with nonindividualized HRTFs. About half of the subjects 'pulled' their judgments toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgments; 15 to 46 percent of stimuli were heard inside the head, with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by nonindividualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.
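
    Binaural rendering of the kind used in such studies amounts to convolving the monaural speech signal with left- and right-ear head-related impulse responses (HRIRs) measured for the target direction; a minimal sketch with the HRIRs taken as given:

    ```python
    import numpy as np
    from scipy.signal import fftconvolve

    def render_binaural(speech, hrir_left, hrir_right):
        """Convolve a mono signal with left/right head-related impulse responses
        to place it at the direction for which those HRIRs were measured."""
        left = fftconvolve(speech, hrir_left, mode="full")
        right = fftconvolve(speech, hrir_right, mode="full")
        return np.stack([left, right], axis=-1)  # columns = headphone channels
    ```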

  3. Headphone localization of speech.

    PubMed

    Begault, D R; Wenzel, E M

    1993-06-01

    Three-dimensional acoustic display systems have recently been developed that synthesize virtual sound sources over headphones based on filtering by head-related transfer functions (HRTFs), the direction-dependent spectral changes caused primarily by the pinnae. In this study 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with non-individualized HRTFs. About half of the subjects "pulled" their judgments toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgments; 15% to 46% of stimuli were heard inside the head, with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by nonindividualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.

  4. Sound scattering from rough bubbly ocean surface based on modified sea surface acoustic simulator and consideration of various incident angles and sub-surface bubbles' radii

    NASA Astrophysics Data System (ADS)

    Bolghasi, Alireza; Ghadimi, Parviz; Chekab, Mohammad A. Feizi

    2016-09-01

    The aim of the present study is to improve the capabilities and precision of a recently introduced Sea Surface Acoustic Simulator (SSAS) developed based on optimization of the Helmholtz-Kirchhoff-Fresnel (HKF) method. The improved acoustic simulator, hereby known as the Modified SSAS (MSSAS), is capable of determining sound scattering from the sea surface and includes an extended Hall-Novarini model and optimized HKF method. The extended Hall-Novarini model is used for considering the effects of sub-surface bubbles over a wider range of radii of sub-surface bubbles compared to the previous SSAS version. Furthermore, MSSAS has the capability of making a three-dimensional simulation of scattered sound from the rough bubbly sea surface with less error than that of the Critical Sea Tests (CST) experiments. Also, it presents scattered pressure levels from the rough bubbly sea surface based on various incident angles of sound. Wind speed, frequency, incident angle, and pressure level of the sound source are considered as input data, and scattered pressure levels and scattering coefficients are provided. Finally, different parametric studies were conducted on wind speeds, frequencies, and incident angles to indicate that MSSAS is quite capable of simulating sound scattering from the rough bubbly sea surface, according to the scattering mechanisms determined by Ogden and Erskine. Therefore, it is concluded that MSSAS is valid for both scattering mechanisms and the transition region between them that are defined by Ogden and Erskine.

  5. Acoustic Differences between Humorous and Sincere Communicative Intentions

    ERIC Educational Resources Information Center

    Hoicka, Elena; Gattis, Merideth

    2012-01-01

    Previous studies indicate that the acoustic features of speech discriminate between positive and negative communicative intentions, such as approval and prohibition. Two studies investigated whether acoustic features of speech can discriminate between two positive communicative intentions: humour and sweet-sincerity, where sweet-sincerity involved…

  6. Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding

    SciTech Connect

    Hogden, J.

    1996-11-05

    The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation may decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.

  7. Clinical and acoustical variability in hypokinetic dysarthria

    SciTech Connect

    Metter, E.J.; Hanson, W.R.

    1986-10-01

    Ten male patients with parkinsonism secondary to Parkinson's disease or progressive supranuclear palsy had clinical neurological, speech, and acoustical speech evaluations. In addition, seven of the patients were evaluated by x-ray computed tomography (CT) and (F-18)-fluorodeoxyglucose (FDG) positron emission tomography (PET). Extensive variability of speech features, both clinical and acoustical, was found and seemed to be independent of the severity of any parkinsonian sign, CT, or FDG PET. In addition, little relationship existed between the variability across each measured speech feature. What appeared to be important for the appearance of abnormal acoustic measures was the degree of overall severity of the dysarthria. These observations suggest that a better understanding of hypokinetic dysarthria may result from more extensive examination of the variability between patients. Emphasizing a specific feature such as rapid speaking rate in characterizing hypokinetic dysarthria focuses on a single and inconstant finding in a complex speech pattern.

  8. Auditory-perceptual learning improves speech motor adaptation in children.

    PubMed

    Shiller, Douglas M; Rochon, Marie-Lyne

    2014-08-01

    Auditory feedback plays an important role in children's speech development by providing the child with information about speech outcomes that is used to learn and fine-tune speech motor plans. The use of auditory feedback in speech motor learning has been extensively studied in adults by examining oral motor responses to manipulations of auditory feedback during speech production. Children are also capable of adapting speech motor patterns to perceived changes in auditory feedback; however, it is not known whether their capacity for motor learning is limited by immature auditory-perceptual abilities. Here, the link between speech perceptual ability and the capacity for motor learning was explored in two groups of 5- to 7-year-old children who underwent a period of auditory perceptual training followed by tests of speech motor adaptation to altered auditory feedback. One group received perceptual training on a speech acoustic property relevant to the motor task while a control group received perceptual training on an irrelevant speech contrast. Learned perceptual improvements led to an enhancement in speech motor adaptation (proportional to the perceptual change) only for the experimental group. The results indicate that children's ability to perceive relevant speech acoustic properties has a direct influence on their capacity for sensory-based speech motor adaptation.

  9. Automatic Speech Recognition

    NASA Astrophysics Data System (ADS)

    Potamianos, Gerasimos; Lamel, Lori; Wölfel, Matthias; Huang, Jing; Marcheret, Etienne; Barras, Claude; Zhu, Xuan; McDonough, John; Hernando, Javier; Macho, Dusan; Nadeu, Climent

    Automatic speech recognition (ASR) is a critical component for CHIL services. For example, it provides the input to higher-level technologies, such as summarization and question answering, as discussed in Chapter 8. In the spirit of ubiquitous computing, the goal of ASR in CHIL is to achieve a high performance using far-field sensors (networks of microphone arrays and distributed far-field microphones). However, close-talking microphones are also of interest, as they are used to benchmark ASR system development by providing a best-case acoustic channel scenario to compare against.

  10. A frequency-selective feedback model of auditory efferent suppression and its implications for the recognition of speech in noise.

    PubMed

    Clark, Nicholas R; Brown, Guy J; Jürgens, Tim; Meddis, Ray

    2012-09-01

    The potential contribution of the peripheral auditory efferent system to our understanding of speech in a background of competing noise was studied using a computer model of the auditory periphery and assessed using an automatic speech recognition system. A previous study had shown that a fixed efferent attenuation applied to all channels of a multi-channel model could improve the recognition of connected digit triplets in noise [G. J. Brown, R. T. Ferry, and R. Meddis, J. Acoust. Soc. Am. 127, 943-954 (2010)]. In the current study an anatomically justified feedback loop was used to automatically regulate separate attenuation values for each auditory channel. This arrangement resulted in a further enhancement of speech recognition over fixed-attenuation conditions. Comparisons between multi-talker babble and pink noise interference conditions suggest that the benefit originates from the model's ability to modify the amount of suppression in each channel separately according to the spectral shape of the interfering sounds.
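
    The per-channel regulation described above can be caricatured as a feedback rule that attenuates each auditory channel in proportion to its recent output level; the toy sketch below illustrates that idea only and is not the authors' model (reference level and loop gain are placeholders):

    ```python
    import numpy as np

    def efferent_attenuation(channel_envelopes, reference_db=60.0, gain_db_per_db=0.5):
        """Toy per-channel efferent loop: channels whose mean level exceeds a
        reference are attenuated proportionally, mimicking frequency-selective
        suppression of noise-dominated channels.

        channel_envelopes: array of shape (channels, frames), linear amplitude.
        """
        levels_db = 20 * np.log10(np.mean(channel_envelopes, axis=1) + 1e-12)
        atten_db = np.maximum(0.0, (levels_db - reference_db) * gain_db_per_db)
        gains = 10 ** (-atten_db / 20.0)
        return channel_envelopes * gains[:, None]
    ```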

  11. Speech Aids

    NASA Technical Reports Server (NTRS)

    1987-01-01

    Designed to assist deaf and hearing impaired-persons in achieving better speech, Resnick Worldwide Inc.'s device provides a visual means of cuing the deaf as a speech-improvement measure. This is done by electronically processing the subjects' sounds and comparing them with optimum values which are displayed for comparison.

  12. Speech Communication.

    ERIC Educational Resources Information Center

    Brooks, William D.

    Presented in this book is a view of speech communication which enables an individual to become fully aware of his or her role as both initiator and recipient of messages. Communication is treated broadly with emphasis on the understanding and skills relating to various types of speech communication across the broad spectrum of human communication.…

  13. Symbolic Speech

    ERIC Educational Resources Information Center

    Podgor, Ellen S.

    1976-01-01

    The concept of symbolic speech emanates from the 1967 case of United States v. O'Brien. These discussions of flag desecration, grooming and dress codes, nude entertainment, buttons and badges, and musical expression show that the courts place symbolic speech in different strata from verbal communication. (LBH)

  14. Speech coding

    SciTech Connect

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings and, since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal being corrupted by noise, cross-talk, and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk, and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link. From a transmission point of view, digital transmission has therefore been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible, and secure services that can carry a multitude of signal types (such as voice, data, and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques, and that is often used interchangeably with speech coding, is voice coding. This term is more generic in the sense that the
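
    The simplest waveform-coding approach alluded to above is direct quantization of the waveform; mu-law companding, as used in G.711-style telephony, is the classic example (a minimal sketch):

    ```python
    import numpy as np

    def mu_law_encode(x, mu=255):
        """Compress a signal in [-1, 1] with the mu-law curve, then quantize to 8 bits."""
        compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
        return np.round((compressed + 1) / 2 * 255).astype(np.uint8)

    def mu_law_decode(codes, mu=255):
        """Invert the companding to recover an approximation of the waveform."""
        y = codes.astype(np.float64) / 255 * 2 - 1
        return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu
    ```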

  15. History of chronic stress modifies acute stress-evoked fear memory and acoustic startle in male rats.

    PubMed

    Schmeltzer, Sarah N; Vollmer, Lauren L; Rush, Jennifer E; Weinert, Mychal; Dolgas, Charles M; Sah, Renu

    2015-01-01

    Chronicity of trauma exposure plays an important role in the pathophysiology of posttraumatic stress disorder (PTSD). Thus, exposure to multiple traumas on a chronic scale leads to worse outcomes than acute events. The rationale for the current study was to investigate the effects of a single adverse event versus the same event on a background of chronic stress. We hypothesized that a history of chronic stress would lead to worse behavioral outcomes than a single event alone. Male rats (n = 14/group) were exposed to either a single traumatic event in the form of electric foot shocks (acute shock, AS), or to footshocks on a background of chronic stress (chronic variable stress-shock, CVS-S). PTSD-relevant behaviors (fear memory and acoustic startle responses) were measured following 7 d recovery. In line with our hypothesis, CVS-S elicited significant increases in fear acquisition and conditioning versus the AS group. Unexpectedly, CVS-S elicited reduced startle reactivity to an acoustic stimulus in comparison with the AS group. Significant increase in FosB/ΔFosB-like immunostaining was observed in the dentate gyrus, basolateral amygdala and medial prefrontal cortex of CVS-S rats. Assessments of neuropeptide Y (NPY), a stress-regulatory transmitter associated with chronic PTSD, revealed selective reduction in the hippocampus of CVS-S rats. Collectively, our data show that cumulative stress potentiates delayed fear memory and impacts defensive responding. Altered neuronal activation in forebrain limbic regions and reduced NPY may contribute to these phenomena. Our preclinical studies support clinical findings reporting worse PTSD outcomes stemming from cumulative traumatization in contrast to acute trauma.

  16. Improving the speech intelligibility in classrooms

    NASA Astrophysics Data System (ADS)

    Lam, Choi Ling Coriolanus

    One of the major acoustical concerns in classrooms is the establishment of effective verbal communication between teachers and students. Non-optimal acoustical conditions, resulting in reduced verbal communication, can cause two main problems. First, they can lead to reduced learning efficiency. Second, they can also cause fatigue, stress, vocal strain and health problems, such as headaches and sore throats, among teachers who are forced to compensate for poor acoustical conditions by raising their voices. In addition, inadequate acoustical conditions can encourage the use of public address systems; improper use of such amplifiers or loudspeakers can lead to impairment of students' hearing. The social costs of poor classroom acoustics, which impair children's learning, are large. This invisible problem has far-reaching implications for learning, but is easily solved. Much research has been carried out, and the findings on classroom acoustics have been accurately and concisely summarized. However, a number of challenging questions remain unanswered. Most objective indices for speech intelligibility are essentially based on studies of western languages. Although several studies of tonal languages such as Mandarin have been conducted, there is much less work on Cantonese. In this research, measurements were made in unoccupied rooms to investigate the acoustical parameters and characteristics of the classrooms. Speech intelligibility tests based on English, Mandarin and Cantonese, together with a survey, were carried out on students aged from 5 to 22 years. The aim is to investigate the differences in intelligibility between English, Mandarin and Cantonese in Hong Kong classrooms. The relationship between the speech transmission index (STI) and Phonetically Balanced (PB) word scores will be further developed, together with an empirical relationship between the speech intelligibility in classrooms and the variations

  17. Phrase-level speech simulation with an airway modulation model of speech production

    PubMed Central

    Story, Brad H.

    2012-01-01

    Artificial talkers and speech synthesis systems have long been used as a means of understanding both speech production and speech perception. The development of an airway modulation model is described that simulates the time-varying changes of the glottis and vocal tract, as well as acoustic wave propagation, during speech production. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by humans and how that sound is perceived by a listener. The primary components of the model are introduced and simulation of words and phrases are demonstrated. PMID:23503742

  18. Contemporary Issues in Phoneme Production by Hearing-Impaired Persons: Physiological and Acoustic Aspects.

    ERIC Educational Resources Information Center

    McGarr, Nancy S.; Whitehead, Robert

    1992-01-01

    This paper on physiologic correlates of speech production in children and youth with hearing impairments focuses specifically on the production of phonemes and includes data on respiration for speech production, phonation, speech aerodynamics, articulation, and acoustic analyses of speech by hearing-impaired persons. (Author/DB)

  19. Some articulatory details of emotional speech

    NASA Astrophysics Data System (ADS)

    Lee, Sungbok; Yildirim, Serdar; Bulut, Murtaza; Kazemzadeh, Abe; Narayanan, Shrikanth

    2005-09-01

    Differences in speech articulation among four emotion types, neutral, anger, sadness, and happiness are investigated by analyzing tongue tip, jaw, and lip movement data collected from one male and one female speaker of American English. The data were collected using an electromagnetic articulography (EMA) system while subjects produced simulated emotional speech. Pitch, root-mean-square (rms) energy and the first three formants were estimated for vowel segments. For both speakers, angry speech exhibited the largest rms energy and the largest articulatory activity in terms of displacement range and movement speed. Happy speech is characterized by the largest pitch variability. It has higher rms energy than neutral speech, but articulatory activity is rather comparable to, or less than, neutral speech. That is, happy speech is more prominent in voicing activity than in articulation. Sad speech exhibits the longest sentence duration and lower rms energy. However, its articulatory activity is no less than neutral speech. Interestingly, for the male speaker, articulation for vowels in sad speech is consistently more peripheral (i.e., more forwarded displacements) when compared to other emotions. However, this does not hold for the female subject. These and other results will be discussed in detail with associated acoustics and perceived emotional qualities. [Work supported by NIH.]

  20. Talking while chewing: speaker response to natural perturbation of speech.

    PubMed

    Mayer, Connor; Gick, Bryan

    2012-01-01

    This study looks at how the conflicting goals of chewing and speech production are reconciled by examining the acoustic and articulatory output of talking while chewing. We consider chewing to be a type of perturbation with regard to speech production, but with some important differences. Ultrasound and acoustic measurements were made while participants chewed gum and produced various utterances containing the sounds /s/, /ʃ/, and /r/. Results show a great deal of individual variation in articulation and acoustics between speakers, but consistent productions and maintenance of relative acoustic distances within speakers. Although chewing interfered with speech production, and this interference manifested itself in a variety of ways across speakers, the objectives of speech production were indirectly achieved within the constraints and variability introduced by individual chewing strategies.

  1. Psychophysics of Complex Auditory and Speech Stimuli.

    DTIC Science & Technology

    1996-10-01

    it more distinctive (i.e., in a different instrument timbre than the other musical voices) and less distinctive (i.e., presenting the musical pieces in...complex acoustic signals, including speech and music. Traditional, solid psychophysical procedures were employed to systematically investigate...result in the perception of classes of complex auditory stimuli, including speech and music. In health, industry, and human factors, the...

  2. Levels of Processing of Speech and Non-Speech

    DTIC Science & Technology

    1991-05-10

    Timbre: A better musical analogy to speech? Presented to the Acoustical Society of America, Anaheim. A. Samuel. (Fall 1987) Central and peripheral...The studies of listener-based factors include studies of perceptual restoration of deleted sounds (phonemes or musical notes), and studies of the...music. The attentional investigations demonstrate rather fine-tuned attentional control under high-predictability conditions. Significant progress has been

  3. Acoustic Event Detection and Classification

    NASA Astrophysics Data System (ADS)

    Temko, Andrey; Nadeu, Climent; Macho, Dušan; Malkin, Robert; Zieger, Christian; Omologo, Maurizio

    The human activity that takes place in meeting rooms or classrooms is reflected in a rich variety of acoustic events (AE), produced either by the human body or by objects handled by humans, so the determination of both the identity of sounds and their position in time may help to detect and describe that human activity. Indeed, speech is usually the most informative sound, but other kinds of AEs may also carry useful information, for example, clapping or laughing during a speech, a strong yawn in the middle of a lecture, or a chair moving or a door slamming when the meeting has just started. Additionally, detection and classification of sounds other than speech may be useful to enhance the robustness of speech technologies like automatic speech recognition.

  4. Speech discrimination after early exposure to pulsed-noise or speech.

    PubMed

    Ranasinghe, Kamalini G; Carraway, Ryan S; Borland, Michael S; Moreno, Nicole A; Hanacik, Elizabeth A; Miller, Robert S; Kilgard, Michael P

    2012-07-01

    Early experience of structured inputs and complex sound features generates lasting changes in the tonotopy and receptive field properties of primary auditory cortex (A1). In this study we tested whether these changes are severe enough to alter neural representations and behavioral discrimination of speech. We exposed two groups of rat pups during the critical period of auditory development to pulsed-noise or speech. Both groups of rats were trained to discriminate speech sounds when they were young adults, and anesthetized neural responses were recorded from A1. The representation of speech in A1 and behavioral discrimination of speech remained robust to the altered spectral and temporal characteristics of A1 neurons after pulsed-noise exposure. Exposure to passive speech during early development provided no added advantage in speech sound processing. Speech training increased A1 neuronal firing rates for speech stimuli in naïve rats, but did not increase responses in rats that experienced early exposure to pulsed-noise or speech. Our results suggest that speech sound processing is resistant to changes in simple neural response properties caused by manipulating the early acoustic environment.

  5. The Human Neural Alpha Response to Speech is a Proxy of Attentional Control.

    PubMed

    Wöstmann, Malte; Lim, Sung-Joo; Obleser, Jonas

    2017-03-18

    Human alpha (~10 Hz) oscillatory power is a prominent neural marker of cognitive effort. When listeners attempt to process and retain acoustically degraded speech, alpha power is enhanced. It is unclear whether these alpha modulations reflect the degree of acoustic degradation per se or the degradation-driven demand on a listener's attentional control. Using an irrelevant-speech paradigm and measuring the electroencephalogram (EEG), the current experiment demonstrates that the neural alpha response to speech is a surprisingly clear proxy of top-down control, entirely driven by the listening goals of attending versus ignoring degraded speech. While (n = 23) listeners retained the serial order of 9 to-be-recalled digits, one to-be-ignored sentence was presented. The acoustic detail of the to-be-ignored sentence was varied parametrically (noise-vocoding), with more acoustic detail of the distracting speech increasingly disrupting listeners' serial memory recall. Where previous studies had observed decreases in parietal and auditory alpha power with more acoustic detail (of target speech), alpha power here showed the opposite pattern and increased with more acoustic detail in the speech distractor. In sum, the neural alpha response reflects almost exclusively a listener's goal, which is decisive for whether more acoustic detail facilitates comprehension (of attended speech) or enhances distraction (of ignored speech).

  6. Acoustic Emphasis in Four Year Olds

    ERIC Educational Resources Information Center

    Wonnacott, Elizabeth; Watson, Duane G.

    2008-01-01

    Acoustic emphasis may convey a range of subtle discourse distinctions, yet little is known about how this complex ability develops in children. This paper presents a first investigation of the factors which influence the production of acoustic prominence in young children's spontaneous speech. In a production experiment, SVO sentences were…

  7. Child directed speech, speech in noise and hyperarticulated speech in the Pacific Northwest

    NASA Astrophysics Data System (ADS)

    Wright, Richard; Carmichael, Lesley; Beckford Wassink, Alicia; Galvin, Lisa

    2004-05-01

    Three types of exaggerated speech are thought to be systematic responses to accommodate the needs of the listener: child-directed speech (CDS), hyperspeech, and the Lombard response. CDS (e.g., Kuhl et al., 1997) occurs in interactions with young children and infants. Hyperspeech (Johnson et al., 1993) is a modification in response to listeners' difficulties in recovering the intended message. The Lombard response (e.g., Lane et al., 1970) is a compensation for increased noise in the signal. While all three result from adaptations to accommodate the needs of the listener, and therefore should share some features, the triggering conditions are quite different, and therefore should exhibit differences in their phonetic outcomes. While CDS has been the subject of a variety of acoustic studies, it has never been studied in the broader context of the other ``exaggerated'' speech styles. A large crosslinguistic study was undertaken that compares speech produced under four conditions: spontaneous conversations, CDS aimed at 6-9-month-old infants, hyperarticulated speech, and speech in noise. This talk will present some findings for North American English as spoken in the Pacific Northwest. The measures include f0, vowel duration, F1 and F2 at vowel midpoint, and intensity.

  8. Articulatory Mediation of Speech Perception: A Causal Analysis of Multi-Modal Imaging Data

    ERIC Educational Resources Information Center

    Gow, David W., Jr.; Segawa, Jennifer A.

    2009-01-01

    The inherent confound between the organization of articulation and the acoustic-phonetic structure of the speech signal makes it exceptionally difficult to evaluate the competing claims of motor and acoustic-phonetic accounts of how listeners recognize coarticulated speech. Here we use Granger causation analysis of high spatiotemporal resolution…

  9. Modeling words with subword units in an articulatorily constrained speech recognition algorithm

    SciTech Connect

    Hogden, J.

    1997-11-20

    The goal of speech recognition is to find the most probable word given the acoustic evidence, i.e. a string of VQ codes or acoustic features. Speech recognition algorithms typically take advantage of the fact that the probability of a word, given a sequence of VQ codes, can be calculated.

  10. Compensation for Coarticulation: Disentangling Auditory and Gestural Theories of Perception of Coarticulatory Effects in Speech

    ERIC Educational Resources Information Center

    Viswanathan, Navin; Magnuson, James S.; Fowler, Carol A.

    2010-01-01

    According to one approach to speech perception, listeners perceive speech by applying general pattern matching mechanisms to the acoustic signal (e.g., Diehl, Lotto, & Holt, 2004). An alternative is that listeners perceive the phonetic gestures that structured the acoustic signal (e.g., Fowler, 1986). The two accounts have offered different…

  11. Speech Problems

    MedlinePlus

    ... thinking, but it becomes disorganized as they're speaking. So, someone who clutters may speak in bursts ... refuse to wait patiently for them to finish speaking. If you have a speech problem, it's fine ...

  12. Forensic acoustics: An overview of the process

    NASA Astrophysics Data System (ADS)

    Weissenburger, J. T.

    2003-10-01

    There is a potential role for the acoustical expert in litigation. The technical issues may involve aeroacoustics, underwater acoustics, physical effects of sound, environmental acoustics, noise, architectural acoustics, physiological acoustics, speech and hearing, music, psychoacoustics and/or bioacoustics. This brief paper offers an overview of the process of being an expert, the qualifications to be an expert and what is expected of an expert. The six general phases of an expert's involvement (retention, investigation, discovery, deposition, preparation, and trial) are addressed. Some anecdotal experiences are presented.

  13. A Model for Speech Processing in Second Language Listening Activities

    ERIC Educational Resources Information Center

    Zoghbor, Wafa Shahada

    2016-01-01

    Teachers' understanding of the process of speech perception could inform practice in listening classrooms. Catford (1950) developed a model for speech perception taking into account the influence of the acoustic features of the linguistic forms used by the speaker, whereby the listener "identifies" and "interprets" these…

  14. Integrated Spacesuit Audio System Enhances Speech Quality and Reduces Noise

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng Arden; Chen, Jingdong; Chen, Shaoyan Sharyl

    2009-01-01

    A new approach has been proposed for increasing astronaut comfort and speech capture. Currently, the special design of a spacesuit forms an extreme acoustic environment, making it difficult to capture clear speech without compromising comfort. The proposed Integrated Spacesuit Audio (ISA) system incorporates the microphones into the helmet and uses software to extract voice signals from background noise.

  15. Does Signal Degradation Affect Top-Down Processing of Speech?

    PubMed

    Wagner, Anita; Pals, Carina; de Blecourt, Charlotte M; Sarampalis, Anastasios; Başkent, Deniz

    2016-01-01

    Speech perception is formed based on both the acoustic signal and listeners' knowledge of the world and semantic context. Access to semantic information can facilitate interpretation of degraded speech, such as speech in background noise or the speech signal transmitted via cochlear implants (CIs). This paper focuses on the latter, and investigates the time course of understanding words, and how sentential context reduces listeners' dependency on the acoustic signal for natural and degraded speech via an acoustic CI simulation. In an eye-tracking experiment we combined recordings of listeners' gaze fixations with pupillometry, to capture effects of semantic information on both the time course and effort of speech processing. Normal-hearing listeners were presented with sentences with or without a semantically constraining verb (e.g., crawl) preceding the target (baby), and their ocular responses were recorded to four pictures, including the target, a phonological competitor (bay), a semantic distractor (worm), and an unrelated distractor. The results show that in natural speech, listeners' gazes reflect their uptake of acoustic information, and integration of preceding semantic context. Degradation of the signal leads to a later disambiguation of phonologically similar words, and to a delay in the integration of semantic information. Complementary to this, the pupil dilation data show that early semantic integration reduces the effort in disambiguating phonologically similar words. Processing degraded speech comes with increased effort due to the impoverished nature of the signal. Delayed integration of semantic information further constrains listeners' ability to compensate for inaudible signals.

  16. Infant-Directed Speech Is Modulated by Infant Feedback

    ERIC Educational Resources Information Center

    Smith, Nicholas A.; Trainor, Laurel J.

    2008-01-01

    When mothers engage in infant-directed (ID) speech, their voices change in a number of characteristic ways, including adopting a higher overall pitch. Studies have examined these acoustical cues and have tested infants' preferences for ID speech. However, little is known about how these cues change with maternal sensitivity to infant feedback in…

  17. Cross-Channel Amplitude Sweeps Are Crucial to Speech Intelligibility

    ERIC Educational Resources Information Center

    Prendergast, Garreth; Green, Gary G. R.

    2012-01-01

    Classical views of speech perception argue that the static and dynamic characteristics of spectral energy peaks (formants) are the acoustic features that underpin phoneme recognition. Here we use representations where the amplitude modulations of sub-band filtered speech are described, precisely, in terms of co-sinusoidal pulses. These pulses are…

  18. The Effects of Macroglossia on Speech: A Case Study

    ERIC Educational Resources Information Center

    Mekonnen, Abebayehu Messele

    2012-01-01

    This article presents a case study of speech production in a 14-year-old Amharic-speaking boy. The boy had developed secondary macroglossia, related to a disturbance of growth hormones, following a history of normal speech development. Perceptual analysis combined with acoustic analysis and static palatography is used to investigate the specific…

  19. Near-Term Fetuses Process Temporal Features of Speech

    ERIC Educational Resources Information Center

    Granier-Deferre, Carolyn; Ribeiro, Aurelie; Jacquet, Anne-Yvonne; Bassereau, Sophie

    2011-01-01

    The perception of speech and music requires processing of variations in spectra and amplitude over different time intervals. Near-term fetuses can discriminate acoustic features, such as frequencies and spectra, but whether they can process complex auditory streams, such as speech sequences and more specifically their temporal variations, fast or…

  20. Formant trajectory characteristics in speakers with dysarthria and homogeneous speech intelligibility scores: Further data

    NASA Astrophysics Data System (ADS)

    Kim, Yunjung; Weismer, Gary; Kent, Ray D.

    2005-09-01

    In previous work [J. Acoust. Soc. Am. 117, 2605 (2005)], we reported on formant trajectory characteristics of a relatively large number of speakers with dysarthria and near-normal speech intelligibility. The purpose of that analysis was to begin a documentation of the variability, within relatively homogeneous speech-severity groups, of acoustic measures commonly used to predict across-speaker variation in speech intelligibility. In that study we found that even with near-normal speech intelligibility (90%-100%), many speakers had reduced formant slopes for some words and distributional characteristics of acoustic measures that were different from values obtained from normal speakers. In the current report we extend those findings to a group of speakers with dysarthria with somewhat poorer speech intelligibility than the original group. Results are discussed in terms of the utility of certain acoustic measures as indices of speech intelligibility, and as explanatory data for theories of dysarthria. [Work supported by NIH Award R01 DC00319.]

  1. Strategies for distant speech recognition in reverberant environments

    NASA Astrophysics Data System (ADS)

    Delcroix, Marc; Yoshioka, Takuya; Ogawa, Atsunori; Kubo, Yotaro; Fujimoto, Masakiyo; Ito, Nobutaka; Kinoshita, Keisuke; Espi, Miquel; Araki, Shoko; Hori, Takaaki; Nakatani, Tomohiro

    2015-12-01

    Reverberation and noise are known to severely affect the automatic speech recognition (ASR) performance of speech recorded by distant microphones. Therefore, we must deal with reverberation if we are to realize high-performance hands-free speech recognition. In this paper, we review a recognition system that we developed at our laboratory to deal with reverberant speech. The system consists of a speech enhancement (SE) front-end that employs long-term linear prediction-based dereverberation followed by noise reduction. We combine our SE front-end with an ASR back-end that uses neural networks for acoustic and language modeling. The proposed system achieved top scores on the ASR task of the REVERB challenge. This paper describes the different technologies used in our system and presents detailed experimental results that justify our implementation choices and may provide hints for designing distant ASR systems.
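
    The front-end described above uses long-term linear prediction for dereverberation. The sketch below shows only the bare idea, in the time domain: each sample is predicted from samples at least a fixed delay in the past, and the prediction is subtracted. Practical systems (e.g., WPE-style methods) instead work per STFT frequency bin with iteratively estimated weights; all parameter values here are arbitrary.

      # Naive delayed linear-prediction dereverberation (time-domain sketch only).
      import numpy as np

      def delayed_lp_dereverb(x, order=100, delay=120):
          """Subtract a prediction of the late reverberant tail of x, where each
          sample is predicted from samples at least `delay` samples in the past."""
          n = len(x)
          start = delay + order - 1
          y = x[start:]
          rows = n - start
          X = np.column_stack(
              [x[order - 1 - k : order - 1 - k + rows] for k in range(order)])
          coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
          out = x.copy()
          out[start:] = y - X @ coeffs
          return out

      # Toy usage: an impulse train convolved with an exponentially decaying "room" tail.
      fs = 8000
      clean = np.zeros(fs)
      clean[::800] = 1.0
      rng = np.random.default_rng(0)
      rir = 0.1 * np.exp(-np.arange(2000) / 300.0) * rng.standard_normal(2000)
      rir[0] = 1.0
      reverberant = np.convolve(clean, rir)[:fs]
      enhanced = delayed_lp_dereverb(reverberant)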

  2. [Improving the speech with a prosthetic construction].

    PubMed

    Stalpers, M J; Engelen, M; van der Stappen, J A A M; Weijs, W L J; Takes, R P; van Heumen, C C M

    2016-03-01

    A 12-year-old boy had problems with his speech due to a defect in the soft palate. This defect was caused by the surgical removal of a synovial sarcoma. Testing with a nasometer revealed hypernasality above normal values. Given the size and severity of the defect in the soft palate, the possibility of improving the speech with speech therapy was limited. At a centre for special dentistry an attempt was made with a prosthetic construction to improve the performance of the palate and, in that way, the speech. This construction consisted of a denture with an obturator attached to it. With it, an effective closure of the palate could be achieved. New measurements with acoustic nasometry showed scores within the normal values. The nasality in the speech largely disappeared. The obturator is an effective and relatively easy solution for palatal insufficiency resulting from surgical resection. Intrusive reconstructive surgery can be avoided in this way.

  3. Temporal modulations in speech and music.

    PubMed

    Ding, Nai; Patel, Aniruddh D; Chen, Lin; Butler, Henry; Luo, Cheng; Poeppel, David

    2017-02-14

    Speech and music have structured rhythms. Here we discuss a major acoustic correlate of spoken and musical rhythms, the slow (0.25-32 Hz) temporal modulations in sound intensity and compare the modulation properties of speech and music. We analyze these modulations using over 25 h of speech and over 39 h of recordings of Western music. We show that the speech modulation spectrum is highly consistent across 9 languages (including languages with typologically different rhythmic characteristics). A different, but similarly consistent modulation spectrum is observed for music, including classical music played by single instruments of different types, symphonic, jazz, and rock. The temporal modulations of speech and music show broad but well-separated peaks around 5 and 2 Hz, respectively. These acoustically dominant time scales may be intrinsic features of speech and music, a possibility which should be investigated using more culturally diverse samples in each domain. Distinct modulation timescales for speech and music could facilitate their perceptual analysis and its neural processing.
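
    A rough way to reproduce this kind of analysis for a single recording x at sampling rate fs is to take the broadband amplitude envelope and compute its spectrum over 0.25-32 Hz, as in the sketch below. The published analysis additionally uses an auditory filterbank and many hours of material, so this is only a simplification.

      # Crude temporal modulation spectrum of a recording's amplitude envelope.
      import numpy as np
      from scipy.signal import hilbert, welch

      def modulation_spectrum(x, fs, env_fs=100):
          env = np.abs(hilbert(x))                      # broadband amplitude envelope
          step = int(fs // env_fs)
          env = env[::step]                             # crude envelope downsampling
          f, pxx = welch(env, fs=fs / step, nperseg=min(len(env), 1024))
          keep = (f >= 0.25) & (f <= 32.0)
          return f[keep], pxx[keep]

      # Usage: x, fs would come from a speech or music recording; very short
      # recordings will not resolve the lowest modulation frequencies.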

  4. Children's Perception of Speech Produced in a Two-Talker Background

    ERIC Educational Resources Information Center

    Baker, Mallory; Buss, Emily; Jacks, Adam; Taylor, Crystal; Leibold, Lori J.

    2014-01-01

    Purpose: This study evaluated the degree to which children benefit from the acoustic modifications made by talkers when they produce speech in noise. Method: A repeated measures design compared the speech perception performance of children (5-11 years) and adults in a 2-talker masker. Target speech was produced in a 2-talker background or in…

  5. Audiovisual Speech Perception in Children with Developmental Language Disorder in Degraded Listening Conditions

    ERIC Educational Resources Information Center

    Meronen, Auli; Tiippana, Kaisa; Westerholm, Jari; Ahonen, Timo

    2013-01-01

    Purpose: The effect of the signal-to-noise ratio (SNR) on the perception of audiovisual speech in children with and without developmental language disorder (DLD) was investigated by varying the noise level and the sound intensity of acoustic speech. The main hypotheses were that the McGurk effect (in which incongruent visual speech alters the…

  6. Some Issues in Infant Speech Perception: Do the Means Justify the Ends?

    ERIC Educational Resources Information Center

    Weitzman, Raymond S.

    2007-01-01

    A major focus of research on language acquisition in infancy involves experimental studies of the infant's ability to discriminate various kinds of speech or speech-like stimuli. This research has demonstrated that infants are sensitive to many fine-grained differences in the acoustic properties of speech utterance. Furthermore, these empirical…

  7. The Hypothesis of Apraxia of Speech in Children with Autism Spectrum Disorder

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Paul, Rhea; Black, Lois M.; van Santen, Jan P.

    2011-01-01

    In a sample of 46 children aged 4-7 years with Autism Spectrum Disorder (ASD) and intelligible speech, there was no statistical support for the hypothesis of concomitant Childhood Apraxia of Speech (CAS). Perceptual and acoustic measures of participants' speech, prosody, and voice were compared with data from 40 typically-developing children, 13…

  8. Spacecraft Internal Acoustic Environment Modeling

    NASA Technical Reports Server (NTRS)

    Chu, Shao-sheng R.; Allen, Christopher S.

    2009-01-01

    Acoustic modeling can be used to identify key noise sources, determine/analyze sub-allocated requirements, keep track of the accumulation of minor noise sources, and predict vehicle noise levels at various stages in vehicle development, first with estimates of noise sources, later with experimental data. In FY09, the physical mockup developed in FY08, with an interior geometric shape similar to the Orion CM (Crew Module) IML (Internal Mold Line), was used to validate SEA (Statistical Energy Analysis) acoustic model development with realistic ventilation fan sources. The sound power levels of these sources were unknown a priori, as opposed to previous studies in which an RSS (Reference Sound Source) with known sound power level was used. The modeling results were evaluated based on comparisons to measurements of sound pressure levels over a wide frequency range, including the frequency range where SEA gives good results. Sound intensity measurements were performed over a rectangular-shaped grid system enclosing the ventilation fan source. Sound intensities were measured at the top, front, back, right, and left surfaces of the grid system. Sound intensity at the bottom surface was not measured, but sound-blocking material was placed under the bottom surface to reflect most of the incident sound energy back to the remaining measured surfaces. Integrating the measured sound intensities over the measured surfaces gives the estimated sound power of the source. The reverberation time T60 of the mockup interior had been modified to match the reverberation levels of the ISS US Lab interior for the speech frequency bands, i.e., 0.5, 1, 2, and 4 kHz, by attaching appropriately sized Thinsulate sound absorption material to the interior wall of the mockup. Sound absorption of Thinsulate was modeled in three ways: the Sabine equation with the measured mockup interior reverberation time T60, a layup model based on past impedance tube testing, and a layup model plus an air absorption correction. The evaluation/validation was
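
    Two of the calculations mentioned above, integrating measured intensity over the enclosing grid surfaces to estimate source sound power and the Sabine reverberation-time estimate, are compact enough to sketch. All numerical values below are placeholders, not the mockup's data.

      # Sound power from surface intensity measurements, and a Sabine T60 estimate.
      import numpy as np

      def sound_power(intensities_w_m2, areas_m2):
          """W = sum_i I_i * S_i over the measurement surfaces (the bottom surface,
          assumed reflective, is omitted as in the measurement described above)."""
          return float(np.dot(intensities_w_m2, areas_m2))

      def sabine_t60(volume_m3, surface_areas_m2, absorption_coeffs):
          """T60 = 0.161 * V / A, with A the total absorption area in metric sabins."""
          a_total = float(np.dot(surface_areas_m2, absorption_coeffs))
          return 0.161 * volume_m3 / a_total

      # Hypothetical example values only:
      w = sound_power([2e-6, 1.5e-6, 1.8e-6, 1.2e-6, 1.6e-6], [0.5, 0.4, 0.4, 0.3, 0.3])
      t60 = sabine_t60(17.0, [10, 8, 8, 6, 6, 6], [0.3, 0.1, 0.1, 0.2, 0.2, 0.2])
      print(w, t60)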

  9. Dust ion acoustic travelling waves in the framework of a modified Kadomtsev-Petviashvili equation in a magnetized dusty plasma with superthermal electrons

    NASA Astrophysics Data System (ADS)

    Saha, Asit; Chatterjee, Prasanta

    2014-02-01

    For the critical values of the parameters q and V, the work (Samanta et al. in Phys. Plasma 20:022111, 2013b) is unable to describe the nonlinear wave features in a magnetized dusty plasma with superthermal electrons. To describe the nonlinear wave features for critical values of the parameters q and V, we extend the work (Samanta et al. in Phys. Plasma 20:022111, 2013b). To do so, we derive the modified Kadomtsev-Petviashvili (MKP) equation for dust ion acoustic waves in a magnetized dusty plasma with q-nonextensive velocity-distributed electrons by considering higher-order coefficients of ɛ. By applying the bifurcation theory of planar dynamical systems to this MKP equation, the existence of solitary wave solutions of both rarefactive and compressive types, periodic travelling wave solutions, and kink and anti-kink wave solutions is proved. Three exact solutions of these waves are determined. The present study could be helpful for understanding the nonlinear travelling waves propagating in mercury, the solar wind, Saturn, and the magnetosphere of the Earth.
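
    The abstract does not reproduce the equation itself. For orientation only, a modified Kadomtsev-Petviashvili equation in stretched coordinates generically takes a form like the one below, where the coefficients A, B and C in the cited derivation depend on the plasma parameters (q, magnetization, dust concentration); the exact expressions are given in the paper.

      \partial_{\xi}\!\left( \partial_{\tau}\phi + A\,\phi^{2}\,\partial_{\xi}\phi + B\,\partial_{\xi}^{3}\phi \right) + C\,\partial_{\eta}^{2}\phi = 0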

  10. Respirator Speech Intelligibility Testing with an Experienced Speaker

    DTIC Science & Technology

    2015-05-01

    Final report (Oct 2008 - Jun 2009). The Modified Rhyme Test (MRT) is used by the National Institute for Occupational Safety and Health (NIOSH) to assess speech

  11. Lip Movement Exaggerations during Infant-Directed Speech

    ERIC Educational Resources Information Center

    Green, Jordan R.; Nip, Ignatius S. B.; Wilson, Erin M.; Mefferd, Antje S.; Yunusova, Yana

    2010-01-01

    Purpose: Although a growing body of literature has identified the positive effects of visual speech on speech and language learning, oral movements of infant-directed speech (IDS) have rarely been studied. This investigation used 3-dimensional motion capture technology to describe how mothers modify their lip movements when talking to their…

  12. Start/End Delays of Voiced and Unvoiced Speech Signals

    SciTech Connect

    Herrnstein, A

    1999-09-24

    Recent experiments using low-power EM-radar-like sensors (e.g., GEMs) have demonstrated a new method for measuring vocal fold activity and the onset times of voiced speech, as vocal fold contact begins to take place. Similarly, the end time of a voiced speech segment can be measured. Secondly, it appears that in most normal uses of American English speech, unvoiced-speech segments directly precede or directly follow voiced-speech segments. For many applications, it is useful to know typical duration times of these unvoiced speech segments. A corpus of spoken TIMIT words, phrases, and sentences, assembled earlier and recorded using simultaneously measured acoustic and EM-sensor glottal signals from 16 male speakers, was used for this study. By inspecting the onset (or end) of unvoiced speech using the acoustic signal, and the onset (or end) of voiced speech using the EM-sensor signal, the average duration of unvoiced segments preceding the onset of vocalization was found to be 300 ms, and of following segments, 500 ms. An unvoiced speech period is then defined in time, first by using the onset of the EM-sensed glottal signal as the onset-time marker for the voiced speech segment and the end marker for the unvoiced segment. Then, by subtracting 300 ms from the onset time mark of voicing, the unvoiced speech segment start time is found. Similarly, the times for a following unvoiced speech segment can be found. While data of this nature have proven to be useful for work in our laboratory, a great deal of additional work remains to validate such data for use with general populations of users. These procedures have been useful for applying optimal processing algorithms over time segments of unvoiced, voiced, and non-speech acoustic signals. For example, these data appear to be of use in speaker validation, in vocoding, and in denoising algorithms.
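
    The segment bookkeeping described above is easy to make concrete: given voiced onset and end times taken from the EM-sensor glottal signal, unvoiced stretches are assumed to occupy roughly 300 ms before each voiced onset and 500 ms after each voiced end (the average durations reported in this record). The function and example times below are illustrative only.

      # Estimate unvoiced-segment intervals from EM-sensed voiced-segment times.
      def unvoiced_intervals(voiced_segments, pre=0.300, post=0.500):
          """voiced_segments: list of (onset_s, end_s) tuples, in seconds."""
          intervals = []
          for onset, end in voiced_segments:
              intervals.append((max(0.0, onset - pre), onset))   # unvoiced lead-in
              intervals.append((end, end + post))                # unvoiced tail
          return intervals

      # Example: two voiced stretches measured by the EM sensor.
      print(unvoiced_intervals([(0.45, 0.90), (1.60, 2.10)]))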

  13. Free Speech Yearbook: 1972.

    ERIC Educational Resources Information Center

    Tedford, Thomas L., Ed.

    This book is a collection of essays on free speech issues and attitudes, compiled by the Commission on Freedom of Speech of the Speech Communication Association. Four articles focus on freedom of speech in classroom situations as follows: a philosophic view of teaching free speech, effects of a course on free speech on student attitudes,…

  14. Modelling speech intelligibility in adverse conditions.

    PubMed

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    Jørgensen and Dau (J Acoust Soc Am 130:1475-1487, 2011) proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII) in conditions with nonlinearly processed speech. Instead of considering the reduction of the temporal modulation energy as the intelligibility metric, as assumed in the STI, the sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv). This metric was shown to be the key for predicting the intelligibility of reverberant speech as well as noisy speech processed by spectral subtraction. The key role of the SNRenv metric is further supported here by the ability of a short-term version of the sEPSM to predict speech masking release for different speech materials and modulated interferers. However, the sEPSM cannot account for speech subjected to phase jitter, a condition in which the spectral structure of the speech signal is strongly affected, while the broadband temporal envelope is kept largely intact. In contrast, the effects of this distortion can be predicted successfully by the spectro-temporal modulation index (STMI) (Elhilali et al., Speech Commun 41:331-348, 2003), which assumes an explicit analysis of the spectral "ripple" structure of the speech signal. However, since the STMI applies the same decision metric as the STI, it fails to account for spectral subtraction. The results from this study suggest that the SNRenv might reflect a powerful decision metric, while some explicit across-frequency analysis seems crucial in some conditions. How such across-frequency analysis is "realized" in the auditory system remains unresolved.
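
    A hedged sketch of the SNRenv decision metric named above (not the full sEPSM, which applies it per gammatone channel and per modulation filter): the envelope power of the noise alone is subtracted from that of the noisy speech to estimate the speech envelope power, and the ratio of the two is taken.

      # Core SNRenv ratio computed from broadband Hilbert envelopes (simplified).
      import numpy as np
      from scipy.signal import hilbert

      def envelope_power(x):
          env = np.abs(hilbert(x))
          ac = env - np.mean(env)                        # AC part of the envelope
          return np.mean(ac ** 2) / (np.mean(env) ** 2)  # normalized envelope power

      def snr_env(noisy_speech, noise, floor=1e-3):
          p_mix = envelope_power(noisy_speech)
          p_noise = envelope_power(noise)
          p_speech = max(p_mix - p_noise, floor * p_noise)  # avoid non-positive estimates
          return p_speech / p_noise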

  15. Learning Vowel Categories from Maternal Speech in Gurindji Kriol

    ERIC Educational Resources Information Center

    Jones, Caroline; Meakins, Felicity; Muawiyath, Shujau

    2012-01-01

    Distributional learning is a proposal for how infants might learn early speech sound categories from acoustic input before they know many words. When categories in the input differ greatly in relative frequency and overlap in acoustic space, research in bilingual development suggests that this affects the course of development. In the present…

  16. Intensity Accents in French 2 Year Olds' Speech.

    ERIC Educational Resources Information Center

    Allen, George D.

    The acoustic features and functions of accentuation in French are discussed, and features of accentuation in the speech of French 2-year-olds are explored. The four major acoustic features used to signal accentual distinctions are fundamental frequency of voicing, duration of segments and syllables, intensity of segments and syllables, and…

  17. Speech Research

    NASA Astrophysics Data System (ADS)

    Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: A biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; Phonetic factors in letter detection; categorical perception; Short-term recall by deaf signers of American Sign Language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaries; and vowel information in postvocalic frictions.

  18. Speech analyzer

    NASA Technical Reports Server (NTRS)

    Lokerson, D. C. (Inventor)

    1977-01-01

    A speech signal is analyzed by applying the signal to formant filters which derive first, second and third signals respectively representing the frequency of the speech waveform in the first, second and third formants. A first pulse train having approximately a pulse rate representing the average frequency of the first formant is derived; second and third pulse trains having pulse rates respectively representing zero crossings of the second and third formants are derived. The first formant pulse train is derived by establishing N signal level bands, where N is an integer at least equal to two. Adjacent ones of the signal bands have common boundaries, each of which is a predetermined percentage of the peak level of a complete cycle of the speech waveform.

  19. Talker versus dialect effects on speech intelligibility: a symmetrical study

    PubMed Central

    McCloy, Daniel R.; Wright, Richard A.; Souza, Pamela E.

    2014-01-01

    This study investigates the relative effects of talker-specific variation and dialect-based variation on speech intelligibility. Listeners from two dialects of American English performed speech-in-noise tasks with sentences spoken by talkers of each dialect. An initial statistical model showed no significant effects for either talker or listener dialect group, and no interaction. However, a mixed-effects regression model including several acoustic measures of the talker’s speech revealed a subtle effect of talker dialect once the various acoustic dimensions were accounted for. Results are discussed in relation to other recent studies of cross-dialect intelligibility. PMID:26529902

  20. Communication in a noisy environment: Perception of one's own voice and speech enhancement

    NASA Astrophysics Data System (ADS)

    Le Cocq, Cecile

    Workers in noisy industrial environments are often confronted with communication problems. Many workers complain about not being able to communicate easily with their coworkers when they wear hearing protectors. In consequence, they tend to remove their protectors, which exposes them to the risk of hearing loss. In fact this communication problem is a double one: first, the hearing protectors modify one's own voice perception; second, they interfere with understanding speech from others. This double problem is examined in this thesis. When wearing hearing protectors, the modification of one's own voice perception is partly due to the occlusion effect which is produced when an earplug is inserted in the ear canal. This occlusion effect has two main consequences: first, low-frequency physiological noises are better perceived; second, the perception of one's own voice is modified. In order to have a better understanding of this phenomenon, the literature results are analyzed systematically, and a new method to quantify the occlusion effect is developed. Instead of stimulating the skull with a bone vibrator or asking the subject to speak, as is usually done in the literature, it has been decided to excite the buccal cavity with an acoustic wave. The experiment has been designed in such a way that the acoustic wave which excites the buccal cavity does not excite the external ear or the rest of the body directly. The measurement of the hearing threshold in the open and occluded ear has been used to quantify the subjective occlusion effect for an acoustic wave in the buccal cavity. These experimental results, as well as those reported in the literature, have led to a better understanding of the occlusion effect and an evaluation of the role of each internal path from the acoustic source to the internal ear. The speech intelligibility from others is altered by both the high sound levels of noisy industrial environments and the speech signal attenuation due to hearing

  1. Speech processing based on short-time Fourier analysis

    SciTech Connect

    Portnoff, M.R.

    1981-06-02

    Short-time Fourier analysis (STFA) is a mathematical technique that represents nonstationary signals, such as speech, music, and seismic signals, in terms of time-varying spectra. This representation provides a formalism for such intuitive notions as time-varying frequency components and pitch contours. Consequently, STFA is useful for speech analysis and speech processing. This paper shows that STFA provides a convenient technique for estimating and modifying certain perceptual parameters of speech. As an example of an application of STFA to speech, the problem of time-compression or expansion of speech, while preserving pitch and time-varying frequency content, is presented.
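
    A minimal phase-vocoder sketch of this kind of STFA-based time-scale modification is shown below: short-time spectra are resampled along the time axis while phases are accumulated from frame-to-frame phase differences, so duration changes while pitch and the time-varying spectral envelope are preserved. It is a generic textbook construction, not Portnoff's algorithm, and the window and hop values are arbitrary.

      # Bare-bones phase vocoder; rate > 1 shortens, rate < 1 lengthens the signal.
      import numpy as np

      def phase_vocoder_stretch(x, rate, n_fft=1024, hop=256):
          win = np.hanning(n_fft)
          n_frames = 1 + (len(x) - n_fft) // hop
          frames = np.stack([np.fft.rfft(win * x[i * hop:i * hop + n_fft])
                             for i in range(n_frames)])
          steps = np.arange(0, n_frames - 1, rate)     # analysis frames to read
          expected = 2 * np.pi * hop * np.arange(n_fft // 2 + 1) / n_fft
          phase = np.angle(frames[0])
          out = np.zeros(int(len(steps) * hop + n_fft))
          for k, s in enumerate(steps):
              i = int(s)
              mag = np.abs(frames[i])
              # Unwrapped phase advance between consecutive analysis frames.
              dphi = np.angle(frames[i + 1]) - np.angle(frames[i]) - expected
              dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
              phase = phase + expected + dphi
              frame = np.fft.irfft(mag * np.exp(1j * phase))
              out[k * hop:k * hop + n_fft] += win * frame
          return out                                   # output is not amplitude-normalized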

  2. Enhancement of Electrolaryngeal Speech by Adaptive Filtering.

    ERIC Educational Resources Information Center

    Espy-Wilson, Carol Y.; Chari, Venkatesh R.; MacAuslan, Joel M.; Huang, Caroline B.; Walsh, Michael J.

    1998-01-01

    A study tested the quality and intelligibility, as judged by several listeners, of four users' electrolaryngeal speech, with and without filtering to compensate for perceptually objectionable acoustic characteristics. Results indicated that an adaptive filtering technique produced a noticeable improvement in the quality of the Transcutaneous…

  3. GALLAUDET'S NEW HEARING AND SPEECH CENTER.

    ERIC Educational Resources Information Center

    FRISINA, D. ROBERT

    THIS REPORT DESCRIBES THE DESIGN OF A NEW SPEECH AND HEARING CENTER AND ITS INTEGRATION INTO THE OVERALL ARCHITECTURAL SCHEME OF THE CAMPUS. THE CIRCULAR SHAPE WAS SELECTED TO COMPLEMENT THE SURROUNDING STRUCTURES AND COMPENSATE FOR DIFFERENCES IN SITE, WHILE PROVIDING THE ACOUSTICAL ADVANTAGES OF NON-PARALLEL WALLS, AND FACILITATING TRAFFIC FLOW.…

  4. Effects of Syntactic Expectations on Speech Segmentation

    ERIC Educational Resources Information Center

    Mattys, Sven L.; Melhorn, James F.; White, Laurence

    2007-01-01

    Although the effect of acoustic cues on speech segmentation has been extensively investigated, the role of higher order information (e.g., syntax) has received less attention. Here, the authors examined whether syntactic expectations based on subject-verb agreement have an effect on segmentation and whether they do so despite conflicting acoustic…

  5. Effects of Cognitive Load on Speech Recognition

    ERIC Educational Resources Information Center

    Mattys, Sven L.; Wiget, Lukas

    2011-01-01

    The effect of cognitive load (CL) on speech recognition has received little attention despite the prevalence of CL in everyday life, e.g., dual-tasking. To assess the effect of CL on the interaction between lexically-mediated and acoustically-mediated processes, we measured the magnitude of the "Ganong effect" (i.e., lexical bias on phoneme…

  6. Differentiating speech and nonspeech sounds via amplitude envelope cues

    NASA Astrophysics Data System (ADS)

    Lehnhoff, Robert J.; Strange, Winifred; Long, Glenis

    2001-05-01

    Recent evidence from neuroscience and behavioral speech science suggests that the temporal modulation pattern of the speech signal plays a distinctive role in speech perception. As a first step in exploring the nature of the perceptually relevant information in the temporal pattern of speech, this experiment examined whether speech versus nonspeech environmental sounds could be differentiated on the basis of their amplitude envelopes. Conversational speech was recorded from native speakers of six different languages (French, German, Hebrew, Hindi, Japanese, and Russian) along with samples of their English. Nonspeech sounds included animal vocalizations, water sounds, and other environmental sounds (e.g., thunder). The stimulus set included 30 2-s speech segments and 30 2-s nonspeech events. Frequency information was removed from all stimuli using a technique described by Dorman et al. [J. Acoust. Soc. Am. 102 (1997)]. Nine normal-hearing adult listeners participated in the experiment. Subjects decided whether each sound was (originally) speech or nonspeech and rated their confidence (7-point Likert scale). Overall, subjects differentiated speech from nonspeech very accurately (84% correct). Only 12 stimuli were not correctly categorized at greater than chance levels. Acoustical analysis is underway to determine what parameters of the amplitude envelope differentiate speech from nonspeech sounds.
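
    The stimulus manipulation described ("frequency information removed") can be approximated by imposing the broadband amplitude envelope of a recording on white noise, as sketched below; the actual procedure of Dorman et al. may differ in its filtering details, and the smoothing cutoff here is an assumption.

      # Envelope-modulated noise: keeps the temporal pattern, discards spectral detail.
      import numpy as np
      from scipy.signal import hilbert, butter, filtfilt

      def envelope_noise(x, fs, env_cutoff=50.0, seed=0):
          env = np.abs(hilbert(x))
          b, a = butter(4, env_cutoff / (fs / 2), btype="low")   # smooth the envelope
          env = filtfilt(b, a, env)
          noise = np.random.default_rng(seed).standard_normal(len(x))
          y = env * noise
          return y / (np.max(np.abs(y)) + 1e-12)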

  7. Differentiating speech and nonspeech sounds via amplitude envelope cues

    NASA Astrophysics Data System (ADS)

    Lehnhoff, Robert J.; Strange, Winifred; Long, Glenis

    2004-05-01

    Recent evidence from neuroscience and behavioral speech science suggests that the temporal modulation pattern of the speech signal plays a distinctive role in speech perception. As a first step in exploring the nature of the perceptually relevant information in the temporal pattern of speech, this experiment examined whether speech versus nonspeech environmental sounds could be differentiated on the basis of their amplitude envelopes. Conversational speech was recorded from native speakers of six different languages (French, German, Hebrew, Hindi, Japanese, and Russian) along with samples of their English. Nonspeech sounds included animal vocalizations, water sounds, and other environmental sounds (e.g., thunder). The stimulus set included 30 2-s speech segments and 30 2-s nonspeech events. Frequency information was removed from all stimuli using a technique described by Dorman et al. [J. Acoust. Soc. Am. 102 (1997)]. Nine normal-hearing adult listeners participated in the experiment. Subjects decided whether each sound was (originally) speech or nonspeech and rated their confidence (7-point Likert scale). Overall, subjects differentiated speech from nonspeech very accurately (84% correct). Only 12 stimuli were not correctly categorized at greater than chance levels. Acoustical analysis is underway to determine what parameters of the amplitude envelope differentiate speech from nonspeech sounds.

  8. Acoustic Neuroma

    MedlinePlus

    ... search IRSA's site Unique Hits since January 2003 Acoustic Neuroma Click Here for Acoustic Neuroma Practice Guideline ... to microsurgery. One doctor's story of having an acoustic neuroma In August 1991, Dr. Thomas F. Morgan ...

  9. Cross spectral measurement of head related speech transfer functions using speaker's own voice

    NASA Astrophysics Data System (ADS)

    Nukina, Masumi; Kawahara, Hideki

    2002-11-01

    A cross spectrum method is applied to measure sound pressure variations around the head using the speaker's own speech sounds. The variations are represented as transfer functions from the mouth reference point to a set of measuring points. Preliminary tests indicated that there are systematic frequency response variations depending on vowel colors. This vowel color dependency was not replicated in the classical measurement of speech radiation characteristics by J. L. Flanagan. However, taking into account the large amount of variation (sometimes exceeding 20 dB), it is not likely to be negligible. A set of calibration and normalization procedures was introduced to reduce artifacts due to background noise, room acoustics, and zeros in the speech spectra. A series of M-sequence-based transfer function measurements was also conducted using a head and torso simulator to evaluate intrinsic errors in the cross spectral measurements. It was found that the standard errors in the cross spectral measurements using recorded speech sounds are around 1 dB. Based on these reference data and confidence interval calculations based on coherence, it is safe to conclude that the vowel color dependency is significantly modifying the transfer functions. [Work supported by JSPS.]
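
    The core of the cross-spectral measurement is the transfer-function estimate H(f) = Sxy(f)/Sxx(f) from the mouth-reference signal to a measurement-point signal, with coherence used to judge reliability, roughly as in the sketch below (Welch-style estimators from scipy; the segment length is chosen arbitrarily).

      # Cross-spectral transfer function and coherence between two recordings.
      import numpy as np
      from scipy.signal import csd, welch, coherence

      def transfer_function(x, y, fs, nperseg=4096):
          f, sxy = csd(x, y, fs=fs, nperseg=nperseg)
          _, sxx = welch(x, fs=fs, nperseg=nperseg)
          _, coh = coherence(x, y, fs=fs, nperseg=nperseg)
          h = sxy / sxx
          return f, 20 * np.log10(np.abs(h) + 1e-12), coh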

  10. Speech vs. singing: infants choose happier sounds

    PubMed Central

    Corbeil, Marieve; Trehub, Sandra E.; Peretz, Isabelle

    2013-01-01

    Infants prefer speech to non-vocal sounds and to non-human vocalizations, and they prefer happy-sounding speech to neutral speech. They also exhibit an interest in singing, but there is little knowledge of their relative interest in speech and singing. The present study explored infants' attention to unfamiliar audio samples of speech and singing. In Experiment 1, infants 4–13 months of age were exposed to happy-sounding infant-directed speech vs. hummed lullabies by the same woman. They listened significantly longer to the speech, which had considerably greater acoustic variability and expressiveness, than to the lullabies. In Experiment 2, infants of comparable age who heard the lyrics of a Turkish children's song spoken vs. sung in a joyful/happy manner did not exhibit differential listening. Infants in Experiment 3 heard the happily sung lyrics of the Turkish children's song vs. a version that was spoken in an adult-directed or affectively neutral manner. They listened significantly longer to the sung version. Overall, happy voice quality rather than vocal mode (speech or singing) was the principal contributor to infant attention, regardless of age. PMID:23805119

  11. Speech neglect: A strange educational blind spot

    NASA Astrophysics Data System (ADS)

    Harris, Katherine Safford

    2005-09-01

    Speaking is universally acknowledged as an important human talent, yet as a topic of educated common knowledge, it is peculiarly neglected. Partly, this is a consequence of the relatively recent growth of research on speech perception, production, and development, but also a function of the way that information is sliced up by undergraduate colleges. Although the basic acoustic mechanism of vowel production was known to Helmholtz, the ability to view speech production as a physiological event is evolving even now with such techniques as fMRI. Intensive research on speech perception emerged only in the early 1930s as Fletcher and the engineers at Bell Telephone Laboratories developed the transmission of speech over telephone lines. The study of speech development was revolutionized by the papers of Eimas and his colleagues on speech perception in infants in the 1970s. Dissemination of knowledge in these fields is the responsibility of no single academic discipline. It forms a center for two departments, Linguistics, and Speech and Hearing, but in the former, there is a heavy emphasis on other aspects of language than speech and, in the latter, a focus on clinical practice. For psychologists, it is a rather minor component of a very diverse assembly of topics. I will focus on these three fields in proposing possible remedies.

  12. Speech Intelligibility

    NASA Astrophysics Data System (ADS)

    Brand, Thomas

    Speech intelligibility (SI) is important for different fields of research, engineering and diagnostics in order to quantify very different phenomena such as the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, the benefit of using hearing aids, or combinations of these.

  13. Keynote Speeches.

    ERIC Educational Resources Information Center

    2000

    This document contains the six of the seven keynote speeches from an international conference on vocational education and training (VET) for lifelong learning in the information era. "IVETA (International Vocational Education and Training Association) 2000 Conference 6-9 August 2000" (K.Y. Yeung) discusses the objectives and activities…

  14. High-Accuracy Large-Vocabulary Speech Recognition Using Mixture Tying and Consistency Modeling

    DTIC Science & Technology

    1994-01-01

    "Hidden Markov Models for Speech Recognition," Proc. ICASSP-87. S. J. Young, "The General Use of Tying in Phoneme-Based HMM Speech Recognizers," Proc. ICASSP, pp. 1-569 - 1-572, March 1992. ...High-Accuracy Large-Vocabulary Speech Recognition Using Mixture Tying and Consistency Modeling, Vassilios Digalakis and Hy Murveit, SRI International... Improved acoustic modeling can significantly decrease the error rate in large-vocabulary speech recognition. Our approach to the problem is

  15. Neural Oscillations Carry Speech Rhythm through to Comprehension

    PubMed Central

    Peelle, Jonathan E.; Davis, Matthew H.

    2012-01-01

    A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners’ processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging – particularly electroencephalography (EEG) and magnetoencephalography (MEG) – point to phase locking by ongoing cortical oscillations to low-frequency information (~4–8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain. PMID:22973251

  16. Intensive Speech and Language Therapy for Older Children with Cerebral Palsy: A Systems Approach

    ERIC Educational Resources Information Center

    Pennington, Lindsay; Miller, Nick; Robson, Sheila; Steen, Nick

    2010-01-01

    Aim: To investigate whether speech therapy using a speech systems approach to controlling breath support, phonation, and speech rate can increase the speech intelligibility of children with dysarthria and cerebral palsy (CP). Method: Sixteen children with dysarthria and CP participated in a modified time series design. Group characteristics were…

  17. Speech perception as an active cognitive process.

    PubMed

    Heald, Shannon L M; Nusbaum, Howard C

    2014-01-01

    One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process, which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided, but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing, whether through masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine the cognitive resources recruited during perception, including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated, either through augmentation or therapy.

  18. Speech research: Studies on the nature of speech, instrumentation for its investigation, and practical applications

    NASA Astrophysics Data System (ADS)

    Liberman, A. M.

    1982-03-01

    This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: Speech perception and memory coding in relation to reading ability; The use of orthographic structure by deaf adults: Recognition of finger-spelled letters; Exploring the information support for speech; The stream of speech; Using the acoustic signal to make inferences about place and duration of tongue-palate contact; Patterns of human interlimb coordination emerge from the properties of nonlinear limit cycle oscillatory processes: Theory and data; Motor control: Which themes do we orchestrate?; Exploring the nature of motor control in Down's syndrome; Periodicity and auditory memory: A pilot study; Reading skill and language skill: On the role of sign order and morphological structure in memory for American Sign Language sentences; Perception of nasal consonants with special reference to Catalan; and Speech production characteristics of the hearing impaired.

  19. The relationship between perceptual disturbances in dysarthric speech and automatic speech recognition performance.

    PubMed

    Tu, Ming; Wisler, Alan; Berisha, Visar; Liss, Julie M

    2016-11-01

    State-of-the-art automatic speech recognition (ASR) engines perform well on healthy speech; however recent studies show that their performance on dysarthric speech is highly variable. This is because of the acoustic variability associated with the different dysarthria subtypes. This paper aims to develop a better understanding of how perceptual disturbances in dysarthric speech relate to ASR performance. Accurate ratings of a representative set of 32 dysarthric speakers along different perceptual dimensions are obtained and the performance of a representative ASR algorithm on the same set of speakers is analyzed. This work explores the relationship between these ratings and ASR performance and reveals that ASR performance can be predicted from perceptual disturbances in dysarthric speech with articulatory precision contributing the most to the prediction followed by prosody.

  20. The intelligibility of tracheoesophageal speech, with an emphasis on the voiced-voiceless distinction.

    PubMed

    Jongmans, P; Hilgers, F J M; Pols, L C W; van As-Brooks, C J

    2006-01-01

    Total laryngectomy has far-reaching effects on vocal tract anatomy and physiology. The preferred method for restoring postlaryngectomy oral communication is prosthetic tracheoesophageal (TE) speech, which, like laryngeal speech, is pulmonary driven. TE speech quality is better than esophageal or electrolarynx speech quality, but it still deviates considerably from laryngeal speech. For a better understanding of neoglottis physiology and for improving rehabilitation results, the study of TE speech intelligibility remains important. The methods used were perceptual evaluation, acoustic analyses, and digital high-speed imaging. First results show large variations between speakers and, in particular, difficulty in producing the voiced-voiceless distinction. This paper discusses the first results of our experiment.

  1. Melodic contour identification and sentence recognition using sung speech

    PubMed Central

    Crew, Joseph D.; Galvin, John J.; Fu, Qian-Jie

    2015-01-01

    For bimodal cochlear implant users, acoustic and electric hearing have been shown to contribute differently to speech and music perception. However, differences in test paradigms and stimuli between speech and music testing can make it difficult to assess the relative contributions of each device. To address these concerns, the Sung Speech Corpus (SSC) was created. The SSC contains 50 monosyllable words sung over an octave range and can be used to test both speech and music perception using the same stimuli. Here, SSC data are presented for normal-hearing listeners, and any advantage of musicianship is examined. PMID:26428838

  2. Application of a short-time version of the Equalization-Cancellation model to speech intelligibility experiments with speech maskers.

    PubMed

    Wan, Rui; Durlach, Nathaniel I; Colburn, H Steven

    2014-08-01

    A short-time-processing version of the Equalization-Cancellation (EC) model of binaural processing is described and applied to speech intelligibility tasks in the presence of multiple maskers, including multiple speech maskers. This short-time EC model, called the STEC model, extends the model described by Wan et al. [J. Acoust. Soc. Am. 128, 3678-3690 (2010)] to allow the EC model's equalization parameters τ and α to be adjusted as a function of time, resulting in improved masker cancellation when the dominant masker location varies in time. Using the Speech Intelligibility Index, the STEC model is applied to speech intelligibility with maskers that vary in number, type, and spatial arrangements. Most notably, when maskers are located on opposite sides of the target, this STEC model predicts improved thresholds when the maskers are modulated independently with speech-envelope modulators; this includes the most relevant case of independent speech maskers. The STEC model describes the spatial dependence of the speech reception threshold with speech maskers better than the steady-state model. Predictions are also improved for independently speech-modulated noise maskers but are poorer for reversed-speech maskers. In general, short-term processing is useful, but much remains to be done in the complex task of understanding speech in speech maskers.
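
    The core equalization-cancellation operation is compact enough to sketch: one ear's signal is delayed by τ and scaled by α, subtracted from the other ear, and the parameters are chosen to minimize the residual masker energy; the short-time extension simply re-estimates τ and α frame by frame. The grid-search sketch below is a simplification for illustration, not the published STEC implementation.

```python
# Minimal equalization-cancellation sketch: apply gain alpha and delay tau to one
# ear, subtract, and pick the (alpha, tau) that minimizes residual masker power.
# A simplification of the EC/STEC models, not the published implementation.
import numpy as np

def ec_cancel(left, right, alpha, tau_samples):
    """Equalize the right ear by gain alpha and delay tau, then cancel."""
    shifted = np.roll(right, tau_samples) * alpha
    return left - shifted

def best_ec_params(left, right, taus, alphas):
    """Grid-search the EC parameters that minimize residual energy."""
    best = (None, None, np.inf)
    for tau in taus:
        for alpha in alphas:
            residual = ec_cancel(left, right, alpha, tau)
            energy = np.mean(residual ** 2)
            if energy < best[2]:
                best = (alpha, tau, energy)
    return best

# Toy example: a masker lateralized by a 5-sample delay and a 0.8 gain difference.
fs = 16000
noise = np.random.default_rng(1).standard_normal(fs)
left = noise
right = 0.8 * np.roll(noise, 5)

alpha, tau, energy = best_ec_params(left, right, taus=range(-10, 11),
                                    alphas=np.linspace(0.5, 2.0, 31))
print(f"selected alpha={alpha:.2f}, tau={tau} samples, residual energy={energy:.2e}")
```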

  3. Combined Electric and Contralateral Acoustic Hearing: Word and Sentence Recognition with Bimodal Hearing

    ERIC Educational Resources Information Center

    Gifford, Rene H.; Dorman, Michael F.; McKarns, Sharon A.; Spahr, Anthony J.

    2007-01-01

    Purpose: The authors assessed whether (a) a full-insertion cochlear implant would provide a higher level of speech understanding than bilateral low-frequency acoustic hearing, (b) contralateral acoustic hearing would add to the speech understanding provided by the implant, and (c) the level of performance achieved with electric stimulation plus…

  4. High-frequency neural activity predicts word parsing in ambiguous speech streams.

    PubMed

    Kösem, Anne; Basirat, Anahita; Azizi, Leila; van Wassenhove, Virginie

    2016-12-01

    During speech listening, the brain parses a continuous acoustic stream of information into computational units (e.g., syllables or words) necessary for speech comprehension. Recent neuroscientific hypotheses have proposed that neural oscillations contribute to speech parsing, but whether they do so on the basis of acoustic cues (bottom-up acoustic parsing) or as a function of available linguistic representations (top-down linguistic parsing) is unknown. In this magnetoencephalography study, we contrasted acoustic and linguistic parsing using bistable speech sequences. While listening to the speech sequences, participants were asked to maintain one of the two possible speech percepts through volitional control. We predicted that the tracking of speech dynamics by neural oscillations would not only follow the acoustic properties but also shift in time according to the participant's conscious speech percept. Our results show that the latency of high-frequency activity (specifically, beta and gamma bands) varied as a function of the perceptual report. In contrast, the phase of low-frequency oscillations was not strongly affected by top-down control. Whereas changes in low-frequency neural oscillations were compatible with the encoding of prelexical segmentation cues, high-frequency activity specifically informed on an individual's conscious speech percept.

  5. The evolution of speech: vision, rhythm, cooperation.

    PubMed

    Ghazanfar, Asif A; Takahashi, Daniel Y

    2014-10-01

    A full account of human speech evolution must consider its multisensory, rhythmic, and cooperative characteristics. Humans, apes, and monkeys recognize the correspondence between vocalizations and their associated facial postures, and gain behavioral benefits from them. Some monkey vocalizations even have a speech-like acoustic rhythmicity but lack the concomitant rhythmic facial motion that speech exhibits. We review data showing that rhythmic facial expressions such as lip-smacking may have been linked to vocal output to produce an ancestral form of rhythmic audiovisual speech. Finally, we argue that human vocal cooperation (turn-taking) may have arisen through a combination of volubility and prosociality, and provide comparative evidence from one species to support this hypothesis.

  6. The evolution of speech: vision, rhythm, cooperation

    PubMed Central

    Ghazanfar, Asif A.; Takahashi, Daniel Y.

    2014-01-01

    A full account of human speech evolution must consider its multisensory, rhythmic, and cooperative characteristics. Humans, apes and monkeys recognize the correspondence between vocalizations and the associated facial postures and gain behavioral benefits from them. Some monkey vocalizations even have a speech-like acoustic rhythmicity, yet they lack the concomitant rhythmic facial motion that speech exhibits. We review data showing that facial expressions like lip-smacking may be an ancestral expression that was later linked to vocal output in order to produce rhythmic audiovisual speech. Finally, we argue that human vocal cooperation (turn-taking) may have arisen through a combination of volubility and prosociality, and provide comparative evidence from one species to support this hypothesis. PMID:25048821

  7. Speech communications in noise

    NASA Technical Reports Server (NTRS)

    1984-01-01

    The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.

  8. Speech communications in noise

    NASA Astrophysics Data System (ADS)

    1984-07-01

    The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.

  9. The Auditory-Brainstem Response to Continuous, Non-repetitive Speech Is Modulated by the Speech Envelope and Reflects Speech Processing

    PubMed Central

    Reichenbach, Chagit S.; Braiman, Chananel; Schiff, Nicholas D.; Hudspeth, A. J.; Reichenbach, Tobias

    2016-01-01

    The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the ABR is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function. PMID:27303286
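
    The measure described, the brainstem response at the speech fundamental as modulated by the envelope of the voiced speech, can be approximated in outline: band-pass the neural recording around the talker's f0, take its instantaneous amplitude, and correlate it with the speech envelope after compensating for an assumed response latency. The sketch below is illustrative only; the latency value and filter settings are assumptions, not the authors' pipeline.

```python
# Rough sketch of an envelope-modulated fundamental-frequency measure:
# band-pass the neural recording around the talker's f0, take its amplitude
# envelope, and correlate with the speech envelope at a candidate ABR latency.
# Illustrative only; not the published analysis pipeline.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def envelope(x):
    return np.abs(hilbert(x))

def f0_envelope_correlation(neural, speech, fs, f0=100.0, latency_ms=9.0):
    """Correlate the f0-band amplitude of the neural signal with the speech
    envelope, after compensating for an assumed brainstem response latency."""
    neural_f0_amp = envelope(bandpass(neural, f0 - 20, f0 + 20, fs))
    speech_env = envelope(speech)
    lag = int(round(latency_ms / 1000 * fs))
    # Align the (earlier) speech envelope with the delayed neural response.
    n = min(len(speech_env) - lag, len(neural_f0_amp) - lag)
    return np.corrcoef(speech_env[:n], neural_f0_amp[lag:lag + n])[0, 1]
```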

  10. Analysis of speech and tongue motion in normal and post-glossectomy speaker using cine MRI

    PubMed Central

    Ha, Jinhee; Sung, Iel-yong; Son, Jang-ho; Stone, Maureen; Ord, Robert; Cho, Yeong-cheol

    2016-01-01

    Objective: Since the tongue is the oral structure responsible for mastication, pronunciation, and swallowing functions, patients who undergo glossectomy can be affected in various aspects of these functions. The vowel /i/ uses the tongue shape, whereas /u/ uses tongue and lip shapes. The purpose of this study is to investigate the morphological changes of the tongue and the adaptation of pronunciation using cine MRI for speech of patients who undergo glossectomy. Material and Methods: Twenty-three controls (11 males and 12 females) and 13 patients (eight males and five females) volunteered to participate in the experiment. The patients underwent glossectomy surgery for T1 or T2 lateral lingual tumors. The speech tasks “a souk” and “a geese” were spoken by all subjects, providing data for the vowels /u/ and /i/. Cine MRI and speech acoustics were recorded and measured to compare the changes in the tongue with vowel acoustics after surgery. 2D measurements were made of the interlip distance, tongue-palate distance, tongue position (anterior-posterior and superior-inferior), tongue height on the left and right sides, and pharynx size. Vowel formants F1, F2, and F3 were measured. Results: The patients had significantly lower F2/F1 ratios (F=5.911, p=0.018), and lower F3/F1 ratios that approached significance. This was seen primarily in the /u/ data. Patients had flatter tongue shapes than controls, with a greater effect seen in /u/ than /i/. Conclusion: The patients showed complex adaptation motion in order to preserve the acoustic integrity of the vowels, and the tongue modified cavity size relationships to maintain the value of the formant frequencies. PMID:27812617
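
    For readers unfamiliar with the formant measures, the F2/F1 ratios reported above come from formant frequencies that can be estimated, for example, by LPC root-finding on a vowel frame. The sketch below assumes librosa is available for the LPC fit; the selection thresholds are illustrative rather than the study's measurement procedure.

```python
# Sketch: LPC-based formant estimation and the F2/F1 ratio used in the study.
# Assumes librosa is available for the LPC fit; thresholds are illustrative.
import numpy as np
import librosa

def formants(frame, fs, order=12):
    """Return formant frequencies (Hz) estimated from one windowed vowel frame."""
    a = librosa.lpc(frame * np.hamming(len(frame)), order=order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]            # keep one of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bandwidths = -fs / np.pi * np.log(np.abs(roots))
    keep = (freqs > 90) & (bandwidths < 400)     # crude formant criteria
    return np.sort(freqs[keep])

def f2_f1_ratio(frame, fs):
    f = formants(frame, fs)
    return f[1] / f[0] if len(f) >= 2 else np.nan
```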

  11. Speech Research.

    DTIC Science & Technology

    1979-12-31

    Academic Press, 1973. Kimura, D. The neural basis of language qua gesture. In H. Whitaker & H. A. Whitaker (Eds.), Studies in neurolinguistics (Vol. 3... Lubker, J., & Gay, T. Formant frequencies of some fixed-mandible vowels and a model of speech motor programming. Journal of Phonetics, 1979, 7, 147-162... A. Interarticulator programming in stop production. To appear in Journal of Phonetics, in press. Löfqvist, A., & Yoshioka, H. Laryngeal activity in

  12. Extensions to the Speech Disorders Classification System (SDCS)

    PubMed Central

    Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.

    2010-01-01

    This report describes three extensions to a classification system for pediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). Part I describes a classification extension to the SDCS to differentiate motor speech disorders from speech delay and to differentiate among three subtypes of motor speech disorders. Part II describes the Madison Speech Assessment Protocol (MSAP), an approximately two-hour battery of 25 measures that includes 15 speech tests and tasks. Part III describes the Competence, Precision, and Stability Analytics (CPSA) framework, a current set of approximately 90 perceptual- and acoustic-based indices of speech, prosody, and voice used to quantify and classify subtypes of Speech Sound Disorders (SSD). A companion paper, Shriberg, Fourakis, et al. (2010) provides reliability estimates for the perceptual and acoustic data reduction methods used in the SDCS. The agreement estimates in the companion paper support the reliability of SDCS methods and illustrate the complementary roles of perceptual and acoustic methods in diagnostic analyses of SSD of unknown origin. Examples of research using the extensions to the SDCS described in the present report include diagnostic findings for a sample of youth with motor speech disorders associated with galactosemia (Shriberg, Potter, & Strand, 2010) and a test of the hypothesis of apraxia of speech in a group of children with autism spectrum disorders (Shriberg, Paul, Black, & van Santen, 2010). All SDCS methods and reference databases running in the PEPPER (Programs to Examine Phonetic and Phonologic Evaluation Records; [Shriberg, Allen, McSweeny, & Wilson, 2001]) environment will be disseminated without cost when complete. PMID:20831378

  13. A survey of acoustic conditions in semi-open plan classrooms in the United Kingdom.

    PubMed

    Greenland, Emma E; Shield, Bridget M

    2011-09-01

    This paper reports the results of a large scale, detailed acoustic survey of 42 open plan classrooms of varying design in the UK, each of which contained between 2 and 14 teaching areas or classbases. The objective survey procedure, which was designed specifically for use in open plan classrooms, is described. The acoustic measurements relating to speech intelligibility within a classbase, including ambient noise level, intrusive noise level, speech to noise ratio, speech transmission index, and reverberation time, are presented. The effects on speech intelligibility of critical physical design variables, such as the number of classbases within an open plan unit and the selection of acoustic finishes for control of reverberation, are examined. This analysis enables limitations of open plan classrooms to be discussed and acoustic design guidelines to be developed to ensure good listening conditions. The types of teaching activity for which adequate acoustic conditions can be provided, plus the speech intelligibility requirements of younger children, are also discussed.

  14. Headphone localization of speech stimuli

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.

    1991-01-01

    Recently, three dimensional acoustic display systems have been developed that synthesize virtual sound sources over headphones based on filtering by Head-Related Transfer Functions (HRTFs), the direction-dependent spectral changes caused primarily by the outer ears. Here, 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with non-individualized HRTFs. About half of the subjects 'pulled' their judgements toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgements; 15 to 46 percent of stimuli were heard inside the head, with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by non-individualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.

  15. Speech Intelligibility with Acoustic and Contact Microphones

    DTIC Science & Technology

    2005-04-01

    each category containing 16 word pairs that differ only in the initial consonant. The six consonant categories are voicing, nasality, sustention ... (voiced) are paired with their bilabial stop counterparts: meat (nasal) vs. beat (voiced, bilabial stop). Sustention (Sust): No movement compared ... and sustention categories. The current results clearly demonstrate that while the throat microphone enhances the signal-to-noise ratio, the

  16. Speech Recognition: Acoustic, Phonetic and Lexical Knowledge

    DTIC Science & Technology

    2013-04-04

    [Abstract text unavailable: the scanned record is OCR-garbled. The only recoverable fragments are a figure caption ("Figure 3: Histogram, Standard Deviation") and the heading "RECOGNITION EXPERIMENT".]

  17. Hearing impaired speech in noisy classrooms

    NASA Astrophysics Data System (ADS)

    Shahin, Kimary; McKellin, William H.; Jamieson, Janet; Hodgson, Murray; Pichora-Fuller, M. Kathleen

    2005-04-01

    Noisy classrooms have been shown to induce among students patterns of interaction similar to those used by hearing impaired people [W. H. McKellin et al., GURT (2003)]. In this research, the speech of children in a noisy classroom setting was investigated to determine if noisy classrooms have an effect on students' speech. Audio recordings were made of the speech of students during group work in their regular classrooms (grades 1-7), and of the speech of the same students in a sound booth. Noise level readings in the classrooms were also recorded. Each student's noisy and quiet environment speech samples were acoustically analyzed for prosodic and segmental properties (f0, pitch range, pitch variation, phoneme duration, vowel formants), and compared. The analysis showed that the students' speech in the noisy classrooms had characteristics of the speech of hearing-impaired persons [e.g., R. O'Halpin, Clin. Ling. and Phon. 15, 529-550 (2001)]. Some educational implications of our findings were identified. [Work supported by the Peter Wall Institute for Advanced Studies, University of British Columbia.]

  18. Hidden Markov models in automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Wrzoskowicz, Adam

    1993-11-01

    This article describes a method for constructing an automatic speech recognition system based on hidden Markov models (HMMs). The author discusses the basic concepts of HMM theory and the application of these models to the analysis and recognition of speech signals. The author provides algorithms which make it possible to train the ASR system and recognize signals on the basis of distinct stochastic models of selected speech sound classes. The author describes the specific components of the system and the procedures used to model and recognize speech. The author discusses problems associated with the choice of optimal signal detection and parameterization characteristics and their effect on the performance of the system. The author presents different options for the choice of speech signal segments and their consequences for the ASR process. The author gives special attention to the use of lexical, syntactic, and semantic information for the purpose of improving the quality and efficiency of the system. The author also describes an ASR system developed by the Speech Acoustics Laboratory of the IBPT PAS. The author discusses the results of experiments on the effect of noise on the performance of the ASR system and describes methods of constructing HMMs designed to operate in a noisy environment. The author also describes a language for human-robot communication, defined as a complex multilevel network built from HMM models of speech sounds and geared toward Polish inflections. Mandatory lexical and syntactic rules were also added to the system for its communications vocabulary.
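
    The decoding machinery the article describes rests on dynamic-programming search through the HMM state space; a toy Viterbi decoder for a discrete-observation HMM, with made-up probabilities, illustrates the core step.

```python
# Minimal Viterbi decoder for a discrete-observation HMM, the core recognition
# step described in the article. Probabilities here are toy values.
import numpy as np

def viterbi(obs, log_pi, log_A, log_B):
    """obs: observation indices; log_pi: (S,) initial; log_A: (S,S) transitions;
    log_B: (S,V) emissions. Returns the most probable state sequence."""
    S = log_pi.shape[0]
    T = len(obs)
    delta = np.full((T, S), -np.inf)
    psi = np.zeros((T, S), dtype=int)
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A          # (S, S): from -> to
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(S)] + log_B[:, obs[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1][path[t + 1]]
    return path

# Toy 2-state, 3-symbol model.
log_pi = np.log([0.6, 0.4])
log_A = np.log([[0.7, 0.3],
                [0.4, 0.6]])
log_B = np.log([[0.5, 0.4, 0.1],
                [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2, 2], log_pi, log_A, log_B))
```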

  19. Working memory and intelligibility of hearing-aid processed speech.

    PubMed

    Souza, Pamela E; Arehart, Kathryn H; Shen, Jing; Anderson, Melinda; Kates, James M

    2015-01-01

    Previous work suggested that individuals with low working memory capacity may be at a disadvantage in adverse listening environments, including situations with background noise or substantial modification of the acoustic signal. This study explored the relationship between patient factors (including working memory capacity) and intelligibility and quality of modified speech for older individuals with sensorineural hearing loss. The modification was created using a combination of hearing aid processing [wide-dynamic range compression (WDRC) and frequency compression (FC)] applied to sentences in multitalker babble. The extent of signal modification was quantified via an envelope fidelity index. We also explored the contribution of components of working memory by including measures of processing speed and executive function. We hypothesized that listeners with low working memory capacity would perform more poorly than those with high working memory capacity across all situations, and would also be differentially affected by high amounts of signal modification. Results showed a significant effect of working memory capacity for speech intelligibility, and an interaction between working memory, amount of hearing loss and signal modification. Signal modification was the major predictor of quality ratings. These data add to the literature on hearing-aid processing and working memory by suggesting that the working memory-intelligibility effects may be related to aggregate signal fidelity, rather than to the specific signal manipulation. They also suggest that for individuals with low working memory capacity, sensorineural loss may be most appropriately addressed with WDRC and/or FC parameters that maintain the fidelity of the signal envelope.
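
    The abstract quantifies signal modification with an envelope fidelity index; one simple index of this kind is the correlation between the temporal envelopes of the unprocessed and processed signals, averaged over a few frequency bands. The sketch below illustrates that general idea and is not the specific index used in the study.

```python
# Simplified envelope-fidelity sketch: correlate the temporal envelopes of the
# clean and hearing-aid-processed signals in a few frequency bands and average.
# Not the specific index used in the study; for illustration only.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

BANDS = [(250, 500), (500, 1000), (1000, 2000), (2000, 4000)]  # Hz

def band_envelope(x, lo, hi, fs):
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    env = np.abs(hilbert(sosfiltfilt(sos, x)))
    # Smooth to roughly 32 Hz envelope bandwidth.
    sos_lp = butter(2, 32, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(sos_lp, env)

def envelope_fidelity(clean, processed, fs):
    """Mean across bands of the clean/processed envelope correlation."""
    scores = []
    for lo, hi in BANDS:
        e_clean = band_envelope(clean, lo, hi, fs)
        e_proc = band_envelope(processed, lo, hi, fs)
        scores.append(np.corrcoef(e_clean, e_proc)[0, 1])
    return float(np.mean(scores))
```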

  20. Working memory and intelligibility of hearing-aid processed speech

    PubMed Central

    Souza, Pamela E.; Arehart, Kathryn H.; Shen, Jing; Anderson, Melinda; Kates, James M.

    2015-01-01

    Previous work suggested that individuals with low working memory capacity may be at a disadvantage in adverse listening environments, including situations with background noise or substantial modification of the acoustic signal. This study explored the relationship between patient factors (including working memory capacity) and intelligibility and quality of modified speech for older individuals with sensorineural hearing loss. The modification was created using a combination of hearing aid processing [wide-dynamic range compression (WDRC) and frequency compression (FC)] applied to sentences in multitalker babble. The extent of signal modification was quantified via an envelope fidelity index. We also explored the contribution of components of working memory by including measures of processing speed and executive function. We hypothesized that listeners with low working memory capacity would perform more poorly than those with high working memory capacity across all situations, and would also be differentially affected by high amounts of signal modification. Results showed a significant effect of working memory capacity for speech intelligibility, and an interaction between working memory, amount of hearing loss and signal modification. Signal modification was the major predictor of quality ratings. These data add to the literature on hearing-aid processing and working memory by suggesting that the working memory-intelligibility effects may be related to aggregate signal fidelity, rather than to the specific signal manipulation. They also suggest that for individuals with low working memory capacity, sensorineural loss may be most appropriately addressed with WDRC and/or FC parameters that maintain the fidelity of the signal envelope. PMID:25999874

  1. Speech Motor Correlates of Treatment-Related Changes in Stuttering Severity and Speech Naturalness

    ERIC Educational Resources Information Center

    Tasko, Stephen M.; McClean, Michael D.; Runyan, Charles M.

    2007-01-01

    Participants of stuttering treatment programs provide an opportunity to evaluate persons who stutter as they demonstrate varying levels of fluency. Identifying physiologic correlates of altered fluency levels may lead to insights about mechanisms of speech disfluency. This study examined respiratory, orofacial kinematic and acoustic measures in 35…

  2. Acoustic measurements of articulator motions.

    PubMed

    Schroeder, M R; Strube, H W

    1979-01-01

    Methods for estimating articulatory data from acoustic measurements are reviewed. First, relations between the vocal-tract area function and formant or impedance data are pointed out. Then the possibility of determining a (discretized) area function from the speech signal itself is considered. Finally, we look at the estimation of certain articulatory parameters rather than the area function. By using a regression method, such parameters can even be estimated independently of any vocal-tract model. Results for real-speech data are given.
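
    The second possibility mentioned, determining a discretized area function from the speech signal itself, is classically approached through linear prediction: the reflection coefficients from the Levinson-Durbin recursion map onto area ratios of a concatenated lossless-tube model. The sketch below follows that textbook route; sign and boundary conventions differ between formulations, so the output should be read as illustrative.

```python
# Sketch: estimate a discretized vocal-tract area function from a speech frame
# via linear prediction (lossless-tube model). Textbook approach; sign and
# boundary conventions vary, so treat the output as illustrative.
import numpy as np

def levinson_durbin(r, order):
    """Levinson-Durbin recursion; returns LPC and reflection coefficients."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k[i - 1] = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k[i - 1] * a_prev[i - j]
        a[i] = k[i - 1]
        err *= (1.0 - k[i - 1] ** 2)
    return a, k

def area_function(frame, order=10):
    """Area ratios of a uniform lossless-tube model from one speech frame."""
    frame = frame * np.hamming(len(frame))
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    _, k = levinson_durbin(r, order)
    areas = [1.0]                      # area at the glottis end (arbitrary units)
    for ki in k:
        areas.append(areas[-1] * (1.0 - ki) / (1.0 + ki))
    return np.array(areas)
```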

  3. Human speech articulator measurements using low power, 2GHz Homodyne sensors

    SciTech Connect

    Barnes, T; Burnett, G C; Holzrichter, J F

    1999-06-29

    Very low power, short-range microwave "radar-like" sensors can measure the motions and vibrations of internal human speech articulators as speech is produced. In these animate systems (and also in inanimate acoustic systems), microwave sensors can measure vibration information associated with excitation sources and other interfaces. These data, together with the corresponding acoustic data, enable the calculation of system transfer functions. This information appears to be useful for a surprisingly wide range of applications such as speech coding and recognition, speaker or object identification, speech and musical instrument synthesis, noise cancellation, and other applications.
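
    Computing a transfer function from an excitation-related measurement and the corresponding acoustic signal is commonly done with cross- and auto-spectral densities, H(f) = Sxy(f)/Sxx(f). The sketch below uses that standard estimator with placeholder signals; it is a generic illustration, not the sensor-specific processing described in the report.

```python
# Sketch: estimate a transfer function between an excitation measurement
# (e.g., an EM-sensor channel) and the acoustic microphone signal using the
# standard cross-spectral estimate H(f) = Sxy(f) / Sxx(f). Placeholder signals.
import numpy as np
from scipy.signal import csd, welch, lfilter

def transfer_function(excitation, acoustic, fs, nperseg=1024):
    f, s_xx = welch(excitation, fs=fs, nperseg=nperseg)
    _, s_xy = csd(excitation, acoustic, fs=fs, nperseg=nperseg)
    return f, s_xy / s_xx

# Toy demonstration: the "acoustic" signal is a filtered copy of the excitation.
fs = 8000
rng = np.random.default_rng(2)
excitation = rng.standard_normal(fs * 2)
acoustic = lfilter([1.0], [1.0, -0.9], excitation)   # simple one-pole "tract"

f, H = transfer_function(excitation, acoustic, fs)
print(f"|H| at {f[10]:.0f} Hz: {abs(H[10]):.2f}")
```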

  4. The effects of macroglossia on speech: a case study.

    PubMed

    Mekonnen, Abebayehu Messele

    2012-01-01

    This article presents a case study of speech production in a 14-year-old Amharic-speaking boy. The boy had developed secondary macroglossia, related to a disturbance of growth hormones, following a history of normal speech development. Perceptual analysis combined with acoustic analysis and static palatography is used to investigate the specific articulatory compensations arising from the macroglossia. The subset of sounds chosen for study comprised the denti-alveolar and alveolar plosives, fricatives, ejectives, nasal, lateral and trill produced in single words, as well as in short phrases. The phonetic analysis revealed both spatial and temporal atypicalities in the realisations of the sounds in question. Speaking rate was slow relative to his peers' speech, and attempts to increase speech rate resulted in dysfluent speech. Given the phonological system of Amharic, however, the atypical segmental realisations, while reducing both the intelligibility and acceptability of the participant's speech production, did not result in loss of phonological contrasts.

  5. Acoustic Correlates of Emphatic Stress in Central Catalan

    ERIC Educational Resources Information Center

    Nadeu, Marianna; Hualde, Jose Ignacio

    2012-01-01

    A common feature of public speech in Catalan is the placement of prominence on lexically unstressed syllables ("emphatic stress"). This paper presents an acoustic study of radio speech data. Instances of emphatic stress were perceptually identified. Within-word comparison between vowels with emphatic stress and vowels with primary lexical stress…

  6. On-Line Acoustic and Semantic Interpretation of Talker Information

    ERIC Educational Resources Information Center

    Creel, Sarah C.; Tumlin, Melanie A.

    2011-01-01

    Recent work demonstrates that listeners utilize talker-specific information in the speech signal to inform real-time language processing. However, there are multiple representational levels at which this may take place. Listeners might use acoustic cues in the speech signal to access the talker's identity and information about what they tend to…

  7. Acoustic and Perceptual Characteristics of Vowels Produced during Simultaneous Communication

    ERIC Educational Resources Information Center

    Schiavetti, Nicholas; Metz, Dale Evan; Whitehead, Robert L.; Brown, Shannon; Borges, Janie; Rivera, Sara; Schultz, Christine

    2004-01-01

    This study investigated the acoustical and perceptual characteristics of vowels in speech produced during simultaneous communication (SC). Twelve normal hearing, experienced sign language users were recorded under SC and speech alone (SA) conditions speaking a set of sentences containing monosyllabic words designed for measurement of vowel…

  8. Voice Acoustical Measurement of the Severity of Major Depression

    ERIC Educational Resources Information Center

    Cannizzaro, Michael; Harel, Brian; Reilly, Nicole; Chappell, Phillip; Snyder, Peter J.

    2004-01-01

    A number of empirical studies have documented the relationship between quantifiable and objective acoustical measures of voice and speech, and clinical subjective ratings of severity of Major Depression. To further explore this relationship, speech samples were extracted from videotape recordings of structured interviews made during the…

  9. Cued Speech for Enhancing Speech Perception and First Language Development of Children With Cochlear Implants

    PubMed Central

    Leybaert, Jacqueline; LaSasso, Carol J.

    2010-01-01

    Nearly 300 million people worldwide have moderate to profound hearing loss. Hearing impairment, if not adequately managed, has strong socioeconomic and affective impact on individuals. Cochlear implants have become the most effective vehicle for helping profoundly deaf children and adults to understand spoken language, to be sensitive to environmental sounds, and, to some extent, to listen to music. The auditory information delivered by the cochlear implant remains non-optimal for speech perception because it delivers a spectrally degraded signal and lacks some of the fine temporal acoustic structure. In this article, we discuss research revealing the multimodal nature of speech perception in normally-hearing individuals, with important inter-subject variability in the weighting of auditory or visual information. We also discuss how audio-visual training, via Cued Speech, can improve speech perception in cochlear implantees, particularly in noisy contexts. Cued Speech is a system that makes use of visual information from speechreading combined with hand shapes positioned in different places around the face in order to deliver completely unambiguous information about the syllables and the phonemes of spoken language. We support our view that exposure to Cued Speech before or after the implantation could be important in the aural rehabilitation process of cochlear implantees. We describe five lines of research that are converging to support the view that Cued Speech can enhance speech perception in individuals with cochlear implants. PMID:20724357

  10. Distinct developmental profiles in typical speech acquisition.

    PubMed

    Vick, Jennell C; Campbell, Thomas F; Shriberg, Lawrence D; Green, Jordan R; Abdi, Hervé; Rusiewicz, Heather Leavy; Venkatesh, Lakshmi; Moore, Christopher A

    2012-05-01

    Three- to five-year-old children produce speech that is characterized by a high level of variability within and across individuals. This variability, which is manifest in speech movements, acoustics, and overt behaviors, can be input to subgroup discovery methods to identify cohesive subgroups of speakers or to reveal distinct developmental pathways or profiles. This investigation characterized three distinct groups of typically developing children and provided normative benchmarks for speech development. These speech development profiles, identified among 63 typically developing preschool-aged speakers (ages 36-59 mo), were derived from the children's performance on multiple measures. The profiles were obtained by submitting to a k-means cluster analysis 72 measures that spanned three levels of speech analysis: behavioral (e.g., task accuracy, percentage of consonants correct), acoustic (e.g., syllable duration, syllable stress), and kinematic (e.g., variability of movements of the upper lip, lower lip, and jaw). Two of the discovered group profiles were distinguished by measures of variability but not by phonemic accuracy; the third group of children was characterized by relatively low phonemic accuracy but not by an increase in measures of variability. Analyses revealed that, of the original 72 measures, 8 key measures were sufficient to best distinguish the 3 profile groups.
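
    The subgroup-discovery step, k-means clustering over standardized measures, can be sketched directly with scikit-learn; the matrix dimensions below mirror the study (63 children by 72 measures), but the values are random placeholders.

```python
# Sketch of the subgroup-discovery step: standardize the per-child measures
# and run k-means with k = 3. Values are placeholders for the real data matrix
# (63 children x 72 measures in the study).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
measures = rng.standard_normal((63, 72))      # placeholder: children x measures

X = StandardScaler().fit_transform(measures)  # put measures on a common scale
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

labels = kmeans.labels_                       # profile assignment per child
for g in range(3):
    print(f"profile {g}: {np.sum(labels == g)} children")
```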

  11. Automatic Speech Recognition Based on Electromyographic Biosignals

    NASA Astrophysics Data System (ADS)

    Jou, Szu-Chen Stan; Schultz, Tanja

    This paper presents our studies of automatic speech recognition based on electromyographic biosignals captured from the articulatory muscles in the face using surface electrodes. We develop a phone-based speech recognizer and describe how the performance of this recognizer improves by carefully designing and tailoring the extraction of relevant speech features toward electromyographic signals. Our experimental design includes the collection of audibly spoken speech simultaneously recorded as acoustic data using a close-speaking microphone and as electromyographic signals using surface electrodes. Our experiments indicate that the electromyographic signals precede the acoustic signal by about 0.05-0.06 seconds. Furthermore, we introduce articulatory feature classifiers, which have recently been shown to improve classical speech recognition significantly. We show that the classification accuracy of articulatory features clearly benefits from the tailored feature extraction. Finally, these classifiers are integrated into the overall decoding framework using a stream architecture. Our final system achieves a word error rate of 29.9% on a 100-word recognition task.
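
    As a generic illustration of the kind of frame-based EMG front end such a system builds on, the sketch below computes simple time-domain features per channel and stacks neighboring frames to capture the roughly 0.05-0.06 s lead of the EMG over the audio. The window sizes and feature set are assumptions, not the paper's tailored feature extraction.

```python
# Sketch: simple frame-based time-domain features for a surface-EMG channel,
# followed by frame stacking to provide temporal context for a phone classifier.
# Window sizes and features are illustrative, not the published front end.
import numpy as np

def emg_frame_features(x, fs, frame_ms=27, shift_ms=10):
    """Per-frame mean absolute value, power, and zero-crossing rate."""
    frame = int(fs * frame_ms / 1000)
    shift = int(fs * shift_ms / 1000)
    feats = []
    for start in range(0, len(x) - frame + 1, shift):
        w = x[start:start + frame]
        w = w - np.mean(w)                       # remove DC offset
        mav = np.mean(np.abs(w))                 # mean absolute value
        power = np.mean(w ** 2)
        zcr = np.mean(np.abs(np.diff(np.signbit(w).astype(int))))
        feats.append([mav, power, zcr])
    return np.array(feats)

def stack_context(feats, k=2):
    """Stack +/- k neighboring frames to capture the EMG-to-audio lead time."""
    padded = np.pad(feats, ((k, k), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(feats)] for i in range(2 * k + 1)])
```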

  12. Assessing the acoustical climate of underground stations.

    PubMed

    Nowicka, Elzbieta

    2007-01-01

    Designing a proper acoustical environment--indispensable to speech recognition--in long enclosures is difficult. Although there is some literature on the acoustical conditions in underground stations, there is still little information about methods that make estimation of correct reverberation conditions possible. This paper discusses the assessment of the reverberation conditions of underground stations. A comparison of the measurements of reverberation time in Warsaw's underground stations with calculated data proves there are divergences between measured and calculated early decay time values, especially for long source-receiver distances. Rapid speech transmission index values for measured stations are also presented.

  13. "Perception of the speech code" revisited: Speech is alphabetic after all.

    PubMed

    Fowler, Carol A; Shankweiler, Donald; Studdert-Kennedy, Michael

    2016-03-01

    We revisit an article, "Perception of the Speech Code" (PSC), published in this journal 50 years ago (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967) and address one of its legacies concerning the status of phonetic segments, which persists in theories of speech today. In the perspective of PSC, segments both exist (in language as known) and do not exist (in articulation or the acoustic speech signal). Findings interpreted as showing that speech is not a sound alphabet, but, rather, phonemes are encoded in the signal, coupled with findings that listeners perceive articulation, led to the motor theory of speech perception, a highly controversial legacy of PSC. However, a second legacy, the paradoxical perspective on segments has been mostly unquestioned. We remove the paradox by offering an alternative supported by converging evidence that segments exist in language both as known and as used. We support the existence of segments in both language knowledge and in production by showing that phonetic segments are articulatory and dynamic and that coarticulation does not eliminate them. We show that segments leave an acoustic signature that listeners can track. This suggests that speech is well-adapted to public communication in facilitating, not creating a barrier to, exchange of language forms.

  14. Speech-to-Speech Relay Service

    MedlinePlus

    ... Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that ... to STS, go to www.fcc.gov/guides/telecommunications-relay-service-trs. Filing a Complaint If you ...

  15. Speech research

    NASA Astrophysics Data System (ADS)

    1992-06-01

    Phonology is traditionally seen as the discipline that concerns itself with the building blocks of linguistic messages. It is the study of the structure of sound inventories of languages and of the participation of sounds in rules or processes. Phonetics, in contrast, concerns speech sounds as produced and perceived. Two extreme positions on the relationship between phonological messages and phonetic realizations are represented in the literature. One holds that the primary home for linguistic symbols, including phonological ones, is the human mind, itself housed in the human brain. The second holds that their primary home is the human vocal tract.

  16. Acoustic confinement in superlattice cavities

    NASA Astrophysics Data System (ADS)

    Garcia-Sanchez, Daniel; Déleglise, Samuel; Thomas, Jean-Louis; Atkinson, Paola; Lagoin, Camille; Perrin, Bernard

    2016-09-01

    The large coupling rate between the acoustic and optical fields confined in GaAs/AlAs superlattice cavities makes them appealing systems for cavity optomechanics. We have developed a mathematical model based on the scattering matrix that allows the acoustic guided modes to be predicted in nano and micropillar superlattice cavities. We demonstrate here that the reflection at the surface boundary considerably modifies the acoustic quality factor and leads to significant confinement at the micropillar center. Our mathematical model also predicts unprecedented acoustic Fano resonances on nanopillars featuring small mode volumes and very high mechanical quality factors, making them attractive systems for optomechanical applications.

  17. Abnormal cortical processing of the syllable rate of speech in poor readers

    PubMed Central

    Abrams, Daniel A.; Nicol, Trent; Zecker, Steven; Kraus, Nina

    2009-01-01

    Children with reading impairments have long been associated with impaired perception for rapidly presented acoustic stimuli and recently have shown deficits for slower features. It is not known whether impairments for low-frequency acoustic features negatively impact processing of speech in reading impaired individuals. Here we provide neurophysiological evidence that poor readers have impaired representation of the speech envelope, the acoustical cue that provides syllable pattern information in speech. We measured cortical-evoked potentials in response to sentence stimuli and found that good readers indicated consistent right-hemisphere dominance in auditory cortex for all measures of speech envelope representation, including the precision, timing and magnitude of cortical responses. Poor readers showed abnormal patterns of cerebral asymmetry for all measures of speech envelope representation. Moreover, cortical measures of speech envelope representation predicted up to 44% of the variability in standardized reading scores and 50% in measures of phonological processing across a wide range of abilities. Findings strongly support a relationship between acoustic-level processing and higher-level language abilities, and are the first to link reading ability with cortical processing of low-frequency acoustic features in the speech signal. Results also support the hypothesis that asymmetric routing between cerebral hemispheres represents an important mechanism for temporal encoding in the human auditory system, and the need for an expansion of the temporal processing hypothesis for reading-disabilities to encompass impairments for a wider range of speech features than previously acknowledged. PMID:19535580

  18. Voice Quality Modelling for Expressive Speech Synthesis

    PubMed Central

    Socoró, Joan Claudi

    2014-01-01

    This paper presents the perceptual experiments that were carried out to validate a methodology for transforming speech from a neutral style into a number of expressive ones using voice quality (VoQ) parameter modelling along with the well-known prosodic parameters (F0, duration, and energy). The main goal was to validate the usefulness of VoQ in the enhancement of expressive synthetic speech in terms of speech quality and style identification. A harmonic plus noise model (HNM) was used to modify VoQ and prosodic parameters that were extracted from an expressive speech corpus. Perception test results indicated the improvement of the obtained expressive speech styles when VoQ modelling was used along with prosodic characteristics. PMID:24587738

  19. Vocal tract resonances in speech, singing, and playing musical instruments

    PubMed Central

    Wolfe, Joe; Garnier, Maëva; Smith, John

    2009-01-01

    In both the voice and musical wind instruments, a valve (vocal folds, lips, or reed) lies between an upstream and downstream duct: trachea and vocal tract for the voice; vocal tract and bore for the instrument. Examining the structural similarities and functional differences gives insight into their operation and the duct-valve interactions. In speech and singing, vocal tract resonances usually determine the spectral envelope and usually have a smaller influence on the operating frequency. The resonances are important not only for the phonemic information they produce, but also because of their contribution to voice timbre, loudness, and efficiency. The role of the tract resonances is usually different in brass and some woodwind instruments, where they modify and to some extent compete or collaborate with resonances of the instrument to control the vibration of a reed or the player’s lips, and/or the spectrum of air flow into the instrument. We give a brief overview of oscillator mechanisms and vocal tract acoustics. We discuss recent and current research on how the acoustical resonances of the vocal tract are involved in singing and the playing of musical wind instruments. Finally, we compare techniques used in determining tract resonances and suggest some future developments. PMID:19649157

  20. Measuring phonetic convergence in speech production

    PubMed Central

    Pardo, Jennifer S.

    2013-01-01

    Phonetic convergence is defined as an increase in the similarity of acoustic-phonetic form between talkers. Previous research has demonstrated phonetic convergence both when a talker listens passively to speech and while talkers engage in social interaction. Much of this research has focused on a diverse array of acoustic-phonetic attributes, with fewer studies incorporating perceptual measures of phonetic convergence. The current paper reviews research on phonetic convergence in both non-interactive and conversational settings, and attempts to consolidate the diverse array of findings by proposing a paradigm that models perceptual and acoustic measures together. By modeling acoustic measures as predictors of perceived phonetic convergence, this paradigm has the potential to reconcile some of the diverse and inconsistent findings currently reported in the literature. PMID:23986738

  1. Tracing the emergence of categorical speech perception in the human auditory system.

    PubMed

    Bidelman, Gavin M; Moreno, Sylvain; Alain, Claude

    2013-10-01

    Speech perception requires the effortless mapping from smooth, seemingly continuous changes in sound features into discrete perceptual units, a conversion exemplified in the phenomenon of categorical perception. Explaining how/when the human brain performs this acoustic-phonetic transformation remains an elusive problem in current models and theories of speech perception. In previous attempts to decipher the neural basis of speech perception, it is often unclear whether the alleged brain correlates reflect an underlying percept or merely changes in neural activity that covary with parameters of the stimulus. Here, we recorded neuroelectric activity generated at both cortical and subcortical levels of the auditory pathway elicited by a speech vowel continuum whose percept varied categorically from /u/ to /a/. This integrative approach allows us to characterize how various auditory structures code, transform, and ultimately render the perception of speech material as well as dissociate brain responses reflecting changes in stimulus acoustics from those that index true internalized percepts. We find that activity from the brainstem mirrors properties of the speech waveform with remarkable fidelity, reflecting progressive changes in speech acoustics but not the discrete phonetic classes reported behaviorally. In comparison, patterns of late cortical evoked activity contain information reflecting distinct perceptual categories and predict the abstract phonetic speech boundaries heard by listeners. Our findings demonstrate a critical transformation in neural speech representations between brainstem and early auditory cortex analogous to an acoustic-phonetic mapping necessary to generate categorical speech percepts. Analytic modeling demonstrates that a simple nonlinearity accounts for the transformation between early (subcortical) brain activity and subsequent cortical/behavioral responses to speech (>150-200 ms) thereby describing a plausible mechanism by which the

  2. Changes in normal speech after fatiguing the tongue.

    PubMed

    Solomon, N P

    2000-12-01

    Detrimental effects of tongue fatigue on speech have been assumed to exist based on neuromotor speech disorders. However, to address whether fatigue is a contributing cause to impaired speech requires an experimental protocol with an uncomplicated population. This study induced tongue fatigue in eight neurologically normal persons and examined changes in speech perceptually and acoustically. The fatigue task consisted of repeated cycles of 6 s of sustained maximum voluntary contraction and 4 s of rest until 50% of maximum strength could not be achieved for three consecutive cycles. Participants then produced speech that was weighted heavily with lingual-palatal consonants. Perceptual analyses of the speech revealed a statistically significant deleterious effect of induced tongue fatigue on speech precision and an incomplete reversal of this effect after a recovery period. Acoustically, the first and third spectral moments (mean and skewness) of the spectral energy for /see text/, /see text/, and /see text/ differed significantly after fatigue but in directions opposite to a priori predictions. Tendencies were found for decreased stop-closure duration and increased voice onset time for /see text/ after fatigue. Supplemental analyses revealed decreased second formant (F2) frequency for /see text/ and /see text/ and flattened F2 transition for the diphthong /see text/ after fatigue. These results indicate disruption of tongue positioning and transitioning for lingual-palatal consonants during speech after prolonged strenuous tongue exercises.
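
    The spectral-moment analysis treats the consonant's power spectrum as a distribution over frequency: the first moment is the spectral mean and the third standardized moment is the skewness. A minimal sketch of that computation follows; the window and band limits are illustrative choices, not the study's exact settings.

```python
# Sketch: first and third spectral moments (mean and skewness) of a consonant
# frame, treating the power spectrum as a distribution over frequency.
# Window length and band limits are illustrative choices.
import numpy as np

def spectral_moments(frame, fs, fmin=500, fmax=11000):
    frame = frame * np.hamming(len(frame))
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1 / fs)
    band = (freqs >= fmin) & (freqs <= min(fmax, fs / 2))
    p = spectrum[band] / np.sum(spectrum[band])   # normalize to a distribution
    f = freqs[band]
    m1 = np.sum(f * p)                            # spectral mean (Hz)
    m2 = np.sum((f - m1) ** 2 * p)                # variance
    m3 = np.sum((f - m1) ** 3 * p) / m2 ** 1.5    # skewness (dimensionless)
    return m1, m3
```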

  3. Acoustic Neuroma

    MedlinePlus

    An acoustic neuroma is a benign tumor that develops on the nerve that connects the ear to the brain. ... can press against the brain, becoming life-threatening. Acoustic neuroma can be difficult to diagnose, because the ...

  4. Achieving speech intelligibility at Paddington Station, London, UK

    NASA Astrophysics Data System (ADS)

    Goddard, Helen M.

    2002-11-01

    Paddington Station in London, UK is a large rail terminus for long distance electric and diesel powered trains. This magnificent train shed has four arched spans and is one of the remaining structural testaments to the architect Brunel. Given the current British and European legislative requirements for intelligible speech in public buildings, AMS Acoustics were engaged to design an electroacoustic solution. In this paper we outline how the significant problems of lively natural acoustics, high operational noise levels and strict aesthetic constraints were addressed. The resultant design is radical, using the most recent DSP-controlled line-array loudspeakers. The paper details the acoustic modeling undertaken to predict both even direct sound pressure level coverage and STI. Further, it presents the speech intelligibility measured upon handover of the new system. The design has proved to be successful and, given the nature of the space, outstanding speech intelligibility is achieved.

  5. Acoustic Characteristics of Simulated Respiratory-Induced Vocal Tremor

    ERIC Educational Resources Information Center

    Lester, Rosemary A.; Story, Brad H.

    2013-01-01

    Purpose: The purpose of this study was to investigate the relation of respiratory forced oscillation to the acoustic characteristics of vocal tremor. Method: Acoustical analyses were performed to determine the characteristics of the intensity and fundamental frequency (F[subscript 0]) for speech samples obtained by Farinella, Hixon, Hoit, Story,…

  6. Comparison of a short-time speech-based intelligibility metric to the speech transmission index and intelligibility data.

    PubMed

    Payton, Karen L; Shrestha, Mona

    2013-11-01

    Several algorithms have been shown to generate a metric corresponding to the Speech Transmission Index (STI) using speech as a probe stimulus [e.g., Goldsworthy and Greenberg, J. Acoust. Soc. Am. 116, 3679-3689 (2004)]. The time-domain approaches work well on long speech segments and have the added potential to be used for short-time analysis. This study investigates the performance of the Envelope Regression (ER) time-domain STI method as a function of window length, in acoustically degraded environments with multiple talkers and speaking styles. The ER method is compared with a short-time Theoretical STI, derived from octave-band signal-to-noise ratios and reverberation times. For windows as short as 0.3 s, the ER method tracks short-time Theoretical STI changes in stationary speech-shaped noise, fluctuating restaurant babble and stationary noise plus reverberation. The metric is also compared to intelligibility scores on conversational speech and speech articulated clearly but at normal speaking rates (Clear/Norm) in stationary noise. Correlation between the metric and intelligibility scores is high and, consistent with the subject scores, the metrics are higher for Clear/Norm speech than for conversational speech and higher for the first word in a sentence than for the last word.
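
    In outline, the envelope-regression idea relates the degraded speech envelope to the clean probe envelope in short windows, converts the fit to an apparent signal-to-noise ratio, and maps that to a transmission index with the usual STI mapping. The sketch below is a much-simplified illustration of that scheme, not the ER formulation evaluated in the paper.

```python
# Much-simplified sketch of a speech-based, short-time STI-style metric:
# regress the degraded envelope on the clean envelope in short windows, map the
# slope to an apparent SNR, then to a transmission index via the usual STI
# mapping. Illustrates the general scheme only, not the published ER method.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def envelope(x, fs, cutoff=32.0):
    sos = butter(2, cutoff, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(sos, np.abs(hilbert(x)))

def short_time_index(clean, degraded, fs, win_s=0.3):
    e_c = envelope(clean, fs)
    e_d = envelope(degraded, fs)
    n = int(win_s * fs)
    indices = []
    for start in range(0, len(e_c) - n, n):
        c = e_c[start:start + n]
        d = e_d[start:start + n]
        slope = np.dot(c, d) / (np.dot(c, c) + 1e-12)   # envelope regression slope
        residual = d - slope * c                        # treated as "noise" envelope
        snr = 10 * np.log10(np.mean((slope * c) ** 2)
                            / (np.mean(residual ** 2) + 1e-12))
        ti = np.clip((snr + 15.0) / 30.0, 0.0, 1.0)     # standard STI mapping
        indices.append(ti)
    return float(np.mean(indices))
```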

  7. Comparison of a short-time speech-based intelligibility metric to the speech transmission index and intelligibility dataa

    PubMed Central

    Payton, Karen L.; Shrestha, Mona

    2013-01-01

    Several algorithms have been shown to generate a metric corresponding to the Speech Transmission Index (STI) using speech as a probe stimulus [e.g., Goldsworthy and Greenberg, J. Acoust. Soc. Am. 116, 3679–3689 (2004)]. The time-domain approaches work well on long speech segments and have the added potential to be used for short-time analysis. This study investigates the performance of the Envelope Regression (ER) time-domain STI method as a function of window length, in acoustically degraded environments with multiple talkers and speaking styles. The ER method is compared with a short-time Theoretical STI, derived from octave-band signal-to-noise ratios and reverberation times. For windows as short as 0.3 s, the ER method tracks short-time Theoretical STI changes in stationary speech-shaped noise, fluctuating restaurant babble and stationary noise plus reverberation. The metric is also compared to intelligibility scores on conversational speech and speech articulated clearly but at normal speaking rates (Clear/Norm) in stationary noise. Correlation between the metric and intelligibility scores is high and, consistent with the subject scores, the metrics are higher for Clear/Norm speech than for conversational speech and higher for the first word in a sentence than for the last word. PMID:24180791

  8. Acoustic Seaglider

    DTIC Science & Technology

    2008-03-07

    a national naval responsibility. ... problem and acoustic navigation and communications within the context of distributed autonomous persistent undersea surveillance sensor networks ... Acoustic sensors on mobile, autonomous platforms will enable basic research topics on temporal and spatial coherence and the description of ambient

  9. Acoustic seal

    NASA Technical Reports Server (NTRS)

    Steinetz, Bruce M. (Inventor)

    2006-01-01

    The invention relates to a sealing device having an acoustic resonator. The acoustic resonator is adapted to create acoustic waveforms to generate a sealing pressure barrier blocking fluid flow from a high pressure area to a lower pressure area. The sealing device permits noncontacting sealing operation. The sealing device may include a resonant-macrosonic-synthesis (RMS) resonator.

  10. Acoustic Seal

    NASA Technical Reports Server (NTRS)

    Steinetz, Bruce M. (Inventor)

    2006-01-01

    The invention relates to a sealing device having an acoustic resonator. The acoustic resonator is adapted to create acoustic waveforms to generate a sealing pressure barrier blocking fluid flow from a high pressure area to a lower pressure area. The sealing device permits noncontacting sealing operation. The sealing device may include a resonant-macrosonic-synthesis (RMS) resonator.

  11. Predicting the intelligibility of vocoded speech

    PubMed Central

    Chen, Fei; Loizou, Philipos C.

    2010-01-01

    Objectives: The purpose of this study is to evaluate the performance of a number of speech intelligibility indices in terms of predicting the intelligibility of vocoded speech. Design: Noise-corrupted sentences were vocoded in a total of 80 conditions, involving three different SNR levels (-5, 0 and 5 dB) and two types of maskers (steady-state noise and two-talker). Tone-vocoder simulations were used as well as simulations of combined electric-acoustic stimulation (EAS). The vocoded sentences were presented to normal-hearing listeners for identification, and the resulting intelligibility scores were used to assess the correlation of various speech intelligibility measures. These included measures designed to assess speech intelligibility, including the speech-transmission index (STI) and articulation index (AI) based measures, as well as distortions in hearing aids (e.g., coherence-based measures). These measures employed primarily either the temporal-envelope or the spectral-envelope information in the prediction model. The underlying hypothesis in the present study is that measures that assess temporal envelope distortions, such as those based on the speech-transmission index, should correlate highly with the intelligibility of vocoded speech. This is based on the fact that vocoder simulations preserve primarily envelope information, similar to the processing implemented in current cochlear implant speech processors. Similarly, it is hypothesized that measures such as the coherence-based index that assess the distortions present in the spectral envelope could also be used to model the intelligibility of vocoded speech. Results: Of all the intelligibility measures considered, the coherence-based and the STI-based measures performed the best. High correlations (r=0.9-0.96) were maintained with the coherence-based measures in all noisy conditions. The highest correlation obtained with the STI-based measure was 0.92, and that was obtained when high modulation rates (100
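
    A coherence-based index, in its simplest form, averages the magnitude-squared coherence between the clean and processed signals over the speech band; scipy provides the coherence estimate directly. The sketch below is a crude proxy of that kind, not the specific coherence-based (CSII-style) measure referred to in the abstract.

```python
# Crude coherence-based intelligibility proxy: average the magnitude-squared
# coherence between clean and vocoded/processed speech over the speech band.
# Not the specific coherence-based index referred to in the abstract.
import numpy as np
from scipy.signal import coherence

def mean_speech_band_coherence(clean, processed, fs, fmin=100.0, fmax=5000.0):
    f, cxy = coherence(clean, processed, fs=fs, nperseg=512)
    band = (f >= fmin) & (f <= fmax)
    return float(np.mean(cxy[band]))
```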

  12. The Effects of Simulated Stuttering and Prolonged Speech on the Neural Activation Patterns of Stuttering and Nonstuttering Adults

    ERIC Educational Resources Information Center

    De Nil, Luc F.; Beal, Deryk S.; Lafaille, Sophie J.; Kroll, Robert M.; Crawley, Adrian P.; Gracco, Vincent L.

    2008-01-01

    Functional magnetic resonance imaging was used to investigate the neural correlates of passive listening, habitual speech and two modified speech patterns (simulated stuttering and prolonged speech) in stuttering and nonstuttering adults. Within-group comparisons revealed increased right hemisphere biased activation of speech-related regions…

  13. Chinese speech intelligibility and its relationship with the speech transmission index for children in elementary school classrooms.

    PubMed

    Peng, Jianxin; Yan, Nanjie; Wang, Dan

    2015-01-01

    The present study investigated Chinese speech intelligibility in 28 classrooms from nine different elementary schools in Guangzhou, China. The subjective Chinese speech intelligibility in the classrooms was evaluated with children in grades 2, 4, and 6 (7 to 12 years old). Acoustical measurements were also performed in these classrooms. Subjective Chinese speech intelligibility scores and objective speech intelligibility parameters, such as speech transmission index (STI), were obtained at each listening position for all tests. The relationship between subjective Chinese speech intelligibility scores and STI was revealed and analyzed. The effects of age on Chinese speech intelligibility scores were compared. Results indicate high correlations between subjective Chinese speech intelligibility scores and STI for grades 2, 4, and 6 children. Chinese speech intelligibility scores increase with increase of age under the same STI condition. The differences in scores among different age groups decrease as STI increases. To achieve 95% Chinese speech intelligibility scores, the STIs required for grades 2, 4, and 6 children are 0.75, 0.69, and 0.63, respectively.

  14. Analysis of Speech Disorders in Acute Pseudobulbar Palsy: a Longitudinal Study of a Patient with Lingual Paralysis.

    ERIC Educational Resources Information Center

    Leroy-Malherbe, V.; Chevrie-Muller, C.; Rigoard, M. T.; Arabia, C.

    1998-01-01

    This case report describes a 52-year-old man with bilateral central lingual paralysis following a myocardial infarction. Speech recordings made 15 days and 18 months after the attack were acoustically analyzed. The case demonstrates the usefulness of acoustic analysis for detecting slight acoustic differences. (DB)

  15. 78 FR 49717 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-08-15

    ... COMMISSION 47 CFR Part 64 Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities AGENCY: Federal Communications Commission. ACTION: Proposed rule....

  16. Development of the Cantonese speech intelligibility index.

    PubMed

    Wong, Lena L N; Ho, Amy H S; Chua, Elizabeth W W; Soli, Sigfrid D

    2007-04-01

    A Speech Intelligibility Index (SII) for the sentences in the Cantonese version of the Hearing In Noise Test (CHINT) was derived using conventional procedures described previously in studies such as Studebaker and Sherbecoe [J. Speech Hear. Res. 34, 427-438 (1991)]. Two studies were conducted to determine the signal-to-noise ratios and high- and low-pass filtering conditions that should be used and to measure speech intelligibility in these conditions. Normal-hearing subjects listened to the sentences presented in speech-spectrum shaped noise. Compared to other English speech assessment materials such as the English Hearing In Noise Test [Nilsson et al., J. Acoust. Soc. Am. 95, 1085-1099 (1994)], the frequency importance function of the CHINT suggests that low-frequency information is more important for Cantonese speech understanding. The difference in frequency importance weight in Chinese, compared to English, was attributed to the redundancy of the test material, the tonal nature of the Cantonese language, or a combination of these factors.

  17. Speech rate and rhythm in Parkinson's disease.

    PubMed

    Skodda, Sabine; Schlegel, Uwe

    2008-05-15

    Articulatory rate and pause time in a standardized reading task were analyzed in Parkinson's disease (PD) patients, in relation to disease duration and severity, and compared to healthy controls. In 121 PD patients and 70 healthy controls, an acoustical analysis was performed on the first and last sentences of a standardized 170-syllable text using commercial audio software. Articulatory rate and speech-to-pause ratios were calculated by measuring the length of each syllable and each pause, both at the end of words and within polysyllabic words. No significant difference in overall articulatory rate was found between PD patients and controls. Both groups showed an accelerated speech rate in the last sentence compared to the first; however, PD patients accelerated more than controls. PD patients exhibited a significantly reduced percentage of pause duration relative to total speech time in the first sentence and a reduced percentage of pause time within polysyllabic words. PD patients made significantly fewer but longer pauses at the end of words and fewer pauses within polysyllabic words. UPDRS III showed an inverse relation to the number and rate of intraword pauses, and disease duration was negatively correlated with articulatory rate. Parkinsonian speech was thus characterized not only by a stronger acceleration of articulation rate in the course of speaking but also by a significant reduction in the total number of pauses, indicating impaired speech rhythm and timing organization.
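
    The rate and pause measures described above can be illustrated with a minimal sketch: given labeled syllable and pause durations, it computes an articulatory rate and the percentage of speaking time spent in pauses. All durations below are invented for illustration, not data from the study.

```python
# Minimal sketch of the rate and pause measures described above, computed from
# hand- or automatically-labeled segment durations (in seconds).
syllable_durations = [0.18, 0.22, 0.16, 0.25, 0.20, 0.19]   # voiced syllables
pause_durations    = [0.30, 0.12, 0.45]                      # silent intervals

speech_time = sum(syllable_durations)
pause_time  = sum(pause_durations)
total_time  = speech_time + pause_time

# Articulatory rate: syllables per second of actual articulation time.
articulatory_rate = len(syllable_durations) / speech_time

# Percentage of total speaking time spent pausing (one possible
# speech-to-pause style measure).
pause_percentage = 100.0 * pause_time / total_time

print(f"articulatory rate: {articulatory_rate:.2f} syll/s")
print(f"pause percentage:  {pause_percentage:.1f} %")
```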

  18. Speech therapy with obturator.

    PubMed

    Shyammohan, A; Sreenivasulu, D

    2010-12-01

    Rehabilitation of speech is as important as closure of the defect in cases of velopharyngeal insufficiency. Often the importance of speech therapy is sidelined during the fabrication of obturators. Usually the speech component is taken up only at a later stage and is relegated entirely to a speech therapist, without the active involvement of the prosthodontist. The article suggests a protocol for speech therapy in such cases, to be carried out in coordination with the prosthodontist.

  19. Psychoacoustic cues to emotion in speech prosody and music.

    PubMed

    Coutinho, Eduardo; Dibben, Nicola

    2013-01-01

    There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music close to the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain.
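
    A minimal sketch of two of the seven psychoacoustic features listed above (spectral centroid and spectral flux), computed with a plain short-time Fourier analysis. This is a generic implementation, not the feature-extraction pipeline used in the study.

```python
# Generic short-time spectral centroid (Hz) and spectral flux computation.
import numpy as np

def spectral_centroid_and_flux(signal, sr, frame_len=1024, hop=512):
    """Return per-frame spectral centroid (Hz) and spectral flux."""
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    centroids, fluxes = [], []
    prev_mag = None
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        centroids.append(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))
        if prev_mag is not None:
            # Flux: magnitude of spectral change between consecutive frames.
            fluxes.append(np.sqrt(np.sum((mag - prev_mag) ** 2)))
        prev_mag = mag
    return np.array(centroids), np.array(fluxes)

# Example with one second of synthetic audio (a 440 Hz tone plus noise).
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) + 0.05 * np.random.randn(sr)
centroid, flux = spectral_centroid_and_flux(x, sr)
print(centroid.mean(), flux.mean())
```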

  20. Effects of syntactic expectations on speech segmentation.

    PubMed

    Mattys, Sven L; Melhorn, James F; White, Laurence

    2007-08-01

    Although the effect of acoustic cues on speech segmentation has been extensively investigated, the role of higher order information (e.g., syntax) has received less attention. Here, the authors examined whether syntactic expectations based on subject-verb agreement have an effect on segmentation and whether they do so despite conflicting acoustic cues. Although participants detected target words faster in phrases containing adequate acoustic cues ("spins" in take spins and "pins" in takes pins), this acoustic effect was suppressed when the phrases were appended to a plural context (those women take spins/*takes pins [with the asterisk indicating a syntactically unacceptable parse]). The syntactically congruent target ("spins") was detected faster regardless of the acoustics. However, a singular context (that woman *take spins/takes pins) had no effect on segmentation, and the results resembled those of the neutral phrases. Subsequent experiments showed that the discrepancy was due to the relative time course of syntactic expectations and acoustic cues. Taken together, the data suggest that syntactic knowledge can facilitate segmentation but that its effect is substantially attenuated if conflicting acoustic cues are encountered before full realization of the syntactic constraint.

  1. Millimeter Wave Radar for detecting the speech signal applications

    NASA Astrophysics Data System (ADS)

    Li, Zong-Wen

    1996-12-01

    A millimeter wave (MMW) Doppler radar with grating structures for detecting speech signals has been developed in our laboratory. The operating principle of detecting acoustic wave signals, based on wave propagation theory and the wave equations governing the propagation, scattering, reflection, and interaction of electromagnetic waves (EMW) and acoustic waves (AW), has been investigated. Experimental and observational results are provided to verify that an MMW CW 40 GHz dielectric integrated radar can detect and identify speech signals in free space from a person speaking. The received sound signals were reproduced by digital signal processing (DSP) and a sound reproducer.

  2. Visual-tactile integration in speech perception: Evidence for modality neutral speech primitives.

    PubMed

    Bicevskis, Katie; Derrick, Donald; Gick, Bryan

    2016-11-01

    Audio-visual [McGurk and MacDonald (1976). Nature 264, 746-748] and audio-tactile [Gick and Derrick (2009). Nature 462(7272), 502-504] speech stimuli enhance speech perception over audio stimuli alone. In addition, multimodal speech stimuli form an asymmetric window of integration that is consistent with the relative speeds of the various signals [Munhall, Gribble, Sacco, and Ward (1996). Percept. Psychophys. 58(3), 351-362; Gick, Ikegami, and Derrick (2010). J. Acoust. Soc. Am. 128(5), EL342-EL346]. In this experiment, participants were presented with video of faces producing /pa/ and /ba/ syllables, both alone and with air puffs occurring synchronously and at different timings up to 300 ms before and after the stop release. Perceivers were asked to identify the syllable they perceived, and were more likely to respond that they perceived /pa/ when air puffs were present, with asymmetrical preference for puffs following the video signal, consistent with the relative speeds of visual and air puff signals. The results demonstrate that visual-tactile integration of speech perception occurs much as it does with audio-visual and audio-tactile stimuli. This finding contributes to the understanding of multimodal speech perception, lending support to the idea that speech is not perceived as an audio signal that is supplemented by information from other modes, but rather that primitives of speech perception are, in principle, modality neutral.

  3. Low Bandwidth Vocoding using EM Sensor and Acoustic Signal Processing

    SciTech Connect

    Ng, L C; Holzrichter, J F; Larson, P E

    2001-10-25

    Low-power EM radar-like sensors have made it possible to measure properties of the human speech production system in real-time, without acoustic interference [1]. By combining these data with the corresponding acoustic signal, we've demonstrated an almost 10-fold bandwidth reduction in speech compression, compared to a standard 2.4 kbps LPC10 protocol used in the STU-III (Secure Terminal Unit, third generation) telephone. This paper describes a potential EM sensor/acoustic based vocoder implementation.
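
    A back-of-the-envelope sketch of the bit-rate arithmetic implied above, assuming the commonly cited LPC10 framing of 54 bits per 22.5 ms frame; the ten-fold reduction target is simply the figure stated in the abstract and is illustrative only.

```python
# Back-of-the-envelope arithmetic for the bandwidth reduction mentioned above.
# The LPC10 figures (54 bits per 22.5 ms frame) are the commonly cited values
# for the 2.4 kbps standard; the reduced-rate figure is purely illustrative.
lpc10_bits_per_frame = 54
lpc10_frame_seconds = 0.0225
lpc10_rate_bps = lpc10_bits_per_frame / lpc10_frame_seconds   # ~2400 bps

reduction_factor = 10                                          # "almost 10-fold"
target_rate_bps = lpc10_rate_bps / reduction_factor            # ~240 bps

print(f"LPC10 baseline : {lpc10_rate_bps:.0f} bps")
print(f"~10x reduction : {target_rate_bps:.0f} bps")
```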

  4. Speaker verification using combined acoustic and EM sensor signal processing

    SciTech Connect

    Ng, L C; Gable, T J; Holzrichter, J F

    2000-11-10

    Low-power EM radar-like sensors have made it possible to measure properties of the human speech production system in real time, without acoustic interference. This greatly enhances the quality and quantity of information for many speech-related applications. See Holzrichter, Burnett, Ng, and Lea, J. Acoust. Soc. Am. 103(1), 622 (1998). By combining the Glottal EM Sensor (GEMS) with acoustic signals, we have demonstrated an almost 10-fold reduction in error rates in a speaker verification experiment under a moderately noisy environment (-10 dB).

  5. Integration of letters and speech sounds in the human brain.

    PubMed

    van Atteveldt, Nienke; Formisano, Elia; Goebel, Rainer; Blomert, Leo

    2004-07-22

    Most people acquire literacy skills with remarkable ease, even though the human brain is not evolutionarily adapted to this relatively new cultural phenomenon. Associations between letters and speech sounds form the basis of reading in alphabetic scripts. We investigated the functional neuroanatomy of the integration of letters and speech sounds using functional magnetic resonance imaging (fMRI). Letters and speech sounds were presented unimodally and bimodally in congruent or incongruent combinations. Analysis of single-subject data and group data aligned on the basis of individual cortical anatomy revealed that letters and speech sounds are integrated in heteromodal superior temporal cortex. Interestingly, responses to speech sounds in a modality-specific region of the early auditory cortex were modified by simultaneously presented letters. These results suggest that efficient processing of culturally defined associations between letters and speech sounds relies on neural mechanisms similar to those naturally evolved for integrating audiovisual speech.

  6. A neural network based speech recognition system

    NASA Astrophysics Data System (ADS)

    Carroll, Edward J.; Coleman, Norman P., Jr.; Reddy, G. N.

    1990-02-01

    An overview is presented of the development of a neural network based speech recognition system. The two primary tasks involved were the development of a time invariant speech encoder and a pattern recognizer or detector. The speech encoder uses amplitude normalization and a Fast Fourier Transform to eliminate amplitude and frequency shifts of acoustic cues. The detector consists of a back-propagation network which accepts data from the encoder and identifies individual words. This use of neural networks offers two advantages over conventional algorithmic detectors: the detection time is no more than a few network time constants, and its recognition speed is independent of the number of words in the vocabulary. The completed system has functioned as expected with high tolerance to input variation and with error rates comparable to a commercial system when used in a noisy environment.
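
    A minimal sketch of an encoder in the spirit of the one described above: amplitude normalization followed by an FFT magnitude spectrum, yielding a fixed-length feature vector that could feed a back-propagation classifier. This is a generic illustration, not the original system.

```python
# Sketch of a speech encoder in the spirit of the one described above:
# amplitude normalization followed by an FFT magnitude spectrum, producing a
# fixed-length feature vector for a word classifier. Generic illustration only.
import numpy as np

def encode_word(waveform, n_fft=512):
    """Return an amplitude- and length-normalized magnitude spectrum."""
    x = np.asarray(waveform, dtype=float)
    x = x / (np.max(np.abs(x)) + 1e-12)          # amplitude normalization
    spectrum = np.abs(np.fft.rfft(x, n=n_fft))   # frequency-domain features
    return spectrum / (np.linalg.norm(spectrum) + 1e-12)

# The feature vector could then feed any pattern recognizer, e.g. a small
# feed-forward (back-propagation) network from scikit-learn:
# from sklearn.neural_network import MLPClassifier
# clf = MLPClassifier(hidden_layer_sizes=(32,)).fit(X_train, y_train)
```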

  7. Speech and non‐speech processing in children with phonological disorders: an electrophysiological study

    PubMed Central

    Gonçalves, Isabela Crivellaro; Wertzner, Haydée Fiszbein; Samelli, Alessandra Giannella; Matas, Carla Gentile

    2011-01-01

    OBJECTIVE: To determine whether neurophysiological auditory brainstem responses to clicks and repeated speech stimuli differ between typically developing children and children with phonological disorders. INTRODUCTION: Phonological disorders are language impairments resulting from inadequate use of adult phonological language rules and are among the most common speech and language disorders in children (prevalence: 8 ‐ 9%). Our hypothesis is that children with phonological disorders have basic differences in the way that their brains encode acoustic signals at brainstem level when compared to normal counterparts. METHODS: We recorded click and speech evoked auditory brainstem responses in 18 typically developing children (control group) and in 18 children who were clinically diagnosed with phonological disorders (research group). The age range of the children was from 7‐11 years. RESULTS: The research group exhibited significantly longer latency responses to click stimuli (waves I, III and V) and speech stimuli (waves V and A) when compared to the control group. DISCUSSION: These results suggest that the abnormal encoding of speech sounds may be a biological marker of phonological disorders. However, these results cannot define the biological origins of phonological problems. We also observed that speech‐evoked auditory brainstem responses had a higher specificity/sensitivity for identifying phonological disorders than click‐evoked auditory brainstem responses. CONCLUSIONS: Early stages of the auditory pathway processing of an acoustic stimulus are not similar in typically developing children and those with phonological disorders. These findings suggest that there are brainstem auditory pathway abnormalities in children with phonological disorders. PMID:21484049

  8. Classroom Acoustics: The Problem, Impact, and Solution.

    ERIC Educational Resources Information Center

    Berg, Frederick S.; And Others

    1996-01-01

    This article describes aspects of classroom acoustics that interfere with the ability of listeners to understand speech. It considers impacts on students and teachers and offers four possible solutions: noise control, signal control without amplification, individual amplification systems, and sound field amplification systems. (Author/DB)

  9. Experiments on the Acoustics of Whistling.

    ERIC Educational Resources Information Center

    Shadle, Christine H.

    1983-01-01

    The acoustics of speech production allows the prediction of resonances for a given vocal tract configuration. Combining these predictions with aerodynamic theory developed for mechanical whistles makes theories about human whistling more complete. Several experiments involving human whistling are reported which support the theory and indicate new…

  10. Acoustics, Noise, and Buildings. Revised Edition 1969.

    ERIC Educational Resources Information Center

    Parkin, P. H.; Humphreys, H. R.

    The fundamental physical concepts needed in any appreciation of acoustical problems are discussed by a scientist and an architect. The major areas of interest are--(1) the nature of sound, (2) the behavior of sound in rooms, (3) the design of rooms for speech, (4) the design of rooms for music, (5) the design of studios, (6) the design of high…

  11. Behavioral and Electrophysiological Evidence for Early and Automatic Detection of Phonological Equivalence in Variable Speech Inputs

    ERIC Educational Resources Information Center

    Kharlamov, Viktor; Campbell, Kenneth; Kazanina, Nina

    2011-01-01

    Speech sounds are not always perceived in accordance with their acoustic-phonetic content. For example, an early and automatic process of perceptual repair, which ensures conformity of speech inputs to the listener's native language phonology, applies to individual input segments that do not exist in the native inventory or to sound sequences that…

  12. The Effect of Technology and Testing Environment on Speech Perception Using Telehealth with Cochlear Implant Recipients

    ERIC Educational Resources Information Center

    Goehring, Jenny L.; Hughes, Michelle L.; Baudhuin, Jacquelyn L.; Valente, Daniel L.; McCreery, Ryan W.; Diaz, Gina R.; Sanford, Todd; Harpster, Roger

    2012-01-01

    Purpose: In this study, the authors evaluated the effect of remote system and acoustic environment on speech perception via telehealth with cochlear implant recipients. Method: Speech perception was measured in quiet and in noise. Systems evaluated were Polycom visual concert (PVC) and a hybrid presentation system (HPS). Each system was evaluated…

  13. Learning to Match Auditory and Visual Speech Cues: Social Influences on Acquisition of Phonological Categories

    ERIC Educational Resources Information Center

    Altvater-Mackensen, Nicole; Grossmann, Tobias

    2015-01-01

    Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues but also social cues that might foster language learning. Yet, both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential…

  14. Phonetic Category Cues in Adult-Directed Speech: Evidence from Three Languages with Distinct Vowel Characteristics

    ERIC Educational Resources Information Center

    Pons, Ferran; Biesanz, Jeremy C.; Kajikawa, Sachiyo; Fais, Laurel; Narayan, Chandan R.; Amano, Shigeaki; Werker, Janet F.

    2012-01-01

    Using an artificial language learning manipulation, Maye, Werker, and Gerken (2002) demonstrated that infants' speech sound categories change as a function of the distributional properties of the input. In a recent study, Werker et al. (2007) showed that Infant-directed Speech (IDS) input contains reliable acoustic cues that support distributional…

  15. Seeing to Hear Better: Evidence for Early Audio-Visual Interactions in Speech Identification

    ERIC Educational Resources Information Center

    Schwartz, Jean-Luc; Berthommier, Frederic; Savariaux, Christophe

    2004-01-01

    Lip reading is the ability to partially understand speech by looking at the speaker's lips. It improves the intelligibility of speech in noise when audio-visual perception is compared with audio-only perception. A recent set of experiments showed that seeing the speaker's lips also enhances "sensitivity" to acoustic information,…

  16. The Effects of Simultaneous Communication on Production and Perception of Speech

    ERIC Educational Resources Information Center

    Schiavetti, Nicholas; Whitehead, Robert L.; Metz, Dale Evan

    2004-01-01

    This article reviews experiments completed over the past decade at the National Technical Institute for the Deaf and the State University of New York at Geneseo concerning speech produced during simultaneous communication (SC) and synthesizes the empirical evidence concerning the acoustical and perceptual characteristics of speech in SC.…

  17. Prosody in Infant-Directed Speech Is Similar across Western and Traditional Cultures

    ERIC Educational Resources Information Center

    Broesch, Tanya L.; Bryant, Gregory A.

    2015-01-01

    When speaking to infants, adults typically alter the acoustic properties of their speech in a variety of ways compared with how they speak to other adults; for example, they use higher pitch, increased pitch range, more pitch variability, and slower speech rate. Research shows that these vocal changes happen similarly across industrialized…

  18. Speech Modification by a Deaf Child through Dynamic Orometric Modeling and Feedback.

    ERIC Educational Resources Information Center

    Fletcher, Samuel G.; Hasegawa, Akira

    1983-01-01

    A three and one-half-year-old profoundly deaf girl, whose physiologic, acoustic, and phonetic data indicated poor speech production, rapidly learned goal articulation gestures (positional and timing features of speech) after visual articulatory modeling and feedback on tongue position with a microprocessor-based instrument and video display.…

  19. The Perception of "Sine-Wave Speech" by Adults with Developmental Dyslexia.

    ERIC Educational Resources Information Center

    Rosner, Burton S.; Talcott, Joel B.; Witton, Caroline; Hogg, James D.; Richardson, Alexandra J.; Hansen, Peter C.; Stein, John F.

    2003-01-01

    "Sine-wave speech" sentences contain only four frequency-modulated sine waves, lacking many acoustic cues present in natural speech. Adults with (n=19) and without (n=14) dyslexia were asked to reproduce orally sine-wave utterances in successive trials. Results suggest comprehension of sine-wave sentences is impaired in some adults with…

  20. Emotional and Physiological Responses of Fluent Listeners while Watching the Speech of Adults Who Stutter

    ERIC Educational Resources Information Center

    Guntupalli, Vijaya K.; Everhart, D. Erik; Kalinowski, Joseph; Nanjundeswaran, Chayadevie; Saltuklaroglu, Tim

    2007-01-01

    Background: People who stutter produce speech that is characterized by intermittent, involuntary part-word repetitions and prolongations. In addition to these signature acoustic manifestations, those who stutter often display repetitive and fixated behaviours outside the speech producing mechanism (e.g. in the head, arm, fingers, nares, etc.).…

  1. The auditory representation of speech sounds in human motor cortex.

    PubMed

    Cheung, Connie; Hamilton, Liberty S; Johnson, Keith; Chang, Edward F

    2016-03-04

    In humans, listening to speech evokes neural responses in the motor cortex. This has been controversially interpreted as evidence that speech sounds are processed as articulatory gestures. However, it is unclear what information is actually encoded by such neural activity. We used high-density direct human cortical recordings while participants spoke and listened to speech sounds. Motor cortex neural patterns during listening were substantially different than during articulation of the same sounds. During listening, we observed neural activity in the superior and inferior regions of ventral motor cortex. During speaking, responses were distributed throughout somatotopic representations of speech articulators in motor cortex. The structure of responses in motor cortex during listening was organized along acoustic features similar to auditory cortex, rather than along articulatory features as during speaking. Motor cortex does not contain articulatory representations of perceived actions in speech, but rather, represents auditory vocal information.

  2. Modelling Errors in Automatic Speech Recognition for Dysarthric Speakers

    NASA Astrophysics Data System (ADS)

    Caballero Morales, Santiago Omar; Cox, Stephen J.

    2009-12-01

    Dysarthria is a motor speech disorder characterized by weakness, paralysis, or poor coordination of the muscles responsible for speech. Although automatic speech recognition (ASR) systems have been developed for disordered speech, factors such as low intelligibility and limited phonemic repertoire decrease speech recognition accuracy, making conventional speaker adaptation algorithms perform poorly on dysarthric speakers. In this work, rather than adapting the acoustic models, we model the errors made by the speaker and attempt to correct them. For this task, two techniques have been developed: (1) a set of "metamodels" that incorporate a model of the speaker's phonetic confusion matrix into the ASR process; (2) a cascade of weighted finite-state transducers at the confusion matrix, word, and language levels. Both techniques attempt to correct the errors made at the phonetic level and make use of a language model to find the best estimate of the correct word sequence. Our experiments show that both techniques outperform standard adaptation techniques.
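
    The general idea behind the confusion-matrix "metamodel" can be sketched as a simple noisy-channel rescoring: candidate words are scored by how plausibly their canonical phoneme sequence could have been confused into the recognizer's output. The inventory, lexicon, and probabilities below are toy values, and the equal-length alignment is a simplification of the actual technique (which uses metamodels or weighted finite-state transducers plus a language model).

```python
# Toy sketch of confusion-matrix-based error correction for disordered speech.
confusion = {                      # P(recognized phoneme | intended phoneme)
    "b": {"b": 0.7, "p": 0.2, "d": 0.1},
    "p": {"p": 0.6, "b": 0.3, "t": 0.1},
    "a": {"a": 0.9, "e": 0.1},
    "t": {"t": 0.8, "d": 0.2},
    "d": {"d": 0.7, "t": 0.3},
}
lexicon = {"bat": ["b", "a", "t"], "pad": ["p", "a", "d"], "pat": ["p", "a", "t"]}

def score(intended, recognized):
    """Probability of the recognized string given an intended pronunciation
    (same length assumed; a real system would align with edit operations)."""
    p = 1.0
    for i, r in zip(intended, recognized):
        p *= confusion.get(i, {}).get(r, 1e-6)
    return p

recognized = ["b", "a", "d"]       # hypothetical ASR output for one word
best = max(lexicon, key=lambda w: score(lexicon[w], recognized))
print(best, {w: round(score(p, recognized), 4) for w, p in lexicon.items()})
```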

  3. Real-Time Speech/Music Classification With a Hierarchical Oblique Decision Tree

    DTIC Science & Technology

    2008-04-01

    Jun Wang, Qiong Wu, Haojiang Deng, Qin Yan (Institute of Acoustics...). This paper proposes a real-time speech/music classification method with a hierarchical oblique decision tree. A set of discrimination features in the frequency domain are selected... handle signals without discrimination and cannot work properly in the presence of multimedia signals.

  4. Classroom acoustics: Three pilot studies

    NASA Astrophysics Data System (ADS)

    Smaldino, Joseph J.

    2005-04-01

    This paper summarizes three related pilot projects designed to focus on the possible effects of classroom acoustics on fine auditory discrimination as it relates to language acquisition, especially English as a second language. The first study investigated the influence of improving the signal-to-noise ratio on the differentiation of English phonemes. The results showed better differentiation with better signal-to-noise ratio. The second studied speech perception in noise by young adults for whom English was a second language. The outcome indicated that the second language learners required a better signal-to-noise ratio to perform equally to the native language participants. The last study surveyed the acoustic conditions of preschool and day care classrooms, wherein first and second language learning occurs. The survey suggested an unfavorable acoustic environment for language learning.

  5. Acoustical evaluation of preschool classrooms

    NASA Astrophysics Data System (ADS)

    Yang, Wonyoung; Hodgson, Murray

    2003-10-01

    An investigation was made of the acoustical environments in the Berwick Preschool, Vancouver, in response to complaints by the teachers. Reverberation times (RT), background noise levels (BNL), and in-class sound levels (Leq) were measured for acoustical evaluation in the classrooms. With respect to the measured RT and BNL, none of the classrooms in the preschool were acceptable according to the criteria relevant to this study. A questionnaire was administered to the teachers to assess their subjective responses to the acoustical and nonacoustical environments of the classrooms. Teachers agreed that the nonacoustical environments in the classrooms were fair, but that the acoustical environments had problems. Eight different classroom configurations were simulated to improve the acoustical environments, using the CATT room acoustical simulation program. When the surface absorption was increased, both the RT and speech levels decreased. RASTI was dependent on the volumes of the classrooms when the background noise levels were high; however, it depended on the total absorption of the classrooms when the background noise levels were low. Ceiling heights are critical as well. Decreasing the volume of the classrooms is recommended as an effective measure, and sound-absorptive materials should be added to the walls or ceiling.

  6. Classroom acoustics and intervention strategies to enhance the learning environment

    NASA Astrophysics Data System (ADS)

    Savage, Christal

    The classroom environment can be an acoustically difficult atmosphere for students to learn effectively, sometimes due in part to poor acoustical properties. Noise and reverberation have a substantial influence on room acoustics and subsequently intelligibility of speech. The American Speech-Language-Hearing Association (ASHA, 1995) developed minimal standards for noise and reverberation in a classroom for the purpose of providing an adequate listening environment. A lack of adherence to these standards may have undesirable consequences, which may lead to poor academic performance. The purpose of this capstone project is to develop a protocol to measure the acoustical properties of reverberation time and noise levels in elementary classrooms and present the educators with strategies to improve the learning environment. Noise level and reverberation will be measured and recorded in seven, unoccupied third grade classrooms in Lincoln Parish in North Louisiana. The recordings will occur at six specific distances in the classroom to simulate teacher and student positions. The recordings will be compared to the American Speech-Language-Hearing Association standards for noise and reverberation. If discrepancies are observed, the primary investigator will serve as an auditory consultant for the school and educators to recommend remediation and intervention strategies to improve these acoustical properties. The hypothesis of the study is that the classroom acoustical properties of noise and reverberation will exceed the American Speech-Language-Hearing Association standards; therefore, the auditory consultant will provide strategies to improve those acoustical properties.

  7. Speaker verification system using acoustic data and non-acoustic data

    DOEpatents

    Gable, Todd J.; Ng, Lawrence C.; Holzrichter, John F.; Burnett, Greg C.

    2006-03-21

    A method and system for speech characterization. One embodiment includes a method for speaker verification which includes collecting data from a speaker, wherein the data comprises acoustic data and non-acoustic data. The data is used to generate a template that includes a first set of "template" parameters. The method further includes receiving a real-time identity claim from a claimant, and using acoustic data and non-acoustic data from the identity claim to generate a second set of parameters. The method further includes comparing the first set of parameters to the second set of parameters to determine whether the claimant is the speaker. The first set of parameters and the second set of parameters include at least one purely non-acoustic parameter, including a non-acoustic glottal shape parameter derived from averaging multiple glottal cycle waveforms.

  8. Measurement of acoustical characteristics of mosques in Saudi Arabia

    NASA Astrophysics Data System (ADS)

    Abdou, Adel A.

    2003-03-01

    The study of mosque acoustics, with regard to acoustical characteristics, sound quality for speech intelligibility, and other applicable acoustic criteria, has been largely neglected. In this study a background as to why mosques are designed as they are and how mosque design is influenced by worship considerations is given. In the study the acoustical characteristics of typically constructed contemporary mosques in Saudi Arabia have been investigated, employing a well-known impulse response. Extensive field measurements were taken in 21 representative mosques of different sizes and architectural features in order to characterize their acoustical quality and to identify the impact of air conditioning, ceiling fans, and sound reinforcement systems on their acoustics. Objective room-acoustic indicators such as reverberation time (RT) and clarity (C50) were measured. Background noise (BN) was assessed with and without the operation of air conditioning and fans. The speech transmission index (STI) was also evaluated with and without the operation of existing sound reinforcement systems. The existence of acoustical deficiencies was confirmed and quantified. The study, in addition to describing mosque acoustics, compares design goals to results obtained in practice and suggests acoustical target values for mosque design. The results show that acoustical quality in the investigated mosques deviates from optimum conditions when unoccupied, but is much better in the occupied condition.

  9. Measurement of acoustical characteristics of mosques in Saudi Arabia.

    PubMed

    Abdou, Adel A

    2003-03-01

    The study of mosque acoustics, with regard to acoustical characteristics, sound quality for speech intelligibility, and other applicable acoustic criteria, has been largely neglected. In this study a background as to why mosques are designed as they are and how mosque design is influenced by worship considerations is given. In the study the acoustical characteristics of typically constructed contemporary mosques in Saudi Arabia have been investigated, employing a well-known impulse response. Extensive field measurements were taken in 21 representative mosques of different sizes and architectural features in order to characterize their acoustical quality and to identify the impact of air conditioning, ceiling fans, and sound reinforcement systems on their acoustics. Objective room-acoustic indicators such as reverberation time (RT) and clarity (C50) were measured. Background noise (BN) was assessed with and without the operation of air conditioning and fans. The speech transmission index (STI) was also evaluated with and without the operation of existing sound reinforcement systems. The existence of acoustical deficiencies was confirmed and quantified. The study, in addition to describing mosque acoustics, compares design goals to results obtained in practice and suggests acoustical target values for mosque design. The results show that acoustical quality in the investigated mosques deviates from optimum conditions when unoccupied, but is much better in the occupied condition.
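
    A minimal sketch of two of the room-acoustic indicators named above, estimated from a measured impulse response: clarity C50 as the early-to-late energy ratio at 50 ms, and reverberation time from the Schroeder backward-integrated decay curve. These are generic textbook formulas, not the measurement chain used in the study.

```python
# Generic estimation of C50 and reverberation time from an impulse response.
import numpy as np

def c50(ir, fs):
    """Clarity index C50 in dB (early-to-late energy ratio at 50 ms)."""
    n50 = int(0.050 * fs)
    early = np.sum(ir[:n50] ** 2)
    late = np.sum(ir[n50:] ** 2)
    return 10.0 * np.log10(early / (late + 1e-12))

def rt_from_schroeder(ir, fs, db_lo=-5.0, db_hi=-25.0):
    """Reverberation time (s) extrapolated from the -5 to -25 dB decay range."""
    edc = np.cumsum(ir[::-1] ** 2)[::-1]            # Schroeder backward integral
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-12)
    idx = np.where((edc_db <= db_lo) & (edc_db >= db_hi))[0]
    times = idx / fs
    slope, _ = np.polyfit(times, edc_db[idx], 1)    # decay rate in dB/s
    return 60.0 / abs(slope)                        # time to decay by 60 dB

# Example with a synthetic exponentially decaying impulse response.
fs = 16000
t = np.arange(int(1.0 * fs)) / fs
ir = np.random.randn(t.size) * np.exp(-t / 0.2)
print(f"C50 = {c50(ir, fs):.1f} dB, RT = {rt_from_schroeder(ir, fs):.2f} s")
```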

  10. Motor representations of articulators contribute to categorical perception of speech sounds.

    PubMed

    Möttönen, Riikka; Watkins, Kate E

    2009-08-05

    Listening to speech modulates activity in human motor cortex. It is unclear, however, whether the motor cortex has an essential role in speech perception. Here, we aimed to determine whether the motor representations of articulators contribute to categorical perception of speech sounds. Categorization of continuously variable acoustic signals into discrete phonemes is a fundamental feature of speech communication. We used repetitive transcranial magnetic stimulation (rTMS) to temporarily disrupt the lip representation in the left primary motor cortex. This disruption impaired categorical perception of artificial acoustic continua ranging between two speech sounds that differed in place of articulation, in that the vocal tract is opened and closed rapidly either with the lips or the tip of the tongue (/ba/-/da/ and /pa/-/ta/). In contrast, it did not impair categorical perception of continua ranging between speech sounds that do not involve the lips in their articulation (/ka/-/ga/ and /da/-/ga/). Furthermore, an rTMS-induced disruption of the hand representation had no effect on categorical perception of either of the tested continua (/ba/-/da/ and /ka/-/ga/). These findings indicate that motor circuits controlling production of speech sounds also contribute to their perception. Mapping acoustically highly variable speech sounds onto less variable motor representations may facilitate their phonemic categorization and be important for robust speech perception.

  11. Discrimination of brief speech sounds is impaired in rats with auditory cortex lesions.

    PubMed

    Porter, Benjamin A; Rosenthal, Tara R; Ranasinghe, Kamalini G; Kilgard, Michael P

    2011-05-16

    Auditory cortex (AC) lesions impair complex sound discrimination. However, a recent study demonstrated spared performance on an acoustic startle response test of speech discrimination following AC lesions (Floody et al., 2010). The current study reports the effects of AC lesions on two operant speech discrimination tasks. AC lesions caused a modest and quickly recovered impairment in the ability of rats to discriminate consonant-vowel-consonant speech sounds. This result seems to suggest that AC does not play a role in speech discrimination. However, the speech sounds used in both studies differed in many acoustic dimensions and an adaptive change in discrimination strategy could allow the rats to use an acoustic difference that does not require an intact AC to discriminate. Based on our earlier observation that the first 40 ms of the spatiotemporal activity patterns elicited by speech sounds best correlate with behavioral discriminations of these sounds (Engineer et al., 2008), we predicted that eliminating additional cues by truncating speech sounds to the first 40 ms would render the stimuli indistinguishable to a rat with AC lesions. Although the initial discrimination of truncated sounds took longer to learn, the final performance paralleled that of rats using full-length consonant-vowel-consonant sounds. After 20 days of testing, half of the rats using speech onsets received bilateral AC lesions. Lesions severely impaired speech onset discrimination for at least one month post-lesion. These results support the hypothesis that auditory cortex is required to accurately discriminate the subtle differences between similar consonant and vowel sounds.

  12. A pilot study about speech changes after partial Tucker's laryngectomy: the reduction of regressive voicing assimilation.

    PubMed

    Galant, C; Lagier, A; Vercasson, C; Santini, L; Dessi, P; Giovanni, A; Fakhry, N

    2015-12-01

    Partial frontolateral laryngectomy (PL) is performed to remove a laryngeal tumor while preserving the larynx's main functions. So far, the speech changes induced by voicing difficulties and the alterations to the vocal tract caused by PL have seldom been addressed. The goal of our study was to perform an acoustic analysis of regressive voicing assimilation (RVA) in patients after PL and to study its relationship with speaking rate. A retrospective study was conducted from January to April 2013. Eleven subjects treated by partial frontolateral laryngectomy and ten healthy subjects were included. Functional voice recordings were analyzed and compared. For assimilation sequences, we found a significant modification of the voicing ratio in healthy subjects (p < 0.05) and, at an accelerated speaking rate only, in PL patients (p < 0.05). Vowel duration was significantly modified only in healthy subjects. For all subjects (PL patients and healthy), the duration of the C1 consonant was not significantly modified. Our results highlight the presence of RVA in healthy subjects, but also in PL patients in the rapid speaking mode.

  13. A Non-Invasive Imaging Approach to Understanding Speech Changes following Deep Brain Stimulation in Parkinson’s Disease

    PubMed Central

    Narayana, Shalini; Jacks, Adam; Robin, Donald A.; Poizner, Howard; Zhang, Wei; Franklin, Crystal; Liotti, Mario; Vogel, Deanie; Fox, Peter T.

    2009-01-01

    Purpose To explore the use of non-invasive functional imaging and “virtual” lesion techniques to study the neural mechanisms underlying motor speech disorders in Parkinson’s disease. Here, we report the use of Positron Emission Tomography (PET) and transcranial magnetic stimulation (TMS) to explain exacerbated speech impairment following subthalamic nucleus deep brain stimulation (STN-DBS) in a patient with Parkinson’s disease. Method Perceptual and acoustic speech measures as well as cerebral blood flow (CBF) during speech as measured by PET were obtained with STN-DBS on and off. TMS was applied to a region in the speech motor network found to be abnormally active during DBS. Speech disruption by TMS was compared both perceptually and acoustically with that resulting from DBS on. Results Speech production was perceptually inferior and acoustically less contrastive during left STN stimulation compared to no stimulation. Increased neural activity in left dorsal premotor cortex (PMd) was observed during DBS on. “Virtual” lesioning of this region resulted in speech characterized by decreased speech segment duration, increased pause duration, and decreased intelligibility. Conclusions This case report provides evidence that impaired speech production accompanying STN-DBS may be resulting from unintended activation of PMd. Clinical application of functional imaging and TMS may lead to optimizing the delivery of STN-DBS to improve outcomes for speech production as well as general motor abilities. PMID:19029533

  14. Dog-directed speech: why do we use it and do dogs pay attention to it?

    PubMed

    Ben-Aderet, Tobey; Gallego-Abenza, Mario; Reby, David; Mathevon, Nicolas

    2017-01-11

    Pet-directed speech is strikingly similar to infant-directed speech, a peculiar speaking pattern with higher pitch and slower tempo known to engage infants' attention and promote language learning. Here, we report the first investigation of potential factors modulating the use of dog-directed speech, as well as its immediate impact on dogs' behaviour. We recorded adult participants speaking in front of pictures of puppies, adult and old dogs, and analysed the quality of their speech. We then performed playback experiments to assess dogs' reaction to dog-directed speech compared with normal speech. We found that human speakers used dog-directed speech with dogs of all ages and that the acoustic structure of dog-directed speech was mostly independent of dog age, except for sound pitch, which was relatively higher when communicating with puppies. Playback demonstrated that, in the absence of other non-auditory cues, puppies were highly reactive to dog-directed speech, and that the pitch was a key factor modulating their behaviour, suggesting that this specific speech register has a functional value in young dogs. Conversely, older dogs did not react differentially to dog-directed speech compared with normal speech. The fact that speakers continue to use dog-directed speech with older dogs therefore suggests that this speech pattern may mainly be a spontaneous attempt to facilitate interactions with non-verbal listeners.

  15. Information-bearing acoustic change outperforms duration in predicting intelligibility of full-spectrum and noise-vocoded sentences.

    PubMed

    Stilp, Christian E

    2014-03-01

    Recent research has demonstrated a strong relationship between information-bearing acoustic changes in the speech signal and speech intelligibility. The availability of information-bearing acoustic changes reliably predicts intelligibility of full-spectrum [Stilp and Kluender (2010). Proc. Natl. Acad. Sci. U.S.A. 107(27), 12387-12392] and noise-vocoded sentences amid noise interruption [Stilp et al. (2013). J. Acoust. Soc. Am. 133(2), EL136-EL141]. However, other research reports that proportion of signal duration preserved also predicts intelligibility of noise-interrupted speech. These factors have only ever been investigated independently, obscuring whether one better explains speech perception. The present experiments manipulated both factors to answer this question. A broad range of sentence durations (160-480 ms) containing high or low information-bearing acoustic changes were replaced by speech-shaped noise in noise-vocoded (Experiment 1) and full-spectrum sentences (Experiment 2). Sentence intelligibility worsened with increasing noise replacement, but in both experiments, information-bearing acoustic change was a statistically superior predictor of performance. Perception relied more heavily on information-bearing acoustic changes in poorer listening conditions (in spectrally degraded sentences and amid increasing noise replacement). Highly linear relationships between measures of information and performance suggest that exploiting information-bearing acoustic change is a shared principle underlying perception of acoustically rich and degraded speech. Results demonstrate the explanatory power of information-theoretic approaches for speech perception.
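
    A crude sketch of quantifying acoustic change across a sentence: the Euclidean distance between successive short-time magnitude spectra, summed over frames. This is only a proxy to illustrate the idea; the cited studies use a cochlea-scaled, perceptually motivated measure rather than this one.

```python
# Proxy measure of "acoustic change": summed frame-to-frame spectral distance.
import numpy as np

def total_spectral_change(signal, sr, frame_len=256, hop=128):
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    spectra = [np.abs(np.fft.rfft(f)) for f in frames]
    return float(sum(np.linalg.norm(b - a) for a, b in zip(spectra, spectra[1:])))

# A steady tone changes little; a frequency sweep changes a lot.
sr = 16000
t = np.arange(sr) / sr
steady = np.sin(2 * np.pi * 300 * t)
sweep = np.sin(2 * np.pi * (300 + 700 * t) * t)
print(total_spectral_change(steady, sr), total_spectral_change(sweep, sr))
```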

  16. Speech and Language Impairments

    MedlinePlus

    There are many kinds of speech and language ... education available to school-aged children with disabilities. Definition of “Speech or Language Impairment” under IDEA ...

  17. A Comparison of the Speech Patterns and Dialect Attitudes of Oklahoma

    ERIC Educational Resources Information Center

    Bakos, Jon

    2013-01-01

    The lexical dialect usage of Oklahoma has been well-studied in the past by the Survey of Oklahoma Dialects, but the acoustic speech production of the state has received little attention. Apart from two people from Tulsa and two people from Oklahoma City that were interviewed for the Atlas of North American English, no other acoustic work has been…

  18. Auditory perception bias in speech imitation.

    PubMed

    Postma-Nilsenová, Marie; Postma, Eric

    2013-01-01

    In an experimental study, we explored the role of auditory perception bias in vocal pitch imitation. Psychoacoustic tasks involving a missing fundamental indicate that some listeners are attuned to the relationship between all the higher harmonics present in the signal, which supports their perception of the fundamental frequency (the primary acoustic correlate of pitch). Other listeners focus on the lowest harmonic constituents of the complex sound signal which may hamper the perception of the fundamental. These two listener types are referred to as fundamental and spectral listeners, respectively. We hypothesized that the individual differences in speakers' capacity to imitate F0 found in earlier studies may at least partly be due to the capacity to extract information about F0 from the speech signal. Participants' auditory perception bias was determined with a standard missing fundamental perceptual test. Subsequently, speech data were collected in a shadowing task with two conditions, one with a full speech signal and one with high-pass filtered speech above 300 Hz. The results showed that perception bias toward fundamental frequency was related to the degree of F0 imitation. The effect was stronger in the condition with high-pass filtered speech. The experimental outcomes suggest advantages for fundamental listeners in communicative situations where F0 imitation is used as a behavioral cue. Future research needs to determine to what extent auditory perception bias may be related to other individual properties known to improve imitation, such as phonetic talent.
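
    The high-pass condition described above can be sketched as a simple Butterworth filter that removes energy below roughly 300 Hz, eliminating the fundamental for most voices while leaving higher harmonics intact. The filter order and exact cutoff are illustrative assumptions, not the settings used in the study.

```python
# Sketch of a high-pass filtered speech condition (cutoff ~300 Hz).
import numpy as np
from scipy.signal import butter, sosfiltfilt

def highpass_300hz(signal, fs, cutoff_hz=300.0, order=8):
    sos = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

# Example: a synthetic "voiced" signal with a 120 Hz fundamental and harmonics.
fs = 16000
t = np.arange(fs) / fs
voiced = sum(np.sin(2 * np.pi * 120 * k * t) / k for k in range(1, 8))
filtered = highpass_300hz(voiced, fs)   # fundamental (120 Hz) largely removed
```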

  19. Recognizing hesitation phenomena in continuous, spontaneous speech

    NASA Astrophysics Data System (ADS)

    Oshaughnessy, Douglas

    Spontaneous speech differs from read speech in speaking rate and hesitation. In natural, spontaneous speech, people often start talking and then think along the way; at times, this causes the speech to have hesitation pauses (both filled and unfilled) and restarts. Results are reported on all types of pauses in a widely-used speech database, for both hesitation pauses and semi-intentional pauses. A distinction is made between grammatical pauses (at major syntactic boundaries) and ungrammatical ones. Different types of unfilled pauses cannot be reliably separated based on silence duration, although grammatical pauses tend to be longer. In the prepausal word before ungrammatical pauses, there were few continuation rises in pitch, whereas 80 percent of the grammatical pauses were accompanied by a prior fundamental frequency rise of 10-40 Hz. Identifying the syntactic function of such hesitation phenomena can improve recognition performance by eliminating from consideration some of the hypotheses proposed by an acoustic recognizer. Results presented allow simple identification of filled pauses (such as uhh, umm) and their syntactic function.
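
    A minimal sketch of locating unfilled (silent) pauses by short-time energy thresholding, the kind of first step such pause analyses require. The frame length, threshold, and minimum pause duration below are assumed values, not those of the cited work.

```python
# Detect silent pauses via short-time energy thresholding.
import numpy as np

def find_pauses(signal, fs, frame_ms=10, rel_threshold=0.02, min_pause_ms=200):
    frame = int(fs * frame_ms / 1000)
    n_frames = len(signal) // frame
    energy = np.array([np.mean(signal[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n_frames)])
    silent = energy < rel_threshold * energy.max()
    pauses, start = [], None
    for i, s in enumerate(silent):
        if s and start is None:
            start = i
        elif not s and start is not None:
            dur_ms = (i - start) * frame_ms
            if dur_ms >= min_pause_ms:
                pauses.append((start * frame_ms / 1000.0, dur_ms / 1000.0))
            start = None
    return pauses   # list of (onset time in s, duration in s)

fs = 16000
speech = np.random.randn(fs)                        # 1 s of stand-in "speech"
speech[6000:11000] = 0.001 * np.random.randn(5000)  # ~310 ms quiet stretch
print(find_pauses(speech, fs))
```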

  20. A modeling investigation of vowel-to-vowel movement planning in acoustic and muscle spaces

    NASA Astrophysics Data System (ADS)

    Zandipour, Majid

    The primary objective of this research was to explore the coordinate space in which speech movements are planned. A two dimensional biomechanical model of the vocal tract (tongue, lips, jaw, and pharynx) was constructed based on anatomical and physiological data from a subject. The model transforms neural command signals into the actions of muscles. The tongue was modeled by a 221-node finite element mesh. Each of the eight tongue muscles defined within the mesh was controlled by a virtual muscle model. The other vocal-tract components were modeled as simple 2nd-order systems. The model's geometry was adapted to a speaker, using MRI scans of the speaker's vocal tract. The vocal tract model, combined with an adaptive controller that consisted of a forward model (mapping 12-dimensional motor commands to a 64-dimensional acoustic spectrum) and an inverse model (mapping acoustic trajectories to motor command trajectories), was used to simulate and explore the implications of two planning hypotheses: planning in motor space vs. acoustic space. The acoustic, kinematic, and muscle activation (EMG) patterns of vowel-to-vowel sequences generated by the model were compared to data from the speaker whose acoustic, kinematic and EMG were also recorded. The simulation results showed that: (a) modulations of the motor commands effectively accounted for the effects of speaking rate on EMG, kinematic, and acoustic outputs; (b) the movement and acoustic trajectories were influenced by vocal tract biomechanics; and (c) both planning schemes produced similar articulatory movement, EMG, muscle length, force, and acoustic trajectories, which were also comparable to the subject's data under normal speaking conditions. In addition, the effects of a bite-block on measured EMG, kinematics and formants were simulated by the model. Acoustic planning produced successful simulations but motor planning did not. The simulation results suggest that with somatosensory feedback but no auditory
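
    The forward/inverse-model machinery described above can be caricatured with a toy linear stand-in: a forward map learned from paired motor-command and spectrum samples (12 dimensions to 64), and an approximate inverse via the pseudo-inverse. The real controller wraps a biomechanical vocal-tract model; this sketch only illustrates the planning-space idea.

```python
# Toy linear forward/inverse model between motor commands and spectra.
import numpy as np

rng = np.random.default_rng(0)
true_map = rng.standard_normal((64, 12))   # unknown articulatory-to-acoustic map

# Paired training data: motor commands and the spectra they produce.
M = rng.standard_normal((500, 12))                       # motor commands
S = M @ true_map.T + 0.01 * rng.standard_normal((500, 64))

forward, *_ = np.linalg.lstsq(M, S, rcond=None)          # forward model (12 -> 64)
inverse = np.linalg.pinv(forward)                        # approximate inverse (64 -> 12)

# Acoustic-space planning: pick a target spectrum, recover a motor command.
# (Not every 64-d target is reachable from 12 commands, so a residual remains.)
target_spectrum = rng.standard_normal(64)
motor_command = target_spectrum @ inverse
print(np.linalg.norm(motor_command @ forward - target_spectrum))
```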

  1. Speech imagery recalibrates speech-perception boundaries.

    PubMed

    Scott, Mark

    2016-07-01

    The perceptual boundaries between speech sounds are malleable and can shift after repeated exposure to contextual information. This shift is known as recalibration. To date, the known inducers of recalibration are lexical (including phonotactic) information, lip-read information and reading. The experiments reported here are a proof-of-effect demonstration that speech imagery can also induce recalibration.

  2. Neurophysiological evidence that musical training influences the recruitment of right hemispheric homologues for speech perception.

    PubMed

    Jantzen, McNeel G; Howe, Bradley M; Jantzen, Kelly J

    2014-01-01

    Musicians have a more accurate temporal and tonal representation of auditory stimuli than their non-musician counterparts (Musacchia et al., 2007; Parbery-Clark et al., 2009a; Zendel and Alain, 2009; Kraus and Chandrasekaran, 2010). Musicians who are adept at the production and perception of music are also more sensitive to key acoustic features of speech such as voice onset timing and pitch. Together, these data suggest that musical training may enhance the processing of acoustic information for speech sounds. In the current study, we sought to provide neural evidence that musicians process speech and music in a similar way. We hypothesized that for musicians, right hemisphere areas traditionally associated with music are also engaged for the processing of speech sounds. In contrast, we predicted that in non-musicians processing of speech sounds would be localized to traditional left hemisphere language areas. Speech stimuli differing in voice onset time were presented using a dichotic listening paradigm. Subjects either indicated aural location for a specified speech sound or identified a specific speech sound from a directed aural location. Musical training effects and organization of acoustic features were reflected by activity in source generators of the P50. This included greater activation of right middle temporal gyrus and superior temporal gyrus in musicians. The findings demonstrate recruitment of right hemisphere in musicians for discriminating speech sounds and a putative broadening of their language network. Musicians appear to have an increased sensitivity to acoustic features and enhanced selective attention to temporal features of speech that is facilitated by musical training and supported, in part, by right hemisphere homologues of established speech processing regions of the brain.

  3. Free Speech Yearbook 1978.

    ERIC Educational Resources Information Center

    Phifer, Gregg, Ed.

    The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…

  4. Talking Speech Input.

    ERIC Educational Resources Information Center

    Berliss-Vincent, Jane; Whitford, Gigi

    2002-01-01

    This article presents both the factors involved in successful speech input use and the potential barriers that may suggest that other access technologies could be more appropriate for a given individual. Speech input options that are available are reviewed and strategies for optimizing use of speech recognition technology are discussed. (Contains…

  5. Free Speech Yearbook: 1970.

    ERIC Educational Resources Information Center

    Tedford, Thomas L., Ed.

    This book is a collection of syllabi, attitude surveys, and essays relating to free-speech issues, compiled by the Committee on Freedom of Speech of the Speech Communication Association. The collection begins with a rationale for the inclusion of a course on free speech in the college curriculum. Three syllabi with bibliographies present guides for…

  6. Musical Acoustics

    NASA Astrophysics Data System (ADS)

    Gough, Colin

    This chapter provides an introduction to the physical and psycho-acoustic principles underlying the production and perception of the sounds of musical instruments. The first section introduces generic aspects of musical acoustics and the perception of musical sounds, followed by separate sections on string, wind and percussion instruments.

  7. Applications of Hilbert Spectral Analysis for Speech and Sound Signals

    NASA Technical Reports Server (NTRS)

    Huang, Norden E.

    2003-01-01

    A new method for analyzing nonlinear and nonstationary data has been developed, and its natural applications are to speech and sound signals. The key part of the method is Empirical Mode Decomposition, with which any complicated data set can be decomposed into a finite and often small number of Intrinsic Mode Functions (IMFs). An IMF is defined as any function having the same number of zero-crossings and extrema, and having symmetric envelopes defined by the local maxima and minima, respectively. An IMF also admits a well-behaved Hilbert transform. This decomposition method is adaptive and therefore highly efficient. Since the decomposition is based on the local characteristic time scale of the data, it is applicable to nonlinear and nonstationary processes. With the Hilbert transform, the Intrinsic Mode Functions yield instantaneous frequencies as functions of time, which give sharp identification of embedded structures. The method can be used to process all acoustic signals: specifically, speech signals for speech synthesis, speaker identification and verification, speech recognition, and sound-signal enhancement and filtering. Additionally, the acoustic signals from machinery, whether sound through the air or vibration on the machines themselves, reflect the machines' operating conditions, so the same analysis can be used to diagnose machine problems.
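
    As a minimal illustration of the Hilbert step described above (a sketch only, not the patented method; the function name, the sampling rate fs, and the test tone are assumptions for the example), the instantaneous frequency of a single IMF can be estimated in Python with NumPy and SciPy:

      import numpy as np
      from scipy.signal import hilbert

      def instantaneous_frequency(imf, fs):
          # Analytic signal via the Hilbert transform; the derivative of its
          # unwrapped phase gives frequency in Hz (one sample shorter than imf).
          analytic = hilbert(imf)
          phase = np.unwrap(np.angle(analytic))
          return np.diff(phase) * fs / (2.0 * np.pi)

      # A pure 100 Hz tone sampled at 8 kHz should yield roughly 100 Hz throughout.
      fs = 8000.0
      t = np.arange(0, 0.1, 1.0 / fs)
      print(instantaneous_frequency(np.sin(2 * np.pi * 100.0 * t), fs).mean())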

  8. The Role of the Listener's State in Speech Perception

    ERIC Educational Resources Information Center

    Viswanathan, Navin

    2009-01-01

    Accounts of speech perception disagree on whether listeners perceive the acoustic signal (Diehl, Lotto, & Holt, 2004) or the vocal tract gestures that produce the signal (e.g., Fowler, 1986). In this dissertation, I outline a research program using a phenomenon called "perceptual compensation for coarticulation" (Mann, 1980) to examine this…

  9. How Autism Affects Speech Understanding in Multitalker Environments

    DTIC Science & Technology

    2013-10-01

    Adults with Autism Spectrum Disorders (ASD) have particular difficulty recognizing speech in acoustically hostile environments (e.g., Alcantara et al.). Cited work includes assessment with the Let's Face It! skills battery (Autism Research, 1(6), 329-340) and studies of speech perception in the presence of other talkers (Barker & Newman, 2004; van de Weijer, 1998).

  10. Speech Research Status Report, July-December 1992.

    ERIC Educational Resources Information Center

    Fowler, Carol A., Ed.

    One of a series of semi-annual reports, this publication contains 25 articles which report the status and progress of studies on the nature of speech, instruments for its investigation, and practical applications. Articles are as follows: "Acoustic Shards, Perceptual Glue" (Robert E. Remez and Philip E. Rubin); "F0 Gives Voicing…

  11. Speech prosody in Friedreich's and olivo-ponto cerebellar atrophy

    NASA Astrophysics Data System (ADS)

    Casper, Maureen

    2004-05-01

    A critical issue in the study of speech motor control is the identification of the mechanisms that generate the temporal flow of serially ordered articulatory events. Two staged models of serially ordered events (Lashley, 1951; Lindblom, 1963) claim that time controls events, whereas dynamic models predict a relative relation between time and space. Each of these models predicts a different relation between the acoustic measures of formant frequency and segmental duration. The method described herein provides a sensitive index of speech deterioration which is both acoustically robust and phonetically systematic. Both acoustic and magnetic resonance imaging measures were used to describe the speech disturbance in two neurologically distinct groups of cerebellar ataxia: Friedreich's ataxia and olivo-ponto cerebellar ataxia. The speaking task was designed to elicit six different prosodic conditions and four prosodic contrasts. All subjects read the same syllable embedded in a sentence, under six different prosodic conditions. Pair-wise comparisons derived from the six conditions were used to describe (1) final lengthening, (2) phrasal accent, (3) nuclear accent and (4) syllable reduction. An estimate of speech deterioration, as determined by individual and normal subjects' acoustic values of syllable duration, formant and fundamental frequencies, was used in correlation analyses with magnetic resonance imaging ratings.

  12. Attention Orienting Effects of Hesitations in Speech: Evidence from ERPs

    ERIC Educational Resources Information Center

    Collard, Philip; Corley, Martin; MacGregor, Lucy J.; Donaldson, David I.

    2008-01-01

    Filled-pause disfluencies such as "um" and "er" affect listeners' comprehension, possibly mediated by attentional mechanisms (J. E. Fox Tree, 2001). However, there is little direct evidence that hesitations affect attention. The current study used an acoustic manipulation of continuous speech to induce event-related potential components associated…

  13. Critical Issues in Airborne Applications of Speech Recognition.

    DTIC Science & Technology

    1979-01-01

    human’s tongue, lips, and other articulators to get ready for the next vowel or consonant to be spoken, and to gradually move away from the... acoustic tube, so that formants and other interesting features of the speech signal could be more readily and accurately detected). Of particular

  14. Acoustic Characterization of Soil

    DTIC Science & Technology

    2007-11-02

    modified SAR imaging algorithm. In the acoustic subsurface imaging scenario, the "object" to be imaged (i.e., cultural artifacts... subsurface imaging scenario. To combat this potential difficulty we can utilize a new SAR imaging algorithm (Lee et al., 1996) derived from a geophysics... essentially a transmit plane wave. This is a cost-effective means to evaluate the feasibility of subsurface imaging. A more complete (and costly

  15. Construction of Hindi Speech Stimuli for Eliciting Auditory Brainstem Responses.

    PubMed

    Ansari, Mohammad Shamim; Rangasayee, R

    2016-12-01

    Speech-evoked auditory brainstem responses (spABRs) provide considerable information of clinical relevance for describing the auditory processing of complex stimuli at the subcortical level. Substantial research data suggest faithful representation of the temporal and spectral characteristics of speech sounds. However, spABRs are known to be affected by the acoustic properties of speech, language experience, and training, so the literature on brainstem speech processing remains inconclusive. This warrants the establishment of language-specific speech stimuli to describe brainstem processing in users of a specific oral language. The objective of the current study was to develop Hindi speech stimuli for recording auditory brainstem responses. A 40 ms Hindi stop-consonant stimulus containing five formants was constructed. Brainstem evoked responses to the speech sound |da| were obtained from 25 normal-hearing (NH) adults with a mean age of 20.9 years (SD = 2.7, range 18-25 years) and from ten subjects with mild sensorineural hearing loss (HI) with a mean age of 21.3 years (SD = 3.2, range 18-25 years). Statistically significant differences between the NH and HI groups were obtained in the mean identification scores for the synthesized speech stimuli |da| and |ga|. The mean, median, standard deviation, minimum, maximum and 95% confidence interval for the discrete peaks and V-A complex values of the electrophysiological responses to the speech stimulus were measured and compared between the NH and HI populations. This paper delineates a comprehensive methodological approach for the development of Hindi speech stimuli and the recording of ABRs to speech. The acoustic characteristics of the stimulus |da| were faithfully represented at the brainstem level in normal-hearing adults, and there was a statistically significant difference between NH and HI individuals. This suggests that spABR offers an opportunity to segregate normal speech encoding from abnormal speech processing at the subcortical level, which implies that
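
    As a rough sketch of how discrete-peak and V-A complex values might be extracted from an averaged spABR waveform (the analysis windows below are illustrative assumptions, not the study's clinical norms, and this is not the authors' analysis code), wave V and the following trough A could be located as:

      import numpy as np

      def va_complex(response, fs, v_window=(0.005, 0.010), a_window=(0.006, 0.012)):
          # Wave V: largest positive peak inside v_window (times in seconds).
          t = np.arange(len(response)) / fs
          v_candidates = np.flatnonzero((t >= v_window[0]) & (t <= v_window[1]))
          v_idx = v_candidates[np.argmax(response[v_candidates])]
          # Trough A: deepest negative point after wave V, inside a_window.
          a_candidates = np.flatnonzero((t > t[v_idx]) & (t <= a_window[1]))
          a_idx = a_candidates[np.argmin(response[a_candidates])]
          # Return V latency, A latency (seconds) and the V-A amplitude.
          return t[v_idx], t[a_idx], response[v_idx] - response[a_idx]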

  16. Auditory-Motor Expertise Alters “Speech Selectivity” in Professional Musicians and Actors

    PubMed Central

    Lee, Hwee Ling; Nusbaum, Howard; Price, Cathy J.

    2011-01-01

    Several perisylvian brain regions show preferential activation for spoken language above and beyond other complex sounds. These “speech-selective” effects might be driven by regions’ intrinsic biases for processing the acoustical or informational properties of speech. Alternatively, such speech selectivity might emerge through extensive experience in perceiving and producing speech sounds. This functional magnetic resonance imaging (fMRI) study disambiguated such audiomotor expertise from speech selectivity by comparing activation for listening to speech and music in female professional violinists and actors. Audiomotor expertise effects were identified in several right and left superior temporal regions that responded to speech in all participants and music in violinists more than actresses. Regions associated with the acoustic/information content of speech were identified along the entire length of the superior temporal sulci bilaterally where activation was greater for speech than music in all participants. Finally, an effect of performing arts training was identified in bilateral premotor regions commonly activated by finger and mouth movements as well as in right hemisphere “language regions.” These results distinguish the seemingly speech-specific neural responses that can be abolished and even reversed by long-term audiomotor experience. PMID:20829245

  17. Inconsistency of speech in children with childhood apraxia of speech, phonological disorders, and typical speech

    NASA Astrophysics Data System (ADS)

    Iuzzini, Jenya

    There is a lack of agreement on the features used to differentiate Childhood Apraxia of Speech (CAS) from Phonological Disorders (PD). One criterion which has gained consensus is lexical inconsistency of speech (ASHA, 2007); however, no accepted measure of this feature has been defined. Although lexical assessment provides information about the consistency of an item across repeated trials, it may not capture the magnitude of inconsistency within an item. In contrast, segmental analysis provides more extensive information about the consistency of phoneme usage across multiple contexts and word positions. The current research compared segmental and lexical inconsistency metrics in preschool-aged children with PD, CAS, and typical development (TD) to determine how inconsistency varies with age in typical and disordered speakers, and whether CAS and PD were differentiated equally well by both assessment levels. Whereas lexical and segmental analyses may be influenced by listener characteristics or speaker intelligibility, the acoustic signal is less vulnerable to these factors. In addition, the acoustic signal may reveal information which is not evident in the perceptual signal. A second focus of the current research was motivated by Blumstein et al.'s (1980) classic study on voice onset time (VOT) in adults with acquired apraxia of speech (AOS), which demonstrated a motor impairment underlying AOS. In the current study, VOT analyses were conducted to determine the relationship of age and group with the voicing distribution for bilabial and alveolar plosives. Findings revealed that 3-year-olds evidenced significantly higher inconsistency than 5-year-olds; segmental inconsistency approached 0% in 5-year-olds with TD, whereas it persisted in children with PD and CAS, suggesting that for children in this age range, inconsistency is a feature of speech disorder rather than typical development (Holm et al., 2007). Likewise, whereas segmental and lexical inconsistency were
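
    As an illustrative, simplified way to quantify segmental inconsistency across repeated productions of a word (an assumed metric for the sake of example, not the measure used in this study), each segment position can be scored against the modal segment produced at that position:

      from collections import Counter

      def segmental_inconsistency(transcriptions):
          # transcriptions: repeated productions of one word, each a list of
          # phone symbols, assumed already aligned and of equal length.
          deviations, total = 0, 0
          for position in zip(*transcriptions):
              modal_count = Counter(position).most_common(1)[0][1]
              deviations += len(position) - modal_count
              total += len(position)
          return deviations / total

      # Three repetitions of "cat"; one production replaces initial /k/ with /t/.
      reps = [["k", "ae", "t"], ["k", "ae", "t"], ["t", "ae", "t"]]
      print(segmental_inconsistency(reps))  # 1/9, about 0.11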

  18. Improvement of electrolaryngeal speech quality using a supraglottal voice source with compensation of vocal tract characteristics.

    PubMed

    Wu, Liang; Wan, Congying; Wang, Supin; Wan, Mingxi

    2013-07-01

    The electrolarynx (EL) is a medical speech-recovery device designed for patients who have lost their voice box due to laryngeal cancer. As a substitute for the human larynx, the current commercial EL voice source cannot reconstruct natural EL speech under laryngectomy conditions. To eliminate the abnormal acoustic properties of EL speech, a supraglottal voice source with compensation of vocal tract characteristics was proposed and implemented in an experimental EL (SGVS-EL) system. Acoustic analyses of simulated EL speech and reconstructed EL speech produced with different voice sources were performed in a normal subject and a laryngectomee. The results indicated that the supraglottal voice source was successful in improving the acoustic properties of EL speech by enhancing low-frequency energy, correcting the shifted formants to the normal range, and eliminating the visible spectral zeros. Both the normal subject and the laryngectomee also produced more natural vowels using SGVS-EL than the commercial EL, even when the vocal tract parameter was substituted and the supraglottal voice source was biased to a certain degree. Therefore, a supraglottal voice source is a feasible and effective approach to improving the acoustic quality of EL speech.
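
    A crude sketch of the low-frequency-enhancement idea mentioned above (a generic filter illustration under assumed cutoff and gain values, not the SGVS-EL design): add a scaled low-pass copy of the EL speech back onto the original signal.

      import numpy as np
      from scipy.signal import butter, sosfilt

      def boost_low_frequencies(x, fs, cutoff_hz=300.0, gain=2.0):
          # 4th-order Butterworth low-pass isolates energy below cutoff_hz;
          # mixing it back in raises the low-frequency level of the signal.
          sos = butter(4, cutoff_hz, btype="low", fs=fs, output="sos")
          y = x + gain * sosfilt(sos, x)
          return y / np.max(np.abs(y))  # normalize to avoid clipping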

  19. Seeing to hear better: evidence for early audio-visual interactions in speech identification.

    PubMed

    Schwartz, Jean-Luc; Berthommier, Frédéric; Savariaux, Christophe

    2004-09-01

    Lip reading is the ability to partially understand speech by looking at the speaker's lips. It improves the intelligibility of speech in noise when audio-visual perception is compared with audio-only perception. A recent set of experiments showed that seeing the speaker's lips also enhances sensitivity to acoustic information, decreasing the auditory detection threshold of speech embedded in noise [J. Acoust. Soc. Am. 109 (2001) 2272; J. Acoust. Soc. Am. 108 (2000) 1197]. However, detection is different from comprehension, and it remains to be seen whether improved sensitivity also results in an intelligibility gain in audio-visual speech perception. In this work, we use an original paradigm to show that seeing the speaker's lips enables the listener to hear better and hence to understand better. The audio-visual stimuli used here could not be differentiated by lip reading per se, since they contained exactly the same lip gesture matched with different compatible speech sounds. Nevertheless, the noise-masked stimuli were more intelligible in the audio-visual condition than in the audio-only condition due to the contribution of visual information to the extraction of acoustic cues. Replacing the lip gesture with a non-speech visual input with exactly the same time course, providing the same temporal cues for extraction, removed the intelligibility benefit. This early contribution to audio-visual speech identification is discussed in relation to recent neurophysiological data on audio-visual perception.

  20. [Impact of the Overlap Region Between Acoustic and Electric Stimulation].

    PubMed

    Baumann, Uwe; Mocka, Moritz

    2017-02-08

    Patients with residual hearing in the low frequencies and ski-slope hearing loss with partial deafness at medium and high frequencies receive cochlear implant treatment with electric-acoustic stimulation (EAS, "hybrid" stimulation). In the border region between electric and acoustic stimulation, a superposition of the two types of stimulation is expected. The area of overlap is determined by the insertion depth of the stimulating electrode and the lower starting point of the signal transmission provided by the CI speech processor. The study examined the influence of varying the electric-acoustic overlap area on speech perception in noise; the width of the "transmission gap" between the two stimulus modalities was varied by two different methods. The results from 9 experienced users of the MED-EL Duet 2 speech processor show that the electric-acoustic overlap area, and with it the crossover frequency between the acoustic part and the CI, should be adjusted individually. Overall, speech reception thresholds (SRT) varied widely between subjects. Further studies should investigate whether generalized procedures for setting the overlap between electric and acoustic stimulation are reasonable, with a larger number of subjects and a longer period of acclimatization before the hearing tests are conducted.