Science.gov

Sample records for acoustically modified speech

  1. Language Comprehension in Language-Learning Impaired Children Improved with Acoustically Modified Speech

    NASA Astrophysics Data System (ADS)

    Tallal, Paula; Miller, Steve L.; Bedi, Gail; Byma, Gary; Wang, Xiaoqin; Nagarajan, Srikantan S.; Schreiner, Christoph; Jenkins, William M.; Merzenich, Michael M.

    1996-01-01

    A speech processing algorithm was developed to create more salient versions of the rapidly changing elements in the acoustic waveform of speech that have been shown to be deficiently processed by language-learning impaired (LLI) children. LLI children received extensive daily training, over a 4-week period, with listening exercises in which all speech was translated into this synthetic form. They also received daily training with computer "games" designed to adaptively drive improvements in temporal processing thresholds. Significant improvements in speech discrimination and language comprehension abilities were demonstrated in two independent groups of LLI children.

  2. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2004-03-23

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  3. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-02-14

    (Abstract identical to record 2.)

  4. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-08-08

    (Abstract identical to record 2.)

  5. System And Method For Characterizing Voiced Excitations Of Speech And Acoustic Signals, Removing Acoustic Noise From Speech, And Synthesizing Speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-04-25

    (Abstract identical to record 2.)

  6. Acoustics of Clear Speech: Effect of Instruction

    ERIC Educational Resources Information Center

    Lam, Jennifer; Tjaden, Kris; Wilding, Greg

    2012-01-01

    Purpose: This study investigated how different instructions for eliciting clear speech affected selected acoustic measures of speech. Method: Twelve speakers were audio-recorded reading 18 different sentences from the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1984). Sentences were produced in habitual, clear,…

  7. Prolonged Speech and Modification of Stuttering: Perceptual, Acoustic, and Electroglottographic Data.

    ERIC Educational Resources Information Center

    Packman, Ann; And Others

    1994-01-01

    This study investigated changes in the speech patterns of young adult male subjects when stuttering was modified by deliberately prolonging speech. Three subjects showed clinically significant stuttering reductions when using prolonged speech to reduce their stuttering. Resulting speech was perceptually stutter free. Acoustic and…

  8. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced, speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.
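    The deconvolution step mentioned above can be illustrated, for a single analysis frame, as a regularized spectral division of the acoustic output by the excitation. This is a minimal sketch under stated assumptions, not the patented method: it assumes the excitation and acoustic frames are already time-aligned and of equal length, treats the frame as circular (no windowing), and the function name and the `eps` regularization constant are illustrative choices.

```python
import numpy as np

def estimate_transfer_function(excitation, speech, eps=1e-8):
    """Estimate H(f) = S(f) / E(f) for one frame (hypothetical helper).

    Uses the regularized form S * conj(E) / (|E|^2 + eps) so that
    frequency bins where the excitation has almost no energy do not
    blow up the estimate.
    """
    E = np.fft.rfft(excitation)
    S = np.fft.rfft(speech)
    return S * np.conj(E) / (np.abs(E) ** 2 + eps)

# Usage: a known 3-tap "vocal tract" applied to a broadband excitation
# by circular convolution, then recovered from the two frames.
rng = np.random.default_rng(0)
n = 256
excitation = rng.standard_normal(n)
h_true = np.zeros(n)
h_true[:3] = [1.0, 0.5, 0.25]
speech = np.fft.irfft(np.fft.rfft(excitation) * np.fft.rfft(h_true), n)
h_est = np.fft.irfft(estimate_transfer_function(excitation, speech), n)
```

    With real recordings the excitation is not white and frames overlap, so windowing and a larger `eps` would normally be needed.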

  9. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced, speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.

  10. Optimizing acoustical conditions for speech intelligibility in classrooms

    NASA Astrophysics Data System (ADS)

    Yang, Wonyoung

    High speech intelligibility is imperative in classrooms where verbal communication is critical. However, previous investigations of the optimal acoustical conditions for high speech intelligibility have produced inconsistent results, and practical room-acoustical solutions to optimize these conditions have not been developed. This experimental study validated auralization for speech-intelligibility testing, investigated the optimal reverberation for speech intelligibility for both normal and hearing-impaired listeners using more realistic room-acoustical models, and proposed an optimal sound-control design for speech intelligibility based on the findings. The auralization technique was used to perform subjective speech-intelligibility tests. The validation study, comparing auralization results with those of real classroom speech-intelligibility tests, found that speech-intelligibility tests using auralization are valid if the room to be auralized is not very absorptive or noisy. The speech-intelligibility tests were done in two different auralized sound fields (approximately diffuse and non-diffuse) using the Modified Rhyme Test and both normal and hearing-impaired listeners. A hybrid room-acoustical prediction program was used throughout the work; this program and a 1/8-scale model classroom were used to evaluate the effects of ceiling barriers and reflectors. For both subject groups, in approximately diffuse sound fields, when the speech source was closer to the listener than the noise source, the optimal reverberation time was zero. When the noise source was closer to the listener than the speech source, the optimal reverberation time was 0.4 s (with another peak at 0.0 s) with relative output power levels of the speech and noise sources SNS = 5 dB, and 0.8 s with SNS = 0 dB. In non-diffuse sound fields, when the noise source was between the speaker and the listener, the optimal reverberation time was 0.6 s with…

  11. Acoustic modeling of the speech organ

    NASA Astrophysics Data System (ADS)

    Kacprowski, J.

    The state of research on acoustic modeling of phonational and articulatory speech-producing elements is reviewed. Consistent with the physical interpretation of the speech production process, the acoustic theory of speech production is expressed as the product of three factors: laryngeal involvement, sound transmission, and emanations from the mouth and/or nose. Each of these factors is presented in the form of a simplified mathematical description, which provides the theoretical basis for the formation of physical models of the appropriate functional members of this complex biocybernetic system. Vocal tract wall impedance, vocal tract synthesizers, laryngeal dysfunction, vowel nasalization, resonance circuits, and sound wave propagation are discussed.
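    Stated in the frequency domain, the three-factor product can be written as below. The symbols are chosen here for illustration and are not taken from the source: $U(f)$ for the laryngeal (glottal source) contribution, $T(f)$ for sound transmission through the vocal tract, and $R(f)$ for radiation from the mouth and/or nose. In the time domain, the same product becomes a convolution of the three corresponding responses.

```latex
P(f) = U(f)\,T(f)\,R(f)
```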

  12. Acoustic characteristics of listener-constrained speech

    NASA Astrophysics Data System (ADS)

    Ashby, Simone; Cummins, Fred

    2003-04-01

    Relatively little is known about the acoustical modifications speakers employ to meet the various constraints (auditory, linguistic, and otherwise) of their listeners. Similarly, the manner in which perceived listener constraints interact with speakers' adoption of specialized speech registers is poorly understood. Hyper and Hypo (H&H) theory offers a framework for examining the relationship between speech production and output-oriented goals for communication, suggesting that under certain circumstances speakers may attempt to minimize phonetic ambiguity by employing a "hyperarticulated" speaking style (Lindblom, 1990). It remains unclear, however, what the acoustic correlates of hyperarticulated speech are, and how, if at all, we might expect phonetic properties to change across different listener-constrained conditions. This paper is part of a preliminary investigation comparing the prosodic characteristics of speech produced across a range of listener constraints. Analyses are drawn from a corpus of read hyperarticulated speech data from eight adult female speakers of English. Specialized registers include speech to foreigners, infant-directed speech, speech produced under noisy conditions, and human-machine interaction. The authors gratefully acknowledge financial support of the Irish Higher Education Authority, allocated to Fred Cummins for collaborative work with Media Lab Europe.

  13. Investigation of the optimum acoustical conditions for speech using auralization

    NASA Astrophysics Data System (ADS)

    Yang, Wonyoung; Hodgson, Murray

    2001-05-01

    Speech intelligibility is mainly affected by reverberation and by signal-to-noise level difference, the difference between the speech-signal and background-noise levels at a receiver. An important question for the design of rooms for speech (e.g., classrooms) is, what are the optimal values of these factors? This question has been studied experimentally and theoretically. Experimental studies found zero optimal reverberation time, but theoretical predictions found nonzero reverberation times. These contradictory results are partly caused by the different ways of accounting for background noise. Background noise sources and their locations inside the room are the most detrimental factors in speech intelligibility. However, noise levels also interact with reverberation in rooms. In this project, two major room-acoustical factors for speech intelligibility were controlled using speech and noise sources of known relative output levels located in a virtual room with known reverberation. Speech intelligibility test signals were played in the virtual room and auralized for listeners. The Modified Rhyme Test (MRT) and babble noise were used to measure subjective speech intelligibility quality. Optimal reverberation times, and the optimal values of other speech intelligibility metrics, for normal-hearing people and for hard-of-hearing people, were identified and compared.
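    The signal-to-noise level difference defined above is simply the speech level minus the background-noise level, both in decibels at the receiver. A minimal sketch, assuming the speech and noise signals at the receiver are available separately and that levels are computed from RMS values; the function names are hypothetical, and the level reference cancels in the difference.

```python
import numpy as np

def level_db(x):
    """RMS level in dB relative to an arbitrary (cancelling) reference."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(rms)

def signal_to_noise_level_difference(speech_at_receiver, noise_at_receiver):
    """Speech level minus background-noise level at the receiver, in dB."""
    return level_db(speech_at_receiver) - level_db(noise_at_receiver)

# Usage: a speech signal with twice the RMS amplitude of the noise.
t = np.arange(1000) / 1000.0
noise = np.sin(2 * np.pi * 5 * t)
speech = 2.0 * noise
snr_db = signal_to_noise_level_difference(speech, noise)
```

    Doubling the RMS amplitude gives 20 log10(2), i.e. roughly 6 dB.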

  14. Speech Intelligibility Advantages using an Acoustic Beamformer Display

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Sunder, Kaushik; Godfroy, Martine; Otto, Peter

    2015-01-01

    A speech intelligibility test conforming to the Modified Rhyme Test of ANSI S3.2 "Method for Measuring the Intelligibility of Speech Over Communication Systems" was conducted using a prototype 12-channel acoustic beamformer system. The target speech material (signal) was identified against speech babble (noise), with calculated signal-noise ratios of 0, 5 and 10 dB. The signal was delivered at a fixed beam orientation of 135 deg (re 90 deg as the frontal direction of the array) and the noise at 135 deg (co-located) and 0 deg (separated). A significant improvement in intelligibility from 57% to 73% was found for spatial separation for the same signal-noise ratio (0 dB). Significant effects for improved intelligibility due to spatial separation were also found for higher signal-noise ratios (5 and 10 dB).

  15. Acoustic Evidence for Phonologically Mismatched Speech Errors

    ERIC Educational Resources Information Center

    Gormley, Andrea

    2015-01-01

    Speech errors are generally said to accommodate to their new phonological context. This accommodation has been validated by several transcription studies. The transcription methodology is not the best choice for detecting errors at this level, however, as this type of error can be difficult to perceive. This paper presents an acoustic analysis of…

  16. Speech recognition: Acoustic, phonetic and lexical

    NASA Astrophysics Data System (ADS)

    Zue, V. W.

    1985-10-01

    Our long-term research goal is the development and implementation of speaker-independent continuous speech recognition systems. It is our conviction that proper utilization of speech-specific knowledge is essential for advanced speech recognition systems. With this in mind, we have continued to make progress on the acquisition of acoustic-phonetic and lexical knowledge. We have completed the development of a continuous digit recognition system. The system was constructed to investigate the utilization of acoustic-phonetic knowledge in a speech recognition system. Significant developments of this study include a soft-failure procedure for lexical access and the discovery of a set of acoustic-phonetic features for verification. We have completed a study of the constraints provided by lexical stress on word recognition. We found that lexical stress information alone can, on the average, reduce the number of word candidates from a large dictionary by more than 80%. In conjunction with this study, we successfully developed a system that automatically determines the stress pattern of a word from the acoustic signal.
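    The candidate-reduction idea can be sketched with a toy lexicon. Everything here is hypothetical: the eight words, their hand-coded stress patterns ("1" for a stressed syllable, "0" for unstressed), and the assumption that every word is equally likely to be spoken. It does not reproduce the reported 80% figure, which came from a large dictionary and automatically determined stress patterns.

```python
from collections import Counter

def candidates_by_stress(lexicon, observed_pattern):
    """Return the words whose stored stress pattern matches the observed one."""
    return [word for word, pattern in lexicon.items() if pattern == observed_pattern]

def average_reduction(lexicon):
    """Mean fraction of the lexicon eliminated by stress alone,
    assuming every word is equally likely to be the spoken one."""
    counts = Counter(lexicon.values())
    n = len(lexicon)
    # If the spoken word has pattern p, counts[p] candidates survive the filter.
    mean_remaining = sum(counts[p] for p in lexicon.values()) / n
    return 1.0 - mean_remaining / n

# Hypothetical toy lexicon: '1' = stressed syllable, '0' = unstressed.
TOY_LEXICON = {
    "table": "10", "garden": "10", "about": "01", "hotel": "01",
    "banana": "010", "remember": "010", "elephant": "100", "cat": "1",
}
```

    On this toy lexicon, stress alone leaves 1.75 of 8 candidates on average, an average reduction of 78.125%.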

  17. Speech recognition: Acoustic, phonetic and lexical knowledge

    NASA Astrophysics Data System (ADS)

    Zue, V. W.

    1985-08-01

    During this reporting period we continued to make progress on the acquisition of acoustic-phonetic and lexical knowledge. We completed development of a continuous digit recognition system. The system was constructed to investigate the use of acoustic-phonetic knowledge in a speech recognition system. The significant achievements of this study include the development of a soft-failure procedure for lexical access and the discovery of a set of acoustic-phonetic features for verification. We completed a study of the constraints that lexical stress imposes on word recognition. We found that lexical stress information alone can, on the average, reduce the number of word candidates from a large dictionary by more than 80 percent. In conjunction with this study, we successfully developed a system that automatically determines the stress pattern of a word from the acoustic signal. We performed an acoustic study on the characteristics of nasal consonants and nasalized vowels. We have also developed recognition algorithms for nasal murmurs and nasalized vowels in continuous speech. We finished the preliminary development of a system that aligns a speech waveform with the corresponding phonetic transcription.

  18. Perceptual centres in speech - an acoustic analysis

    NASA Astrophysics Data System (ADS)

    Scott, Sophie Kerttu

    Perceptual centres, or P-centres, represent the perceptual moments of occurrence of acoustic signals: the "beat" of a sound. P-centres underlie the perception and production of rhythm in perceptually regular speech sequences. P-centres have been modelled in both speech and non-speech (music) domains. The three aims of this thesis were (a) to test current P-centre models to determine which best accounted for the experimental data; (b) to identify a candidate parameter onto which P-centres could be mapped (a local approach), as opposed to the previous global models, which rely upon the whole signal to determine the P-centre; and (c) to develop a model of P-centre location that could be applied to both speech and non-speech signals. The first aim was investigated by a series of experiments testing (a) whether different models could account for variation in P-centres between speech from different speakers; (b) whether rendering the amplitude-time plot of a speech signal affects the P-centre of the signal; and (c) whether increasing the amplitude at the offset of a speech signal alters P-centres in the production and perception of speech. The second aim was addressed by (a) manipulating the rise time of different speech signals to determine whether the P-centre was affected, and whether the type of speech sound ramped affected the P-centre shift; (b) manipulating the rise time and decay time of a synthetic vowel to determine whether the onset alteration had more effect on the P-centre than the offset manipulation; and (c) testing whether the duration of a vowel affected the P-centre when other attributes (amplitude, spectral content) were held constant. The third aim, modelling P-centres, was based on these results. The Frequency-dependent Amplitude Increase Model of P-centre location (FAIM) was developed using a modelling protocol, the APU GammaTone Filterbank, and the speech from different speakers. The P-centres of the stimulus corpus were highly predicted by attributes of…
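    Rise-time manipulation of the kind described for the second aim can be sketched as multiplying the onset of a signal by an amplitude ramp. The linear ramp shape and the function name are assumptions; the thesis abstract does not state which ramp was used.

```python
import numpy as np

def apply_rise_time(signal, fs, rise_s):
    """Impose a linear onset ramp lasting rise_s seconds (hypothetical helper).

    The first round(rise_s * fs) samples are scaled from 0 up toward 1;
    the rest of the signal is left unchanged.
    """
    n = min(int(round(rise_s * fs)), len(signal))
    envelope = np.ones(len(signal))
    envelope[:n] = np.linspace(0.0, 1.0, n, endpoint=False)
    return signal * envelope

# Usage: a 10 ms rise imposed on a constant 100-sample signal at 1 kHz.
ramped = apply_rise_time(np.ones(100), fs=1000, rise_s=0.01)
```

    A matching decay-time manipulation would apply the reversed ramp to the end of the signal.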

  19. Acoustic characterization of developmental speech disorders

    NASA Astrophysics Data System (ADS)

    Bunnell, H. Timothy; Polikoff, James; McNicholas, Jane; Walter, Rhonda; Winn, Matthew

    2001-05-01

    A novel approach to classifying children with developmental speech delays (DSD) involving /r/ was developed. The approach first derives an acoustic classification of /r/ tokens based on their forced Viterbi alignment to a five-state hidden Markov model (HMM) of normally articulated /r/. Children with DSD are then classified in terms of the proportion of their /r/ productions that fall into each broad acoustic class. This approach was evaluated using 953 examples of /r/ as produced by 18 DSD children and an approximately equal number of /r/ tokens produced by a much larger number of normally articulating children. The acoustic classification identified three broad categories of /r/ that differed substantially in how they aligned to the normal speech /r/ HMM. Additionally, these categories tended to partition tokens uttered by DSD children from those uttered by normally articulating children. Similarities among the DSD children and the average normal child, measured in terms of the proportion of their /r/ productions that fell into each of the three broad acoustic categories, were used to perform a hierarchical clustering. This clustering revealed groupings of DSD children who tended to approach /r/ production in one of several acoustically distinct manners.
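    The per-child proportion vectors and the clustering step can be sketched as follows. This assumes each /r/ token has already been assigned to one of the three broad acoustic classes (labels 0, 1, 2) by the HMM alignment, which is not shown. Euclidean distance and average linkage are illustrative choices, since the abstract does not name the distance or linkage used, and the function names are hypothetical.

```python
import numpy as np

def class_proportions(labels, n_classes=3):
    """Fraction of a child's /r/ tokens falling into each acoustic class."""
    counts = np.bincount(labels, minlength=n_classes)
    return counts / counts.sum()

def agglomerate(points, n_clusters):
    """Naive average-linkage agglomerative clustering (O(n^3); fine for ~20 children)."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average Euclidean distance over all cross-cluster pairs.
                d = np.mean([np.linalg.norm(points[i] - points[j])
                             for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    return clusters

# Usage with made-up class labels for four children.
children = [[0, 0, 0, 1], [0, 0, 1, 0], [2, 2, 2, 1], [2, 2, 1, 2]]
points = np.array([class_proportions(c) for c in children])
groups = agglomerate(points, n_clusters=2)
```

    Including the average normal child as one more row of `points` would show which group lies closest to typical production.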

  20. Acoustic analysis of speech under stress.

    PubMed

    Sondhi, Savita; Khan, Munna; Vijay, Ritu; Salhan, Ashok K; Chouhan, Satish

    2015-01-01

    When a person is emotionally charged, stress can be discerned in the voice. This paper presents a simplified, non-invasive approach to detecting psycho-physiological stress by monitoring acoustic modifications during a stressful conversation. The voice database consists of audio clips from eight different popular FM broadcasts in which the host of the show vexes subjects who are otherwise unaware of the charade. The audio clips are obtained from real-life stressful conversations (no simulated emotions). Analysis is done using PRAAT software to evaluate mean fundamental frequency (F0) and formant frequencies (F1, F2, F3, F4) in both neutral and stressed states. Results suggest that F0 increases with stress, whereas formant frequencies decrease. Comparison of Fourier and chirp spectra of a short vowel segment shows that the two spectra are similar for relaxed speech but differ in the high-frequency range for stressed speech, owing to increased pitch modulation. PMID:26558301
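    The study measured mean F0 with PRAAT. As a stand-in, and not PRAAT's algorithm, a basic autocorrelation pitch estimate averaged over frames can be sketched as below. The 75 to 500 Hz search range, 40 ms frames, and 20 ms hop are assumed defaults, the function names are hypothetical, and there is no voicing decision, so unvoiced or silent frames would bias the mean.

```python
import numpy as np

def estimate_f0(frame, fs, fmin=75.0, fmax=500.0):
    """Autocorrelation pitch estimate for one frame (simplified sketch)."""
    frame = frame - np.mean(frame)
    # Keep only non-negative lags: r[k] = sum_n frame[n] * frame[n + k].
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(frame) - 1)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag

def mean_f0(signal, fs, frame_len=0.04, hop=0.02):
    """Average the per-frame F0 estimates over the whole signal."""
    n = int(frame_len * fs)
    h = int(hop * fs)
    f0s = [estimate_f0(signal[i:i + n], fs)
           for i in range(0, len(signal) - n + 1, h)]
    return float(np.mean(f0s))

# Usage: half a second of a 200 Hz tone sampled at 16 kHz.
fs = 16000
tone = np.sin(2 * np.pi * 200 * np.arange(fs // 2) / fs)
f0_hat = mean_f0(tone, fs)
```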

  1. Acoustic assessment of erygmophonic speech of Moroccan laryngectomized patients

    PubMed Central

    Ouattassi, Naouar; Benmansour, Najib; Ridal, Mohammed; Zaki, Zouheir; Bendahhou, Karima; Nejjari, Chakib; Cherkaoui, Abdeljabbar; El Alami, Mohammed Nouredine El Amine

    2015-01-01

    Introduction: Acoustic evaluation of alaryngeal voices is among the most prominent issues in the field of speech analysis. Many methods have been developed to date to substitute for the classic perceptual evaluation. The aim of this study is to present our experience in the objective assessment of erygmophonic speech and to discuss the most widely used methods of acoustic speech appraisal. Through a prospective case-control study, we measured acoustic parameters of speech quality during one year of erygmophonic rehabilitation therapy of Moroccan laryngectomized patients. Methods: We assessed acoustic parameters of erygmophonic speech samples from eleven laryngectomized patients during speech rehabilitation therapy. Acoustic parameters were obtained by perturbation analysis, linear predictive coding algorithms, and broadband spectrograms. Results: Using perturbation analysis methods, we found erygmophonic voice to be significantly poorer than normal speech, with higher formant frequency values. However, erygmophonic voice also showed higher and extremely variable error values that exceeded the acceptable level, which casts doubt on the reliability of the results of those analytic methods. Conclusion: Acoustic parameters for objective evaluation of alaryngeal voices should allow a reliable representation of the perceptual evaluation of speech quality. This requirement has not been fulfilled by the common methods used so far. Therefore, acoustical assessment of erygmophonic speech needs further investigation. PMID:26587121

  2. Acoustic Study of Acted Emotions in Speech

    NASA Astrophysics Data System (ADS)

    Wang, Rong

    An extensive set of carefully recorded utterances provided a speech database for investigating acoustic correlates among eight emotional states. Four professional actors and four professional actresses simulated the emotional states of joy, conversation, nervousness, anger, sadness, hate, fear, and depression. The values of 14 acoustic parameters were extracted from analyses of the simulated portrayals. The parameters were normalized to reduce talker dependence. Correlates of emotion were investigated by means of principal components analysis. Sadness and depression were found to be "ambiguous" with respect to each other, but "unique" with respect to joy and anger in the principal components space. Joy, conversation, nervousness, anger, hate, and fear did not separate well in the space and so exhibited ambiguity with respect to one another. The different talkers expressed joy, anger, sadness, and depression more consistently than the other four emotions. The analysis results were compared with the results of a subjective study using the same speech database, and considerable consistency between the two was found.
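    The normalization and principal components steps can be sketched as below. Per-talker z-scoring is an assumed normalization (the abstract says only that the parameters were normalized to reduce talker dependence), PCA is computed from the SVD of the centered feature matrix, and the function names and random data are hypothetical.

```python
import numpy as np

def normalize_per_talker(features, talker_ids):
    """z-score each parameter within each talker (one assumed normalization)."""
    out = np.empty_like(features, dtype=float)
    for t in np.unique(talker_ids):
        rows = talker_ids == t
        block = features[rows]
        std = block.std(axis=0)
        std[std == 0] = 1.0  # avoid division by zero for constant parameters
        out[rows] = (block - block.mean(axis=0)) / std
    return out

def pca(features, n_components=2):
    """Project onto the leading principal components via SVD."""
    centered = features - features.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = s ** 2 / np.sum(s ** 2)  # variance fraction per component
    return scores, explained[:n_components]

# Usage: 8 talkers x 10 utterances x 14 acoustic parameters, with a
# talker-specific offset that the normalization should remove.
rng = np.random.default_rng(1)
talker_ids = np.repeat(np.arange(8), 10)
raw = rng.standard_normal((80, 14)) + talker_ids[:, None]
normed = normalize_per_talker(raw, talker_ids)
scores, explained = pca(normed, n_components=2)
```

    Plotting `scores` colored by emotion label would show which emotions separate or overlap in the component space.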

  3. Acoustic differences among casual, conversational, and read speech

    NASA Astrophysics Data System (ADS)

    Pinnow, DeAnna

    Speech is a complex behavior that allows speakers to use many variations to satisfy the demands connected with multiple speaking environments. Speech research typically obtains speech samples in a controlled laboratory setting using read material, yet anecdotal observations of such speech, particularly from talkers with a speech and language impairment, have identified a "performance" effect in the produced speech which masks the characteristics of impaired speech outside of the lab (Goberman, Recker, & Parveen, 2010). The aim of the current study was to investigate acoustic differences among laboratory read, laboratory conversational, and casual speech through well-defined speech tasks in the laboratory and in talkers' natural environments. Eleven healthy research participants performed lab recording tasks (19 read sentences and a dialogue about their life) and collected natural-environment recordings of themselves over 3-day periods using portable recorders. Segments were analyzed for articulatory, voice, and prosodic acoustic characteristics using computer software and hand counting. The current study results indicate that lab-read speech was significantly different from casual speech: greater articulation range, improved voice quality measures, lower speech rate, and lower mean pitch. One implication of the results is that different laboratory techniques may be beneficial in obtaining speech samples that are more like casual speech, thus making it easier to correctly analyze abnormal speech characteristics with fewer errors.

  4. Acoustics in Halls for Speech and Music

    NASA Astrophysics Data System (ADS)

    Gade, Anders C.

    This chapter deals specifically with concepts, tools, and architectural variables of importance when designing auditoria for speech and music. The focus will be on cultivating the useful components of the sound in the room rather than on avoiding noise from outside or from installations, which is dealt with in Chap. 11. The chapter starts by presenting the subjective aspects of the room acoustic experience according to consensus at the time of writing. Then follows a description of their objective counterparts, the objective room acoustic parameters, among which the classical reverberation time measure is only one of many, but still of fundamental value. After explanations on how these parameters can be measured and predicted during the design phase, the remainder of the chapter deals with how the acoustic properties can be controlled by the architectural design of auditoria. This is done by presenting the influence of individual design elements as well as brief descriptions of halls designed for specific purposes, such as drama, opera, and symphonic concerts. Finally, some important aspects of loudspeaker installations in auditoria are briefly touched upon.

  5. Age-Related Changes in Acoustic Characteristics of Adult Speech

    ERIC Educational Resources Information Center

    Torre, Peter, III; Barlow, Jessica A.

    2009-01-01

    This paper addresses effects of age and sex on certain acoustic properties of speech, given conflicting findings on such effects reported in prior research. The speech of 27 younger adults (15 women, 12 men; mean age 25.5 years) and 59 older adults (32 women, 27 men; mean age 75.2 years) was evaluated for identification of differences for sex and…

  6. Acoustic Markers of Prosodic Boundaries in Spanish Spontaneous Alaryngeal Speech

    ERIC Educational Resources Information Center

    Cuenca, M. H.; Barrio, M. M.

    2010-01-01

    Prosodic information aids segmentation of the continuous speech signal and thereby facilitates auditory speech processing. Durational and pitch variations are prosodic cues especially necessary to convey prosodic boundaries, but alaryngeal speakers have inconsistent control over acoustic parameters such as F0 and duration, being as a result noisy…

  7. Study of acoustic correlates associated with emotional speech

    NASA Astrophysics Data System (ADS)

    Yildirim, Serdar; Lee, Sungbok; Lee, Chul Min; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Ebrahim; Narayanan, Shrikanth

    2004-10-01

    This study investigates the acoustic characteristics of four different emotions expressed in speech. The aim is to obtain detailed acoustic knowledge of how a speech signal is modulated by changes from a neutral to a given emotional state. Such knowledge is necessary for automatic emotion recognition and classification and for emotional speech synthesis. Speech data obtained from two semi-professional actresses are analyzed and compared. Each subject produces 211 sentences with four different emotions: neutral, sad, angry, and happy. We analyze changes in temporal and acoustic parameters such as the magnitude and variability of segmental duration, fundamental frequency, and the first three formant frequencies as a function of emotion. Acoustic differences among the emotions are also explored with mutual information computation, multidimensional scaling, and acoustic likelihood comparison with normal speech. Results indicate that speech associated with anger and happiness is characterized by longer duration, shorter interword silence, and higher pitch and rms energy with wider ranges. Sadness is distinguished from the other emotions by lower rms energy and longer interword silence. Interestingly, the difference in formant pattern between [happiness/anger] and [neutral/sadness] is better reflected in back vowels such as /a/ (as in "father") than in front vowels. Detailed results on intra- and interspeaker variability will be reported.

  8. Acoustic assessment of speech privacy curtains in two nursing units.

    PubMed

    Pope, Diana S; Miller-Klein, Erik T

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with 1970s standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption as well as compact and more fragmented nursing unit floor plate shapes should be considered. PMID:26780959
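
    The recommendation to add absorption can be illustrated with the classical Sabine relation, RT60 ≈ 0.161 V / A, where V is the room volume in m³ and A is the total equivalent absorption area in m² (metric sabins). The sketch below is only an illustration and is not taken from the study: the room volume, surface areas, absorption coefficients, and curtain values are hypothetical, and Sabine's formula is itself an approximation best suited to fairly diffuse, lightly damped rooms.

```python
def total_absorption(surfaces):
    """Equivalent absorption area (m^2): sum of area * absorption coefficient."""
    return sum(area * alpha for area, alpha in surfaces)


def sabine_rt60(volume_m3, absorption_m2):
    """Sabine reverberation time in seconds: RT60 = 0.161 * V / A (metric units)."""
    if volume_m3 <= 0 or absorption_m2 <= 0:
        raise ValueError("volume and absorption must be positive")
    return 0.161 * volume_m3 / absorption_m2


# Hypothetical 60 m^3 patient room: hard walls/floor plus an acoustic ceiling.
room = [(40.0, 0.05), (20.0, 0.60)]  # (surface area in m^2, absorption coefficient)
curtain = (12.0, 0.40)               # hypothetical absorptive privacy curtain

a_without = total_absorption(room)
a_with = total_absorption(room + [curtain])
print(f"RT60 without curtain: {sabine_rt60(60.0, a_without):.2f} s")
print(f"RT60 with curtain:    {sabine_rt60(60.0, a_with):.2f} s")
```

    Because RT60 is inversely proportional to A in this model, every square metre of added absorption shortens the reverberation time, which is the direction of the abstract's recommendation; the size of the effect in a real ward depends on geometry the formula ignores.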

  9. Acoustic assessment of speech privacy curtains in two nursing units

    PubMed Central

    Pope, Diana S.; Miller-Klein, Erik T.

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with 1970s standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption as well as compact and more fragmented nursing unit floor plate shapes should be considered. PMID:26780959

  10. Speech recognition: Acoustic phonetic and lexical knowledge representation

    NASA Astrophysics Data System (ADS)

    Zue, V. W.

    1983-02-01

    The purpose of this program is to develop a speech database facility under which the acoustic characteristics of speech sounds in various contexts can be studied conveniently; investigate the phonological properties of a large lexicon of, say, 10,000 words, and determine to what extent phonotactic constraints can be utilized in speech recognition; study the acoustic cues that are used to mark word boundaries; develop a test bed in the form of a large-vocabulary isolated-word recognition (IWR) system to study the interactions of acoustic, phonetic, and lexical knowledge; and develop a limited continuous speech recognition system with the goal of recognizing any English word from its spelling in order to assess the interactions of higher-level knowledge sources.

  11. Speech recognition: Acoustic phonetic and lexical knowledge representation

    NASA Astrophysics Data System (ADS)

    Zue, V. W.

    1984-02-01

    The purpose of this program is to develop a speech database facility under which the acoustic characteristics of speech sounds in various contexts can be studied conveniently; investigate the phonological properties of a large lexicon of, say, 10,000 words, and determine to what extent phonotactic constraints can be utilized in speech recognition; study the acoustic cues that are used to mark word boundaries; develop a test bed in the form of a large-vocabulary isolated-word recognition (IWR) system to study the interactions of acoustic, phonetic, and lexical knowledge; and develop a limited continuous speech recognition system with the goal of recognizing any English word from its spelling in order to assess the interactions of higher-level knowledge sources.

  12. Preserved Acoustic Hearing in Cochlear Implantation Improves Speech Perception

    PubMed Central

    Sheffield, Sterling W.; Jahn, Kelly; Gifford, René H.

    2015-01-01

    Background With improved surgical techniques and electrode design, an increasing number of cochlear implant (CI) recipients have preserved acoustic hearing in the implanted ear, thereby resulting in bilateral acoustic hearing. There are currently no guidelines, however, for clinicians with respect to audiometric criteria and the recommendation of amplification in the implanted ear. The acoustic bandwidth necessary to obtain speech perception benefit from acoustic hearing in the implanted ear is unknown. Additionally, it is important to determine if, and in which listening environments, acoustic hearing in both ears provides more benefit than hearing in just one ear, even with limited residual hearing. Purpose The purposes of this study were to (1) determine whether acoustic hearing in an ear with a CI provides as much speech perception benefit as an equivalent bandwidth of acoustic hearing in the non-implanted ear, and (2) determine whether acoustic hearing in both ears provides more benefit than hearing in just one ear. Research Design A repeated-measures, within-participant design was used to compare performance across listening conditions. Study Sample Seven adults with CIs and bilateral residual acoustic hearing (hearing preservation) were recruited for the study. Data Collection and Analysis Consonant-nucleus-consonant word recognition was tested in four conditions: CI alone, CI + acoustic hearing in the non-implanted ear, CI + acoustic hearing in the implanted ear, and CI + bilateral acoustic hearing. A series of low-pass filters was used to examine the effects of acoustic bandwidth through an insert earphone with amplification. Benefit was defined as the difference among conditions. The benefit of bilateral acoustic hearing was tested in both diffuse and single-source background noise. Results were analyzed using repeated-measures analysis of variance. Results Similar benefit was obtained for equivalent acoustic frequency bandwidth in either ear. Acoustic…

  13. Evaluating a topographical mapping from speech acoustics to tongue positions

    SciTech Connect

    Hogden, J.; Heard, M.

    1995-05-01

    The continuity mapping algorithm, a procedure for learning to recover the relative positions of the articulators from speech signals, is evaluated using human speech data. The advantage of continuity mapping is that it is an unsupervised algorithm; that is, it can potentially be trained to make a mapping from speech acoustics to speech articulation without articulator measurements. The procedure starts by vector quantizing short windows of a speech signal so that each window is represented (encoded) by a single number. Next, multidimensional scaling is used to map quantization codes that were temporally close in the encoded speech to nearby points in a continuity map. Since speech sounds produced sufficiently close together in time must have been produced by similar articulator configurations, and speech sounds produced close together in time are close to each other in the continuity map, sounds produced by similar articulator positions should be mapped to similar positions in the continuity map. The data set used for evaluating the continuity mapping algorithm comprises simultaneously collected articulator and acoustic measurements made using an electromagnetic midsagittal articulometer on a human subject. Comparisons between measured articulator positions and those recovered using continuity mapping will be presented.
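
    The first stages of the procedure (encode each window as one number, then find which codes occur close together in time) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the codebook is assumed to be given (in practice it might be trained with k-means), the 2-D "windows" are toy stand-ins for spectral feature vectors, counting immediately adjacent code pairs is one assumed way to quantify temporal closeness, and the final multidimensional scaling step is omitted.

```python
import math
from collections import Counter


def nearest_code(window, codebook):
    """Index of the codebook vector closest (Euclidean distance) to `window`."""
    return min(range(len(codebook)), key=lambda i: math.dist(window, codebook[i]))


def encode(windows, codebook):
    """Vector-quantize each short analysis window to a single integer code."""
    return [nearest_code(w, codebook) for w in windows]


def temporal_neighbor_counts(codes):
    """Count how often two different codes occur in consecutive windows.

    Codes that frequently follow each other are the ones a continuity map
    should place close together (e.g. via multidimensional scaling).
    """
    counts = Counter()
    for a, b in zip(codes, codes[1:]):
        if a != b:
            counts[(min(a, b), max(a, b))] += 1
    return counts


# Toy 2-D windows and a hypothetical 3-entry codebook.
codebook = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
windows = [(0.1, 0.0), (0.9, 0.1), (1.1, 0.0), (2.2, 0.1), (1.0, 0.0)]
codes = encode(windows, codebook)
print(codes)
print(temporal_neighbor_counts(codes))
```

    The adjacency counts could then be turned into dissimilarities (for example, fewer co-occurrences meaning larger distance) before scaling; that conversion is a design choice the abstract does not specify.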

  14. Acoustic Analysis of Speech of Cochlear Implantees and Its Implications

    PubMed Central

    Patadia, Rajesh; Govale, Prajakta; Rangasayee, R.; Kirtane, Milind

    2012-01-01

    Objectives Cochlear implantees have improved speech production skills compared with those using hearing aids, as reflected in their acoustic measures. When compared to normal hearing controls, implanted children had fronted vowel space and their /s/ and /ʃ/ noise frequencies overlapped. Acoustic analysis of speech provides an objective index of perceived differences in speech production, which can serve as a precursor to planning therapy. The objective of this study was to compare acoustic characteristics of speech in cochlear implantees with those of normal hearing age-matched peers to understand the implications. Methods Group 1 consisted of 15 children with prelingual bilateral severe-profound hearing loss (age, 5-11 years; implanted at 4-10 years). Behind-the-ear hearing aids were used prior to implantation; subjects received at least 1 year of aural intervention both before and after implantation. Group 2 consisted of 15 normal hearing age-matched peers. Sustained productions of vowels and words with selected consonants were recorded. Using Praat software for acoustic analysis, digitized speech tokens were measured for F1, F2, and F3 of vowels; centre frequency (Hz) and energy concentration (dB) in burst; voice onset time (VOT in ms) for stops; centre frequency (Hz) of noise in /s/; and rise time (ms) for affricates. A t-test was used to find significant differences between groups. Results Significant differences were found in VOT for /b/, F1 and F2 of /e/, and F3 of /u/. No significant differences were found for centre frequency of burst, energy concentration for stops, centre frequency of noise in /s/, or rise time for affricates. These findings suggest that the auditory feedback provided by cochlear implants enables subjects to monitor production of speech sounds. Conclusion Acoustic analysis of speech is an essential method for discerning characteristics which have or have not been improved by cochlear implantation and thus for planning intervention. PMID:22701768
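
    The abstract does not say which t-test variant was used. As a hedged illustration of a between-group comparison, the sketch below computes Welch's two-sample t statistic (which does not assume equal group variances) and its Welch-Satterthwaite degrees of freedom using only the standard library; the measurement values are hypothetical. Converting t into a p-value requires the t distribution, for example via scipy.stats.ttest_ind(a, b, equal_var=False).

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance uses the sample (n - 1) denominator.
    se2_a = statistics.variance(group_a) / n_a
    se2_b = statistics.variance(group_b) / n_b
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical VOT measurements (ms) for one stop in each group.
implanted = [12.0, 18.0, 15.0, 20.0, 16.0]
controls = [10.0, 11.0, 9.0, 12.0, 10.0]
t, df = welch_t(implanted, controls)
print(f"t = {t:.2f}, df = {df:.1f}")
```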

  15. Methods and apparatus for non-acoustic speech characterization and recognition

    DOEpatents

    Holzrichter, John F.

    1999-01-01

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well-defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated in each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  16. Methods and apparatus for non-acoustic speech characterization and recognition

    SciTech Connect

    Holzrichter, J.F.

    1999-12-21

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well-defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated in each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  17. Speech recognition: Acoustic-phonetic knowledge acquisition and representation

    NASA Astrophysics Data System (ADS)

    Zue, Victor W.

    1987-09-01

    A long-term research goal is the development and implementation of speaker-independent continuous speech recognition systems. It is believed that the proper utilization of speech-specific knowledge is essential for such advanced systems. Research is thus directed toward the acquisition of acoustic-phonetic and lexical knowledge, and the application of this knowledge to speech recognition algorithms. Investigation into the contextual variations of speech sounds has continued, emphasizing the role of the syllable in these variations. Analysis revealed that the acoustic realization of a stop depends greatly on its position within a syllable. In order to represent and utilize this information in speech recognition, a hierarchical syllable description has been adopted that enables us to specify the constraints in terms of an immediate constituent grammar. We will continue to quantify the effect of context on the acoustic realization of phonemes using larger constituent units such as syllables. In addition, a grammar will be developed to describe the relationship between phonemes and acoustic segments, and a parser that will make use of this grammar for phonetic recognition and lexical access.

  18. Acoustic Speech Analysis Of Wayang Golek Puppeteer

    NASA Astrophysics Data System (ADS)

    Hakim, Faisal Abdul; Mandasari, Miranti Indar; Sarwono, Joko

    2010-12-01

    Active speech disguise is one problem to be taken into account in forensic speaker verification or identification. Verification is usually carried out by comparing unknown samples with known samples, and active disguise can occur in both. To simulate speech disguise, the voices of Wayang Golek puppeteers were used, on the assumption that a wayang golek puppeteer is a master of disguise who can manipulate his voice into many different character voices. This paper discusses the speech characteristics of two puppeteers, comparing each puppeteer's habitual voice with his manipulated voices.

  19. An Acoustic Measure for Word Prominence in Spontaneous Speech

    PubMed Central

    Wang, Dagen; Narayanan, Shrikanth

    2010-01-01

    An algorithm for automatic speech prominence detection is reported in this paper. We describe a comparative analysis of various acoustic features for word prominence detection and report results using a spoken dialog corpus with manually assigned prominence labels. The focus is on features such as spectral intensity and speech rate that are directly extracted from speech based on a correlation-based approach without requiring explicit linguistic or phonetic knowledge. Additionally, various pitch-based measures are studied with respect to their discriminating ability for prominence detection. A parametric scheme for modeling pitch plateau is proposed, and this feature alone is found to outperform the traditional local pitch statistics. Two sets of experiments are used to explore the usefulness of the acoustic score generated using these features. The first set focuses on a more traditional way of word prominence detection based on a manually tagged corpus. A 76.8% classification accuracy was achieved on a corpus of role-playing spoken dialogs. Due to difficulties in manually tagging speech prominence into discrete levels (categories), the second set of experiments focuses on evaluating the score indirectly. Specifically, through experiments on the Switchboard corpus, it is shown that the proposed acoustic score can discriminate between content words and function words in a statistically significant way. The relation between speech prominence and content/function words is also explored. Since prominent words tend to be predominantly content words, and since content words can be automatically marked from text-derived part-of-speech (POS) information, it is shown that the proposed acoustic score can be indirectly cross-validated through POS information. PMID:20454538

  20. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2002-01-01

    Low power EM waves are used to detect motions of vocal tract tissues of the human speech system before, during, and after voiced speech. A voiced excitation function is derived. The excitation function provides speech production information to enhance speech characterization and to enable noise removal from human speech.

  21. Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

    NASA Astrophysics Data System (ADS)

    Heracleous, Panikos; Kaino, Tomomi; Saruwatari, Hiroshi; Shikano, Kiyohiro

    2006-12-01

    We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible) speech, but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise, and they might be used in special systems (speech recognition, speech transform, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved a word accuracy of … (see full text) for nonaudible murmur recognition on a 20k dictation task in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone, with very promising results.

  22. Speech recognition: Acoustic-phonetic knowledge acquisition and representation

    NASA Astrophysics Data System (ADS)

    Zue, Victor W.

    1988-09-01

    The long-term research goal is to develop and implement speaker-independent continuous speech recognition systems. It is believed that the proper utilization of speech-specific knowledge is essential for such advanced systems. This research is thus directed toward the acquisition, quantification, and representation of acoustic-phonetic and lexical knowledge, and the application of this knowledge to speech recognition algorithms. In addition, we are exploring new speech recognition alternatives based on artificial intelligence and connectionist techniques. We developed a statistical model for predicting the acoustic realization of stop consonants in various positions in the syllable template. A unification-based grammatical formalism was developed for incorporating this model into the lexical access algorithm. We provided an information-theoretic justification for the hierarchical structure of the syllable template. We analyzed segmental duration for vowels and fricatives in continuous speech. Based on contextual information, we developed durational models for vowels and fricatives that account for over 70 percent of the variance, using data from multiple, unknown speakers. We rigorously evaluated the ability of human spectrogram readers to identify stop consonants spoken by many talkers and in a variety of phonetic contexts. Incorporating the declarative knowledge used by the readers, we developed a knowledge-based system for stop identification, which achieved performance comparable to that of the readers.

  23. Acoustic Characteristics of Ataxic Speech in Japanese Patients with Spinocerebellar Degeneration (SCD)

    ERIC Educational Resources Information Center

    Ikui, Yukiko; Tsukuda, Mamoru; Kuroiwa, Yoshiyuki; Koyano, Shigeru; Hirose, Hajime; Taguchi, Takahide

    2012-01-01

    Background: In English- and German-speaking countries, ataxic speech is often described as showing scanning based on acoustic impressions. Although the term "scanning" is generally considered to represent abnormal speech features including prosodic excess or insufficiency, no precise acoustic analysis of ataxic speech has been performed in…

  24. Adaptation to Room Acoustics Using the Modified Rhyme Test

    PubMed Central

    Brandewie, Eugene; Zahorik, Pavel

    2012-01-01

    The negative effect of reverberant sound energy on speech intelligibility is well documented. Recently, however, prior exposure to room acoustics has been shown to increase intelligibility for a number of listeners in simulated room environments. This room adaptation effect, a possible extension of dynamic echo suppression, has been shown to be specific to reverberant rooms and requires binaural input. Because this effect has been demonstrated only using the Coordinated Response Measure (CRM) corpus, it is important to determine whether the increase in intelligibility scores reported previously was due to the specific nature of the CRM task. Here we demonstrate a comparable room-acoustic effect using the Modified Rhyme Test (MRT) corpus in multiple room environments. The results are consistent with the idea that the room adaptation effect may be a natural phenomenon of listening in reverberant environments. PMID:23437415

  25. Predicting the intelligibility of deaf children's speech from acoustic measures

    NASA Astrophysics Data System (ADS)

    Uchanski, Rosalie M.; Geers, Ann E.; Brenner, Christine M.; Tobey, Emily A.

    2001-05-01

    A weighted combination of speech-acoustic measures may provide an objective assessment of speech intelligibility in deaf children that could be used to evaluate the benefits of sensory aids and rehabilitation programs. This investigation compared the accuracy of two different approaches, multiple linear regression and a simple neural net. These two methods were applied to identical sets of acoustic measures, including both segmental (e.g., voice-onset times of plosives, spectral moments of fricatives, second formant frequencies of vowels) and suprasegmental measures (e.g., sentence duration, number and frequency of intersentence pauses). These independent variables were obtained from digitized recordings of deaf children's imitations of 11 simple sentences. The dependent measure was the percentage of spoken words from the 36 McGarr Sentences understood by groups of naive listeners. The two predictive methods were trained on speech measures obtained from 123 of the 164 8- and 9-year-old deaf children who used cochlear implants. Predictions were then obtained using speech measures from the remaining 41 children. Preliminary results indicate that multiple linear regression is a better predictor of intelligibility than the neural net, accounting for 79% as opposed to 65% of the variance in the data. [Work supported by NIH.]
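
    "Accounting for 79% of the variance" is usually expressed as a coefficient of determination, R² = 1 − SS_residual / SS_total. The sketch below illustrates that quantity and the train/held-out protocol described above with a single-predictor least-squares line; it is a hedged simplification, since the study used multiple predictors, and the data values are hypothetical.

```python
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def variance_accounted_for(observed, predicted):
    """R^2 = 1 - SS_residual / SS_total: share of variance in `observed` explained."""
    mean_obs = statistics.mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: one acoustic measure vs. percent of words understood.
train_x, train_y = [0.2, 0.4, 0.5, 0.7, 0.9], [20.0, 38.0, 52.0, 66.0, 85.0]
test_x, test_y = [0.3, 0.6, 0.8], [30.0, 55.0, 80.0]

# Fit on the training children only, then score on the held-out children.
intercept, slope = fit_line(train_x, train_y)
predictions = [intercept + slope * xi for xi in test_x]
print(f"held-out R^2 = {variance_accounted_for(test_y, predictions):.2f}")
```

    Evaluating on children not used for fitting, as the study did with its 41 held-out cases, guards against an R² inflated by overfitting; on held-out data this R² can even be negative if the predictions are worse than the test-set mean.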

  26. Tongue-Palate Contact Pressure, Oral Air Pressure, and Acoustics of Clear Speech

    ERIC Educational Resources Information Center

    Searl, Jeff; Evitts, Paul M.

    2013-01-01

    Purpose: The authors compared articulatory contact pressure (ACP), oral air pressure (Po), and speech acoustics for conversational versus clear speech. They also assessed the relationship of these measures to listener perception. Method: Twelve adults with normal speech produced monosyllables in a phrase using conversational and clear speech.…

  27. Adding articulatory features to acoustic features for automatic speech recognition

    SciTech Connect

    Zlokarnik, I.

    1995-05-01

    An HMM-based (hidden Markov model) speech recognition system that makes use of simultaneously recorded acoustic and articulatory data was evaluated. The articulatory measurements were gathered by means of electromagnetic articulography and describe the movement of small coils fixed to the speakers' tongue and jaw during the production of German V₁CV₂ sequences [P. Hoole and S. Gfoerer, J. Acoust. Soc. Am. Suppl. 1 87, S123 (1990)]. Using the coordinates of the coil positions as an articulatory representation, acoustic and articulatory features were combined to make up an acoustic-articulatory feature vector. The discriminant power of this combined representation was evaluated for two subjects on a speaker-dependent isolated word recognition task. When the articulatory measurements were used for both training and testing the HMMs, the articulatory representation reduced the error rate of comparable acoustic-based HMMs by more than 60% relative. In a separate experiment, the articulatory movements during the testing phase were estimated using a multilayer perceptron that performed an acoustic-to-articulatory mapping. Under these more realistic conditions, when articulatory measurements are only available during training, the error rate could be reduced by 18% to 25% relative.
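
    The combined representation and the relative error-rate figures can be made concrete with a small sketch. Frame-wise concatenation is one plausible reading of "combined to make up an acoustic-articulatory feature vector" (the abstract does not give the exact feature layout), and the error rates below are hypothetical, chosen only to show what a 60% relative reduction means.

```python
def combine_features(acoustic_frames, articulatory_frames):
    """Concatenate per-frame acoustic and articulatory feature vectors."""
    if len(acoustic_frames) != len(articulatory_frames):
        raise ValueError("both streams must have the same number of frames")
    return [list(a) + list(b) for a, b in zip(acoustic_frames, articulatory_frames)]


def relative_error_reduction(baseline_error, new_error):
    """Error-rate reduction expressed as a fraction of the baseline error."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical frames: 2 acoustic coefficients + (x, y) position of one coil.
acoustic = [[1.0, 0.5], [0.8, 0.4]]
articulatory = [[12.3, -4.1], [12.9, -3.8]]
print(combine_features(acoustic, articulatory))

# A drop from 10% to 4% word error is a 60% relative reduction.
print(f"{relative_error_reduction(0.10, 0.04):.0%}")
```

    A relative figure says nothing about the absolute error level: 10% to 4% and 1% to 0.4% are both 60% relative reductions.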

  28. Acoustic and Perceptual Consequences of Clear and Loud Speech

    PubMed Central

    Tjaden, Kris; Richards, Emily; Kuo, Christina; Wilding, Greg; Sussman, Joan

    2014-01-01

    Objective Several issues concerning F2 slope in dysarthria were addressed by obtaining speech acoustic measures and judgments of intelligibility for sentences produced in Habitual, Clear and Loud conditions by speakers with Parkinson's disease (PD) and healthy controls. Patients and Methods Acoustic measures of average and maximum F2 slope for diphthongs, duration and intensity were obtained. Listeners judged intelligibility using a visual analog scale. Differences in measures among groups and conditions as well as relationships among measures were examined. Results Average and maximum F2 slope metrics were strongly correlated, but only average F2 slope consistently differed among groups and conditions, with shallower slopes for the PD group and steeper slopes for Clear speech versus Habitual and Loud. Clear and Loud speech were also characterized by lengthened durations, increased intensity and improved intelligibility versus Habitual. F2 slope and intensity were unrelated, and F2 slope was a significant predictor of intelligibility. Conclusion Average diphthong F2 slope was more sensitive than maximum F2 slope to articulatory mechanism involvement in mild dysarthria in PD. F2 slope holds promise as an objective measure of treatment-related changes in the articulatory mechanism for therapeutic techniques that focus on articulation. PMID:24504015
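
    F2 slope is a rate of formant change, typically in Hz/ms. The abstract does not define how the average and maximum slopes were measured, so the sketch below uses assumed definitions: the average slope is the overall F2 change divided by the transition duration, and the maximum slope is the steepest change between consecutive analysis frames. The F2 track is hypothetical.

```python
def f2_slopes(times_ms, f2_hz):
    """Return (average, maximum) F2 slope in Hz/ms for one diphthong transition.

    Assumed definitions, not taken from the study:
      average slope: overall F2 change divided by transition duration
      maximum slope: largest absolute slope between consecutive frames
    """
    if len(times_ms) != len(f2_hz) or len(times_ms) < 2:
        raise ValueError("need at least two matching time/F2 samples")
    duration = times_ms[-1] - times_ms[0]
    average = abs(f2_hz[-1] - f2_hz[0]) / duration
    maximum = max(
        abs(hz_next - hz) / (t_next - t)
        for t, t_next, hz, hz_next in zip(times_ms, times_ms[1:], f2_hz, f2_hz[1:])
    )
    return average, maximum


# Hypothetical F2 track (one value every 10 ms) through a rising diphthong.
times = [0, 10, 20, 30]
f2 = [1000, 1100, 1400, 1600]
print(f2_slopes(times, f2))
```

    Under these definitions the maximum slope depends on a single pair of frames and is therefore more sensitive to formant-tracking noise, whereas the average slope summarizes the whole transition.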

  29. Learning Speech Variability in Discriminative Acoustic Model Adaptation

    NASA Astrophysics Data System (ADS)

    Sato, Shoei; Oku, Takahiro; Homma, Shinichi; Kobayashi, Akio; Imai, Toru

    We present a new discriminative method of acoustic model adaptation that deals with task-dependent speech variability. We have focused on differences in expressions or speaking styles between tasks, and the objective of this method is to improve the recognition accuracy of indistinctly pronounced phrases that depend on speaking style. The adaptation appends subword models for frequently observed variants of subwords in the task. To find the task-dependent variants, low-confidence words are statistically selected from words with higher frequency in the task's adaptation data by using their word lattices. HMM parameters of subword models dependent on the words are discriminatively trained by using linear transforms with a minimum phoneme error (MPE) criterion. For the MPE training, subword accuracy discriminating between the variants and the originals is also investigated. In speech recognition experiments, the proposed adaptation with the subword variants reduced the word error rate by 12.0% relative in a Japanese conversational broadcast task.

  30. Denoising of human speech using combined acoustic and EM sensor signal processing

    SciTech Connect

    Ng, L C; Burnett, G C; Holzrichter, J F; Gable, T J

    1999-11-29

    Low-power EM radar-like sensors have made it possible to measure properties of the human speech production system in real time, without acoustic interference. This greatly enhances the quality and quantity of information for many speech-related applications. See Holzrichter, Burnett, Ng, and Lea, J. Acoust. Soc. Am. 103 (1) 622 (1998). By using combined glottal EM sensor and acoustic signals, voiced, unvoiced, and no-speech segments can be reliably defined. Real-time denoising filters can be constructed to remove noise from the user's corresponding speech signal.
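
    One way to read the segmentation step is as a frame-by-frame decision rule: glottal activity seen by the EM sensor marks voiced speech, acoustic energy without glottal activity marks unvoiced speech, and neither marks no speech. The sketch below implements that hypothetical rule with made-up thresholds and adds a crude noise-floor estimate from the no-speech frames, the kind of quantity a denoising filter could use. It is not the authors' published algorithm.

```python
def label_frames(em_activity, acoustic_energy, em_threshold, energy_threshold):
    """Label each frame 'voiced', 'unvoiced' or 'no-speech' (hypothetical rule)."""
    labels = []
    for em, energy in zip(em_activity, acoustic_energy):
        if em > em_threshold:
            labels.append("voiced")      # glottal motion detected by the EM sensor
        elif energy > energy_threshold:
            labels.append("unvoiced")    # sound present but no glottal motion
        else:
            labels.append("no-speech")   # neither sensor indicates speech
    return labels


def noise_floor(acoustic_energy, labels):
    """Mean acoustic energy over 'no-speech' frames (a crude noise estimate)."""
    noise = [e for e, lab in zip(acoustic_energy, labels) if lab == "no-speech"]
    return sum(noise) / len(noise) if noise else 0.0


# Hypothetical per-frame sensor values (arbitrary units).
em = [0.9, 0.1, 0.0, 0.0]
energy = [0.5, 0.6, 0.02, 0.04]
labels = label_frames(em, energy, em_threshold=0.5, energy_threshold=0.1)
print(labels)
print(noise_floor(energy, labels))
```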

  31. Effect of acoustic fine structure cues on the recognition of auditory-only and audiovisual speech.

    PubMed

    Meister, Hartmut; Fuersen, Katrin; Schreitmueller, Stefan; Walger, Martin

    2016-06-01

    This study addressed the hypothesis that an improvement in speech recognition due to combined envelope and fine structure cues is greater in the audiovisual than the auditory modality. Normal hearing listeners were presented with envelope vocoded speech in combination with low-pass filtered speech. The benefit of adding acoustic low-frequency fine structure to acoustic envelope cues was significantly greater for audiovisual than for auditory-only speech. It is suggested that this is due to complementary information of the different acoustic and visual cues. The results have potential implications for the assessment of bimodal cochlear implant fittings or electroacoustic stimulation. PMID:27369134
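
    The two cue types can be illustrated with a deliberately simplified, single-channel sketch: envelope vocoding keeps the slowly varying amplitude envelope of a signal but replaces its fine structure with noise, while low-pass filtering keeps the low-frequency fine structure. Real vocoder studies split speech into several frequency bands with band-pass filters and use much steeper low-pass filters; the window length, filter coefficient, and noise carrier here are hypothetical choices, not the study's parameters.

```python
import math
import random


def envelope(signal, window):
    """Temporal envelope: full-wave rectification then a moving average."""
    rectified = [abs(s) for s in signal]
    env, running = [], 0.0
    for i, r in enumerate(rectified):
        running += r
        if i >= window:
            running -= rectified[i - window]
        env.append(running / min(i + 1, window))
    return env


def noise_vocode(signal, window, seed=0):
    """Keep the envelope, replace the fine structure with uniform noise."""
    rng = random.Random(seed)
    return [e * rng.uniform(-1.0, 1.0) for e in envelope(signal, window)]


def one_pole_lowpass(signal, alpha):
    """First-order low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def simulate_bimodal(signal, window=32, alpha=0.05, seed=0):
    """Sum of envelope-vocoded speech and low-pass filtered speech."""
    vocoded = noise_vocode(signal, window, seed)
    low = one_pole_lowpass(signal, alpha)
    return [v + lo for v, lo in zip(vocoded, low)]


# Toy input: 0.1 s of a 200 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
mixed = simulate_bimodal(tone)
print(len(mixed))
```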

  12. Acoustic Predictors of Intelligibility for Segmentally Interrupted Speech: Temporal Envelope, Voicing, and Duration

    ERIC Educational Resources Information Center

    Fogerty, Daniel

    2013-01-01

    Purpose: Temporal interruption limits the perception of speech to isolated temporal glimpses. An analysis was conducted to determine the acoustic parameter that best predicts speech recognition from temporal fragments that preserve different types of speech information--namely, consonants and vowels. Method: Young listeners with normal hearing…

  13. Perceptual and Acoustic Reliability Estimates for the Speech Disorders Classification System (SDCS)

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.

    2010-01-01

    A companion paper describes three extensions to a classification system for paediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). The SDCS uses perceptual and acoustic data reduction methods to obtain information on a speaker's speech, prosody, and voice. The present paper provides reliability estimates for…

  14. Emotional speech acoustic model for Malay: iterative versus isolated unit training.

    PubMed

    Mustafa, Mumtaz Begum; Ainon, Raja Noor

    2013-10-01

    The ability of a speech synthesis system to synthesize emotional speech enhances the user's experience with such systems and their related applications. However, developing an emotional speech synthesis system is a daunting task given the complexity of human emotional speech. More recent state-of-the-art speech synthesis systems, such as those based on hidden Markov models, can synthesize emotional speech with acceptable naturalness given a good emotional speech acoustic model. However, building an emotional speech acoustic model requires adequate resources, including segment-phonetic labels of emotional speech, which is a problem for many under-resourced languages, including Malay. This research shows how an emotional speech acoustic model for Malay can be built with minimal resources. To this end, two initialization methods were considered: iterative training using the deterministic annealing expectation maximization algorithm, and isolated unit training. The seed model for the automatic segmentation is a neutral speech acoustic model, which was transformed to the target emotion using two transformation techniques: model adaptation and context-dependent boundary refinement. Two forms of evaluation were performed: an objective evaluation measuring prosody error and a listening evaluation measuring the naturalness of the synthesized emotional speech. PMID:24116440

  15. Acoustic Analysis of the Voiced-Voiceless Distinction in Dutch Tracheoesophageal Speech

    ERIC Educational Resources Information Center

    Jongmans, Petra; Wempe, Ton G.; van Tinteren, Harm; Hilgers, Frans J. M.; Pols, Louis C. W.; van As-Brooks, Corina J.

    2010-01-01

    Purpose: Confusions between voiced and voiceless plosives and voiced and voiceless fricatives are common in Dutch tracheoesophageal (TE) speech. This study investigates (a) which acoustic measures are found to convey a correct voicing contrast in TE speech and (b) whether different measures are found in TE speech than in normal laryngeal (NL)…

  16. Speech Compensation for Time-Scale-Modified Auditory Feedback

    ERIC Educational Resources Information Center

    Ogane, Rintaro; Honda, Masaaki

    2014-01-01

    Purpose: The purpose of this study was to examine speech compensation in response to time-scale-modified auditory feedback during the transition of the semivowel for a target utterance of /ija/. Method: Each utterance session consisted of 10 control trials in the normal feedback condition followed by 20 perturbed trials in the modified auditory…

  17. Moving to the Speed of Sound: Context Modulation of the Effect of Acoustic Properties of Speech

    ERIC Educational Resources Information Center

    Shintel, Hadas; Nusbaum, Howard C.

    2008-01-01

    Suprasegmental acoustic patterns in speech can convey meaningful information and affect listeners' interpretation in various ways, including through systematic analog mapping of message-relevant information onto prosody. We examined whether the effect of analog acoustic variation is governed by the acoustic properties themselves. For example, fast…

  18. Fluid-acoustic interactions and their impact on pathological voiced speech

    NASA Astrophysics Data System (ADS)

    Erath, Byron D.; Zanartu, Matias; Peterson, Sean D.; Plesniak, Michael W.

    2011-11-01

    Voiced speech is produced by vibration of the vocal fold structures. Vocal fold dynamics arise from aerodynamic pressure loadings, tissue properties, and acoustic modulation of the driving pressures. Recent speech science advancements have produced a physiologically-realistic fluid flow solver (BLEAP) capable of prescribing asymmetric intraglottal flow attachment that can be easily assimilated into reduced order models of speech. The BLEAP flow solver is extended to incorporate acoustic loading and sound propagation in the vocal tract by implementing a wave reflection analog approach for sound propagation based on the governing BLEAP equations. This enhanced physiological description of the physics of voiced speech is implemented into a two-mass model of speech. The impact of fluid-acoustic interactions on vocal fold dynamics is elucidated for both normal and pathological speech through linear and nonlinear analysis techniques. Supported by NSF Grant CBET-1036280.

  19. Acoustical Characteristics of Mastication Sounds: Application of Speech Analysis Techniques

    NASA Astrophysics Data System (ADS)

    Brochetti, Denise

    Food scientists have used acoustical methods to study characteristics of mastication sounds in relation to food texture. However, a model for analysis of the sounds has not been identified, and reliability of the methods has not been reported. Therefore, speech analysis techniques were applied to mastication sounds, and variation in measures of the sounds was examined. To meet these objectives, two experiments were conducted. In the first experiment, a digital sound spectrograph generated waveforms and wideband spectrograms of sounds by 3 adult subjects (1 male, 2 females) for initial chews of food samples differing in hardness and fracturability. Acoustical characteristics were described and compared. For all sounds, formants appeared in the spectrograms, and energy occurred across a 0 to 8000-Hz range of frequencies. Bursts characterized waveforms for peanut, almond, raw carrot, ginger snap, and hard candy. Duration and amplitude of the sounds varied with the subjects. In the second experiment, the spectrograph was used to measure the duration, amplitude, and formants of sounds for the initial 2 chews of cylindrical food samples (raw carrot, teething toast) differing in diameter (1.27, 1.90, 2.54 cm). Six adult subjects (3 males, 3 females) having normal occlusions and temporomandibular joints chewed the samples between the molar teeth and with the mouth open. Ten repetitions per subject were examined for each food sample. Analysis of estimates of variation indicated an inconsistent intrasubject variation in the acoustical measures. Food type and sample diameter also affected the estimates, indicating the variable nature of mastication. Generally, intrasubject variation was greater than intersubject variation. Analysis of ranks of the data indicated that the effect of sample diameter on the acoustical measures was inconsistent and depended on the subject and type of food. If inferences are to be made concerning food texture from acoustical measures of mastication…

  20. How stable are acoustic metrics of contrastive speech rhythm?

    PubMed

    Wiget, Lukas; White, Laurence; Schuppler, Barbara; Grenon, Izabelle; Rauch, Olesya; Mattys, Sven L

    2010-03-01

    Acoustic metrics of contrastive speech rhythm, based on vocalic and intervocalic interval durations, are intended to capture stable typological differences between languages. They should consequently be robust to variation between speakers, sentence materials, and measurers. This paper assesses the impact of these sources of variation on the metrics %V (proportion of utterance duration made up of vocalic intervals), VarcoV (rate-normalized standard deviation of vocalic interval duration), and nPVI-V (a measure of the durational variability between successive pairs of vocalic intervals). Five measurers analyzed the same corpus of speech: five sentences read by six speakers of Standard Southern British English. Differences between sentences were responsible for the greatest variation in rhythm scores. Inter-speaker differences were also a source of significant variability. However, there was relatively little variation due to segmentation differences between measurers following an agreed protocol. An automated phone alignment process was also used; rhythm scores derived from it showed good agreement with the human measurers. A number of recommendations for researchers wishing to exploit contrastive rhythm metrics are offered in conclusion. PMID:20329856
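    The three rhythm metrics can be computed directly from lists of interval durations. A minimal sketch, assuming the intervals are already segmented (for example in milliseconds) and using the usual definitions of %V, VarcoV, and nPVI-V; the function names and example durations are hypothetical, and the study's own scripts may differ in details such as the standard-deviation estimator.

```python
from statistics import mean, pstdev


def percent_v(vocalic, intervocalic):
    """%V: percentage of total utterance duration taken up by vocalic intervals."""
    total_vocalic = sum(vocalic)
    return 100.0 * total_vocalic / (total_vocalic + sum(intervocalic))


def varco_v(vocalic):
    """VarcoV: standard deviation of vocalic durations divided by their mean, times 100.

    Uses the population standard deviation (an assumption; some studies use
    the sample standard deviation instead).
    """
    return 100.0 * pstdev(vocalic) / mean(vocalic)


def npvi_v(vocalic):
    """nPVI-V: mean of |d_k - d_k+1| / ((d_k + d_k+1) / 2) over successive pairs, times 100."""
    if len(vocalic) < 2:
        raise ValueError("nPVI needs at least two vocalic intervals")
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in zip(vocalic, vocalic[1:])]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical interval durations (ms) for one short utterance.
vowels = [80, 120, 60, 150]
consonants = [90, 70, 110, 100]
print(percent_v(vowels, consonants), varco_v(vowels), npvi_v(vowels))
```

    Normalizing by the mean (VarcoV) or by each pair's mean (nPVI-V) is what makes the scores less sensitive to overall speaking rate than a raw standard deviation would be.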

  1. DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1

    NASA Astrophysics Data System (ADS)

    Garofolo, J. S.; Lamel, L. F.; Fisher, W. M.; Fiscus, J. G.; Pallett, D. S.

    1993-02-01

    The Texas Instruments/Massachusetts Institute of Technology (TIMIT) corpus of read speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems. TIMIT contains speech from 630 speakers representing 8 major dialect divisions of American English, each speaking 10 phonetically-rich sentences. The TIMIT corpus includes time-aligned orthographic, phonetic, and word transcriptions, as well as speech waveform data for each spoken sentence. The release of TIMIT contains several improvements over the Prototype CD-ROM released in December, 1988: (1) full 630-speaker corpus, (2) checked and corrected transcriptions, (3) word-alignment transcriptions, (4) NIST SPHERE-headered waveform files and header manipulation software, (5) phonemic dictionary, (6) new test and training subsets balanced for dialectal and phonetic coverage, and (7) more extensive documentation.
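    The release ships its waveforms with NIST SPHERE headers. A minimal sketch of reading such a header, assuming the standard NIST_1A layout: a fixed-size ASCII header whose second line gives its size in bytes, "name -type value" fields, and an end_head terminator, followed by raw samples. The field names in the test are typical SPHERE fields, not values read from an actual TIMIT file, and the function is hypothetical; NIST's own header software remains the reference tool.

```python
def parse_sphere_header(data: bytes):
    """Parse a NIST SPHERE (NIST_1A) header; return (fields, header_size).

    Assumed layout: line 1 is "NIST_1A", line 2 is the header size in bytes,
    then "name -type value" lines up to "end_head". Type -i is an integer,
    -r a real number, and -sN a string of N characters.
    """
    first, second, _ = data.split(b"\n", 2)
    if first.strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(second.decode("ascii").strip())
    text = data[:header_size].decode("ascii", errors="replace")
    fields = {}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break  # everything after this, up to header_size, is padding
        if not line:
            continue
        name, type_flag, value = line.split(None, 2)
        if type_flag == "-i":
            fields[name] = int(value)
        elif type_flag == "-r":
            fields[name] = float(value)
        elif type_flag.startswith("-s"):
            fields[name] = value[: int(type_flag[2:])]
        else:
            raise ValueError(f"unknown field type {type_flag!r} for {name}")
    return fields, header_size
```

    The waveform samples then begin at data[header_size:], to be decoded according to fields such as the sample coding and byte format.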

  2. Acoustic properties of naturally produced clear speech at normal speaking rates

    NASA Astrophysics Data System (ADS)

    Krause, Jean C.; Braida, Louis D.

    2004-01-01

    Sentences spoken "clearly" are significantly more intelligible than those spoken "conversationally" for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.
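    One of the two global properties above, the share of long-term spectral energy in the 1000-3000 Hz band, is simple to compute. A minimal sketch, assuming a mono signal given as a list of samples at fs Hz; it uses a naive DFT for transparency (an FFT would be used on real recordings), and the function names are hypothetical rather than the authors' analysis code.

```python
import cmath
import math


def one_sided_power_spectrum(signal):
    """Power in DFT bins 0..n/2 of a real signal (naive O(n^2) DFT, for clarity)."""
    n = len(signal)
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        # Bins other than DC and Nyquist also stand in for a mirrored negative-frequency bin.
        weight = 1.0 if k == 0 or 2 * k == n else 2.0
        power.append(weight * abs(coeff) ** 2)
    return power


def band_energy_fraction(signal, fs, lo_hz=1000.0, hi_hz=3000.0):
    """Share of the long-term spectral energy that falls between lo_hz and hi_hz."""
    n = len(signal)
    power = one_sided_power_spectrum(signal)
    in_band = sum(p for k, p in enumerate(power) if lo_hz <= k * fs / n <= hi_hz)
    return in_band / sum(power)
```

    Comparing this fraction for clear/normal and conversational recordings of the same sentences is one way to check for the energy shift described above; the second property (envelope modulation depth) would need a separate envelope analysis.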

  3. Formant Centralization Ratio: A Proposal for a New Acoustic Measure of Dysarthric Speech

    ERIC Educational Resources Information Center

    Sapir, Shimon; Ramig, Lorraine O.; Spielman, Jennifer L.; Fox, Cynthia

    2010-01-01

    Purpose: The vowel space area (VSA) has been used as an acoustic metric of dysarthric speech, but with varying degrees of success. In this study, the authors aimed to test an alternative metric to the VSA--the "formant centralization ratio" (FCR), which is hypothesized to more effectively differentiate dysarthric from healthy speech and register…
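    The abstract is truncated before the metric is defined. As usually stated in the dysarthria literature, FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a), built from the corner vowels /i/, /u/, and /a/; the numerator tends to rise and the denominator to fall as vowels centralize, so a higher FCR indicates more centralization. A minimal sketch under that assumption, with hypothetical formant values in Hz:

```python
def formant_centralization_ratio(f1, f2):
    """FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a).

    f1 and f2 map the corner vowels "i", "u", "a" to their first and second
    formant frequencies (Hz), e.g. averaged over repetitions.
    """
    numerator = f2["u"] + f2["a"] + f1["i"] + f1["u"]
    denominator = f2["i"] + f1["a"]
    return numerator / denominator


# Hypothetical formant values (Hz) for a well-differentiated vowel space...
clear_f1 = {"i": 300, "u": 300, "a": 800}
clear_f2 = {"i": 2300, "u": 900, "a": 1200}
# ...and for a more centralized one.
central_f1 = {"i": 400, "u": 400, "a": 650}
central_f2 = {"i": 1900, "u": 1200, "a": 1300}
print(formant_centralization_ratio(clear_f1, clear_f2),
      formant_centralization_ratio(central_f1, central_f2))
```

    Because centralization pushes the numerator and denominator in opposite directions, the ratio changes more per unit of formant shift than a single formant would.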

  4. Speech intelligibility and speech quality of modified loudspeaker announcements examined in a simulated aircraft cabin.

    PubMed

    Pennig, Sibylle; Quehl, Julia; Wittkowski, Martin

    2014-01-01

    Acoustic modifications of loudspeaker announcements were investigated in a simulated aircraft cabin to improve passengers' speech intelligibility and quality of communication in this specific setting. Four experiments with 278 participants in total were conducted in an acoustic laboratory using a standardised speech test and subjective rating scales. In experiments 1 and 2 the sound pressure level (SPL) of the announcements was varied (ranging from 70 to 85 dB(A)). Experiments 3 and 4 focused on frequency modification (octave bands) of the announcements. All studies used a background noise with the same SPL (74 dB(A)), but recorded at different seat positions in the aircraft cabin (front, rear). The results quantify speech intelligibility improvements with increasing signal-to-noise ratio and amplification of particular octave bands, especially the 2 kHz and the 4 kHz band. Thus, loudspeaker power in an aircraft cabin can be reduced by using appropriate filter settings in the loudspeaker system. PMID:25183056

  5. The Use of Artificial Neural Networks to Estimate Speech Intelligibility from Acoustic Variables: A Preliminary Analysis.

    ERIC Educational Resources Information Center

    Metz, Dale Evan; And Others

    1992-01-01

    A preliminary scheme for estimating the speech intelligibility of hearing-impaired speakers from acoustic parameters, using a computerized artificial neural network to process mathematically the acoustic input variables, is outlined. Tests with 60 hearing-impaired speakers found the scheme to be highly accurate in identifying speakers separated by…

  6. A maximum likelihood approach to estimating articulator positions from speech acoustics

    SciTech Connect

    Hogden, J.

    1996-09-23

    This proposal presents an algorithm called maximum likelihood continuity mapping (MALCOM) which recovers the positions of the tongue, jaw, lips, and other speech articulators from measurements of the sound-pressure waveform of speech. MALCOM differs from other techniques for recovering articulator positions from speech in three critical respects: it does not require training on measured or modeled articulator positions, it does not rely on any particular model of sound propagation through the vocal tract, and it recovers a mapping from acoustics to articulator positions that is linearly, not topographically, related to the actual mapping from acoustics to articulation. The approach categorizes short-time windows of speech into a finite number of sound types, and assumes the probability of using any articulator position to produce a given sound type can be described by a parameterized probability density function. MALCOM then uses maximum likelihood estimation techniques to: (1) find the most likely smooth articulator path given a speech sample and a set of distribution functions (one distribution function for each sound type), and (2) change the parameters of the distribution functions to better account for the data. Using this technique improves the accuracy of articulator position estimates compared to continuity mapping -- the only other technique that learns the relationship between acoustics and articulation solely from acoustics. The technique has potential application to computer speech recognition, speech synthesis and coding, teaching the hearing impaired to speak, improving foreign language instruction, and teaching dyslexics to read. 34 refs., 7 figs.

  7. Researches of the Electrotechnical Laboratory. No. 955: Speech recognition by description of acoustic characteristic variations

    NASA Astrophysics Data System (ADS)

    Hayamizu, Satoru

    1993-09-01

    A new speech recognition technique is proposed. The technique systematically describes acoustic characteristic variations using a large-scale speech database, thereby obtaining high recognition accuracy. Rules representing knowledge about acoustic characteristic variations are extracted by observing the actual speech database. A general framework based on maps from sets of variation factors to acoustic feature spaces is proposed. Rather than using a single recognition model for each descriptive unit regardless of the states of the variation factors, a large, systematically organized set of recognition models is used, with different models for different states. A technique is proposed for structuring the representation of acoustic characteristic variations by clustering recognition models according to variation factors. To investigate acoustic characteristic variations in phonetic contexts efficiently, the word sets used as reading texts for the speech database are selected so that as many three-phoneme sequences as possible are covered with as few words as possible. A selection algorithm is proposed whose first criterion is to maximize the number of distinct three-phoneme sequences in the word set and whose second criterion is to maximize the entropy of the three-phoneme sequences. Read speech data for the word sets were collected and labelled as acoustic-phonetic segments. Speaker-independent word recognition experiments using this speech database were conducted to show the effectiveness of describing acoustic characteristic variations with networks of acoustic-phonetic segments. The experiments show that recognition errors are reduced. A basic framework for estimating acoustic characteristics in unknown phonetic contexts using decision trees is also proposed.
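    The two-criterion word selection can be sketched as a greedy loop. This is an assumption: the abstract does not say the algorithm is greedy, nor which distribution the entropy is taken over; here it is the frequency distribution of three-phoneme sequences in the selected set. Single characters stand in for phonemes, and all names are hypothetical.

```python
import math
from collections import Counter


def triphones(phonemes):
    """All overlapping three-phoneme sequences in one word's transcription."""
    return [tuple(phonemes[i:i + 3]) for i in range(len(phonemes) - 2)]


def entropy_bits(counts):
    """Shannon entropy (bits) of a frequency table."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c) if total else 0.0


def select_words(lexicon, max_words):
    """Greedy selection: most new triphones first, higher triphone entropy breaks ties."""
    remaining = dict(lexicon)
    selected, covered, counts = [], set(), Counter()

    def score(word):
        tri = triphones(remaining[word])
        new_types = len(set(tri) - covered)
        return new_types, entropy_bits(counts + Counter(tri))

    while remaining and len(selected) < max_words:
        best = max(remaining, key=score)
        if score(best)[0] == 0:
            break  # no remaining word adds a new three-phoneme sequence
        tri = triphones(remaining.pop(best))
        selected.append(best)
        covered.update(tri)
        counts.update(tri)
    return selected


# Hypothetical toy lexicon: word -> phoneme sequence.
toy_lexicon = {"w1": list("abc"), "w2": list("xxxx"), "w3": list("xyz")}
print(select_words(toy_lexicon, max_words=3))
```

    Scoring with a (coverage, entropy) tuple makes entropy matter only when two words add the same number of new sequences, which mirrors the primary/secondary ordering of the criteria.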

  8. A magnetic resonance imaging study on the articulatory and acoustic speech parameters of Malay vowels

    PubMed Central

    2014-01-01

    The phonetic properties of six Malay vowels are investigated using magnetic resonance imaging (MRI) to visualize the vocal tract and obtain dynamic articulatory parameters during speech production. To resolve image blurring caused by tongue movement during scanning, a method based on active contour extraction is used to track tongue contours. The proposed method tracks tongue contours efficiently despite partial blurring of the MRI images. As a result, articulatory parameters can be measured effectively as tongue movement is observed, and the specific shape and position of the tongue are determined for all six uttered Malay vowels. Speech rehabilitation procedures require some kind of visually perceivable prototype of speech articulation. To assess the validity of the measured articulatory parameters against the acoustic theory of speech production, an acoustic analysis of the vowels uttered by the subjects was performed. When the acoustic and articulatory parameters of the uttered speech were examined, a correlation between formant frequencies and articulatory parameters was observed. The experiments revealed a positive correlation between the constriction location of the tongue body and the first formant frequency, and a negative correlation between the constriction location of the tongue tip and the second formant frequency. The results demonstrate that the proposed method is an effective tool for the dynamic study of speech production. PMID:25060583

  9. The acoustics for speech of eight auditoriums in the city of Sao Paulo

    NASA Astrophysics Data System (ADS)

    Bistafa, Sylvio R.

    2002-11-01

    Eight auditoriums with a proscenium type of stage, which usually operate as dramatic theaters in the city of Sao Paulo, were acoustically surveyed in terms of their adequacy for unassisted speech. Reverberation times, early decay times, and speech levels were measured at different positions, together with objective measures of speech intelligibility. The measurements revealed fairly uniform reverberation times throughout the rooms, whereas the other acoustical measures varied significantly with position. The early decay time correlated better with the objective measures of speech intelligibility than the reverberation time did. The objective measurements of speech intelligibility revealed that the speech transmission index STI, and its simplified version RaSTI, are strongly correlated with the early-to-late sound ratio C50 (1 kHz). However, the criterion value for acceptability of C50 was found to be more easily met than those of STI and RaSTI. The results of these measurements help explain how the characteristics of the architectural design determine the acoustical quality for speech. Measurements of ST1 (Gade) were made in an attempt to validate it as an objective measure of "support" for the actor. Preliminary diagnostic results from ray-tracing simulations will also be presented.
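    C50 compares the energy arriving in the first 50 ms of a room impulse response with the energy arriving later, C50 = 10 log10(E_early / E_late). A minimal sketch, assuming a broadband impulse response sampled at fs Hz whose first sample is the direct sound; reproducing the 1 kHz values reported above would additionally require octave-band filtering, which is omitted here, and the synthetic response is hypothetical.

```python
import math


def clarity_c50(impulse_response, fs):
    """C50 in dB: energy in the first 50 ms of an impulse response vs. the energy after it.

    Assumes sample 0 is the arrival of the direct sound. For band values
    (e.g. the 1 kHz octave) the response must be band-pass filtered first.
    """
    split = round(0.050 * fs)  # number of samples in the first 50 ms
    early = sum(x * x for x in impulse_response[:split])
    late = sum(x * x for x in impulse_response[split:])
    return 10.0 * math.log10(early / late)


# Hypothetical exponentially decaying response with a 1.0 s reverberation time:
# the amplitude falls by 60 dB (a factor of 1000) over rt60 seconds.
fs = 8000
rt60 = 1.0
ir = [math.exp(-3 * math.log(10) * (i / fs) / rt60) for i in range(2 * fs)]
print(clarity_c50(ir, fs))
```

    Because C50 depends only on how energy is split around the 50 ms boundary, it can vary with position even when the reverberation time is nearly uniform, consistent with the position dependence described above.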

  10. Speech and melody recognition in binaurally combined acoustic and electric hearing

    NASA Astrophysics Data System (ADS)

    Kong, Ying-Yee; Stickney, Ginger S.; Zeng, Fan-Gang

    2005-03-01

    Speech recognition in noise and music perception are especially challenging for current cochlear implant users. The present study utilizes the residual acoustic hearing in the nonimplanted ear of five cochlear implant users to elucidate the role of temporal fine structure at low frequencies in auditory perception and to test the hypothesis that combined acoustic and electric hearing produces better performance than either mode alone. The first experiment measured speech recognition in the presence of competing noise. Although the residual low-frequency (<1000 Hz) acoustic hearing alone produced essentially no speech recognition in noise, it significantly enhanced performance when combined with electric hearing. The second experiment measured melody recognition in the same group of subjects and found that, in contrast to the speech recognition result, low-frequency acoustic hearing produced significantly better performance than electric hearing. It is hypothesized that listeners with combined acoustic and electric hearing might use the correlation between the salient pitch in low-frequency acoustic hearing and the weak pitch in the envelope to enhance segregation between signal and noise. The present study suggests the importance and urgency of accurately encoding the fine-structure cue in cochlear implants.

  11. Mathematical model of acoustic speech production with mobile walls of the vocal tract

    NASA Astrophysics Data System (ADS)

    Lyubimov, N. A.; Zakharov, E. V.

    2016-03-01

    A mathematical speech production model is considered that describes acoustic oscillation propagation in a vocal tract with mobile walls. The wave field function satisfies the Helmholtz equation with boundary conditions of the third kind (impedance type). The impedance mode corresponds to a three-parameter pendulum oscillation model. Experimental research demonstrates the nonlinear character of how the mobility of the vocal tract walls influences the spectral envelope of a speech signal.
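    To make the boundary-value problem concrete, one common way to write it is sketched below, with p the acoustic pressure (the wave field). Assumptions not taken from the abstract: time dependence e^{iωt}, n the outward normal to the wall, ρ the air density, c the speed of sound, and a wall impedance Z_w modelled as a mass-spring-damper with mass m, resistance r, and stiffness s per unit area, which is our reading of the "three-parameter" oscillator; the authors' formulation may differ.

```latex
\begin{aligned}
  &\nabla^2 p + k^2 p = 0 \quad \text{inside the vocal tract}, \qquad k = \frac{\omega}{c},\\
  &\frac{\partial p}{\partial n} + \frac{i\omega\rho}{Z_w(\omega)}\, p = 0 \quad \text{on the walls},\\
  &Z_w(\omega) = r + i\left(\omega m - \frac{s}{\omega}\right).
\end{aligned}
```

    The second line is a Robin (third-kind) condition: as Z_w grows without bound the wall becomes rigid and the condition reduces to the Neumann condition ∂p/∂n = 0, while a finite, frequency-dependent Z_w couples the sound field to the wall motion.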

  12. Precategorical Acoustic Storage and the Perception of Speech

    ERIC Educational Resources Information Center

    Frankish, Clive

    2008-01-01

    Theoretical accounts of both speech perception and of short term memory must consider the extent to which perceptual representations of speech sounds might survive in relatively unprocessed form. This paper describes a novel version of the serial recall task that can be used to explore this area of shared interest. In immediate recall of digit…

  13. Vowel Acoustics in Adults with Apraxia of Speech

    ERIC Educational Resources Information Center

    Jacks, Adam; Mathes, Katey A.; Marquardt, Thomas P.

    2010-01-01

    Purpose: To investigate the hypothesis that vowel production is more variable in adults with acquired apraxia of speech (AOS) relative to healthy individuals with unimpaired speech. Vowel formant frequency measures were selected as the specific target of focus. Method: Seven adults with AOS and aphasia produced 15 repetitions of 6 American English…

  14. Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training

    PubMed Central

    Narayanan, Arun; Wang, DeLiang

    2015-01-01

    Although deep neural network (DNN) acoustic models are known to be inherently noise robust, especially with matched training and testing data, the use of speech separation as a frontend and for deriving alternative feature representations has been shown to improve performance in challenging environments. We first present a supervised speech separation system that significantly improves automatic speech recognition (ASR) performance in realistic noise conditions. The system performs separation via ratio time-frequency masking; the ideal ratio mask (IRM) is estimated using DNNs. We then propose a framework that unifies separation and acoustic modeling via joint adaptive training. Since the modules for acoustic modeling and speech separation are implemented using DNNs, unification is done by introducing additional hidden layers with fixed weights and appropriate network architecture. On the CHiME-2 medium-large vocabulary ASR task, and with log mel spectral features as input to the acoustic model, an independently trained ratio masking frontend improves word error rates by 10.9% (relative) compared to the noisy baseline. In comparison, the jointly trained system improves performance by 14.4%. We also experiment with alternative feature representations to augment the standard log mel features, like the noise and speech estimates obtained from the separation module, and the standard feature set used for IRM estimation. Our best system obtains a word error rate of 15.4% (absolute), an improvement of 4.6 percentage points over the next best result on this corpus. PMID:26973851
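    The abstract names the ideal ratio mask without defining it. A minimal sketch using the definition common in the masking literature, IRM = (S^2 / (S^2 + N^2))^β with β = 0.5, where S^2 and N^2 are speech and noise energies in a time-frequency unit; the choice of β, the toy nested-list grids, and the function names are assumptions, and the paper's exact time-frequency representation may differ.

```python
def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """IRM per time-frequency unit: (S^2 / (S^2 + N^2)) ** beta.

    speech_power and noise_power are equally shaped grids (lists of rows) of
    speech and noise energies; beta = 0.5 is a common choice, assumed here.
    """
    return [
        [(s / (s + n)) ** beta if s + n > 0 else 0.0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


def apply_mask(mixture_magnitude, mask):
    """Scale each noisy time-frequency unit by its mask value."""
    return [[m * g for m, g in zip(m_row, g_row)] for m_row, g_row in zip(mixture_magnitude, mask)]
```

    The ideal mask needs the separate speech and noise signals, so it serves as the training target; at test time a DNN estimates it from the mixture alone, as the abstract describes, and the estimated mask is applied to the noisy features.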

  15. Acoustic properties of vowels in clear and conversational speech by female non-native English speakers

    NASA Astrophysics Data System (ADS)

    Li, Chi-Nin; So, Connie K.

    2005-04-01

    Studies have shown that talkers can improve the intelligibility of their speech when instructed to speak as if talking to a hearing-impaired person. The improvement of speech intelligibility is associated with specific acoustic-phonetic changes: increases in vowel duration and fundamental frequency (F0), a wider pitch range, and a shift in formant frequencies for F1 and F2. Most previous studies of clear speech production have been conducted with native speakers; research with second language speakers is much less common. The present study examined the acoustic properties of non-native English vowels produced in a clear speaking style. Five female Cantonese speakers and a comparison group of English speakers were recorded producing four vowels (/i u ae a/) in /bVt/ context in conversational and clear speech. Vowel durations, F0, pitch range, and the first two formants for each of the four vowels were measured. Analyses revealed that for both groups of speakers, vowel durations, F0, pitch range, and F1 spoken clearly were greater than those produced conversationally. However, F2 was higher in conversational speech than in clear speech. The findings suggest that female non-native English speakers exhibit acoustic-phonetic patterns similar to those of native speakers when asked to produce English vowels clearly.

  16. Speech privacy and annoyance considerations in the acoustic environment of passenger cars of high-speed trains.

    PubMed

    Jeon, Jin Yong; Hong, Joo Young; Jang, Hyung Suk; Kim, Jae Hyeon

    2015-12-01

    To achieve acoustic comfort in a passenger car of a high-speed train, it is necessary to consider not only the annoyance of interior noise but also speech privacy, because speech from other passengers can be annoying. This study aimed to explore an optimal acoustic environment that satisfies speech privacy and reduces annoyance in a passenger car. Two experiments were conducted using speech sources and compartment noise of a high-speed train with varying speech-to-noise ratios (SNRA) and background noise levels (BNL). Speech intelligibility was tested in experiment I, and in experiment II, perceived speech privacy, annoyance, and acoustic comfort of combined sounds with speech and background noise were assessed. The results show that speech privacy and annoyance were significantly influenced by the SNRA. In particular, acoustic comfort was evaluated as acceptable when the SNRA was less than -6 dB for both speech privacy and noise annoyance. In addition, annoyance increased significantly as the BNL exceeded 63 dBA, whereas the effect of the background noise level on speech privacy was not significant. These findings suggest that an optimal level of interior noise in a passenger car might exist between 59 and 63 dBA, taking normal speech levels into account. PMID:26723351

  17. Perceiving speech in context: Compensation for contextual variability during acoustic cue encoding and categorization

    NASA Astrophysics Data System (ADS)

    Toscano, Joseph Christopher

    Several fundamental questions about speech perception concern how listeners understand spoken language despite considerable variability in speech sounds across different contexts (the problem of lack of invariance in speech). This contextual variability is caused by several factors, including differences between individual talkers' voices, variation in speaking rate, and effects of coarticulatory context. A number of models have been proposed to describe how the speech system handles differences across contexts. Critically, these models make different predictions about (1) whether contextual variability is handled at the level of acoustic cue encoding or categorization, (2) whether it is driven by feedback from category-level processes or interactions between cues, and (3) whether listeners discard fine-grained acoustic information to compensate for contextual variability. Separating the effects of cue- and category-level processing has been difficult because behavioral measures tap processes that occur well after initial cue encoding and are influenced by task demands and linguistic information. Recently, we have used the event-related brain potential (ERP) technique to examine cue encoding and online categorization. Specifically, we have looked at differences in the auditory N1 as a measure of acoustic cue encoding and the P3 as a measure of categorization. This allows us to examine multiple levels of processing during speech perception and can provide a useful tool for studying effects of contextual variability. Here, I apply this approach to determine the point in processing at which context has an effect on speech perception and to examine whether acoustic cues are encoded continuously. Several types of contextual variability (talker gender, speaking rate, and coarticulation), as well as several acoustic cues (voice onset time, formant frequencies, and bandwidths), are examined in a series of experiments. The results suggest that (1) at early stages of speech…

  18. Aging Affects Neural Synchronization to Speech-Related Acoustic Modulations

    PubMed Central

    Goossens, Tine; Vercammen, Charlotte; Wouters, Jan; van Wieringen, Astrid

    2016-01-01

    As people age, speech perception problems become highly prevalent, especially in noisy situations. In addition to peripheral hearing and cognition, temporal processing plays a key role in speech perception. Temporal processing of speech features is mediated by synchronized activity of neural oscillations in the central auditory system. Previous studies indicate that both the degree and the hemispheric lateralization of synchronized neural activity relate to speech perception performance. Based on these results, we hypothesize that impaired speech perception in older persons may, in part, originate from deviances in neural synchronization. In this study, auditory steady-state responses (ASSRs) that reflect synchronized activity of theta, beta, low gamma, and high gamma oscillations (i.e., 4, 20, 40, and 80 Hz ASSRs, respectively) were recorded in young, middle-aged, and older persons. As all participants had normal audiometric thresholds and were screened for (mild) cognitive impairment, differences in synchronized neural activity across the three age groups were likely attributable to age. Our data yield novel findings regarding theta and high gamma oscillations in the aging auditory system. At an older age, synchronized activity of theta oscillations is increased, whereas high gamma synchronization is decreased. In contrast to young persons, who exhibit a right-hemispheric dominance for processing high gamma range modulations, older adults show a symmetrical processing pattern. These age-related changes in neural synchronization may underlie the speech perception problems in aging persons. PMID:27378906

  19. Speech in ALS: Longitudinal Changes in Lips and Jaw Movements and Vowel Acoustics

    PubMed Central

    Yunusova, Yana; Green, Jordan R.; Lindstrom, Mary J.; Pattee, Gary L.; Zinman, Lorne

    2015-01-01

    Purpose: The goal of this exploratory study was to longitudinally investigate changes in facial kinematics, vowel formant frequencies, and speech intelligibility in individuals diagnosed with bulbar amyotrophic lateral sclerosis (ALS). This study was motivated by the need to understand articulatory and acoustic changes with disease progression and their subsequent effect on the deterioration of speech in ALS. Method: Lip and jaw movements and vowel acoustics were obtained for four individuals with bulbar ALS during four consecutive recording sessions, with an average interval of three months between recordings. Participants read target words embedded in sentences at a comfortable speaking rate. Maximum vertical and horizontal mouth opening and maximum jaw displacements were obtained during corner vowels. First and second formant frequencies were measured for each vowel. Speech intelligibility and speaking rate scores were also obtained for each session. Results: Transient, non-vowel-specific changes in kinematics of the jaw and lips were observed. Kinematic changes often preceded changes in vowel acoustics and speech intelligibility. Conclusions: Nonlinear changes in speech kinematics should be considered when evaluating the effects of the disease on jaw and lip musculature. Kinematic measures might be most suitable for early detection of changes associated with bulbar ALS.

  20. Acoustic and Articulatory Features of Diphthong Production: A Speech Clarity Study

    ERIC Educational Resources Information Center

    Tasko, Stephen M.; Greilick, Kristin

    2010-01-01

    Purpose: The purpose of this study was to evaluate how speaking clearly influences selected acoustic and orofacial kinematic measures associated with diphthong production. Method: Forty-nine speakers, drawn from the University of Wisconsin X-Ray Microbeam Speech Production Database (J. R. Westbury, 1994), served as participants. Samples of clear…

  1. Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation.

    PubMed

    Kong, Ying-Yee; Donaldson, Gail; Somarowthu, Ala

    2015-05-01

    Low-frequency acoustic cues have been shown to improve speech perception in cochlear-implant listeners. However, the mechanisms underlying this benefit are still not well understood. This study investigated the extent to which low-frequency cues can facilitate listeners' use of linguistic knowledge in simulated electric-acoustic stimulation (EAS). Experiment 1 examined differences in the magnitude of EAS benefit at the phoneme, word, and sentence levels. Speech materials were processed via noise-channel vocoding and low-pass (LP) filtering. The amount of spectral degradation in the vocoder speech was varied by applying different numbers of vocoder channels. Normal-hearing listeners were tested on vocoder-alone, LP-alone, and vocoder + LP conditions. Experiment 2 further examined factors that underlie the context effect on EAS benefit at the sentence level by limiting the low-frequency cues to temporal envelope and periodicity (AM + FM). Results showed that EAS benefit was greater for higher-context than for lower-context speech materials even when the LP ear received only low-frequency AM + FM cues. Possible explanations for the greater EAS benefit observed with higher-context materials may lie in the interplay between perceptual and expectation-driven processes for EAS speech recognition, and/or the band-importance functions for different types of speech materials. PMID:25994712
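    The noise-channel vocoding and LP filtering described above can be sketched roughly as follows. This is a minimal illustration, not the authors' processing chain. The FFT "brick-wall" filters, log-spaced channel edges, 50 Hz envelope cutoff, and 500 Hz low-pass cutoff are all assumptions, since the abstract gives none of these parameters. The helper names `bandpass_fft`, `noise_vocode`, and `lowpass` are hypothetical.

```python
import numpy as np


def bandpass_fft(x, fs, lo, hi):
    """Zero-phase 'brick-wall' band-pass: zero every FFT bin outside [lo, hi) Hz."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))


def noise_vocode(x, fs, n_channels=8, f_lo=100.0, f_hi=7000.0,
                 env_cutoff=50.0, seed=0):
    """Noise-channel vocoder sketch: each band's envelope modulates band-limited noise.

    All default parameters are illustrative assumptions, not values from the study.
    """
    if f_hi > fs / 2:
        raise ValueError("f_hi must not exceed the Nyquist frequency")
    rng = np.random.default_rng(seed)
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced channel edges
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = bandpass_fft(x, fs, lo, hi)
        # Envelope: full-wave rectification, then low-pass smoothing (clipped at 0).
        env = np.maximum(bandpass_fft(np.abs(band), fs, 0.0, env_cutoff), 0.0)
        carrier = bandpass_fft(rng.standard_normal(len(x)), fs, lo, hi)
        # Re-filter so the modulated noise stays inside the channel.
        out += bandpass_fft(env * carrier, fs, lo, hi)
    # Scale the output to the input's RMS level.
    rms_in = np.sqrt(np.mean(np.square(x)))
    rms_out = np.sqrt(np.mean(np.square(out)))
    return out * (rms_in / rms_out) if rms_out > 0 else out


def lowpass(x, fs, cutoff=500.0):
    """Low-pass 'acoustic ear' signal; the 500 Hz cutoff is an assumption."""
    return bandpass_fft(x, fs, 0.0, cutoff)
```

    In a simulated vocoder + LP condition, `noise_vocode(x, fs)` would go to one ear and `lowpass(x, fs)` to the other. Changing `n_channels` varies the amount of spectral degradation, which mirrors how the study manipulated the vocoder speech.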

  2. Contributions of electric and acoustic hearing to bimodal speech and music perception.

    PubMed

    Crew, Joseph D; Galvin, John J; Landsberger, David M; Fu, Qian-Jie

    2015-01-01

    Cochlear implant (CI) users have difficulty understanding speech in noisy listening conditions and perceiving music. Aided residual acoustic hearing in the contralateral ear can mitigate these limitations. The present study examined contributions of electric and acoustic hearing to speech understanding in noise and melodic pitch perception. Data was collected with the CI only, the hearing aid (HA) only, and both devices together (CI+HA). Speech reception thresholds (SRTs) were adaptively measured for simple sentences in speech babble. Melodic contour identification (MCI) was measured with and without a masker instrument; the fundamental frequency of the masker was varied to be overlapping or non-overlapping with the target contour. Results showed that the CI contributes primarily to bimodal speech perception and that the HA contributes primarily to bimodal melodic pitch perception. In general, CI+HA performance was slightly improved relative to the better ear alone (CI-only) for SRTs but not for MCI, with some subjects experiencing a decrease in bimodal MCI performance relative to the better ear alone (HA-only). Individual performance was highly variable, and the contribution of either device to bimodal perception was both subject- and task-dependent. The results suggest that individualized mapping of CIs and HAs may further improve bimodal speech and music perception. PMID:25790349

  3. Contributions of Electric and Acoustic Hearing to Bimodal Speech and Music Perception

    PubMed Central

    Crew, Joseph D.; Galvin III, John J.; Landsberger, David M.; Fu, Qian-Jie

    2015-01-01

  4. Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation

    PubMed Central

    Kong, Ying-Yee; Donaldson, Gail; Somarowthu, Ala

    2015-01-01

  5. Primary Progressive Apraxia of Speech: Clinical Features and Acoustic and Neurologic Correlates

    PubMed Central

    Strand, Edythe A.; Clark, Heather; Machulda, Mary; Whitwell, Jennifer L.; Josephs, Keith A.

    2015-01-01

    Purpose: This study summarizes 2 illustrative cases of a neurodegenerative speech disorder, primary progressive apraxia of speech (AOS), as a vehicle for providing an overview of the disorder and an approach to describing and quantifying its perceptual features and some of its temporal acoustic attributes. Method: Two individuals with primary progressive AOS underwent speech-language and neurologic evaluations on 2 occasions, ranging from 2.0 to 7.5 years postonset. Performance on several tests, tasks, and rating scales, as well as several acoustic measures, were compared over time within and between cases. Acoustic measures were compared with performance of control speakers. Results: Both patients initially presented with AOS as the only or predominant sign of disease and without aphasia or dysarthria. The presenting features and temporal progression were captured in an AOS Rating Scale, an Articulation Error Score, and temporal acoustic measures of utterance duration, syllable rates per second, rates of speechlike alternating motion and sequential motion, and a pairwise variability index measure. Conclusions: AOS can be the predominant manifestation of neurodegenerative disease. Clinical ratings of its attributes and acoustic measures of some of its temporal characteristics can support its diagnosis and help quantify its salient characteristics and progression over time. PMID:25654422
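    The abstract lists a pairwise variability index (PVI) among the temporal measures but does not say which variant was used. The sketch below assumes the widely used normalized PVI (nPVI). Over successive durations d_1 ... d_m (for example, vowel or syllable durations in ms), it averages |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2) across the m - 1 adjacent pairs and multiplies the result by 100. The function name `npvi` is hypothetical.

```python
def npvi(durations):
    """Normalized pairwise variability index of successive (positive) durations.

    Assumes the common nPVI form; the study's exact PVI variant is not stated.
    """
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    # One normalized absolute difference per adjacent pair (m - 1 pairs).
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in zip(durations, durations[1:])]
    return 100.0 * sum(terms) / len(terms)
```

    Perfectly even timing gives 0, and the value increases as adjacent durations become more unequal. This is why a PVI can capture the syllable-to-syllable timing irregularities that rating scales describe qualitatively.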

  6. Classification of stop place in consonant-vowel contexts using feature extrapolation of acoustic-phonetic features in telephone speech.

    PubMed

    Lee, Jung-Won; Choi, Jeung-Yoon; Kang, Hong-Goo

    2012-02-01

    Knowledge-based speech recognition systems extract acoustic cues from the signal to identify speech characteristics. For channel-deteriorated telephone speech, acoustic cues, especially those for stop consonant place, are expected to be degraded or absent. To investigate the use of knowledge-based methods in degraded environments, feature extrapolation of acoustic-phonetic features based on Gaussian mixture models is examined. This process is applied to a stop place detection module that uses burst release and vowel onset cues for consonant-vowel tokens of English. Results show that classification performance is enhanced in telephone channel-degraded speech, with extrapolated acoustic-phonetic features reaching or exceeding performance using estimated Mel-frequency cepstral coefficients (MFCCs). Results also show acoustic-phonetic features may be combined with MFCCs for best performance, suggesting these features provide information complementary to MFCCs. PMID:22352523

  7. Intelligibility of Modified Speech for Young Listeners with Normal and Impaired Hearing.

    ERIC Educational Resources Information Center

    Uchanski, Rosalie M.; Geers, Ann E.; Protopapas, Athanassios

    2002-01-01

    A study examined whether the benefits of modified speech could be extended to provide intelligibility improvements for eight children (ages 8-14) with severe-to-profound hearing impairments who wear sensory aids and five controls. All varieties of modified speech (envelope-amplified, slowed, and both) yielded either equivalent or poorer…

  8. [Influence of human personal features on acoustic correlates of speech emotional intonation characteristics].

    PubMed

    Dmitrieva, E S; Gel'man, V Ia; Zaĭtseva, K A; Orlov, A M

    2009-01-01

    A comparative study of acoustic correlates of emotional intonation was conducted on two types of speech material: meaningful speech utterances and short meaningless words. A corpus of speech signals with different emotional intonations (happy, angry, frightened, sad, and neutral) was created using the actor's method of emotion simulation. Native Russian speakers aged 20-70 years (both professional actors and non-actors) participated in the study. The following characteristics of the corpus were analyzed: mean values and standard deviations of the power, fundamental frequency, frequencies of the first and second formants, and utterance duration. Comparing each emotional intonation with "neutral" utterances showed the greatest deviations in the fundamental frequency and the frequency of the first formant. The direction of these deviations was independent of the utterance's semantic content and duration and of the speaker's age, gender, and status as an actor or non-actor, though the speakers' personal features affected the absolute values of these frequencies. PMID:19947529

  9. Vowel Acoustics in Dysarthria: Speech Disorder Diagnosis and Classification

    ERIC Educational Resources Information Center

    Lansford, Kaitlin L.; Liss, Julie M.

    2014-01-01

    Purpose: The purpose of this study was to determine the extent to which vowel metrics are capable of distinguishing healthy from dysarthric speech and among different forms of dysarthria. Method: A variety of vowel metrics were derived from spectral and temporal measurements of vowel tokens embedded in phrases produced by 45 speakers with…
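    The truncated abstract does not list which vowel metrics were derived. One widely used metric, assumed here purely for illustration, is the quadrilateral vowel space area: the area of the polygon spanned by the corner vowels in the F1-F2 plane. The shoelace formula computes it. The function name and the formant values in the test are hypothetical.

```python
def vowel_space_area(corners):
    """Shoelace area (Hz^2) of the polygon of (F1, F2) points.

    The corners must be listed in order around the polygon,
    e.g. /i/, /ae/, /a/, /u/ in F1-F2 space.
    """
    n = len(corners)
    s = 0.0
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]  # wrap around to close the polygon
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0
```

    A smaller area is usually read as a more centralized, less distinct vowel system. This is one reason such metrics are candidates for separating healthy from dysarthric speech.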

  10. Estimation of glottal source features from the spectral envelope of the acoustic speech signal

    NASA Astrophysics Data System (ADS)

    Torres, Juan Felix

    Speech communication encompasses diverse types of information, including phonetics, affective state, voice quality, and speaker identity. From a speech production standpoint, the acoustic speech signal can be divided mainly into glottal source and vocal tract components, which play distinct roles in rendering the various types of information it contains. Most deployed speech analysis systems, however, do not explicitly represent these two components as distinct entities, as their joint estimation from the acoustic speech signal becomes an ill-defined blind deconvolution problem. Nevertheless, because of the desire to understand glottal behavior and how it relates to perceived voice quality, there has been continued interest in explicitly estimating the glottal component of the speech signal. To this end, several inverse filtering (IF) algorithms have been proposed, but they are unreliable in practice because of the blind formulation of the separation problem. In an effort to develop a method that can bypass the challenging IF process, this thesis proposes a new glottal source information extraction method that relies on supervised machine learning to transform smoothed spectral representations of speech, which are already used in some of the most widely deployed and successful speech analysis applications, into a set of glottal source features. A transformation method based on Gaussian mixture regression (GMR) is presented and compared to current IF methods in terms of feature similarity, reliability, and speaker discrimination capability on a large speech corpus, and potential representations of the spectral envelope of speech are investigated for their ability to represent glottal source variation in a predictable manner. The proposed system was found to produce glottal source features that reasonably matched their IF counterparts in many cases, while being less susceptible to spurious errors. The development of the proposed method entailed a study into the aspects
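    The GMR mapping mentioned above can be illustrated with a generic prediction step. Given a Gaussian mixture already fitted (for example, by EM) to joint vectors [x, y], where x stands for spectral-envelope features and y for glottal source features, GMR predicts E[y | x]. It does this by weighting each component's conditional mean by that component's posterior responsibility for x. This is a textbook sketch, not the thesis's system. The feature choice, dimensions, and the function name `gmr_predict` are assumptions.

```python
import numpy as np


def gmr_predict(x, weights, means, covs):
    """Gaussian mixture regression: E[y | x] under a joint GMM over (x, y).

    x: (dx,) input; weights: (K,); means: (K, dx + dy); covs: (K, dx + dy, dx + dy).
    The first dx dimensions of every component form the input block.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    dx = x.shape[0]
    K = len(weights)
    log_resp = np.empty(K)
    cond_means = []
    for k in range(K):
        mu_x, mu_y = means[k][:dx], means[k][dx:]
        S_xx = covs[k][:dx, :dx]
        S_yx = covs[k][dx:, :dx]
        diff = x - mu_x
        sol = np.linalg.solve(S_xx, diff)
        _, logdet = np.linalg.slogdet(S_xx)
        # log w_k + log N(x; mu_x, S_xx), dropping the constant shared by all components.
        log_resp[k] = np.log(weights[k]) - 0.5 * (logdet + diff @ sol)
        # Conditional mean of y given x for component k.
        cond_means.append(mu_y + S_yx @ sol)
    resp = np.exp(log_resp - log_resp.max())  # numerically stable normalization
    resp /= resp.sum()
    return np.sum(resp[:, None] * np.array(cond_means), axis=0)
```

    With a single component, the prediction reduces to ordinary linear regression of y on x. With several components, the mapping becomes a smooth, locally linear blend, which is what makes GMR attractive for nonlinear feature-to-feature transformations.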

  11. Fundamental frequency is critical to speech perception in noise in combined acoustic and electric hearing

    PubMed Central

    Carroll, Jeff; Tiaden, Stephanie; Zeng, Fan-Gang

    2011-01-01

    Cochlear implant (CI) users have been shown to benefit from residual low-frequency hearing, specifically in pitch-related tasks. It remains unclear whether this benefit is dependent on fundamental frequency (F0) or other acoustic cues. Three experiments were conducted to determine the role of F0, as well as its frequency modulated (FM) and amplitude modulated (AM) components, in speech recognition with a competing voice. In simulated CI listeners, the signal-to-noise ratio was varied to estimate the 50% correct response. Simulation results showed that the F0 cue contributes a significant proportion of the benefit seen with combined acoustic and electric hearing, and additionally that this benefit is due to the FM rather than the AM component. In actual CI users, sentence recognition scores were collected with either the full F0 cue containing both the FM and AM components or the 500-Hz low-pass speech cue containing the F0 and additional harmonics. The F0 cue provided a benefit similar to the low-pass cue for speech in noise, but not in quiet. Poorer CI users benefited more from the F0 cue than better users. These findings suggest that F0 is critical to improving speech perception in noise in combined acoustic and electric hearing. PMID:21973360

  12. A Chimpanzee Recognizes Synthetic Speech With Significantly Reduced Acoustic Cues to Phonetic Content

    PubMed Central

    Heimbauer, Lisa A.; Beran, Michael J.; Owren, Michael J.

    2011-01-01

    Summary: A long-standing debate concerns whether humans are specialized for speech perception [1–7], which some researchers argue is demonstrated by the ability to understand synthetic speech with significantly reduced acoustic cues to phonetic content [2–4,7]. We tested a chimpanzee (Pan troglodytes) that recognizes 128 spoken words [8,9], asking whether she could understand such speech. Three experiments presented 48 individual words, with the animal selecting a corresponding visuo-graphic symbol from among four alternatives. Experiment 1 tested spectrally reduced, noise-vocoded (NV) synthesis, originally developed to simulate input received by human cochlear-implant users [10]. Experiment 2 tested “impossibly unspeechlike” [3] sine-wave (SW) synthesis, which reduces speech to just three moving tones [11]. Although receiving only intermittent and non-contingent reward, the chimpanzee performed well above chance level, including when hearing synthetic versions for the first time. Recognition of SW words was least accurate, but improved in Experiment 3 when natural words in the same session were rewarded. The chimpanzee was more accurate with NV than SW versions, as were 32 human participants hearing these items. The chimpanzee's ability to spontaneously recognize acoustically reduced synthetic words suggests that experience rather than specialization is critical for speech-perception capabilities that some have suggested are uniquely human [12–14]. PMID:21723125
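    Sine-wave synthesis, as characterized above, replaces speech with a few tones whose frequencies follow the formant tracks. The sketch below sums such tones from per-sample frequency and amplitude tracks. In practice the tracks would be estimated from natural speech (for example, by formant tracking). Here they are supplied directly, and the function name `sine_wave_speech` and the example values are hypothetical.

```python
import numpy as np


def sine_wave_speech(freq_tracks, amp_tracks, fs):
    """Sum of sinusoids whose frequencies follow per-sample tracks (e.g., F1-F3).

    freq_tracks, amp_tracks: arrays of shape (n_tones, n_samples),
    in Hz and linear amplitude respectively.
    """
    f = np.asarray(freq_tracks, dtype=float)
    a = np.asarray(amp_tracks, dtype=float)
    if f.shape != a.shape:
        raise ValueError("frequency and amplitude tracks must have the same shape")
    # Integrate instantaneous frequency to get a continuous phase for each tone.
    phase = 2.0 * np.pi * np.cumsum(f, axis=1) / fs
    return np.sum(a * np.sin(phase), axis=0)
```

    Integrating frequency into phase, rather than computing sin(2πf(t)t), keeps each tone free of phase jumps when its frequency glides. For example, a second track given by `np.linspace(1200, 2000, n)` produces a smooth F2-like sweep.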

  13. Speech Recognition and Acoustic Features in Combined Electric and Acoustic Stimulation

    ERIC Educational Resources Information Center

    Yoon, Yang-soo; Li, Yongxin; Fu, Qian-Jie

    2012-01-01

    Purpose: In this study, the authors aimed to identify speech information processed by a hearing aid (HA) that is additive to information processed by a cochlear implant (CI) as a function of signal-to-noise ratio (SNR). Method: Speech recognition was measured with CI alone, HA alone, and CI + HA. Ten participants were separated into 2 groups; good…

  14. From prosodic structure to acoustic saliency: A fMRI investigation of speech rate, clarity, and emphasis

    NASA Astrophysics Data System (ADS)

    Golfinopoulos, Elisa

    Acoustic variability in fluent speech can arise at many stages in speech production planning and execution. For example, at the phonological encoding stage, the grouping of phonemes into syllables determines which segments are coarticulated and, consequently, segment-level acoustic variation. Likewise, phonetic encoding, which determines the spatiotemporal extent of articulatory gestures, will affect the acoustic detail of segments. Functional magnetic resonance imaging (fMRI) was used to measure brain activity of fluent adult speakers in four speaking conditions: fast, normal, clear, and emphatic (or stressed) speech. These speech manner changes typically result in acoustic variations that do not change the lexical or semantic identity of productions but do affect the acoustic saliency of phonemes, syllables and/or words. Acoustic responses recorded inside the scanner were assessed quantitatively using eight acoustic measures, and sentence duration was used as a covariate of non-interest in the neuroimaging analysis. Compared to normal speech, emphatic speech was characterized acoustically by a greater difference between stressed and unstressed vowels in intensity, duration, and fundamental frequency, and neurally by increased activity in right middle premotor cortex and supplementary motor area, and bilateral primary sensorimotor cortex. These findings are consistent with right-lateralized motor planning of prosodic variation in emphatic speech. Clear speech involved an increase in average vowel and sentence durations and average vowel spacing, along with increased activity in left middle premotor cortex and bilateral primary sensorimotor cortex. These findings are consistent with an increased reliance on feedforward control, resulting in hyper-articulation, under clear as compared to normal speech. Fast speech was characterized acoustically by reduced sentence duration and average vowel spacing, and neurally by increased activity in left anterior frontal

  15. Acoustic-Phonetic Differences between Infant- and Adult-Directed Speech: The Role of Stress and Utterance Position

    ERIC Educational Resources Information Center

    Wang, Yuanyuan; Seidl, Amanda; Cristia, Alejandrina

    2015-01-01

    Previous studies have shown that infant-directed speech (IDS) differs from adult-directed speech (ADS) on a variety of dimensions. The aim of the current study was to investigate whether acoustic differences between IDS and ADS in English are modulated by prosodic structure. We compared vowels across the two registers (IDS, ADS) in both stressed…

  16. Speech after Radial Forearm Free Flap Reconstruction of the Tongue: A Longitudinal Acoustic Study of Vowel and Diphthong Sounds

    ERIC Educational Resources Information Center

    Laaksonen, Juha-Pertti; Rieger, Jana; Happonen, Risto-Pekka; Harris, Jeffrey; Seikaly, Hadi

    2010-01-01

    The purpose of this study was to use acoustic analyses to describe speech outcomes over the course of 1 year after radial forearm free flap (RFFF) reconstruction of the tongue. Eighteen Canadian English-speaking females and males with reconstruction for oral cancer had speech samples recorded (pre-operative, and 1 month, 6 months, and 1 year…

  17. A modified diffusion equation for room-acoustic predication.

    PubMed

    Jing, Yun; Xiang, Ning

    2007-06-01

    This letter presents a modified diffusion model using an Eyring absorption coefficient to predict the reverberation time and sound pressure distributions in enclosures. While the original diffusion model [Ollendorff, Acustica 21, 236-245 (1969); J. Picaut et al., Acustica 83, 614-621 (1997); Valeau et al., J. Acoust. Soc. Am. 119, 1504-1513 (2006)] usually performs well for low absorption, the modified diffusion model yields more satisfactory results for both low and high absorption. Comparisons among the modified model, the original model, a geometrical-acoustics model, and several well-established theories in terms of reverberation times and sound pressure level distributions indicate significantly improved prediction accuracy by the modification. PMID:17552680
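    The abstract does not give the modified diffusion equation itself. As background for why an Eyring-type absorption term helps at high absorption, the snippet below contrasts only the classic Sabine and Eyring reverberation-time formulas: T = 0.161 V / (S α) versus T = 0.161 V / (-S ln(1 - α)), with V in m³, S in m², and the constant assuming c ≈ 343 m/s. It does not implement the letter's diffusion model, and the function names and room dimensions are hypothetical.

```python
import math


def sabine_rt60(volume, surface, alpha):
    """Sabine reverberation time in seconds (SI units, c ~ 343 m/s)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return 0.161 * volume / (surface * alpha)


def eyring_rt60(volume, surface, alpha):
    """Eyring reverberation time: Sabine's alpha is replaced by -ln(1 - alpha)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return 0.161 * volume / (-surface * math.log(1.0 - alpha))
```

    For a 10 m × 5 m × 4 m room (V = 200 m³, S = 220 m²), the two formulas nearly agree at α = 0.1 (about 1.46 s versus 1.39 s) but diverge at α = 0.8 (about 0.18 s versus 0.09 s). As α approaches 1, the Eyring time goes to zero, as it physically should, while the Sabine time stays finite. This is consistent with the letter's point that the unmodified model is mainly reliable for low absorption.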

  18. An eighth-scale speech source for subjective assessments in acoustic models

    NASA Astrophysics Data System (ADS)

    Orlowski, R. J.

    1981-08-01

    The design of a source is described which is suitable for making speech recordings in eighth-scale acoustic models of auditoria. An attempt was made to match the directionality of the source with the directionality of the human voice using data reported in the literature. A narrow aperture was required for the design, which was provided by mounting an inverted conical horn over the diaphragm of a high-frequency loudspeaker. Resonance problems were encountered with the use of a horn, and a description is given of the electronic techniques adopted to minimize the effect of these resonances. Subjective and objective assessments of the completed speech source have proved satisfactory. It has been used in a modelling exercise concerned with the acoustic design of a theatre with a thrust-type stage.

  19. Are you a good mimic? Neuro-acoustic signatures for speech imitation ability

    PubMed Central

    Reiterer, Susanne M.; Hu, Xiaochen; Sumathi, T. A.; Singh, Nandini C.

    2013-01-01

    We investigated individual differences in speech imitation ability in late bilinguals using a neuro-acoustic approach. One hundred and thirty-eight German-English bilinguals matched on various behavioral measures were tested for “speech imitation ability” in a foreign language, Hindi, and categorized into “high” and “low ability” groups. Brain activations and speech recordings were obtained from 26 participants from the two extreme groups as they performed a functional neuroimaging experiment which required them to “imitate” sentences in three conditions: (A) German, (B) English, and (C) German with a fake English accent. We used a recently developed acoustic analysis, the “articulation space,” as a metric to compare the speech imitation abilities of the two groups. Across all three conditions, direct comparisons between the two groups revealed brain activations (FWE corrected, p < 0.05) that were more widespread, with significantly higher peak activity in the left supramarginal gyrus and postcentral areas, for the low ability group. The high ability group, on the other hand, showed a significantly larger articulation space in all three conditions. In addition, articulation space correlated positively with imitation ability (Pearson's r = 0.7, p < 0.01). Our results suggest that an expanded articulation space in high ability individuals allows access to a larger repertoire of sounds, thereby giving skilled imitators greater flexibility in pronunciation and language learning. PMID:24155739

  20. Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech

    PubMed Central

    Leong, Victoria; Goswami, Usha

    2015-01-01

    When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately timed. The model accurately identified 72–82% (freely read CDS) and 90–98% (rhythmically regular CDS) of stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across
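    The core idea of splitting the amplitude envelope into stress-, syllable-, and phoneme-rate modulation bands can be sketched as follows. This is a heavily simplified illustration, not the S-AMPH model. S-AMPH also decomposes speech into spectral bands and uses PCA to derive its bands. Here a single broadband envelope is split with hypothetical band edges (0.9–2.5, 2.5–12, and 12–40 Hz) chosen to bracket the ~2, ~5, and ~20 Hz rates quoted above. The function names are hypothetical.

```python
import numpy as np


def hilbert_envelope(x):
    """Amplitude envelope |analytic signal| via an FFT-based Hilbert transform."""
    n = len(x)
    spec = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spec * h))


def band_power(signal, fs, lo, hi):
    """Power of the mean-removed signal in [lo, hi) Hz."""
    spec = np.fft.rfft(signal - signal.mean())
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= lo) & (freqs < hi)
    return float(np.sum(np.abs(spec[mask]) ** 2))


# Hypothetical modulation bands bracketing ~2, ~5 and ~20 Hz.
AM_BANDS = {"stress": (0.9, 2.5), "syllable": (2.5, 12.0), "phoneme": (12.0, 40.0)}


def am_band_profile(x, fs, bands=AM_BANDS):
    """Fraction of envelope modulation power falling in each band."""
    env = hilbert_envelope(x)
    powers = {name: band_power(env, fs, lo, hi) for name, (lo, hi) in bands.items()}
    total = sum(powers.values())
    return {name: p / total for name, p in powers.items()}
```

    For example, a 200 Hz carrier amplitude-modulated at 5 Hz puts essentially all of its envelope modulation power in the "syllable" band. A speech recording would instead spread power across all three bands in proportions that reflect its rhythm.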

  1. Suppressed alpha oscillations predict intelligibility of speech and its acoustic details.

    PubMed

    Obleser, Jonas; Weisz, Nathan

    2012-11-01

    Modulations of human alpha oscillations (8-13 Hz) accompany many cognitive processes, but their functional role in auditory perception has proven elusive: Do oscillatory dynamics of alpha reflect acoustic details of the speech signal and are they indicative of comprehension success? Acoustically presented words were degraded in acoustic envelope and spectrum in an orthogonal design, and electroencephalogram responses in the frequency domain were analyzed in 24 participants, who rated word comprehensibility after each trial. First, the alpha power suppression during and after a degraded word depended monotonically on spectral and, to a lesser extent, envelope detail. The magnitude of this alpha suppression exhibited an additional and independent influence on later comprehension ratings. Second, source localization of alpha suppression yielded superior parietal, prefrontal, as well as anterior temporal brain areas. Third, multivariate classification of the time-frequency pattern across participants showed that late posterior alpha power patterns allowed the best above-chance classification of word intelligibility. Results suggest that both magnitude and topography of late alpha suppression in response to single words can indicate a listener's sensitivity to acoustic features and the ability to comprehend speech under adverse listening conditions. PMID:22100354
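    The alpha suppression discussed above is a decrease in 8–13 Hz power relative to a reference period. As a simple illustration only (the study's time-frequency analysis, source localization, and classification are not reproduced), the sketch below expresses suppression in one EEG channel as the percent change in 8–13 Hz FFT power from a baseline window to a post-stimulus window. The function names and the single-channel, untapered FFT approach are assumptions.

```python
import numpy as np


def alpha_power(x, fs, lo=8.0, hi=13.0):
    """Summed FFT power of x between lo and hi Hz (inclusive)."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2
    mask = (freqs >= lo) & (freqs <= hi)
    return float(psd[mask].sum())


def alpha_change_percent(baseline, post, fs):
    """Percent change in 8-13 Hz power from baseline to post-stimulus window.

    Negative values indicate suppression. Both windows must have the same
    length so that their unnormalized FFT powers are directly comparable.
    """
    if len(baseline) != len(post):
        raise ValueError("baseline and post windows must have equal length")
    b = alpha_power(baseline, fs)
    p = alpha_power(post, fs)
    return 100.0 * (p - b) / b
```

    Halving the amplitude of a 10 Hz rhythm, for instance, quarters its power and yields a change of -75%. Real EEG analyses would typically add tapering, trial averaging, and time-resolved estimates on top of this basic measure.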

  2. Suppressed Alpha Oscillations Predict Intelligibility of Speech and its Acoustic Details

    PubMed Central

    Weisz, Nathan

    2012-01-01

    Modulations of human alpha oscillations (8–13 Hz) accompany many cognitive processes, but their functional role in auditory perception has proven elusive: Do oscillatory dynamics of alpha reflect acoustic details of the speech signal and are they indicative of comprehension success? Acoustically presented words were degraded in acoustic envelope and spectrum in an orthogonal design, and electroencephalogram responses in the frequency domain were analyzed in 24 participants, who rated word comprehensibility after each trial. First, the alpha power suppression during and after a degraded word depended monotonically on spectral and, to a lesser extent, envelope detail. The magnitude of this alpha suppression exhibited an additional and independent influence on later comprehension ratings. Second, source localization of alpha suppression yielded superior parietal, prefrontal, as well as anterior temporal brain areas. Third, multivariate classification of the time–frequency pattern across participants showed that patterns of late posterior alpha power best supported above-chance classification of word intelligibility. Results suggest that both magnitude and topography of late alpha suppression in response to single words can indicate a listener's sensitivity to acoustic features and the ability to comprehend speech under adverse listening conditions. PMID:22100354

  3. Segment-based acoustic models for continuous speech recognition

    NASA Astrophysics Data System (ADS)

    Ostendorf, Mari; Rohlicek, J. R.

    1993-07-01

    This research aims to develop new and more accurate stochastic models for speaker-independent continuous speech recognition, by extending previous work in segment-based modeling and by introducing a new hierarchical approach to representing intra-utterance statistical dependencies. These techniques, which are more costly than traditional approaches because of the large search space associated with higher order models, are made feasible through rescoring a set of HMM-generated N-best sentence hypotheses. We expect these different modeling techniques to result in improved recognition performance over that achieved by current systems, which handle only frame-based observations and assume that these observations are independent given an underlying state sequence. In the fourth quarter of the project, we have completed the following: (1) ported our recognition system to the Wall Street Journal task, a standard task in the ARPA community; (2) developed an initial dependency-tree model of intra-utterance observation correlation; and (3) implemented baseline language model estimation software. Our initial results on the Wall Street Journal task are quite good and represent significantly improved performance over most HMM systems reporting on the Nov. 1992 5k vocabulary test set.
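
    The abstract describes making costly segment models tractable by rescoring an N-best list produced by an HMM recognizer. A minimal sketch of that idea, assuming each hypothesis already carries a log score from every model and that the scores are combined linearly; the dictionary layout, model names and weights are hypothetical, not the authors' configuration:

```python
def combined_score(hypothesis, weights):
    """Weighted sum of per-model log scores for one hypothesis."""
    return sum(weights[model] * score
               for model, score in hypothesis["scores"].items())


def rescore_nbest(hypotheses, weights):
    """Re-rank N-best hypotheses (best first) using the combined score."""
    return sorted(hypotheses, key=lambda h: combined_score(h, weights), reverse=True)


nbest = [
    {"text": "hypothesis A", "scores": {"hmm": -10.0, "segment": -30.0}},
    {"text": "hypothesis B", "scores": {"hmm": -11.0, "segment": -20.0}},
]
# The HMM score alone prefers A; adding the segment-model score promotes B.
print(rescore_nbest(nbest, {"hmm": 1.0, "segment": 0.0})[0]["text"])  # hypothesis A
print(rescore_nbest(nbest, {"hmm": 1.0, "segment": 0.5})[0]["text"])  # hypothesis B
```

    Only the N listed hypotheses are scored by the expensive model, which is what keeps the search space manageable.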

  4. On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common

    PubMed Central

    Weninger, Felix; Eyben, Florian; Schuller, Björn W.; Mortillaro, Marcello; Scherer, Klaus R.

    2013-01-01

    Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning each of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow’s pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of “the sound that something makes,” in order to evaluate the system’s auditory environment and its own audio output. This article takes a first step toward a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we evaluate the value of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selecting appropriate descriptors, cross-domain arousal and valence regression is feasible, achieving significant correlations with the observer annotations of up to 0.78 for arousal (training on sound and testing on enacted speech) and 0.60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects. PMID:23750144

  5. Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing

    PubMed Central

    Doelling, Keith; Arnal, Luc; Ghitza, Oded; Poeppel, David

    2013-01-01

    A growing body of research suggests that intrinsic neuronal slow (< 10 Hz) oscillations in auditory cortex track incoming speech and other spectro-temporally complex auditory signals. Within this framework, several recent studies have identified critical-band temporal envelopes as the specific acoustic feature being reflected by the phase of these oscillations. However, how this alignment between speech acoustics and neural oscillations might underpin intelligibility is unclear. Here we test the hypothesis that the ‘sharpness’ of temporal fluctuations in the critical band envelope acts as a temporal cue to speech syllabic rate, driving delta-theta rhythms to track the stimulus and facilitate intelligibility. We interpret our findings as evidence that sharp events in the stimulus cause cortical rhythms to re-align and parse the stimulus into syllable-sized chunks for further decoding. Using magnetoencephalographic recordings, we show that removing temporal fluctuations that occur at the syllabic rate reduces envelope-tracking activity, and that artificially reinstating these fluctuations restores it. These changes in tracking correlate with intelligibility of the stimulus. Together, the results suggest that the sharpness of fluctuations in the stimulus, as reflected in the cochlear output, drives oscillatory activity to track and entrain to the stimulus at its syllabic rate. This process likely facilitates parsing of the stimulus into meaningful chunks appropriate for subsequent decoding, enhancing perception and intelligibility. PMID:23791839

  6. Compensating Acoustic Mismatch Using Class-Based Histogram Equalization for Robust Speech Recognition

    NASA Astrophysics Data System (ADS)

    Suh, Youngjoo; Kim, Sungtak; Kim, Hoirin

    2007-12-01

    A new class-based histogram equalization method is proposed for robust speech recognition. The proposed method aims not only to compensate for an acoustic mismatch between training and test environments but also to reduce two fundamental limitations of the conventional histogram equalization method: the discrepancy between the phonetic distributions of training and test speech data, and the nonmonotonic transformation caused by the acoustic mismatch. The algorithm employs multiple class-specific reference and test cumulative distribution functions, classifies noisy test features into their corresponding classes, and equalizes the features by using their corresponding class reference and test distributions. Minimum mean-square error log-spectral amplitude (MMSE-LSA)-based speech enhancement is added just prior to the baseline feature extraction to reduce the corruption by additive noise. Experiments on the Aurora2 database demonstrated the effectiveness of the proposed method, which reduced relative errors by [value not available; see full text] over the mel-cepstral-based features and by [value not available; see full text] over the conventional histogram equalization method.
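
    Conventional histogram equalization maps a test feature x to F_ref^-1(F_test(x)); the class-based variant applies the same mapping separately within each class. A minimal sketch for scalar features using empirical CDFs, assuming class labels come from a supplied function; the threshold classifier in the example is a hypothetical stand-in for the paper's classification step, and MMSE-LSA enhancement is omitted:

```python
import bisect
import math


def empirical_cdf(sorted_values, x):
    """Fraction of samples less than or equal to x."""
    return bisect.bisect_right(sorted_values, x) / len(sorted_values)


def inverse_cdf(sorted_values, p):
    """Empirical quantile: smallest sample whose CDF value is at least p."""
    idx = math.ceil(p * len(sorted_values)) - 1
    return sorted_values[min(max(idx, 0), len(sorted_values) - 1)]


def class_based_heq(test_feats, ref_by_class, classify):
    """Equalize each test feature with the CDFs of the class it is assigned to.

    ref_by_class maps class label -> reference (training) samples;
    classify is a hypothetical function mapping a feature value to a label.
    """
    labels = [classify(x) for x in test_feats]
    test_by_class = {}
    for x, label in zip(test_feats, labels):
        test_by_class.setdefault(label, []).append(x)
    test_sorted = {c: sorted(v) for c, v in test_by_class.items()}
    ref_sorted = {c: sorted(v) for c, v in ref_by_class.items()}
    # Map x through its class's test CDF, then through that class's reference inverse CDF.
    return [inverse_cdf(ref_sorted[c], empirical_cdf(test_sorted[c], x))
            for x, c in zip(test_feats, labels)]


reference = {"low": [0, 1, 2, 3], "high": [10, 11, 12, 13]}
noisy_test = [5, 6, 7, 8, 25, 26, 27, 28]  # same shapes, shifted by a mismatch
print(class_based_heq(noisy_test, reference, lambda x: "low" if x < 15 else "high"))
# [0, 1, 2, 3, 10, 11, 12, 13]
```

    Equalizing within classes avoids forcing one global mapping onto test data whose phonetic make-up differs from the training data.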

  7. Logopenic and nonfluent variants of primary progressive aphasia are differentiated by acoustic measures of speech production.

    PubMed

    Ballard, Kirrie J; Savage, Sharon; Leyton, Cristian E; Vogel, Adam P; Hornberger, Michael; Hodges, John R

    2014-01-01

    Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging, since it hinges on expert-based evaluation of speech and language production. In this study, acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r² = 0.47) with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, affected only in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy. Findings suggest that a measure of relative vowel duration from a polysyllable word repetition task
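
    The abstract does not define "relative vowel duration" precisely. One widely used way to quantify the duration contrast between successive vowels is the normalized pairwise variability index (nPVI); the sketch below assumes that measure, which may differ from the exact metric used in the study:

```python
def npvi(durations):
    """Normalized pairwise variability index of successive vowel durations.

    0 means all adjacent vowels have equal duration; larger values mean a
    stronger long/short (e.g. stressed/unstressed) contrast.
    """
    if len(durations) < 2:
        raise ValueError("need at least two vowel durations")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)


print(npvi([120, 120, 120]))  # 0.0
print(npvi([150, 50]))        # 100.0
```

    Because each pair is normalized by its own mean duration, the index is insensitive to overall speaking rate.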

  8. Logopenic and Nonfluent Variants of Primary Progressive Aphasia Are Differentiated by Acoustic Measures of Speech Production

    PubMed Central

    Ballard, Kirrie J.; Savage, Sharon; Leyton, Cristian E.; Vogel, Adam P.; Hornberger, Michael; Hodges, John R.

    2014-01-01

    Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging, since it hinges on expert-based evaluation of speech and language production. In this study, acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r² = 0.47) with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, affected only in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy. Findings suggest that a measure of relative vowel duration from a polysyllable word repetition task

  9. A Modified Algorithm For Scanning Tomographic Acoustic Microscopy

    NASA Astrophysics Data System (ADS)

    Meyyappan, A.; Wade, G.

    1988-07-01

    Acoustic microscopy is an invaluable tool in non-destructive evaluation because of its ability to provide high-resolution images of microscopic structure in small objects. When such a microscope operates in the transmission mode, the micrograph produced is simply a shadowgraph of all the structures encountered by the acoustic wave passing through the object. Because of diffraction and overlapping, the resultant images are difficult to comprehend, especially in the case of objects of substantial thickness with complex structures. To overcome these problems, we have developed a scanning tomographic acoustic microscope (STAM) which is capable of producing unambiguous high-resolution tomograms. We have described in previously published work how a scanning laser acoustic microscope can be employed to realize STAM. We use an algorithm based on "back-and-forth propagation" to reconstruct tomograms of the various layers to be imaged. When these layers are physically close to one another, we see ambiguities in the reconstructions. In this paper we describe a modified algorithm which removes these ambiguities. With the new algorithm, we can resolve layers that are only two wavelengths apart.

  10. Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech

    NASA Astrophysics Data System (ADS)

    Ge, Fengpei; Liu, Changliang; Shao, Jian; Pan, Fuping; Dong, Bin; Yan, Yonghong

    In this paper we present our investigation into improving the performance of our computer-assisted language learning (CALL) system through exploiting the acoustic model and features within the speech recognition framework. First, to alleviate channel distortion, speaker-dependent cepstrum mean normalization (CMN) is adopted and the average correlation coefficient (average CC) between machine and expert scores is improved from 78.00% to 84.14%. Second, heteroscedastic linear discriminant analysis (HLDA) is adopted to enhance the discriminability of the acoustic model, which successfully increases the average CC from 84.14% to 84.62%. Additionally, HLDA causes the scoring accuracy to be more stable at various pronunciation proficiency levels, and thus leads to an increase in the speaker correct-rank rate from 85.59% to 90.99%. Finally, we use maximum a posteriori (MAP) estimation to tune the acoustic model to fit strongly accented test speech. As a result, the average CC is improved from 84.62% to 86.57%. These three novel techniques improve the accuracy of evaluating pronunciation quality.
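
    Speaker-dependent CMN, as named in the abstract, can be read as subtracting a per-speaker mean from every cepstral frame, and the reported machine-expert agreement is a correlation coefficient. A minimal sketch of both, assuming the mean is taken over all of a speaker's frames (the exact scope of the mean is not stated) and using Pearson's r; HLDA and MAP adaptation are not shown:

```python
import math


def speaker_cmn(frames_by_speaker):
    """Subtract each speaker's mean cepstral vector from all of that speaker's frames.

    A stationary channel adds a roughly constant offset in the cepstral domain,
    so removing the per-speaker mean removes most of that offset.
    """
    normalized = {}
    for speaker, frames in frames_by_speaker.items():
        dim = len(frames[0])
        mean = [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]
        normalized[speaker] = [[frame[d] - mean[d] for d in range(dim)] for frame in frames]
    return normalized


def pearson(xs, ys):
    """Pearson correlation, e.g. between machine and expert scores."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


print(speaker_cmn({"spk1": [[1.0, 2.0], [3.0, 4.0]]}))  # {'spk1': [[-1.0, -1.0], [1.0, 1.0]]}
```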

  11. Effect of several acoustic cues on perceiving Mandarin retroflex affricates and fricatives in continuous speech.

    PubMed

    Zhu, Jian; Chen, Yaping

    2016-07-01

    Relatively little attention has been paid to the perception of the three-way contrast between unaspirated affricates, aspirated affricates and fricatives in Mandarin Chinese. This study reports two experiments that explore the acoustic cues relevant to the contrast between the Mandarin retroflex series /tʂ/, /tʂʰ/ and /ʂ/ in continuous speech. Twenty participants performed two three-alternative forced-choice tasks, in which acoustic cues including closure, frication duration (FD), aspiration, and vocalic contexts (VCs) were systematically manipulated and presented in a carrier phrase. A subsequent classification tree analysis shows that FD distinguishes /tʂ/ from /tʂʰ/ and /ʂ/, and that closure cues the affricate manner. Interactions between VC and individual cues are also found. The FD threshold for separating /ʂ/ and /tʂ/ is susceptible to the influence of the following vocalic segments, shifting to lower values if frication is followed by the low vowel /a/. On the other hand, while aspiration cues /tʂʰ/ before /a/ and //, this acoustic cue is obscured by gesture continuation when /tʂʰ/ precedes its homorganic approximant /ɻ/ in natural speech, which might cause potential confusion between /tʂʰ/ and /ʂ/. PMID:27475170

  12. The role of metrical information in apraxia of speech. Perceptual and acoustic analyses of word stress.

    PubMed

    Aichert, Ingrid; Späth, Mona; Ziegler, Wolfram

    2016-02-01

    Several factors are known to influence speech accuracy in patients with apraxia of speech (AOS), e.g., syllable structure or word length. However, the impact of word stress has largely been neglected so far. More generally, the role of prosodic information at the phonetic encoding stage of speech production often remains unconsidered in models of speech production. This study aimed to investigate the influence of word stress on error production in AOS. Disyllabic words with stress on the first syllable (trochees) vs. the second syllable (iambs) were compared in 14 patients with AOS, three of them exhibiting pure AOS, and in a control group of six normal speakers. The patients produced significantly more errors on iambic than on trochaic words. The most prominent metrical effect was obtained for segmental errors. Acoustic analyses of word durations revealed a disproportionate advantage of the trochaic meter in the patients relative to the healthy controls. The results indicate that German apraxic speakers are sensitive to metrical information. It is assumed that metrical patterns function as prosodic frames for articulation planning, and that the regular metrical pattern in German, the trochaic form, has a facilitating effect on word production in patients with AOS. PMID:26792367

  13. Emphasis of short-duration acoustic speech cues for cochlear implant users.

    PubMed

    Vandali, A E

    2001-05-01

    A new speech-coding strategy for cochlear implant users, called the transient emphasis spectral maxima (TESM), was developed to aid perception of short-duration transient cues in speech. Speech-perception scores using the TESM strategy were compared to scores using the spectral maxima sound processor (SMSP) strategy in a group of eight adult users of the Nucleus 22 cochlear implant system. Significant improvements in mean speech-perception scores for the group were obtained on CNC open-set monosyllabic word tests in quiet (SMSP: 53.6%; TESM: 61.3%; p < 0.001) and on MUSL open-set sentence tests in multitalker noise (SMSP: 64.9%; TESM: 70.6%; p < 0.001). Significant increases were also shown for consonant scores in the word test (SMSP: 75.1%; TESM: 80.6%; p < 0.001) and for vowel scores in the word test (SMSP: 83.1%; TESM: 85.7%; p < 0.05). Analysis of consonant perception results from the CNC word tests showed that discrimination of nasal, stop, and fricative consonants improved most. Information transmission analysis indicated that place of articulation was most improved, although improvements were also evident for manner of articulation. The increases in discrimination were shown to be related to improved coding of short-duration acoustic cues, particularly those of low intensity. PMID:11386557
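
    Information transmission analysis of this kind typically follows Miller and Nicely (1955): a consonant confusion matrix is pooled into the categories of one feature (such as place or manner), and the mutual information between stimulus and response is divided by the stimulus entropy. A minimal sketch of that computation; the function names and the toy matrix are illustrative, not data from the study:

```python
import math


def relative_info_transmitted(confusions):
    """Transmitted information T(x;y) divided by stimulus entropy H(x).

    confusions[i][j] = count of stimulus category i reported as response j.
    """
    total = sum(sum(row) for row in confusions)
    p_stim = [sum(row) / total for row in confusions]
    p_resp = [sum(row[j] for row in confusions) / total for j in range(len(confusions[0]))]
    mutual = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count:
                p_ij = count / total
                mutual += p_ij * math.log2(p_ij / (p_stim[i] * p_resp[j]))
    h_stim = -sum(p * math.log2(p) for p in p_stim if p > 0)
    return mutual / h_stim


def collapse(confusions, feature_of):
    """Pool a square phoneme confusion matrix into feature categories."""
    cats = sorted(set(feature_of))
    index = {c: k for k, c in enumerate(cats)}
    pooled = [[0] * len(cats) for _ in cats]
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            pooled[index[feature_of[i]]][index[feature_of[j]]] += count
    return pooled


confusions = [[8, 2, 0],   # rows: stimuli /p/, /t/, /s/; columns: responses
              [3, 7, 0],
              [0, 0, 10]]
manner = ["stop", "stop", "fricative"]
pooled = collapse(confusions, manner)
print(pooled)  # [[10, 0], [0, 20]]
print(round(relative_info_transmitted(pooled), 3))  # 1.0
```

    In the toy matrix, /p/-/t/ confusions lose place information but none for manner, so manner is transmitted perfectly.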

  14. A Bayesian view on acoustic model-based techniques for robust speech recognition

    NASA Astrophysics Data System (ADS)

    Maas, Roland; Huemmer, Christian; Sehr, Armin; Kellermann, Walter

    2015-12-01

    This article provides a unifying Bayesian view on various approaches for acoustic model adaptation, missing feature, and uncertainty decoding that are well-known in the literature of robust automatic speech recognition. The representatives of these classes can often be deduced from a Bayesian network that extends the conventional hidden Markov models used in speech recognition. These extensions, in turn, can in many cases be motivated from an underlying observation model that relates clean and distorted feature vectors. By identifying and converting the observation models into a Bayesian network representation, we formulate the corresponding compensation rules. We thus summarize the various approaches as approximations or modifications of the same Bayesian decoding rule leading to a unified view on known derivations as well as to new formulations for certain approaches.
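
    One compensation rule that falls out of such a Bayesian view is uncertainty decoding: if the clean feature is modelled as Gaussian within an HMM state and the front end supplies an estimate with Gaussian uncertainty, integrating out the clean feature adds the uncertainty variance to the state variance. A minimal sketch for diagonal Gaussians, assuming that simple additive observation model (the article covers more general cases); the function names are hypothetical:

```python
import math


def gaussian_logpdf(y, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)


def state_loglik(x_hat, means, variances, uncertainty=None):
    """Diagonal-Gaussian state log-likelihood with per-dimension uncertainty.

    With the observation model x_hat = x + e, e ~ N(0, u), marginalizing the
    clean feature x ~ N(mean, var) gives N(x_hat; mean, var + u).
    """
    if uncertainty is None:
        uncertainty = [0.0] * len(x_hat)
    return sum(gaussian_logpdf(y, m, v + u)
               for y, m, v, u in zip(x_hat, means, variances, uncertainty))


# A corrupted dimension far from the state mean is penalized less once its
# uncertainty is acknowledged.
print(state_loglik([5.0], [0.0], [1.0]) < state_loglik([5.0], [0.0], [1.0], [4.0]))  # True
```

    As the uncertainty of a dimension grows, its contribution becomes nearly the same for every state, which approaches the missing-feature idea of ignoring unreliable dimensions.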

  15. The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene

    PubMed Central

    Rimmele, Johanna M.; Golumbic, Elana Zion; Schröger, Erich; Poeppel, David

    2015-01-01

    Attending to one speaker in multi-speaker situations is challenging. One neural mechanism proposed to underlie the ability to attend to a particular speaker is phase-locking of low-frequency activity in auditory cortex to speech’s temporal envelope (“speech-tracking”), which is more precise for attended speech. However, it is not known what brings about this attentional effect, and specifically whether it reflects enhanced processing of the fine structure of attended speech. To investigate this question, we compared attentional effects on speech-tracking of natural vs. vocoded speech, which preserves the temporal envelope but removes the fine structure of speech. Pairs of natural and vocoded speech stimuli were presented concurrently and participants attended to one stimulus and performed a detection task while ignoring the other stimulus. We recorded magnetoencephalography (MEG) and compared attentional effects on the speech-tracking response in auditory cortex. Speech-tracking of natural, but not vocoded, speech was enhanced by attention, whereas neural tracking of ignored speech was similar for natural and vocoded speech. These findings suggest that the more precise speech tracking of attended natural speech is related to processing its fine structure, possibly reflecting the application of higher-order linguistic processes. In contrast, when speech is unattended its fine structure is not processed to the same degree and thus elicits less precise speech tracking more similar to vocoded speech. PMID:25650107
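
    Speech-tracking in this study is described in terms of phase-locking; a simpler proxy that conveys the same idea is the correlation between the stimulus envelope and the neural response as a function of lag. A minimal sketch with synthetic data, assuming both signals are already sampled at the same rate (envelope extraction and MEG preprocessing are not shown, and this is not the study's measure):

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def tracking_by_lag(envelope, response, max_lag):
    """Correlation between the envelope and the response delayed by 0..max_lag samples."""
    n = len(envelope)
    return {lag: pearson(envelope[:n - lag], response[lag:n]) for lag in range(max_lag + 1)}


envelope = [0.0, 1.0, 0.2, 2.0, 0.1, 3.0, 1.0, 0.0, 2.0, 1.0, 0.5, 2.5]
response = [0.0, 0.0] + envelope[:-2]  # synthetic response lagging the envelope by 2 samples
scores = tracking_by_lag(envelope, response, max_lag=4)
best = max(scores, key=scores.get)
print(best)  # 2
```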

  16. Acoustic Feature Optimization Based on F-Ratio for Robust Speech Recognition

    NASA Astrophysics Data System (ADS)

    Sun, Yanqing; Zhou, Yu; Zhao, Qingwei; Yan, Yonghong

    This paper focuses on the problem of performance degradation in mismatched speech recognition. The F-Ratio analysis method is used to analyze the significance of different frequency bands for speech unit classification, and we find that frequencies around 1 kHz and 3 kHz, which are the upper bounds of the first and second formants for most vowels, should be emphasized more than they are in the Mel-frequency cepstral coefficients (MFCC). The analysis result is further observed to be stable in several typical mismatched situations. By analogy with the Mel-frequency scale, another frequency scale, called the F-Ratio scale, is thus proposed to optimize the filter bank design for the MFCC features, so that each subband contains equal significance for speech unit classification. Under comparable conditions, the modified features reduce the sentence error rate relative to MFCC by 43.20% for emotion-affected speech recognition, by 35.54% and 23.03% for noisy speech recognition at 15 dB and 0 dB SNR (signal-to-noise ratio), respectively, and by 64.50% for the three years' 863 test data. The application of the F-Ratio analysis on the clean training set of the Aurora2 database demonstrates its robustness over languages, texts and sampling rates.
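
    The F-ratio compares how far apart the class means of a band's output are with how much the values spread within each class. A minimal sketch, assuming the simple form "variance of class means / mean within-class variance" (the paper's exact normalization may differ), plus a hypothetical way of placing filter-bank edges so that each band holds an equal share of the cumulative F-ratio:

```python
def f_ratio(values_by_class):
    """Variance of the class means divided by the mean within-class variance."""
    means = [sum(v) / len(v) for v in values_by_class]
    grand = sum(means) / len(means)
    between = sum((m - grand) ** 2 for m in means) / len(means)
    within = sum(sum((x - m) ** 2 for x in v) / len(v)
                 for v, m in zip(values_by_class, means)) / len(values_by_class)
    return between / within


def equal_significance_edges(freqs, significance, n_bands):
    """Band edges chosen so each band covers about 1/n_bands of total significance."""
    target = sum(significance) / n_bands
    edges, cumulative, k = [freqs[0]], 0.0, 1
    for f, s in zip(freqs, significance):
        cumulative += s
        while k < n_bands and cumulative >= k * target:
            edges.append(f)
            k += 1
    edges.append(freqs[-1])
    return edges


# Band outputs for two phone classes: well separated vs. overlapping.
print(f_ratio([[1.0, 3.0], [9.0, 11.0]]))  # 16.0
print(f_ratio([[1.0, 3.0], [2.0, 4.0]]))   # 0.25
```

    Where significance is concentrated, the edges crowd together, giving narrower bands (finer resolution) in the most discriminative regions.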

  17. Advantages from bilateral hearing in speech perception in noise with simulated cochlear implants and residual acoustic hearing.

    PubMed

    Schoof, Tim; Green, Tim; Faulkner, Andrew; Rosen, Stuart

    2013-02-01

    Acoustic simulations were used to study the contributions of spatial hearing that may arise from combining a cochlear implant with either a second implant or contralateral residual low-frequency acoustic hearing. Speech reception thresholds (SRTs) were measured in twenty-talker babble. Spatial separation of speech and noise was simulated using a spherical head model. While low-frequency acoustic information contralateral to the implant simulation produced substantially better SRTs, there was no effect of spatial cues on SRT, even when interaural differences were artificially enhanced. Simulated bilateral implants showed a significant head shadow effect, but no binaural unmasking based on interaural time differences, and weak, inconsistent overall spatial release from masking. There was also a small but significant non-spatial summation effect. It appears that typical cochlear implant speech processing strategies may substantially reduce the utility of spatial cues, even in the absence of degraded neural processing arising from auditory deprivation. PMID:23363118

  18. Depression Diagnoses and Fundamental Frequency-Based Acoustic Cues in Maternal Infant-Directed Speech

    ERIC Educational Resources Information Center

    Porritt, Laura L.; Zinser, Michael C.; Bachorowski, Jo-Anne; Kaplan, Peter S.

    2014-01-01

    F0-based acoustic measures were extracted from a brief, sentence-final target word spoken during structured play interactions between mothers and their 3- to 14-month-old infants and were analyzed based on demographic variables and DSM-IV Axis-I clinical diagnoses and their common modifiers. F0 range (ΔF0) was…

  19. Improved perception of speech in noise and Mandarin tones with acoustic simulations of harmonic coding for cochlear implants.

    PubMed

    Li, Xing; Nie, Kaibao; Imennov, Nikita S; Won, Jong Ho; Drennan, Ward R; Rubinstein, Jay T; Atlas, Les E

    2012-11-01

    Harmonic and temporal fine structure (TFS) information are important cues for speech perception in noise and music perception. However, due to the inherently coarse spectral and temporal resolution in electric hearing, the question of how to deliver harmonic and TFS information to cochlear implant (CI) users remains unresolved. A harmonic-single-sideband-encoder [(HSSE); Nie et al. (2008). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing; Lie et al., (2010). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing] strategy has been proposed that explicitly tracks the harmonics in speech and transforms them into modulators conveying both amplitude modulation and fundamental frequency information. For unvoiced speech, HSSE transforms the TFS into a slowly varying yet still noise-like signal. To investigate its potential, four- and eight-channel vocoder simulations of HSSE and the continuous-interleaved-sampling (CIS) strategy were implemented, respectively. Using these vocoders, five normal-hearing subjects' speech recognition performance was evaluated under different masking conditions; another five normal-hearing subjects' Mandarin tone identification performance was also evaluated. Additionally, the neural discharge patterns evoked by HSSE- and CIS-encoded Mandarin tone stimuli were simulated using an auditory nerve model. All subjects scored significantly higher with HSSE than with CIS vocoders. The modeling analysis demonstrated that HSSE can convey temporal pitch cues better than CIS. Overall, the results suggest that HSSE is a promising strategy to enhance speech perception with CIs. PMID:23145619

  20. Analysis of Acoustic Features in Speakers with Cognitive Disorders and Speech Impairments

    NASA Astrophysics Data System (ADS)

    Saz, Oscar; Simón, Javier; Rodríguez, W. Ricardo; Lleida, Eduardo; Vaquero, Carlos

    2009-12-01

    This work presents the results of an analysis of the acoustic features (formants and the three suprasegmental features: tone, intensity and duration) of vowel production in a group of 14 young speakers with different kinds of speech impairments due to physical and cognitive disorders. A corpus of unimpaired children's speech is used to determine reference values for these features in speakers without any speech impairment, within the same domain as the impaired speakers, namely 57 isolated words. The signal processing to extract the formant and pitch values is based on a Linear Prediction Coefficients (LPCs) analysis of the segments identified as vowels by a Hidden Markov Model (HMM)-based Viterbi forced alignment. Intensity and duration are also based on the output of the automated segmentation. The main conclusion of the work is that the intelligibility of vowel production is lower in impaired speakers even when the vowel is perceived as correct by human labelers. The decrease in intelligibility is due to a 30% increase in confusability in the formant map, a 50% reduction in the discriminative power of energy between stressed and unstressed vowels, and a 50% increase in the standard deviation of vowel length. On the other hand, impaired speakers keep good control of tone in the production of stressed and unstressed vowels.
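
    The formant and pitch extraction rests on LPC analysis. A minimal sketch of its core step, computing predictor coefficients from the frame autocorrelation with the Levinson-Durbin recursion; windowing, pre-emphasis, model order and root-solving for formants are left out and would be assumptions (formant frequencies are usually read from the angles of the complex roots of A(z)):

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one analysis frame (unnormalized sums)."""
    return [sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for a = [1, a1, ..., ap] with A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    The predictor is x[n] ~ -(a1 x[n-1] + ... + ap x[n-p]).
    Returns (a, final prediction error power).
    """
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        error *= 1.0 - k * k
    return a, error


# Autocorrelation of a first-order process x[n] = 0.5 x[n-1] + noise.
a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(a[1], err)  # -0.5 0.75
```

    The recursion recovers the single 0.5 feedback coefficient and correctly assigns no weight to the second lag.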

  1. A Frame-Based Context-Dependent Acoustic Modeling for Speech Recognition

    NASA Astrophysics Data System (ADS)

    Terashima, Ryuta; Zen, Heiga; Nankaku, Yoshihiko; Tokuda, Keiichi

    We propose a novel acoustic model for speech recognition, named the FCD (Frame-based Context Dependent) model. It obtains probability distributions by using a top-down clustering technique that simultaneously considers the local frame position in the phoneme, the phoneme duration, and the phoneme context. The model topology is derived by connecting left-to-right HMMs without self-loop transitions for each phoneme duration. Because the FCD model can change the probability distribution into a sequence corresponding to one phoneme duration, it can generate a smooth trajectory of speech feature vectors. We also performed an experiment to evaluate the speech recognition performance of the model. In the experiment, 132 questions for frame position, 66 questions for phoneme duration and 134 questions for phoneme context were used to train the sub-phoneme FCD model. For comparison, a left-to-right HMM and two types of HSMM models with almost the same number of states were also trained. As a result, the FCD model achieved an 18% relative improvement in triphone accuracy.

  2. Speech intelligibility in rooms: Effect of prior listening exposure interacts with room acoustics.

    PubMed

    Zahorik, Pavel; Brandewie, Eugene J

    2016-07-01

    There is now converging evidence that a brief period of prior listening exposure to a reverberant room can influence speech understanding in that environment. Although the effect appears to depend critically on the amplitude modulation characteristic of the speech signal reaching the ear, the extent to which the effect may be influenced by room acoustics has not been thoroughly evaluated. This study seeks to fill this gap in knowledge by testing the effect of prior listening exposure or listening context on speech understanding in five different simulated sound fields, ranging from anechoic space to a room with broadband reverberation time (T60) of approximately 3 s. Although substantial individual variability in the effect was observed and quantified, the context effect was, on average, strongly room dependent. At threshold, the effect was minimal in anechoic space, increased to a maximum of 3 dB on average in moderate reverberation (T60 = 1 s), and returned to minimal levels again in high reverberation. This interaction suggests that the functional effects of prior listening exposure may be limited to sound fields with moderate reverberation (0.4 ≤ T60 ≤ 1 s). PMID:27475133

  3. Discrimination of Speech Stimuli Based on Neuronal Response Phase Patterns Depends on Acoustics But Not Comprehension

    PubMed Central

    Poeppel, David

    2010-01-01

    Speech stimuli give rise to neural activity in the listener that can be observed as waveforms using magnetoencephalography. Although waveforms vary greatly from trial to trial due to activity unrelated to the stimulus, it has been demonstrated that spoken sentences can be discriminated based on theta-band (3–7 Hz) phase patterns in single-trial response waveforms. Furthermore, manipulations of the speech signal envelope and fine structure that reduced intelligibility were found to produce correlated reductions in discrimination performance, suggesting a relationship between theta-band phase patterns and speech comprehension. This study investigates the nature of this relationship, hypothesizing that theta-band phase patterns primarily reflect cortical processing of low-frequency (<40 Hz) modulations present in the acoustic signal and required for intelligibility, rather than processing exclusively related to comprehension (e.g., lexical, syntactic, semantic). Using stimuli that are quite similar to normal spoken sentences in terms of low-frequency modulation characteristics but are unintelligible (i.e., their time-inverted counterparts), we find that discrimination performance based on theta-band phase patterns is equal for both types of stimuli. Consistent with earlier findings, we also observe that whereas theta-band phase patterns differ across stimuli, power patterns do not. We use a simulation model of the single-trial response to spoken sentence stimuli to demonstrate that phase-locked responses to low-frequency modulations of the acoustic signal can account not only for the phase but also for the power results. The simulation offers insight into the interpretation of the empirical results with respect to phase-resetting and power-enhancement models of the evoked response. PMID:20484530
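    The discrimination measure in this abstract rests on single-trial theta-band (3–7 Hz) phase patterns. A minimal NumPy sketch of that idea, under assumptions not taken from the study: the band-pass is an ideal FFT mask, phase comes from the analytic signal, each stimulus template is the circular mean phase across training trials, and a test trial is assigned to the template with the smallest mean circular distance (1 - cos of the phase difference). All function names are hypothetical.

```python
import numpy as np

def theta_phase(x, fs, lo=3.0, hi=7.0):
    """Instantaneous phase of x in the [lo, hi] Hz band.

    Band-limits x with an FFT mask and forms the analytic signal by keeping
    only the positive in-band frequencies (doubled), as a Hilbert transform would.
    """
    x = np.asarray(x, dtype=float)
    f = np.fft.fftfreq(x.size, d=1.0 / fs)
    mask = np.where((f >= lo) & (f <= hi), 2.0, 0.0)
    analytic = np.fft.ifft(np.fft.fft(x) * mask)
    return np.angle(analytic)

def phase_template(trials, fs):
    """Circular mean of theta-band phase across repeated trials of one stimulus."""
    phases = np.array([theta_phase(tr, fs) for tr in trials])
    return np.angle(np.mean(np.exp(1j * phases), axis=0))

def phase_distance(p, q):
    """Mean circular dissimilarity between two phase series (0 = identical, max 2)."""
    return float(np.mean(1.0 - np.cos(p - q)))

def classify(trial, templates, fs):
    """Index of the stimulus template whose phase pattern is closest to the trial."""
    p = theta_phase(trial, fs)
    return int(np.argmin([phase_distance(p, t) for t in templates]))
```

Because only phase enters the distance, two responses with the same theta-band timing but different amplitudes score as identical in this sketch.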

  4. An acoustical assessment of pitch-matching accuracy in relation to speech frequency, speech frequency range, age and gender in preschool children

    NASA Astrophysics Data System (ADS)

    Trollinger, Valerie L.

    This study investigated the relationship between acoustically measured singing accuracy and speech fundamental frequency, speech fundamental frequency range, age, and gender in preschool-aged children. Seventy subjects from Southeastern Pennsylvania; the San Francisco Bay Area, California; and Terre Haute, Indiana, participated in the study. Speech frequency was measured by having the subjects take part in spontaneous and guided speech activities with the researcher; 18 diverse samples were extracted from each subject's recording and analyzed acoustically for fundamental frequency (Hz) with the CSpeech computer program. The fundamental frequencies were averaged to derive a mean speech frequency score for each subject. Speech range was calculated by subtracting the lowest fundamental frequency produced from the highest, giving a speech range in Hz. Singing accuracy was measured by having each subject echo-sing six randomized patterns using the pitches middle C, D, E, F♯, G, and A (440 Hz) on the solfège syllables Do and Re; the patterns were recorded by a 5-year-old female model. For each subject, 18 singing samples were recorded, and all samples were analyzed for fundamental frequency with CSpeech. For each subject, deviation scores in Hz were derived by calculating the difference between the frequency the model sang and the frequency the subject sang in response. Individual scores for each child consisted of an overall mean total frequency deviation, mean frequency deviations for each pattern, and the mean frequency deviation for each pitch. Pearson correlations, MANOVA and ANOVA analyses, multiple regressions, and discriminant analysis revealed the following findings: (1) moderate but significant (p < .001) relationships emerged between mean speech frequency and the ability to sing the pitches E, F♯, G and A in the study; (2) mean speech frequency also emerged as the strongest

  5. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians.

    PubMed

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J; Adatia, Ian

    2016-01-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds of subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74%, compared with 56% for physicians (p = 0.005). The false positive rate was 34% for the algorithm versus 50% for clinicians (p = 0.04). The false negative rate was 23% for the algorithm versus 68% for physicians (p = 0.0002). We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and could be used to screen for PH and encourage earlier specialist referral. PMID:27609672
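    The classifier described here pairs mel-frequency cepstral coefficient (MFCC) features with Gaussian mixture models. A minimal NumPy sketch of the decision step only, assuming two already-trained diagonal-covariance GMMs (one for PH, one for normal), MFCC frames already extracted as a (frames x coefficients) array, and classification by the larger total log-likelihood; feature extraction and GMM training are not shown, and all names are hypothetical. The `rates` helper uses the standard definitions of false positive and false negative rates, which the abstract does not spell out.

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Total log-likelihood of feature frames (T x D) under a diagonal-covariance GMM.

    weights: (K,), means: (K, D), variances: (K, D).
    """
    frames = np.atleast_2d(frames)
    diff = frames[:, None, :] - means[None, :, :]                    # (T, K, D)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)  # (K,)
    log_comp = log_norm - 0.5 * np.sum(diff**2 / variances, axis=2)  # (T, K)
    log_comp += np.log(weights)
    # log-sum-exp over mixture components, then sum over frames
    m = log_comp.max(axis=1, keepdims=True)
    per_frame = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return float(per_frame.sum())

def classify_recording(frames, gmm_ph, gmm_normal):
    """Label a recording 'PH' or 'normal' by the higher total log-likelihood."""
    return "PH" if gmm_loglik(frames, *gmm_ph) > gmm_loglik(frames, *gmm_normal) else "normal"

def rates(tp, fp, tn, fn):
    """Correct-diagnosis rate, false positive rate, false negative rate."""
    return (tp + tn) / (tp + fp + tn + fn), fp / (fp + tn), fn / (fn + tp)
```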

  6. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians

    PubMed Central

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

    2016-01-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds of subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74%, compared with 56% for physicians (p = 0.005). The false positive rate was 34% for the algorithm versus 50% for clinicians (p = 0.04). The false negative rate was 23% for the algorithm versus 68% for physicians (p = 0.0002). We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and could be used to screen for PH and encourage earlier specialist referral. PMID:27609672

  7. Pointing and naming are not redundant: Children use gesture to modify nouns before they modify nouns in speech

    PubMed Central

    Cartmill, Erica A.; Hunsicker, Dea; Goldin-Meadow, Susan

    2014-01-01

    Nouns form the first building blocks of children’s language, but are not consistently modified by other words until around 2 ½ years of age. Before then, children often combine their nouns with gestures that indicate the object labeled by the noun, for example, pointing at a bottle while saying “bottle.” These gestures are typically assumed to be redundant with speech. Here we present data challenging this assumption, suggesting that these early pointing gestures serve a determiner-like function (i.e., point at bottle + “bottle” = that bottle). Using longitudinal data from 18 children (8 girls), we analyzed all utterances containing nouns and focused on (1) utterances containing an unmodified noun combined with a pointing gesture, and (2) utterances containing a noun modified by a determiner. We found that the age at which children first produced point+noun combinations predicted the onset age for determiner+noun combinations. Moreover, point+noun combinations decreased following the onset of determiner+noun constructions. Importantly, combinations of pointing gestures with other types of speech (e.g., point at bottle + “gimme” = gimme that) did not relate to the onset or offset of determiner+noun constructions. Point+noun combinations thus appear to selectively predict the development of a new construction in speech. When children point to an object and simultaneously label it, they are beginning to develop their understanding of nouns as a modifiable unit of speech. PMID:24588517

  8. Discrimination and Comprehension of Synthetic Speech by Students with Visual Impairments: The Case of Similar Acoustic Patterns

    ERIC Educational Resources Information Center

    Papadopoulos, Konstantinos; Argyropoulos, Vassilios S.; Kouroupetroglou, Georgios

    2008-01-01

    This study examined the perceptions held by sighted students and students with visual impairments of the intelligibility and comprehensibility of similar acoustic patterns produced by synthetic speech. It determined the types of errors the students made and compared the performance of the two groups on auditory discrimination and comprehension.

  9. Transient Auditory Storage of Acoustic Details Is Associated with Release of Speech from Informational Masking in Reverberant Conditions

    ERIC Educational Resources Information Center

    Huang, Ying; Huang, Qiang; Chen, Xun; Wu, Xihong; Li, Liang

    2009-01-01

    Perceptual integration of the sound directly emanating from the source with reflections needs both temporal storage and correlation computation of acoustic details. We examined whether the temporal storage is frequency dependent and associated with speech unmasking. In Experiment 1, a break in correlation (BIC) between interaurally correlated…

  10. Comments on "Effects of Noise on Speech Production: Acoustic and Perceptual Analyses" [J. Acoust. Soc. Am. 84, 917-928 (1988)].

    PubMed

    Fitch, H

    1989-11-01

    The effect of background noise on speech production is an important issue, both from the practical standpoint of developing speech recognition algorithms and from the theoretical standpoint of understanding how speech is tuned to the environment in which it is spoken. Summers et al. [J. Acoust. Soc. Am. 84, 917-928 (1988)] address this issue by experimentally manipulating the level of noise delivered through headphones to two talkers and making several kinds of acoustic measurements on the resulting speech. They report that they replicated effects on amplitude, duration, and pitch and found effects on spectral tilt and first-formant frequency (F1). The authors regard these acoustic changes as effects in themselves rather than as consequences of a change in vocal effort, and thus treat the change in spectral tilt and the change in F1 equally. In fact, the change in spectral tilt is a well-documented and well-understood consequence of the change in the glottal waveform, which is known to occur with increased effort. The situation with F1 is less clear and is complicated by measurement problems. The bias in linear predictive coding (LPC) techniques related to two of the other changes (fundamental frequency and spectral tilt) is discussed. PMID:2808931
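    The LPC bias mentioned here concerns how formants are usually read off an all-pole model. A minimal NumPy sketch of standard autocorrelation-method LPC (Levinson-Durbin recursion) and of taking candidate formant frequencies from the angles of the predictor polynomial's roots; windowing and pre-emphasis are omitted, and the function names are hypothetical. As background rather than as the comment's own argument: when F0 is high, the harmonics sample the spectral envelope sparsely, and LPC peaks can be drawn toward nearby harmonics.

```python
import numpy as np

def lpc_autocorrelation(x, order):
    """Predictor coefficients a[1..p] with x[n] ~ sum_k a[k] * x[n-k],
    estimated by the autocorrelation method and the Levinson-Durbin recursion.
    Returns (a, residual_energy)."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[: x.size - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient for step i
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:], err

def lpc_formants(a, fs):
    """Candidate formant frequencies (Hz): angles of the roots of 1 - sum_k a[k] z^-k."""
    roots = np.roots(np.concatenate(([1.0], -np.asarray(a, dtype=float))))
    roots = roots[np.imag(roots) > 0]   # keep one root of each conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))
```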

  11. Comparing the effects of reverberation and of noise on speech recognition in simulated electric-acoustic listening

    PubMed Central

    Helms Tillery, Kate; Brown, Christopher A.; Bacon, Sid P.

    2012-01-01

    Cochlear implant users report difficulty understanding speech in both noisy and reverberant environments. Electric-acoustic stimulation (EAS) is known to improve speech intelligibility in noise. However, little is known about the potential benefits of EAS in reverberation, or about how such benefits relate to those observed in noise. The present study used EAS simulations to examine these questions. Sentences were convolved with impulse responses from a model of a room whose estimated reverberation times were varied from 0 to 1 sec. These reverberated stimuli were then vocoded to simulate electric stimulation, or presented as a combination of vocoder plus low-pass filtered speech to simulate EAS. Monaural sentence recognition scores were measured in two conditions: reverberated speech and speech in a reverberated noise. The long-term spectrum and amplitude modulations of the noise were equated to the reverberant energy, allowing a comparison of the effects of the interferer (speech vs noise). Results indicate that, at least in simulation, (1) EAS provides significant benefit in reverberation; (2) the benefits of EAS in reverberation may be underestimated by those in a comparable noise; and (3) the EAS benefit in reverberation likely arises from partially preserved cues in this background accessible via the low-frequency acoustic component. PMID:22280603
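    The reverberant stimuli were produced by convolving sentences with room impulse responses. A minimal NumPy sketch of that step, assuming a synthetic impulse response of exponentially decaying Gaussian noise, a common stand-in that is not the room model used in the study. The envelope exp(-ln(1000)·t/T60) falls by a factor of 1000 in amplitude, i.e. 60 dB, at t = T60; the function names and the fixed random seed are hypothetical.

```python
import numpy as np

def synthetic_rir(t60, fs, rng=None):
    """Exponentially decaying noise impulse response whose envelope falls 60 dB at t60 seconds."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = max(1, int(round(t60 * fs)))
    t = np.arange(n) / fs
    envelope = np.exp(-np.log(1000.0) * t / t60)  # amplitude 1/1000 (-60 dB) at t = t60
    h = rng.standard_normal(n) * envelope
    h[0] = 1.0                                    # direct sound
    return h

def reverberate(x, t60, fs):
    """Convolve a dry signal with the synthetic impulse response; t60 <= 0 returns x unchanged."""
    x = np.asarray(x, dtype=float)
    if t60 <= 0:
        return x.copy()
    return np.convolve(x, synthetic_rir(t60, fs))
```

Sweeping `t60` over the 0 to 1 s range mentioned in the abstract gives a graded set of reverberant versions of the same sentence; the vocoding and low-pass filtering steps are not shown.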

  12. Continuous perception and graded categorization: Electrophysiological evidence for a linear relationship between the acoustic signal and perceptual encoding of speech

    PubMed Central

    Toscano, Joseph C.; McMurray, Bob; Dennhardt, Joel; Luck, Steven J.

    2012-01-01

    Speech sounds are highly variable, yet listeners readily extract information from them and transform continuous acoustic signals into meaningful categories during language comprehension. A central question is whether perceptual encoding captures continuous acoustic detail in a one-to-one fashion or whether it is affected by categories. We addressed this in an event-related potential (ERP) experiment in which listeners categorized spoken words that varied along a continuous acoustic dimension (voice onset time; VOT) in an auditory oddball task. We found that VOT effects were present through a late stage of perceptual processing (N1 component, ca. 100 ms poststimulus) and were independent of categories. In addition, effects of within-category differences in VOT were present at a post-perceptual categorization stage (P3 component, ca. 450 ms poststimulus). Thus, at perceptual levels, acoustic information is encoded continuously, independent of phonological information. Further, at phonological levels, fine-grained acoustic differences are preserved along with category information. PMID:20935168

  13. Assessing the Treatment Effects in Apraxia of Speech: Introduction and Evaluation of the Modified Diadochokinesis Test

    ERIC Educational Resources Information Center

    Hurkmans, Joost; Jonkers, Roel; Boonstra, Anne M.; Stewart, Roy E.; Reinders-Messelink, Heleen A.

    2012-01-01

    Background: The number of reliable and valid instruments to measure the effects of therapy in apraxia of speech (AoS) is limited. Aims: To evaluate the newly developed Modified Diadochokinesis Test (MDT), which is a task to assess the effects of rate and rhythm therapies for AoS in a multiple baseline across behaviours design. Methods: The…

  14. Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels

    PubMed Central

    Caballero-Morales, Santiago-Omar

    2013-01-01

    An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists of phoneme-level acoustic modelling of emotion-specific vowels. To this end, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), in which separate phoneme HMMs were built for the consonants and for the emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). The emotional state of a spoken sentence is then estimated by counting the number of emotion-specific vowels found in the ASR output for the sentence. With this approach, an accuracy of 87–100% was achieved for recognizing the emotional state of Mexican Spanish speech. PMID:23935410
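    The decision rule amounts to a vote over recognized vowel units. A minimal Python sketch, assuming a hypothetical labelling convention in which emotion-specific vowels appear in the ASR output as `vowel_emotion` (for example `a_anger`) and consonants carry no tag; the tie-breaking order is an arbitrary assumption, and the function name is hypothetical.

```python
from collections import Counter

EMOTIONS = ("anger", "happiness", "neutral", "sadness")

def estimate_emotion(phone_sequence):
    """Return the emotion whose emotion-specific vowels occur most often in the ASR output.

    Ties are broken by the order of EMOTIONS (an arbitrary choice for this sketch).
    Returns None if no emotion-specific vowel was recognised.
    """
    counts = Counter()
    for unit in phone_sequence:
        _, sep, tag = unit.partition("_")
        if sep and tag in EMOTIONS:
            counts[tag] += 1
    if not counts:
        return None
    return max(EMOTIONS, key=lambda e: counts[e])
```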

  15. The influence of phonemic awareness development on acoustic cue weighting strategies in children's speech perception.

    PubMed

    Mayo, Catherine; Scobbie, James M; Hewlett, Nigel; Waters, Daphne

    2003-10-01

    In speech perception, children give particular patterns of weight to different acoustic cues (their cue weighting). These patterns appear to change with increased linguistic experience. Previous speech perception research has found a positive correlation between more analytical cue weighting strategies and the ability to consciously think about and manipulate segment-sized units (phonemic awareness). That research did not, however, aim to address whether the relation is in any way causal or, if so, in which direction causality might run. Causality in this relation could run in 1 of 2 ways: either phonemic awareness development could affect cue weighting strategies, or changes in cue weighting could allow for the later development of phonemic awareness. The aim of this study was to follow the development of these 2 processes longitudinally to determine which of the above 2 possibilities was more likely. Five-year-old children were tested 3 times in 7 months on their cue weighting strategies for a /so/-/ʃo/ contrast, in which the 2 cues manipulated were the frequency of the fricative spectrum and the frequency of vowel-onset formant transitions. The children were also tested at the same times on their phoneme segmentation and phoneme blending skills. Results showed that phonemic awareness skills tended to improve before cue weighting changed and that early phonemic awareness ability predicted later cue weighting strategies. These results suggest that the development of metaphonemic awareness may play some role in changes in cue weighting. PMID:14575351

  16. Acoustic markers of prominence influence infants' and adults' segmentation of speech sequences.

    PubMed

    Bion, Ricardo A H; Benavides-Varela, Silvia; Nespor, Marina

    2011-03-01

    Two experiments investigated the way acoustic markers of prominence influence the grouping of speech sequences by adults and 7-month-old infants. In the first experiment, adults were familiarized with and asked to memorize sequences of adjacent syllables that alternated in either pitch or duration. During the test phase, participants heard pairs of syllables with constant pitch and duration and were asked whether the syllables had appeared adjacently during familiarization. Adults were better at remembering pairs of syllables that during familiarization had short syllables preceding long syllables, or high-pitched syllables preceding low-pitched syllables. In the second experiment, infants were familiarized and tested with stimuli similar to those in the first experiment, and their preference for pairs of syllables was assessed using the head-turn preference paradigm. When familiarized with syllables alternating in pitch, infants showed a preference to listen to pairs of syllables that had high pitch in the first syllable. However, no preference was found when the familiarization stream alternated in duration. It is proposed that these perceptual biases help infants and adults find linguistic units in the continuous speech stream. While the bias for grouping based on pitch appears early in development, biases for durational grouping might rely on more extensive linguistic experience. PMID:21524015

  17. Statistical evidence that musical universals derive from the acoustic characteristics of human speech

    NASA Astrophysics Data System (ADS)

    Schwartz, David; Howe, Catharine; Purves, Dale

    2003-04-01

    Listeners of all ages and societies produce a similar consonance ordering of chromatic scale tone combinations. Despite intense interest in this perceptual phenomenon over several millennia, it has no generally accepted explanation in physical, psychological, or physiological terms. Here we show that the musical universal of consonance ordering can be understood in terms of the statistical relationship between a pattern of sound pressure at the ear and the possible generative sources of the acoustic energy pattern. Since human speech is the principal naturally occurring source of tone-evoking (i.e., periodic) sound energy for human listeners, we obtained normalized spectra from more than 100,000 recorded speech segments. The probability distribution of amplitude/frequency combinations derived from these spectra predicts both the fundamental frequency ratios that define the chromatic scale intervals and the consonance ordering of chromatic scale tone combinations. We suggest that these observations reveal the statistical character of the perceptual process by which the auditory system guides biologically successful behavior in response to inherently ambiguous sound stimuli.

  18. Elicitation of the Acoustic Change Complex to Long-Duration Speech Stimuli in Four-Month-Old Infants.

    PubMed

    Chen, Ke Heng; Small, Susan A

    2015-01-01

    The acoustic change complex (ACC) is an auditory-evoked potential elicited by changes within an ongoing stimulus that indicates discrimination at the level of the auditory cortex. Only a few studies to date have attempted to record ACCs in young infants. The purpose of the present study was to investigate the elicitation of ACCs to long-duration speech stimuli in English-learning 4-month-old infants. ACCs were elicited to consonant contrasts made up of two concatenated speech tokens. The stimuli included native dental-dental /dada/ and dental-labial /daba/ contrasts and a nonnative Hindi dental-retroflex /daDa/ contrast. Each consonant-vowel speech token was 410 ms in duration. Slow cortical responses were recorded to the onset of the stimulus and to the acoustic change from /da/ to either /ba/ or /Da/ within the stimulus, with significantly prolonged latencies compared with adults. ACCs were reliably elicited for all stimulus conditions, with more robust morphology than in our previous findings using shorter stimuli. The P1 amplitudes elicited by the acoustic change in /daba/ and /daDa/ were significantly larger than those for /dada/, indicating that the brain discriminated between the speech tokens. These findings provide further evidence for the use of ACCs as an index of discrimination ability. PMID:26798343

  19. Elicitation of the Acoustic Change Complex to Long-Duration Speech Stimuli in Four-Month-Old Infants

    PubMed Central

    Chen, Ke Heng; Small, Susan A.

    2015-01-01

    The acoustic change complex (ACC) is an auditory-evoked potential elicited by changes within an ongoing stimulus that indicates discrimination at the level of the auditory cortex. Only a few studies to date have attempted to record ACCs in young infants. The purpose of the present study was to investigate the elicitation of ACCs to long-duration speech stimuli in English-learning 4-month-old infants. ACCs were elicited to consonant contrasts made up of two concatenated speech tokens. The stimuli included native dental-dental /dada/ and dental-labial /daba/ contrasts and a nonnative Hindi dental-retroflex /daDa/ contrast. Each consonant-vowel speech token was 410 ms in duration. Slow cortical responses were recorded to the onset of the stimulus and to the acoustic change from /da/ to either /ba/ or /Da/ within the stimulus, with significantly prolonged latencies compared with adults. ACCs were reliably elicited for all stimulus conditions, with more robust morphology than in our previous findings using shorter stimuli. The P1 amplitudes elicited by the acoustic change in /daba/ and /daDa/ were significantly larger than those for /dada/, indicating that the brain discriminated between the speech tokens. These findings provide further evidence for the use of ACCs as an index of discrimination ability. PMID:26798343

  20. The effect of different open plan and enclosed classroom acoustic conditions on speech perception in Kindergarten children.

    PubMed

    Mealings, Kiri T; Demuth, Katherine; Buchholz, Jörg M; Dillon, Harvey

    2015-10-01

    Open plan classrooms, where several classes are in the same room, have recently re-emerged in Australian primary schools. This paper explores how the acoustics of four Kindergarten classrooms [an enclosed classroom (25 children), double classroom (44 children), fully open plan triple classroom (91 children), and a semi-open plan K-6 "21st century learning space" (205 children)] affect speech perception. Twenty-two to 23 children aged 5-6 years in each classroom participated in an online four-picture choice speech perception test while adjacent classes engaged in quiet versus noisy activities. The noise levels recorded during the test were higher the larger the classroom, except in the noisy condition for the K-6 classroom, possibly due to acoustic treatments. Linear mixed effects models revealed that children's performance accuracy and speed decreased as noise level increased. Additionally, at noise levels above 50 dBA, children's speech perception abilities decreased the further away they were seated from the loudspeaker. These results suggest that fully open plan classrooms are not appropriate learning environments for critical listening activities with young children, because their high intrusive noise levels negatively affect speech perception. If open plan classrooms are desired, they need to be acoustically designed to be appropriate for critical listening activities. PMID:26520328

  1. Can acoustic vowel space predict the habitual speech rate of the speaker?

    PubMed

    Tsao, Y-C; Iqbal, K

    2005-01-01

    This study aims to determine whether the acoustic vowel space reflects the habitual speaking rate of the speaker. The vowel space is defined as the area of the quadrilateral formed by the four corner vowels (i.e., /i/, /æ/, /u/, /ɑ/) in the F1-F2 plane. The study compares the acoustic vowel space in the speech of habitually slow and fast talkers and further analyzes them by gender. In addition to measuring vowel duration and the midpoint frequencies of F1 and F2, the F1/F2 vowel space areas were measured and compared across speakers. The results indicate substantial overlap in vowel space area functions between slow and fast talkers, though the slow speakers were found to have larger vowel spaces. Furthermore, large inter-speaker variability in vowel space area functions was noted within each group. Both F1 and F2 formant frequencies were found to be gender sensitive, consistent with existing data. No predictive relation between vowel duration and formant frequencies was observed among speakers. PMID:17282413
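    The vowel space here is the area of the quadrilateral spanned by the four corner vowels in the F1-F2 plane. A short Python sketch using the shoelace formula, assuming formant values in Hz are already measured and the vertices are listed in perimeter order (/i/, /æ/, /ɑ/, /u/) so the quadrilateral does not cross itself; listing them as /i/, /æ/, /u/, /ɑ/ would trace a self-intersecting shape. The function name is hypothetical.

```python
def vowel_space_area(corners):
    """Area (Hz^2) of the polygon whose vertices are (F1, F2) pairs given in perimeter order.

    For the corner-vowel quadrilateral, pass /i/, /æ/, /ɑ/, /u/ in that order.
    """
    n = len(corners)
    twice_area = 0.0
    for k in range(n):
        f1_a, f2_a = corners[k]
        f1_b, f2_b = corners[(k + 1) % n]
        twice_area += f1_a * f2_b - f1_b * f2_a  # shoelace cross term
    return abs(twice_area) / 2.0
```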

  2. Speaker height estimation from speech: Fusing spectral regression and statistical acoustic models.

    PubMed

    Hansen, John H L; Williams, Keri; Bořil, Hynek

    2015-08-01

    Estimating speaker height can assist in voice forensic analysis and provide additional side knowledge to benefit automatic speaker identification or acoustic model selection for automatic speech recognition. In this study, two approaches are presented: a statistical approach to height estimation that incorporates acoustic models within a non-uniform height bin width Gaussian mixture model structure, and a formant analysis approach that employs linear regression on selected phones. The accuracy and trade-offs of these systems are explored by examining the consistency of the results and the location and causes of errors, as well as a fusion of the two systems, using data from the TIMIT corpus. Open set testing is also presented using the Multi-session Audio Research Project corpus and publicly available YouTube audio to examine the effect of channel mismatch between training and testing data and provide a realistic open domain testing scenario. The proposed algorithms achieve performance highly competitive with previously published results. Although the different data partitioning in the literature and in this study may prevent performance comparisons in absolute terms, the mean average error of 4.89 cm for males and 4.55 cm for females provided by the proposed algorithm on TIMIT utterances containing selected phones suggests a considerable decrease in estimation error compared to past efforts. PMID:26328721
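    Two pieces of this pipeline lend themselves to a small example: a linear regression from a formant measure to height, and a weighted combination of two height estimates, scored by an error metric. A minimal NumPy sketch under stated assumptions: a single scalar feature per speaker, an equal-weight linear fusion (the abstract does not describe the actual fusion scheme), and the reported "mean average error" read as a mean absolute error. All names are hypothetical.

```python
import numpy as np

def fit_height_regressor(feature, height_cm):
    """Least-squares line height = a * feature + b; returns a predictor function."""
    a, b = np.polyfit(feature, height_cm, deg=1)
    return lambda x: a * np.asarray(x, dtype=float) + b

def fuse(est_a, est_b, w=0.5):
    """Weighted linear fusion of two height estimates; w is a hypothetical weight."""
    return w * np.asarray(est_a, dtype=float) + (1 - w) * np.asarray(est_b, dtype=float)

def mean_absolute_error(pred, true):
    """Mean absolute difference between predicted and true heights (cm)."""
    return float(np.mean(np.abs(np.asarray(pred, dtype=float) - np.asarray(true, dtype=float))))
```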

  3. Effects of acoustic variability in the perceptual learning of non-native-accented speech sounds.

    PubMed

    Wade, Travis; Jongman, Allard; Sereno, Joan

    2007-01-01

    This study addressed whether acoustic variability and category overlap in non-native speech contribute to difficulty in its recognition, and more generally whether the benefits of exposure to acoustic variability during categorization training are stable across differences in category confusability. Three experiments considered a set of Spanish-accented English productions. The set was seen to pose learning and recognition difficulty (experiment 1) and was more variable and confusable than a parallel set of native productions (experiment 2). A training study (experiment 3) probed the relative contributions of category central tendency and variability to difficulty in vowel identification using derived inventories in which these dimensions were manipulated based on the results of experiments 1 and 2. Training and test difficulty related straightforwardly to category confusability but not to location in the vowel space. Benefits of high-variability exposure also varied across vowel categories, and seemed to be diminished for highly confusable vowels. Overall, variability was implicated in perception and learning difficulty in ways that warrant further investigation. PMID:17914280

  4. Design of acoustic beam aperture modifier using gradient-index phononic crystals

    PubMed Central

    Lin, Sz-Chin Steven; Tittmann, Bernhard R.; Huang, Tony Jun

    2012-01-01

    This article reports the design concept of a novel acoustic beam aperture modifier using butt-jointed gradient-index phononic crystals (GRIN PCs) consisting of steel cylinders embedded in a homogeneous epoxy background. By gradually tuning the period of a GRIN PC, the propagation direction of acoustic waves can be continuously bent to follow a sinusoidal trajectory in the structure. The aperture of an acoustic beam can therefore be shrunk or expanded by changing the gradient refractive index profiles of the butt-jointed GRIN PCs. Our computational results demonstrate the effectiveness of the proposed acoustic beam aperture modifier. Such an acoustic device can be fabricated through a simple process and will be valuable in applications such as biomedical imaging and surgery, nondestructive evaluation, communication, and acoustic absorbers. PMID:22807585
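    The sinusoidal ray path has a simple textbook analogue. In a medium whose refractive index falls off parabolically from the axis, n(y) = n0(1 - α²y²/2), the paraxial ray equation n0·y'' = dn/dy reduces to y'' = -α²y, so rays oscillate as y(x) = y0·cos(αx). The NumPy sketch below integrates that equation numerically with velocity-Verlet steps. It is an idealized continuous-index model built on assumptions made here (the parabolic profile, the parameter α, and the function name are hypothetical), not the phononic-crystal design in the paper, where the effective index is set by tuning the lattice period.

```python
import numpy as np

def trace_paraxial_ray(y0, alpha, length, steps=10000):
    """Integrate y'' = -alpha**2 * y with y(0) = y0, y'(0) = 0 (ray entering parallel
    to the axis of a parabolic-index medium). Returns arrays (x, y)."""
    dx = length / steps
    x = np.linspace(0.0, length, steps + 1)
    y = np.empty(steps + 1)
    y[0], slope = y0, 0.0
    acc = -alpha**2 * y[0]
    for k in range(steps):
        y[k + 1] = y[k] + slope * dx + 0.5 * acc * dx**2
        new_acc = -alpha**2 * y[k + 1]
        slope += 0.5 * (acc + new_acc) * dx   # velocity-Verlet slope update
        acc = new_acc
    return x, y
```

In this idealization every ray launched parallel to the axis reaches the axis at the same distance, x = π/(2α), regardless of its starting height, which is one way to see how a graded section can converge a beam.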

  5. Acoustic Source Characteristics, Across-Formant Integration, and Speech Intelligibility Under Competitive Conditions

    PubMed Central

    2015-01-01

    An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics—for example, fundamental frequency (F0) differences between formants promote their segregation. This study explored the role of more radical differences in source characteristics. Three-formant (F1+F2+F3) synthetic speech analogues were derived from natural sentences. In Experiment 1, F1+F3 were generated by passing a harmonic glottal source (F0 = 140 Hz) through second-order resonators (H1+H3); in Experiment 2, F1+F3 were tonal (sine-wave) analogues (T1+T3). F2 could take either form (H2 or T2). In some conditions, the target formants were presented alone, either monaurally or dichotically (left ear = F1+F3; right ear = F2). In others, they were accompanied by a competitor for F2 (F1+F2C+F3; F2), which listeners must reject to optimize recognition. Competitors (H2C or T2C) were created using the time-reversed frequency and amplitude contours of F2. Dichotic presentation of F2 and F2C ensured that the impact of the competitor arose primarily through informational masking. In the absence of F2C, the effect of a source mismatch between F1+F3 and F2 was relatively modest. When F2C was present, intelligibility was lowest when F2 was tonal and F2C was harmonic, irrespective of which type matched F1+F3. This finding suggests that source type and context, rather than similarity, govern the phonetic contribution of a formant. It is proposed that wideband harmonic analogues are more effective informational maskers than narrowband tonal analogues, and so become dominant in across-frequency integration of phonetic information when placed in competition. PMID:25751040
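    The harmonic analogues in Experiment 1 come from a harmonic source passed through second-order resonators, while the tonal analogues are sine waves. A minimal NumPy sketch of both kinds of signal, assuming fixed formant frequencies and bandwidths (the study used time-varying contours derived from natural sentences) and using a unit impulse train as a crude stand-in for a glottal source. The resonator is the second-order difference equation commonly used in Klatt-style formant synthesizers; the function names and parameter values are hypothetical.

```python
import numpy as np

def resonator(x, freq, bw, fs):
    """Second-order digital resonator: y[n] = A x[n] + B y[n-1] + C y[n-2]."""
    T = 1.0 / fs
    C = -np.exp(-2 * np.pi * bw * T)
    B = 2 * np.exp(-np.pi * bw * T) * np.cos(2 * np.pi * freq * T)
    A = 1.0 - B - C                     # unity gain at 0 Hz
    y = np.zeros(len(x))
    y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = A * xn + B * y1 + C * y2
        y[n] = yn
        y2, y1 = y1, yn
    return y

def impulse_train(f0, dur, fs):
    """Crude harmonic source: one unit impulse per (rounded) pitch period."""
    x = np.zeros(int(dur * fs))
    x[::int(round(fs / f0))] = 1.0
    return x

def sine_analogue(freq, dur, fs):
    """Tonal (sine-wave) analogue of a formant at a fixed frequency."""
    t = np.arange(int(dur * fs)) / fs
    return np.sin(2 * np.pi * freq * t)
```

For example, `resonator(impulse_train(140.0, 1.0, 16000), 500.0, 80.0, 16000)` gives a steady harmonic analogue of a 500 Hz formant, and `sine_analogue(500.0, 1.0, 16000)` gives its tonal counterpart. Because the impulse spacing is rounded to a whole number of samples, the source F0 is only approximately 140 Hz.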

  6. Acoustic source characteristics, across-formant integration, and speech intelligibility under competitive conditions.

    PubMed

    Roberts, Brian; Summers, Robert J; Bailey, Peter J

    2015-06-01

    An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics--for example, fundamental frequency (F0) differences between formants promote their segregation. This study explored the role of more radical differences in source characteristics. Three-formant (F1+F2+F3) synthetic speech analogues were derived from natural sentences. In Experiment 1, F1+F3 were generated by passing a harmonic glottal source (F0 = 140 Hz) through second-order resonators (H1+H3); in Experiment 2, F1+F3 were tonal (sine-wave) analogues (T1+T3). F2 could take either form (H2 or T2). In some conditions, the target formants were presented alone, either monaurally or dichotically (left ear = F1+F3; right ear = F2). In others, they were accompanied by a competitor for F2 (F1+F2C+F3; F2), which listeners must reject to optimize recognition. Competitors (H2C or T2C) were created using the time-reversed frequency and amplitude contours of F2. Dichotic presentation of F2 and F2C ensured that the impact of the competitor arose primarily through informational masking. In the absence of F2C, the effect of a source mismatch between F1+F3 and F2 was relatively modest. When F2C was present, intelligibility was lowest when F2 was tonal and F2C was harmonic, irrespective of which type matched F1+F3. This finding suggests that source type and context, rather than similarity, govern the phonetic contribution of a formant. It is proposed that wideband harmonic analogues are more effective informational maskers than narrowband tonal analogues, and so become dominant in across-frequency integration of phonetic information when placed in competition. PMID:25751040

  7. Sine-wave and noise-vocoded sine-wave speech in a tone language: Acoustic details matter.

    PubMed

    Rosen, Stuart; Hui, Sze Ngar Catherine

    2015-12-01

    Sine-wave speech (SWS) is a highly simplified version of speech consisting only of frequency- and amplitude-modulated sinusoids representing the formants. That listeners can successfully understand SWS has led to claims that speech perception must be based on abstract properties of the stimuli far removed from their specific acoustic form. Here it is shown, in bilingual Cantonese/English listeners, that performance with Cantonese SWS is improved by noise vocoding, with no effect on English SWS utterances. This manipulation preserves the abstract informational structure in the signals but changes its surface form. The differential effects of noise vocoding likely arise from the fact that Cantonese is a tonal language and hence more reliant on fundamental frequency (F0) contours for its intelligibility. SWS does not preserve tonal information from the original speech but does have false tonal information signalled by the lowest frequency sinusoid. Noise vocoding SWS appears to minimise the tonal percept, which thus interferes less in the perception of Cantonese. It has no effect in English, which is minimally reliant on F0 variations for intelligibility. Therefore it is not only the informational structure of a sound that is important but also how its acoustic detail interacts with the phonological structure of a given language. PMID:26723325
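    As described above, sine-wave speech replaces each formant with one frequency- and amplitude-modulated sinusoid. A minimal synthesis sketch follows, assuming the formant frequency and amplitude tracks have already been estimated and interpolated to the audio rate; the glide values in the usage lines are hypothetical. The noise-vocoding step studied in the paper is not shown.

```python
import numpy as np

def sine_wave_speech(freq_tracks, amp_tracks, fs):
    """Sum of sinusoids whose frequencies and amplitudes follow formant tracks.

    freq_tracks, amp_tracks: arrays of shape (n_formants, n_samples),
    already interpolated to the audio sample rate.
    """
    freq_tracks = np.asarray(freq_tracks, dtype=float)
    amp_tracks = np.asarray(amp_tracks, dtype=float)
    # Integrate instantaneous frequency to get a continuous phase per track.
    phase = 2.0 * np.pi * np.cumsum(freq_tracks, axis=1) / fs
    return np.sum(amp_tracks * np.sin(phase), axis=0)

fs = 16000
n = fs // 2
f1 = np.linspace(300.0, 700.0, n)        # hypothetical formant glides (Hz)
f2 = np.linspace(2200.0, 1200.0, n)
f3 = np.full(n, 2600.0)
amps = np.array([1.0, 0.5, 0.25])[:, None] * np.ones((3, n))
sws = sine_wave_speech(np.vstack([f1, f2, f3]), amps, fs)
```

    The lowest sinusoid (tracking F1) is the one the abstract identifies as carrying false tonal information.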

  8. Effects of a music therapy voice protocol on speech intelligibility, vocal acoustic measures, and mood of individuals with Parkinson's disease.

    PubMed

    Haneishi, E

    2001-01-01

    This study examined the effects of a Music Therapy Voice Protocol (MTVP) on speech intelligibility, vocal intensity, maximum vocal range, maximum duration of sustained vowel phonation, vocal fundamental frequency, vocal fundamental frequency variability, and mood of individuals with Parkinson's disease. Four female patients with voice and speech problems served as their own controls and participated in a baseline assessment (study pretest), a series of MTVP sessions involving vocal and singing exercises, and a final evaluation (study posttest). Speech intelligibility and all acoustic variables were measured at the study pre- and posttests. Paired-samples t-tests showed statistically significant increases from study pretest to posttest in speech intelligibility, as rated by caregivers, and in vocal intensity. In addition, self-rated mood scores and selected acoustic variables were collected before and after each MTVP session (session pre- and posttests). Two-way repeated-measures ANOVAs found no significant differences in any of these variables between session pretests and posttests, across the entire treatment period, or in their interaction. Although the difference was not significant, the mean mood score was higher at session posttests (M = 8.69) than at session pretests (M = 7.93). PMID:11796078

  9. Modified ion-acoustic solitary waves in plasmas with field-aligned shear flows

    SciTech Connect

    Saleem, H.; Haque, Q.

    2015-08-15

    The nonlinear dynamics of ion-acoustic waves are investigated in a plasma with field-aligned shear flow. A Korteweg-de Vries-type nonlinear equation for a modified ion-acoustic wave is obtained, which admits a single-pulse soliton solution. The theoretical result is applied to the solar wind plasma at 1 AU as an illustration.
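    For a generic Korteweg-de Vries equation u_t + A*u*u_x + B*u_xxx = 0 (with A, B > 0), the single-pulse soliton travelling at speed V is u = (3V/A) * sech^2(sqrt(V/(4B)) * (x - V*t)). The coefficients of the paper's modified equation depend on the shear-flow parameters, which the abstract does not give, so A, B and V below are hypothetical. The sketch checks numerically that this pulse satisfies the equation, using spectral derivatives on a periodic grid wide enough for the pulse to decay.

```python
import numpy as np

def kdv_soliton(x, t, speed, a_coef, b_coef):
    """Single-pulse soliton of u_t + a*u*u_x + b*u_xxx = 0."""
    amp = 3.0 * speed / a_coef
    k = np.sqrt(speed / (4.0 * b_coef))
    return amp / np.cosh(k * (x - speed * t)) ** 2

def kdv_residual(u, x, speed, a_coef, b_coef):
    """Residual of the KdV equation for a wave travelling at `speed`,
    using spectral derivatives (u_t = -speed * u_x for a travelling wave)."""
    n, dx = len(x), x[1] - x[0]
    wavenum = 2j * np.pi * np.fft.fftfreq(n, d=dx)
    u_hat = np.fft.fft(u)
    u_x = np.fft.ifft(wavenum * u_hat).real
    u_xxx = np.fft.ifft(wavenum ** 3 * u_hat).real
    return -speed * u_x + a_coef * u * u_x + b_coef * u_xxx

x = np.linspace(-40.0, 40.0, 1024, endpoint=False)
u = kdv_soliton(x, 0.0, speed=1.0, a_coef=1.0, b_coef=1.0)
print(np.max(np.abs(kdv_residual(u, x, 1.0, 1.0, 1.0))))  # near round-off
```

    The amplitude grows, and the width shrinks, with the propagation speed V, the characteristic KdV soliton behaviour.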

  10. Effects of speech style, room acoustics, and vocal fatigue on vocal effort.

    PubMed

    Bottalico, Pasquale; Graetzer, Simone; Hunter, Eric J

    2016-05-01

    Vocal effort is a physiological measure that accounts for changes in voice production as vocal loading increases. It has been quantified in terms of sound pressure level (SPL). This study investigates how vocal effort is affected by speaking style, room acoustics, and short-term vocal fatigue. Twenty subjects were recorded while reading a text at normal and loud volumes in anechoic, semi-reverberant, and reverberant rooms in the presence of classroom babble noise. The acoustics in each environment were modified by creating a strong first reflection in the talker position. After each task, the subjects answered questions addressing their perception of the vocal effort, comfort, control, and clarity of their own voice. Variation in SPL for each subject was measured per task. It was found that SPL and self-reported effort increased in the loud style and decreased when the reflective panels were present and when reverberation time increased. Self-reported comfort and control decreased in the loud style, while self-reported clarity increased when panels were present. The lowest magnitude of vocal fatigue was experienced in the semi-reverberant room. The results indicate that early reflections may be used to reduce vocal effort without modifying reverberation time. PMID:27250179
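    The study quantifies vocal effort through sound pressure level. As a reminder of how SPL is computed, here is a minimal sketch assuming a calibrated microphone signal in pascals; the 1 kHz test tone is only an illustration, not study data.

```python
import numpy as np

P_REF = 20e-6  # reference pressure in air: 20 micropascals

def spl_db(pressure_pa):
    """Sound pressure level (dB re 20 micropascals) of a calibrated signal in Pa."""
    p = np.asarray(pressure_pa, dtype=float)
    p_rms = np.sqrt(np.mean(p ** 2))
    return 20.0 * np.log10(p_rms / P_REF)

fs = 48000
t = np.arange(fs) / fs
tone = np.sqrt(2) * 0.02 * np.sin(2 * np.pi * 1000 * t)  # 0.02 Pa RMS
print(round(spl_db(tone), 1))  # 60.0
```

    Doubling the pressure raises SPL by about 6 dB, which is the scale on which per-task SPL variation is reported.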

  11. Acoustic changes in the production of lexical stress during Lombard speech.

    PubMed

    Arciuli, Joanne; Simpson, Briony S; Vogel, Adam P; Ballard, Kirrie J

    2014-06-01

    The Lombard effect describes the phenomenon of individuals increasing their vocal intensity when speaking in the presence of background noise. Here, we conducted an investigation of the production of lexical stress during Lombard speech. Participants (N = 27) produced the same sentences in three conditions: one quiet condition and two noise conditions at 70 dB (white noise; multi-talker babble). Manual acoustic analyses (syllable duration, vowel intensity, and vowel fundamental frequency) were completed for repeated productions of two trisyllabic words with opposing patterns of lexical stress (weak-strong; strong-weak) in each of the three conditions. In total, 324 productions were analysed (12 utterances per participant). Results revealed that, rather than increasing vocal intensity equally across syllables, participants alter the degree of stress contrastivity when speaking in noise. This was especially evident in the production of strong-weak lexical stress where there was an increase in contrastivity across syllables in terms of intensity and fundamental frequency. This preliminary study paves the way for further research that is needed to establish these findings using a larger set of multisyllabic stimuli. PMID:25102603
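    The abstract reports changes in stress contrastivity across syllables but does not name the measure used. One widely used way to quantify syllable-to-syllable contrast is the normalized pairwise variability index (nPVI), sketched below purely as an illustration; the vowel durations are hypothetical.

```python
def npvi(values):
    """Normalised pairwise variability index over successive syllables:
    100 * mean(|d_k - d_(k+1)| / ((d_k + d_(k+1)) / 2))."""
    if len(values) < 2:
        raise ValueError("need at least two syllables")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(values, values[1:])]
    return 100.0 * sum(terms) / len(terms)

# Hypothetical vowel durations (ms) of the first two syllables of a
# strong-weak word, produced in quiet and in noise.
print(round(npvi([180, 90]), 1))   # 66.7
print(round(npvi([200, 80]), 1))   # 85.7
```

    A larger value means greater contrast between adjacent syllables, the direction of change the study reports for strong-weak words in noise.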

  12. Language-Specific Developmental Differences in Speech Production: A Cross-Language Acoustic Study

    ERIC Educational Resources Information Center

    Li, Fangfang

    2012-01-01

    Speech productions of 40 English- and 40 Japanese-speaking children (aged 2-5) were examined and compared with the speech produced by 20 adult speakers (10 speakers per language). Participants were recorded while repeating words that began with "s" and "sh" sounds. Clear language-specific patterns in adults' speech were found, with English…

  13. Modifying the acoustic impedance of polyurea-based composites

    NASA Astrophysics Data System (ADS)

    Nantasetphong, Wiroj; Amirkhizi, Alireza V.; Jia, Zhanzhan; Nemat-Nasser, Sia

    2013-04-01

    Acoustic impedance is a material property that depends on mass density and acoustic wave speed. An impedance mismatch between two media leads to the partial reflection of an acoustic wave sent from one medium to another. Active sonar is one example of a useful application of this phenomenon, where reflected and scattered acoustic waves enable the detection of objects. If the impedance of an object is matched to that of the surrounding medium, however, the object may be hidden from observation (at least directly) by sonar. In this study, polyurea composites are developed to facilitate such impedance matching. Polyurea is used due to its excellent blast-mitigating properties, easy casting, corrosion protection, abrasion resistance, and various uses in current military technology. Since pure polyurea has impedance higher than that of water (the current medium of interest), low mass density phenolic microballoon particles are added to create composite materials with reduced effective impedances. The volume fraction of particles is varied to study the effect of filler quantity on the acoustic impedance of the resulting composite. The composites are experimentally characterized via ultrasonic measurements. Computational models based on the method of dilute-randomly-distributed inclusions are developed and compared with the experimental results. These experiments and models will facilitate the design of new elastomeric composites with desirable acoustic impedances.
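    The impedance-matching argument above can be made concrete. Characteristic impedance is Z = rho * c, and at normal incidence the pressure reflection coefficient is R = (Z2 - Z1) / (Z2 + Z1), so R = 0 when the impedances match. In the sketch below, the water values are approximate and the composite values are hypothetical. Effective density follows a volume-weighted average, but effective wave speed does not, which is why micromechanical models such as the dilute-inclusion method are needed.

```python
def acoustic_impedance(density, wave_speed):
    """Characteristic acoustic impedance Z = rho * c (Pa*s/m, i.e. rayl)."""
    return density * wave_speed

def reflection_coefficient(z1, z2):
    """Pressure reflection coefficient at normal incidence, medium 1 into medium 2."""
    return (z2 - z1) / (z2 + z1)

def mixture_density(rho_matrix, rho_filler, volume_fraction):
    """Effective density of a two-phase composite (exact volume average)."""
    return (1.0 - volume_fraction) * rho_matrix + volume_fraction * rho_filler

z_water = acoustic_impedance(1000.0, 1480.0)    # approx. values for water
z_target = acoustic_impedance(1100.0, 2000.0)   # hypothetical composite
r = reflection_coefficient(z_water, z_target)
print(f"R = {r:.3f}, reflected intensity = {r**2:.1%}")
```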

  14. Systematic Studies of Modified Vocalization: The Effect of Speech Rate on Speech Production Measures during Metronome-Paced Speech in Persons Who Stutter

    ERIC Educational Resources Information Center

    Davidow, Jason H.

    2014-01-01

    Background: Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control…

  15. Acoustic and perceptual correlates of faster-than-habitual speech produced by speakers with Parkinson's disease and Multiple Sclerosis

    PubMed Central

    Kuo, Christina; Tjaden, Kris; Sussman, Joan E.

    2014-01-01

    Acoustic-perceptual characteristics of a faster-than-habitual rate (Fast condition) were examined for speakers with Parkinson's disease (PD) and Multiple Sclerosis (MS). Judgments of intelligibility for sentences produced at a habitual rate (Habitual condition) and at a faster-than-habitual rate (Fast condition) by 46 speakers with PD or MS as well as a group of 32 healthy speakers revealed that the Fast condition was, on average, associated with decreased intelligibility. However, some speakers' intelligibility did not decline. To further understand the acoustic characteristics of varied intelligibility in the Fast condition for speakers with dysarthria, a subgroup of speakers with PD or MS whose intelligibility did not decline in the Fast condition (No Decline group, n = 8) and a subgroup of speakers with significantly declined intelligibility (Decline group, n = 8) were compared. Acoustic measures of global speech timing, suprasegmental characteristics, and utterance-level segmental characteristics for vocalics were examined for the two subgroups. Results suggest acoustic contributions to intelligibility under rate modulation are complex. Potential clinical relevance and implications for the acoustic bases of intelligibility are discussed. PMID:25287378

  16. Acoustic and perceptual evaluation of category goodness of /t/ and /k/ in typical and misarticulated children's speech.

    PubMed

    Strömbergsson, Sofia; Salvi, Giampiero; House, David

    2015-06-01

    This investigation explores perceptual and acoustic characteristics of children's successful and unsuccessful productions of /t/ and /k/, with the specific aim of exploring perceptual sensitivity to phonetic detail and the extent to which this sensitivity is reflected in the acoustic domain. Recordings were collected from 4- to 8-year-old children with a speech sound disorder (SSD) who misarticulated one of the target plosives, and compared with productions recorded from peers with typical speech development (TD). Perceptual responses were registered on a visual-analog scale ranging from "clear [t]" to "clear [k]." Statistical models of prototypical productions were built from spectral moments and discrete cosine transform features and used to score the SSD productions. In the perceptual evaluation, "clear substitutions" were rated as less prototypical than correct productions. Moreover, target-appropriate productions of /t/ and /k/ by children with SSD were rated as less prototypical than those by TD peers. The acoustic models could largely discriminate between the gross categories /t/ and /k/, and scored the SSD utterances on a continuous scale that was largely consistent with the category of production. However, none of the methods exhibited the same sensitivity to phonetic detail as the human listeners. PMID:26093431
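    The statistical models above use spectral moments as features. A minimal sketch of the four moments computed over a single windowed frame follows; the Hamming window, frame length and test signals are assumptions, and the study's exact analysis settings (and its DCT features) are not reproduced here.

```python
import numpy as np

def spectral_moments(frame, fs):
    """First four spectral moments of a windowed frame, treating the
    normalised power spectrum as a probability distribution over frequency.
    Returns (centroid Hz, variance Hz^2, skewness, excess kurtosis)."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    p = power / power.sum()
    centroid = np.sum(freqs * p)
    variance = np.sum((freqs - centroid) ** 2 * p)
    sd = np.sqrt(variance)
    skewness = np.sum((freqs - centroid) ** 3 * p) / sd ** 3
    kurtosis = np.sum((freqs - centroid) ** 4 * p) / sd ** 4 - 3.0
    return centroid, variance, skewness, kurtosis

fs = 16000
t = np.arange(512) / fs
tone = np.sin(2 * np.pi * 3000 * t)
print(spectral_moments(tone, fs)[0])  # centroid close to 3000 Hz
```

    Alveolar /t/ bursts typically have higher spectral centroids than velar /k/ bursts, which is what makes these features useful for separating the two categories.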

  17. Disordered speech disrupts conversational entrainment: a study of acoustic-prosodic entrainment and communicative success in populations with communication challenges.

    PubMed

    Borrie, Stephanie A; Lubold, Nichola; Pon-Barry, Heather

    2015-01-01

    Conversational entrainment, a pervasive communication phenomenon in which dialogue partners adapt their behaviors to align more closely with one another, is considered essential for successful spoken interaction. While well-established in other disciplines, this phenomenon has received limited attention in the field of speech pathology and the study of communication breakdowns in clinical populations. The current study examined acoustic-prosodic entrainment, as well as a measure of communicative success, in three distinctly different dialogue groups: (i) healthy native vs. healthy native speakers (Control), (ii) healthy native vs. foreign-accented speakers (Accented), and (iii) healthy native vs. dysarthric speakers (Disordered). Dialogue group comparisons revealed significant differences in how the groups entrain on particular acoustic-prosodic features, including pitch, intensity, and jitter. Most notably, the Disordered dialogues were characterized by significantly less acoustic-prosodic entrainment than the Control dialogues. Further, a positive relationship between entrainment indices and communicative success was identified. These results suggest that the study of conversational entrainment in speech pathology will have essential implications for both scientific theory and clinical application in this domain. PMID:26321996
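    The abstract does not define its entrainment indices. The sketch below shows two simple turn-level measures of the general kind used in acoustic-prosodic entrainment research, applied to one feature per turn (for example, mean pitch). The function names and definitions are hypothetical illustrations, not the study's method.

```python
import numpy as np

def turn_synchrony(feature_a, feature_b):
    """Pearson correlation between speaker A's turn-level feature values and
    speaker B's values on the immediately following turns.

    feature_a[i] is A's value on turn i; feature_b[i] is B's reply to it.
    """
    a = np.asarray(feature_a, dtype=float)
    b = np.asarray(feature_b, dtype=float)
    if a.shape != b.shape or a.size < 3:
        raise ValueError("need paired sequences with at least 3 turns")
    return float(np.corrcoef(a, b)[0, 1])

def turn_proximity(feature_a, feature_b):
    """Negative mean absolute difference between adjacent turns
    (values closer to 0 mean the partners sound more alike)."""
    a = np.asarray(feature_a, dtype=float)
    b = np.asarray(feature_b, dtype=float)
    return float(-np.mean(np.abs(a - b)))

# Hypothetical mean pitch (Hz) per turn for two dialogue partners.
print(turn_synchrony([180, 195, 210, 200], [120, 128, 135, 131]))
```

    A correlation near 1 means B tracks A's turn-to-turn changes; proximity instead rewards partners whose values are simply close.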

  18. Semantic and acoustic analysis of speech by functional networks with distinct time scales.

    PubMed

    Deng, Siyi; Srinivasan, Ramesh

    2010-07-30

    Speech perception requires the successful interpretation of both phonetic and syllabic information in the auditory signal. It has been suggested by Poeppel (2003) that phonetic processing requires an optimal time scale of 25 ms while the time scale of syllabic processing is much slower (150-250 ms). To better understand the operation of brain networks at these characteristic time scales during speech perception, we studied the spatial and dynamic properties of EEG responses to five different stimuli: (1) amplitude modulated (AM) speech, (2) AM speech with added broadband noise, (3) AM reversed speech, (4) AM broadband noise, and (5) AM pure tone. Amplitude modulation at gamma band frequencies (40 Hz) elicited steady-state auditory evoked responses (SSAERs) bilaterally over primary auditory cortices. Reduced SSAERs were observed over the left auditory cortex only for stimuli containing speech. In addition, we found over the left hemisphere, anterior to primary auditory cortex, a network whose instantaneous frequencies in the theta to alpha band (4-16 Hz) are correlated with the amplitude envelope of the speech signal. This correlation was not observed for reversed speech. The presence of speech in the sound input activates a 4-16 Hz envelope tracking network and suppresses the 40-Hz gamma band network which generates the steady-state responses over the left auditory cortex. We believe these findings to be consistent with the idea that processing of the speech signals involves preferentially processing at syllabic time scales rather than phonetic time scales. PMID:20580635
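    Two signal operations underlie the design above: 40 Hz amplitude modulation of the stimuli, and the amplitude envelope against which the 4-16 Hz network was correlated. A minimal sketch of both follows, with the envelope taken from an FFT-based analytic signal. The test stimulus (a 1 kHz tone, like the "AM pure tone" condition) is an assumption, and the EEG analysis itself is not shown.

```python
import numpy as np

def amplitude_modulate(signal, fs, mod_freq=40.0, depth=1.0):
    """Sinusoidal amplitude modulation, scaled so the peak gain is 1."""
    t = np.arange(len(signal)) / fs
    return signal * (1.0 + depth * np.sin(2 * np.pi * mod_freq * t)) / (1.0 + depth)

def hilbert_envelope(x):
    """Amplitude envelope |x + j*H{x}| via the FFT-based analytic signal."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * h))

fs = 8000
t = np.arange(fs) / fs
am = amplitude_modulate(np.sin(2 * np.pi * 1000 * t), fs)   # 40 Hz, full depth
env = hilbert_envelope(am)   # recovers (1 + sin(2*pi*40*t)) / 2
```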

  19. Semantic and acoustic analysis of speech by functional networks with distinct time scales

    PubMed Central

    Deng, Siyi; Srinivasan, Ramesh

    2014-01-01

    Speech perception requires the successful interpretation of both phonetic and syllabic information in the auditory signal. It has been suggested by Poeppel (2003) that phonetic processing requires an optimal time scale of 25 ms while the time scale of syllabic processing is much slower (150–250ms). To better understand the operation of brain networks at these characteristic time scales during speech perception, we studied the spatial and dynamic properties of EEG responses to five different stimuli: (1) amplitude modulated (AM) speech, (2) AM speech with added broadband noise, (3) AM reversed speech, (4) AM broadband noise, and (5) AM pure tone. Amplitude modulation at gamma band frequencies (40 Hz) elicited steady-state auditory evoked responses (SSAERs) bilaterally over primary auditory cortices. Reduced SSAERs were observed over the left auditory cortex only for stimuli containing speech. In addition, we found over the left hemisphere, anterior to primary auditory cortex, a network whose instantaneous frequencies in the theta to alpha band (4–16 Hz) are correlated with the amplitude envelope of the speech signal. This correlation was not observed for reversed speech. The presence of speech in the sound input activates a 4–16 Hz envelope tracking network and suppresses the 40-Hz gamma band network which generates the steady-state responses over the left auditory cortex. We believe these findings to be consistent with the idea that processing of the speech signals involves preferentially processing at syllabic time scales rather than phonetic time scales. PMID:20580635

  20. Speech-clarity judgments of hearing-aid-processed speech in noise: differing polar patterns and acoustic environments.

    PubMed

    Amlani, Amyn M; Rakerd, Brad; Punch, Jerry L

    2006-06-01

    This investigation assessed the extent to which listeners' preferences for hearing aid microphone polar patterns vary across listening environments, and whether normal-hearing listeners and inexperienced and experienced hearing-impaired listeners differ in such preferences. Paired-comparison judgments of speech clarity (i.e., subjective speech intelligibility) were made monaurally for recordings of speech in noise processed by a commercially available hearing aid programmed with an omnidirectional pattern and two directional polar patterns (cardioid and hypercardioid). Testing environments included a sound-treated room, a living room, and a classroom. Polar-pattern preferences were highly reliable and agreed closely across all three groups of listeners. All groups preferred listening in the sound-treated room over listening in the living room, and preferred listening in the living room over listening in the classroom. Each group preferred the directional patterns to the omnidirectional pattern in all room conditions. We observed no differences in preference judgments between the two directional patterns or as a function of hearing-impaired listeners' amplification experience. Overall, findings indicate that listeners perceived qualitative benefits from microphones having directional polar patterns. PMID:16777778
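    The three patterns compared in this study can be described, in idealized form, as first-order patterns r(theta) = a + (1 - a) * cos(theta). Averaging r^2 over the sphere gives the diffuse-field directivity factor Q = 1 / (a^2 + (1 - a)^2 / 3). The sketch below computes the resulting directivity index. It is a free-field idealization that ignores the head and the hearing-aid shell, so it is not the study's device.

```python
import numpy as np

# First-order microphone patterns: r(theta) = a + (1 - a) * cos(theta)
PATTERNS = {"omnidirectional": 1.0, "cardioid": 0.5, "hypercardioid": 0.25}

def directivity_index_db(a):
    """Directivity index (dB) of an ideal first-order pattern in a diffuse field.

    Q = 1 / (a**2 + (1 - a)**2 / 3) follows from averaging r(theta)**2
    over the sphere (the cross term integrates to zero).
    """
    q = 1.0 / (a ** 2 + (1.0 - a) ** 2 / 3.0)
    return 10.0 * np.log10(q)

for name, a in PATTERNS.items():
    print(f"{name:16s} DI = {directivity_index_db(a):.1f} dB")
```

    This gives about 0, 4.8 and 6.0 dB respectively, i.e. the directional patterns suppress diffuse noise relative to frontal speech.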

  1. A study of acoustic-to-articulatory inversion of speech by analysis-by-synthesis using chain matrices and the Maeda articulatory model

    PubMed Central

    Panchapagesan, Sankaran; Alwan, Abeer

    2011-01-01

    In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model is performed. For chain matrix calculation of vocal tract (VT) acoustics, the chain matrix derivatives with respect to area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between natural and synthesized first three formants, and parameter regularization and continuity terms. Calibration of the Maeda model to two speakers, one male and one female, from the University of Wisconsin x-ray microbeam (XRMB) database, using a cost function, is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. Good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs for the male speaker, with average pellet-VT outline distances around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants. PMID:21476670
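    The forward step inside such an analysis-by-synthesis loop maps an area function to formants via chain (ABCD) matrices. The sketch below shows only that step, for a lossless tube with an ideal volume-velocity source at the glottis and zero radiation impedance at the lips. Under those assumptions U_lips / U_glottis = 1/D, so formants are the zeros of D. The Maeda model, the chain-matrix derivatives, the quasi-Newton optimization and the losses used in the paper are not shown, and the air density, sound speed and tube dimensions are assumed values.

```python
import numpy as np

RHO = 1.14   # air density in kg/m^3 (assumed)
C = 350.0    # speed of sound in m/s (assumed)

def chain_matrix(areas, lengths, freq):
    """Product of lossless-tube chain matrices from glottis to lips:
    [P_glottis, U_glottis]^T = K @ [P_lips, U_lips]^T."""
    k = 2.0 * np.pi * freq / C
    total = np.eye(2, dtype=complex)
    for area, length in zip(areas, lengths):
        z0 = RHO * C / area               # characteristic impedance of the section
        kl = k * length
        section = np.array([[np.cos(kl), 1j * z0 * np.sin(kl)],
                            [1j * np.sin(kl) / z0, np.cos(kl)]])
        total = total @ section
    return total

def formants(areas, lengths, f_max=4000.0, n_grid=2000):
    """Zeros of D (poles of 1/D), found by grid search plus bisection."""
    def d_of(f):
        return chain_matrix(areas, lengths, f)[1, 1].real  # D is real when lossless

    freqs = np.linspace(1.0, f_max, n_grid)
    d = np.array([d_of(f) for f in freqs])
    roots = []
    for i in np.nonzero(np.sign(d[:-1]) != np.sign(d[1:]))[0]:
        lo, hi = freqs[i], freqs[i + 1]
        sign_lo = np.sign(d[i])
        for _ in range(50):               # bisection on the bracketed sign change
            mid = 0.5 * (lo + hi)
            if np.sign(d_of(mid)) == sign_lo:
                lo = mid
            else:
                hi = mid
        roots.append(0.5 * (lo + hi))
    return roots

# Uniform 17.5 cm tube (10 sections of 5 cm^2): quarter-wave resonances.
print([round(f) for f in formants([5e-4] * 10, [0.0175] * 10)])
```

    For a uniform tube, D = cos(kL), so the formants fall at odd multiples of c/(4L), here 500, 1500, 2500 and 3500 Hz. An inversion would compare such synthesized formants with measured ones in the cost function.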

  2. Experimental Verification of the Shear-Modified Ion-Acoustic Instability

    NASA Astrophysics Data System (ADS)

    Teodorescu, C.; Reynolds, E. W.; Koepke, M. E.

    2002-05-01

    The predicted shear-induced shift of the wave phase velocity, the essence of the shear-modified ion-acoustic (SMIA) instability mechanism that reduces ion Landau damping for otherwise damped ion-acoustic waves [V. Gavrishchaka et al., 80, 728 (1998)], is verified with direct measurements in a strongly magnetized laboratory plasma. The SMIA growth rate is shown to increase with increasing shear, as predicted. SMIA wave propagation is shown to be possible at both small and large angles to the magnetic field, consistent with space observations of ion-acoustic-like waves.

  3. An acoustic feature-based similarity scoring system for speech rehabilitation assistance.

    PubMed

    Syauqy, Dahnial; Wu, Chao-Min; Setyawati, Onny

    2016-08-01

    The purpose of this study is to develop a tool to assist speech therapy and rehabilitation, focusing on automatic scoring that compares a patient's speech with normal speech on several aspects, including pitch, vowels, voiced-unvoiced segments, strident fricatives, and sound intensity. Pitch estimation used a cepstrum-based algorithm for its robustness; vowel classification used a multilayer perceptron (MLP) to classify vowels from pitch and formants; and strident fricative detection was based on the intensity and location of the major spectral peak and on the presence of pitch in the segment. To evaluate the system, this study analyzed speech recordings from eight patients (four males, four females; aged 4 to 58 years), recorded in a previous study in cooperation with Taipei Veterans General Hospital and Taoyuan General Hospital. The cepstrum pitch method had a gross pitch error of 5.3% over a total of 2086 frames. The MLP vowel classifier achieved 93% accuracy for men, 87% for women, and 84% for children. Overall, 156 of the tool's 192 grading results (81%) were consistent with the audio and visual observations made by four experienced respondents. Implications for Rehabilitation: Difficulties in communication may limit a person's ability to transfer and exchange information. Because speech is one of the primary means of communication, there is a need for speech diagnosis and rehabilitation. Technological advances in computer-assisted speech therapy (CAST) improve the quality and time efficiency of the diagnosis and treatment of these disorders. The present study attempted to develop a tool to assist speech therapy and rehabilitation, providing a simple interface that lets the assessment be done even by patients themselves without particular knowledge of speech processing, while at the…
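    The abstract names a cepstrum-based pitch algorithm and reports gross pitch error but gives no details. Below is a textbook real-cepstrum estimator for one voiced frame, together with a common GPE definition (the fraction of frames off by more than 20%). The window, spectral floor, search range and 20% threshold are all assumptions, not the tool's settings, and the test frame is synthetic.

```python
import numpy as np

def cepstral_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Estimate F0 (Hz) of one voiced frame from the real-cepstrum peak."""
    windowed = frame * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    # Floor the spectrum 60 dB below its maximum so near-zero bins
    # do not dominate the logarithm.
    log_mag = np.log(np.maximum(magnitude, 1e-3 * magnitude.max()))
    cepstrum = np.fft.irfft(log_mag, n=len(frame))
    q_min, q_max = int(fs / f_max), int(fs / f_min)   # quefrency range in samples
    peak = q_min + int(np.argmax(cepstrum[q_min:q_max]))
    return fs / peak

def gross_pitch_error(estimates, references, tolerance=0.2):
    """Fraction of voiced frames whose estimate deviates from the
    reference F0 by more than `tolerance` (relative)."""
    est = np.asarray(estimates, dtype=float)
    ref = np.asarray(references, dtype=float)
    return float(np.mean(np.abs(est - ref) > tolerance * ref))

fs = 16000
t = np.arange(1024) / fs
frame = sum(np.cos(2 * np.pi * 200 * k * t) for k in range(1, 40))  # 200 Hz harmonics
print(round(cepstral_pitch(frame, fs)))
```

    Harmonics spaced F0 apart make the log spectrum periodic, so its inverse transform peaks at the quefrency fs/F0.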

  4. Acoustic Analysis of Clear Versus Conversational Speech in Individuals with Parkinson Disease

    ERIC Educational Resources Information Center

    Goberman, A.M.; Elmer, L.W.

    2005-01-01

    A number of studies have been devoted to the examination of clear versus conversational speech in non-impaired speakers. The purpose of these previous studies has been primarily to help increase speech intelligibility for the benefit of hearing-impaired listeners. The goal of the present study was to examine differences between conversational and…

  5. A Computational Model of Word Segmentation from Continuous Speech Using Transitional Probabilities of Atomic Acoustic Events

    ERIC Educational Resources Information Center

    Rasanen, Okko

    2011-01-01

    Word segmentation from continuous speech is a difficult task that is faced by human infants when they start to learn their native language. Several studies indicate that infants might use several different cues to solve this problem, including intonation, linguistic stress, and transitional probabilities between subsequent speech sounds. In this…
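    The transitional-probability cue mentioned above can be illustrated symbolically: TP(b | a) = count(a followed by b) / count(a), with word boundaries placed at local TP minima. Räsänen's model works on automatically discovered atomic acoustic events rather than syllable labels. The sketch below is therefore a simplified, hypothetical illustration of the cue itself, using a made-up artificial-language stream.

```python
from collections import Counter

def transitional_probabilities(units):
    """TP(b | a) = count(a followed by b) / count(a)."""
    pair_counts = Counter(zip(units, units[1:]))
    first_counts = Counter(units[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

def segment_at_tp_minima(units):
    """Insert a boundary wherever a TP is lower than both neighbouring TPs."""
    tp = transitional_probabilities(units)
    probs = [tp[(a, b)] for a, b in zip(units, units[1:])]
    words, current = [], [units[0]]
    for i, unit in enumerate(units[1:], start=1):
        # probs[i - 1] is the TP of the transition into `unit`
        left = probs[i - 2] if i >= 2 else float("inf")
        right = probs[i] if i < len(probs) else float("inf")
        if probs[i - 1] < left and probs[i - 1] < right:
            words.append(current)
            current = []
        current.append(unit)
    words.append(current)
    return words

words = [["tu", "pi", "ro"], ["go", "la", "bu"], ["bi", "da", "ku"]]
order = [0, 1, 2, 0, 2, 1, 0, 1, 2, 1, 0, 2]
stream = [syl for i in order for syl in words[i]]
segments = segment_at_tp_minima(stream)
print(["".join(s) for s in segments])
```

    Within-word TPs here are always 1, while cross-word TPs are lower because each word can be followed by different words.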

  6. Acoustic Analysis of the Speech of Children with Cochlear Implants: A Longitudinal Study

    ERIC Educational Resources Information Center

    Liker, Marko; Mildner, Vesna; Sindija, Branka

    2007-01-01

    The aim of the study was to analyse the speech of children with cochlear implants and compare it with the speech of hearing controls. We focused on three categories of Croatian sounds: vowels (F1 and F2 frequencies), fricatives (noise frequencies of /s/ and /ʃ/), and affricates (total duration and the pattern of stop-fricative components…

  7. Designing acoustics for linguistically diverse classrooms: Effects of background noise, reverberation and talker foreign accent on speech comprehension by native and non-native English-speaking listeners

    NASA Astrophysics Data System (ADS)

    Peng, Zhao Ellen

    The current classroom acoustics standard (ANSI S12.60-2010) recommends that core learning spaces not exceed a background noise level (BNL) of 35 dBA and a reverberation time (RT) of 0.6 s, values based mainly on the speech intelligibility performance of native English-speaking listeners. Existing literature has not correlated these recommended values well with student learning outcomes. With a growing population of non-native English speakers in American classrooms, the special needs of non-native listeners in perceiving speech degraded by realistic room acoustics or by talker foreign accent have not been addressed in the current standard. This research investigates the effects of BNL and RT on the comprehension of English speech from native English and native Mandarin Chinese talkers as perceived by native and non-native English listeners, and aims to provide acoustic design guidelines to supplement the existing standard. This dissertation presents two studies on the effects of RT and BNL on more realistic classroom learning experiences. How do native and non-native English-speaking listeners perform on speech comprehension tasks under adverse acoustic conditions when the English speech is produced by native English talkers (Study 1) versus native Mandarin Chinese talkers (Study 2)? Speech comprehension materials were played back in a listening chamber to individual listeners: native and non-native English-speaking in Study 1; native English, native Mandarin Chinese, and other non-native English-speaking in Study 2. Each listener was screened for baseline English proficiency level and simultaneously completed dual tasks involving speech comprehension and adaptive dot-tracing under 15 acoustic conditions, comprising three BNL conditions (RC-30, 40, and 50) and five RT scenarios (0.4 to 1.2 seconds). The results show that BNL and RT negatively affect both objective performance and subjective perception of speech comprehension, more severely for non…
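    To relate the 0.6 s RT recommendation to room design, the classic Sabine estimate RT60 = 0.161 * V / A (metric units, with A the total absorption in m^2) is often used as a first check. The abstract does not say how its RT conditions were produced. The sketch below only shows the volume/absorption relationship for a hypothetical classroom with placeholder absorption coefficients, and Sabine's formula assumes a diffuse sound field.

```python
def sabine_rt60(volume_m3, surfaces):
    """Sabine reverberation time RT60 = 0.161 * V / A (metric units),
    where A = sum(area * absorption coefficient) in m^2."""
    absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / absorption

# Hypothetical 10 m x 8 m x 3 m classroom; coefficients are placeholders.
surfaces = [
    (80.0, 0.70),   # acoustic-tile ceiling
    (80.0, 0.10),   # floor
    (108.0, 0.05),  # painted walls
]
rt = sabine_rt60(10 * 8 * 3, surfaces)
print(f"RT60 = {rt:.2f} s; within 0.6 s limit: {rt <= 0.6}")
```

    Adding absorption lowers RT in inverse proportion, so the upper RT scenarios (up to 1.2 s) correspond to rooms with roughly half this absorption or less.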

  8. Disordered speech disrupts conversational entrainment: a study of acoustic-prosodic entrainment and communicative success in populations with communication challenges

    PubMed Central

    Borrie, Stephanie A.; Lubold, Nichola; Pon-Barry, Heather

    2015-01-01

    Conversational entrainment, a pervasive communication phenomenon in which dialogue partners adapt their behaviors to align more closely with one another, is considered essential for successful spoken interaction. While well-established in other disciplines, this phenomenon has received limited attention in the field of speech pathology and the study of communication breakdowns in clinical populations. The current study examined acoustic-prosodic entrainment, as well as a measure of communicative success, in three distinctly different dialogue groups: (i) healthy native vs. healthy native speakers (Control), (ii) healthy native vs. foreign-accented speakers (Accented), and (iii) healthy native vs. dysarthric speakers (Disordered). Dialogue group comparisons revealed significant differences in how the groups entrain on particular acoustic–prosodic features, including pitch, intensity, and jitter. Most notably, the Disordered dialogues were characterized by significantly less acoustic-prosodic entrainment than the Control dialogues. Further, a positive relationship between entrainment indices and communicative success was identified. These results suggest that the study of conversational entrainment in speech pathology will have essential implications for both scientific theory and clinical application in this domain. PMID:26321996

  9. Effect of Digital Frequency Compression (DFC) on Speech Recognition in Candidates for Combined Electric and Acoustic Stimulation (EAS)

    PubMed Central

    Gifford, René H.; Dorman, Michael F.; Spahr, Anthony J.; McKarns, Sharon A.

    2008-01-01

    Purpose: To compare the effects of conventional amplification (CA) and digital frequency compression (DFC) amplification on the speech recognition abilities of candidates for a partial-insertion cochlear implant, that is, candidates for combined electric and acoustic stimulation (EAS). Method: The participants were 6 patients whose audiometric thresholds at 500 Hz and below were ≤60 dB HL and whose thresholds at 2000 Hz and above were ≥80 dB HL. Six tests of speech understanding were administered with CA and DFC. The Abbreviated Profile of Hearing Aid Benefit (APHAB) was also administered following use of CA and DFC. Results: Group mean scores were not statistically different in the CA and DFC conditions. However, 2 patients received substantial benefit in DFC conditions. APHAB scores suggested increased ease of communication, but also increased aversive sound quality. Conclusion: Results suggest that a relatively small proportion of individuals who meet EAS candidacy will receive substantial benefit from a DFC hearing aid and that a larger proportion will receive at least a small benefit when speech is presented against a background of noise. This benefit, however, comes at a cost: aversive sound quality. PMID:17905905
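    Frequency compression aims to move high-frequency speech cues, which fall where these candidates have thresholds of 80 dB HL or worse, into the lower region where residual hearing remains. The abstract does not describe how the DFC device maps frequencies. The following is a generic, hypothetical rule of the kind used in some nonlinear frequency-compression schemes: input below a cutoff passes unchanged, and input above it is compressed on a log-frequency scale. The cutoff and ratio are made-up values.

```python
import numpy as np

def compress_frequency(f_in, cutoff_hz, ratio):
    """Map input frequency to output frequency: unchanged at or below the
    cutoff, compressed above it as cutoff * (f_in / cutoff) ** (1 / ratio)."""
    f = np.asarray(f_in, dtype=float)
    above = f > cutoff_hz
    out = f.copy()
    out[above] = cutoff_hz * (f[above] / cutoff_hz) ** (1.0 / ratio)
    return out

print(compress_frequency([1000, 2000, 4000, 8000], cutoff_hz=2000, ratio=2.0))
```

    With these values, the octaves above 2 kHz are halved in width, so 8 kHz is moved to 4 kHz. This spectral squeezing is one plausible source of the altered sound quality that listeners reported.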

  10. Echo-acoustic flow dynamically modifies the cortical map of target range in bats.

    PubMed

    Bartenstein, Sophia K; Gerstenberg, Nadine; Vanderelst, Dieter; Peremans, Herbert; Firzlaff, Uwe

    2014-01-01

    Echolocating bats use the delay between their sonar emissions and the reflected echoes to measure target range, a crucial parameter for avoiding collisions or capturing prey. In many bat species, target range is represented as an orderly organized map of echo delay in the auditory cortex. Here we show that the map of target range in bats is dynamically modified by the continuously changing flow of acoustic information perceived during flight ('echo-acoustic flow'). Combining dynamic acoustic stimulation in virtual space with extracellular recordings, we found that neurons in the auditory cortex of the bat Phyllostomus discolor encode echo-acoustic flow information on the geometric relation between targets and the bat's flight trajectory, rather than echo delay per se. Specifically, the cortical representation of close-range targets is enlarged when the lateral passing distance of the target decreases. This flow-dependent enlargement of target representation may trigger adaptive behaviours such as vocal control or flight manoeuvres. PMID:25131175

  11. Echo-acoustic flow dynamically modifies the cortical map of target range in bats

    NASA Astrophysics Data System (ADS)

    Bartenstein, Sophia K.; Gerstenberg, Nadine; Vanderelst, Dieter; Peremans, Herbert; Firzlaff, Uwe

    2014-08-01

    Echolocating bats use the delay between their sonar emissions and the reflected echoes to measure target range, a crucial parameter for avoiding collisions or capturing prey. In many bat species, target range is represented as an orderly organized map of echo delay in the auditory cortex. Here we show that the map of target range in bats is dynamically modified by the continuously changing flow of acoustic information perceived during flight (‘echo-acoustic flow’). Combining dynamic acoustic stimulation in virtual space with extracellular recordings, we found that neurons in the auditory cortex of the bat Phyllostomus discolor encode echo-acoustic flow information on the geometric relation between targets and the bat’s flight trajectory, rather than echo delay per se. Specifically, the cortical representation of close-range targets is enlarged when the lateral passing distance of the target decreases. This flow-dependent enlargement of target representation may trigger adaptive behaviours such as vocal control or flight manoeuvres.

  12. Evidence for thermal anisotropy effects on shear modified ion acoustic instabilities

    NASA Astrophysics Data System (ADS)

    Scime, E. E.; Keesee, A. M.; Spangler, R. S.; Koepke, M. E.; Teodorescu, C.; Reynolds, E. W.

    2002-10-01

    Inclusion of thermal anisotropy effects is shown to be required to describe recently reported experimental measurements as shear-modified, ion acoustic instabilities. For the reported experimental conditions, isotropic theory yields no instability growth that depends on the magnitude of the shear in the parallel flow.

  13. Enhanced Modified Bark Spectral Distortion (EMBSD): An objective speech quality measure based on audible distortion and cognition model

    NASA Astrophysics Data System (ADS)

    Yang, Wonho

    The Speech Processing Lab at Temple University developed an objective speech quality measure called the Modified Bark Spectral Distortion (MBSD). The MBSD uses auditory perception models derived from psychoacoustic studies. The MBSD measure extends the Bark Spectral Distortion (BSD) method by incorporating a noise masking threshold to differentiate audible from inaudible distortions. The performance of the MBSD was comparable to that of ITU-T Recommendation P.861 for various coding distortions. Based on experiments with Time Division Multiple Access (TDMA) data that contain distortions encountered in real network applications, modifications have been made to the MBSD algorithm. These are: use of the first 15 loudness components, normalization of loudness vectors, deletion of the spreading function in the noise masking threshold calculation, and use of a new cognition model based on postmasking effects. The Enhanced MBSD (EMBSD) shows significant improvement over the MBSD for TDMA data. The performance of the EMBSD is also better than that of ITU-T Recommendation P.861 and the Measuring Normalizing Blocks (MNB) measures for TDMA data. The performance of the EMBSD was compared to various other objective speech quality measures using speech data covering a wide range of distortion conditions. The EMBSD showed clear improvement over the MBSD and had a correlation coefficient of 0.89 for the conditions of MNRUs, codecs, tandem cases, bit errors, and frame erasures. Mean Opinion Score (MOS) has been used to evaluate objective speech quality measures. Recognizing the procedural difference between the MOS test and current objective speech quality measures, it is proposed that current objective speech quality measures should be evaluated with Degradation Mean Opinion Score (DMOS). The Pearson product-moment correlation coefficient has been the main performance parameter for evaluation of objective speech quality measures. The Standard Error of the Estimates (SEE
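To make the idea of counting only audible distortion concrete, here is a minimal Python sketch in the spirit of MBSD, not the Temple University implementation. It assumes per-band loudness vectors and noise masking thresholds have already been computed (that front end is not shown), uses the Zwicker-Terhardt approximation for the Bark scale, and the function names are hypothetical. The `n_bands` option mirrors the EMBSD choice of keeping only the first loudness components, and the Pearson helper shows how objective scores could be correlated with MOS or DMOS ratings.

```python
import math
from typing import Optional, Sequence


def hz_to_bark(f_hz: float) -> float:
    """Approximate critical-band rate (Bark) using the Zwicker-Terhardt formula."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def masked_frame_distortion(loud_ref: Sequence[float], loud_test: Sequence[float],
                            masking_threshold: Sequence[float],
                            n_bands: Optional[int] = None) -> float:
    """Per-frame distortion that counts only audible loudness differences.

    A band contributes |L_ref - L_test| only when that difference exceeds the
    band's noise masking threshold; smaller (inaudible) differences are ignored.
    n_bands optionally keeps only the first n loudness components.
    """
    if not (len(loud_ref) == len(loud_test) == len(masking_threshold)):
        raise ValueError("all inputs need one value per Bark band")
    if n_bands is not None:
        loud_ref = loud_ref[:n_bands]
        loud_test = loud_test[:n_bands]
        masking_threshold = masking_threshold[:n_bands]
    total = 0.0
    for ref, test, thr in zip(loud_ref, loud_test, masking_threshold):
        diff = abs(ref - test)
        if diff > thr:  # audible difference: counted
            total += diff
    return total


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation, e.g. objective scores vs. MOS/DMOS."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```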

  14. Acoustic Variations in Adductor Spasmodic Dysphonia as a Function of Speech Task.

    ERIC Educational Resources Information Center

    Sapienza, Christine M.; Walton, Suzanne; Murry, Thomas

    1999-01-01

    Acoustic phonatory events were identified in 14 women diagnosed with adductor spasmodic dysphonia (ADSD), a focal laryngeal dystonia that disturbs phonatory function, and compared with those of 14 age-matched women with no vocal dysfunction. Findings indicated ADSD subjects produced more aberrant acoustic events than controls during tasks of…

  15. A human vocal utterance corpus for perceptual and acoustic analysis of speech, singing, and intermediate vocalizations

    NASA Astrophysics Data System (ADS)

    Gerhard, David

    2002-11-01

    In this paper we present the collection and annotation process of a corpus of human utterance vocalizations used for speech and song research. The corpus was collected to fill a void in current research tools, since no corpus currently exists which is useful for the classification of intermediate utterances between speech and monophonic singing. Much work has been done in the domain of speech versus music discrimination, and several corpora exist which can be used for this research. A specific example is the work done by Eric Scheirer and Malcolm Slaney [IEEE ICASSP, 1997, pp. 1331-1334]. The collection of the corpus is described including questionnaire design and intended and actual response characteristics, as well as the collection and annotation of pre-existing samples. The annotation of the corpus consisted of a survey tool for a subset of the corpus samples, including ratings of the clips based on a speech-song continuum, and questions on the perceptual qualities of speech and song, both generally and corresponding to particular clips in the corpus.

  16. Revisiting Neil Armstrong's Moon-Landing Quote: Implications for Speech Perception, Function Word Reduction, and Acoustic Ambiguity.

    PubMed

    Baese-Berk, Melissa M; Dilley, Laura C; Schmidt, Stephanie; Morrill, Tuuli H; Pitt, Mark A

    2016-01-01

    Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said "one small step for a man," instead of "one small step for man." What he said is unclear in part because function words like "a" can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of "for" and "for a" in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of "for" and "for a". The results suggest that the distributions of "for" and "for a" overlap substantially, both in terms of temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word "a" varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate. PMID:27603209

  17. Comparison of Acoustic and Kinematic Approaches to Measuring Utterance-Level Speech Variability

    ERIC Educational Resources Information Center

    Howell, Peter; Anderson, Andrew J.; Bartrip, Jon; Bailey, Eleanor

    2009-01-01

    Purpose: The spatiotemporal index (STI) is one measure of variability. As currently implemented, kinematic data are used, requiring equipment that cannot be used with some patient groups or in scanners. An experiment is reported that addressed whether STI can be extended to an audio measure of sound pressure of the speech envelope over time that…
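The STI is commonly described as follows: each repetition is amplitude-normalized (z-scored) and linearly time-normalized, then the standard deviation across repetitions is taken at 50 relative time points spaced 2% apart and summed. Here is a minimal Python sketch under that description, applicable to any per-trial trace such as an acoustic envelope; the 1000-point time base, the exact sampling points, and the function names are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence


def _zscore(trace: Sequence[float]) -> List[float]:
    """Amplitude-normalize a trace to zero mean and unit (sample) standard deviation."""
    n = len(trace)
    mu = sum(trace) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in trace) / (n - 1))
    if sd == 0.0:
        raise ValueError("a constant trace cannot be amplitude-normalized")
    return [(v - mu) / sd for v in trace]


def _time_normalize(trace: Sequence[float], n_out: int) -> List[float]:
    """Linearly interpolate a trace onto n_out equally spaced points spanning its duration."""
    n_in = len(trace)
    out = []
    for k in range(n_out):
        pos = k * (n_in - 1) / (n_out - 1)
        i = min(int(pos), n_in - 2)
        frac = pos - i
        out.append(trace[i] * (1.0 - frac) + trace[i + 1] * frac)
    return out


def spatiotemporal_index(trials: Sequence[Sequence[float]],
                         n_norm: int = 1000, n_points: int = 50) -> float:
    """Sum of across-trial standard deviations at n_points equally spaced relative times."""
    if len(trials) < 2:
        raise ValueError("the STI needs at least two repetitions")
    normed = [_time_normalize(_zscore(t), n_norm) for t in trials]
    step = n_norm // n_points  # 1000 // 50 = 20 samples, i.e. 2% of the normalized duration
    sti = 0.0
    for k in range(n_points):
        column = [trial[k * step] for trial in normed]
        m = sum(column) / len(column)
        sti += math.sqrt(sum((v - m) ** 2 for v in column) / (len(column) - 1))
    return sti
```

Identical or purely rescaled repetitions give an STI near zero; larger values indicate less stable repetitions.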

  18. The Exploitation of Subphonemic Acoustic Detail in L2 Speech Segmentation

    ERIC Educational Resources Information Center

    Shoemaker, Ellenor

    2014-01-01

    The current study addresses an aspect of second language (L2) phonological acquisition that has received little attention to date--namely, the acquisition of allophonic variation as a word boundary cue. The role of subphonemic variation in the segmentation of speech by native speakers has been indisputably demonstrated; however, the acquisition of…

  19. Impact of Aberrant Acoustic Properties on the Perception of Sound Quality in Electrolarynx Speech

    ERIC Educational Resources Information Center

    Meltzner, Geoffrey S.; Hillman, Robert E.

    2005-01-01

    A large percentage of patients who have undergone laryngectomy to treat advanced laryngeal cancer rely on an electrolarynx (EL) to communicate verbally. Although serviceable, EL speech is plagued by shortcomings in both sound quality and intelligibility. This study sought to better quantify the relative contributions of previously identified…

  20. Study of Harmonics-to-Noise Ratio and Critical-Band Energy Spectrum of Speech as Acoustic Indicators of Laryngeal and Voice Pathology

    NASA Astrophysics Data System (ADS)

    Shama, Kumara; Krishna, Anantha; Cholayya, Niranjan U.

    2006-12-01

    Acoustic analysis of speech signals is a noninvasive technique that has been shown to be an effective tool for the objective support of vocal and voice disease screening. In the present study, acoustic analysis of sustained vowels is considered. A simple k-means nearest neighbor classifier is designed to test the efficacy of a harmonics-to-noise ratio (HNR) measure and the critical-band energy spectrum of the voiced speech signal as tools for the detection of laryngeal pathologies. It classifies a given voice signal sample as pathologic or normal. The voiced speech signal is decomposed into harmonic and noise components using an iterative signal extrapolation algorithm. The HNRs at four different frequency bands are estimated and used as features. Voiced speech is also filtered with 21 critical-bandpass filters that mimic the human auditory neurons. Normalized energies of these filter outputs are used as another set of features. The results obtained show that the HNR and the critical-band energy spectrum can be used to correlate laryngeal pathology and voice alteration, using previously classified voice samples. This method could be an additional acoustic indicator that supplements the clinical diagnostic features for voice evaluation.
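The feature and classification steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: HNR is computed from harmonic and noise components that are assumed to be already separated (the iterative extrapolation decomposition is not shown), and the classifier is a plain k-nearest-neighbour majority vote with Euclidean distance, which may differ from the paper's exact classifier. Function names and the toy training data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def hnr_db(harmonic: Sequence[float], noise: Sequence[float]) -> float:
    """Harmonics-to-noise ratio in dB: 10*log10(harmonic energy / noise energy)."""
    e_h = sum(v * v for v in harmonic)
    e_n = sum(v * v for v in noise)
    if e_n == 0.0:
        raise ValueError("noise energy is zero; HNR is undefined")
    return 10.0 * math.log10(e_h / e_n)


def normalized_band_energies(energies: Sequence[float]) -> List[float]:
    """Divide each critical-band energy by the total so the features sum to 1."""
    total = sum(energies)
    if total == 0.0:
        raise ValueError("total band energy is zero")
    return [e / total for e in energies]


def knn_classify(train: Sequence[Tuple[Sequence[float], str]],
                 query: Sequence[float], k: int = 3) -> str:
    """Label a feature vector by majority vote among its k nearest training vectors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```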

  1. Acoustic Analyses of Speech Sounds and Rhythms in Japanese- and English-Learning Infants

    PubMed Central

    Yamashita, Yuko; Nakajima, Yoshitaka; Ueda, Kazuo; Shimada, Yohko; Hirsh, David; Seno, Takeharu; Smith, Benjamin Alexander

    2013-01-01

    The purpose of this study was to explore developmental changes in the spectral fluctuations and temporal periodicity of speech in Japanese- and English-learning infants. Three age groups (15, 20, and 24 months) were selected because infants diversify their phonetic inventories with age. Natural speech of the infants was recorded. We used a critical-band filter bank that simulated the frequency resolution of the adult auditory periphery. First, factor analysis was applied to the correlations between the power fluctuations of the critical-band outputs, to see how the critical bands should be connected to each other if a listener is to differentiate sounds in infants’ speech. Next, we analyzed the temporal fluctuations of the factor scores by calculating autocorrelations. At 24 months of age, the analysis identified the same three factors that had been observed in adult speech, in both linguistic environments. These three factors were shifted to a higher frequency range, corresponding to the infants’ smaller vocal tract size. The results suggest that the infants’ vocal tract structures had developed an adult-like configuration by 24 months of age in both language environments. The number of utterances with a periodic structure at shorter time scales increased with age in both environments. This trend was clearer in the Japanese environment. PMID:23450824
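The periodicity analysis rests on the autocorrelation of a time series, here a sequence of factor scores. A minimal Python sketch, assuming the factor scores are already available at a fixed frame rate; the function names are hypothetical, and the study's windowing and peak criteria are not reproduced.

```python
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Normalized (biased) autocorrelation r(0..max_lag) of a mean-removed series."""
    n = len(x)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(x) - 1")
    mu = sum(x) / n
    d = [v - mu for v in x]
    energy = sum(v * v for v in d)
    if energy == 0.0:
        raise ValueError("series has no variance")
    return [sum(d[t] * d[t + lag] for t in range(n - lag)) / energy
            for lag in range(max_lag + 1)]


def dominant_period(x: Sequence[float], max_lag: int) -> int:
    """Lag (in frames) of the largest autocorrelation value, excluding lag 0."""
    r = autocorrelation(x, max_lag)
    return max(range(1, max_lag + 1), key=lambda lag: r[lag])
```

A shorter dominant period in a factor-score series would correspond to periodicity at a shorter time scale.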

  2. MOOD STATE PREDICTION FROM SPEECH OF VARYING ACOUSTIC QUALITY FOR INDIVIDUALS WITH BIPOLAR DISORDER

    PubMed Central

    Gideon, John; Provost, Emily Mower; McInnis, Melvin

    2016-01-01

    Speech contains patterns that can be altered by the mood of an individual. There is an increasing focus on automated and distributed methods to collect and monitor speech from large groups of patients suffering from mental health disorders. However, as the scope of these collections increases, the variability in the data also increases. This variability is due in part to the range in the quality of the devices, which in turn affects the quality of the recorded data, negatively impacting the accuracy of automatic assessment. It is necessary to mitigate variability effects in order to expand the impact of these technologies. This paper explores speech collected from phone recordings for analysis of mood in individuals with bipolar disorder. Two different phones with varying amounts of clipping, loudness, and noise are employed. We describe methodologies for use during preprocessing, feature extraction, and data modeling to correct these differences and make the devices more comparable. The results demonstrate that these pipeline modifications result in statistically significantly higher performance, which highlights the potential of distributed mental health systems. PMID:27570493
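The abstract does not list the specific corrections used. As a hypothetical illustration of device-level checks of this kind, the Python sketch below flags clipping and equalizes loudness before feature extraction; the full-scale value, tolerance, target level, and function names are assumptions, and noise handling is not shown.

```python
import math
from typing import List, Sequence


def clipped_fraction(samples: Sequence[float], full_scale: float = 1.0,
                     tol: float = 1e-3) -> float:
    """Share of samples at or within tol of digital full scale: a rough clipping indicator."""
    if not samples:
        raise ValueError("no samples")
    limit = full_scale - tol
    return sum(1 for s in samples if abs(s) >= limit) / len(samples)


def rms_normalize(samples: Sequence[float], target_rms: float = 0.1) -> List[float]:
    """Scale a recording so that its root-mean-square level equals target_rms."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("a silent recording cannot be normalized")
    gain = target_rms / rms
    return [s * gain for s in samples]
```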

  3. Classifying acoustic signals into phoneme categories: average and dyslexic readers make use of complex dynamical patterns and multifractal scaling properties of the speech signal

    PubMed Central

    2015-01-01

    Several competing aetiologies of developmental dyslexia suggest that the problems with acquiring literacy skills are causally entailed by low-level auditory and/or speech perception processes. The purpose of this study is to evaluate the diverging claims about the specific deficient perceptual processes under conditions of strong inference. Theoretically relevant acoustic features were extracted from a set of artificial speech stimuli that lie on a /bAk/-/dAk/ continuum. The features were tested on their ability to enable a simple classifier (Quadratic Discriminant Analysis, QDA) to reproduce the observed classification performance of average and dyslexic readers in a speech perception experiment. The ‘classical’ features examined were based on component process accounts of developmental dyslexia such as the supposed deficit in Envelope Rise Time detection and the deficit in the detection of rapid changes in the distribution of energy in the frequency spectrum (formant transitions). Studies examining these temporal processing deficit hypotheses do not employ measures that quantify the temporal dynamics of stimuli. It is shown that measures based on quantification of the dynamics of complex, interaction-dominant systems (Recurrence Quantification Analysis and the multifractal spectrum) enable QDA to classify the stimuli almost identically to the classifications observed in dyslexic and average reading participants. It seems unlikely that participants used any of the features that are traditionally associated with accounts of (impaired) speech perception. The nature of the variables quantifying the temporal dynamics of the speech stimuli implies that the classification of speech stimuli cannot be regarded as a linear aggregate of component processes that each parse the acoustic signal independently of one another, as is assumed by the ‘classical’ aetiologies of developmental dyslexia. It is suggested that the results imply that the differences in speech perception performance between

  4. Classifying acoustic signals into phoneme categories: average and dyslexic readers make use of complex dynamical patterns and multifractal scaling properties of the speech signal.

    PubMed

    Hasselman, Fred

    2015-01-01

    Several competing aetiologies of developmental dyslexia suggest that the problems with acquiring literacy skills are causally entailed by low-level auditory and/or speech perception processes. The purpose of this study is to evaluate the diverging claims about the specific deficient perceptual processes under conditions of strong inference. Theoretically relevant acoustic features were extracted from a set of artificial speech stimuli that lie on a /bAk/-/dAk/ continuum. The features were tested on their ability to enable a simple classifier (Quadratic Discriminant Analysis, QDA) to reproduce the observed classification performance of average and dyslexic readers in a speech perception experiment. The 'classical' features examined were based on component process accounts of developmental dyslexia such as the supposed deficit in Envelope Rise Time detection and the deficit in the detection of rapid changes in the distribution of energy in the frequency spectrum (formant transitions). Studies examining these temporal processing deficit hypotheses do not employ measures that quantify the temporal dynamics of stimuli. It is shown that measures based on quantification of the dynamics of complex, interaction-dominant systems (Recurrence Quantification Analysis and the multifractal spectrum) enable QDA to classify the stimuli almost identically to the classifications observed in dyslexic and average reading participants. It seems unlikely that participants used any of the features that are traditionally associated with accounts of (impaired) speech perception. The nature of the variables quantifying the temporal dynamics of the speech stimuli implies that the classification of speech stimuli cannot be regarded as a linear aggregate of component processes that each parse the acoustic signal independently of one another, as is assumed by the 'classical' aetiologies of developmental dyslexia. It is suggested that the results imply that the differences in speech perception performance between average and
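Recurrence Quantification Analysis starts from a recurrence plot of a delay-embedded signal. As a minimal illustration, the Python sketch below computes only the recurrence rate, the simplest RQA measure. The embedding dimension, delay, and radius are hypothetical defaults, not the parameters used in the study, and the other RQA measures and the multifractal spectrum are not shown.

```python
import math
from typing import List, Sequence


def delay_embed(x: Sequence[float], dim: int, tau: int) -> List[List[float]]:
    """Time-delay embedding: point i is [x[i], x[i + tau], ..., x[i + (dim - 1) * tau]]."""
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this embedding")
    return [[x[i + k * tau] for k in range(dim)] for i in range(n)]


def recurrence_rate(x: Sequence[float], dim: int = 2, tau: int = 1,
                    eps: float = 0.1) -> float:
    """Fraction of embedded point pairs (i, j) lying within distance eps of each other."""
    pts = delay_embed(x, dim, tau)
    n = len(pts)
    count = sum(1 for i in range(n) for j in range(n)
                if math.dist(pts[i], pts[j]) <= eps)
    return count / (n * n)
```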

  5. Changes in Speech Production in a Child with a Cochlear Implant: Acoustic and Kinematic Evidence.

    ERIC Educational Resources Information Center

    Goffman, Lisa; Ertmer, David J.; Erdle, Christa

    2002-01-01

    A method is presented for examining change in motor patterns used to produce linguistic contrasts. In this case study, the method is applied to a child who experienced hearing loss at age 3 and received a multi-channel cochlear implant at 7. Post-implant, acoustic durations showed a maturational change. (Contains references.) (Author/CR)

  6. Intelligibility of Telephone Speech for the Hearing Impaired When Various Microphones Are Used for Acoustic Coupling.

    ERIC Educational Resources Information Center

    Janota, Claus P.; Janota, Jeanette Olach

    1991-01-01

    Various candidate microphones were evaluated for acoustic coupling of hearing aids to a telephone receiver. Testing with 9 hearing-impaired adults showed comparable listening performance with a pressure-gradient microphone at a 10-decibel higher level of interfering noise than with a normal pressure-sensitive microphone. (Author/PB)

  7. Exploration of Acoustic Features for Automatic Vowel Discrimination in Spontaneous Speech

    ERIC Educational Resources Information Center

    Tyson, Na'im R.

    2012-01-01

    In an attempt to understand what acoustic/auditory feature sets motivated transcribers towards certain labeling decisions, I built machine learning models that were capable of discriminating between canonical and non-canonical vowels excised from the Buckeye Corpus. Specifically, I wanted to model when the dictionary form and the transcribed-form…

  8. Comparison of acoustic performance of five muffler configurations on a small helicopter. [acoustic properties of modified helicopter exhaust system]

    NASA Technical Reports Server (NTRS)

    Pegg, R. J.; Hilton, D. A.

    1974-01-01

    A field noise measurement program has been conducted on a standard Bell 47 series helicopter and on one that had been modified with specially designed, airframe-mounted mufflers to reduce the engine exhaust noise. The purpose of the study was to evaluate the acoustic performance of five experimental exhaust muffler configurations for a helicopter reciprocating engine in an operational environment. All muffler configurations produced beneficial engine exhaust noise reductions but some configurations were markedly better than others. Flyover noise results indicated that maximum overall noise reductions of approximately 8 dB were obtained with the various mufflers. The rotor noise was judged to be the dominant noise component for the muffler-equipped helicopters whereas the engine noise was the dominant component for the basic configuration.

  9. Measurement of Trained Speech Patterns in Stuttering: Interjudge and Intrajudge Agreement of Experts by Means of Modified Time-Interval Analysis

    ERIC Educational Resources Information Center

    Alpermann, Anke; Huber, Walter; Natke, Ulrich; Willmes, Klaus

    2010-01-01

    Improved fluency after stuttering therapy is usually measured by the percentage of stuttered syllables. However, outcome studies rarely evaluate the use of trained speech patterns that speakers use to manage stuttering. This study investigated whether the modified time interval analysis can distinguish between trained speech patterns, fluent…

  10. Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech

    PubMed Central

    Long, Yan-Hua; Ye, Hong

    2015-01-01

    Nowadays, although automatic speech recognition has become quite proficient in recognizing or transcribing well-prepared fluent speech, the transcription of speech that contains many disfluencies, such as spontaneous conversational and lecture speech, remains problematic. Filled pauses (FPs) are the most frequently occurring disfluencies in this type of speech. It is widely believed, and most recent studies suggest, that FPs increase the error rates of state-of-the-art speech transcription, primarily because most FPs are not well annotated or provided in training data transcriptions, and because FPs share acoustic characteristics with some common non-content words. To enhance the speech transcription system, we propose a new automatic refinement approach to detect FPs in British English lecture speech transcription. This approach combines the pronunciation probabilities for each word in the dictionary and acoustic language model scores for FP refinement through a modified speech recognition forced-alignment framework. We evaluate the proposed approach on the Reith Lectures speech transcription task, in which only imperfect training transcriptions are available. Successful results are achieved for both the development and evaluation datasets. Acoustic models trained on different speech genres have been investigated with respect to FP refinement. To further validate the effectiveness of the proposed approach, speech transcription performance has also been examined using systems built on training data transcriptions with and without FP refinement. PMID:25860959
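The abstract does not give the scoring function, so the following is only a hypothetical illustration of how a pronunciation probability might be combined with acoustic and language-model scores when deciding whether a transcribed FP survives forced alignment. The log-linear form, the language-model weight, and all names are assumptions, not the authors' method.

```python
import math
from dataclasses import dataclass


@dataclass
class AlignmentHypothesis:
    """Scores for one forced-alignment hypothesis of a segment (hypothetical)."""
    acoustic_logprob: float  # log-likelihood from the acoustic model
    lm_logprob: float        # log-probability from the language model
    pron_prob: float         # pronunciation probability of the chosen dictionary variant


def combined_score(h: AlignmentHypothesis, lm_weight: float = 10.0) -> float:
    """Log-linear combination of acoustic, weighted LM, and pronunciation scores."""
    if not 0.0 < h.pron_prob <= 1.0:
        raise ValueError("pronunciation probability must be in (0, 1]")
    return h.acoustic_logprob + lm_weight * h.lm_logprob + math.log(h.pron_prob)


def keep_filled_pause(with_fp: AlignmentHypothesis, without_fp: AlignmentHypothesis,
                      lm_weight: float = 10.0) -> bool:
    """Keep the FP in the transcription if the hypothesis containing it scores higher."""
    return combined_score(with_fp, lm_weight) > combined_score(without_fp, lm_weight)
```

Raising the language-model weight shifts decisions toward whichever hypothesis the language model prefers, so that weight would need tuning on development data.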

  11. [A modified speech enhancement algorithm for electronic cochlear implant and its digital signal processing realization].

    PubMed

    Wang, Yulin; Tian, Xuelong

    2014-08-01

    To improve the speech quality and auditory perception of electronic cochlear implants against strong background noise, a speech enhancement system for the front end of an electronic cochlear implant was constructed. The system is built around a digital signal processor (DSP) and combines the DSP's multi-channel buffered serial port (McBSP) data transmission channel with the TLV320AIC10 extended audio interface chip, enabling high-speed speech signal acquisition and output. Because traditional speech enhancement methods suffer from poor adaptability, slow convergence, and large steady-state error, the versiera function and the de-correlation principle were used to improve the existing adaptive filtering algorithm, which effectively enhanced the quality of voice communication. Test results verified the stability of the system and the de-noising performance of the algorithm, and showed that they could provide clearer speech signals for deaf patients or patients with tinnitus. PMID:25464779
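For orientation, here is a minimal Python sketch of a conventional fixed-step LMS adaptive noise canceller, the kind of baseline adaptive filter that modifications like this one start from. It does not implement the versiera-based step size or the de-correlation step, which the abstract does not specify. The function names and the synthetic noise-only test signal are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def lms_noise_canceller(primary: Sequence[float], reference: Sequence[float],
                        order: int = 2, mu: float = 0.05
                        ) -> Tuple[List[float], List[float]]:
    """Standard LMS adaptive noise cancellation.

    primary: microphone signal, i.e. speech plus noise that reached it via an unknown path.
    reference: noise-only signal correlated with that noise.
    Returns (error signal, final weights); the error signal is the enhanced-speech estimate.
    """
    w = [0.0] * order
    taps = [0.0] * order  # most recent reference sample first
    error = []
    for d, x in zip(primary, reference):
        taps = [x] + taps[:-1]
        y = sum(wi * ui for wi, ui in zip(w, taps))  # estimate of the noise in the primary
        e = d - y
        w = [wi + mu * e * ui for wi, ui in zip(w, taps)]  # fixed-step gradient update
        error.append(e)
    return error, w


def make_demo_signals(n: int = 5000, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Hypothetical noise-only case: primary[i] = 0.5*ref[i] + 0.2*ref[i-1], no speech."""
    rng = random.Random(seed)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    primary = [0.5 * ref[i] + 0.2 * (ref[i - 1] if i > 0 else 0.0) for i in range(n)]
    return primary, ref
```

With the demo signals, the weights should settle near the hypothetical noise path (0.5, 0.2) and the output should shrink toward zero. The step size `mu` trades convergence speed against steady-state error, which is the trade-off that variable-step-size variants aim to improve.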

  12. [A modified speech enhancement algorithm for electronic cochlear implant and its digital signal processing realization].

    PubMed

    Wang, Yulin; Tian, Xuelong

    2014-08-01

    To improve the speech quality and auditory perception of electronic cochlear implants against strong background noise, a speech enhancement system for the front end of an electronic cochlear implant was constructed. The system is built around a digital signal processor (DSP) and combines the DSP's multi-channel buffered serial port (McBSP) data transmission channel with the TLV320AIC10 extended audio interface chip, enabling high-speed speech signal acquisition and output. Because traditional speech enhancement methods suffer from poor adaptability, slow convergence, and large steady-state error, the versiera function and the de-correlation principle were used to improve the existing adaptive filtering algorithm, which effectively enhanced the quality of voice communication. Test results verified the stability of the system and the de-noising performance of the algorithm, and showed that they could provide clearer speech signals for deaf patients or patients with tinnitus. PMID:25508410

  13. Protein-modified shear mode film bulk acoustic resonator for bio-sensing applications

    NASA Astrophysics Data System (ADS)

    Wang, Jingjing; Liu, Weihui; Xu, Yan; Chen, Da; Li, Dehua; Zhang, Luyin

    2014-09-01

    In this paper, we present a shear mode film bulk acoustic biosensor based on micro-electromechanical technology. The film bulk acoustic biosensor is a diaphragm structure consisting of a lateral-field-excited ZnO piezoelectric film stack built on an Si3N4 membrane. The device operates near 1.6 GHz, with Q factors of 579 in water and 428 in glycerol. A frequency shift of 5.4 MHz and a small decline in amplitude are found for measurements in glycerol compared with those in water, because of viscous damping from the adjacent glycerol. For a bio-sensing demonstration, the resonator was modified with biotin molecules to detect protein-ligand interactions in real time and in situ. When exposed to the streptavidin solution, the resonant frequency of the biotin-modified device drops rapidly and gradually reaches equilibrium, owing to the biotin-streptavidin interaction. The proposed film bulk acoustic biosensor shows promising applications for disease diagnostics, prognosis, and drug discovery.
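As a rough reading of the quoted Q factors, assume Q is the resonant frequency divided by the -3 dB bandwidth (the abstract does not say how Q was extracted) and take 1.6 GHz as the operating frequency in both media. The implied bandwidths are then:

```latex
\Delta f_{-3\,\mathrm{dB}} = \frac{f_0}{Q}:\qquad
\frac{1.6\ \mathrm{GHz}}{579} \approx 2.76\ \mathrm{MHz}\ (\text{water}),\qquad
\frac{1.6\ \mathrm{GHz}}{428} \approx 3.74\ \mathrm{MHz}\ (\text{glycerol})
```

On the same basis, the 5.4 MHz shift in glycerol is about 0.34% of the operating frequency.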

  14. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    ERIC Educational Resources Information Center

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  15. A Robust Approach For Acoustic Noise Suppression In Speech Using ANFIS

    NASA Astrophysics Data System (ADS)

    Martinek, Radek; Kelnar, Michal; Vanus, Jan; Bilik, Petr; Zidek, Jan

    2015-11-01

    The authors of this article deal with the implementation of a combination of fuzzy-system and artificial-intelligence techniques in the application area of non-linear noise and interference suppression. The structure used is called an Adaptive Neuro Fuzzy Inference System (ANFIS). This system finds practical use mainly in audio telephone (mobile) communication in noisy environments (transport, production halls, sports matches, etc.). Experimental methods based on the two-input adaptive noise cancellation concept were clearly outlined. Within the experiments carried out, the authors created, based on the ANFIS structure, a comprehensive system for adaptive suppression of unwanted background interference that occurs in audio communication and degrades the audio signal. The system designed has been tested on real voice signals. This article presents the investigation and comparison of three distinct approaches to noise cancellation in speech: LMS (least mean squares) and RLS (recursive least squares) adaptive filtering, and ANFIS. A careful review of the literature indicated the importance of non-linear adaptive algorithms over linear ones in noise cancellation. It was concluded that the ANFIS approach had the best overall performance, as it efficiently cancelled noise even in highly noise-degraded speech. Results were drawn from the successful experimentation: subjective tests were used to analyse the comparative performance of the approaches, while objective tests were used to validate them. The algorithms were implemented and tested experimentally in Matlab to justify the claims and determine their relative performances.
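The two-input adaptive noise cancellation concept can be illustrated with one of the linear baselines the article compares against. Below is a minimal Python sketch of exponentially weighted RLS. The forgetting factor, initialization, filter order, function names, and synthetic noise-only test signal are assumptions, and the ANFIS system itself is not reproduced.

```python
import random
from typing import List, Sequence, Tuple


def rls_noise_canceller(primary: Sequence[float], reference: Sequence[float],
                        order: int = 2, lam: float = 0.99, delta: float = 0.01
                        ) -> Tuple[List[float], List[float]]:
    """Exponentially weighted RLS adaptive noise cancellation (two-input structure).

    lam is the forgetting factor; P starts as I / delta (large for small delta).
    Returns (error signal, final weights); the error signal is the cleaned output.
    """
    P = [[(1.0 / delta) if i == j else 0.0 for j in range(order)] for i in range(order)]
    w = [0.0] * order
    taps = [0.0] * order  # most recent reference sample first
    error = []
    for d, x in zip(primary, reference):
        taps = [x] + taps[:-1]
        pu = [sum(P[i][j] * taps[j] for j in range(order)) for i in range(order)]  # P u
        denom = lam + sum(taps[i] * pu[i] for i in range(order))  # lam + u^T P u
        k = [v / denom for v in pu]  # gain vector
        e = d - sum(wi * ui for wi, ui in zip(w, taps))  # a priori error
        w = [wi + ki * e for wi, ki in zip(w, k)]
        # P <- (P - k u^T P) / lam; because P is symmetric, u^T P equals (P u)^T.
        P = [[(P[i][j] - k[i] * pu[j]) / lam for j in range(order)] for i in range(order)]
        error.append(e)
    return error, w


def make_demo_signals(n: int = 2000, seed: int = 1) -> Tuple[List[float], List[float]]:
    """Hypothetical noise-only case: primary[i] = 0.5*ref[i] + 0.2*ref[i-1], no speech."""
    rng = random.Random(seed)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    primary = [0.5 * ref[i] + 0.2 * (ref[i - 1] if i > 0 else 0.0) for i in range(n)]
    return primary, ref
```

Compared with LMS, RLS typically converges in far fewer samples, at a higher cost per sample (O(order²) here).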

  16. Streptavidin Modified ZnO Film Bulk Acoustic Resonator for Detection of Tumor Marker Mucin 1.

    PubMed

    Zheng, Dan; Guo, Peng; Xiong, Juan; Wang, Shengfu

    2016-12-01

    A ZnO-based film bulk acoustic resonator was fabricated using magnetron sputtering and employed as a biosensor for the detection of mucin 1 (MUC1). The resonant frequency of the thin-film bulk acoustic resonator was located near 1503.3 MHz. The average electromechanical coupling factor and quality factor Q were 2.39% and 224, respectively. Using the specific avidin-biotin binding system, streptavidin was self-assembled on the top gold electrode as the sensitive layer to indirectly detect MUC1 molecules. The resonant frequency of the biosensor decreases in response to mass loading over the range of 20-500 nM. The streptavidin-modified sensor exhibits a high sensitivity of 4642.6 Hz/nM and good selectivity. PMID:27624339
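The reported sensitivity can be read as a calibration slope. The Python sketch below converts a measured frequency shift into an estimated MUC1 concentration. It assumes a linear, zero-intercept response relative to the streptavidin-only baseline, which the abstract does not state. The function name is hypothetical, and only the sensitivity and range values come from the abstract.

```python
from typing import Tuple

REPORTED_SENSITIVITY_HZ_PER_NM = 4642.6        # from the abstract
REPORTED_RANGE_NM: Tuple[float, float] = (20.0, 500.0)  # from the abstract


def muc1_concentration_nm(delta_f_hz: float,
                          sensitivity_hz_per_nm: float = REPORTED_SENSITIVITY_HZ_PER_NM,
                          valid_range_nm: Tuple[float, float] = REPORTED_RANGE_NM) -> float:
    """Estimate MUC1 concentration (nM) from a resonant-frequency shift.

    Assumes a linear, zero-intercept response (an assumption, not stated in the
    abstract). delta_f_hz is negative because mass loading lowers the frequency.
    """
    conc = -delta_f_hz / sensitivity_hz_per_nm
    low, high = valid_range_nm
    if not low <= conc <= high:
        raise ValueError(f"{conc:.1f} nM lies outside the reported {low:g}-{high:g} nM range")
    return conc
```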

  17. Using ion flows parallel and perpendicular to gravity to modify dust acoustic waves

    NASA Astrophysics Data System (ADS)

    Thomas, E.; Fisher, R.

    2008-11-01

    Recent studies of dust acoustic waves have shown that the dust kinetic temperature can play an important role in determining the resulting dispersion relation [M. Rosenberg, et al., Phys. Plasmas, 15, 073701 (2008)]. In these studies, it is believed that ion flows play a dominant role in determining both the kinetic temperature of the charged microparticles as well as providing the source of energy for triggering the waves. In this presentation, results will be presented on the effects of ion flow on spatial structure and velocity distribution of dust acoustic waves. Here, the waves will be formed in dusty plasmas consisting of 3 ± 1 micron diameter silica microspheres. Two separate electrodes will be used to modify the ion flow in the plasma -- one parallel to the direction of gravity and one perpendicular to the direction of gravity. Particle image velocimetry (PIV) techniques will be used to observe the particles and to measure their velocity distributions.
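PIV estimates particle velocity by locating the peak of the cross-correlation between interrogation windows from consecutive frames. As a deliberately simplified illustration, the Python sketch below works in 1-D with integer shifts. Real PIV uses 2-D windows, mean subtraction, and sub-pixel peak fitting, and the pixel size and frame interval in the usage are hypothetical.

```python
from typing import Sequence


def best_integer_shift(window_a: Sequence[float], window_b: Sequence[float],
                       max_shift: int) -> int:
    """Shift s in [-max_shift, max_shift] that maximizes sum_i a[i] * b[i + s].

    A 1-D stand-in for locating the cross-correlation peak between two
    interrogation windows from consecutive frames.
    """
    n = len(window_a)
    if len(window_b) != n:
        raise ValueError("windows must have equal length")

    def corr(s: int) -> float:
        return sum(window_a[i] * window_b[i + s] for i in range(n) if 0 <= i + s < n)

    return max(range(-max_shift, max_shift + 1), key=corr)


def particle_velocity(shift_px: float, pixel_size_m: float, frame_interval_s: float) -> float:
    """Convert a displacement in pixels between frames into a velocity in m/s."""
    return shift_px * pixel_size_m / frame_interval_s
```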

  18. Intelligibility Assessment of Ideal Binary-Masked Noisy Speech with Acceptance of Room Acoustic

    NASA Astrophysics Data System (ADS)

    Sedlak, Vladimír; Durackova, Daniela; Zalusky, Roman; Kovacik, Tomas

    2015-01-01

    In this paper, the intelligibility of an ideal binary-masked noisy signal is evaluated for different signal-to-noise ratios (SNRs), mask errors, masker types, source-receiver distances, reverberation times, and local criteria for forming the binary mask. The ideal binary mask (IBM) is computed from time-frequency decompositions of the target and masker signals by thresholding the local SNR within time-frequency units. The intelligibility of the separated signal is measured using several objective measures computed in the frequency and perceptual domains. The present study replicates and extends previously presented findings, but mainly shows the impact of room acoustics on the intelligibility performance of the IBM technique.
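
    The mask construction described above (thresholding the local SNR of each time-frequency unit against a local criterion) can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the separate target and masker decompositions are already available as equal-shaped arrays, for example from an STFT; the paper's actual decomposition (which may be, e.g., a filterbank) and its objective measures are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def ideal_binary_mask(target_tf, masker_tf, lc_db=0.0, eps=1e-12):
    """Return 1 where a T-F unit's local SNR exceeds the local criterion
    `lc_db`, else 0. Inputs: complex or magnitude T-F arrays of equal shape."""
    target_power = np.abs(target_tf) ** 2
    masker_power = np.abs(masker_tf) ** 2
    # eps guards against log(0) and division by zero in silent units.
    local_snr_db = 10.0 * np.log10((target_power + eps) / (masker_power + eps))
    return (local_snr_db > lc_db).astype(np.float64)


def apply_mask(mixture_tf, mask):
    """Keep target-dominated units of the mixture and zero the rest."""
    return mixture_tf * mask


# Toy 2x2 "spectrograms" (rows = frequency channels, columns = time frames).
target = np.array([[10.0, 0.1], [1.0, 1.0]])
masker = np.array([[1.0, 1.0], [1.0, 0.5]])
mask = ideal_binary_mask(target, masker, lc_db=0.0)
# For simplicity the toy "mixture" adds magnitudes; a real mixture would be
# the T-F decomposition of the summed time-domain signals.
separated = apply_mask(target + masker, mask)
```

    Lowering `lc_db` (for example to -5 dB) admits more masker-dominated units into the mask, which is one of the parameters the study varies.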

  19. Acoustic evidence for the development of gestural coordination in the speech of 2-year-olds: a longitudinal study.

    PubMed

    Goodell, E W; Studdert-Kennedy, M

    1993-08-01

    Studies of child phonology have often assumed that young children first master a repertoire of phonemes and then build their lexicon by forming combinations of these abstract, contrastive units. However, evidence from children's systematic errors suggests that children first build a repertoire of words as integral sequences of gestures and then gradually differentiate these sequences into their gestural and segmental components. Recently, experimental support for this position has been found in the acoustic records of the speech of 3-, 5-, and 7-year-old children, suggesting that even in older children some phonemes have not yet fully segregated as units of gestural organization and control. The present longitudinal study extends this work to younger children (22- and 32-month-olds). Results demonstrate clear differences in the duration and coordination of gestures between children and adults, and a clear shift toward the patterns of adult speakers during roughly the third year of life. Details of the child-adult differences and developmental changes vary from one aspect of an utterance to another. PMID:8377484

  20. Subjective evaluation of speech and noise in learning environments in the realm of classroom acoustics: Results from laboratory and field experiments

    NASA Astrophysics Data System (ADS)

    Meis, Markus; Nocke, Christian; Hofmann, Simone; Becker, Bernhard

    2005-04-01

    The impact of different acoustical conditions in learning environments on noise annoyance and on the evaluation of speech quality was tested in a series of three experiments. In Experiment 1 (n=79), the auralization of seven classrooms with reverberation times from 0.55 to 3.21 s [averaged over 250 Hz-2 kHz] served to develop a Semantic Differential for evaluating a simulated teacher's voice. Four factors were found: acoustical comfort, roughness, sharpness, and loudness. In Experiment 2, the effects of two classroom renovations were examined from a holistic perspective. The rooms were treated acoustically, with acoustic ceilings (RT=0.5 s [250 Hz-2 kHz]) and muffling floor materials, and non-acoustically, with a new lighting system and color design. The results indicate that pupils (n=61) in renovated classrooms judged the simulated voice more positively, were less annoyed by the noise in classrooms, and were more motivated to participate in the lessons. In Experiment 3, the sound environments of six different lecture rooms (RT=0.8 to 1.39 s [250 Hz-2 kHz]) at two universities in Oldenburg were evaluated by 321 students during lectures. The evidence supports the assumption that acoustical comfort depends on frequency in rooms with higher reverberation times.

  1. Evaluating the role of spectral and envelope characteristics in the intelligibility advantage of clear speech

    PubMed Central

    Krause, Jean C.; Braida, Louis D.

    2009-01-01

    In adverse listening conditions, talkers can increase their intelligibility by speaking clearly [Picheny, M.A., et al. (1985). J. Speech Hear. Res. 28, 96–103; Payton, K. L., et al. (1994). J. Acoust. Soc. Am. 95, 1581–1592]. This modified speaking style, known as clear speech, is typically spoken more slowly than conversational speech [Picheny, M. A., et al. (1986). J. Speech Hear. Res. 29, 434–446; Uchanski, R. M., et al. (1996). J. Speech Hear. Res. 39, 494–509]. However, talkers can produce clear speech at normal rates (clear/normal speech) with training [Krause, J. C., and Braida, L. D. (2002). J. Acoust. Soc. Am. 112, 2165–2172], suggesting that clear speech has some inherent acoustic properties, independent of rate, that contribute to its improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. Two global-level properties of clear/normal speech that appear likely to be associated with improved intelligibility are increased energy in the 1000–3000-Hz range of long-term spectra and increased modulation depth of low-frequency modulations of the intensity envelope [Krause, J. C., and Braida, L. D. (2004). J. Acoust. Soc. Am. 115, 362–378]. In an attempt to isolate the contributions of these two properties to intelligibility, signal processing transformations were developed to manipulate each of these aspects of conversational speech independently. Results of intelligibility testing with hearing-impaired listeners and normal-hearing listeners in noise suggest that (1) increasing energy between 1000 and 3000 Hz does not fully account for the intelligibility benefit of clear/normal speech, and (2) simple filtering of the intensity envelope is generally detrimental to intelligibility. While other manipulations of the intensity envelope are required to determine conclusively the role of this factor in intelligibility, it is also likely that additional properties important for

  2. Speech input and output

    NASA Astrophysics Data System (ADS)

    Class, F.; Mangold, H.; Stall, D.; Zelinski, R.

    1981-12-01

    Possibilities for acoustical dialogs with electronic data processing equipment were investigated. Speech recognition is posed as recognizing word groups. An economical, multistage classifier for word string segmentation is presented and its reliability in dealing with continuous speech (problems of temporal normalization and context) is discussed. Speech synthesis is considered in terms of German linguistics and phonetics. Preprocessing algorithms for total synthesis of written texts were developed. A macrolanguage, MUSTER, is used to implement this processing in an acoustic data information system (ADES).

  3. Modified particle filtering algorithm for single acoustic vector sensor DOA tracking.

    PubMed

    Li, Xinbo; Sun, Haixin; Jiang, Liangxu; Shi, Yaowu; Wu, Yue

    2015-01-01

    The conventional direction of arrival (DOA) estimation algorithm, under the static-source assumption, usually estimates the source angles at two adjacent moments independently, without considering the correlation between the moments. In this article, we focus on the DOA estimation of moving sources and propose a modified particle filtering (MPF) algorithm based on the state space model of a single acoustic vector sensor. Although the particle filtering (PF) algorithm has been introduced for acoustic vector sensor applications, it is not suitable when one of the source angles is estimated with a large deviation, and the two angles (pitch angle and azimuth angle) cannot be employed simultaneously to update the state through the resampling step of the PF algorithm. To solve these problems, the MPF algorithm introduces the state estimate of the previous moment into the particle sampling of the present moment to improve the importance function. Moreover, the independence of the pitch and azimuth angles is taken into account, and the two angles are sampled and evaluated separately. The MUSIC spectrum function is then used as the likelihood function of the MPF algorithm, and the modified PF-MUSIC (MPF-MUSIC) algorithm is proposed to improve the root mean square error (RMSE) and the probability of convergence. The theoretical analysis and the simulation results validate the effectiveness and feasibility of the two proposed algorithms. PMID:26501280

  4. Modified Particle Filtering Algorithm for Single Acoustic Vector Sensor DOA Tracking

    PubMed Central

    Li, Xinbo; Sun, Haixin; Jiang, Liangxu; Shi, Yaowu; Wu, Yue

    2015-01-01

    The conventional direction of arrival (DOA) estimation algorithm, under the static-source assumption, usually estimates the source angles at two adjacent moments independently, without considering the correlation between the moments. In this article, we focus on the DOA estimation of moving sources and propose a modified particle filtering (MPF) algorithm based on the state space model of a single acoustic vector sensor. Although the particle filtering (PF) algorithm has been introduced for acoustic vector sensor applications, it is not suitable when one of the source angles is estimated with a large deviation, and the two angles (pitch angle and azimuth angle) cannot be employed simultaneously to update the state through the resampling step of the PF algorithm. To solve these problems, the MPF algorithm introduces the state estimate of the previous moment into the particle sampling of the present moment to improve the importance function. Moreover, the independence of the pitch and azimuth angles is taken into account, and the two angles are sampled and evaluated separately. The MUSIC spectrum function is then used as the likelihood function of the MPF algorithm, and the modified PF-MUSIC (MPF-MUSIC) algorithm is proposed to improve the root mean square error (RMSE) and the probability of convergence. The theoretical analysis and the simulation results validate the effectiveness and feasibility of the two proposed algorithms. PMID:26501280

  5. Neurophysiological influence of musical training on speech perception.

    PubMed

    Shahin, Antoine J

    2011-01-01

    Does musical training affect our perception of speech? For example, does learning to play a musical instrument modify the neural circuitry for auditory processing in a way that improves one's ability to perceive speech more clearly in noisy environments? If so, can speech perception in individuals with hearing loss (HL), who struggle in noisy situations, benefit from musical training? While music and speech exhibit some specialization in neural processing, there is evidence suggesting that skills acquired through musical training for specific acoustical processes may transfer to, and thereby improve, speech perception. The neurophysiological mechanisms underlying the influence of musical training on speech processing, and the extent of this influence, remain a rich area to be explored. A prerequisite for such transfer is the facilitation of greater neurophysiological overlap between speech and music processing following musical training. This review first establishes a neurophysiological link between musical training and speech perception, and subsequently provides further hypotheses on the neurophysiological implications of musical training for speech perception in adverse acoustical environments and in individuals with HL. PMID:21716639

  6. Time-Frequency Feature Representation Using Multi-Resolution Texture Analysis and Acoustic Activity Detector for Real-Life Speech Emotion Recognition

    PubMed Central

    Wang, Kun-Ching

    2015-01-01

    The classification of emotional speech is widely studied in speech-related research on human-computer interaction (HCI). This paper presents a novel feature extraction method based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for the characterization and classification of different emotions in a speech signal. The motivation is that emotions have different intensity values in different frequency bands. From the standpoint of human visual perception, the multi-resolution texture properties of the emotional speech spectrogram should form a good feature set for emotion classification in speech. Furthermore, multi-resolution texture analysis can discriminate between emotions more clearly than uniform-resolution texture analysis. To provide high emotion-discrimination accuracy, especially in real-life conditions, an acoustic activity detection (AAD) algorithm must be applied in the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper uses two corpora of naturally occurring dialogs recorded in real-life call centers. Compared with the traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and state-of-the-art features, the MRTII features also improve the correct classification rates of the proposed systems across different language databases. Experimental results show that the proposed MRTII-based feature information, inspired by human visual perception of the spectrogram image, can provide significant classification performance for real-life emotion recognition in speech. PMID:25594590

  7. Systematic Studies of Modified Vocalization: Effects of Speech Rate and Instatement Style during Metronome Stimulation

    ERIC Educational Resources Information Center

    Davidow, Jason H.; Bothe, Anne K.; Richardson, Jessica D.; Andreatta, Richard D.

    2010-01-01

    Purpose: This study introduces a series of systematic investigations intended to clarify the parameters of the fluency-inducing conditions (FICs) in stuttering. Method: Participants included 11 adults, aged 20-63 years, with typical speech-production skills. A repeated measures design was used to examine the relationships between several speech…

  8. Nonlinear propagation of small-amplitude modified electron acoustic solitary waves and double layer in semirelativistic plasmas

    SciTech Connect

    Sah, O.P.; Goswami, K.S.

    1994-10-01

    Considering an unmagnetized plasma consisting of relativistic drifting electrons and nondrifting thermal ions, and using the reductive perturbation method, the usual Korteweg-de Vries (KdV) equation and a generalized form of the KdV equation are derived. It is found that while the former governs the dynamics of a small-amplitude rarefactive modified electron acoustic (MEA) soliton, the latter governs the dynamics of a weak compressive modified electron acoustic double layer. The influence of the relativistic effect on the propagation of such a soliton and double layer is examined. The relevance of this investigation to space plasma is pointed out.

  9. Acoustics

    NASA Astrophysics Data System (ADS)

    The acoustics research activities of the DLR fluid-mechanics department (Forschungsbereich Stroemungsmechanik) during 1988 are surveyed and illustrated with extensive diagrams, drawings, graphs, and photographs. Particular attention is given to studies of helicopter rotor noise (high-speed impulsive noise, blade/vortex interaction noise, and main/tail-rotor interaction noise), propeller noise (temperature, angle-of-attack, and nonuniform-flow effects), noise certification, and industrial acoustics (road-vehicle flow noise and airport noise-control installations).

  10. Early recognition of speech

    PubMed Central

    Remez, Robert E; Thomas, Emily F

    2013-01-01

    Classic research on the perception of speech sought to identify minimal acoustic correlates of each consonant and vowel. In explaining perception, this view designated momentary components of an acoustic spectrum as cues to the recognition of elementary phonemes. This conceptualization of speech perception is untenable given the findings of phonetic sensitivity to modulation independent of the acoustic and auditory form of the carrier. The empirical key is provided by studies of the perceptual organization of speech, a low-level integrative function that finds and follows the sensory effects of speech amid concurrent events. These projects have shown that the perceptual organization of speech is keyed to modulation; is fast, unlearned, and nonsymbolic; is indifferent to short-term auditory properties; and requires attention. The ineluctably multisensory nature of speech perception also imposes conditions that distinguish language among cognitive systems. WIREs Cogn Sci 2013, 4:213–223. doi: 10.1002/wcs.1213 PMID:23926454

  11. Formant-Frequency Variation and Informational Masking of Speech by Extraneous Formants: Evidence Against Dynamic and Speech-Specific Acoustical Constraints

    PubMed Central

    2014-01-01

    How speech is separated perceptually from other speech remains poorly understood. Recent research indicates that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This study explored the effects of manipulating the depth and pattern of that variation. Three formants (F1+F2+F3) constituting synthetic analogues of natural sentences were distributed across the 2 ears, together with a competitor for F2 (F2C) that listeners must reject to optimize recognition (left = F1+F2C; right = F2+F3). The frequency contours of F1 − F3 were each scaled to 50% of their natural depth, with little effect on intelligibility. Competitors were created either by inverting the frequency contour of F2 about its geometric mean (a plausibly speech-like pattern) or using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Adding a competitor typically reduced intelligibility; this reduction depended on the depth of F2C variation, being greatest for 100%-depth, intermediate for 50%-depth, and least for 0%-depth (constant) F2Cs. This suggests that competitor impact depends on overall depth of frequency variation, not depth relative to that for the target formants. The absence of tuning (i.e., no minimum in intelligibility for the 50% case) suggests that the ability to reject an extraneous formant does not depend on similarity in the depth of formant-frequency variation. Furthermore, triangle-wave competitors were as effective as their more speech-like counterparts, suggesting that the selection of formants from the ensemble also does not depend on speech-specific constraints. PMID:24842068

  12. Formant-frequency variation and informational masking of speech by extraneous formants: evidence against dynamic and speech-specific acoustical constraints.

    PubMed

    Roberts, Brian; Summers, Robert J; Bailey, Peter J

    2014-08-01

    How speech is separated perceptually from other speech remains poorly understood. Recent research indicates that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This study explored the effects of manipulating the depth and pattern of that variation. Three formants (F1+F2+F3) constituting synthetic analogues of natural sentences were distributed across the 2 ears, together with a competitor for F2 (F2C) that listeners must reject to optimize recognition (left = F1+F2C; right = F2+F3). The frequency contours of F1 - F3 were each scaled to 50% of their natural depth, with little effect on intelligibility. Competitors were created either by inverting the frequency contour of F2 about its geometric mean (a plausibly speech-like pattern) or using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Adding a competitor typically reduced intelligibility; this reduction depended on the depth of F2C variation, being greatest for 100%-depth, intermediate for 50%-depth, and least for 0%-depth (constant) F2Cs. This suggests that competitor impact depends on overall depth of frequency variation, not depth relative to that for the target formants. The absence of tuning (i.e., no minimum in intelligibility for the 50% case) suggests that the ability to reject an extraneous formant does not depend on similarity in the depth of formant-frequency variation. Furthermore, triangle-wave competitors were as effective as their more speech-like counterparts, suggesting that the selection of formants from the ensemble also does not depend on speech-specific constraints. PMID:24842068

  13. Experimental verification of the shear-modified ion-acoustic instability

    NASA Astrophysics Data System (ADS)

    Reynolds, E. W.; Teodorescu, C.; Koepke, M. E.

    2002-11-01

    The shear-modified ion-acoustic instability has been experimentally verified in a double-ended Q-machine barium plasma containing shear in the magnetic-field-aligned (parallel) ion drift. The ion distribution function f(X,Vz) was measured directly and non-perturbatively with laser-induced fluorescence. Measurements of the wave frequency (in the lab frame) and the wave-vector components show that, in the presence of shear, the wave phase velocity (in the ion frame) is greater than the ion-acoustic speed and outside the strong ion Landau-damping regime. Measurements of the parallel electron drift yield values lower than the excitation threshold predicted by homogeneous theory, but large enough for inverse electron Landau damping to provide the free energy for the wave. We emphasize the ramifications of positive and negative values of shear for the mode properties. A quantitative comparison between experimental results and theoretical predictions is presented. Work supported by NASA and NSF. Useful discussions with V. Gavrishchaka and E. Scime are acknowledged.

  14. Speech masking and cancelling and voice obscuration

    DOEpatents

    Holzrichter, John F.

    2013-09-10

    A non-acoustic sensor measures a user's speech, and the system then broadcasts an obscuring acoustic signal that diminishes the user's vocal acoustic output intensity and/or distorts the voice sounds, making them unintelligible to persons nearby. The non-acoustic sensor is positioned near or in contact with the skin tissue of the user's neck or head to sense speech production information.

  15. Multilevel Analysis in Analyzing Speech Data

    ERIC Educational Resources Information Center

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by the human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  16. Systematic Studies of Modified Vocalization: Speech Production Changes During a Variation of Metronomic Speech in Persons Who Do and Do Not Stutter

    PubMed Central

    Davidow, Jason H.; Bothe, Anne K.; Ye, Jun

    2011-01-01

    The most common way to induce fluency using rhythm requires persons who stutter to speak one syllable or one word to each beat of a metronome, but stuttering can also be eliminated when the stimulus is of a particular duration (e.g., 1 s). The present study examined stuttering frequency, speech production changes, and speech naturalness during rhythmic speech that alternated 1 s of reading with 1 s of silence. A repeated-measures design was used to compare data obtained during a control reading condition and during rhythmic reading in 10 persons who stutter (PWS) and 10 normally fluent controls. Ratings for speech naturalness were also gathered from naïve listeners. Results showed that mean vowel duration increased significantly, and the percentage of short phonated intervals decreased significantly, for both groups from the control to the experimental condition. Mean phonated interval length increased significantly for the fluent controls. Mean speech naturalness ratings during the experimental condition were approximately 7 on a 1–9 scale (1 = highly natural; 9 = highly unnatural), and these ratings were significantly correlated with vowel duration and phonated intervals for PWS. The findings indicate that PWS may be altering vocal fold vibration duration to obtain fluency during this rhythmic speech style, and that vocal fold vibration duration may have an impact on speech naturalness during rhythmic speech. Future investigations should examine speech production changes and speech naturalness during variations of this rhythmic condition. Educational Objectives: The reader will be able to: (1) describe changes (from a control reading condition) in speech production variables when alternating between 1 s of reading and 1 s of silence, (2) describe which rhythmic conditions have been found to sound and feel the most natural, (3) describe methodological issues for studies about alterations in speech production variables during fluency-inducing conditions, and (4

  17. Design Foundations for Content-Rich Acoustic Interfaces: Investigating Audemes as Referential Non-Speech Audio Cues

    ERIC Educational Resources Information Center

    Ferati, Mexhid Adem

    2012-01-01

    To access interactive systems, blind and visually impaired users can leverage their auditory senses by using non-speech sounds. The current structure of non-speech sounds, however, is geared toward conveying user interface operations (e.g., opening a file) rather than large theme-based information (e.g., a history passage) and, thus, is ill-suited…

  18. Acoustics

    NASA Technical Reports Server (NTRS)

    Goodman, Jerry R.; Grosveld, Ferdinand

    2007-01-01

    It is important to keep the acoustic environment in space operations at manageable levels so that the crewperson can remain safe, functional, effective, and reasonably comfortable. High acoustic levels can produce temporary or permanent hearing loss, or cause other physiological symptoms such as auditory pain, headaches, discomfort, strain in the vocal cords, or fatigue. Noise is defined as undesirable sound. Excessive noise may result in psychological effects such as irritability, inability to concentrate, decreased productivity, annoyance, errors in judgment, and distraction. A noisy environment can also prevent the crew from sleeping, or from sleeping well. Elevated noise levels can impair the ability to communicate, to understand what is being said, and to hear what is going on in the environment; they can also degrade crew performance and operations and create habitability concerns. Superfluous noise emissions can also make it impossible to hear alarms or other important auditory cues, such as the sound of malfunctioning equipment. Recent space flight experience, evaluations of the requirements in crew habitable areas, and lessons learned (Goodman 2003; Allen and Goodman 2003; Pilkinton 2003; Grosveld et al. 2003) show the importance of maintaining an acceptable acoustic environment. This is best accomplished by establishing a high-quality set of limits/requirements early in the program, "designing in" acoustics during the development of hardware and systems, and monitoring, testing, and verifying the levels to ensure that they are acceptable.

  19. Pointing and Naming Are Not Redundant: Children Use Gesture to Modify Nouns before They Modify Nouns in Speech

    ERIC Educational Resources Information Center

    Cartmill, Erica A.; Hunsicker, Dea; Goldin-Meadow, Susan

    2014-01-01

    Nouns form the first building blocks of children's language but are not consistently modified by other words until around 2.5 years of age. Before then, children often combine their nouns with gestures that indicate the object labeled by the noun, for example, pointing at a bottle while saying "bottle." These gestures are typically…

  20. Effect of Acoustic Spectrographic Instruction on Production of English /i/ and /I/ by Spanish Pre-Service English Teachers

    ERIC Educational Resources Information Center

    Quintana-Lara, Marcela

    2014-01-01

    This study investigates the effects of Acoustic Spectrographic Instruction on the production of the English phonological contrast /i/ and /I/. Acoustic Spectrographic Instruction is based on the assumption that physical representations of speech sounds and spectrography allow learners to objectively see and modify those inaccurate features in…

  1. Correlation of subjective and objective measures of speech intelligibility

    NASA Astrophysics Data System (ADS)

    Bowden, Erica E.; Wang, Lily M.; Palahanska, Milena S.

    2003-10-01

    Currently there are a number of objective evaluation methods used to quantify the speech intelligibility in a built environment, including the Speech Transmission Index (STI), Rapid Speech Transmission Index (RASTI), Articulation Index (AI), and the Percentage Articulation Loss of Consonants (%ALcons). Many of these have been used for years; however, questions remain about their accuracy in predicting the acoustics of a space. Current widely used software programs can quickly evaluate STI, RASTI, and %ALcons from a measured impulse response. This project compares subjective human performance on modified rhyme and phonetically balanced word tests with objective results calculated from impulse response measurements in four different spaces. The results of these tests aid in understanding performance of various methods of speech intelligibility evaluation. [Work supported by the Univ. of Nebraska Center for Building Integration.] For Speech Communication Best Student Paper Award.
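
    As an illustration of how an index such as STI can be derived from a measured impulse response, the sketch below uses Schroeder's relation for a noise-free response, m(F) = |∫h²(t)e^(-j2πFt)dt| / ∫h²(t)dt, evaluated at the 14 one-third-octave modulation frequencies from 0.63 to 12.5 Hz, then converts each m(F) to an apparent SNR clipped to ±15 dB and averages the resulting transmission indices. This is a simplified, single-band sketch under stated assumptions, not the procedure used in this project or by any particular software: a full STI calculation (IEC 60268-16) additionally filters the response into octave bands, applies band weighting, and corrects for background noise and masking. The function names are hypothetical.

```python
import numpy as np

# 14 modulation frequencies (Hz), one-third-octave spaced, 0.63-12.5 Hz.
MOD_FREQS_HZ = np.array([0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                         3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5])


def mtf_from_impulse_response(h, fs, mod_freqs=MOD_FREQS_HZ):
    """Modulation transfer function via Schroeder's formula (noise-free,
    single broadband channel). h: impulse response samples, fs: rate in Hz."""
    energy = np.asarray(h, dtype=np.float64) ** 2
    t = np.arange(energy.size) / fs
    total = energy.sum()
    # Discrete sums approximate the integrals; the sample period cancels.
    return np.array([np.abs(np.sum(energy * np.exp(-2j * np.pi * f * t))) / total
                     for f in mod_freqs])


def transmission_index(m):
    """Average transmission index from modulation indices m(F)."""
    m = np.clip(m, 1e-6, 1.0 - 1e-6)             # keep the log finite
    snr_app = 10.0 * np.log10(m / (1.0 - m))      # apparent SNR in dB
    snr_app = np.clip(snr_app, -15.0, 15.0)       # limit to +/-15 dB
    return float(np.mean((snr_app + 15.0) / 30.0))


# Example: exponential decay with a 1 s reverberation time (60 dB energy drop).
fs = 16000
t = np.arange(3 * fs) / fs
h = np.exp(-6.9 * t)                              # h^2 = exp(-13.8 t)
ti = transmission_index(mtf_from_impulse_response(h, fs))
```

    For an ideal exponential decay, this formula reduces to m(F) = 1/sqrt(1 + (2πF·T/13.8)²) for reverberation time T, so the sketch can be checked against that closed form. Longer reverberation lowers m at high modulation frequencies and therefore lowers the index.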

  2. A Study of the Relative Effectiveness of Verbal and Visual Augmentation of Rate-Modified Speech in the Presentation of Technical Material.

    ERIC Educational Resources Information Center

    Olson, Janet S.

    The relative effectiveness of verbal and visual augmentation of rate-modified speech in the presentation of technical material was investigated. Subjects were 40 graduate students who used instructional materials consisting of normal and compressed audiotape versions of the Dwyer heart script, printed copies of the script, and black and white…

  3. Time-expanded speech and speech recognition in older adults.

    PubMed

    Vaughan, Nancy E; Furukawa, Izumi; Balasingam, Nirmala; Mortz, Margaret; Fausti, Stephen A

    2002-01-01

    Speech understanding deficits are common in older adults. In addition to hearing sensitivity, changes in certain cognitive functions may affect speech recognition. One such change that may impact the ability to follow a rapidly changing speech signal is processing speed. When speakers slow the rate of their speech naturally in order to speak clearly, speech recognition is improved. The acoustic characteristics of naturally slowed speech are of interest in developing time-expansion algorithms to improve speech recognition for older listeners. In this study, we tested younger normally hearing, older normally hearing, and older hearing-impaired listeners on time-expanded speech using increased duration and increased intensity of unvoiced consonants. Although all groups performed best on unprocessed speech, performance with processed speech was better with the consonant gain feature without time expansion in the noise condition and better at the slowest time-expanded rate in the quiet condition. The effects of signal processing on speech recognition are discussed. PMID:17642020

  4. Experimental investigation of the effects of the acoustical conditions in a simulated classroom on speech recognition and learning in children

    PubMed Central

    Valente, Daniel L.; Plevinsky, Hallie M.; Franco, John M.; Heinrichs-Graham, Elizabeth C.; Lewis, Dawna E.

    2012-01-01

    The potential effects of acoustical environment on speech understanding are especially important as children enter school, where students’ ability to hear and understand complex verbal information is critical to learning. However, this ability is compromised because of widely varied and unfavorable classroom acoustics. The extent to which unfavorable classroom acoustics affect children’s performance on longer learning tasks is largely unknown, as most research has focused on testing children using words, syllables, or sentences as stimuli. In the current study, a simulated classroom environment was used to measure comprehension performance in two classroom learning activities: a discussion and a lecture. Comprehension performance was measured for groups of elementary-aged students in one of four environments with varied reverberation times and background noise levels. The reverberation time was either 0.6 or 1.5 s, and the signal-to-noise ratio was either +10 or +7 dB. Performance is compared with that of adult subjects and with sentence recognition in the same conditions. Significant differences were seen in comprehension scores as a function of age and condition; both increasing background noise and reverberation degraded performance in comprehension tasks, whereas measures of sentence recognition showed minimal differences. PMID:22280587

  5. Differential Diagnosis of Severe Speech Disorders Using Speech Gestures

    ERIC Educational Resources Information Center

    Bahr, Ruth Huntley

    2005-01-01

    The differentiation of childhood apraxia of speech from severe phonological disorder is a common clinical problem. This article reports on an attempt to describe speech errors in children with childhood apraxia of speech on the basis of gesture use and acoustic analyses of articulatory gestures. The focus was on the movement of articulators and…

  6. Modified impulse method for the measurement of the frequency response of acoustic filters to weakly nonlinear transient excitations

    PubMed

    Payri; Desantes; Broatch

    2000-02-01

    In this paper, a modified impulse method is proposed which allows the determination of the influence of the excitation characteristics on acoustic filter performance. Issues related to nonlinear propagation, namely wave steepening and wave interactions, have been addressed in an approximate way, validated against one-dimensional unsteady nonlinear flow calculations. The results obtained for expansion chambers and extended duct resonators indicate that the amplitude threshold for the onset of nonlinear phenomena is related to the geometry considered. PMID:10687682

  7. Speech research: A report on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications

    NASA Astrophysics Data System (ADS)

    Liberman, A. M.

    1980-06-01

    This report (1 April - 30 June) is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: The perceptual equivalence of two acoustic cues for a speech contrast is specific to phonetic perception; Duplex perception of acoustic patterns as speech and nonspeech; Evidence for phonetic processing of cues to place of articulation: Perceived manner affects perceived place; Some articulatory correlates of perceptual isochrony; Effects of utterance continuity on phonetic judgments; Laryngeal adjustments in stuttering: A glottographic observation using a modified reaction paradigm; Missing -ing in reading: Letter detection errors on word endings; Speaking rate, syllable stress, and vowel identity; Sonority and syllabicity: Acoustic correlates of perception; Influence of vocalic context on perception of the (S)-(s) distinction.

  8. On the Nature of Speech Science.

    ERIC Educational Resources Information Center

    Peterson, Gordon E.

    In this article the nature of the discipline of speech science is considered, and the various basic and applied areas of the discipline are discussed. The basic areas encompass the various processes of the physiology of speech production, the acoustical characteristics of speech, including the speech wave types and the information-bearing acoustic…

  9. Is Birdsong More Like Speech or Music?

    PubMed

    Shannon, Robert V

    2016-04-01

    Music and speech share many acoustic cues but not all are equally important. For example, harmonic pitch is essential for music but not for speech. When birds communicate is their song more like speech or music? A new study contrasting pitch and spectral patterns shows that birds perceive their song more like humans perceive speech. PMID:26944220

  10. Maternal depression and the learning-promoting effects of infant-directed speech: Roles of maternal sensitivity, depression diagnosis, and speech acoustic cues.

    PubMed

    Kaplan, Peter S; Danko, Christina M; Cejka, Anna M; Everhart, Kevin D

    2015-11-01

    The hypothesis that the associative learning-promoting effects of infant-directed speech (IDS) depend on infants' social experience was tested in a conditioned-attention paradigm with a cumulative sample of 4- to 14-month-old infants. Following six forward pairings of a brief IDS segment and a photographic slide of a smiling female face, infants of clinically depressed mothers exhibited evidence of having acquired significantly weaker voice-face associations than infants of non-depressed mothers. Regression analyses revealed that maternal depression was significantly related to infant learning even after demographic correlates of depression, antidepressant medication use, and extent of pitch modulation in maternal IDS had been taken into account. However, after maternal depression had been accounted for, maternal emotional availability, coded by blind raters from separate play interactions, accounted for significant further increments in the proportion of variance accounted for in infant learning scores. Both maternal depression and maternal insensitivity negatively, and additively, predicted poor learning. PMID:26311468

  11. Development of an analytical solution of modified Biot's equations for the optimization of lightweight acoustic protection.

    PubMed

    Kanfoud, Jamil; Ali Hamdi, Mohamed; Becot, François-Xavier; Jaouen, Luc

    2009-02-01

    During lift-off, space launchers are subjected to high levels of acoustic loading, which may damage sensitive equipment. A special acoustic absorber has previously been integrated inside the fairing of space launchers to protect the payload. A new research project has been launched to develop a low-cost fairing acoustic protection system using optimized layers of porous materials covered by a thin layer of fabric. An analytical model is used for the analysis of acoustic wave propagation within the multilayer porous media. Results have been validated by impedance tube measurements. A parametric study has been conducted to determine optimal mechanical and acoustical properties of the acoustic protection under dimensional thickness constraints. The effect of the mounting conditions has been studied. Results reveal the importance of the lateral constraints on the absorption coefficient, particularly in the low-frequency range. A transmission study has been carried out in which the fairing structure was simulated by a limp mass layer. The transmission loss and noise reduction factors have been computed using Biot's theory and the local acoustic impedance approximation to represent the porous layer effect. Comparisons between the two models show the frequency domains for which the local impedance model is valid. PMID:19206863

  12. Room Acoustics

    NASA Astrophysics Data System (ADS)

    Kuttruff, Heinrich; Mommertz, Eckard

    The traditional task of room acoustics is to create or formulate conditions which ensure the best possible propagation of sound in a room from a sound source to a listener. Thus, the objects of room acoustics are, in particular, assembly halls of all kinds, such as auditoria and lecture halls, conference rooms, theaters, concert halls, or churches. It should be pointed out at once that these conditions depend essentially on whether speech or music is to be transmitted: in the first case, the criterion for transmission quality is good speech intelligibility; in the second case, however, the success of room-acoustical efforts depends on other factors that cannot be quantified as easily, not least the listening habits of the audience. In any case, absolutely "good acoustics" of a room do not exist.

  13. Two-microphone spatial filtering provides speech reception benefits for cochlear implant users in difficult acoustic environments.

    PubMed

    Goldsworthy, Raymond L; Delhorne, Lorraine A; Desloge, Joseph G; Braida, Louis D

    2014-08-01

    This article introduces and provides an assessment of a spatial-filtering algorithm based on two closely-spaced (∼1 cm) microphones in a behind-the-ear shell. The evaluated spatial-filtering algorithm used fast (∼10 ms) temporal-spectral analysis to determine the location of incoming sounds and to enhance sounds arriving from straight ahead of the listener. Speech reception thresholds (SRTs) were measured for eight cochlear implant (CI) users using consonant and vowel materials under three processing conditions: An omni-directional response, a dipole-directional response, and the spatial-filtering algorithm. The background noise condition used three simultaneous time-reversed speech signals as interferers located at 90°, 180°, and 270°. Results indicated that the spatial-filtering algorithm can provide speech reception benefits of 5.8 to 10.7 dB SRT compared to an omni-directional response in a reverberant room with multiple noise sources. Given the observed SRT benefits, coupled with an efficient design, the proposed algorithm is promising as a CI noise-reduction solution. PMID:25096120

  14. Advances in speech processing

    NASA Astrophysics Data System (ADS)

    Ince, A. Nejat

    1992-10-01

    The field of speech processing is undergoing rapid growth in terms of both performance and applications, fueled by advances in microelectronics, computation, and algorithm design. The use of voice for civil and military communications is discussed, considering advantages and disadvantages, including the effects of environmental factors such as acoustic and electrical noise, interference, and propagation. The structure of the existing NATO communications network and the evolving Integrated Services Digital Network (ISDN) concept are briefly reviewed to show how they meet present and future requirements. The paper then deals with the fundamental subject of speech coding and compression. Recent advances in techniques and algorithms for speech coding now permit high-quality voice reproduction at remarkably low bit rates. Speech synthesis is treated next, where the principal objective is to produce natural-quality synthetic speech from unrestricted text input. Speech recognition, where the ultimate objective is to produce a machine that would understand conversational speech with unrestricted vocabulary from essentially any talker, is then discussed. Algorithms for speech recognition can be characterized broadly as pattern recognition approaches and acoustic phonetic approaches. To date, the greatest degree of success in speech recognition has been obtained using pattern recognition paradigms, and for this reason the paper is concerned primarily with this technique.

  15. Lip Movement Exaggerations During Infant-Directed Speech

    PubMed Central

    Green, Jordan R.; Nip, Ignatius S. B.; Wilson, Erin M.; Mefferd, Antje S.; Yunusova, Yana

    2011-01-01

    Purpose: Although a growing body of literature has identified the positive effects of visual speech on speech and language learning, oral movements of infant-directed speech (IDS) have rarely been studied. This investigation used 3-dimensional motion capture technology to describe how mothers modify their lip movements when talking to their infants. Method: Lip movements were recorded from 25 mothers as they spoke to their infants and to other adults. Lip shapes were analyzed for differences across speaking conditions. The maximum fundamental frequency, duration, acoustic intensity, and first and second formant frequencies of each vowel were also measured. Results: Lip movements were significantly larger during IDS than during adult-directed speech, although the exaggerations were vowel specific. All of the vowels produced during IDS were characterized by an elevated vocal pitch and a slowed speaking rate when compared with vowels produced during adult-directed speech. Conclusion: The pattern of lip-shape exaggerations did not provide support for the hypothesis that mothers produce exemplar visual models of vowels during IDS. Future work is required to determine whether the observed increases in vertical lip aperture engender visual and acoustic enhancements that facilitate the early learning of speech. PMID:20699342

  16. Tutorial on architectural acoustics

    NASA Astrophysics Data System (ADS)

    Shaw, Neil; Talaske, Rick; Bistafa, Sylvio

    2002-11-01

    This tutorial is intended to provide an overview of current knowledge and practice in architectural acoustics. Topics covered will include basic concepts and history, acoustics of small rooms (small rooms for speech such as classrooms and meeting rooms, music studios, small critical listening spaces such as home theatres) and the acoustics of large rooms (larger assembly halls, auditoria, and performance halls).

  17. Segmenting Words from Natural Speech: Subsegmental Variation in Segmental Cues

    ERIC Educational Resources Information Center

    Rytting, C. Anton; Brew, Chris; Fosler-Lussier, Eric

    2010-01-01

    Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption, and preserves acoustic variation present in speech. We…

  18. Benefits to Speech Perception in Noise From the Binaural Integration of Electric and Acoustic Signals in Simulated Unilateral Deafness

    PubMed Central

    Ma, Ning; Morris, Saffron; Kitterick, Pádraig Thomas

    2016-01-01

    Objectives: This study used vocoder simulations with normal-hearing (NH) listeners to (1) measure their ability to integrate speech information from an NH ear and a simulated cochlear implant (CI), and (2) investigate whether binaural integration is disrupted by a mismatch in the delivery of spectral information between the ears arising from a misalignment in the mapping of frequency to place. Design: Eight NH volunteers participated in the study and listened to sentences embedded in background noise via headphones. Stimuli presented to the left ear were unprocessed. Stimuli presented to the right ear (referred to as the CI-simulation ear) were processed using an eight-channel noise vocoder with one of three processing strategies. An Ideal strategy simulated a frequency-to-place map across all channels that matched the delivery of spectral information between the ears. A Realistic strategy created a misalignment in the mapping of frequency to place in the CI-simulation ear, where the size of the mismatch between the ears varied across channels. Finally, a Shifted strategy imposed a similar degree of misalignment in all channels, resulting in a consistent mismatch between the ears across frequency. The ability to report key words in sentences was assessed under monaural and binaural listening conditions and at signal-to-noise ratios (SNRs) established by estimating speech-reception thresholds in each ear alone. The SNRs ensured that the monaural performance of the left ear never exceeded that of the CI-simulation ear. The advantages of binaural integration were calculated by comparing binaural performance with monaural performance using the CI-simulation ear alone. Thus, these advantages reflected the additional use of the experimentally constrained left ear and were not attributable to better-ear listening. Results: Binaural performance was as accurate as, or more accurate than, monaural performance with the CI-simulation ear alone. When both ears supported a

  19. Speech perception and production in severe environments

    NASA Astrophysics Data System (ADS)

    Pisoni, David B.

    1990-09-01

    The goal was to acquire new knowledge about speech perception and production in severe environments such as high masking noise, increased cognitive load or sustained attentional demands. Changes were examined in speech production under these adverse conditions through acoustic analysis techniques. One set of studies focused on the effects of noise on speech production. The experiments in this group were designed to generate a database of speech obtained in noise and in quiet. A second set of experiments was designed to examine the effects of cognitive load on the acoustic-phonetic properties of speech. Talkers were required to carry out a demanding perceptual motor task while they read lists of test words. A final set of experiments explored the effects of vocal fatigue on the acoustic-phonetic properties of speech. Both cognitive load and vocal fatigue are present in many applications where speech recognition technology is used, yet their influence on speech production is poorly understood.

  20. Somatosensory basis of speech production.

    PubMed

    Tremblay, Stéphanie; Shiller, Douglas M; Ostry, David J

    2003-06-19

    The hypothesis that speech goals are defined acoustically and maintained by auditory feedback is a central idea in speech production research. An alternative proposal is that speech production is organized in terms of control signals that subserve movements and associated vocal-tract configurations. Indeed, the capacity for intelligible speech by deaf speakers suggests that somatosensory inputs related to movement play a role in speech production, but studies that might have documented a somatosensory component have been equivocal. For example, mechanical perturbations that have altered somatosensory feedback have simultaneously altered acoustics. Hence, any adaptation observed under these conditions may have been a consequence of acoustic change. Here we show that somatosensory information on its own is fundamental to the achievement of speech movements. This demonstration involves a dissociation of somatosensory and auditory feedback during speech production. Over time, subjects correct for the effects of a complex mechanical load that alters jaw movements (and hence somatosensory feedback), but which has no measurable or perceptible effect on acoustic output. The findings indicate that the positions of speech articulators and associated somatosensory inputs constitute a goal of speech movements that is wholly separate from the sounds produced. PMID:12815431
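    Several of the records above present speech in background noise at a fixed signal-to-noise ratio (for example, +10 or +7 dB in the classroom study of record 4, and SNRs set from speech-reception thresholds in record 18). As a generic illustration only, not the procedure reported in either study, the sketch below scales a noise signal so that a mixture reaches a target SNR, using SNR = 10·log10(P_signal / P_noise), where P is the mean squared amplitude. On this definition, +10 dB means the signal power is 10 times the noise power, and +7 dB roughly 5 times. The function names, the use of whole-signal power rather than calibrated or speech-active levels, and the tone and Gaussian noise used as stand-ins for recorded speech and babble are all assumptions.

```python
import math
import random


def mean_power(x):
    """Mean power of a signal: the average of its squared samples."""
    return sum(s * s for s in x) / len(x)


def snr_db(signal, noise):
    """SNR in dB, defined as 10*log10(P_signal / P_noise)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


def scale_noise_to_snr(signal, noise, target_snr_db):
    """Return a copy of `noise` scaled so that snr_db(signal, result) == target_snr_db.

    Power scales with the square of amplitude, so the amplitude gain is
    sqrt(P_s / (P_n * 10**(target/10))). Assumption: whole-signal power is used.
    """
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    p_s = mean_power(signal)
    p_n = mean_power(noise)
    gain = math.sqrt(p_s / (p_n * 10.0 ** (target_snr_db / 10.0)))
    return [gain * n for n in noise]


def mix_at_snr(signal, noise, target_snr_db):
    """Add scaled noise to the signal; returns (mixture, scaled_noise)."""
    scaled = scale_noise_to_snr(signal, noise, target_snr_db)
    mixture = [s + n for s, n in zip(signal, scaled)]
    return mixture, scaled


if __name__ == "__main__":
    fs = 16000  # hypothetical sampling rate in Hz
    # Stand-ins: 0.5 s of a 1 kHz tone as "speech", seeded Gaussian noise as "babble".
    tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs // 2)]
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(fs // 2)]
    for target in (10.0, 7.0):
        _, scaled = mix_at_snr(tone, noise, target)
        print(f"target {target:+.1f} dB -> achieved {snr_db(tone, scaled):+.3f} dB")
```

    Because power scales with the square of amplitude, the noise is multiplied by the square root of the required power ratio. A real experiment would typically also calibrate the presentation level of the mixture, which this sketch does not attempt.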

  1. On the Role of Ion-Temperature Anisotropy in the Growth and Propagation of the Shear-Modified Ion-Acoustic Instability

    NASA Astrophysics Data System (ADS)

    Teodorescu, C.; Koepke, M. E.; Reynolds, E. W.

    2002-05-01

    Broadband ion-acoustic waves have been observed in the Earth's ionosphere, where the electron and ion temperatures are equal, propagating obliquely to the magnetic field lines. Explaining these waves with the current-driven ion-acoustic instability in homogeneous plasma requires an unusually large ratio of electron to ion temperature. In a Q machine, we investigate oblique ion-acoustic waves excited by the combination of magnetic-field-aligned (parallel) current and sheared parallel ion flow, at almost equal ion and electron temperatures. Direct measurements of the parallel and perpendicular ion temperatures, parallel and perpendicular ion drift velocities, electron temperature and parallel electron drift velocity, parallel and perpendicular wavevector components, and mode frequency and growth rate are used to elucidate the shear-modified ion-acoustic instability mechanism and document an observed correlation between ion-temperature anisotropy and wave-propagation angle. Experimental measurements show how anisotropy significantly influences this propagation angle. These results may support the ion-acoustic wave interpretation of broadband waves in the auroral energization region, where shear and anisotropy are known to exist. Although the results were obtained from an investigation of shear-modified ion-acoustic waves, our conclusions pertain to the general subject of oblique ion-acoustic waves and thus have ramifications for many space plasmas. * Work supported by NSF and NASA.

  2. Automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Espy-Wilson, Carol

    2005-04-01

    Great strides have been made in the development of automatic speech recognition (ASR) technology over the past thirty years. Most of this effort has centered on the extension and improvement of Hidden Markov Model (HMM) approaches to ASR. Current commercially available and industry systems based on HMMs can perform well on certain tasks that restrict variability, such as phone dialing or limited voice commands. However, the holy grail of ASR is performance comparable to that of humans; in other words, the ability to automatically transcribe unrestricted conversational speech spoken by an unlimited number of speakers in varying acoustic environments. This goal is far from being reached. Key to the success of ASR is effective modeling of variability in the speech signal. This tutorial will review the basics of ASR and the various ways in which our current knowledge of speech production, speech perception, and prosody can be exploited to improve robustness at every level of the system.

  3. Cylindrical and spherical dust-ion-acoustic modified Gardner solitons in dusty plasmas with two-temperature superthermal electrons

    SciTech Connect

    Alam, M. S.; Masud, M. M.; Mamun, A. A.

    2013-12-15

    A rigorous theoretical investigation has been performed on the propagation of cylindrical and spherical Gardner solitons (GSs) associated with dust-ion-acoustic (DIA) waves in a dusty plasma consisting of inertial ions, negatively charged immobile dust, and two populations of kappa-distributed electrons having two distinct temperatures. The well-known reductive perturbation method has been used to derive the modified Gardner (mG) equation. The basic features (amplitude, width, polarity, etc.) of nonplanar DIA modified Gardner solitons (mGSs) have been thoroughly examined by numerical analysis of the mG equation. It has been found that the characteristics of the nonplanar DIA mGSs differ significantly from those of planar ones. It has also been observed that kappa-distributed electrons with two distinct temperatures significantly modify the basic properties of the DIA solitary waves, and that the plasma system under consideration supports both compressive and rarefactive DIA mGSs. The present investigation should play an important role in understanding localized electrostatic disturbances in space and laboratory dusty plasmas, where stationary negatively charged dust, inertial ions, and superthermal electrons with two distinct temperatures are omnipresent ingredients.

  4. Normal Aspects of Speech, Hearing, and Language.

    ERIC Educational Resources Information Center

    Minifie, Fred D., Ed.; And Others

    This book is written as a guide to the understanding of the processes involved in human speech communication. Ten authorities contributed material to provide an introduction to the physiological aspects of speech production and reception, the acoustical aspects of speech production and transmission, the psychophysics of sound reception, the nature…

  5. Modified dust ion-acoustic surface waves in a semi-bounded magnetized plasma containing the rotating dust grains

    NASA Astrophysics Data System (ADS)

    Lee, Myoung-Jae; Jung, Young-Dae

    2016-05-01

    The dispersion relation for modified dust ion-acoustic surface waves in a magnetized dusty plasma containing rotating dust grains is derived, and the effects of the magnetic field configuration on the resonant growth rate are investigated. We find that, for the perpendicular magnetic field configuration, the resonant growth rate of the wave increases with the ratio of ion plasma frequency to cyclotron frequency, as well as with wave number, when the ion plasma frequency is greater than the dust rotation frequency. For the parallel magnetic field configuration, we find that the instability occurs only for some limited ranges of the wave number and of the ratio of ion plasma frequency to cyclotron frequency. The resonant growth rate is found to decrease with increasing wave number. The influence of the dust rotational frequency on the instability is also discussed.

  6. Contributions of speech science to the technology of man-machine voice interactions

    NASA Technical Reports Server (NTRS)

    Lea, Wayne A.

    1977-01-01

    Research in speech understanding was reviewed. Plans which include prosodics research, phonological rules for speech understanding systems, and continued interdisciplinary phonetics research are discussed. Improved acoustic phonetic analysis capabilities in speech recognizers are suggested.

  7. Speech transmission index from running speech: A neural network approach

    NASA Astrophysics Data System (ADS)

    Li, F. F.; Cox, T. J.

    2003-04-01

    Speech transmission index (STI) is an important objective parameter concerning speech intelligibility for sound transmission channels. It is normally measured with specific test signals to ensure high accuracy and good repeatability. Measurement with running speech was previously proposed, but accuracy is compromised and hence applications limited. A new approach that uses artificial neural networks to accurately extract the STI from received running speech is developed in this paper. Neural networks are trained on a large set of transmitted speech examples with prior knowledge of the transmission channels' STIs. The networks perform complicated nonlinear function mappings and spectral feature memorization to enable accurate objective parameter extraction from transmitted speech. Validations via simulations demonstrate the feasibility of this new method on a one-net-one-speech extract basis. In this case, accuracy is comparable with normal measurement methods. This provides an alternative to standard measurement techniques, and it is intended that the neural network method can facilitate occupied room acoustic measurements.

  8. Optimal Gain Filter Design for Perceptual Acoustic Echo Suppressor

    NASA Astrophysics Data System (ADS)

    Kim, Kihyeon; Ko, Hanseok

    This Letter proposes an optimal gain filter for the perceptual acoustic echo suppressor. We designed an optimally-modified log-spectral amplitude estimation algorithm for the gain filter in order to achieve robust suppression of echo and noise. A new parameter including information about interferences (echo and noise) of single-talk duration is statistically analyzed, and then the speech absence probability and the a posteriori SNR are judiciously estimated to determine the optimal solution. The experiments show that the proposed gain filter attains a significantly improved reduction of echo and noise with less speech distortion.

  9. The Effectiveness of Clear Speech as a Masker

    ERIC Educational Resources Information Center

    Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.

    2010-01-01

    Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…

  10. Experiment in Learning to Discriminate Frequency Transposed Speech.

    ERIC Educational Resources Information Center

    Ahlstrom, K.G.; And Others

    In order to improve speech perception by transposing the speech signals to lower frequencies, to determine which aspects of the information in the acoustic speech signals were influenced by transposition, and to compare two different methods of training speech perception, 44 subjects were trained to discriminate between transposed words or…

  11. Speech spectrogram expert

    SciTech Connect

    Johannsen, J.; Macallister, J.; Michalek, T.; Ross, S.

    1983-01-01

    Various authors have pointed out that humans can become quite adept at deriving phonetic transcriptions from speech spectrograms (as good as 90 percent accuracy at the phoneme level). The authors describe an expert system which attempts to simulate this performance. The speech spectrogram expert (spex) is actually a society made up of three experts: a 2-dimensional vision expert, an acoustic-phonetic expert, and a phonetics expert. The visual reasoning expert finds important visual features of the spectrogram. The acoustic-phonetic expert reasons about how visual features relate to phonemes, and about how phonemes change visually in different contexts. The phonetics expert reasons about allowable phoneme sequences and transformations, and deduces an English spelling for phoneme strings. The speech spectrogram expert is highly interactive, allowing users to investigate hypotheses and edit rules. 10 references.

  12. The Acquisition of Verbal Communication Skills by Severely Hearing-Impaired Children through the Modified Cued Speech-Phonetic Alphabet Method.

    ERIC Educational Resources Information Center

    Duffy, John K.

    The paper describes the potential of cued speech to provide verbal language and intelligible speech to severely hearing impaired students. The approach, which combines auditory-visual-oral and manual cues, is designed as a visual supplement to normal speech. The paper traces the development of cued speech and discusses modifications made to the R.…

  13. Speech perception as categorization

    PubMed Central

    Holt, Lori L.; Lotto, Andrew J.

    2010-01-01

    Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has resulted from these challenges. We focus here on issues and experiments that define open research questions relevant to phoneme categorization, arguing that SP is best understood as perceptual categorization, a position that places SP in direct contact with research from other areas of perception and cognition. PMID:20601702

  14. Production and perception of clear speech

    NASA Astrophysics Data System (ADS)

    Bradlow, Ann R.

    2003-04-01

    When a talker believes that the listener is likely to have speech perception difficulties due to a hearing loss, background noise, or a different native language, she or he will typically adopt a clear speaking style. Previous research has established that, with a simple set of instructions to the talker, "clear speech" can be produced by most talkers under laboratory recording conditions. Furthermore, there is reliable evidence that adult listeners with either impaired or normal hearing typically find clear speech more intelligible than conversational speech. Since clear speech production involves listener-oriented articulatory adjustments, a careful examination of the acoustic-phonetic and perceptual consequences of the conversational-to-clear speech transformation can serve as an effective window into talker- and listener-related forces in speech communication. Furthermore, clear speech research has considerable potential for the development of speech enhancement techniques. After reviewing previous and current work on the acoustic properties of clear versus conversational speech, this talk will present recent data from a cross-linguistic study of vowel production in clear speech and a cross-population study of clear speech perception. Findings from these studies contribute to an evolving view of clear speech production and perception as reflecting both universal, auditory and language-specific, phonological contrast enhancement features.

  15. Inverse material identification in coupled acoustic-structure interaction using a modified error in constitutive equation functional

    NASA Astrophysics Data System (ADS)

    Warner, James E.; Diaz, Manuel I.; Aquino, Wilkins; Bonnet, Marc

    2014-09-01

    This work focuses on the identification of heterogeneous linear elastic moduli in the context of frequency-domain, coupled acoustic-structure interaction (ASI), using either solid displacement or fluid pressure measurement data. The approach postulates the inverse problem as an optimization problem where the solution is obtained by minimizing a modified error in constitutive equation (MECE) functional. The latter measures the discrepancy in the constitutive equations that connect kinematically admissible strains and dynamically admissible stresses, while incorporating the measurement data as additional quadratic error terms. We demonstrate two strategies for selecting the MECE weighting coefficient to produce regularized solutions to the ill-posed identification problem: 1) the discrepancy principle of Morozov, and 2) an error-balance approach that selects the weight parameter as the minimizer of another functional involving the ECE and the data misfit. Numerical results demonstrate that the proposed methodology can successfully recover elastic parameters in 2D and 3D ASI systems from response measurements taken in either the solid or fluid subdomains. Furthermore, both regularization strategies are shown to produce accurate reconstructions when the measurement data is polluted with noise. The discrepancy principle is shown to produce nearly optimal solutions, while the error-balance approach, although not optimal, remains effective and does not need a priori information on the noise level.
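    The abstract selects the MECE weighting coefficient with Morozov's discrepancy principle: choose the regularization strength so that the data misfit matches the noise level. The sketch below illustrates only that selection rule, on a generic Tikhonov-regularized problem with a diagonal operator (an assumption chosen so the solution has a closed form). It is not the MECE functional itself, and the names `tikhonov_diag`, `residual_norm`, and `morozov_alpha` are hypothetical.

```python
import math


def tikhonov_diag(s, y, alpha):
    """Tikhonov solution for a diagonal operator A = diag(s):
    x_i = s_i * y_i / (s_i**2 + alpha)."""
    return [si * yi / (si * si + alpha) for si, yi in zip(s, y)]


def residual_norm(s, y, x):
    """Data misfit ||A x - y|| for the diagonal operator A = diag(s)."""
    return math.sqrt(sum((si * xi - yi) ** 2 for si, xi, yi in zip(s, x, y)))


def morozov_alpha(s, y, delta, lo=1e-12, hi=1e6, iters=200):
    """Pick alpha so that the misfit equals the noise level delta.

    The Tikhonov misfit grows monotonically with alpha, so bisection on
    log(alpha) converges; delta must lie between the misfits at lo and hi.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint = bisection in log space
        if residual_norm(s, y, tikhonov_diag(s, y, mid)) < delta:
            lo = mid  # misfit too small: regularize more
        else:
            hi = mid  # misfit too large: regularize less
    return math.sqrt(lo * hi)
```

    Because the misfit is monotonic in `alpha`, a one-dimensional bisection is enough here; practical versions often target `tau * delta` with a safety factor `tau` slightly above 1.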

  16. A modified beam-to-earth transformation to measure short-wavelength internal waves with an acoustic Doppler current profiler

    USGS Publications Warehouse

    Scotti, A.; Butman, B.; Beardsley, R.C.; Alexander, P.S.; Anderson, S.

    2005-01-01

    The algorithm used to transform velocity signals from beam coordinates to earth coordinates in an acoustic Doppler current profiler (ADCP) relies on the assumption that the currents are uniform over the horizontal distance separating the beams. This condition may be violated by (nonlinear) internal waves, which can have wavelengths as small as 100-200 m. In this case, the standard algorithm combines velocities measured at different phases of a wave and produces horizontal velocities that increasingly differ from true velocities with distance from the ADCP. Observations made in Massachusetts Bay show that currents measured with a bottom-mounted upward-looking ADCP during periods when short-wavelength internal waves are present differ significantly from currents measured by point current meters, except very close to the instrument. These periods are flagged with high error velocities by the standard ADCP algorithm. In this paper measurements from the four spatially diverging beams and the backscatter intensity signal are used to calculate the propagation direction and celerity of the internal waves. Once this information is known, a modified beam-to-earth transformation that combines appropriately lagged beam measurements can be used to obtain current estimates in earth coordinates that compare well with pointwise measurements. © 2005 American Meteorological Society.
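    The standard transformation that this paper modifies can be sketched for an idealized four-beam geometry. This is a minimal illustration under stated assumptions: the instrument is level and aligned with the earth axes (heading, pitch, and roll rotations are omitted), the beams are tilted by `theta` from vertical, and the beam numbering, sign convention, and error-velocity scaling are hypothetical rather than any manufacturer's. It shows why the method relies on horizontal uniformity: the two beam pairs give independent estimates of the vertical velocity, and their difference (the error velocity) vanishes only when all beams see the same current.

```python
import math


def beams_from_velocity(u, v, w, theta):
    """Along-beam velocities seen by four beams for a uniform current (u, v, w).

    Hypothetical convention: upward-looking, level instrument aligned with the
    earth axes; beam unit vectors point away from the transducer, tilted by
    theta from vertical, with beams 1/2 toward +x/-x and beams 3/4 toward +y/-y.
    """
    s, c = math.sin(theta), math.cos(theta)
    return (u * s + w * c, -u * s + w * c, v * s + w * c, -v * s + w * c)


def beam_to_earth(b1, b2, b3, b4, theta):
    """Invert the geometry above, assuming all beams sample the same current."""
    s, c = math.sin(theta), math.cos(theta)
    u = (b1 - b2) / (2 * s)
    v = (b3 - b4) / (2 * s)
    w = (b1 + b2 + b3 + b4) / (4 * c)
    # Error velocity (one possible scaling): difference between the vertical
    # velocities estimated separately from the x-pair and the y-pair.
    e = (b1 + b2) / (2 * c) - (b3 + b4) / (2 * c)
    return u, v, w, e
```

    When a short internal wave gives the x-pair and the y-pair different vertical velocities, `e` becomes nonzero and the `u`, `v` estimates mix different wave phases, which is the failure mode the abstract describes.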

  17. Inverse Material Identification in Coupled Acoustic-Structure Interaction using a Modified Error in Constitutive Equation Functional

    PubMed Central

    Warner, James E.; Diaz, Manuel I.; Aquino, Wilkins; Bonnet, Marc

    2014-01-01

    This work focuses on the identification of heterogeneous linear elastic moduli in the context of frequency-domain, coupled acoustic-structure interaction (ASI), using either solid displacement or fluid pressure measurement data. The approach postulates the inverse problem as an optimization problem where the solution is obtained by minimizing a modified error in constitutive equation (MECE) functional. The latter measures the discrepancy in the constitutive equations that connect kinematically admissible strains and dynamically admissible stresses, while incorporating the measurement data as additional quadratic error terms. We demonstrate two strategies for selecting the MECE weighting coefficient to produce regularized solutions to the ill-posed identification problem: 1) the discrepancy principle of Morozov, and 2) an error-balance approach that selects the weight parameter as the minimizer of another functional involving the ECE and the data misfit. Numerical results demonstrate that the proposed methodology can successfully recover elastic parameters in 2D and 3D ASI systems from response measurements taken in either the solid or fluid subdomains. Furthermore, both regularization strategies are shown to produce accurate reconstructions when the measurement data is polluted with noise. The discrepancy principle is shown to produce nearly optimal solutions, while the error-balance approach, although not optimal, remains effective and does not need a priori information on the noise level. PMID:25339790

  18. The cortical representation of the speech envelope is earlier for audiovisual speech than audio speech.

    PubMed

    Crosse, Michael J; Lalor, Edmund C

    2014-04-01

    Visual speech can greatly enhance a listener's comprehension of auditory speech when they are presented simultaneously. Efforts to determine the neural underpinnings of this phenomenon have been hampered by the limited temporal resolution of hemodynamic imaging and the fact that EEG and magnetoencephalographic data are usually analyzed in response to simple, discrete stimuli. Recent research has shown that neuronal activity in human auditory cortex tracks the envelope of natural speech. Here, we exploit this finding by estimating a linear forward-mapping between the speech envelope and EEG data and show that the latency at which the envelope of natural speech is represented in cortex is shortened by >10 ms when continuous audiovisual speech is presented compared with audio-only speech. In addition, we use a reverse-mapping approach to reconstruct an estimate of the speech stimulus from the EEG data and, by comparing the bimodal estimate with the sum of the unimodal estimates, find no evidence of any nonlinear additive effects in the audiovisual speech condition. These findings point to an underlying mechanism that could account for enhanced comprehension during audiovisual speech. Specifically, we hypothesize that low-level acoustic features that are temporally coherent with the preceding visual stream may be synthesized into a speech object at an earlier latency, which may provide an extended period of low-level processing before extraction of semantic information. PMID:24401714
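    A forward mapping of this kind is commonly estimated as a temporal response function: a regularized linear regression of the EEG on time-lagged copies of the stimulus envelope, whose peak lag serves as a latency estimate. The sketch below is a generic single-channel ridge-regression version in plain Python, not the authors' implementation; `fit_forward_trf`, the number of lags, and the ridge parameter `lam` are illustrative assumptions.

```python
def lagged_design(stim, n_lags):
    """Row t holds stim[t], stim[t-1], ..., stim[t-n_lags+1], zero-padded before t=0."""
    return [[stim[t - k] if t - k >= 0 else 0.0 for k in range(n_lags)]
            for t in range(len(stim))]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_forward_trf(stim, resp, n_lags, lam=1e-3):
    """Ridge estimate w = (X^T X + lam I)^-1 X^T y of a forward temporal response function."""
    x = lagged_design(stim, n_lags)
    xtx = [[sum(row[i] * row[j] for row in x) + (lam if i == j else 0.0)
            for j in range(n_lags)] for i in range(n_lags)]
    xty = [sum(row[i] * y for row, y in zip(x, resp)) for i in range(n_lags)]
    return solve(xtx, xty)
```

    The index of the largest weight in the returned list is the lag, in samples, at which the envelope is most strongly represented; a reverse (decoding) model swaps the roles of stimulus and response.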

  19. Speech prosody in cerebellar ataxia

    NASA Astrophysics Data System (ADS)

    Casper, Maureen

    The present study sought an acoustic signature for the speech disturbance recognized in cerebellar degeneration. Magnetic resonance imaging was used for a radiological rating of cerebellar involvement in six cerebellar ataxic dysarthric speakers. Acoustic measures of the [pap] syllables in contrastive prosodic conditions and of normal vs. brain-damaged patients were used to further our understanding both of the speech degeneration that accompanies cerebellar pathology and of speech motor control and movement in general. Pair-wise comparisons of the prosodic conditions within the normal group showed statistically significant differences for four prosodic contrasts. For three of the four contrasts analyzed, the normal speakers showed both longer durations and higher formant and fundamental frequency values in the more prominent first condition of the contrast. The acoustic measures of the normal prosodic contrast values were then used as a model to measure the degree of speech deterioration for individual cerebellar subjects. This estimate of speech deterioration as determined by individual differences between cerebellar and normal subjects' acoustic values of the four prosodic contrasts was used in correlation analyses with MRI ratings. Moderate correlations between speech deterioration and cerebellar atrophy were found in the measures of syllable duration and f0. A strong negative correlation was found for F1. Moreover, the normal model presented by these acoustic data allows for a description of the flexibility of task-oriented behavior in normal speech motor control. These data challenge spatio-temporal theory which explains movement as an artifact of time wherein longer durations predict more extreme movements and give further evidence for gestural internal dynamics of movement in which time emerges from articulatory events rather than dictating those events. This model provides a sensitive index of cerebellar pathology with quantitative acoustic

  20. Acoustic mate copying: female cowbirds attend to other females' vocalizations to modify their song preferences.

    PubMed

    Freed-Brown, Grace; White, David J

    2009-09-22

    We conducted a tutoring experiment to determine whether female brown-headed cowbirds (Molothrus ater) would attend to vocalizations of other females and use those cues to influence their own preferences for male courtship songs. We collected recordings of male songs that were unfamiliar to the subject females and paired half of the songs with female chatter vocalizations (vocalizations that females give in response to songs sung by males that are courting the females effectively). Thus, chatter immediately following a song provided a cue indicating that the song was sung by a male who was of high-enough quality to court a female successfully. Using a cross-over design, we tutored two groups of females with song-chatter pairings prior to the breeding season. In the breeding season, we placed the tutored females into sound-attenuating chambers and played them the same songs without the chatter. Females produced significantly more copulation solicitation displays in response to the songs that they had heard paired with chatter than to songs that had not been paired with chatter. This experiment is the first demonstration that females can modify their song preferences by attending to the vocal behaviour of other females. PMID:19535371

  1. An assessment of computer model techniques to predict quantitative and qualitative measures of speech perception in university classrooms for varying room sizes and noise levels

    NASA Astrophysics Data System (ADS)

    Kim, Hyeong-Seok

    The objective of this dissertation was to assess the use of computer modeling techniques to predict quantitative and qualitative measures of speech perception in classrooms under realistic conditions of background noise and reverberation. Secondary objectives included (1) finding relationships between acoustical measurements made in actual classrooms and in computer models of those rooms, so that the models could serve as a tool for predicting 15 acoustic parameters at the design stage of projects, and (2) finding relationships between speech perception scores and the 15 acoustic parameters to determine the best predictors of speech perception in actual classroom conditions. Fifteen types of acoustical measurements were made in three actual classrooms with reverberation times of 0.5, 1.3, and 5.1 seconds. Speech perception tests using a Modified Rhyme Test (MRT) list were also given to 22 subjects in each room under five noise conditions, with signal-to-noise ratios of 31, 24, 15, 0, and -10 dB. Computer models of the rooms were constructed using a commercially available computer modeling program. The 15 acoustical measurements were made at 6 or 9 locations in the model rooms. Impulse responses obtained in the computer models of the rooms were convolved with the anechoically recorded speech tests used in the full-size rooms to produce a compact disc of the MRT lists carrying the acoustical response of the modeled rooms. Speech perception tests using this source material were given to the subjects over a loudspeaker in an acoustic test booth. The results showed correlations (R2) between acoustical measures made in the full-size classrooms and in their computer models of 0.92 to 0.99, with standard errors of 0.033 to 7.311. Comparisons between speech perception scores measured in the rooms and acoustical measurements made in the rooms and in the computer models showed prediction accuracy similar to that of other studies in the literature. The
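    The auralization step described in this dissertation, convolving a room impulse response with anechoically recorded speech, can be sketched as a direct linear convolution. This is an illustrative pure-Python version with a hypothetical function name; the commercial modeling program used in the study is not identified, and practical tools typically use FFT-based convolution for long signals.

```python
def convolve(signal, impulse_response):
    """Full linear convolution: y[n] = sum_k h[k] * x[n - k].

    Output length is len(signal) + len(impulse_response) - 1, so the
    reverberant tail of the last input sample is kept.
    """
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[i + k] += x * h  # each input sample launches a scaled copy of h
    return out
```

    Feeding a unit impulse through `convolve` returns the impulse response itself, which is a quick sanity check that a modeled room is being applied as intended.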

  2. Localization of Sublexical Speech Perception Components

    PubMed Central

    Turkeltaub, Peter E; Coslett, H. Branch

    2010-01-01

    Models of speech perception are in general agreement with respect to the major cortical regions involved, but lack precision with regard to localization and lateralization of processing units. To refine these models we conducted two Activation Likelihood Estimation (ALE) meta-analyses of the neuroimaging literature on sublexical speech perception. Based on foci reported in 23 fMRI experiments, we identified significant activation likelihoods in left and right superior temporal cortex and the left posterior middle frontal gyrus. Subanalyses examining phonetic and phonological processes revealed only left mid-posterior superior temporal sulcus activation likelihood. A lateralization analysis demonstrated temporal lobe left lateralization in terms of magnitude, extent, and consistency of activity. Experiments requiring explicit attention to phonology drove this lateralization. An ALE analysis of eight fMRI studies on categorical phoneme perception revealed significant activation likelihood in the left supramarginal gyrus and angular gyrus. These results are consistent with a speech processing network in which the bilateral superior temporal cortices perform acoustic analysis of speech and nonspeech auditory stimuli, the left mid-posterior superior temporal sulcus performs phonetic and phonological analysis, and the left inferior parietal lobule is involved in detection of differences between phoneme categories. These results modify current speech perception models in three ways: 1) specifying the most likely locations of dorsal stream processing units, 2) clarifying that phonetic and phonological superior temporal sulcus processing is left lateralized and localized to the mid-posterior portion, and 3) suggesting that both the supramarginal gyrus and angular gyrus may be involved in phoneme discrimination. PMID:20413149
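    In its commonly described form, Activation Likelihood Estimation blurs each experiment's reported foci with a Gaussian kernel into a modeled activation map, then scores each voxel by the probability that at least one experiment activates it, computed as a union. The sketch below shows only that union step at a single voxel; kernel widths, thresholding, and the permutation-based null distribution used for significance are omitted, and the function name is hypothetical.

```python
def ale_value(modeled_activation_probs):
    """ALE at one voxel: union of per-experiment modeled activation
    probabilities, ALE = 1 - prod(1 - p_i)."""
    prod = 1.0
    for p in modeled_activation_probs:
        prod *= 1.0 - p  # probability that this experiment does NOT activate here
    return 1.0 - prod
```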

  3. Predicting Speech Intelligibility with A Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    PubMed Central

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystem approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method Nine acoustic variables reflecting different subsystems, and speech intelligibility, were measured in 22 children with CP. These children included 13 with a clinical diagnosis of dysarthria (SMI), and nine judged to be free of dysarthria (NSMI). Data from children with CP were compared to data from age-matched typically developing children (TD). Results Multiple acoustic variables reflecting the articulatory subsystem were different in the SMI group, compared to the NSMI and TD groups. A significant speech intelligibility prediction model was obtained with all variables entered into the model (Adjusted R-squared = .801). The articulatory subsystem showed the most substantial independent contribution (58%) to speech intelligibility. Incremental R-squared analyses revealed that any single variable explained less than 9% of speech intelligibility variability. Conclusions Children in the SMI group have articulatory subsystem problems as indexed by acoustic measures. As in the adult literature, the articulatory subsystem makes the primary contribution to speech intelligibility variance in dysarthria, with minimal or no contribution from other systems. PMID:24824584
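    The Adjusted R-squared and incremental R-squared values reported here follow standard regression formulas, sketched below with hypothetical function names. Adjusted R² penalizes the ordinary R² for the number of predictors p relative to the sample size n; the incremental R² of a variable is the drop in R² when that variable is removed from the full model. Any numbers passed to these functions are illustrative only and do not reproduce the study's analysis.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


def incremental_r2(r2_full, r2_reduced):
    """Unique contribution of a variable: R^2 of the full model minus R^2 of
    the same model with that variable (or block of variables) removed."""
    return r2_full - r2_reduced
```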

  4. Reconstructing speech from human auditory cortex.

    PubMed

    Pasley, Brian N; David, Stephen V; Mesgarani, Nima; Flinker, Adeen; Shamma, Shihab A; Crone, Nathan E; Knight, Robert T; Chang, Edward F

    2012-01-01

    How the human auditory system extracts perceptually relevant acoustic features of speech is unknown. To address this question, we used intracranial recordings from nonprimary auditory cortex in the human superior temporal gyrus to determine what acoustic information in speech sounds can be reconstructed from population neural activity. We found that slow and intermediate temporal fluctuations, such as those corresponding to syllable rate, were accurately reconstructed using a linear model based on the auditory spectrogram. However, reconstruction of fast temporal fluctuations, such as syllable onsets and offsets, required a nonlinear sound representation based on temporal modulation energy. Reconstruction accuracy was highest within the range of spectro-temporal fluctuations that have been found to be critical for speech intelligibility. The decoded speech representations allowed readout and identification of individual words directly from brain activity during single trial sound presentations. These findings reveal neural encoding mechanisms of speech acoustic parameters in higher order human auditory cortex. PMID:22303281

  5. Reconstructing Speech from Human Auditory Cortex

    PubMed Central

    Pasley, Brian N.; David, Stephen V.; Mesgarani, Nima; Flinker, Adeen; Shamma, Shihab A.; Crone, Nathan E.; Knight, Robert T.; Chang, Edward F.

    2012-01-01

    How the human auditory system extracts perceptually relevant acoustic features of speech is unknown. To address this question, we used intracranial recordings from nonprimary auditory cortex in the human superior temporal gyrus to determine what acoustic information in speech sounds can be reconstructed from population neural activity. We found that slow and intermediate temporal fluctuations, such as those corresponding to syllable rate, were accurately reconstructed using a linear model based on the auditory spectrogram. However, reconstruction of fast temporal fluctuations, such as syllable onsets and offsets, required a nonlinear sound representation based on temporal modulation energy. Reconstruction accuracy was highest within the range of spectro-temporal fluctuations that have been found to be critical for speech intelligibility. The decoded speech representations allowed readout and identification of individual words directly from brain activity during single trial sound presentations. These findings reveal neural encoding mechanisms of speech acoustic parameters in higher order human auditory cortex. PMID:22303281

  6. Time-forward speech intelligibility in time-reversed rooms

    PubMed Central

    Longworth-Reed, Laricia; Brandewie, Eugene; Zahorik, Pavel

    2009-01-01

    The effects of time-reversed room acoustics on word recognition abilities were examined using virtual auditory space techniques, which allowed for temporal manipulation of the room acoustics independent of the speech source signals. Two acoustical conditions were tested: one in which room acoustics were simulated in a realistic time-forward fashion and one in which the room acoustics were reversed in time, causing reverberation and acoustic reflections to precede the direct-path energy. Significant decreases in speech intelligibility—from 89% on average to less than 25%—were observed between the time-forward and time-reversed rooms. This result is not predictable using standard methods for estimating speech intelligibility based on the modulation transfer function of the room. It may instead be due to increased degradation of onset information in the speech signals when room acoustics are time-reversed. PMID:19173377
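    The point that modulation-transfer-function predictions miss this effect can be checked directly. In the noise-free case, Schroeder's formula computes the MTF from the squared impulse response, m(F) = |sum h^2[n] exp(-j 2 pi F n / fs)| / sum h^2[n]. Reversing h in time only conjugates and phase-shifts that sum, so m(F) is unchanged even though the reflections now arrive before the direct sound. The sketch below makes that invariance concrete; the function name is hypothetical, and background noise and the band filtering used in full speech-transmission-index calculations are omitted.

```python
import cmath
import math


def mtf(h, fs, mod_freq):
    """Noise-free MTF of impulse response h (sampled at fs Hz) at modulation
    frequency mod_freq, via Schroeder's formula on the squared response."""
    energy = [x * x for x in h]
    total = sum(energy)
    acc = sum(e * cmath.exp(-2j * math.pi * mod_freq * n / fs)
              for n, e in enumerate(energy))
    return abs(acc) / total
```

    Evaluating `mtf(h, fs, F)` and `mtf(h[::-1], fs, F)` gives the same value for every F, so any predictor built only on m(F) must rate the forward and time-reversed rooms as equally intelligible.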

  7. Interactions between distal speech rate, linguistic knowledge, and speech environment.

    PubMed

    Morrill, Tuuli; Baese-Berk, Melissa; Heffner, Christopher; Dilley, Laura

    2015-10-01

    During lexical access, listeners use both signal-based and knowledge-based cues, and information from the linguistic context can affect the perception of acoustic speech information. Recent findings suggest that the various cues used in lexical access are implemented with flexibility and may be affected by information from the larger speech context. We conducted 2 experiments to examine effects of a signal-based cue (distal speech rate) and a knowledge-based cue (linguistic structure) on lexical perception. In Experiment 1, we manipulated distal speech rate in utterances where an acoustically ambiguous critical word was either obligatory for the utterance to be syntactically well formed (e.g., Conner knew that bread and butter (are) both in the pantry) or optional (e.g., Don must see the harbor (or) boats). In Experiment 2, we used the same target utterances as in Experiment 1 but changed the distribution of linguistic structures in the fillers. The results of the 2 experiments demonstrate that speech rate and linguistic knowledge about critical word obligatoriness can both influence speech perception. In addition, it is possible to alter the strength of a signal-based cue by changing information in the speech environment. These results provide support for models of word segmentation that include flexible weighting of signal-based and knowledge-based cues. PMID:25794478

  8. Perception of acoustic scale and size in musical instrument sounds

    PubMed Central

    van Dinther, Ralph; Patterson, Roy D.

    2010-01-01

    There is size information in natural sounds. For example, as humans grow in height, their vocal tracts increase in length, producing a predictable decrease in the formant frequencies of speech sounds. Recent studies have shown that listeners can make fine discriminations about which of two speakers has the longer vocal tract, supporting the view that the auditory system discriminates changes on the acoustic-scale dimension. Listeners can also recognize vowels scaled well beyond the range of vocal tracts normally experienced, indicating that perception is robust to changes in acoustic scale. This paper reports two perceptual experiments designed to extend research on acoustic scale and size perception to the domain of musical sounds: The first study shows that listeners can discriminate the scale of musical instrument sounds reliably, although not quite as well as for voices. The second experiment shows that listeners can recognize the family of an instrument sound which has been modified in pitch and scale beyond the range of normal experience. We conclude that processing of acoustic scale in music perception is very similar to processing of acoustic scale in speech perception. PMID:17069313
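    The link between vocal tract length and formant frequencies is often illustrated with an idealized uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1) c / (4 L). This textbook approximation is an assumption made for illustration, not the paper's model; the speed of sound (35,000 cm/s) and the function name are likewise assumed. It shows why a change in length rescales every formant by the same factor, which is what allows acoustic scale to be treated as a single dimension.

```python
def tube_formants(length_cm, n_formants=3, c_cm_per_s=35000.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other:
    F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c_cm_per_s / (4.0 * length_cm)
            for n in range(1, n_formants + 1)]
```

    Shortening the tube from 17.5 cm to 14 cm raises every resonance by the same factor of 1.25, leaving the formant pattern's shape (and hence the vowel identity, in this idealization) unchanged.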

  9. Why Impromptu Speech Is Easy To Understand.

    ERIC Educational Resources Information Center

    Le Feal, K. Dejean

    Impromptu speech is characterized by the simultaneous processes of ideation (the elaboration and structuring of reasoning by the speaker as he improvises) and expression in the speaker. Other elements accompany this characteristic: division of speech flow into short segments, acoustic relief in the form of word stress following a pause, and both…

  10. Voice Modulations in German Ironic Speech

    ERIC Educational Resources Information Center

    Scharrer, Lisa; Christmann, Ursula; Knoll, Monja

    2011-01-01

    Previous research has shown that in different languages ironic speech is acoustically modulated compared to literal speech, and these modulations are assumed to aid the listener in the comprehension process by acting as cues that mark utterances as ironic. The present study was conducted to identify paraverbal features of German "ironic criticism"…

  11. Integrated speech enhancement for functional MRI environment.

    PubMed

    Pathak, Nishank; Milani, Ali A; Panahi, Issa; Briggs, Richard

    2009-01-01

    This paper presents an integrated speech enhancement (SE) method for the noisy MRI environment. We show that the performance of the SE system improves considerably when the speech signal, dominated by MRI acoustic noise at very low SNR, is enhanced in two successive stages: two-channel SE methods followed by a single-channel post-processing SE algorithm. Actual noisy MRI speech data are used in our experiments, showing the improved performance of the proposed SE method. PMID:19964964

  12. Talker-to-listener distance effects on speech production and perception.

    PubMed

    Cheyne, Harold A; Kalgaonkar, Kaustubh; Clements, Mark; Zurek, Patrick

    2009-10-01

    Simulating talker-to-listener distance (TLD) in virtual audio environments requires mimicking natural changes in vocal effort. Studies have identified several acoustic parameters manipulated by talkers when varying vocal effort. However, no systematic study has investigated vocal effort variations due to TLD, under natural conditions, and their perceptual consequences. This work examined the feasibility of varying the vocal effort cues for TLD in synthesized speech and real speech by (a) recording and analyzing single word tokens spoken at 1 m ≤ TLD ≤ 32 m, (b) creating synthetic and modified speech tokens that vary in one or more acoustic parameters associated with vocal effort, and (c) conducting perceptual tests on the reference, synthetic, and modified tokens to identify salient cues for TLD perception. Measured changes in fundamental frequency, intensity, and formant frequencies of the reference tokens across TLD were similar to other reports in the literature. Perceptual experiments that asked listeners to estimate TLD showed that TLD estimation is most accurate with real speech; however, large standard deviations in the responses suggest that reliable judgments can only be made for gross changes in TLD. PMID:19813814

  13. Breathing-Impaired Speech after Brain Haemorrhage: A Case Study

    ERIC Educational Resources Information Center

    Heselwood, Barry

    2007-01-01

    Results are presented from an auditory and acoustic analysis of the speech of an adult male with impaired prosody and articulation due to brain haemorrhage. They show marked effects on phonation, speech rate and articulator velocity, and a speech rhythm disrupted by "intrusive" stresses. These effects are discussed in relation to the speaker's…

  14. Speech for the Deaf Child: Knowledge and Use.

    ERIC Educational Resources Information Center

    Connor, Leo E., Ed.

    Presented is a collection of 16 papers on speech development, handicaps, teaching methods, and educational trends for the aurally handicapped child. Arthur Boothroyd relates acoustic phonetics to speech teaching, and Jean Utley Lehman investigates a scheme of linguistic organization. Differences in speech production by deaf and normal hearing…

  15. Speech Development

    MedlinePlus

    ... Lip and Palate. Bzoch (1997). Cleft Palate Speech Management: A Multidisciplinary Approach. Shprintzen, Bardach (1995). Cleft Palate: ...

  16. Speech Problems

    MedlinePlus

    ... a person's ability to speak clearly. Some common speech disorders: Stuttering is a problem that interferes with fluent ... is a language disorder, while stuttering is a speech disorder. A person who stutters has trouble getting out ...

  17. Visible Speech.

    ERIC Educational Resources Information Center

    Potter, Ralph K.; and others

    A corrected republication of the 1947 edition, the book describes a form of visible speech obtained by the recording of an analysis of speech somewhat similar to the analysis performed by the ear. Originally intended to present an experimental training program in the reading of visible speech and expanded to include material of interest to various…

  18. Musicians have enhanced subcortical auditory and audiovisual processing of speech and music

    PubMed Central

    Musacchia, Gabriella; Sams, Mikko; Skoe, Erika; Kraus, Nina

    2007-01-01

    Musical training is known to modify cortical organization. Here, we show that such modifications extend to subcortical sensory structures and generalize to processing of speech. Musicians had earlier and larger brainstem responses than nonmusician controls to both speech and music stimuli presented in auditory and audiovisual conditions, evident as early as 10 ms after acoustic onset. Phase-locking to stimulus periodicity, which likely underlies perception of pitch, was enhanced in musicians and strongly correlated with length of musical practice. In addition, viewing videos of speech (lip-reading) and music (instrument being played) enhanced temporal and frequency encoding in the auditory brainstem, particularly in musicians. These findings demonstrate practice-related changes in the early sensory encoding of auditory and audiovisual information. PMID:17898180

  19. Perception of Speech Reflects Optimal Use of Probabilistic Speech Cues

    ERIC Educational Resources Information Center

    Clayards, Meghan; Tanenhaus, Michael K.; Aslin, Richard N.; Jacobs, Robert A.

    2008-01-01

    Listeners are exquisitely sensitive to fine-grained acoustic detail within phonetic categories for sounds and words. Here we show that this sensitivity is optimal given the probabilistic nature of speech cues. We manipulated the probability distribution of one probabilistic cue, voice onset time (VOT), which differentiates word initial labial…
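    The kind of ideal observer this line of work appeals to can be sketched with two Gaussian VOT categories. The category means, standard deviation, and equal priors below are hypothetical placeholders, not the study's stimulus distributions. Under these assumptions the posterior is a logistic function of VOT whose log-odds slope is (mu_voiceless - mu_voiced) / sigma^2, so wider categories predict a shallower identification function.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Normal probability density."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def p_voiceless(vot_ms, mean_voiced=0.0, mean_voiceless=50.0, sd=10.0, prior_voiceless=0.5):
    """Posterior probability of the voiceless category given a VOT value, for two
    hypothetical Gaussian categories with a shared standard deviation."""
    lv = gaussian_pdf(vot_ms, mean_voiceless, sd) * prior_voiceless
    ld = gaussian_pdf(vot_ms, mean_voiced, sd) * (1.0 - prior_voiceless)
    return lv / (lv + ld)
```

    With these placeholder values the category boundary sits at 25 ms, and doubling `sd` flattens the curve around it.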

  20. Assessing Toddlers’ Speech-Sound Discrimination

    PubMed Central

    Holt, Rachael Frush; Lalonde, Kaylah

    2012-01-01

    Objective Valid and reliable methods for assessing speech perception in toddlers are lacking in the field, leading to conspicuous gaps in understanding how speech perception develops and limited clinical tools for assessing sensory aid benefit in toddlers. The objective of this investigation was to evaluate speech-sound discrimination in toddlers using modifications to the Change/No-Change procedure. Methods Normal-hearing 2- and 3-year-olds’ discrimination of acoustically dissimilar (“easy”) and similar (“hard”) speech-sound contrasts was evaluated in a combined repeated measures and factorial design. Performance was measured in d′. Effects of contrast difficulty and age were examined, as was test-retest reliability, using repeated measures ANOVAs, planned post-hoc tests, and correlation analyses. Results The easy contrast (M=2.53) was discriminated better than the hard contrast (M=1.72) across all ages (p < .0001). The oldest group of children (M=3.13) discriminated the contrasts better than the youngest (M=1.04; p < .0001) and the mid-age children (M=2.20; p = .037), who in turn discriminated the contrasts better than the youngest children (p = .010). Test-retest reliability was excellent (r = .886, p < .0001). Almost 90% of the children met the teaching criterion. The vast majority demonstrated the ability to be tested with the modified procedure and discriminated the contrasts. The few who did not were 2.5 years of age and younger. Conclusions The modifications implemented resulted, at least preliminarily, in a procedure that is reliable and sensitive to contrast difficulty and age in this young group of children, suggesting that these modifications are appropriate for this age group. With further development, the procedure holds promise for use in clinical populations who are believed to have core deficits in rapid phonological encoding, such as children with hearing loss or specific language impairment, children who are struggling to read, and
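    Scores in d′ like those reported here are conventionally computed from hit and false-alarm rates with the standard yes/no formula d′ = z(H) - z(FA). A minimal sketch using Python's standard-library `NormalDist` follows. The log-linear correction for rates of 0 or 1 is one common choice and an assumption here, since the abstract does not say how extreme rates were handled, and both function names are hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Yes/no sensitivity index d' = z(H) - z(FA), z = inverse standard-normal CDF.

    Rates must lie strictly between 0 and 1; NormalDist.inv_cdf raises otherwise.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def log_linear_rate(count, n_trials):
    """Log-linear correction (add 0.5 to the count and 1 to the trials), one common
    way to keep observed rates away from 0 and 1 before computing d'."""
    return (count + 0.5) / (n_trials + 1)
```

    For example, a child with 10 hits in 10 change trials would get a corrected hit rate of 10.5 / 11 rather than an undefined z(1).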

  1. Nonsensory factors in speech perception

    NASA Astrophysics Data System (ADS)

    Holt, Rachael F.; Carney, Arlene E.

    2001-05-01

    The nature of developmental differences was examined in a speech discrimination task, the change/no-change procedure, in which a varying number of speech stimuli are presented during a trial. Standard stimuli are followed by comparison stimuli that are either identical to or acoustically different from the standard. Fourteen adults and 30 4- and 5-year-old children were tested with three speech contrast pairs at a variety of signal-to-noise ratios using various numbers of standard and comparison stimulus presentations. Adult speech discrimination performance followed the predictions of the multiple looks hypothesis [N. F. Viemeister and G. H. Wakefield, J. Acoust. Soc. Am. 90, 858-865 (1991)]: d′ increased by a factor of 1.4 for a doubling in the number of standard and comparison stimulus presentations near d′ values of 1.0. For children, increasing the number of standard stimuli improved discrimination performance, whereas increasing the number of comparisons did not. The multiple looks hypothesis did not explain the children's data, which are explained more parsimoniously by the developmental weighting shift [Nittrouer et al., J. Acoust. Soc. Am. 101, 2253-2266 (1997)], which proposes that children attend to different aspects of speech stimuli than adults do. [Work supported by NIDCD and ASHF.]
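    Under the multiple looks hypothesis, n statistically independent looks combined optimally give d′_n = √n · d′_1, so doubling the number of presentations should multiply d′ by √2 ≈ 1.41, which matches the factor of 1.4 reported for the adults. A minimal sketch of that prediction; the single-look d′ of 1.0 is an illustrative value.

```python
import math


def multiple_looks_dprime(d_single: float, n_looks: int) -> float:
    """Predicted d' when n independent, equally informative looks are
    combined optimally: d'_n = sqrt(n) * d'_1."""
    return d_single * math.sqrt(n_looks)


# Each doubling of the number of looks multiplies d' by sqrt(2).
for n in (1, 2, 4, 8):
    print(n, round(multiple_looks_dprime(1.0, n), 3))
```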

  2. Prosodic Contrasts in Ironic Speech

    ERIC Educational Resources Information Center

    Bryant, Gregory A.

    2010-01-01

    Prosodic features in spontaneous speech help disambiguate implied meaning not explicit in linguistic surface structure, but little research has examined how these signals manifest themselves in real conversations. Spontaneously produced verbal irony utterances generated between familiar speakers in conversational dyads were acoustically analyzed…

  3. Embedding speech into virtual realities

    NASA Astrophysics Data System (ADS)

    Bohn, Christian-Arved; Krueger, Wolfgang

    1993-05-01

    In this work a speaker-independent speech recognition system is presented that is suitable for implementation in Virtual Reality applications. The use of an artificial neural network in connection with a special compression of the acoustic input leads to a system that is robust, fast, and easy to use, and that needs no additional hardware besides common VR equipment.

  4. Embedding speech into virtual realities

    NASA Technical Reports Server (NTRS)

    Bohn, Christian-Arved; Krueger, Wolfgang

    1993-01-01

    In this work a speaker-independent speech recognition system is presented that is suitable for implementation in Virtual Reality applications. The use of an artificial neural network in connection with a special compression of the acoustic input leads to a system that is robust, fast, and easy to use, and that needs no additional hardware besides common VR equipment.

  5. Classroom Acoustics: Understanding Barriers to Learning.

    ERIC Educational Resources Information Center

    Crandell, Carl C., Ed.; Smaldino, Joseph J., Ed.

    2001-01-01

    This booklet explores classroom acoustics and their importance for the learning potential of children with hearing loss and related disabilities. The booklet also reviews research on classroom acoustics and the need for the development of classroom acoustics standards. Chapters examine: 1) a speech-perception model demonstrating the linkage between…

  6. Results of tests performed on the Acoustic Quiet Flow Facility Three-Dimensional Model Tunnel: Report on the Modified D.S.M.A. Design

    NASA Technical Reports Server (NTRS)

    Barna, P. S.

    1996-01-01

    Numerous tests were performed on the original Acoustic Quiet Flow Facility Three-Dimensional Model Tunnel, scaled down from the full-scale plans. Results of tests performed on the original scale model tunnel were reported in April 1995 and clearly showed that this model was lacking in performance. The scale model was subsequently modified in an attempt to improve the tunnel performance. The modifications included: (a) a redesigned diffuser; (b) the addition of a collector; (c) the addition of a Nozzle-Diffuser; and (d) changes in the location of the vent air. Tests performed on the modified tunnel showed a marked improvement in performance, amounting to a nominal increase of pressure recovery in the diffuser from 34 percent to 54 percent. The results obtained in the tests have wider application: they may also be applied to other tunnels operating with an open test section, even those whose geometry is not necessarily similar to that of the model under consideration.

  7. Modified Ion-Acoustic Shock Waves and Double Layers in a Degenerate Electron-Positron-Ion Plasma in Presence of Heavy Negative Ions

    NASA Astrophysics Data System (ADS)

    Hossen, M. A.; Hossen, M. R.; Mamun, A. A.

    2014-12-01

    A general theory for the nonlinear propagation of one-dimensional modified ion-acoustic waves in an unmagnetized electron-positron-ion (e-p-i) degenerate plasma is investigated. This plasma system is assumed to contain relativistic electron and positron fluids, non-degenerate viscous positive ions, and negatively charged static heavy ions. The modified Burgers and Gardner equations have been derived by employing the reductive perturbation method and analyzed in order to identify the basic features (polarity, width, speed, etc.) of shock and double layer (DL) structures. It is observed that the basic features of these shock and DL structures obtained from this analysis are significantly different from those obtained from the analysis of the standard Gardner or Burgers equations. The implications of these results for space and interstellar compact objects (viz. non-rotating white dwarfs, neutron stars, etc.) are also briefly mentioned.
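    The abstract measures its results against the standard Burgers equation. As a minimal sketch of that baseline, assuming the generic normalized form ∂φ/∂τ + Aφ ∂φ/∂ξ = C ∂²φ/∂ξ² with placeholder coefficients A (nonlinear) and C (dissipative), the code below evaluates the well-known tanh shock solution and checks it numerically. The paper's modified Burgers and Gardner equations, and their plasma-dependent coefficients, are not reproduced here.

```python
import math


def burgers_shock(xi, tau, A, C, U0):
    """Travelling shock of phi_tau + A*phi*phi_xi = C*phi_xixi moving at speed U0."""
    phi_m = U0 / A        # the shock jumps from 2*phi_m (upstream) to 0
    delta = 2.0 * C / U0  # shock thickness
    return phi_m * (1.0 - math.tanh((xi - U0 * tau) / delta))


def burgers_residual(xi, tau, A, C, U0, h=1e-3):
    """Central-difference residual of the Burgers equation (should be ~0)."""
    def f(x, t):
        return burgers_shock(x, t, A, C, U0)

    phi = f(xi, tau)
    phi_t = (f(xi, tau + h) - f(xi, tau - h)) / (2 * h)
    phi_x = (f(xi + h, tau) - f(xi - h, tau)) / (2 * h)
    phi_xx = (f(xi + h, tau) - 2 * phi + f(xi - h, tau)) / h ** 2
    return phi_t + A * phi * phi_x - C * phi_xx


# Placeholder coefficients; in the paper they depend on the plasma parameters.
A, C, U0 = 1.5, 0.5, 0.2
print(max(abs(burgers_residual(x, 0.0, A, C, U0)) for x in range(-20, 21)))
```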

  8. Fifty years of progress in acoustic phonetics

    NASA Astrophysics Data System (ADS)

    Stevens, Kenneth N.

    2004-10-01

    Three events that occurred 50 or 60 years ago shaped the study of acoustic phonetics, and in the following few decades these events influenced research and applications in speech disorders, speech development, speech synthesis, speech recognition, and other subareas in speech communication. These events were: (1) the source-filter theory of speech production (Chiba and Kajiyama; Fant); (2) the development of the sound spectrograph and its interpretation (Potter, Kopp, and Green; Joos); and (3) the birth of research that related distinctive features to acoustic patterns (Jakobson, Fant, and Halle). Following these events there has been systematic exploration of the articulatory, acoustic, and perceptual bases of phonological categories, and some quantification of the sources of variability in the transformation of this phonological representation of speech into its acoustic manifestations. This effort has been enhanced by studies of how children acquire language in spite of this variability and by research on speech disorders. Gaps in our knowledge of this inherent variability in speech have limited the directions of applications such as synthesis and recognition of speech, and have led to the implementation of data-driven techniques rather than theoretical principles. Some examples of advances in our knowledge, and limitations of this knowledge, are reviewed.
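    In source-filter terms, the glottal source is shaped by the resonances of the vocal tract. As a worked illustration (not taken from the review), the textbook approximation of the tract as a uniform tube closed at the glottis and open at the lips places its resonances at odd quarter-wavelength frequencies, F_n = (2n − 1)c / (4L). The tract length of 17.5 cm and speed of sound of 350 m/s used below are conventional round values, assumed here.

```python
def uniform_tube_formants(length_m: float, n_formants: int = 3, c: float = 350.0) -> list[float]:
    """Resonances of a uniform tube closed at the glottis and open at the lips.

    Each resonance is an odd quarter-wavelength mode: F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]


# A 17.5 cm tract gives the classic neutral-vowel pattern of about 500, 1500, 2500 Hz.
print([round(f) for f in uniform_tube_formants(0.175)])
```

    A shorter tube (for example, a child's vocal tract) raises every resonance by the same factor, which is why formant scaling is a natural way to manipulate perceived talker size.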

  9. Age-related changes to spectral voice characteristics affect judgments of prosodic, segmental, and talker attributes for child and adult speech

    PubMed Central

    Dilley, Laura C.; Wieland, Elizabeth A.; Gamache, Jessica L.; McAuley, J. Devin; Redford, Melissa A.

    2013-01-01

    Purpose: As children mature, changes in voice spectral characteristics covary with changes in speech, language, and behavior. Spectral characteristics were manipulated to alter the perceived ages of talkers’ voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were associated with differences in judgments of prosodic, segmental, and talker attributes. Method: Speech was modified by lowering formants and fundamental frequency (for 5-year-old children’s utterances) or raising them (for adult caregivers’ utterances). Next, participants differing in awareness of the manipulation (Exp. 1a) or amount of speech-language training (Exp. 1b) made judgments of prosodic, segmental, and talker attributes. Exp. 2 investigated the effects of spectral modification on intelligibility. Finally, in Exp. 3 trained analysts used formal prosody coding to assess prosodic characteristics of spectrally modified and unmodified speech. Results: Differences in perceived age were associated with differences in ratings of speech rate, fluency, intelligibility, likeability, anxiety, cognitive impairment, and speech-language disorder/delay; effects of training and awareness of the manipulation on ratings were limited. There were no significant effects of the manipulation on intelligibility or formally coded prosody judgments. Conclusions: Age-related voice characteristics can greatly affect judgments of speech and talker characteristics, raising cautionary notes for developmental research and clinical work. PMID:23275414

  10. Applications for Subvocal Speech

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles; Betts, Bradley

    2007-01-01

    A research and development effort now underway is directed toward the use of subvocal speech for communication in settings in which (1) acoustic noise could interfere excessively with ordinary vocal communication and/or (2) acoustic silence or secrecy of communication is required. By "subvocal speech" is meant sub-audible electromyographic (EMG) signals, associated with speech, that are acquired from the surface of the larynx and lingual areas of the throat. Topics addressed in this effort include recognition of the subvocal EMG signals that represent specific original words or phrases; transformation (including encoding and/or enciphering) of the signals into forms that are less vulnerable to distortion, degradation, and/or interception; and reconstruction of the original words or phrases at the receiving end of a communication link. Potential applications include ordinary verbal communications among hazardous-material-cleanup workers in protective suits, workers in noisy environments, divers, and firefighters, and secret communications among law-enforcement officers and military personnel in combat and other confrontational situations.

  11. Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

    PubMed

    Larm, Petra; Hongisto, Valtteri

    2006-02-01

    During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). The room acoustics speech transmission index (RASTI), a simplification of the STI, was also considered. These quantities are all based on determining an apparent speech-to-noise ratio in selected frequency bands and summing the band values with a specific weighting. Data were needed on the possible differences between these methods arising from the calculation scheme and from the measuring equipment; their prediction accuracy was also of interest. Measurements were made in a laboratory with adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the choice of loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found to be acceptable if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse. PMID:16521772
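    For the STI, the band-wise scheme described above converts a measured modulation transfer value m into an apparent speech-to-noise ratio, clips it to ±15 dB, maps it to a transmission index between 0 and 1, and combines the bands with weights. A simplified sketch, assuming the modulation transfer values are already measured; it omits the level-dependent and auditory-masking corrections of the standardized procedure (IEC 60268-16), and the equal band weights are placeholders rather than the standardized ones.

```python
import math


def apparent_snr_db(m: float) -> float:
    """Apparent speech-to-noise ratio implied by a modulation transfer value m,
    clipped to the +/-15 dB range used by the STI."""
    m = min(max(m, 1e-9), 1.0 - 1e-9)  # keep the logarithm finite
    snr = 10.0 * math.log10(m / (1.0 - m))
    return max(-15.0, min(15.0, snr))


def transmission_index(mtf_by_band, band_weights):
    """Weighted sum over bands of the mean transmission index (SNR + 15) / 30.

    mtf_by_band: one list of modulation transfer values per frequency band.
    band_weights: one weight per band, assumed to sum to 1.
    """
    total = 0.0
    for m_values, weight in zip(mtf_by_band, band_weights):
        ti = [(apparent_snr_db(m) + 15.0) / 30.0 for m in m_values]
        total += weight * sum(ti) / len(ti)
    return total


# Seven octave bands (125 Hz to 8 kHz) with placeholder equal weights.
weights = [1.0 / 7] * 7
print(round(transmission_index([[0.5, 0.6, 0.7]] * 7, weights), 3))
```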

  12. Robust Speech Rate Estimation for Spontaneous Speech

    PubMed Central

    Wang, Dagen; Narayanan, Shrikanth S.

    2010-01-01

    In this paper, we propose a direct method for speech rate estimation from acoustic features without requiring any automatic speech transcription. We compare various spectral and temporal signal analysis and smoothing strategies to better characterize the underlying syllable structure from which speech rate is derived. The proposed algorithm extends the methods of spectral subband correlation by including temporal correlation and the use of prominent spectral subbands to improve the signal correlation essential for syllable detection. Furthermore, to address some of the practical robustness issues in previously proposed methods, we introduce some novel components into the algorithm, such as the use of pitch confidence for filtering spurious syllable envelope peaks, a magnifying window for tackling neighboring syllable smearing, and relative peak measure thresholds for pseudo-peak rejection. We also describe an automated approach for learning algorithm parameters from data, and find the optimal settings through Monte Carlo simulations and parameter sensitivity analysis. Final experimental evaluations are conducted on a portion of the Switchboard corpus for which manual phonetic segmentation and published results for direct comparison are available. The results show a correlation coefficient of 0.745 with respect to the ground truth based on manual segmentation. This result is about a 17% improvement over the current best single estimator and an 11% improvement over the multiestimator evaluated on the same Switchboard database. PMID:20428476
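    The general idea of direct speech rate estimation (counting syllable nuclei as prominent peaks of a smoothed energy envelope and dividing by duration) can be sketched as follows. This is a deliberately simplified stand-in, not the authors' algorithm: subband and temporal correlation, pitch-confidence filtering, and the magnifying window are omitted, and the relative threshold and minimum peak gap are hypothetical parameters.

```python
import math


def count_syllable_peaks(envelope, frame_rate_hz, rel_threshold=0.3, min_gap_s=0.1):
    """Count local maxima of a pre-smoothed energy envelope that exceed
    rel_threshold * max(envelope) and lie at least min_gap_s after the
    previously accepted peak."""
    if len(envelope) < 3:
        return 0
    floor = rel_threshold * max(envelope)
    min_gap = int(min_gap_s * frame_rate_hz)
    peaks, last_peak = 0, -min_gap
    for i in range(1, len(envelope) - 1):
        rising = envelope[i - 1] < envelope[i]
        peak_or_plateau = envelope[i] >= envelope[i + 1]
        if rising and peak_or_plateau and envelope[i] >= floor and i - last_peak >= min_gap:
            peaks += 1
            last_peak = i
    return peaks


def speech_rate(envelope, frame_rate_hz, **kwargs):
    """Estimated syllables per second over the whole envelope."""
    if not envelope:
        return 0.0
    duration_s = len(envelope) / frame_rate_hz
    return count_syllable_peaks(envelope, frame_rate_hz, **kwargs) / duration_s


# Synthetic 1 s envelope at 100 frames/s with four syllable-like bumps.
env = [max(0.0, math.sin(2 * math.pi * 4 * t / 100)) for t in range(100)]
print(speech_rate(env, 100))
```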

  13. Modeling of modified ion-acoustic shock waves in a relativistic electron degenerate multi-ion plasma for higher order nonlinearity

    NASA Astrophysics Data System (ADS)

    Hossen, M. R.; Hossen, M. A.; Sultana, S.; Mamun, A. A.

    2015-05-01

    The nonlinear propagation of modified ion-acoustic (mIA) shock waves in a relativistic degenerate plasma (containing inertial viscous positive and negative ion fluids, relativistic electron fluids, and negatively charged immobile heavy ions) has been investigated theoretically. The modified Burgers (mB) and further modified Burgers (FmB) equations have been derived using the reductive perturbation technique. The solutions of both mB and FmB equations have been numerically analyzed to characterize the basic features of mIA shock waves. The basic properties (speed, amplitude, width, etc.) of these electrostatic shock waves are found to be significantly modified by the effects of negatively charged static heavy ions and the plasma particle number densities. It is found that the properties of these shock waves obtained from this analysis are significantly different from those obtained from the analysis of the standard Burgers equation. The implications of our results for space and interstellar compact objects such as non-rotating white dwarfs, neutron stars, etc. are briefly discussed.
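    The abstract does not state the mB equation. Assuming the generic form ∂φ/∂τ + Aφ² ∂φ/∂ξ = C ∂²φ/∂ξ² (quadratic nonlinearity; A and C stand in for the plasma-dependent coefficients derived in the paper), integrating once for a wave moving at speed U0 and solving the resulting Bernoulli equation gives φ² = φ_m[1 − tanh((ξ − U0τ)/Δ)] with φ_m = 3U0/(2A) and Δ = C/U0. The sketch below evaluates that profile and checks it numerically; it is an illustration under these assumptions, not the authors' solution.

```python
import math


def modified_burgers_shock(xi, tau, A, C, U0):
    """Shock of phi_tau + A*phi**2*phi_xi = C*phi_xixi moving at speed U0.

    phi**2 = phi_m * (1 - tanh(zeta / delta)), with zeta = xi - U0*tau,
    phi_m = 3*U0 / (2*A) and delta = C / U0.
    """
    phi_m = 3.0 * U0 / (2.0 * A)
    delta = C / U0
    return math.sqrt(phi_m * (1.0 - math.tanh((xi - U0 * tau) / delta)))


def modified_burgers_residual(xi, tau, A, C, U0, h=1e-3):
    """Central-difference residual of the assumed mB equation (should be ~0)."""
    def f(x, t):
        return modified_burgers_shock(x, t, A, C, U0)

    phi = f(xi, tau)
    phi_t = (f(xi, tau + h) - f(xi, tau - h)) / (2 * h)
    phi_x = (f(xi + h, tau) - f(xi - h, tau)) / (2 * h)
    phi_xx = (f(xi + h, tau) - 2 * phi + f(xi - h, tau)) / h ** 2
    return phi_t + A * phi ** 2 * phi_x - C * phi_xx


# Placeholder coefficients; the upstream amplitude should equal sqrt(3*U0/A).
A, C, U0 = 1.0, 0.4, 0.2
print(modified_burgers_shock(-50.0, 0.0, A, C, U0), math.sqrt(3 * U0 / A))
```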

  14. On the role of ion-temperature anisotropy on the propagation of shear-modified ion-acoustic waves

    NASA Astrophysics Data System (ADS)

    Koepke, M. E.; Teodorescu, C.; Reynolds, E. W.

    2002-11-01

    Oblique ion-acoustic waves, excited by the combination of magnetic-field-aligned (parallel) electron drift and sheared parallel ion flow, are investigated in magnetized laboratory plasma that is characterized by ion-temperature anisotropy. Direct measurements of the parallel and perpendicular ion temperatures, parallel and perpendicular ion drift velocities, electron temperature and parallel electron drift velocity, parallel and perpendicular wavevector components, and mode frequency and growth rate are used to document an observed correlation between ion-temperature anisotropy and wave-propagation angle. Experimental measurements show that anisotropy significantly influences the propagation angle. These results support the ion-acoustic wave interpretation of broadband waves in the auroral energization region, where shear and anisotropy are known to exist, and may have ramifications for many space plasmas in which anisotropy exists in the electron or ion temperature.
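    Both quantities in the reported correlation follow directly from the measurements listed above. A minimal sketch, assuming the propagation angle is measured from the magnetic-field direction and the anisotropy is expressed as the ratio T⊥/T∥; the input values are hypothetical, not data from the experiment.

```python
import math


def propagation_angle_deg(k_parallel: float, k_perp: float) -> float:
    """Angle between the wavevector and the magnetic field, in degrees."""
    return math.degrees(math.atan2(abs(k_perp), k_parallel))


def ion_temperature_anisotropy(t_perp: float, t_parallel: float) -> float:
    """Ratio of perpendicular to parallel ion temperature (1.0 means isotropic)."""
    return t_perp / t_parallel


# Hypothetical wavevector components (rad/m) and ion temperatures (eV).
print(round(propagation_angle_deg(k_parallel=20.0, k_perp=150.0), 1))
print(ion_temperature_anisotropy(t_perp=0.3, t_parallel=0.15))
```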

  15. Hearing speech in music.

    PubMed

    Ekström, Seth-Reino; Borg, Erik

    2011-01-01

    The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings. PMID:21768731

  16. Virtual acoustics displays

    NASA Technical Reports Server (NTRS)

    Wenzel, Elizabeth M.; Fisher, Scott S.; Stone, Philip K.; Foster, Scott H.

    1991-01-01

    The real-time acoustic display capabilities developed for the Virtual Environment Workstation (VIEW) Project at NASA-Ames are described. The acoustic display is capable of generating localized acoustic cues in real time over headphones. An auditory symbology, a related collection of representational auditory 'objects' or 'icons', can be designed using ACE (Auditory Cue Editor), which links both discrete and continuously varying acoustic parameters with information or events in the display. During a given display scenario, the symbology can be dynamically coordinated in real time with 3-D visual objects, speech, and gestural displays. The types of displays feasible with the system range from simple warnings and alarms to the acoustic representation of multidimensional data or events.

  17. Prediction and constraint in audiovisual speech perception.

    PubMed

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration

  18. Acoustic Change Complex: Clinical Implications.

    PubMed

    Kim, Jae-Ryong

    2015-12-01

    The acoustic change complex (ACC) is a cortical auditory evoked potential elicited in response to a change in an ongoing sound. The characteristics and potential clinical implications of the ACC are reviewed in this article. The P1-N1-P2 recorded from the auditory cortex following presentation of an acoustic stimulus is believed to reflect the neural encoding of a sound signal, but this provides no information regarding sound discrimination. However, the neural processing underlying behavioral discrimination capacity can be measured by modifying the traditional methodology for recording the P1-N1-P2. When obtained in response to an acoustic change within an ongoing sound, the resulting waveform is referred to as the ACC. When elicited, the ACC indicates that the brain has detected changes within a sound and the patient has the neural capacity to discriminate the sounds. In fact, results of several studies have shown that the ACC amplitude increases with increasing magnitude of acoustic changes in intensity, spectrum, and gap duration. In addition, the ACC can be reliably recorded with good test-retest reliability not only from listeners with normal hearing but also from individuals with hearing loss, hearing aids, and cochlear implants. The ACC can be obtained even in the absence of attention, and requires relatively few stimulus presentations to record a response with a good signal-to-noise ratio. Most importantly, the ACC shows reasonable agreement with behavioral measures. Therefore, these findings suggest that the ACC might represent a promising tool for the objective clinical evaluation of auditory discrimination and/or speech perception capacity. PMID:26771009

  19. Acoustic Change Complex: Clinical Implications

    PubMed Central

    2015-01-01

    The acoustic change complex (ACC) is a cortical auditory evoked potential elicited in response to a change in an ongoing sound. The characteristics and potential clinical implications of the ACC are reviewed in this article. The P1-N1-P2 recorded from the auditory cortex following presentation of an acoustic stimulus is believed to reflect the neural encoding of a sound signal, but this provides no information regarding sound discrimination. However, the neural processing underlying behavioral discrimination capacity can be measured by modifying the traditional methodology for recording the P1-N1-P2. When obtained in response to an acoustic change within an ongoing sound, the resulting waveform is referred to as the ACC. When elicited, the ACC indicates that the brain has detected changes within a sound and the patient has the neural capacity to discriminate the sounds. In fact, results of several studies have shown that the ACC amplitude increases with increasing magnitude of acoustic changes in intensity, spectrum, and gap duration. In addition, the ACC can be reliably recorded with good test-retest reliability not only from listeners with normal hearing but also from individuals with hearing loss, hearing aids, and cochlear implants. The ACC can be obtained even in the absence of attention, and requires relatively few stimulus presentations to record a response with a good signal-to-noise ratio. Most importantly, the ACC shows reasonable agreement with behavioral measures. Therefore, these findings suggest that the ACC might represent a promising tool for the objective clinical evaluation of auditory discrimination and/or speech perception capacity. PMID:26771009

  20. Headphone localization of speech

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.

    1993-01-01

    Three-dimensional acoustic display systems have recently been developed that synthesize virtual sound sources over headphones based on filtering by head-related transfer functions (HRTFs), the direction-dependent spectral changes caused primarily by the pinnae. In this study, 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with nonindividualized HRTFs. About half of the subjects 'pulled' their judgments toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgments; 15 to 46 percent of stimuli were heard inside the head, with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by nonindividualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.
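    Filtering by HRTFs is, in the time domain, a convolution of the source signal with a pair of head-related impulse responses (HRIRs), one per ear. A minimal sketch of that rendering step, assuming the HRIRs are available as sample lists at the signal's sampling rate; the toy HRIRs below stand in for a pure interaural time and level difference and are not measured (individualized or nonindividualized) HRTFs.

```python
def convolve(signal, kernel):
    """Direct-form linear convolution; output has len(signal) + len(kernel) - 1 samples."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono source with a left/right HRIR pair for headphone playback."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy HRIRs for a source on the listener's right: the far (left) ear receives
# the sound two samples later and at half the amplitude.
left, right = render_binaural([1.0, 0.5], hrir_left=[0.0, 0.0, 0.5], hrir_right=[1.0])
print(left, right)
```

    Direct convolution keeps the sketch short; practical real-time displays typically use FFT-based (block) convolution instead, because measured HRIRs run to hundreds of taps.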

  1. Auditory-Perceptual Learning Improves Speech Motor Adaptation in Children

    PubMed Central

    Shiller, Douglas M.; Rochon, Marie-Lyne

    2015-01-01

    Auditory feedback plays an important role in children’s speech development by providing the child with information about speech outcomes that is used to learn and fine-tune speech motor plans. The use of auditory feedback in speech motor learning has been extensively studied in adults by examining oral motor responses to manipulations of auditory feedback during speech production. Children are also capable of adapting speech motor patterns to perceived changes in auditory feedback; however, it is not known whether their capacity for motor learning is limited by immature auditory-perceptual abilities. Here, the link between speech perceptual ability and the capacity for motor learning was explored in two groups of 5–7-year-old children who underwent a period of auditory perceptual training followed by tests of speech motor adaptation to altered auditory feedback. One group received perceptual training on a speech acoustic property relevant to the motor task while a control group received perceptual training on an irrelevant speech contrast. Learned perceptual improvements led to an enhancement in speech motor adaptation (proportional to the perceptual change) only for the experimental group. The results indicate that children’s ability to perceive relevant speech acoustic properties has a direct influence on their capacity for sensory-based speech motor adaptation. PMID:24842067

  2. Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding

    SciTech Connect

    Hogden, J.

    1996-11-05

    The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation may decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.

  3. An Acoustic Study of the Relationships among Neurologic Disease, Dysarthria Type, and Severity of Dysarthria

    ERIC Educational Resources Information Center

    Kim, Yunjung; Kent, Raymond D.; Weismer, Gary

    2011-01-01

    Purpose: This study examined acoustic predictors of speech intelligibility in speakers with several types of dysarthria secondary to different diseases and conducted classification analysis solely by acoustic measures according to 3 variables (disease, speech severity, and dysarthria type). Method: Speech recordings from 107 speakers with…

  4. Articulatory-to-Acoustic Relations in Response to Speaking Rate and Loudness Manipulations

    ERIC Educational Resources Information Center

    Mefferd, Antje S.; Green, Jordan R.

    2010-01-01

    Purpose: In this investigation, the authors determined the strength of association between tongue kinematic and speech acoustics changes in response to speaking rate and loudness manipulations. Performance changes in the kinematic and acoustic domains were measured using two aspects of speech production presumably affecting speech clarity:…

  5. Speech Communication.

    ERIC Educational Resources Information Center

    Anderson, Betty

    The communications approach to teaching speech to high school students views speech as the study of the communication process in order to develop an awareness of and a sensitivity to the variables that affect human interaction. In using this approach the student is encouraged to try out as many types of messages using as many techniques and…

  6. Speech Aids

    NASA Technical Reports Server (NTRS)

    1987-01-01

    Designed to assist deaf and hearing-impaired persons in achieving better speech, Resnick Worldwide Inc.'s device provides a visual means of cuing the deaf as a speech-improvement measure. This is done by electronically processing the subjects' sounds and comparing them with optimum values, which are displayed for comparison.

  7. Symbolic Speech

    ERIC Educational Resources Information Center

    Podgor, Ellen S.

    1976-01-01

    The concept of symbolic speech emanates from the 1967 case of United States v. O'Brien. These discussions of flag desecration, grooming and dress codes, nude entertainment, buttons and badges, and musical expression show that the courts place symbolic speech in different strata from verbal communication. (LBH)

  8. Speech coding

    SciTech Connect

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage that the speech signal was corrupted by noise, cross-talk, and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. Digital transmission, on the other hand, is relatively immune to noise, cross-talk, and distortion, primarily because the digital signal can be faithfully regenerated at each repeater purely on the basis of a binary decision. Hence the end-to-end performance of a digital link becomes essentially independent of the length and operating frequency bands of the link, and from a transmission point of view digital transmission has been the preferred approach because of its higher immunity to noise. The need to carry digital speech also became extremely important from a service provision point of view. Modern requirements have introduced the need for robust, flexible, and secure services that can carry a multitude of signal types (such as voice, data, and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, which in turn required speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized from the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the
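    Waveform coders operate sample by sample. As one well-known example that the abstract itself does not discuss, μ-law companding (μ = 255, as in ITU-T G.711) compresses amplitudes logarithmically before uniform quantization, so quiet samples keep more precision. A minimal sketch of the continuous compression and expansion curves; G.711 itself uses a segmented piecewise-linear approximation and a specific 8-bit code layout, both omitted here.

```python
import math

MU = 255.0  # mu-law parameter used in North American and Japanese telephony


def mu_law_compress(x: float) -> float:
    """Logarithmic compression of a sample x in [-1, 1] onto [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize(y: float, bits: int = 8) -> float:
    """Uniform mid-rise quantizer on [-1, 1] with 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(int((y + 1.0) / step), levels - 1)
    return -1.0 + (index + 0.5) * step


# A quiet sample: companding before 8-bit quantization keeps far more precision.
x = 0.001
print(abs(quantize(x) - x), abs(mu_law_expand(quantize(mu_law_compress(x))) - x))
```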

  9. Single-shot analytical assay based on graphene-oxide-modified surface acoustic wave biosensor for detection of single-nucleotide polymorphisms.

    PubMed

    Liu, Xiang; Wang, Jia-Ying; Mao, Xiao-Bing; Ning, Yong; Zhang, Guo-Jun

    2015-09-15

    The combination of a surface acoustic wave (SAW) biosensor with graphene oxide (GO) offers a promising approach to detecting DNA mutations. The GO-modified SAW biosensor was prepared by conjugating GO onto the SAW chip surface via electrostatic interaction. The probe was then immobilized on the GO surface, and DNA mutations were detected by hybridization. Hybridization with different targets yields different mass and conformational changes on the chip surface, producing distinct SAW signals in real time. A total of 137 clinical samples were tested in parallel by the single-shot analytical assay based on the GO-modified SAW biosensor and by direct sequencing. The diagnostic performance (both sensitivity and specificity) of the assay was evaluated with direct sequencing as the reference method. The phase-shift values of the three genotypes in the 137 clinical samples differed significantly (p < 0.001). Diagnostic sensitivity and specificity were 100% and 88.6%, respectively, for identifying the CT and CC genotypes, and 98.0% and 96.2%, respectively, for identifying the CT and TT genotypes. The single-shot analytical assay based on the GO-modified SAW biosensor could be a useful tool for identifying CYP2D6*10 polymorphisms in the clinical practice of personalized medicine. PMID:26316457
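    For readers less familiar with the reported metrics, sensitivity and specificity follow directly from counts against the reference method (here, direct sequencing). The sketch below assumes that, in each pairwise genotype comparison, one genotype is treated as the "positive" class; the function name is hypothetical and the counts in the example are toy numbers, not the study's data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) as fractions.

    tp/fn: reference-positive samples the assay called positive/negative.
    tn/fp: reference-negative samples the assay called negative/positive.
    """
    sensitivity = tp / (tp + fn)  # fraction of true positives detected
    specificity = tn / (tn + fp)  # fraction of true negatives correctly rejected
    return sensitivity, specificity


# Toy counts for illustration only (not the study's data).
sens, spec = sensitivity_specificity(tp=9, fn=1, tn=18, fp=2)
print(f"{sens:.1%} {spec:.1%}")  # 90.0% 90.0%
```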

  10. Sound scattering from rough bubbly ocean surface based on modified sea surface acoustic simulator and consideration of various incident angles and sub-surface bubbles' radii

    NASA Astrophysics Data System (ADS)

    Bolghasi, Alireza; Ghadimi, Parviz; Chekab, Mohammad A. Feizi

    2016-08-01

    The aim of the present study is to improve the capabilities and precision of a recently introduced Sea Surface Acoustic Simulator (SSAS) based on an optimized Helmholtz-Kirchhoff-Fresnel (HKF) method. The improved simulator, hereafter called the Modified SSAS (MSSAS), determines sound scattering from the sea surface and incorporates an extended Hall-Novarini model and the optimized HKF method. The extended Hall-Novarini model accounts for the effects of sub-surface bubbles over a wider range of bubble radii than the previous SSAS version. Furthermore, MSSAS can perform three-dimensional simulations of sound scattered from the rough bubbly sea surface with smaller errors, as measured against the Critical Sea Tests (CST) experiments. It also provides scattered pressure levels from the rough bubbly sea surface for various incident angles of sound. Wind speed, frequency, incident angle, and source pressure level are the inputs, and scattered pressure levels and scattering coefficients are the outputs. Finally, parametric studies on wind speed, frequency, and incident angle show that MSSAS is capable of simulating sound scattering from the rough bubbly sea surface in accordance with the scattering mechanisms identified by Ogden and Erskine. It is therefore concluded that MSSAS is valid for both scattering mechanisms, and for the transition region between them, as defined by Ogden and Erskine.

  11. Improving robustness of speech recognition systems

    NASA Astrophysics Data System (ADS)

    Mitra, Vikramjit

    2010-11-01

    Current Automatic Speech Recognition (ASR) systems fall well short of human speech recognition performance because they lack robustness against speech variability and noise contamination. The goal of this dissertation is to investigate these critical robustness issues, propose different ways to address them, and finally present an ASR architecture based on these robustness criteria. Acoustic variations adversely affect the performance of current phone-based ASR systems, in which speech is modeled as 'beads on a string', the beads being the individual phone units. While phone units are distinct in the cognitive domain, they vary in the physical domain owing to a combination of factors including speech style and speaking rate; this phenomenon is commonly known as 'coarticulation'. Traditional ASR systems address such coarticulatory variation by using context-dependent phone units such as triphones. Articulatory phonology accounts for coarticulatory variation by modeling speech as a constellation of constricting actions known as articulatory gestures. In such a framework, speech variations such as coarticulation and lenition are accounted for by gestural overlap in time and gestural reduction in space. To realize a gesture-based ASR system, articulatory gestures have to be inferred from the acoustic signal. In the first stage of this research, a study using synthetically generated speech provided a proof of concept that articulatory gestures can indeed be recognized from the speech signal. It was observed that using vocal tract constriction trajectories (TVs) as an intermediate representation facilitated gesture recognition from the speech signal. Because no natural speech database currently contains articulatory gesture annotation, an automated iterative time-warping architecture is proposed that can annotate any natural speech database with articulatory gestures and TVs. Two natural
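    The triphone idea mentioned above can be sketched by rewriting a phone string into context-dependent labels. The `left-center+right` label format follows the HTK convention; the `sil` padding at utterance boundaries and the function name are assumptions made for illustration.

```python
def to_triphones(phones, boundary="sil"):
    """Expand a phone sequence into context-dependent triphone labels.

    Each phone is labeled with its left and right neighbors in the
    HTK-style "left-center+right" form; the utterance is padded with a
    hypothetical `boundary` symbol so the first and last phones have context.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


print(to_triphones(["k", "ae", "t"]))
# ['sil-k+ae', 'k-ae+t', 'ae-t+sil']
```

    The same center phone /ae/ gets a different model in "k-ae+t" and "b-ae+d", which is how triphones absorb coarticulatory variation without an explicit articulatory representation.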

  12. Critique: auditory form and gestural topology in the perception of speech.

    PubMed

    Remez, R E

    1996-03-01

    Some influential accounts of speech perception have asserted that the goal of perception is to recover the articulatory gestures that create the acoustic signal, while others have proposed that speech perception proceeds by a method of acoustic categorization of signal elements. These accounts have been frustrated by difficulties in identifying a set of primitive articulatory constituents underlying speech production, and a set of primitive acoustic-auditory elements underlying speech perception. An argument by Lindblom favors an account of production and perception based on the auditory form of speech and its cognitive elaboration, rejecting the aim of defining a set of articulatory primitives by appealing to theoretical principle, while recognizing the empirical difficulty of identifying a set of acoustic or auditory primitives. An examination of this thesis found opportunities to defend some of its conclusions with independent evidence, but favors a characterization of the constituents of speech perception as linguistic rather than as articulatory or acoustic. PMID:8964930

  13. Acoustic Differences between Humorous and Sincere Communicative Intentions

    ERIC Educational Resources Information Center

    Hoicka, Elena; Gattis, Merideth

    2012-01-01

    Previous studies indicate that the acoustic features of speech discriminate between positive and negative communicative intentions, such as approval and prohibition. Two studies investigated whether acoustic features of speech can discriminate between two positive communicative intentions: humour and sweet-sincerity, where sweet-sincerity involved…

  14. Effect of trapped electron on the dust ion acoustic waves in dusty plasma using time fractional modified Korteweg-de Vries equation

    SciTech Connect

    Nazari-Golshan, A.; Nourazar, S. S.

    2013-10-15

    The time fractional modified Korteweg-de Vries (TFMKdV) equation is solved to study the nonlinear propagation of small but finite amplitude dust ion-acoustic (DIA) solitary waves in an unmagnetized dusty plasma with trapped electrons. The plasma is composed of a cold ion fluid, stationary dust grains, and hot electrons obeying a trapped electron distribution. The TFMKdV equation is derived using the semi-inverse and Agrawal's methods and is then solved by the Laplace Adomian decomposition method. Our results show that the amplitude of the DIA solitary waves increases with the time fractional order β, the wave velocity v₀, and the population of the background free electrons λ. The opposite holds for the deviation-from-isothermality parameter b, in agreement with previously obtained results.

  15. Research in continuous speech recognition

    NASA Astrophysics Data System (ADS)

    Schwartz, R. M.; Chow, Y. L.; Makhoul, J.

    1983-12-01

    This annual report describes the work performed during the past year in an ongoing effort to design and implement a system that performs phonetic recognition of continuous speech. The general approach is to develop a Hidden Markov Model (HMM) of speech parameter movements that can be used to distinguish among the different phonemes. The resulting phoneme models incorporate the contextual effects of neighboring phonemes. One main aspect of this research is the incorporation of both spectral parameters and acoustic-phonetic features into the HMM formalism.
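    As a rough illustration of how HMM phoneme models are decoded, the sketch below implements standard Viterbi decoding for a discrete-output HMM. The report models continuous spectral parameters and features, so the discrete emission tables, the two-state left-to-right model, and all probabilities here are simplifying assumptions, not the report's models.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best_log_prob, best_state_path) for a discrete-output HMM.

    start_p[s], trans_p[s][t] and emit_p[s][o] are probabilities; the
    recursion runs in the log domain to avoid numerical underflow.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # delta[s]: best log-probability of any state path ending in s
    delta = {s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}
    backptrs = []
    for o in obs[1:]:
        prev = delta
        delta, ptr = {}, {}
        for t in states:
            best = max(states, key=lambda s: prev[s] + log(trans_p[s][t]))
            ptr[t] = best
            delta[t] = prev[best] + log(trans_p[best][t]) + log(emit_p[t][o])
        backptrs.append(ptr)
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return delta[last], path[::-1]


# Hypothetical two-state left-to-right model over symbols "a" and "b".
states = ["s1", "s2"]
start = {"s1": 1.0, "s2": 0.0}
trans = {"s1": {"s1": 0.6, "s2": 0.4}, "s2": {"s1": 0.0, "s2": 1.0}}
emit = {"s1": {"a": 0.9, "b": 0.1}, "s2": {"a": 0.2, "b": 0.8}}
log_p, path = viterbi(["a", "a", "b", "b"], states, start, trans, emit)
print(path)  # ['s1', 's1', 's2', 's2']
```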

  16. Clinical and acoustical variability in hypokinetic dysarthria

    SciTech Connect

    Metter, E.J.; Hanson, W.R.

    1986-10-01

    Ten male patients with parkinsonism secondary to Parkinson's disease or progressive supranuclear palsy had clinical neurological, speech, and acoustical speech evaluations. In addition, seven of the patients were evaluated by x-ray computed tomography (CT) and (F-18)-fluorodeoxyglucose (FDG) positron emission tomography (PET). Extensive variability of speech features, both clinical and acoustical, was found and appeared to be independent of the severity of any parkinsonian sign, CT, or FDG PET. In addition, the variability of each measured speech feature bore little relationship to that of the others. What appeared to be important for the appearance of abnormal acoustic measures was the overall severity of the dysarthria. These observations suggest that a better understanding of hypokinetic dysarthria may result from more extensive examination of the variability between patients. Emphasizing a specific feature, such as rapid speaking rate, in characterizing hypokinetic dysarthria focuses on a single and inconstant finding in a complex speech pattern.

  17. Phrase-level speech simulation with an airway modulation model of speech production

    PubMed Central

    Story, Brad H.

    2012-01-01

    Artificial talkers and speech synthesis systems have long been used as a means of understanding both speech production and speech perception. The development of an airway modulation model is described that simulates the time-varying changes of the glottis and vocal tract, as well as acoustic wave propagation, during speech production. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by humans and how that sound is perceived by a listener. The primary components of the model are introduced, and the simulation of words and phrases is demonstrated. PMID:23503742

  18. Improving the speech intelligibility in classrooms

    NASA Astrophysics Data System (ADS)

    Lam, Choi Ling Coriolanus

    One of the major acoustical concerns in classrooms is the establishment of effective verbal communication between teachers and students. Non-optimal acoustical conditions, resulting in reduced verbal communication, can cause two main problems. First, they can reduce learning efficiency. Second, they can cause fatigue, stress, vocal strain and health problems, such as headaches and sore throats, among teachers who are forced to compensate for poor acoustical conditions by raising their voices. In addition, inadequate acoustical conditions can lead to the use of public address systems, and improper use of such amplifiers or loudspeakers can impair students' hearing. Because poor classroom acoustics impair children's learning, their social costs are large. This invisible problem has far-reaching implications for learning, but it is easily solved. Much research has been carried out, and its findings on classroom acoustics have been accurately and concisely summarized. However, a number of challenging questions remain unanswered. Most objective indices of speech intelligibility are essentially based on studies of Western languages. Although several studies of tonal languages such as Mandarin have been conducted, there are far fewer on Cantonese. In this research, measurements were made in unoccupied rooms to investigate the acoustical parameters and characteristics of the classrooms. Speech intelligibility tests based on English, Mandarin and Cantonese, together with a survey, were carried out with students aged 5 to 22 years. The study aims to investigate the differences in classroom intelligibility between English, Mandarin and Cantonese in Hong Kong. The relationship between the speech transmission index (STI) and Phonetically Balanced (PB) word scores will be further developed. Together with the developed empirical relationship between the speech intelligibility in classrooms and the variations
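    The STI mentioned above can be roughly approximated from a room's reverberation time. This sketch assumes the Houtgast-Steeneken modulation transfer function for an exponentially decaying sound field with no background noise, a single reverberation time for all octave bands, and equal band weighting. It omits the band weights and redundancy corrections of IEC 60268-16, so it is only an illustration, not a substitute for a measured STI; the function names are hypothetical.

```python
import math

# The 14 modulation frequencies (Hz) used by the STI method (1/3-octave steps).
MOD_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5, 3.15,
             4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def mtf_reverb(f_mod, rt60):
    """Modulation transfer of an ideal exponentially decaying room, no noise."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * rt60 / 13.8) ** 2)


def simplified_sti(rt60):
    """STI from one broadband reverberation time (s), equal band weights."""
    tis = []
    for f in MOD_FREQS:
        m = min(mtf_reverb(f, rt60), 1.0 - 1e-12)  # avoid division by zero at rt60 = 0
        snr = 10.0 * math.log10(m / (1.0 - m))     # apparent SNR in dB
        snr = max(-15.0, min(15.0, snr))           # clip to the +/-15 dB range
        tis.append((snr + 15.0) / 30.0)            # transmission index in [0, 1]
    return sum(tis) / len(tis)


for rt in (0.4, 0.8, 1.6):
    print(rt, round(simplified_sti(rt), 2))
```

    With all bands identical, the band weighting drops out and the index reduces to the mean transmission index over modulation frequencies; longer reverberation lowers it.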

  19. Some articulatory details of emotional speech

    NASA Astrophysics Data System (ADS)

    Lee, Sungbok; Yildirim, Serdar; Bulut, Murtaza; Kazemzadeh, Abe; Narayanan, Shrikanth

    2005-09-01

    Differences in speech articulation among four emotion types (neutral, anger, sadness, and happiness) are investigated by analyzing tongue tip, jaw, and lip movement data collected from one male and one female speaker of American English. The data were collected using an electromagnetic articulography (EMA) system while the subjects produced simulated emotional speech. Pitch, root-mean-square (rms) energy and the first three formants were estimated for vowel segments. For both speakers, angry speech exhibited the largest rms energy and the largest articulatory activity in terms of displacement range and movement speed. Happy speech is characterized by the largest pitch variability. It has higher rms energy than neutral speech, but its articulatory activity is comparable to, or less than, that of neutral speech. That is, happy speech is more prominent in voicing activity than in articulation. Sad speech exhibits the longest sentence duration and lower rms energy; however, its articulatory activity is no less than that of neutral speech. Interestingly, for the male speaker, articulation of vowels in sad speech is consistently more peripheral (i.e., more forward displacements) than in the other emotions. However, this does not hold for the female speaker. These and other results will be discussed in detail with the associated acoustics and perceived emotional qualities. [Work supported by NIH.]
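    Of the measures listed, rms energy and pitch are straightforward to sketch for a vowel segment. The functions below are hypothetical illustrations: rms level in dB relative to full scale, and a crude autocorrelation pitch estimate searched between an assumed 75 and 400 Hz. The study's actual estimators are not specified in the abstract.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of a frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def rms_db(frame, ref=1.0):
    """RMS level in dB relative to `ref` (e.g. full scale = 1.0)."""
    return 20.0 * math.log10(rms(frame) / ref)


def autocorr_pitch(frame, fs, fmin=75.0, fmax=400.0):
    """Crude F0 estimate: lag of the autocorrelation peak in [fs/fmax, fs/fmin]."""
    n = len(frame)
    lo, hi = int(fs / fmax), min(int(fs / fmin), n - 1)

    def r(k):
        # Unnormalized autocorrelation at lag k.
        return sum(frame[i] * frame[i + k] for i in range(n - k))

    best_lag = max(range(lo, hi + 1), key=r)
    return fs / best_lag


# Synthetic 200 Hz "vowel" stand-in: 100 ms of a sine at 8 kHz sampling.
fs = 8000
x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
print(round(rms_db(x), 2), autocorr_pitch(x, fs))
```

    Real vowel segments need windowing, voicing decisions and octave-error checks; pitch variability would then be summarized over the per-frame F0 values.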

  20. Development of modified vibration test criteria for qualifying space vehicle components. [subjected to broadband random acoustic excitation]

    NASA Technical Reports Server (NTRS)

    Chang, K. Y.; Kao, G. C.

    1974-01-01

    Simplified methods are described to estimate the test criteria of primary structures at component attachment points subjected to broadband random acoustic excitations. The current method uses a constant smeared component-mass attenuation factor across the frequency range of interest. In the developed method, the attenuation factor is based on a frequency-dependent ratio of the mechanical impedances of the component and primary structures. The procedures used to predict the structural responses are considered the present state of the art and provide satisfactory prediction results. Example problems are used to illustrate the application procedures of the two methods and to compare the significant differences between them. It was found that the lower test criteria obtained by the impedance ratio method result from accounting for component/primary structure interaction.
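    The contrast between a constant smeared-mass factor and an impedance-ratio factor can be sketched at a single frequency. The forms below, |Zs/(Zs + Zc)| with the component treated as a rigid mass and the primary structure as a damped single-degree-of-freedom attachment point, are textbook assumptions and not necessarily the report's exact procedure; all names and numerical values are hypothetical.

```python
import math


def z_mass(m, omega):
    """Impedance (force/velocity) of a rigid mass m at angular frequency omega."""
    return 1j * omega * m


def z_sdof(m, k, c, omega):
    """Driving-point impedance of mass m grounded through spring k and damper c."""
    return c + 1j * omega * m + k / (1j * omega)


def attenuation_impedance(z_primary, z_component):
    """Frequency-dependent attenuation factor |Zs / (Zs + Zc)|."""
    return abs(z_primary / (z_primary + z_component))


def attenuation_smeared_mass(m_primary, m_component):
    """Constant smeared-mass attenuation factor Ms / (Ms + Mc)."""
    return m_primary / (m_primary + m_component)


# Hypothetical values: 10 kg primary, 2 kg component, primary resonance at 50 Hz.
m_s, m_c = 10.0, 2.0
k_s = m_s * (2 * math.pi * 50.0) ** 2
c_s = 50.0
for f in (20.0, 50.0, 200.0):
    w = 2 * math.pi * f
    print(f,
          round(attenuation_impedance(z_sdof(m_s, k_s, c_s, w), z_mass(m_c, w)), 3),
          round(attenuation_smeared_mass(m_s, m_c), 3))
```

    When the primary structure behaves as a pure mass, the impedance ratio reduces to the smeared-mass factor; near a primary-structure resonance its impedance is small, so the impedance-ratio attenuation is much stronger, which is one way the two methods can diverge.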

  1. Measures to Evaluate the Effects of DBS on Speech Production

    PubMed Central

    Weismer, Gary; Yunusova, Yana; Bunton, Kate

    2011-01-01

    The purpose of this paper is to review and evaluate measures of speech production that could be used to document effects of Deep Brain Stimulation (DBS) on speech performance, especially in persons with Parkinson disease (PD). A small set of evaluative criteria for these measures is presented first, followed by consideration of several speech physiology and speech acoustic measures that have been studied frequently and reported on in the literature on normal speech production and on speech production affected by neuromotor disorders (dysarthria). Each measure is reviewed and evaluated against the evaluative criteria. Embedded within this review and evaluation is a presentation of new data relating speech motions to speech intelligibility measures in speakers with PD, amyotrophic lateral sclerosis (ALS), and control speakers (CS). These data are used to support the conclusion that, at present, the slope of second formant transitions (F2 slope), an acoustic measure, is well suited for making inferences about speech motion and for predicting speech intelligibility. The use of other measures should not be ruled out, however, and we encourage further development of evaluative criteria for speech measures designed to probe the effects of DBS or any treatment with potential effects on speech production and communication skills. PMID:24932066
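    The abstract does not specify how F2 slope is computed. One common approach, assumed here, fits a least-squares line to the F2 trajectory over the transition and reports its slope in Hz/ms; how the transition's onset and offset are chosen is left to the caller, and the function name is hypothetical.

```python
def f2_slope(times_ms, f2_hz):
    """Least-squares slope of an F2 trajectory, in Hz per ms."""
    n = len(times_ms)
    t_mean = sum(times_ms) / n
    f_mean = sum(f2_hz) / n
    num = sum((t - t_mean) * (f - f_mean) for t, f in zip(times_ms, f2_hz))
    den = sum((t - t_mean) ** 2 for t in times_ms)
    return num / den


# Hypothetical rising F2 transition sampled every 10 ms.
print(f2_slope([0, 10, 20, 30], [1000, 1100, 1200, 1300]))  # 10.0
```

    A regression slope is less sensitive to a single mistracked formant frame than an endpoint-difference slope, which is one reason to prefer it for noisy dysarthric speech.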

  2. Contemporary Issues in Phoneme Production by Hearing-Impaired Persons: Physiological and Acoustic Aspects.

    ERIC Educational Resources Information Center

    McGarr, Nancy S.; Whitehead, Robert

    1992-01-01

    This paper on physiologic correlates of speech production in children and youth with hearing impairments focuses specifically on the production of phonemes and includes data on respiration for speech production, phonation, speech aerodynamics, articulation, and acoustic analyses of speech by hearing-impaired persons. (Author/DB)

  3. Prediction and constraint in audiovisual speech perception

    PubMed Central

    Peelle, Jonathan E.; Sommers, Mitchell S.

    2015-01-01

    During face-to-face conversational speech, listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners by increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the role of heteromodal regions of the posterior superior temporal sulcus in integrative processing. We interpret these findings in a framework of temporally focused lexical competition, in which visual speech information affects auditory processing through an early integration mechanism that increases sensitivity to auditory information, and through a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately, it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported
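    The amplitude envelope referred to above can be approximated very simply. This sketch assumes full-wave rectification followed by a centered moving average (the 20 ms window is an arbitrary choice, and the function name is hypothetical); Hilbert-transform or band-filtered envelopes are common alternatives.

```python
import math


def amplitude_envelope(signal, fs, window_ms=20.0):
    """Full-wave rectify, then smooth with a centered moving average."""
    half = max(1, int(fs * window_ms / 1000.0) // 2)
    rect = [abs(x) for x in signal]
    env = []
    for i in range(len(rect)):
        lo, hi = max(0, i - half), min(len(rect), i + half + 1)
        env.append(sum(rect[lo:hi]) / (hi - lo))
    return env


# Hypothetical test signal: a 1 kHz tone amplitude-modulated at 4 Hz.
fs = 8000
x = [(1 + math.sin(2 * math.pi * 4 * n / fs)) * math.sin(2 * math.pi * 1000 * n / fs)
     for n in range(fs)]
env = amplitude_envelope(x, fs)  # rises and falls at the 4 Hz modulation rate
```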

  4. Correlation study of predictive and descriptive metrics of speech intelligibility

    NASA Astrophysics Data System (ADS)

    Stefaniw, Abigail; Shimizu, Yasushi; Smith, Dana

    2002-11-01

    There exists a wide range of speech intelligibility metrics, each designed to capture a different aspect of room acoustics that relates to speech intelligibility. This study reviews the different definitions of, and correlations between, various proposed speech intelligibility measures. Speech intelligibility metrics can be grouped by their two main uses: prediction for rooms being designed and description of existing rooms. Two descriptive metrics still under investigation are Ease of Hearing and Acoustical Comfort. These are measured by a simple questionnaire, and their relationships with each other and with significant speech intelligibility metrics are explored. A variety of rooms, including classrooms, lecture halls, and offices, are modeled and auralized in cooperation with a larger study. Auralized rooms are used to conveniently provide calculated metrics and cross-talk-canceled auralizations for diagnostic and descriptive intelligibility tests. Rooms are modeled in CATT-Acoustic and auralized with a multi-channel speaker array in a hemi-anechoic chamber.
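    Correlations between metrics such as those explored here are typically quantified with Pearson's r. A minimal sketch follows; the function name, metric names and room values in the usage lines are hypothetical placeholders, not study results.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-room values, for illustration only.
sti_values = [0.45, 0.52, 0.61, 0.70]
ease_of_hearing = [2.1, 2.8, 3.0, 3.9]
print(round(pearson_r(sti_values, ease_of_hearing), 3))
```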

  5. Speech discrimination after early exposure to pulsed-noise or speech

    PubMed Central

    Ranasinghe, Kamalini G.; Carraway, Ryan S.; Borland, Michael S.; Moreno, Nicole A.; Hanacik, Elizabeth A.; Miller, Robert S.; Kilgard, Michael P

    2012-01-01

    Early experience of structured inputs and complex sound features generates lasting changes in tonotopy and receptive field properties of primary auditory cortex (A1). In this study we tested whether these changes are severe enough to alter neural representations and behavioral discrimination of speech. We exposed two groups of rat pups during the critical period of auditory development to pulsed noise or speech. Both groups of rats were trained to discriminate speech sounds when they were young adults, and anesthetized neural responses were recorded from A1. The representation of speech in A1 and behavioral discrimination of speech remained robust to altered spectral and temporal characteristics of A1 neurons after pulsed-noise exposure. Exposure to passive speech during early development provided no added advantage in speech sound processing. Speech training increased A1 neuronal firing rate for speech stimuli in naïve rats, but did not increase responses in rats that experienced early exposure to pulsed noise or speech. Our results suggest that speech sound processing is resistant to changes in simple neural response properties caused by manipulating the early acoustic environment. PMID:22575207

  6. Role of superthermality on dust acoustic structures in the frame of a modified Zakharov-Kuznetsov equation in magnetized dusty plasma

    NASA Astrophysics Data System (ADS)

    Sabetkar, Akbar; Dorranian, Davoud

    2015-03-01

    In this paper, a theoretical investigation is presented of the existence and characteristics of dust acoustic (DA) waves propagating obliquely in a magnetized dusty plasma consisting of two populations of ions with distinct temperatures, electrons, and negative dust particles, where the ions and electrons are modeled by three-dimensional nonextensive and κ-distribution functions, respectively. Normal mode analysis (reductive perturbation method) is used to derive lower- and higher-order nonlinear equations governing the evolution of small but finite amplitude DA waves, namely the Zakharov-Kuznetsov (ZK) and the modified Zakharov-Kuznetsov (mZK) equations. The basic features (e.g., amplitude, width, phase speed, polarity) of both DA ZK and mZK solitons have been thoroughly examined by numerical analysis of their equations. It is observed that the characteristics and properties of the DA solitary waves (DASWs) are significantly modified by the superthermality of electrons, nonextensivity of ions, cold-to-hot ion temperature ratio, relative number densities of the two ion species, the strength of the magnetic field, and the obliqueness of the system. Furthermore, it has been found that the DA ZK solitons exhibit only negative polarity when the superthermality of electrons, nonextensivity of ions, and temperature ratio of ions are smaller or greater than their critical values. The present study may add to the understanding of the nonlinear propagation features of DA wave structures in high-energy astrophysical plasma systems.

  7. Acoustic neuroma

    MedlinePlus

    Vestibular schwannoma; Tumor - acoustic; Cerebellopontine angle tumor; Angle tumor ... Acoustic neuromas have been linked with the genetic disorder neurofibromatosis type 2 (NF2). Acoustic neuromas are uncommon.

  8. Speech Enhancement Using Microphone Arrays.

    NASA Astrophysics Data System (ADS)

    Adugna, Eneyew

    Arrays of sensors have been employed effectively in communication systems for the directional transmission and reception of electromagnetic waves. Among their numerous benefits, arrays help improve the signal-to-interference ratio (SIR) of the signal at the receiver. Arrays have since been used in related areas that employ propagating waves for the transmission of information, and several investigators have successfully adapted array principles to acoustics, sonar, seismic, and medical imaging. In speech applications, the microphone is the sensor used for acoustic data acquisition. The performance of subsequent speech processing algorithms, such as speech recognition or speaker recognition, relies heavily on the level of interference within the transduced or recorded speech signal. The normal practice is to use a single hand-held or head-mounted microphone. Under most environmental conditions, i.e., in environments where other acoustic sources are also active, the speech signal from a single microphone is a superposition of the acoustic signals present in the environment, which results in a lower SIR. To alleviate this problem, microphone arrays (linear, planar, and three-dimensional) have been proposed and implemented. This work focuses on microphone arrays in room environments, where reverberation is the main source of interference. The acoustic wave incident on the array from a point source is sampled and recorded by a linear array of sensors, along with reflected waves. Array signal processing algorithms are developed and used to remove reverberation from the signal received by the array. Signals from other positions are considered as interference. Unlike most studies, which deal with plane waves, we base our algorithm on spherical waves originating at a source point, an assumption that is especially appropriate in room environments. The algorithm consists of two stages: a first stage to locate the source and a second stage to focus on the source. The first part
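    The second stage, focusing on a known source, can be sketched as delay-and-sum beamforming with spherical-wave (near-field) delays. This assumes free-field propagation, a speed of sound of 343 m/s, integer-sample delays and a source position already supplied by the first stage; it is plain delay-and-sum, not the dereverberation algorithm developed in the work, and all names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def focus_delays(mics, source, fs, c=SPEED_OF_SOUND):
    """Integer sample delays and distances for a point source (spherical wave)."""
    dists = [math.dist(m, source) for m in mics]
    d_min = min(dists)
    delays = [round((d - d_min) / c * fs) for d in dists]
    return delays, dists


def simulate_point_source(s, delays, dists):
    """Free-field mic signals: each is s delayed by d samples and scaled by 1/r."""
    length = len(s) + max(delays)
    return [[s[n - d] / r if 0 <= n - d < len(s) else 0.0 for n in range(length)]
            for d, r in zip(delays, dists)]


def delay_and_sum(channels, delays, dists):
    """Align each channel on the source, undo 1/r spreading, then average."""
    n_out = min(len(ch) - d for ch, d in zip(channels, delays))
    out = []
    for n in range(n_out):
        acc = sum(ch[n + d] * r for ch, d, r in zip(channels, delays, dists))
        out.append(acc / len(channels))
    return out


fs = 16000
mics = [(0.1 * i, 0.0) for i in range(4)]   # linear array, 10 cm spacing
src = (0.5, 1.0)                            # source position from stage one
delays, dists = focus_delays(mics, src, fs)
s = [math.sin(0.05 * n) for n in range(300)]
out = delay_and_sum(simulate_point_source(s, delays, dists), delays, dists)
```

    Signals from the focus point add coherently, while sound arriving from other positions (including reflections) is misaligned across channels and is partially averaged out.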

  9. Inverse solution of speech production based on perturbation theory and its application to articulatory speech synthesis

    NASA Astrophysics Data System (ADS)

    Yu, Zhenli

    1998-12-01

    The inverse solution of speech production for formant targets of vowels and vowel-to-vowel transitions is studied. A band-limited Fourier cosine expansion of the vocal-tract area function or its logarithm is used to model the vocal-tract shape. The inverse solution is based on the perturbation theory of speech production combined with a fast calculation of the vocal-tract system. An interpolation method is proposed for dynamically constraining the unobservable zeros and the vocal-tract length along the transition between the endpoints of a vowel-to-vowel transition. A unique-mapping acoustic-to-geometry codebook, designed using geometrical and acoustical constraints, is used to match the zeros and vocal-tract length at the endpoints. Computer simulations evaluating the inverse solution show reasonable results with respect to the naturalness of the transition behavior of the vocal-tract area function. An articulatory synthesizer with a reflection-type line analog model driven by the vocal-tract area is implemented, and synthesis is used to evaluate the performance of the inverse solution for vowel-to-vowel transitions as well as for isolated vowels. Visual inspection of the spectrograms and perceptual listening to the synthetic sounds are satisfactory. A quantitative comparison of formant traces reveals fairly good matching between the formants of the synthetic sounds and those of the original sounds. A novel formant-targeted articulatory synthesis is proposed as an application of the inverse solution. The entire system consists of an inverse module and a reflection-type line analog model. The synthesizer needs only the first three formant trajectories, the pitch contour and the amplitude as input parameters. Both a formant mimic synthesis, in which the input parameters can be specified artificially, and a formant copy synthesis, in which the input parameters are estimated from real speech, are implemented. The formant trace or pitch contour can be separately modified
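    The area-function parameterization and the reflection-type line analog can be connected in a few lines. This sketch assumes a log-area cosine series over normalized length x in [0, 1] (glottis to lips), equal-length tube sections sampled at their centers, and the junction convention r = (A_k - A_{k+1})/(A_k + A_{k+1}); sign conventions differ between texts, and the function names and coefficients are hypothetical.

```python
import math


def area_from_log_cosine(coeffs, n_sections):
    """Area function A_k from a band-limited cosine series of log area.

    log A(x) = sum_n c_n cos(n*pi*x), x in [0, 1] from glottis to lips,
    evaluated at the centers of `n_sections` equal-length tube sections.
    """
    areas = []
    for k in range(n_sections):
        x = (k + 0.5) / n_sections
        log_a = sum(c * math.cos(n * math.pi * x) for n, c in enumerate(coeffs))
        areas.append(math.exp(log_a))
    return areas


def reflection_coefficients(areas):
    """Junction reflection coefficients (A_k - A_{k+1}) / (A_k + A_{k+1})."""
    return [(a0 - a1) / (a0 + a1) for a0, a1 in zip(areas, areas[1:])]


# Hypothetical three-term log-area shape over 10 sections.
areas = area_from_log_cosine([0.5, -0.8, 0.3], 10)
print([round(r, 3) for r in reflection_coefficients(areas)])
```

    Parameterizing the logarithm of the area keeps every section area positive for any coefficient values, which is convenient when an inverse solution adjusts the coefficients iteratively.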

  10. Recognition of information-bearing elements in speech

    NASA Astrophysics Data System (ADS)

    Hermansky, Hynek

    2003-10-01

    An acoustic speech signal carries many different kinds of information: the basic linguistic message, many characteristics of the speaker of the message, details of the environment in which the message was produced and transmitted, etc. The human auditory/cognitive system is able to detect, decode, and separate all these information sources. Understanding this ability and emulating it on a machine has been an important but elusive scientific and engineering goal for a long time. This talk critically surveys the situation in the speech recognition field. It puts automatic recognition of speech in perspective with other acoustic signal detection and classification tasks, reviews some historical, contemporary, and evolving techniques for machine recognition of speech, critically compares competing techniques, and gives some examples of applications in speech, speaker, and language recognition and identification. The talk is intended for an audience interested but not directly involved in the processing of speech.

  11. Coding pitch differences in voiceless fricatives: Whispered relative to normal speech.

    PubMed

    Heeren, Willemijn F L

    2015-12-01

    Intonation can be perceived in whispered speech despite the absence of the fundamental frequency. In the past, acoustic correlates of pitch in whisper have been sought in vowel content, but, recently, studies of normal speech demonstrated correlates of intonation in consonants as well. This study examined how consonants may contribute to the coding of intonation in whispered relative to normal speech. The acoustic characteristics of whispered, voiceless fricatives /s/ and /f/, produced at different pitch targets (low, mid, high), were investigated and compared to corresponding normal speech productions to assess if whisper contained secondary or compensatory pitch correlates. Furthermore, listener sensitivity to fricative cues to pitch in whisper was established, also relative to normal speech. Consistent with recent studies, acoustic correlates of whispered and normal speech fricatives systematically varied with pitch target. Comparable findings across speech modes showed that acoustic correlates were secondary. Discrimination of vowel-fricative-vowel stimuli was less accurate and slower in whispered than normal speech, which is attributed to differences in acoustic cues available. Perception of fricatives presented without their vowel contexts, however, revealed comparable processing speeds and response accuracies between speech modes, supporting the finding that within fricatives, acoustic correlates of pitch are similar across speech modes. PMID:26723300
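    The abstract does not list which acoustic characteristics were measured; spectral center of gravity is one measure often used for fricatives and is assumed here purely for illustration. The naive DFT keeps the sketch dependency-free, magnitude weighting is used (power weighting is another common choice), and the function name is hypothetical.

```python
import cmath
import math


def spectral_centroid(frame, fs):
    """Magnitude-weighted mean frequency (Hz) of the frame's DFT, 0..fs/2."""
    n = len(frame)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        x_k = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        mags.append(abs(x_k))
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)


# A sine placed exactly on DFT bin 8 (1000 Hz at fs = 8000, n = 64).
fs = 8000
frame = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
print(round(spectral_centroid(frame, fs), 1))
```

    For real fricatives the frame would be windowed and the centroid compared across pitch targets, e.g. low, mid and high, in whispered and normal productions.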

  12. Speech recognition based on pattern recognition techniques

    NASA Astrophysics Data System (ADS)

    Rabiner, Lawrence R.

    1990-05-01

    Algorithms for speech recognition can be characterized broadly as pattern recognition approaches and acoustic-phonetic approaches. To date, the greatest degree of success in speech recognition has been obtained using pattern recognition paradigms. Pattern recognition techniques have been applied to the problems of isolated word (or discrete utterance) recognition, connected word recognition, and continuous speech recognition. It is shown that understanding (and consequently the resulting recognizer performance) is best for the simplest recognition tasks and is considerably less well developed for large-scale recognition systems.
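    As an illustration of the pattern recognition paradigm for isolated word recognition, the sketch below matches an utterance against stored word templates using dynamic time warping (DTW). DTW is a standard template-matching technique of this kind, but the abstract does not say which algorithms the paper covers; the feature representation (a list of per-frame vectors), the Euclidean local distance, and the names `dtw_distance` and `recognize` are illustrative assumptions.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    Each sequence is a list of equal-length feature vectors (lists of floats).
    The local cost is Euclidean distance; allowed steps are (1,0), (0,1), (1,1).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance (hypothetical helper)."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))
```

    For example, with templates `{"yes": [[0.0], [1.0], [2.0]], "no": [[2.0], [1.0], [0.0]]}`, the utterance `[[0.0], [0.0], [1.0], [2.0], [2.0]]` is recognized as `"yes"`, because the warping absorbs the repeated frames at zero cost.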

  13. Modeling words with subword units in an articulatorily constrained speech recognition algorithm

    SciTech Connect

    Hogden, J.

    1997-11-20

    The goal of speech recognition is to find the most probable word given the acoustic evidence, i.e., a string of VQ codes or acoustic features. Speech recognition algorithms typically take advantage of the fact that the probability of a word, given a sequence of VQ codes, can be calculated.
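    The decision rule described here, choosing the word w that maximizes P(w | VQ codes), can be sketched with discrete hidden Markov models: by Bayes' rule the most probable word maximizes P(codes | w) P(w), and P(codes | w) can be computed with the forward algorithm. This is a generic HMM sketch, not the articulatorily constrained model the paper describes; the model format (start, transition, and emission tables) and the names `forward_likelihood` and `most_probable_word` are hypothetical.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm.

    obs   : list of VQ code indices
    start : start[s] = P(first state = s)
    trans : trans[s][t] = P(next state = t | current state = s)
    emit  : emit[s][k] = P(VQ code k | state s)
    """
    n_states = len(start)
    # alpha[s] = P(codes observed so far, current state = s)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for k in obs[1:]:
        alpha = [
            sum(alpha[s] * trans[s][t] for s in range(n_states)) * emit[t][k]
            for t in range(n_states)
        ]
    return sum(alpha)


def most_probable_word(obs, models, priors):
    """argmax over words w of P(obs | w) * P(w); models[w] = (start, trans, emit)."""
    return max(models, key=lambda w: forward_likelihood(obs, *models[w]) * priors[w])
```

    For long code sequences these products underflow; a practical version would use log probabilities or scaled forward variables.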

  14. Articulatory Mediation of Speech Perception: A Causal Analysis of Multi-Modal Imaging Data

    ERIC Educational Resources Information Center

    Gow, David W., Jr.; Segawa, Jennifer A.

    2009-01-01

    The inherent confound between the organization of articulation and the acoustic-phonetic structure of the speech signal makes it exceptionally difficult to evaluate the competing claims of motor and acoustic-phonetic accounts of how listeners recognize coarticulated speech. Here we use Granger causation analysis of high spatiotemporal resolution…

  15. Compensation for Coarticulation: Disentangling Auditory and Gestural Theories of Perception of Coarticulatory Effects in Speech

    ERIC Educational Resources Information Center

    Viswanathan, Navin; Magnuson, James S.; Fowler, Carol A.

    2010-01-01

    According to one approach to speech perception, listeners perceive speech by applying general pattern matching mechanisms to the acoustic signal (e.g., Diehl, Lotto, & Holt, 2004). An alternative is that listeners perceive the phonetic gestures that structured the acoustic signal (e.g., Fowler, 1986). The two accounts have offered different…

  16. [Improving the speech with a prosthetic construction].

    PubMed

    Stalpers, M J; Engelen, M; van der Stappen, J A A M; Weijs, W L J; Takes, R P; van Heumen, C C M

    2016-03-01

    A 12-year-old boy had problems with his speech due to a defect in the soft palate. This defect was caused by the surgical removal of a synovial sarcoma. Testing with a nasometer revealed hypernasality above normal values. Given the size and severity of the defect in the soft palate, the possibility of improving the speech with speech therapy was limited. At a centre for special dentistry an attempt was made with a prosthetic construction to improve the performance of the palate and, in that way, the speech. This construction consisted of a denture with an obturator attached to it. With it, an effective closure of the palate could be achieved. New measurements with acoustic nasometry showed scores within the normal values. The nasality in the speech largely disappeared. The obturator is an effective and relatively easy solution for palatal insufficiency resulting from surgical resection. Invasive reconstructive surgery can be avoided in this way. PMID:26973984

  17. Strategies for distant speech recognition in reverberant environments

    NASA Astrophysics Data System (ADS)

    Delcroix, Marc; Yoshioka, Takuya; Ogawa, Atsunori; Kubo, Yotaro; Fujimoto, Masakiyo; Ito, Nobutaka; Kinoshita, Keisuke; Espi, Miquel; Araki, Shoko; Hori, Takaaki; Nakatani, Tomohiro

    2015-12-01

    Reverberation and noise are known to severely affect the automatic speech recognition (ASR) performance of speech recorded by distant microphones. Therefore, we must deal with reverberation if we are to realize high-performance hands-free speech recognition. In this paper, we review a recognition system that we developed at our laboratory to deal with reverberant speech. The system consists of a speech enhancement (SE) front-end that employs long-term linear prediction-based dereverberation followed by noise reduction. We combine our SE front-end with an ASR back-end that uses neural networks for acoustic and language modeling. The proposed system achieved top scores on the ASR task of the REVERB challenge. This paper describes the different technologies used in our system and presents detailed experimental results that justify our implementation choices and may provide hints for designing distant ASR systems.
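    The idea behind long-term linear prediction-based dereverberation is that late reverberation at time n can be predicted from observations at least D samples in the past and then subtracted, leaving the short-delay structure of the direct sound intact. The sketch below is a deliberately simplified single-channel, time-domain version solved by ordinary least squares; the front-end in the paper is more elaborate than this, and the function name `delayed_lp_dereverb` and its default `delay` and `order` values are illustrative assumptions.

```python
import numpy as np


def delayed_lp_dereverb(x, delay=3, order=10):
    """Simplified delayed linear prediction dereverberation (illustrative only).

    Predicts x[t] from x[t - delay - k], k = 0..order-1, with least-squares
    coefficients, and returns the prediction residual as the dereverberated
    signal. The first delay + order - 1 samples are passed through unchanged.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    start = delay + order - 1
    # Column k holds x[t - delay - k] for t = start .. n-1
    past = np.column_stack(
        [x[start - delay - k : n - delay - k] for k in range(order)]
    )
    target = x[start:]
    g, *_ = np.linalg.lstsq(past, target, rcond=None)
    y = x.copy()
    y[start:] = target - past @ g  # subtract the predicted late reverberation
    return y
```

    Because the prediction uses only samples at least `delay` steps old, the current sample itself can never be predicted away; this delay is what separates the approach from ordinary linear prediction, which would also whiten the speech.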

  18. Cross-Channel Amplitude Sweeps Are Crucial to Speech Intelligibility

    ERIC Educational Resources Information Center

    Prendergast, Garreth; Green, Gary G. R.

    2012-01-01

    Classical views of speech perception argue that the static and dynamic characteristics of spectral energy peaks (formants) are the acoustic features that underpin phoneme recognition. Here we use representations where the amplitude modulations of sub-band filtered speech are described, precisely, in terms of co-sinusoidal pulses. These pulses are…

  19. Infant-Directed Speech Is Modulated by Infant Feedback

    ERIC Educational Resources Information Center

    Smith, Nicholas A.; Trainor, Laurel J.

    2008-01-01

    When mothers engage in infant-directed (ID) speech, their voices change in a number of characteristic ways, including adopting a higher overall pitch. Studies have examined these acoustical cues and have tested infants' preferences for ID speech. However, little is known about how these cues change with maternal sensitivity to infant feedback in…

  20. The Effects of Macroglossia on Speech: A Case Study

    ERIC Educational Resources Information Center

    Mekonnen, Abebayehu Messele

    2012-01-01

    This article presents a case study of speech production in a 14-year-old Amharic-speaking boy. The boy had developed secondary macroglossia, related to a disturbance of growth hormones, following a history of normal speech development. Perceptual analysis combined with acoustic analysis and static palatography is used to investigate the specific…

  1. A Model for Speech Processing in Second Language Listening Activities

    ERIC Educational Resources Information Center

    Zoghbor, Wafa Shahada

    2016-01-01

    Teachers' understanding of the process of speech perception could inform practice in listening classrooms. Catford (1950) developed a model for speech perception taking into account the influence of the acoustic features of the linguistic forms used by the speaker, whereby the listener "identifies" and "interprets" these…

  2. Integrated Spacesuit Audio System Enhances Speech Quality and Reduces Noise

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng Arden; Chen, Jingdong; Chen, Shaoyan Sharyl

    2009-01-01

    A new approach has been proposed for increasing astronaut comfort and improving speech capture. Currently, the special design of a spacesuit creates an extreme acoustic environment, making it difficult to capture clear speech without compromising comfort. The proposed Integrated Spacesuit Audio (ISA) system would incorporate the microphones into the helmet and use software to extract voice signals from background noise.

  3. The Pump-Valve Model of Speech Articulation.

    ERIC Educational Resources Information Center

    Dew, Donald

    The traditional respiration-phonation-articulation-resonation model of speech production which permeates introductory literature is not the only suitable model of this process. The pump-valve model, which derives from the acoustic theory of speech production, is a viable alternative. This newer model is also consistent with modern theories. It…

  4. Does Signal Degradation Affect Top-Down Processing of Speech?

    PubMed

    Wagner, Anita; Pals, Carina; de Blecourt, Charlotte M; Sarampalis, Anastasios; Başkent, Deniz

    2016-01-01

    Speech perception is formed based on both the acoustic signal and listeners' knowledge of the world and semantic context. Access to semantic information can facilitate interpretation of degraded speech, such as speech in background noise or the speech signal transmitted via cochlear implants (CIs). This paper focuses on the latter and investigates the time course of understanding words, and how sentential context reduces listeners' dependency on the acoustic signal for natural and degraded speech via an acoustic CI simulation. In an eye-tracking experiment, we combined recordings of listeners' gaze fixations with pupillometry to capture effects of semantic information on both the time course and the effort of speech processing. Normal-hearing listeners were presented with sentences with or without a semantically constraining verb (e.g., crawl) preceding the target (baby), and their ocular responses were recorded to four pictures: the target, a phonological competitor (bay), a semantic distractor (worm), and an unrelated distractor. The results show that in natural speech, listeners' gazes reflect their uptake of acoustic information and integration of preceding semantic context. Degradation of the signal leads to a later disambiguation of phonologically similar words and to a delay in integration of semantic information. Complementary to this, the pupil dilation data show that early semantic integration reduces the effort in disambiguating phonologically similar words. Processing degraded speech comes with increased effort due to the impoverished nature of the signal. Delayed integration of semantic information further constrains listeners' ability to compensate for inaudible signals. PMID:27080670
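    Acoustic CI simulations are commonly built as noise vocoders: the signal is split into a few frequency bands, each band's amplitude envelope is extracted, and the envelopes modulate noise limited to the same bands. The abstract does not describe the simulation actually used, so this sketch is an assumption; the channel count, log-spaced band edges, 160 Hz envelope cutoff, FFT brick-wall filters, and all function names are illustrative choices.

```python
import numpy as np


def _bandpass_fft(x, fs, lo, hi):
    """Zero-phase brick-wall band-pass filter applied in the FFT domain."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))


def _envelope(x, fs, cutoff=160.0):
    """Amplitude envelope: full-wave rectification followed by an FFT low-pass."""
    return np.maximum(_bandpass_fft(np.abs(x), fs, 0.0, cutoff), 0.0)


def noise_vocode(x, fs, n_channels=8, f_lo=100.0, f_hi=7000.0, seed=0):
    """Hypothetical noise-vocoder CI simulation; f_hi must stay below fs / 2."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced band edges
    noise = rng.standard_normal(len(x))
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = _envelope(_bandpass_fft(x, fs, lo, hi), fs)
        carrier = _bandpass_fft(noise, fs, lo, hi)  # noise limited to the same band
        out += env * carrier
    # Match the overall RMS level of the input
    rms_in = np.sqrt(np.mean(x ** 2))
    rms_out = np.sqrt(np.mean(out ** 2))
    if rms_out > 0:
        out *= rms_in / rms_out
    return out
```

    Fewer channels remove more spectral detail, which is the usual way such simulations vary the degree of degradation.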

  5. Near-Term Fetuses Process Temporal Features of Speech

    ERIC Educational Resources Information Center

    Granier-Deferre, Carolyn; Ribeiro, Aurelie; Jacquet, Anne-Yvonne; Bassereau, Sophie

    2011-01-01

    The perception of speech and music requires processing of variations in spectra and amplitude over different time intervals. Near-term fetuses can discriminate acoustic features, such as frequencies and spectra, but whether they can process complex auditory streams, such as speech sequences and more specifically their temporal variations, fast or…

  6. Across-formant integration and speech intelligibility: Effects of acoustic source properties in the presence and absence of a contralateral interferer.

    PubMed

    Summers, Robert J; Bailey, Peter J; Roberts, Brian

    2016-08-01

    The role of source properties in across-formant integration was explored using three-formant (F1+F2+F3) analogues of natural sentences (targets). In experiment 1, F1+F3 were harmonic analogues (H1+H3) generated using a monotonous buzz source and second-order resonators; in experiment 2, F1+F3 were tonal analogues (T1+T3). F2 could take either form (H2 or T2). Target formants were always presented monaurally; the receiving ear was assigned randomly on each trial. In some conditions, only the target was present; in others, a competitor for F2 (F2C) was presented contralaterally. Buzz-excited or tonal competitors were created using the time-reversed frequency and amplitude contours of F2. Listeners must reject F2C to optimize keyword recognition. Whether or not a competitor was present, there was no effect of source mismatch between F1+F3 and F2. The impact of adding F2C was modest when it was tonal but large when it was harmonic, irrespective of whether F2C matched F1+F3. This pattern was maintained when harmonic and tonal counterparts were loudness-matched (experiment 3). Source type and competition, rather than acoustic similarity, governed the phonetic contribution of a formant. Contrary to earlier research using dichotic targets, requiring across-ear integration to optimize intelligibility, H2C was an equally effective informational masker for H2 as for T2. PMID:27586751
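    The competitors in this study were created from the time-reversed frequency and amplitude contours of F2. For the tonal case, one way to sketch this is a sinusoid whose phase is the running integral of the frequency contour and whose amplitude follows the amplitude contour; reversing both contours in time gives the competitor. The linear interpolation from frame rate to sample rate and the function names are assumptions, and the harmonic (buzz-excited resonator) analogues are not shown.

```python
import numpy as np


def upsample_contour(frame_values, frame_rate, fs):
    """Linearly interpolate a frame-rate contour up to the audio sample rate."""
    frame_values = np.asarray(frame_values, dtype=float)
    t_frames = np.arange(len(frame_values)) / frame_rate
    n = int(round(len(frame_values) * fs / frame_rate))
    return np.interp(np.arange(n) / fs, t_frames, frame_values)


def tonal_analogue(freq_contour, amp_contour, fs):
    """Sine wave following sample-rate frequency (Hz) and amplitude contours.

    Phase is accumulated from the instantaneous frequency, so the tone glides
    smoothly instead of jumping when the frequency changes.
    """
    f = np.asarray(freq_contour, dtype=float)
    a = np.asarray(amp_contour, dtype=float)
    phase = 2 * np.pi * np.cumsum(f) / fs
    return a * np.sin(phase)


def time_reversed_competitor(freq_contour, amp_contour, fs):
    """Competitor built from the F2 contours played backwards in time."""
    return tonal_analogue(
        np.asarray(freq_contour)[::-1], np.asarray(amp_contour)[::-1], fs
    )
```

    Reversing the contours keeps the competitor's frequency range and overall level statistics close to those of F2 while destroying its phonetic timing, which is why such a signal can act as an informational masker.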

  7. Formant trajectory characteristics in speakers with dysarthria and homogeneous speech intelligibility scores: Further data

    NASA Astrophysics Data System (ADS)

    Kim, Yunjung; Weismer, Gary; Kent, Ray D.

    2005-09-01

    In previous work [J. Acoust. Soc. Am. 117, 2605 (2005)], we reported on formant trajectory characteristics of a relatively large number of speakers with dysarthria and near-normal speech intelligibility. The purpose of that analysis was to begin a documentation of the variability, within relatively homogeneous speech-severity groups, of acoustic measures commonly used to predict across-speaker variation in speech intelligibility. In that study we found that even with near-normal speech intelligibility (90%-100%), many speakers had reduced formant slopes for some words and distributional characteristics of acoustic measures that were different from values obtained from normal speakers. In the current report we extend those findings to a group of speakers with dysarthria with somewhat poorer speech intelligibility than the original group. Results are discussed in terms of the utility of certain acoustic measures as indices of speech intelligibility, and as explanatory data for theories of dysarthria. [Work supported by NIH Award R01 DC00319.]
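    Formant slope is the rate of frequency change along a formant transition. The abstract does not say how slopes were computed, so the sketch below assumes a least-squares straight-line fit of formant frequency (Hz) against time (ms) over a measured transition; `formant_slope` is a hypothetical name.

```python
def formant_slope(times_ms, freqs_hz):
    """Least-squares slope (Hz/ms) of a formant trajectory over a transition."""
    n = len(times_ms)
    if n < 2 or n != len(freqs_hz):
        raise ValueError("need at least two (time, frequency) pairs")
    mean_t = sum(times_ms) / n
    mean_f = sum(freqs_hz) / n
    num = sum((t - mean_t) * (f - mean_f) for t, f in zip(times_ms, freqs_hz))
    den = sum((t - mean_t) ** 2 for t in times_ms)
    if den == 0:
        raise ValueError("time points must not all be equal")
    return num / den
```

    For a transition sampled at 0, 10, 20, and 30 ms with F2 at 1200, 1300, 1400, and 1500 Hz, the slope is 10 Hz/ms; a speaker with a "reduced" slope would show a smaller value over the same transition.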

  8. Acoustic Emphasis in Four Year Olds

    ERIC Educational Resources Information Center

    Wonnacott, Elizabeth; Watson, Duane G.

    2008-01-01

    Acoustic emphasis may convey a range of subtle discourse distinctions, yet little is known about how this complex ability develops in children. This paper presents a first investigation of the factors which influence the production of acoustic prominence in young children's spontaneous speech. In a production experiment, SVO sentences were…

  9. Audiovisual Speech Perception in Children with Developmental Language Disorder in Degraded Listening Conditions

    ERIC Educational Resources Information Center

    Meronen, Auli; Tiippana, Kaisa; Westerholm, Jari; Ahonen, Timo

    2013-01-01

    Purpose: The effect of the signal-to-noise ratio (SNR) on the perception of audiovisual speech in children with and without developmental language disorder (DLD) was investigated by varying the noise level and the sound intensity of acoustic speech. The main hypotheses were that the McGurk effect (in which incongruent visual speech alters the…

  10. The Hypothesis of Apraxia of Speech in Children with Autism Spectrum Disorder

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Paul, Rhea; Black, Lois M.; van Santen, Jan P.

    2011-01-01

    In a sample of 46 children aged 4-7 years with Autism Spectrum Disorder (ASD) and intelligible speech, there was no statistical support for the hypothesis of concomitant Childhood Apraxia of Speech (CAS). Perceptual and acoustic measures of participants' speech, prosody, and voice were compared with data from 40 typically-developing children, 13…

  11. Children's Perception of Speech Produced in a Two-Talker Background

    ERIC Educational Resources Information Center

    Baker, Mallory; Buss, Emily; Jacks, Adam; Taylor, Crystal; Leibold, Lori J.

    2014-01-01

    Purpose: This study evaluated the degree to which children benefit from the acoustic modifications made by talkers when they produce speech in noise. Method: A repeated measures design compared the speech perception performance of children (5-11 years) and adults in a 2-talker masker. Target speech was produced in a 2-talker background or in…

  12. Effects of Elicitation Task Variables on Speech Production by Children with Cochlear Implants

    ERIC Educational Resources Information Center

    McCleary, Elizabeth A.; Ide-Helvie, Dana L.; Lotto, Andrew J.; Carney, Arlene Earley; Higgins, Maureen B.

    2007-01-01

    Given the interest in comparing speech production development in children with normal hearing and hearing impairment, it is important to evaluate how variables within speech elicitation tasks can differentially affect the acoustics of speech production for these groups. In a first experiment, children (6-14 years old) with cochlear implants…

  13. Speech Processing Application Based on Phonetics and Phonology of the Polish Language

    NASA Astrophysics Data System (ADS)

    Kłosowski, Piotr

    The article presents methods of improving speech processing based on the phonetics and phonology of the Polish language. The new method presented for speech recognition is based on the detection of distinctive acoustic parameters of phonemes in Polish. Distinctiveness was taken as the most important criterion for selecting the parameters that represent objects from the recognized classes. Speech recognition is widely used in telecommunications applications.
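    One common way to quantify how distinctive an acoustic parameter is for a set of phoneme classes is the Fisher ratio: the variance of the class means divided by the average within-class variance. The article's actual selection criterion is not given in the abstract, so this ranking sketch is an assumption, and the data layout and function names are hypothetical.

```python
from statistics import mean, pvariance


def fisher_ratio(values_by_class):
    """Variance of class means divided by mean within-class variance.

    values_by_class: {class_label: [feature values for that class]}
    """
    means = [mean(v) for v in values_by_class.values()]
    within = mean(pvariance(v) for v in values_by_class.values())
    between = pvariance(means)
    return between / within if within > 0 else float("inf")


def rank_features(samples):
    """samples: {feature_name: {class_label: [values]}}; most distinctive first."""
    return sorted(samples, key=lambda name: fisher_ratio(samples[name]), reverse=True)
```

    A parameter whose class means are far apart relative to the spread within each class gets a high ratio and would be retained first.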

  14. Free Speech Yearbook: 1972.

    ERIC Educational Resources Information Center

    Tedford, Thomas L., Ed.

    This book is a collection of essays on free speech issues and attitudes, compiled by the Commission on Freedom of Speech of the Speech Communication Association. Four articles focus on freedom of speech in classroom situations as follows: a philosophic view of teaching free speech, effects of a course on free speech on student attitudes,…

  15. Lip Movement Exaggerations during Infant-Directed Speech

    ERIC Educational Resources Information Center

    Green, Jordan R.; Nip, Ignatius S. B.; Wilson, Erin M.; Mefferd, Antje S.; Yunusova, Yana

    2010-01-01

    Purpose: Although a growing body of literature has identified the positive effects of visual speech on speech and language learning, oral movements of infant-directed speech (IDS) have rarely been studied. This investigation used 3-dimensional motion capture technology to describe how mothers modify their lip movements when talking to their…

  16. New Ways in Teaching Connected Speech. New Ways Series

    ERIC Educational Resources Information Center

    Brown, James Dean, Ed.

    2012-01-01

    Connected speech is based on a set of rules used to modify pronunciations so that words connect and flow more smoothly in natural speech (hafta versus have to). Native speakers of English tend to feel that connected speech is friendlier, more natural, more sympathetic, and more personal. Is there any reason why learners of English would prefer to…

  17. Speech analyzer

    NASA Technical Reports Server (NTRS)

    Lokerson, D. C. (Inventor)

    1977-01-01

    A speech signal is analyzed by applying the signal to formant filters which derive first, second and third signals respectively representing the frequency of the speech waveform in the first, second and third formants. A first pulse train having approximately a pulse rate representing the average frequency of the first formant is derived; second and third pulse trains having pulse rates respectively representing zero crossings of the second and third formants are derived. The first formant pulse train is derived by establishing N signal level bands, where N is an integer at least equal to two. Adjacent ones of the signal bands have common boundaries, each of which is a predetermined percentage of the peak level of a complete cycle of the speech waveform.
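    The patent derives pulse trains whose rates follow the zero crossings of the second- and third-formant signals. The principle can be sketched in software: a band-limited, roughly sinusoidal signal crosses zero about twice per cycle, so its frequency is approximately the crossing count divided by twice the duration. This sketch covers only the zero-crossing estimate, not the N-level-band method used for the first formant, and the function name is hypothetical.

```python
import math


def zero_crossing_frequency(samples, fs):
    """Estimate the frequency (Hz) of a band-limited signal from zero crossings.

    A sinusoid crosses zero twice per cycle, so f is about crossings / (2 * duration).
    """
    crossings = 0
    for prev, cur in zip(samples, samples[1:]):
        if (prev < 0.0 <= cur) or (cur < 0.0 <= prev):
            crossings += 1
    duration = len(samples) / fs
    return crossings / (2.0 * duration)


if __name__ == "__main__":
    fs = 16000
    tone = [math.sin(2 * math.pi * 1000 * n / fs + 0.3) for n in range(1600)]
    print(zero_crossing_frequency(tone, fs))
```

    The estimate is only reliable when the input has first been filtered so that a single formant dominates, which is the role of the formant filters in the patent.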

  18. Speech Research

    NASA Astrophysics Data System (ADS)

    Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: a biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; phonetic factors in letter detection; categorical perception; short-term recall by deaf signers of American Sign Language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaries; and vowel information in postvocalic frictions.

  19. Start/End Delays of Voiced and Unvoiced Speech Signals

    SciTech Connect

    Herrnstein, A

    1999-09-24

    Recent experiments using low-power EM radar-like sensors (e.g., GEMs) have demonstrated a new method for measuring vocal fold activity and the onset times of voiced speech, as vocal fold contact begins to take place. Similarly, the end time of a voiced speech segment can be measured. Second, it appears that in most normal uses of American English speech, unvoiced speech segments directly precede or directly follow voiced speech segments. For many applications, it is useful to know the typical durations of these unvoiced speech segments. A corpus assembled earlier, consisting of spoken TIMIT words, phrases, and sentences recorded from 16 male speakers with simultaneously measured acoustic and EM-sensor glottal signals, was used for this study. By inspecting the onset (or end) of unvoiced speech using the acoustic signal, and the onset (or end) of voiced speech using the EM sensor signal, the average durations of unvoiced segments were found to be 300 ms for segments preceding the onset of vocalization and 500 ms for segments following the end of vocalization. An unvoiced speech period is then defined in time, first by using the onset of the EM-sensed glottal signal as the onset-time marker for the voiced speech segment and the end marker for the unvoiced segment. Then, by subtracting 300 ms from the onset time mark of voicing, the unvoiced speech segment start time is found. The times for a following unvoiced speech segment can be found similarly. While data of this nature have proven useful for work in our laboratory, a great deal of additional work remains to validate such data for use with general populations of users. These procedures have been useful for applying optimal processing algorithms over time segments of unvoiced, voiced, and non-speech acoustic signals. For example, these data appear to be of use in speaker validation, vocoding, and denoising algorithms.
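    The segmentation procedure reduces to simple arithmetic on the EM-sensed voicing times: the preceding unvoiced window starts 300 ms before voicing onset and ends at onset, and the following window starts at the end of voicing and lasts 500 ms. The two durations come from the abstract; clamping the windows to the signal boundaries and the function name `unvoiced_windows` are added assumptions.

```python
PRE_VOICING_S = 0.300   # average unvoiced duration before voicing onset (from the study)
POST_VOICING_S = 0.500  # average unvoiced duration after voicing ends (from the study)


def unvoiced_windows(voiced_onset_s, voiced_end_s, signal_end_s=None):
    """Return ((pre_start, pre_end), (post_start, post_end)) in seconds.

    voiced_onset_s / voiced_end_s are the EM-sensor voicing markers.
    Windows are clamped to the start (and, if given, the end) of the signal.
    """
    pre_start = max(0.0, voiced_onset_s - PRE_VOICING_S)
    post_end = voiced_end_s + POST_VOICING_S
    if signal_end_s is not None:
        post_end = min(post_end, signal_end_s)
    return (pre_start, voiced_onset_s), (voiced_end_s, post_end)
```

    For voicing from 1.0 s to 2.0 s, the windows are roughly 0.7 to 1.0 s before voicing and 2.0 to 2.5 s after it.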

  20. Speech processing based on short-time Fourier analysis

    SciTech Connect

    Portnoff, M.R.

    1981-06-02

    Short-time Fourier analysis (STFA) is a mathematical technique that represents nonstationary signals, such as speech, music, and seismic signals, in terms of time-varying spectra. This representation provides a formalism for such intuitive notions as time-varying frequency components and pitch contours. Consequently, STFA is useful for speech analysis and speech processing. This paper shows that STFA provides a convenient technique for estimating and modifying certain perceptual parameters of speech. As an example of an application of STFA to speech, the paper presents the problem of time compression or expansion of speech while preserving pitch and time-varying frequency content.
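    A short-time Fourier representation can be sketched as a sequence of windowed FFT frames, with the signal recovered by weighted overlap-add. This generic analysis/synthesis pair illustrates the time-varying spectrum that modifications operate on; it is not the paper's specific formulation, and the Hann window, frame length of 256, hop of 64, and function names are assumptions. Time compression or expansion would additionally require changing the synthesis hop and adjusting frame phases (for example, with a phase vocoder), which is not shown.

```python
import numpy as np


def stft(x, frame_len=256, hop=64):
    """Return an (n_frames, frame_len // 2 + 1) array of windowed frame spectra."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    padded = np.pad(x, (frame_len, frame_len))  # so every sample gets full coverage
    n_frames = 1 + (len(padded) - frame_len) // hop
    frames = np.stack(
        [padded[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spec, length, frame_len=256, hop=64):
    """Weighted overlap-add resynthesis; inverts stft() for unmodified spectra."""
    window = np.hanning(frame_len)
    frames = np.fft.irfft(spec, n=frame_len, axis=1)
    total = (frames.shape[0] - 1) * hop + frame_len
    out = np.zeros(total)
    norm = np.zeros(total)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame * window
        norm[i * hop : i * hop + frame_len] += window ** 2
    nonzero = norm > 1e-10
    out[nonzero] /= norm[nonzero]  # undo the analysis and synthesis windowing
    return out[frame_len : frame_len + length]
```

    Dividing by the summed squared window makes the round trip exact for any hop shorter than the frame, so any error after a modification comes from the modification itself rather than from the analysis/synthesis pair.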

  1. Spacecraft Internal Acoustic Environment Modeling

    NASA Technical Reports Server (NTRS)

    Chu, Shao-sheng R.; Allen, Christopher S.

    2009-01-01

    Acoustic modeling can be used to identify key noise sources, determine and analyze sub-allocated requirements, keep track of the accumulation of minor noise sources, and predict vehicle noise levels at various stages of vehicle development, first with estimates of noise sources and later with experimental data. In FY09, the physical mockup developed in FY08, with an interior geometric shape similar to the Orion CM (Crew Module) IML (Inner Mold Line), was used to validate SEA (Statistical Energy Analysis) acoustic model development with realistic ventilation fan sources. The sound power levels of these sources were unknown a priori, as opposed to previous studies in which an RSS (Reference Sound Source) with known sound power level was used. The modeling results were evaluated by comparison with measurements of sound pressure levels over a wide frequency range, including the frequency range where SEA gives good results. Sound intensity measurements were performed over a rectangular grid system enclosing the ventilation fan source. Sound intensities were measured at the top, front, back, right, and left surfaces of the grid system. Sound intensity at the bottom surface was not measured, but sound-blocking material was placed under the bottom surface to reflect most of the incident sound energy back to the remaining measured surfaces. Integrating the measured sound intensities over the measured surfaces yields an estimate of the sound power of the source. The reverberation time T60 of the mockup interior had been modified to match the reverberation levels of the ISS US Lab interior for the speech frequency bands (0.5, 1, 2, and 4 kHz) by attaching appropriately sized Thinsulate sound absorption material to the interior wall of the mockup. Sound absorption of Thinsulate was modeled using three methods: the Sabine equation with the measured mockup interior reverberation time T60, a layup model based on past impedance tube testing, and the layup model plus an air absorption correction. The evaluation/validation was
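    Two calculations in this abstract are simple enough to sketch. Sound power is estimated by summing, over the measured faces of the enclosing grid, the normal sound intensity times the face area; the Sabine equation T60 = 0.161 V / A (V in cubic metres, A in square-metre sabins) relates reverberation time to total absorption area. The sketch assumes one average normal intensity per face and SI units; the function names are hypothetical, and the layup and air-absorption models are not shown.

```python
import math

REFERENCE_POWER_W = 1e-12  # reference sound power for levels in dB re 1 pW


def sound_power_from_intensity(intensities_w_m2, areas_m2):
    """Sum of (normal intensity x area) over the measured faces, in watts.

    Only measured faces are included; here the bottom face is assumed to
    reflect its share of the energy onto the other faces, as in the study.
    """
    return sum(i * a for i, a in zip(intensities_w_m2, areas_m2))


def sound_power_level_db(power_w):
    """Sound power level Lw in dB re 1 pW."""
    return 10.0 * math.log10(power_w / REFERENCE_POWER_W)


def sabine_absorption_area(volume_m3, t60_s):
    """Total absorption area A (m^2 sabins) from Sabine: T60 = 0.161 V / A."""
    return 0.161 * volume_m3 / t60_s
```

    For example, five 1 m² faces each measuring 1e-6 W/m² give 5e-6 W, or about 67.0 dB re 1 pW, and a 100 m³ room with T60 = 0.5 s has an absorption area of 32.2 m² sabins.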

  2. Dust ion acoustic travelling waves in the framework of a modified Kadomtsev-Petviashvili equation in a magnetized dusty plasma with superthermal electrons

    NASA Astrophysics Data System (ADS)

    Saha, Asit; Chatterjee, Prasanta

    2014-02-01

    For the critical values of the parameters q and V, the earlier work (Samanta et al. in Phys. Plasmas 20:022111, 2013b) is unable to describe the nonlinear wave features in a magnetized dusty plasma with superthermal electrons. To describe the nonlinear wave features for these critical values, we extend that work (Samanta et al. in Phys. Plasmas 20:022111, 2013b) by deriving the modified Kadomtsev-Petviashvili (MKP) equation for dust ion acoustic waves in a magnetized dusty plasma with q-nonextensive velocity-distributed electrons, considering higher-order coefficients of ɛ. By applying the bifurcation theory of planar dynamical systems to this MKP equation, the existence of solitary wave solutions of both rarefactive and compressive types, periodic travelling wave solutions, and kink and anti-kink wave solutions is proved. Three exact solutions of these waves are determined. The present study could be helpful for understanding nonlinear travelling waves propagating in Mercury, the solar wind, Saturn, and the magnetosphere of the Earth.

  3. Talker Versus Dialect Effects on Speech Intelligibility: A Symmetrical Study.

    PubMed

    McCloy, Daniel R; Wright, Richard A; Souza, Pamela E

    2015-09-01

    This study investigates the relative effects of talker-specific variation and dialect-based variation on speech intelligibility. Listeners from two dialects of American English performed speech-in-noise tasks with sentences spoken by talkers of each dialect. An initial statistical model showed no significant effects for either talker or listener dialect group, and no interaction. However, a mixed-effects regression model including several acoustic measures of the talker's speech revealed a subtle effect of talker dialect once the various acoustic dimensions were accounted for. Results are discussed in relation to other recent studies of cross-dialect intelligibility. PMID:26529902

  4. Talker versus dialect effects on speech intelligibility: a symmetrical study

    PubMed Central

    McCloy, Daniel R.; Wright, Richard A.; Souza, Pamela E.

    2014-01-01

    This study investigates the relative effects of talker-specific variation and dialect-based variation on speech intelligibility. Listeners from two dialects of American English performed speech-in-noise tasks with sentences spoken by talkers of each dialect. An initial statistical model showed no significant effects for either talker or listener dialect group, and no interaction. However, a mixed-effects regression model including several acoustic measures of the talker’s speech revealed a subtle effect of talker dialect once the various acoustic dimensions were accounted for. Results are discussed in relation to other recent studies of cross-dialect intelligibility. PMID:26529902

  5. Learning Vowel Categories from Maternal Speech in Gurindji Kriol

    ERIC Educational Resources Information Center

    Jones, Caroline; Meakins, Felicity; Muawiyath, Shujau

    2012-01-01

    Distributional learning is a proposal for how infants might learn early speech sound categories from acoustic input before they know many words. When categories in the input differ greatly in relative frequency and overlap in acoustic space, research in bilingual development suggests that this affects the course of development. In the present…

  6. Intensity Accents in French 2 Year Olds' Speech.

    ERIC Educational Resources Information Center

    Allen, George D.

    The acoustic features and functions of accentuation in French are discussed, and features of accentuation in the speech of French 2-year-olds are explored. The four major acoustic features used to signal accentual distinctions are fundamental frequency of voicing, duration of segments and syllables, intensity of segments and syllables, and…

  7. Speech Intelligibility

    NASA Astrophysics Data System (ADS)

    Brand, Thomas

    Speech intelligibility (SI) is important in many fields of research, engineering, and diagnostics for quantifying very different phenomena, such as the quality of recordings, communication and playback devices, the reverberation of auditoria, the characteristics of hearing impairment, the benefit of hearing aids, or combinations of these.

  8. Speech Improvement.

    ERIC Educational Resources Information Center

    Gordon, Morton J.

    This book serves as a guide for the native and non-native speaker of English in overcoming various problems in articulation, rhythm, and intonation. It is also useful in group therapy speech programs. Forty-five practice chapters offer drill materials for all the vowels, diphthongs, and consonants of American English plus English stress and…

  9. Differentiating speech and nonspeech sounds via amplitude envelope cues

    NASA Astrophysics Data System (ADS)

    Lehnhoff, Robert J.; Strange, Winifred; Long, Glenis

    2001-05-01

    Recent evidence from neuroscience and behavioral speech science suggests that the temporal modulation pattern of the speech signal plays a distinctive role in speech perception. As a first step in exploring the nature of the perceptually relevant information in the temporal pattern of speech, this experiment examined whether speech and nonspeech environmental sounds could be differentiated on the basis of their amplitude envelopes. Conversational speech was recorded from native speakers of six different languages (French, German, Hebrew, Hindi, Japanese, and Russian), along with samples of their English. Nonspeech sounds included animal vocalizations, water sounds, and other environmental sounds (e.g., thunder). The stimulus set included thirty 2-s speech segments and thirty 2-s nonspeech events. Frequency information was removed from all stimuli using a technique described by Dorman et al. [J. Acoust. Soc. Am. 102 (1997)]. Nine normal-hearing adult listeners participated in the experiment. Subjects decided whether each sound was (originally) speech or nonspeech and rated their confidence on a 7-point Likert scale. Overall, subjects differentiated speech from nonspeech very accurately (84% correct). Only 12 stimuli were not correctly categorized at greater than chance levels. Acoustical analysis is underway to determine which parameters of the amplitude envelope differentiate speech from nonspeech sounds.
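    The abstract reports how many stimuli were not categorized correctly at greater than chance levels but does not name the test. Assuming a one-sided binomial test per stimulus, with each of the nine listeners giving one speech/nonspeech judgment and chance at p = 0.5, the criterion can be computed as below; the test choice, the alpha of 0.05, and the function names are assumptions.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct_above_chance(n_listeners, alpha=0.05, p=0.5):
    """Smallest count of correct listeners whose one-sided p-value is below alpha."""
    for k in range(n_listeners + 1):
        if binomial_tail(k, n_listeners, p) < alpha:
            return k
    return None  # no count reaches significance with this many listeners
```

    Under these assumptions, `min_correct_above_chance(9)` returns 8: P(X ≥ 8) = 10/512 ≈ 0.020, whereas P(X ≥ 7) = 46/512 ≈ 0.090, so a stimulus would need at least eight of nine correct judgments.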

  10. Enhancement of Electrolaryngeal Speech by Adaptive Filtering.

    ERIC Educational Resources Information Center

    Espy-Wilson, Carol Y.; Chari, Venkatesh R.; MacAuslan, Joel M.; Huang, Caroline B.; Walsh, Michael J.

    1998-01-01

    A study tested the quality and intelligibility, as judged by several listeners, of four users' electrolaryngeal speech, with and without filtering to compensate for perceptually objectionable acoustic characteristics. Results indicated that an adaptive filtering technique produced a noticeable improvement in the quality of the Transcutaneous…
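The abstract does not name the adaptive algorithm. Purely as an illustration, the sketch below shows a least-mean-squares (LMS) noise canceller, one widely used adaptive filter. It assumes a separate reference input that is correlated with the unwanted component. The function name and the `num_taps` and `step_size` settings are hypothetical.

```python
def lms_cancel(reference, primary, num_taps=8, step_size=0.01):
    """LMS adaptive noise cancellation (illustrative sketch).

    reference: samples correlated with the unwanted component.
    primary: wanted signal plus the unwanted component.
    Returns (cleaned output, final weights). The output is the error signal
    e[n] = primary[n] - y[n], where y[n] is the filter's estimate of the noise.
    """
    weights = [0.0] * num_taps
    taps = [0.0] * num_taps  # most recent reference sample first
    cleaned = []
    for x, d in zip(reference, primary):
        taps = [x] + taps[:-1]
        y = sum(w * t for w, t in zip(weights, taps))
        e = d - y
        # Steepest-descent step on the instantaneous squared error.
        weights = [w + 2.0 * step_size * e * t for w, t in zip(weights, taps)]
        cleaned.append(e)
    return cleaned, weights
```

A small `step_size` gives slower but more stable adaptation. A larger one tracks changes faster but can diverge if it exceeds the stability bound set by the reference signal's power.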

  11. Effects of Syntactic Expectations on Speech Segmentation

    ERIC Educational Resources Information Center

    Mattys, Sven L.; Melhorn, James F.; White, Laurence

    2007-01-01

    Although the effect of acoustic cues on speech segmentation has been extensively investigated, the role of higher order information (e.g., syntax) has received less attention. Here, the authors examined whether syntactic expectations based on subject-verb agreement have an effect on segmentation and whether they do so despite conflicting acoustic…

  12. Kinematic Event Patterns in Speech: Special Problems.

    ERIC Educational Resources Information Center

    Westbury, John R.; Severson, Elizabeth J.; Lindstrom, Mary J.

    2000-01-01

    Results from a new analysis of synchronous acoustic and fleshpoint-kinematic data, recorded from 53 normal young-adult speakers of American English, are reported. The kinematic data represent speech-related actions of the tongue blade and dorsum, both lips, and the mandible, during the test words, "special" and "problem," and were drawn from an…

  13. Speech and Voice in Instructional Programmes.

    ERIC Educational Resources Information Center

    Jaspers, Fons

    1994-01-01

    Describes the application of audio as a vehicle of information. In applying audio to the audiovisual, computer-assisted instruction format, a consideration of the aspects of dominance and redundancy in auditory-visual presentation is required. Understanding acoustic and informational characteristics of audio and qualities of voice and speech may…

  14. Effects of Cognitive Load on Speech Recognition

    ERIC Educational Resources Information Center

    Mattys, Sven L.; Wiget, Lukas

    2011-01-01

    The effect of cognitive load (CL) on speech recognition has received little attention despite the prevalence of CL in everyday life, e.g., dual-tasking. To assess the effect of CL on the interaction between lexically-mediated and acoustically-mediated processes, we measured the magnitude of the "Ganong effect" (i.e., lexical bias on phoneme…

  15. GALLAUDET'S NEW HEARING AND SPEECH CENTER.

    ERIC Educational Resources Information Center

    FRISINA, D. ROBERT

    THIS REPORT DESCRIBES THE DESIGN OF A NEW SPEECH AND HEARING CENTER AND ITS INTEGRATION INTO THE OVERALL ARCHITECTURAL SCHEME OF THE CAMPUS. THE CIRCULAR SHAPE WAS SELECTED TO COMPLEMENT THE SURROUNDING STRUCTURES AND COMPENSATE FOR DIFFERENCES IN SITE, WHILE PROVIDING THE ACOUSTICAL ADVANTAGES OF NON-PARALLEL WALLS, AND FACILITATING TRAFFIC FLOW.…

  16. Speech neglect: A strange educational blind spot

    NASA Astrophysics Data System (ADS)

    Harris, Katherine Safford

    2005-09-01

    Speaking is universally acknowledged as an important human talent, yet as a topic of educated common knowledge, it is peculiarly neglected. Partly, this is a consequence of the relatively recent growth of research on speech perception, production, and development, but also a function of the way that information is sliced up by undergraduate colleges. Although the basic acoustic mechanism of vowel production was known to Helmholtz, the ability to view speech production as a physiological event is evolving even now with such techniques as fMRI. Intensive research on speech perception emerged only in the early 1930s as Fletcher and the engineers at Bell Telephone Laboratories developed the transmission of speech over telephone lines. The study of speech development was revolutionized by the papers of Eimas and his colleagues on speech perception in infants in the 1970s. Dissemination of knowledge in these fields is the responsibility of no single academic discipline. It forms a center for two departments, Linguistics, and Speech and Hearing, but in the former, there is a heavy emphasis on other aspects of language than speech and, in the latter, a focus on clinical practice. For psychologists, it is a rather minor component of a very diverse assembly of topics. I will focus on these three fields in proposing possible remedies.

  17. The Modulation Transfer Function for Speech Intelligibility

    PubMed Central

    Elliott, Taffeta M.; Theunissen, Frédéric E.

    2009-01-01

    We systematically determined which spectrotemporal modulations in speech are necessary for comprehension by human listeners. Speech comprehension has been shown to be robust to spectral and temporal degradations, but the specific relevance of particular degradations is arguable due to the complexity of the joint spectral and temporal information in the speech signal. We applied a novel modulation filtering technique to recorded sentences to restrict acoustic information quantitatively and to obtain a joint spectrotemporal modulation transfer function for speech comprehension, the speech MTF. For American English, the speech MTF showed the criticality of low modulation frequencies in both time and frequency. Comprehension was significantly impaired when temporal modulations <12 Hz or spectral modulations <4 cycles/kHz were removed. More specifically, the MTF was bandpass in temporal modulations and low-pass in spectral modulations: temporal modulations from 1 to 7 Hz and spectral modulations <1 cycles/kHz were the most important. We evaluated the importance of spectrotemporal modulations for vocal gender identification and found a different region of interest: removing spectral modulations between 3 and 7 cycles/kHz significantly increases gender misidentifications of female speakers. The determination of the speech MTF furnishes an additional method for producing speech signals with reduced bandwidth but high intelligibility. Such compression could be used for audio applications such as file compression or noise removal and for clinical applications such as signal processing for cochlear implants. PMID:19266016
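Elliott and Theunissen filtered joint spectrotemporal modulations in two dimensions. The one-dimensional sketch below covers only the temporal half of that idea. It takes an amplitude envelope sampled at a fixed frame rate, zeroes a band of temporal modulation frequencies in its discrete Fourier transform, and transforms back. The function names and the direct O(N²) DFT are illustrative choices, not the authors' implementation.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (O(N^2), fine for short envelopes)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part (input assumed Hermitian-symmetric)."""
    n = len(spectrum)
    return [(sum(spectrum[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)) / n).real
            for k in range(n)]


def remove_temporal_modulations(envelope, frame_rate, low_hz, high_hz):
    """Zero every temporal modulation frequency in [low_hz, high_hz]."""
    n = len(envelope)
    spectrum = dft(envelope)
    for f in range(n):
        # Bins f and n - f carry the same modulation frequency for real input,
        # so both are zeroed together to keep the result real.
        modulation_hz = min(f, n - f) * frame_rate / n
        if low_hz <= modulation_hz <= high_hz:
            spectrum[f] = 0
    return idft(spectrum)
```

With `low_hz` above zero, the mean level (DC) of the envelope is preserved and only the chosen modulation band is removed.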

  18. Speech vs. singing: infants choose happier sounds.

    PubMed

    Corbeil, Marieve; Trehub, Sandra E; Peretz, Isabelle

    2013-01-01

    Infants prefer speech to non-vocal sounds and to non-human vocalizations, and they prefer happy-sounding speech to neutral speech. They also exhibit an interest in singing, but there is little knowledge of their relative interest in speech and singing. The present study explored infants' attention to unfamiliar audio samples of speech and singing. In Experiment 1, infants 4-13 months of age were exposed to happy-sounding infant-directed speech vs. hummed lullabies by the same woman. They listened significantly longer to the speech, which had considerably greater acoustic variability and expressiveness, than to the lullabies. In Experiment 2, infants of comparable age who heard the lyrics of a Turkish children's song spoken vs. sung in a joyful/happy manner did not exhibit differential listening. Infants in Experiment 3 heard the happily sung lyrics of the Turkish children's song vs. a version that was spoken in an adult-directed or affectively neutral manner. They listened significantly longer to the sung version. Overall, happy voice quality rather than vocal mode (speech or singing) was the principal contributor to infant attention, regardless of age. PMID:23805119

  19. Speech vs. singing: infants choose happier sounds

    PubMed Central

    Corbeil, Marieve; Trehub, Sandra E.; Peretz, Isabelle

    2013-01-01

    Infants prefer speech to non-vocal sounds and to non-human vocalizations, and they prefer happy-sounding speech to neutral speech. They also exhibit an interest in singing, but there is little knowledge of their relative interest in speech and singing. The present study explored infants' attention to unfamiliar audio samples of speech and singing. In Experiment 1, infants 4–13 months of age were exposed to happy-sounding infant-directed speech vs. hummed lullabies by the same woman. They listened significantly longer to the speech, which had considerably greater acoustic variability and expressiveness, than to the lullabies. In Experiment 2, infants of comparable age who heard the lyrics of a Turkish children's song spoken vs. sung in a joyful/happy manner did not exhibit differential listening. Infants in Experiment 3 heard the happily sung lyrics of the Turkish children's song vs. a version that was spoken in an adult-directed or affectively neutral manner. They listened significantly longer to the sung version. Overall, happy voice quality rather than vocal mode (speech or singing) was the principal contributor to infant attention, regardless of age. PMID:23805119

  20. Brain-Computer Interfaces for Speech Communication

    PubMed Central

    Brumberg, Jonathan S.; Nieto-Castanon, Alfonso; Kennedy, Philip R.; Guenther, Frank H.

    2010-01-01

    This paper briefly reviews current silent speech methodologies for normal and disabled individuals. Current techniques utilizing electromyographic (EMG) recordings of vocal tract movements are useful for physically healthy individuals but fail for tetraplegic individuals who do not have accurate voluntary control over the speech articulators. Alternative methods utilizing EMG from other body parts (e.g., hand, arm, or facial muscles) or electroencephalography (EEG) can provide capable silent communication to severely paralyzed users, though current interfaces are extremely slow relative to normal conversation rates and require constant attention to a computer screen that provides visual feedback and/or cueing. We present a novel approach to the problem of silent speech via an intracortical microelectrode brain computer interface (BCI) to predict intended speech information directly from the activity of neurons involved in speech production. The predicted speech is synthesized and acoustically fed back to the user with a delay under 50 ms. We demonstrate that the Neurotrophic Electrode used in the BCI is capable of providing useful neural recordings for over 4 years, a necessary property for BCIs that need to remain viable over the lifespan of the user. Other design considerations include neural decoding techniques based on previous research involving BCIs for computer cursor or robotic arm control via prediction of intended movement kinematics from motor cortical signals in monkeys and humans. Initial results from a study of continuous speech production with instantaneous acoustic feedback show the BCI user was able to improve his control over an artificial speech synthesizer both within and across recording sessions. The success of this initial trial validates the potential of the intracortical microelectrode-based approach for providing a speech prosthesis that can allow much more rapid communication rates. PMID:20204164

  1. Intensive Speech and Language Therapy for Older Children with Cerebral Palsy: A Systems Approach

    ERIC Educational Resources Information Center

    Pennington, Lindsay; Miller, Nick; Robson, Sheila; Steen, Nick

    2010-01-01

    Aim: To investigate whether speech therapy using a speech systems approach to controlling breath support, phonation, and speech rate can increase the speech intelligibility of children with dysarthria and cerebral palsy (CP). Method: Sixteen children with dysarthria and CP participated in a modified time series design. Group characteristics were…

  2. Speech perception as an active cognitive process

    PubMed Central

    Heald, Shannon L. M.; Nusbaum, Howard C.

    2014-01-01

    One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing by masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine cognitive resources recruited during perception including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augmentation or therapy. PMID

  3. Neural Oscillations Carry Speech Rhythm through to Comprehension

    PubMed Central

    Peelle, Jonathan E.; Davis, Matthew H.

    2012-01-01

    A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners’ processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging – particularly electroencephalography (EEG) and magnetoencephalography (MEG) – point to phase locking by ongoing cortical oscillations to low-frequency information (~4–8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain. PMID:22973251

  4. Segmentation of the speech signal based on changes in energy distribution in the spectrum

    NASA Astrophysics Data System (ADS)

    Jassem, W.; Kudzdela, H.; Domagala, P.

    1983-08-01

    A simple algorithm is proposed for automatic phonetic segmentation of the acoustic speech signal on the MERA 303 desk-top minicomputer. The algorithm is verified with Polish linguistic material spoken by two subjects. The proposed algorithm detects approximately 80 percent of the boundaries between enunciated segments correctly, a result no worse than that obtained using more complex methods. Speech recognition programs are discussed as speech perception models, and the nature of categorical perception of human speech sounds is examined.
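Placing segment boundaries where the spectral energy distribution changes can be sketched as follows. The sketch assumes the signal has already been reduced to a sequence of frames of band energies, for example from a filter bank. Each frame is normalized to a distribution, successive frames are compared with an L1 distance, and local peaks above a threshold are marked as boundaries. The distance measure, the 0.5 threshold, and the function names are hypothetical choices, not the published MERA 303 algorithm.

```python
def normalize(frame):
    """Turn band energies into a distribution that sums to 1 (all-zero frames stay zero)."""
    total = sum(frame)
    return [e / total for e in frame] if total > 0 else [0.0] * len(frame)


def spectral_change_boundaries(frames, threshold=0.5):
    """Return frame indices at which a new segment is hypothesized to begin."""
    distances = []
    for prev, cur in zip(frames, frames[1:]):
        p, c = normalize(prev), normalize(cur)
        distances.append(sum(abs(a - b) for a, b in zip(p, c)))  # L1 distance, range 0..2
    boundaries = []
    for i, d in enumerate(distances):
        left = distances[i - 1] if i > 0 else 0.0
        right = distances[i + 1] if i + 1 < len(distances) else 0.0
        # Keep only local maxima of spectral change that exceed the threshold.
        if d > threshold and d >= left and d >= right:
            boundaries.append(i + 1)  # the boundary falls before frame i + 1
    return boundaries
```

Because each frame is normalized, overall loudness changes alone do not trigger a boundary. Only shifts in how energy is spread across bands do.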

  5. Speech research: Studies on the nature of speech, instrumentation for its investigation, and practical applications

    NASA Astrophysics Data System (ADS)

    Liberman, A. M.

    1982-03-01

    This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation and practical applications. Manuscripts cover the following topics: Speech perception and memory coding in relation to reading ability; The use of orthographic structure by deaf adults: Recognition of finger-spelled letters; Exploring the information support for speech; The stream of speech; Using the acoustic signal to make inferences about place and duration of tongue-palate contact; Patterns of human interlimb coordination emerge from the properties of nonlinear limit cycle oscillatory processes: Theory and data; Motor control: Which themes do we orchestrate? Exploring the nature of motor control in Down's syndrome; Periodicity and auditory memory: A pilot study; Reading skill and language skill: On the role of sign order and morphological structure in memory for American Sign Language sentences; Perception of nasal consonants with special reference to Catalan; and Speech production characteristics of the hearing impaired.

  6. Cortical speech and non-speech discrimination in relation to cognitive measures in preschool children.

    PubMed

    Kuuluvainen, Soila; Alku, Paavo; Makkonen, Tommi; Lipsanen, Jari; Kujala, Teija

    2016-03-01

    Effective speech sound discrimination at preschool age is known to be a prerequisite for the development of language skills and later literacy acquisition. However, the speech specificity of cortical discrimination skills in small children is currently not known, as previous research has either studied speech functions without comparison with non-speech sounds, or used much simpler sounds such as harmonic or sinusoidal tones as control stimuli. We investigated the cortical discrimination of five syllable features (consonant, vowel, vowel duration, fundamental frequency, and intensity), covering both segmental and prosodic phonetic changes, and their acoustically matched non-speech counterparts in 63 6-year-old typically developed children, by using a multi-feature mismatch negativity (MMN) paradigm. Each of the five investigated features elicited a unique pattern of differentiating negativities: an early differentiating negativity, MMN, and a late differentiating negativity. All five studied features showed speech-related enhancement of at least one of these responses, suggesting experience-related neural commitment in both phonetic and prosodic speech processing. In addition, the cognitive performance and language skills of the children were tested extensively. The speech-related neural enhancement was positively associated with the level of performance in several neurocognitive tasks, indicating a relationship between successful establishment of cortical memory traces for speech and enhanced cognitive functioning. The results contribute to the understanding of typical developmental trajectories of linguistic vs. non-linguistic auditory skills, and provide a reference for future studies investigating deficits in language-related disorders at preschool age. PMID:26647120

  7. A multimodal spectral approach to characterize rhythm in natural speech.

    PubMed

    Alexandrou, Anna Maria; Saarinen, Timo; Kujala, Jan; Salmelin, Riitta

    2016-01-01

    Human utterances demonstrate temporal patterning, also referred to as rhythm. While simple oromotor behaviors (e.g., chewing) feature a salient periodical structure, conversational speech displays a time-varying quasi-rhythmic pattern. Quantification of periodicity in speech is challenging. Unimodal spectral approaches have highlighted rhythmic aspects of speech. However, speech is a complex multimodal phenomenon that arises from the interplay of articulatory, respiratory, and vocal systems. The present study addressed the question of whether a multimodal spectral approach, in the form of coherence analysis between electromyographic (EMG) and acoustic signals, would allow one to characterize rhythm in natural speech more efficiently than a unimodal analysis. The main experimental task consisted of speech production at three speaking rates; a simple oromotor task served as control. The EMG-acoustic coherence emerged as a sensitive means of tracking speech rhythm, whereas spectral analysis of either EMG or acoustic amplitude envelope alone was less informative. Coherence metrics seem to distinguish and highlight rhythmic structure in natural speech. PMID:26827019
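Coherence between two simultaneously recorded signals, such as an EMG trace and the acoustic amplitude envelope, can be illustrated with a Welch-style magnitude-squared coherence estimate. For brevity, it is written here in plain Python with non-overlapping rectangular segments. The segment length, the function names, and the direct DFT are assumptions for illustration. A practical analysis would more likely use windowed, overlapping segments and an FFT library.

```python
import cmath
import math


def _dft_half(segment):
    """DFT bins 0..N//2 of a real segment (direct O(N^2) computation)."""
    n = len(segment)
    return [sum(segment[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n // 2 + 1)]


def magnitude_squared_coherence(x, y, segment_length):
    """Return C_xy[f] = |S_xy|^2 / (S_xx * S_yy) for bins 0..segment_length // 2."""
    num_bins = segment_length // 2 + 1
    s_xx = [0.0] * num_bins
    s_yy = [0.0] * num_bins
    s_xy = [0j] * num_bins
    usable = min(len(x), len(y))
    for start in range(0, usable - segment_length + 1, segment_length):
        fx = _dft_half(x[start:start + segment_length])
        fy = _dft_half(y[start:start + segment_length])
        for f in range(num_bins):
            # Accumulate auto- and cross-spectra across segments (averaging).
            s_xx[f] += abs(fx[f]) ** 2
            s_yy[f] += abs(fy[f]) ** 2
            s_xy[f] += fx[f] * fy[f].conjugate()
    return [abs(s_xy[f]) ** 2 / (s_xx[f] * s_yy[f]) if s_xx[f] > 0 and s_yy[f] > 0 else 0.0
            for f in range(num_bins)]
```

Averaging over several segments is essential. With a single segment, the estimate equals 1 at every frequency and carries no information.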

  8. Lexical effects on speech production and intelligibility in Parkinson's disease

    NASA Astrophysics Data System (ADS)

    Chiu, Yi-Fang

    Individuals with Parkinson's disease (PD) often have speech deficits that lead to reduced speech intelligibility. Previous research provides a rich database regarding the articulatory deficits associated with PD including restricted vowel space (Skodda, Visser, & Schlegel, 2011) and flatter formant transitions (Tjaden & Wilding, 2004; Walsh & Smith, 2012). However, few studies consider the effect of higher level structural variables of word usage frequency and the number of similar sounding words (i.e. neighborhood density) on lower level articulation or on listeners' perception of dysarthric speech. The purpose of the study is to examine the interaction of lexical properties and speech articulation as measured acoustically in speakers with PD and healthy controls (HC) and the effect of lexical properties on the perception of their speech. Individuals diagnosed with PD and age-matched healthy controls read sentences with words that varied in word frequency and neighborhood density. Acoustic analysis was performed to compare second formant transitions in diphthongs, an indicator of the dynamics of tongue movement during speech production, across different lexical characteristics. Young listeners transcribed the spoken sentences and the transcription accuracy was compared across lexical conditions. The acoustic results indicate that both PD and HC speakers adjusted their articulation based on lexical properties but the PD group had significant reductions in second formant transitions compared to HC. Both groups of speakers increased second formant transitions for words with low frequency and low density, but the lexical effect is diphthong dependent. The change in second formant slope was limited in the PD group when the required formant movement for the diphthong is small. The data from listeners' perception of the speech by PD and HC show that listeners identified high frequency words with greater accuracy suggesting the use of lexical knowledge during the
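A common way to summarize a second-formant transition is the slope of a least-squares line fitted to the F2 track, together with the extent of the transition in hertz. The sketch below assumes F2 has already been measured at known times; formant tracking itself is not shown. The function name is hypothetical, and this is not necessarily the exact measure used in this study.

```python
def f2_transition(times_s, f2_hz):
    """Return (slope in Hz/s, extent in Hz) of an F2 track.

    The slope is the ordinary least-squares fit of F2 against time.
    The extent is the difference between the highest and lowest F2 values.
    """
    if len(times_s) != len(f2_hz) or len(times_s) < 2:
        raise ValueError("need at least two paired time/F2 samples")
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_f = sum(f2_hz) / n
    cov = sum((t - mean_t) * (f - mean_f) for t, f in zip(times_s, f2_hz))
    var = sum((t - mean_t) ** 2 for t in times_s)
    if var == 0:
        raise ValueError("time samples must not all be equal")
    return cov / var, max(f2_hz) - min(f2_hz)
```

A smaller slope or extent for the same diphthong would correspond to the reduced F2 transitions reported for the PD group.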

  9. Melodic contour identification and sentence recognition using sung speech.

    PubMed

    Crew, Joseph D; Galvin, John J; Fu, Qian-Jie

    2015-09-01

    For bimodal cochlear implant users, acoustic and electric hearing have been shown to contribute differently to speech and music perception. However, differences in test paradigms and stimuli in speech and music testing can make it difficult to assess the relative contributions of each device. To address these concerns, the Sung Speech Corpus (SSC) was created. The SSC contains 50 monosyllable words sung over an octave range and can be used to test both speech and music perception using the same stimuli. Here, SSC data from normal-hearing listeners are presented, and any advantage of musicianship is examined. PMID:26428838

  10. Melodic contour identification and sentence recognition using sung speech

    PubMed Central

    Crew, Joseph D.; Galvin, John J.; Fu, Qian-Jie

    2015-01-01

    For bimodal cochlear implant users, acoustic and electric hearing have been shown to contribute differently to speech and music perception. However, differences in test paradigms and stimuli in speech and music testing can make it difficult to assess the relative contributions of each device. To address these concerns, the Sung Speech Corpus (SSC) was created. The SSC contains 50 monosyllable words sung over an octave range and can be used to test both speech and music perception using the same stimuli. Here, SSC data from normal-hearing listeners are presented, and any advantage of musicianship is examined. PMID:26428838

  11. Speech communications in noise

    NASA Technical Reports Server (NTRS)

    1984-01-01

    The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.
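In its classic form, the articulation index is a weighted sum of band audibilities. The simplified sketch below assumes the commonly cited rule that each band contributes (SNR + 12 dB) / 30 dB, clipped to [0, 1], and that the band-importance weights sum to 1. The function name and the equal weights in the example are hypothetical, not the tabulated values of any standard.

```python
def articulation_index(band_snr_db, band_weights):
    """Weighted sum of band audibilities, each (SNR + 12) / 30 clipped to [0, 1].

    band_snr_db: long-term speech-to-noise ratio per band, in dB.
    band_weights: band-importance weights, assumed to sum to 1.
    """
    if len(band_snr_db) != len(band_weights):
        raise ValueError("one weight per band is required")
    total = 0.0
    for snr, weight in zip(band_snr_db, band_weights):
        # Speech peaks are taken to lie 12 dB above the rms level over a 30 dB range.
        audibility = min(1.0, max(0.0, (snr + 12.0) / 30.0))
        total += weight * audibility
    return total


# Example with four equally weighted (hypothetical) bands at +3 dB SNR each.
example_ai = articulation_index([3, 3, 3, 3], [0.25] * 4)  # 0.5
```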

  12. Speech coding

    NASA Astrophysics Data System (ADS)

    Gersho, Allen

    1990-05-01

    Recent advances in algorithms and techniques for speech coding now permit high quality voice reproduction at remarkably low bit rates. The advent of powerful single-chip signal processors has made it cost effective to implement these new and sophisticated speech coding algorithms for many important applications in voice communication and storage. Some of the main ideas underlying the algorithms of major interest today are reviewed. The concept of removing redundancy by linear prediction is reviewed, first in the context of predictive quantization or DPCM. Then linear predictive coding, adaptive predictive coding, and vector quantization are discussed. The concepts of excitation coding via analysis-by-synthesis, vector sum excitation codebooks, and adaptive postfiltering are explained. The main ideas of vector excitation coding (VXC) or code excited linear prediction (CELP) are presented. Finally, low-delay VXC coding and phonetic segmentation for VXC are described.
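Linear prediction removes redundancy by predicting each sample from the previous p samples, x̂[n] = Σ a_k x[n−k], so that only the smaller residual needs to be coded. The sketch below shows the textbook autocorrelation method, solved with the Levinson-Durbin recursion. The function names are illustrative, and this is not code from any of the coders reviewed.

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of a finite sequence."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a_1..a_p from autocorrelation r.

    Returns (a, error), where the prediction is x_hat[n] = sum_k a[k-1] * x[n-k]
    and error is the final prediction-error energy. Assumes r describes a
    non-degenerate (positive-definite) autocorrelation so error stays > 0.
    """
    a = []
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(len(a)))
        k = acc / error  # reflection coefficient for stage i
        a = [a[j] - k * a[i - 2 - j] for j in range(len(a))] + [k]
        error *= 1.0 - k * k
    return a, error


def prediction_residual(x, a):
    """Residual e[n] = x[n] - x_hat[n]; samples before the start are taken as zero."""
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(x))]
```

For a strongly correlated signal, the residual has much less energy than the signal itself. This is the redundancy that predictive coders exploit.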

  13. The evolution of speech: vision, rhythm, cooperation

    PubMed Central

    Ghazanfar, Asif A.; Takahashi, Daniel Y.

    2014-01-01

    A full account of human speech evolution must consider its multisensory, rhythmic, and cooperative characteristics. Humans, apes and monkeys recognize the correspondence between vocalizations and the associated facial postures and gain behavioral benefits from them. Some monkey vocalizations even have a speech-like acoustic rhythmicity, yet they lack the concomitant rhythmic facial motion that speech exhibits. We review data showing that facial expressions like lip-smacking may be an ancestral expression that was later linked to vocal output in order to produce rhythmic audiovisual speech. Finally, we argue that human vocal cooperation (turn-taking) may have arisen through a combination of volubility and prosociality, and provide comparative evidence from one species to support this hypothesis. PMID:25048821

  14. Measuring glottal activity during voiced speech using a tuned electromagnetic resonating collar sensor

    NASA Astrophysics Data System (ADS)

    Brown, D. R., III; Keenaghan, K.; Desimini, S.

    2005-11-01

    Non-acoustic speech sensors can be employed to obtain measurements of one or more aspects of the speech production process, such as glottal activity, even in the presence of background noise. These sensors have a long history of clinical applications and have also recently been applied to the problem of denoising speech signals recorded in acoustically noisy environments (Ng et al 2000 Proc. Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP) (Istanbul, Turkey) vol 1, pp 229-32). Recently, researchers developed a new non-acoustic speech sensor based primarily on a tuned electromagnetic resonator collar (TERC) (Brown et al 2004 Meas. Sci. Technol. 15 1291). The TERC sensor measures glottal activity by sensing small changes in the dielectric properties of the glottis that result from voiced speech. This paper builds on the seminal work in Brown et al (2004). The primary contributions of this paper are (i) a description of a new single-mode TERC sensor design addressing the comfort and complexity issues of the original sensor, (ii) a complete description of new external interface systems used to obtain long-duration recordings from the TERC sensor and (iii) more extensive experimental results and analysis for the single-mode TERC sensor including spectrograms of speech containing both voiced and unvoiced speech segments in quiet and acoustically noisy environments. The experimental results demonstrate that the single-mode TERC sensor is able to detect glottal activity up to the fourth harmonic and is also insensitive to acoustic background noise.

  15. The Auditory-Brainstem Response to Continuous, Non-repetitive Speech Is Modulated by the Speech Envelope and Reflects Speech Processing

    PubMed Central

    Reichenbach, Chagit S.; Braiman, Chananel; Schiff, Nicholas D.; Hudspeth, A. J.; Reichenbach, Tobias

    2016-01-01

    The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the ABR is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function. PMID:27303286

  16. Electrocorticographic representations of segmental features in continuous speech.

    PubMed

    Lotte, Fabien; Brumberg, Jonathan S; Brunner, Peter; Gunduz, Aysegul; Ritaccio, Anthony L; Guan, Cuntai; Schalk, Gerwin

    2015-01-01

    Acoustic speech output results from coordinated articulation of dozens of muscles, bones and cartilages of the vocal mechanism. While we commonly take the fluency and speed of our speech productions for granted, the neural mechanisms facilitating the requisite muscular control are not completely understood. Previous neuroimaging and electrophysiology studies of speech sensorimotor control have typically concentrated on speech sounds (i.e., phonemes, syllables and words) in isolation; sentence-length investigations have largely been used to inform coincident linguistic processing. In this study, we examined the neural representations of segmental features (place and manner of articulation, and voicing status) in the context of fluent, continuous speech production. We used recordings from the cortical surface [electrocorticography (ECoG)] to simultaneously evaluate the spatial topography and temporal dynamics of the neural correlates of speech articulation that may mediate the generation of hypothesized gestural or articulatory scores. We found that the representation of place of articulation involved broad networks of brain regions during all phases of speech production: preparation, execution and monitoring. In contrast, manner of articulation and voicing status were dominated by auditory cortical responses after speech had been initiated. These results provide a new insight into the articulatory and auditory processes underlying speech production in terms of their motor requirements and acoustic correlates. PMID:25759647

  17. Electrocorticographic representations of segmental features in continuous speech

    PubMed Central

    Lotte, Fabien; Brumberg, Jonathan S.; Brunner, Peter; Gunduz, Aysegul; Ritaccio, Anthony L.; Guan, Cuntai; Schalk, Gerwin

    2015-01-01

    Acoustic speech output results from coordinated articulation of dozens of muscles, bones and cartilages of the vocal mechanism. While we commonly take the fluency and speed of our speech productions for granted, the neural mechanisms facilitating the requisite muscular control are not completely understood. Previous neuroimaging and electrophysiology studies of speech sensorimotor control have typically concentrated on speech sounds (i.e., phonemes, syllables and words) in isolation; sentence-length investigations have largely been used to inform coincident linguistic processing. In this study, we examined the neural representations of segmental features (place and manner of articulation, and voicing status) in the context of fluent, continuous speech production. We used recordings from the cortical surface [electrocorticography (ECoG)] to simultaneously evaluate the spatial topography and temporal dynamics of the neural correlates of speech articulation that may mediate the generation of hypothesized gestural or articulatory scores. We found that the representation of place of articulation involved broad networks of brain regions during all phases of speech production: preparation, execution and monitoring. In contrast, manner of articulation and voicing status were dominated by auditory cortical responses after speech had been initiated. These results provide new insight into the articulatory and auditory processes underlying speech production in terms of their motor requirements and acoustic correlates. PMID:25759647

  18. Speech and Communication Disorders

    MedlinePlus

    ... or understand speech. Causes include Hearing disorders and deafness Voice problems, such as dysphonia or those caused by cleft lip or palate Speech problems like stuttering Developmental disabilities Learning disorders Autism spectrum disorder Brain injury Stroke Some speech and ...

  19. Speech impairment (adult)

    MedlinePlus

    Language impairment; Impairment of speech; Inability to speak; Aphasia; Dysarthria; Slurred speech; Dysphonia voice disorders ... disorders develop gradually, but anyone can develop a speech and ... suddenly, usually in a trauma. APHASIA Alzheimer disease ...

  20. Speech impairment (adult)

    MedlinePlus

    Language impairment; Impairment of speech; Inability to speak; Aphasia; Dysarthria; Slurred speech; Dysphonia voice disorders ... Common speech and language disorders include: APHASIA Aphasia is ... understand or express spoken or written language. It commonly ...

  1. Extensions to the Speech Disorders Classification System (SDCS)

    PubMed Central

    Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.

    2010-01-01

    This report describes three extensions to a classification system for pediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). Part I describes a classification extension to the SDCS to differentiate motor speech disorders from speech delay and to differentiate among three subtypes of motor speech disorders. Part II describes the Madison Speech Assessment Protocol (MSAP), an approximately two-hour battery of 25 measures that includes 15 speech tests and tasks. Part III describes the Competence, Precision, and Stability Analytics (CPSA) framework, a current set of approximately 90 perceptual- and acoustic-based indices of speech, prosody, and voice used to quantify and classify subtypes of Speech Sound Disorders (SSD). A companion paper (Shriberg, Fourakis, et al., 2010) provides reliability estimates for the perceptual and acoustic data reduction methods used in the SDCS. The agreement estimates in the companion paper support the reliability of SDCS methods and illustrate the complementary roles of perceptual and acoustic methods in diagnostic analyses of SSD of unknown origin. Examples of research using the extensions to the SDCS described in the present report include diagnostic findings for a sample of youth with motor speech disorders associated with galactosemia (Shriberg, Potter, & Strand, 2010) and a test of the hypothesis of apraxia of speech in a group of children with autism spectrum disorders (Shriberg, Paul, Black, & van Santen, 2010). All SDCS methods and reference databases running in the PEPPER (Programs to Examine Phonetic and Phonologic Evaluation Records; [Shriberg, Allen, McSweeny, & Wilson, 2001]) environment will be disseminated without cost when complete. PMID:20831378

  2. Headphone localization of speech stimuli

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.

    1991-01-01

    Recently, three-dimensional acoustic display systems have been developed that synthesize virtual sound sources over headphones based on filtering by Head-Related Transfer Functions (HRTFs), the direction-dependent spectral changes caused primarily by the outer ears. Here, 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with non-individualized HRTFs. About half of the subjects 'pulled' their judgements toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgements; 15 to 46 percent of stimuli were heard inside the head, with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by nonindividualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.
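
    HRTF filtering of this kind is usually implemented in the time domain by convolving the source signal with a pair of head-related impulse responses (HRIRs), one per ear. The following minimal NumPy sketch assumes that equal-length left and right HRIRs are already available. The toy impulses in the example are hypothetical and are not measured HRTFs.

```python
import numpy as np


def render_binaural(mono, hrir_left, hrir_right):
    """Filter a mono signal with a left/right head-related impulse response pair.

    Assumes both HRIRs have the same length; returns an array of shape
    (2, len(mono) + len(hrir) - 1) with one row per ear.
    """
    mono = np.asarray(mono, dtype=float)
    left = np.convolve(mono, hrir_left)    # FIR filtering for the left ear
    right = np.convolve(mono, hrir_right)  # FIR filtering for the right ear
    return np.vstack([left, right])


# Toy example: the right ear is delayed by 2 samples and attenuated by half.
stereo = render_binaural([1.0, 2.0, 3.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.5])
# stereo[0] -> [1, 2, 3, 0, 0]; stereo[1] -> [0, 0, 0.5, 1, 1.5]
```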

  3. Production and perception of clear speech in Croatian and English

    NASA Astrophysics Data System (ADS)

    Smiljanić, Rajka; Bradlow, Ann R.

    2005-09-01

    Previous research has established that naturally produced English clear speech is more intelligible than English conversational speech. The major goal of this paper was to establish the presence of the clear speech effect in production and perception of a language other than English, namely Croatian. A systematic investigation of the conversational-to-clear speech transformations across languages with different phonological properties (e.g., large versus small vowel inventory) can provide a window into the interaction of general auditory-perceptual and phonological, structural factors that contribute to the high intelligibility of clear speech. The results of this study showed that naturally produced clear speech is a distinct, listener-oriented, intelligibility-enhancing mode of speech production in both languages. Furthermore, the acoustic-phonetic features of the conversational-to-clear speech transformation revealed cross-language similarities in clear speech production strategies. In both languages, talkers exhibited a decrease in speaking rate and an increase in pitch range, as well as an expansion of the vowel space. Notably, the findings of this study showed equivalent vowel space expansion in English and Croatian clear speech, despite the difference in vowel inventory size across the two languages, suggesting that the extent of vowel contrast enhancement in hyperarticulated clear speech is independent of vowel inventory size.
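
    The abstract reports vowel space expansion but does not state how it was measured. One common way to quantify it is the area of the polygon whose vertices are the mean first and second formant values (F1, F2) of the corner vowels, computed with the shoelace formula. The following sketch works under that assumption, and the formant values used below are hypothetical.

```python
def vowel_space_area(formants):
    """Area (Hz^2) of the polygon spanned by mean vowel formants.

    `formants` is a list of (F1, F2) pairs in Hz, ordered around the polygon
    perimeter (e.g. /i/, /a/, /u/ for a triangle). F2 is used as the x axis
    and F1 as the y axis, as in a conventional vowel chart.
    """
    n = len(formants)
    twice_area = 0.0
    for k in range(n):
        f1_a, f2_a = formants[k]
        f1_b, f2_b = formants[(k + 1) % n]
        twice_area += f2_a * f1_b - f2_b * f1_a  # shoelace cross term
    return abs(twice_area) / 2.0


# Hypothetical /i/, /a/, /u/ means (F1, F2) in Hz.
area = vowel_space_area([(300, 2300), (750, 1300), (350, 900)])  # 290000.0
```

    Comparing this area between conversational and clear productions of the same talker gives one simple index of expansion.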

  4. Combined Electric and Contralateral Acoustic Hearing: Word and Sentence Recognition with Bimodal Hearing

    ERIC Educational Resources Information Center

    Gifford, Rene H.; Dorman, Michael F.; McKarns, Sharon A.; Spahr, Anthony J.

    2007-01-01

    Purpose: The authors assessed whether (a) a full-insertion cochlear implant would provide a higher level of speech understanding than bilateral low-frequency acoustic hearing, (b) contralateral acoustic hearing would add to the speech understanding provided by the implant, and (c) the level of performance achieved with electric stimulation plus…

  5. Lexical Information Drives Perceptual Learning of Distorted Speech: Evidence From the Comprehension of Noise-Vocoded Sentences

    ERIC Educational Resources Information Center

    Davis, Matthew H.; Johnsrude, Ingrid S.; Hervais-Adelman, Alexis; Taylor, Karen; McGettigan, Carolyn

    2005-01-01

    Speech comprehension is resistant to acoustic distortion in the input, reflecting listeners' ability to adjust perceptual processes to match the speech input. For noise-vocoded sentences, a manipulation that removes spectral detail from speech, listeners' reporting improved from near 0% to 70% correct over 30 sentences (Experiment 1). Learning was…

  6. Perception and the temporal properties of speech

    NASA Astrophysics Data System (ADS)

    Gordon, Peter C.

    1991-11-01

    Four experiments addressing the role of attention in phonetic perception are reported. The first experiment shows that the relative importance of two cues to the voicing distinction changes when subjects must perform an arithmetic distractor task at the same time as identifying a speech stimulus. The voice onset time cue loses phonetic significance when subjects are distracted, while the F0 onset frequency cue does not. The second experiment shows a similar pattern for two cues to the distinction between the vowels /i/ (as in 'beat') and /I/ (as in 'bit'). Together these experiments indicate that careful attention to speech perception is necessary for strong acoustic cues to achieve their full phonetic impact, while weaker acoustic cues achieve their full phonetic impact without close attention. Experiment 3 shows that this pattern is obtained when the distractor task places little demand on verbal short-term memory. Experiment 4 provides a large data set for testing formal models of the role of attention in speech perception. Attention is shown to influence the signal-to-noise ratio in phonetic encoding. This principle is instantiated in a network model in which the role of attention is to reduce noise in the phonetic encoding of acoustic cues. Implications of this work for understanding speech perception and general theories of the role of attention in perception are discussed.

  7. Working memory and intelligibility of hearing-aid processed speech.

    PubMed

    Souza, Pamela E; Arehart, Kathryn H; Shen, Jing; Anderson, Melinda; Kates, James M

    2015-01-01

    Previous work suggested that individuals with low working memory capacity may be at a disadvantage in adverse listening environments, including situations with background noise or substantial modification of the acoustic signal. This study explored the relationship between patient factors (including working memory capacity) and intelligibility and quality of modified speech for older individuals with sensorineural hearing loss. The modification was created using a combination of hearing aid processing [wide-dynamic range compression (WDRC) and frequency compression (FC)] applied to sentences in multitalker babble. The extent of signal modification was quantified via an envelope fidelity index. We also explored the contribution of components of working memory by including measures of processing speed and executive function. We hypothesized that listeners with low working memory capacity would perform more poorly than those with high working memory capacity across all situations, and would also be differentially affected by high amounts of signal modification. Results showed a significant effect of working memory capacity for speech intelligibility, and an interaction between working memory, amount of hearing loss and signal modification. Signal modification was the major predictor of quality ratings. These data add to the literature on hearing-aid processing and working memory by suggesting that the working memory-intelligibility effects may be related to aggregate signal fidelity, rather than to the specific signal manipulation. They also suggest that for individuals with low working memory capacity, sensorineural loss may be most appropriately addressed with WDRC and/or FC parameters that maintain the fidelity of the signal envelope. PMID:25999874
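
    The abstract names an envelope fidelity index but does not define it. The following is an illustrative assumption only and is not necessarily the index used in the study. One way to compare unprocessed and hearing-aid-processed signals is to correlate their temporal envelopes, taking each envelope as the magnitude of the FFT-based analytic signal.

```python
import numpy as np


def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed with the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    weights = np.zeros(n)  # keep DC, double positive frequencies, drop negatives
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))


def envelope_fidelity(clean, processed):
    """Hypothetical index: Pearson correlation of the two signals' envelopes."""
    return np.corrcoef(hilbert_envelope(clean), hilbert_envelope(processed))[0, 1]
```

    Because it is a correlation, this particular index ignores overall gain: a processed signal that is only scaled scores 1.0, and any reshaping of the envelope (for example by compression) lowers it.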

  8. Working memory and intelligibility of hearing-aid processed speech

    PubMed Central

    Souza, Pamela E.; Arehart, Kathryn H.; Shen, Jing; Anderson, Melinda; Kates, James M.

    2015-01-01

    Previous work suggested that individuals with low working memory capacity may be at a disadvantage in adverse listening environments, including situations with background noise or substantial modification of the acoustic signal. This study explored the relationship between patient factors (including working memory capacity) and intelligibility and quality of modified speech for older individuals with sensorineural hearing loss. The modification was created using a combination of hearing aid processing [wide-dynamic range compression (WDRC) and frequency compression (FC)] applied to sentences in multitalker babble. The extent of signal modification was quantified via an envelope fidelity index. We also explored the contribution of components of working memory by including measures of processing speed and executive function. We hypothesized that listeners with low working memory capacity would perform more poorly than those with high working memory capacity across all situations, and would also be differentially affected by high amounts of signal modification. Results showed a significant effect of working memory capacity for speech intelligibility, and an interaction between working memory, amount of hearing loss and signal modification. Signal modification was the major predictor of quality ratings. These data add to the literature on hearing-aid processing and working memory by suggesting that the working memory-intelligibility effects may be related to aggregate signal fidelity, rather than to the specific signal manipulation. They also suggest that for individuals with low working memory capacity, sensorineural loss may be most appropriately addressed with WDRC and/or FC parameters that maintain the fidelity of the signal envelope. PMID:25999874

  9. Speech Motor Correlates of Treatment-Related Changes in Stuttering Severity and Speech Naturalness

    ERIC Educational Resources Information Center

    Tasko, Stephen M.; McClean, Michael D.; Runyan, Charles M.

    2007-01-01

    Participants of stuttering treatment programs provide an opportunity to evaluate persons who stutter as they demonstrate varying levels of fluency. Identifying physiologic correlates of altered fluency levels may lead to insights about mechanisms of speech disfluency. This study examined respiratory, orofacial kinematic and acoustic measures in 35…

  10. Cued speech for enhancing speech perception and first language development of children with cochlear implants.

    PubMed

    Leybaert, Jacqueline; LaSasso, Carol J

    2010-06-01

    Nearly 300 million people worldwide have moderate to profound hearing loss. Hearing impairment, if not adequately managed, has strong socioeconomic and affective impact on individuals. Cochlear implants have become the most effective vehicle for helping profoundly deaf children and adults to understand spoken language, to be sensitive to environmental sounds, and, to some extent, to listen to music. The auditory information delivered by the cochlear implant remains non-optimal for speech perception because it delivers a spectrally degraded signal and lacks some of the fine temporal acoustic structure. In this article, we discuss research revealing the multimodal nature of speech perception in normally-hearing individuals, with important inter-subject variability in the weighting of auditory or visual information. We also discuss how audio-visual training, via Cued Speech, can improve speech perception in cochlear implantees, particularly in noisy contexts. Cued Speech is a system that makes use of visual information from speechreading combined with hand shapes positioned in different places around the face in order to deliver completely unambiguous information about the syllables and the phonemes of spoken language. We support our view that exposure to Cued Speech before or after the implantation could be important in the aural rehabilitation process of cochlear implantees. We describe five lines of research that are converging to support the view that Cued Speech can enhance speech perception in individuals with cochlear implants. PMID:20724357

  11. Prediction Method of Speech Recognition Performance Based on HMM-based Speech Synthesis Technique

    NASA Astrophysics Data System (ADS)

    Terashima, Ryuta; Yoshimura, Takayoshi; Wakita, Toshihiro; Tokuda, Keiichi; Kitamura, Tadashi

    We describe an efficient method that uses an HMM-based speech synthesis technique as a test pattern generator for evaluating the word recognition rate. Using this method, the recognition rate for each word and each speaker can be evaluated with synthesized speech. The parameter generation technique can be formulated as an algorithm that determines the speech parameter vector sequence O by maximizing P(O|Q,λ) given the model parameters λ and the state sequence Q, under a dynamic acoustic feature constraint. We conducted recognition experiments to illustrate the validity of the method. Approximately 100 speakers were used to train the speaker-dependent models for the speech synthesis used in these experiments, and the synthetic speech was generated as the test patterns for the target speech recognizer. As a result, the recognition rate of the HMM-based synthesized speech shows a good correlation with the recognition rate of the actual speech. Furthermore, we find that our method can predict the speaker recognition rate with approximately 2% error on average. Therefore, the speaker recognition rate can be evaluated automatically using the proposed method.
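
    The parameter generation step has a well-known closed form. Write the observation sequence as O = Wc, where c holds the static features and W appends dynamic (delta) features. Maximizing the Gaussian likelihood then reduces to the linear system (WᵀΣ⁻¹W)c = WᵀΣ⁻¹μ, where μ and Σ are the means and variances selected by the state sequence Q. The following one-dimensional NumPy sketch assumes diagonal variances, a single delta window 0.5(c[t+1] - c[t-1]) and zero values outside the utterance. The window and the boundary handling are assumptions, not details from the paper.

```python
import numpy as np


def generate_trajectory(static_mean, static_var, delta_mean, delta_var):
    """Maximum-likelihood static trajectory c from per-frame Gaussian statistics.

    Observations o = W c stack the static values and the deltas
    delta_t = 0.5 * (c[t+1] - c[t-1]), with frames outside the range taken as 0.
    Solves (W^T S^-1 W) c = W^T S^-1 mu for a diagonal covariance S.
    """
    T = len(static_mean)
    delta = np.zeros((T, T))
    for t in range(T):
        if t > 0:
            delta[t, t - 1] = -0.5
        if t < T - 1:
            delta[t, t + 1] = 0.5
    W = np.vstack([np.eye(T), delta])                     # shape (2T, T)
    mu = np.concatenate([static_mean, delta_mean])
    precision = 1.0 / np.concatenate([static_var, delta_var])
    A = W.T @ (precision[:, None] * W)                    # W^T S^-1 W
    b = W.T @ (precision * mu)                            # W^T S^-1 mu
    return np.linalg.solve(A, b)
```

    When the static and delta means disagree, the solution trades them off according to their variances. This is how the dynamic-feature constraint smooths the generated parameter track.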

  12. Cued Speech for Enhancing Speech Perception and First Language Development of Children With Cochlear Implants

    PubMed Central

    Leybaert, Jacqueline; LaSasso, Carol J.

    2010-01-01

    Nearly 300 million people worldwide have moderate to profound hearing loss. Hearing impairment, if not adequately managed, has strong socioeconomic and affective impact on individuals. Cochlear implants have become the most effective vehicle for helping profoundly deaf children and adults to understand spoken language, to be sensitive to environmental sounds, and, to some extent, to listen to music. The auditory information delivered by the cochlear implant remains non-optimal for speech perception because it delivers a spectrally degraded signal and lacks some of the fine temporal acoustic structure. In this article, we discuss research revealing the multimodal nature of speech perception in normally-hearing individuals, with important inter-subject variability in the weighting of auditory or visual information. We also discuss how audio-visual training, via Cued Speech, can improve speech perception in cochlear implantees, particularly in noisy contexts. Cued Speech is a system that makes use of visual information from speechreading combined with hand shapes positioned in different places around the face in order to deliver completely unambiguous information about the syllables and the phonemes of spoken language. We support our view that exposure to Cued Speech before or after the implantation could be important in the aural rehabilitation process of cochlear implantees. We describe five lines of research that are converging to support the view that Cued Speech can enhance speech perception in individuals with cochlear implants. PMID:20724357

  13. Speech research

    NASA Astrophysics Data System (ADS)

    1992-06-01

    Phonology is traditionally seen as the discipline that concerns itself with the building blocks of linguistic messages. It is the study of the structure of sound inventories of languages and of the participation of sounds in rules or processes. Phonetics, in contrast, concerns speech sounds as produced and perceived. Two extreme positions on the relationship between phonological messages and phonetic realizations are represented in the literature. One holds that the primary home for linguistic symbols, including phonological ones, is the human mind, itself housed in the human brain. The second holds that their primary home is the human vocal tract.

  14. Human speech articulator measurements using low power, 2GHz Homodyne sensors

    SciTech Connect

    Barnes, T; Burnett, G C; Holzrichter, J F

    1999-06-29

    Very low power, short-range microwave "radar-like" sensors can measure the motions and vibrations of internal human speech articulators as speech is produced. In these animate systems (and also in inanimate acoustic systems), microwave sensors can measure vibration information associated with excitation sources and other interfaces. These data, together with the corresponding acoustic data, enable the calculation of system transfer functions. This information appears to be useful for a surprisingly wide range of applications such as speech coding and recognition, speaker or object identification, speech and musical instrument synthesis, noise cancellation, and other applications.
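
    One standard way to compute such a transfer function from simultaneously recorded excitation and acoustic output is spectral division, H(f) = Y(f)/X(f), assuming a linear, time-invariant system over the analysis frame. The following NumPy sketch is illustrative only and is not the procedure used by the authors. The function name and the small regularizer `eps` are assumptions.

```python
import numpy as np


def estimate_transfer_function(excitation, output, n_fft, eps=1e-12):
    """Estimate H(f) = Y(f) / X(f) from one frame of excitation and output.

    Written as Y * conj(X) / (|X|^2 + eps) so that bins where the excitation
    has almost no energy do not blow up; `eps` is an assumed regularizer.
    n_fft should be at least len(output) to avoid circular wrap-around.
    """
    X = np.fft.rfft(excitation, n_fft)
    Y = np.fft.rfft(output, n_fft)
    return Y * np.conj(X) / (np.abs(X) ** 2 + eps)


rng = np.random.default_rng(1)
excitation = rng.standard_normal(64)
true_impulse_response = np.array([1.0, 0.5, 0.25])       # hypothetical system
output = np.convolve(excitation, true_impulse_response)  # length 66
H = estimate_transfer_function(excitation, output, n_fft=128)
```

    In practice, noisy recordings are usually handled by averaging cross- and auto-spectra over many frames rather than by dividing a single pair of spectra.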

  15. Automatic Speech Recognition Based on Electromyographic Biosignals

    NASA Astrophysics Data System (ADS)

    Jou, Szu-Chen Stan; Schultz, Tanja

    This paper presents our studies of automatic speech recognition based on electromyographic biosignals captured from the articulatory muscles in the face using surface electrodes. We develop a phone-based speech recognizer and describe how the performance of this recognizer improves by carefully designing and tailoring the extraction of relevant speech features toward electromyographic signals. Our experimental design includes the collection of audibly spoken speech simultaneously recorded as acoustic data using a close-speaking microphone and as electromyographic signals using electrodes. Our experiments indicate that electromyographic signals precede the acoustic signal by about 0.05-0.06 seconds. Furthermore, we introduce articulatory feature classifiers, which had recently been shown to improve classical speech recognition significantly. We show that the classification accuracy of articulatory features clearly benefits from the tailored feature extraction. Finally, these classifiers are integrated into the overall decoding framework applying a stream architecture. Our final system achieves a word error rate of 29.9% on a 100-word recognition task.
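
    The reported figure is a word error rate (WER). It is the minimum number of substitutions, deletions and insertions needed to turn the recognized word sequence into the reference, divided by the number of reference words. The following is a minimal sketch of the standard dynamic-programming computation.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```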

  16. Speech recognition and understanding

    SciTech Connect

    Vintsyuk, T.K.

    1983-05-01

    This article discusses the automatic processing of speech signals with the aim of finding a sequence of words (speech recognition) or a concept (speech understanding) being transmitted by the speech signal. The goal of the research is to develop an automatic typewriter that will automatically edit and type text under voice control. A dynamic programming method is proposed in which all possible class signals are stored, after which the presented signal is compared to all the stored signals during the recognition phase. Topics considered include element-by-element recognition of words of speech, learning speech recognition, phoneme-by-phoneme speech recognition, the recognition of connected speech, understanding connected speech, and prospects for designing speech recognition and understanding systems. An application of the composition dynamic programming method for the solution of basic problems in the recognition and understanding of speech is presented.
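
    The article compares a presented signal against every stored template using dynamic programming, which is the idea behind dynamic time warping (DTW). The following sketch shows that idea for sequences of feature vectors. The Euclidean frame distance, the three allowed steps and the helper names are assumptions, and the article's own composition method may differ.

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic-time-warping distance between two feature sequences (frames x dims)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat a 1-D sequence as frames of one feature
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])
            # extend the cheapest of: skip in a, skip in b, or advance both
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to `utterance` (hypothetical helper)."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


label = recognize([0, 1, 2, 3], {"up": [0, 1, 1, 2, 3], "down": [3, 2, 1, 0]})
```

    Warping absorbs differences in speaking rate. In the example, the repeated frame in the "up" template costs nothing, so that template matches exactly.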

  17. Voice Quality Modelling for Expressive Speech Synthesis

    PubMed Central

    Socoró, Joan Claudi

    2014-01-01

    This paper presents the perceptual experiments that were carried out in order to validate the methodology of transforming expressive speech styles using voice quality (VoQ) parameters modelling, along with the well-known prosody (F0, duration, and energy), from a neutral style into a number of expressive ones. The main goal was to validate the usefulness of VoQ in the enhancement of expressive synthetic speech in terms of speech quality and style identification. A harmonic plus noise model (HNM) was used to modify VoQ and prosodic parameters that were extracted from an expressive speech corpus. Perception test results indicated that modelling VoQ together with prosodic characteristics improved the expressive speech styles obtained. PMID:24587738

  18. "Perception of the speech code" revisited: Speech is alphabetic after all.

    PubMed

    Fowler, Carol A; Shankweiler, Donald; Studdert-Kennedy, Michael

    2016-03-01

    We revisit an article, "Perception of the Speech Code" (PSC), published in this journal 50 years ago (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967) and address one of its legacies concerning the status of phonetic segments, which persists in theories of speech today. In the perspective of PSC, segments both exist (in language as known) and do not exist (in articulation or the acoustic speech signal). Findings interpreted as showing that speech is not a sound alphabet, but, rather, phonemes are encoded in the signal, coupled with findings that listeners perceive articulation, led to the motor theory of speech perception, a highly controversial legacy of PSC. However, a second legacy, the paradoxical perspective on segments has been mostly unquestioned. We remove the paradox by offering an alternative supported by converging evidence that segments exist in language both as known and as used. We support the existence of segments in both language knowledge and in production by showing that phonetic segments are articulatory and dynamic and that coarticulation does not eliminate them. We show that segments leave an acoustic signature that listeners can track. This suggests that speech is well-adapted to public communication in facilitating, not creating a barrier to, exchange of language forms. PMID:26301536

  19. Acoustic and Perceptual Characteristics of Vowels Produced during Simultaneous Communication

    ERIC Educational Resources Information Center

    Schiavetti, Nicholas; Metz, Dale Evan; Whitehead, Robert L.; Brown, Shannon; Borges, Janie; Rivera, Sara; Schultz, Christine

    2004-01-01

    This study investigated the acoustical and perceptual characteristics of vowels in speech produced during simultaneous communication (SC). Twelve normal hearing, experienced sign language users were recorded under SC and speech alone (SA) conditions speaking a set of sentences containing monosyllabic words designed for measurement of vowel…

  20. On-Line Acoustic and Semantic Interpretation of Talker Information

    ERIC Educational Resources Information Center

    Creel, Sarah C.; Tumlin, Melanie A.

    2011-01-01

    Recent work demonstrates that listeners utilize talker-specific information in the speech signal to inform real-time language processing. However, there are multiple representational levels at which this may take place. Listeners might use acoustic cues in the speech signal to access the talker's identity and information about what they tend to…

  1. Acoustic Correlates of Emphatic Stress in Central Catalan

    ERIC Educational Resources Information Center

    Nadeu, Marianna; Hualde, Jose Ignacio

    2012-01-01

    A common feature of public speech in Catalan is the placement of prominence on lexically unstressed syllables ("emphatic stress"). This paper presents an acoustic study of radio speech data. Instances of emphatic stress were perceptually identified. Within-word comparison between vowels with emphatic stress and vowels with primary lexical stress…

  2. Careers in Speech Communication.

    ERIC Educational Resources Information Center

    Speech Communication Association, New York, NY.

    Brief discussions in this pamphlet suggest educational and career opportunities in the following fields of speech communication: rhetoric, public address, and communication; theatre, drama, and oral interpretation; radio, television, and film; speech pathology and audiology; speech science, phonetics, and linguistics; and speech education.…

  3. Opportunities in Speech Pathology.

    ERIC Educational Resources Information Center

    Newman, Parley W.

    The importance of speech is discussed and speech pathology is described. Types of communication disorders considered are articulation disorders, aphasia, facial deformity, hearing loss, stuttering, delayed speech, voice disorders, and cerebral palsy; examples of five disorders are given. Speech pathology is investigated from these aspects: the…

  4. Vocal tract resonances in speech, singing, and playing musical instruments

    PubMed Central

    Wolfe, Joe; Garnier, Maëva; Smith, John

    2009-01-01

    In both the voice and musical wind instruments, a valve (vocal folds, lips, or reed) lies between an upstream and downstream duct: trachea and vocal tract for the voice; vocal tract and bore for the instrument. Examining the structural similarities and functional differences gives insight into their operation and the duct-valve interactions. In speech and singing, vocal tract resonances usually determine the spectral envelope and usually have a smaller influence on the operating frequency. The resonances are important not only for the phonemic information they produce, but also because of their contribution to voice timbre, loudness, and efficiency. The role of the tract resonances is usually different in brass and some woodwind instruments, where they modify and to some extent compete or collaborate with resonances of the instrument to control the vibration of a reed or the player’s lips, and/or the spectrum of air flow into the instrument. We give a brief overview of oscillator mechanisms and vocal tract acoustics. We discuss recent and current research on how the acoustical resonances of the vocal tract are involved in singing and the playing of musical wind instruments. Finally, we compare techniques used in determining tract resonances and suggest some future developments. PMID:19649157
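
    To get a rough sense of where vocal tract resonances lie, the vocal tract for a neutral vowel is often idealized as a uniform tube that is closed at the glottis and open at the lips. Its resonances fall at F_n = (2n - 1)c/(4L). The following sketch uses that textbook idealization with an assumed tract length of 0.17 m and a speed of sound of 343 m/s. Neither value comes from this abstract.

```python
def tube_resonances(length_m, n_resonances=3, speed_of_sound=343.0):
    """Resonance frequencies (Hz) of a uniform tube closed at one end, open at the other.

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, n_resonances + 1)]


formants = tube_resonances(0.17)  # approximately [504.4, 1513.2, 2522.1] Hz
```

    Real tracts depart from this uniform shape. Moving the tongue, jaw and lips shifts the resonances away from these odd multiples, and those shifts carry the phonemic information described above.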

  5. Measuring phonetic convergence in speech production

    PubMed Central

    Pardo, Jennifer S.

    2013-01-01

    Phonetic convergence is defined as an increase in the similarity of acoustic-phonetic form between talkers. Previous research has demonstrated phonetic convergence both when a talker listens passively to speech and while talkers engage in social interaction. Much of this research has focused on a diverse array of acoustic-phonetic attributes, with fewer studies incorporating perceptual measures of phonetic convergence. The current paper reviews research on phonetic convergence in both non-interactive and conversational settings, and attempts to consolidate the diverse array of findings by proposing a paradigm that models perceptual and acoustic measures together. By modeling acoustic measures as predictors of perceived phonetic convergence, this paradigm has the potential to reconcile some of the diverse and inconsistent findings currently reported in the literature. PMID:23986738

  6. Top-down influences of written text on perceived clarity of degraded speech.

    PubMed

    Sohoglu, Ediz; Peelle, Jonathan E; Carlyon, Robert P; Davis, Matthew H

    2014-02-01

    An unresolved question is how the reported clarity of degraded speech is enhanced when listeners have prior knowledge of speech content. One account of this phenomenon proposes top-down modulation of early acoustic processing by higher-level linguistic knowledge. Alternative, strictly bottom-up accounts argue that acoustic information and higher-level knowledge are combined at a late decision stage without modulating early acoustic processing. Here we tested top-down and bottom-up accounts using written text to manipulate listeners' knowledge of speech content. The effect of written text on the reported clarity of noise-vocoded speech was most pronounced when text was presented before (rather than after) speech (Experiment 1). Fine-grained manipulation of the onset asynchrony between text and speech revealed that this effect declined when text was presented more than 120 ms after speech onset (Experiment 2). Finally, the influence of written text was found to arise from phonological (rather than lexical) correspondence between text and speech (Experiment 3). These results suggest that prior knowledge effects are time-limited by the duration of auditory echoic memory for degraded speech, consistent with top-down modulation of early acoustic processing by linguistic knowledge. PMID:23750966

  7. Assessing the acoustical climate of underground stations.

    PubMed

    Nowicka, Elzbieta

    2007-01-01

    In long enclosures, it is difficult to design a proper acoustical environment, which is indispensable to speech recognition. Although there is some literature on the acoustical conditions in underground stations, there is still little information about methods for estimating the correct reverberation conditions. This paper discusses the assessment of the reverberation conditions of underground stations. A comparison of reverberation time measurements in Warsaw's underground stations with calculated data shows divergences between measured and calculated early decay time values, especially for long source-receiver distances. Rapid speech transmission index (RASTI) values for the measured stations are also presented. PMID:18082025
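    The abstract compares measured and calculated early decay time (EDT). As a rough illustration only, the Python sketch below shows one common way EDT is estimated from a measured room impulse response: Schroeder backward integration of the squared response, then scaling the time taken to fall to -10 dB by six. The function names are hypothetical, and the single-crossing estimate is a simplification of the least-squares fit over the 0 to -10 dB range that ISO 3382-1 specifies. It is not the paper's procedure.

```python
import math


def schroeder_decay_db(impulse_response):
    """Energy decay curve by Schroeder backward integration, in dB re its start."""
    energy = [x * x for x in impulse_response]
    tail = [0.0] * len(energy)
    running = 0.0
    # Sum the squared response from the end backwards: tail[i] = sum(energy[i:]).
    for i in range(len(energy) - 1, -1, -1):
        running += energy[i]
        tail[i] = running
    total = tail[0]
    if total <= 0.0:
        raise ValueError("impulse response has no energy")
    return [10.0 * math.log10(t / total) if t > 0.0 else -math.inf for t in tail]


def early_decay_time(impulse_response, fs):
    """EDT in seconds: six times the time taken to fall from 0 dB to -10 dB.

    Simplification: uses the first -10 dB crossing instead of a least-squares
    line fit over the 0 to -10 dB range.
    """
    curve = schroeder_decay_db(impulse_response)
    for i, level in enumerate(curve):
        if level <= -10.0:
            return 6.0 * i / fs
    raise ValueError("decay curve never reaches -10 dB")
```

    For an ideal exponential decay, EDT equals the reverberation time. Differences between the two in real stations are one symptom of the non-diffuse fields typical of long enclosures.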

  8. Models of speech synthesis.

    PubMed Central

    Carlson, R

    1995-01-01

    The term "speech synthesis" has been used for diverse technical approaches. In this paper, some of the approaches used to generate synthetic speech in a text-to-speech system are reviewed, and some of the basic motivations for choosing one method over another are discussed. It is important to keep in mind, however, that speech synthesis models are needed not just for speech generation but to help us understand how speech is created, or even how articulation can explain language structure. General issues such as the synthesis of different voices, accents, and multiple languages are discussed as special challenges facing the speech synthesis community. PMID:7479805

  9. Predicting the intelligibility of vocoded speech

    PubMed Central

    Chen, Fei; Loizou, Philipos C.

    2010-01-01

    Objectives: The purpose of this study is to evaluate how well a number of speech intelligibility indices predict the intelligibility of vocoded speech. Design: Noise-corrupted sentences were vocoded in a total of 80 conditions, involving three different SNR levels (-5, 0 and 5 dB) and two types of maskers (steady-state noise and two-talker). Tone-vocoder simulations were used, as well as simulations of combined electric-acoustic stimulation (EAS). The vocoded sentences were presented to normal-hearing listeners for identification, and the resulting intelligibility scores were used to assess the correlation of various speech intelligibility measures. These included measures designed to assess speech intelligibility, such as speech-transmission index (STI) and articulation index (AI) based measures, as well as measures designed to assess distortions in hearing aids (e.g., coherence-based measures). These measures relied primarily on either temporal-envelope or spectral-envelope information in the prediction model. The underlying hypothesis of the present study is that measures that assess temporal envelope distortions, such as those based on the speech-transmission index, should correlate highly with the intelligibility of vocoded speech. This is based on the fact that vocoder simulations preserve primarily envelope information, similar to the processing implemented in current cochlear implant speech processors. Similarly, it is hypothesized that measures such as the coherence-based index, which assess distortions in the spectral envelope, could also be used to model the intelligibility of vocoded speech. Results: Of all the intelligibility measures considered, the coherence-based and the STI-based measures performed best. High correlations (r = 0.9-0.96) were maintained with the coherence-based measures in all noisy conditions. The highest correlation obtained with the STI-based measure was 0.92, and that was obtained when high modulation rates (100…

  10. The Effects of Simulated Stuttering and Prolonged Speech on the Neural Activation Patterns of Stuttering and Nonstuttering Adults

    ERIC Educational Resources Information Center

    De Nil, Luc F.; Beal, Deryk S.; Lafaille, Sophie J.; Kroll, Robert M.; Crawley, Adrian P.; Gracco, Vincent L.

    2008-01-01

    Functional magnetic resonance imaging was used to investigate the neural correlates of passive listening, habitual speech and two modified speech patterns (simulated stuttering and prolonged speech) in stuttering and nonstuttering adults. Within-group comparisons revealed increased right hemisphere biased activation of speech-related regions…

  11. Comparison of a short-time speech-based intelligibility metric to the speech transmission index and intelligibility data

    PubMed Central

    Payton, Karen L.; Shrestha, Mona

    2013-01-01

    Several algorithms have been shown to generate a metric corresponding to the Speech Transmission Index (STI) using speech as a probe stimulus [e.g., Goldsworthy and Greenberg, J. Acoust. Soc. Am. 116, 3679–3689 (2004)]. The time-domain approaches work well on long speech segments and have the added potential to be used for short-time analysis. This study investigates the performance of the Envelope Regression (ER) time-domain STI method as a function of window length, in acoustically degraded environments with multiple talkers and speaking styles. The ER method is compared with a short-time Theoretical STI, derived from octave-band signal-to-noise ratios and reverberation times. For windows as short as 0.3 s, the ER method tracks short-time Theoretical STI changes in stationary speech-shaped noise, fluctuating restaurant babble and stationary noise plus reverberation. The metric is also compared to intelligibility scores on conversational speech and speech articulated clearly but at normal speaking rates (Clear/Norm) in stationary noise. Correlation between the metric and intelligibility scores is high and, consistent with the subject scores, the metrics are higher for Clear/Norm speech than for conversational speech and higher for the first word in a sentence than for the last word. PMID:24180791
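    The "theoretical STI" referred to above is computed from per-band signal-to-noise ratios and reverberation times rather than from a speech probe. Below is a simplified Python sketch that assumes the classic Houtgast-Steeneken modulation transfer function for exponential reverberation plus stationary noise, 14 modulation frequencies from 0.63 to 12.5 Hz, and equal octave-band weights. The IEC 60268-16 standard uses unequal band weights and redundancy corrections, which are not reproduced here. Applying the same computation to SNRs measured over short windows would give a short-time variant; the paper's exact windowing is not shown.

```python
import math

# Fourteen one-third-octave modulation frequencies (Hz), 0.63 to 12.5 Hz.
MOD_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5, 3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def mtf(f_mod, rt60, snr_db):
    """Modulation transfer for exponential reverberation plus stationary noise."""
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * rt60 / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


def transmission_index(m):
    """Map a modulation index to a 0..1 transmission index via apparent SNR."""
    m = min(max(m, 1e-9), 1.0 - 1e-9)  # keep the logarithm finite
    snr_app = 10.0 * math.log10(m / (1.0 - m))
    snr_app = min(max(snr_app, -15.0), 15.0)  # clip to +/-15 dB
    return (snr_app + 15.0) / 30.0


def theoretical_sti(band_snr_db, band_rt60, weights=None):
    """Weighted mean over octave bands of the per-band modulation transfer index.

    Assumption: equal band weights unless weights are supplied.
    """
    if len(band_snr_db) != len(band_rt60):
        raise ValueError("need one SNR and one RT60 per octave band")
    n = len(band_snr_db)
    if weights is None:
        weights = [1.0 / n] * n
    mti = []
    for snr, rt in zip(band_snr_db, band_rt60):
        tis = [transmission_index(mtf(f, rt, snr)) for f in MOD_FREQS]
        mti.append(sum(tis) / len(tis))
    return sum(w * x for w, x in zip(weights, mti))
```

    With seven octave bands (125 Hz to 8 kHz), 0 dB SNR and no reverberation give an STI of 0.5, while longer reverberation times lower the index by attenuating the faster modulations.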

  12. Comparison of a short-time speech-based intelligibility metric to the speech transmission index and intelligibility data.

    PubMed

    Payton, Karen L; Shrestha, Mona

    2013-11-01

    Several algorithms have been shown to generate a metric corresponding to the Speech Transmission Index (STI) using speech as a probe stimulus [e.g., Goldsworthy and Greenberg, J. Acoust. Soc. Am. 116, 3679-3689 (2004)]. The time-domain approaches work well on long speech segments and have the added potential to be used for short-time analysis. This study investigates the performance of the Envelope Regression (ER) time-domain STI method as a function of window length, in acoustically degraded environments with multiple talkers and speaking styles. The ER method is compared with a short-time Theoretical STI, derived from octave-band signal-to-noise ratios and reverberation times. For windows as short as 0.3 s, the ER method tracks short-time Theoretical STI changes in stationary speech-shaped noise, fluctuating restaurant babble and stationary noise plus reverberation. The metric is also compared to intelligibility scores on conversational speech and speech articulated clearly but at normal speaking rates (Clear/Norm) in stationary noise. Correlation between the metric and intelligibility scores is high and, consistent with the subject scores, the metrics are higher for Clear/Norm speech than for conversational speech and higher for the first word in a sentence than for the last word. PMID:24180791

  13. Achieving speech intelligibility at Paddington Station, London, UK

    NASA Astrophysics Data System (ADS)

    Goddard, Helen M.

    2002-11-01

    Paddington Station in London, UK, is a large rail terminus for long-distance electric and diesel-powered trains. This magnificent train shed has four arched spans and is one of the remaining structural testaments to the architect Brunel. Given the current British and European legislative requirements for intelligible speech in public buildings, AMS Acoustics were engaged to design an electroacoustic solution. In this paper we outline how the significant problems of lively natural acoustics, high operational noise levels, and strict aesthetic constraints were addressed. The resulting design is radical, using the latest DSP-controlled line-array loudspeakers. The paper details the acoustic modeling undertaken to predict both the evenness of direct sound pressure level coverage and the STI. It also presents the speech intelligibility measured upon handover of the new system. The design has proved successful: given the nature of the space, outstanding speech intelligibility is achieved.

  14. Chinese speech intelligibility and its relationship with the speech transmission index for children in elementary school classrooms.

    PubMed

    Peng, Jianxin; Yan, Nanjie; Wang, Dan

    2015-01-01

    The present study investigated Chinese speech intelligibility in 28 classrooms from nine different elementary schools in Guangzhou, China. Subjective Chinese speech intelligibility in the classrooms was evaluated with children in grades 2, 4, and 6 (7 to 12 years old). Acoustical measurements were also performed in these classrooms. Subjective Chinese speech intelligibility scores and objective speech intelligibility parameters, such as the speech transmission index (STI), were obtained at each listening position for all tests. The relationship between subjective Chinese speech intelligibility scores and STI was analyzed, and the effects of age on the scores were compared. Results indicate high correlations between subjective Chinese speech intelligibility scores and STI for children in grades 2, 4, and 6. At the same STI, Chinese speech intelligibility scores increase with age, and the differences in scores among age groups decrease as STI increases. To achieve 95% Chinese speech intelligibility scores, the STIs required for grade 2, 4, and 6 children are 0.75, 0.69, and 0.63, respectively. PMID:25618041

  15. Cue integration and context effects in speech: Evidence against speaking rate normalization

    PubMed Central

    Toscano, Joseph C.; McMurray, Bob

    2012-01-01

    Listeners are able to accurately recognize speech despite variation in acoustic cues across contexts, such as different speaking rates. Previous work has suggested that listeners use rate information (indicated by vowel length; VL) to modify their use of context-dependent acoustic cues, like voice-onset time (VOT), a primary cue to voicing. We present several experiments and simulations that offer an alternative explanation: listeners treat VL as a phonetic cue, rather than as an indicator of speaking rate, and rely on general cue-integration principles to combine information from VOT and VL. We demonstrate that listeners use the two cues independently, that VL is used in both naturally-produced and synthetic speech, and that effects of stimulus naturalness can be explained by a cue-integration model. Together, these results suggest that listeners do not interpret VOT relative to rate information provided by VL and that effects of speaking rate can be explained by more general cue-integration principles. PMID:22532385

  16. Robust coarticulatory modeling for continuous speech recognition

    NASA Astrophysics Data System (ADS)

    Schwartz, R.; Chow, Y. L.; Dunham, M. O.; Kimball, O.; Krasner, M.; Kubala, F.; Makhoul, J.; Price, P.; Roucos, S.

    1986-10-01

    The purpose of this project is to perform research into algorithms for the automatic recognition of individual sounds, or phonemes, in continuous speech. The algorithms developed should be appropriate for understanding large-vocabulary continuous speech input and are to be made available to the Strategic Computing Program for incorporation in a complete word recognition system. This report describes progress to date in developing phonetic models that are appropriate for continuous speech recognition. In continuous speech, the acoustic realization of each phoneme depends heavily on the preceding and following phonemes, a process known as coarticulation. Thus, while there are relatively few phonemes in English (on the order of fifty), the number of possible different acoustic realizations is in the thousands. Therefore, to develop high-accuracy recognition algorithms, one may need to develop literally thousands of relatively distinct phonetic models to represent the various phonetic contexts adequately. Developing a large number of models usually requires a large amount of speech to provide reliable estimates of the model parameters. The major contributions of this work are the development of: (1) a simple but powerful formalism for modeling phonemes in context; (2) robust training methods for the reliable estimation of model parameters by utilizing the available speech training data in a maximally effective way; and (3) efficient search strategies for phonetic recognition while maintaining high recognition accuracy.
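    The coarticulation argument can be made concrete with context-dependent labels. The hypothetical Python helpers below write each phone together with its left and right neighbors using HTK-style l-c+r notation (an assumed convention; the report's own formalism is not described in the abstract) and compute the upper bound on such labels for a fifty-phoneme inventory. Only a subset of these combinations occurs in real speech.

```python
def triphone_labels(phones):
    """Return an l-c+r context label for every phone that has both neighbours."""
    return [
        f"{phones[i - 1]}-{phones[i]}+{phones[i + 1]}"
        for i in range(1, len(phones) - 1)
    ]


def possible_triphones(inventory_size):
    """Upper bound on distinct left/centre/right phone combinations."""
    return inventory_size ** 3


print(triphone_labels(["sil", "k", "ae", "t", "sil"]))  # ['sil-k+ae', 'k-ae+t', 'ae-t+sil']
print(possible_triphones(50))  # 125000
```

    The gap between 50 context-free phones and up to 125,000 context labels is why the abstract stresses robust training methods: most context models see little training data.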

  17. Achieving Electric-Acoustic Benefit with a Modulated Tone

    PubMed Central

    Brown, Christopher A.; Bacon, Sid P.

    2013-01-01

    Objective: When either real or simulated electric stimulation from a cochlear implant (CI) is combined with low-frequency acoustic stimulation (electric-acoustic stimulation [EAS]), speech intelligibility in noise can improve dramatically. We recently showed that a similar benefit to intelligibility can be observed in simulation when the low-frequency acoustic stimulation (low-pass target speech) is replaced with a tone that is modulated both in frequency with the fundamental frequency (F0) of the target talker and in amplitude with the amplitude envelope of the low-pass target speech (Brown & Bacon 2009). The goal of the current experiment was to examine the benefit of the modulated tone to intelligibility in CI patients. Design: Eight CI users who had some residual acoustic hearing either in the implanted ear, the unimplanted ear, or both ears participated in this study. Target speech was combined with either multitalker babble or a single competing talker and presented to the implant. Stimulation to the acoustic region consisted of no signal, target speech, or a tone that was modulated in frequency to track the changes in the target talker’s F0 and in amplitude to track the amplitude envelope of target speech low-pass filtered at 500 Hz. Results: All patients showed improvements in intelligibility over electric-only stimulation when either the tone or target speech was presented acoustically. The average improvement in intelligibility was 46 percentage points due to the tone and 55 percentage points due to target speech. Conclusions: The results demonstrate that a tone carrying F0 and amplitude envelope cues of target speech can provide significant benefit to CI users and may lead to new technologies that could offer EAS benefit to many patients who would not benefit from current EAS approaches. PMID:19546806
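    A minimal Python sketch of the kind of signal described above: a sinusoid whose frequency follows the target talker's F0 and whose amplitude follows the low-pass (500 Hz) speech envelope. F0 and envelope extraction are assumed to be done elsewhere and supplied as one value per output sample. The function name and the silence-when-unvoiced rule are illustrative assumptions, not the study's implementation.

```python
import math


def modulated_tone(f0_track, envelope, fs):
    """Tone whose instantaneous frequency follows f0_track (Hz) and whose
    amplitude follows envelope; both give one value per output sample.

    Samples with f0 <= 0 are treated as unvoiced and output as silence
    (a design choice for this sketch, not taken from the study).
    """
    if len(f0_track) != len(envelope):
        raise ValueError("f0_track and envelope must have equal length")
    out = []
    phase = 0.0
    for f0, amp in zip(f0_track, envelope):
        if f0 <= 0.0:
            out.append(0.0)
            continue
        out.append(amp * math.sin(phase))
        # Accumulating phase (rather than computing sin(2*pi*f0*n/fs)) keeps
        # the waveform continuous when f0 changes from sample to sample.
        phase = (phase + 2.0 * math.pi * f0 / fs) % (2.0 * math.pi)
    return out
```

    Phase accumulation is the key design choice: recomputing the phase from the current F0 alone would produce audible clicks every time the F0 track changed.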

  18. Speech pattern hearing aids for the profoundly hearing impaired: speech perception and auditory abilities.

    PubMed

    Faulkner, A; Ball, V; Rosen, S; Moore, B C; Fourcin, A

    1992-04-01

    A family of prototype speech pattern hearing aids for the profoundly hearing impaired has been compared to amplification. These aids are designed to extract acoustic speech patterns that convey essential phonetic contrasts, and to match this information to residual receptive abilities. In the first study, the presentation of voice fundamental frequency information from a wearable SiVo (sinusoidal voice) aid was compared to amplification in 11 profoundly deafened adults. Intonation reception was often better, and never worse, with fundamental frequency information. Four subjects scored more highly in audio-visual consonant identification with fundamental frequency information, five performed better with amplified speech, and two performed similarly under these two conditions. Five of the 11 subjects continued use of the SiVo aid after the tests were complete. A second study examined a laboratory prototype compound speech pattern aid, which encoded voice fundamental frequency, amplitude envelope, and the presence of voiceless excitation. In five profoundly deafened adults, performance was better in consonant identification when additional speech patterns were present than with fundamental frequency alone; the main advantage was derived from amplitude information. In both consonant identification and connected discourse tracking, performance with appropriately matched compound speech pattern signals was better than with amplified speech in three subjects, and similar to performance with amplified speech in the other two. In nine subjects, frequency discrimination, gap detection, and frequency selectivity were measured, and were compared to speech receptive abilities with both amplification and fundamental frequency presentation. The subjects who showed the greatest advantage from fundamental frequency presentation showed the greatest average hearing losses, and the least degree of frequency selectivity. Compound speech pattern aids appear to be more effective for some…

  19. Speech prosody in cerebellar ataxia.

    PubMed

    Casper, Maureen A; Raphael, Lawrence J; Harris, Katherine S; Geibel, Jennifer M

    2007-01-01

    Persons with cerebellar ataxia exhibit changes in physical coordination and speech and voice production. Previously, these alterations of speech and voice production were described primarily via perceptual coordinates. In this study, the spatial-temporal properties of syllable production were examined in 12 speakers: six healthy speakers and six with ataxia. The speaking task was designed to elicit six different prosodic conditions and four contrastive prosodic events. Distinct prosodic patterns were elicited by the examiner for cerebellar patients and healthy speakers. These utterances were digitally recorded and analysed acoustically and statistically. The healthy speakers showed statistically significant differences among all four prosodic contrasts. The normal model described by the prosodic contrasts provided a sensitive index of cerebellar pathology with quantitative acoustic analyses. A significant interaction between subject groups and prosodic conditions revealed a compromised prosody in cerebellar patients. Significant differences were found for durational parameters, F0 and formant frequencies. The cerebellar speakers demonstrated patterns of syllable lengthening and syllable reduction different from those of the healthy speakers. PMID:17613097

  20. Emotion Identification Using Extremely Low Frequency Components of Speech Feature Contours

    PubMed Central

    Lin, Chang-Hong; Liao, Wei-Kai; Hsieh, Wen-Chi; Liao, Wei-Jiun

    2014-01-01

    Research on emotional speech identification can be divided into two main parts: features and classifiers. This paper addresses how to extract an effective speech feature set for emotional speech identification. In our speech feature set, we use not only statistical analysis of frame-based acoustical features but also approximated speech feature contours, which are obtained by extracting extremely low frequency components from speech feature contours. Furthermore, principal component analysis (PCA) is applied to the approximated speech feature contours so that an efficient representation of the approximated contours can be derived. The proposed speech feature set is fed into support vector machines (SVMs) to perform multiclass emotion identification. In the experiments, the proposed system achieves an identification rate of 82.26%. PMID:24982991

  1. Hemispheric Asymmetries in Speech Perception: Sense, Nonsense and Modulations

    PubMed Central

    Rosen, Stuart; Wise, Richard J. S.; Chadha, Shabneet; Conway, Eleanor-Jayne; Scott, Sophie K.

    2011-01-01

    Background: The well-established left hemisphere specialisation for language processing has long been claimed to be based on a low-level auditory specialization for specific acoustic features in speech, particularly regarding ‘rapid temporal processing’. Methodology: A novel analysis/synthesis technique was used to construct a variety of sounds based on simple sentences which could be manipulated in spectro-temporal complexity, and whether they were intelligible or not. All sounds consisted of two noise-excited spectral prominences (based on the lower two formants in the original speech) which could be static or varying in frequency and/or amplitude independently. Dynamically varying both acoustic features based on the same sentence led to intelligible speech but when either or both acoustic features were static, the stimuli were not intelligible. Using the frequency dynamics from one sentence with the amplitude dynamics of another led to unintelligible sounds of comparable spectro-temporal complexity to the intelligible ones. Positron emission tomography (PET) was used to compare which brain regions were active when participants listened to the different sounds. Conclusions: Neural activity to spectral and amplitude modulations sufficient to support speech intelligibility (without actually being intelligible) was seen bilaterally, with a right temporal lobe dominance. A left dominant response was seen only to intelligible sounds. It thus appears that the left hemisphere specialisation for speech is based on the linguistic properties of utterances, not on particular acoustic features. PMID:21980349

  2. Psychoacoustic cues to emotion in speech prosody and music.

    PubMed

    Coutinho, Eduardo; Dibben, Nicola

    2013-01-01

    There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music close to the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain. PMID:23057507
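    Two of the seven features listed, spectral centroid and spectral flux, have simple textbook definitions. The Python sketch below computes them per frame with a direct DFT for readability. Definitions vary between toolkits (magnitude vs. power weighting, normalized vs. raw or rectified flux), so treat these as one common choice rather than the study's exact feature extractors; the function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """One-sided DFT magnitudes (bins 0..N//2), computed directly for clarity."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, fs):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    mags = magnitude_spectrum(frame)
    n = len(frame)
    total = sum(mags)
    if total == 0.0:
        return 0.0
    return sum(k * fs / n * m for k, m in enumerate(mags)) / total


def spectral_flux(prev_frame, frame):
    """Sum of squared differences between successive normalised magnitude spectra."""
    a = magnitude_spectrum(prev_frame)
    b = magnitude_spectrum(frame)
    sa, sb = sum(a) or 1.0, sum(b) or 1.0
    return sum((y / sb - x / sa) ** 2 for x, y in zip(a, b))
```

    A brighter sound raises the centroid, and a rapidly changing spectrum raises the flux. Tracking both over successive frames yields the kind of second-by-second feature contour that the model relates to listeners' continuous ratings.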

  3. Talker variability in audio-visual speech perception

    PubMed Central

    Heald, Shannon L. M.; Nusbaum, Howard C.

    2014-01-01

    A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker’s face, speech recognition is improved under adverse listening (e.g., noise or distortion) conditions that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker’s face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to the audio-only condition. These results suggest that seeing a talker’s face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred. PMID:25076919

  4. Acoustic Characteristics of Simulated Respiratory-Induced Vocal Tremor

    ERIC Educational Resources Information Center

    Lester, Rosemary A.; Story, Brad H.

    2013-01-01

    Purpose: The purpose of this study was to investigate the relation of respiratory forced oscillation to the acoustic characteristics of vocal tremor. Method: Acoustical analyses were performed to determine the characteristics of the intensity and fundamental frequency (F0) for speech samples obtained by Farinella, Hixon, Hoit, Story,…

  5. Acoustic Neuroma

    MedlinePlus

    An acoustic neuroma is a benign tumor that develops on the nerve that connects the ear to the brain. ... can press against the brain, becoming life-threatening. Acoustic neuroma can be difficult to diagnose, because the ...

  6. Speech research directions

    SciTech Connect

    Atal, B.S.; Rabiner, L.R.

    1986-09-01

    This paper presents an overview of current activities in speech research. The authors discuss the state of the art in speech coding, text-to-speech synthesis, speech recognition, and speaker recognition. In speech coding, current algorithms perform well at bit rates down to 9.6 kb/s, and research is directed at bringing the rate for high-quality speech coding down to 2.4 kb/s. In text-to-speech synthesis, the speech currently produced is very intelligible but not yet completely natural, and current research aims at improving the quality and intelligibility of the synthetic speech these systems produce. Finally, today's speech and speaker recognition systems provide excellent performance on limited tasks, that is, tasks with limited vocabulary, modest syntax, small talker populations, constrained inputs, and so on.

  7. Analysis of Speech Disorders in Acute Pseudobulbar Palsy: a Longitudinal Study of a Patient with Lingual Paralysis.

    ERIC Educational Resources Information Center

    Leroy-Malherbe, V.; Chevrie-Muller, C.; Rigoard, M. T.; Arabia, C.

    1998-01-01

    This report describes the case of a 52-year-old man with bilateral central lingual paralysis following a myocardial infarction. Speech recordings made 15 days and 18 months after the attack were acoustically analyzed. The case demonstrates the usefulness of acoustic analysis for detecting slight acoustic differences.

  8. Template based low data rate speech encoder

    NASA Astrophysics Data System (ADS)

    Fransen, Lawrence

    1993-09-01

    The 2400-b/s linear predictive coder (LPC) is currently being widely deployed to support tactical voice communication over narrowband channels. However, there is a need for lower-data-rate voice encoders for special applications: improved performance in high bit-error conditions, low-probability-of-intercept (LPI) voice communication, and narrowband integrated voice/data systems. This paper presents an 800-b/s voice encoding algorithm that extends the 2400-b/s LPC. To construct template tables, speech samples of 420 speakers uttering 8 sentences each were excerpted from the Texas Instruments/Massachusetts Institute of Technology (TIMIT) Acoustic-Phonetic Speech Database. The speech intelligibility of the 800-b/s algorithm, measured by the diagnostic rhyme test (DRT), is 91.5 for three male speakers. This score compares favorably with that of the 2400-b/s LPC of a few years ago.

  9. Cortical asymmetries in speech perception: what’s wrong, what’s right, and what’s left?

    PubMed Central

    McGettigan, Carolyn; Scott, Sophie K.

    2014-01-01

    Over the last 30 years hemispheric asymmetries in speech perception have been construed within a domain general framework, where preferential processing of speech is due to left lateralized, non-linguistic acoustic sensitivities. A prominent version of this argument holds that the left temporal lobe selectively processes rapid/temporal information in sound. Acoustically, this is a poor characterization of speech and there has been little empirical support for a left-hemisphere selectivity for these cues. In sharp contrast, the right temporal lobe is demonstrably sensitive to specific acoustic properties. We suggest that acoustic accounts of speech sensitivities need to be informed by the nature of the speech signal, and that a simple domain general/specific dichotomy may be incorrect. PMID:22521208

  10. Acoustic Seal

    NASA Technical Reports Server (NTRS)

    Steinetz, Bruce M. (Inventor)

    2006-01-01

    The invention relates to a sealing device having an acoustic resonator. The acoustic resonator is adapted to create acoustic waveforms to generate a sealing pressure barrier blocking fluid flow from a high pressure area to a lower pressure area. The sealing device permits noncontacting sealing operation. The sealing device may include a resonant-macrosonic-synthesis (RMS) resonator.

  11. Acoustic seal

    NASA Technical Reports Server (NTRS)

    Steinetz, Bruce M. (Inventor)

    2006-01-01

    The invention relates to a sealing device having an acoustic resonator. The acoustic resonator is adapted to create acoustic waveforms to generate a sealing pressure barrier blocking fluid flow from a high pressure area to a lower pressure area. The sealing device permits noncontacting sealing operation. The sealing device may include a resonant-macrosonic-synthesis (RMS) resonator.

  12. Developmental Differences in the Effects of Acoustic Similarity on Memory Span.

    ERIC Educational Resources Information Center

    Hulme, Charles

    1984-01-01

    Investigates the effects of acoustic similarity on memory span in 112 children four to 10 years of age. Acoustic similarity had progressively more effect on recall with increasing age. Implications for current theories of short-term memory and its development and for the use of acoustic similarity as an indicator of speech coding are discussed.…

  13. Delayed Speech or Language Development

    MedlinePlus

    ... your child is right on schedule. Normal Speech & Language Development: It's important to discuss early speech and ...

  14. Sensitivity to Structure in the Speech Signal by Children with Speech Sound Disorder and Reading Disability

    PubMed Central

    Johnson, Erin Phinney; Pennington, Bruce F.; Lowenstein, Joanna H.; Nittrouer, Susan

    2011-01-01

    Purpose: Children with speech sound disorder (SSD) and reading disability (RD) have poor phonological awareness, a problem believed to arise largely from deficits in processing the sensory information in speech, specifically individual acoustic cues. However, such cues are details of acoustic structure. Recent theories suggest that listeners also need to be able to integrate those details to perceive linguistically relevant form. This study examined abilities of children with SSD, RD, and SSD+RD not only to process acoustic cues but also to recover linguistically relevant form from the speech signal. Method: Ten- to 11-year-olds with SSD (n = 17), RD (n = 16), SSD+RD (n = 17), and Controls (n = 16) were tested to examine their sensitivity to (1) voice onset times (VOT); (2) spectral structure in fricative-vowel syllables; and (3) vocoded sentences. Results: Children in all groups performed similarly with VOT stimuli, but children with disorders showed delays on other tasks, although the specifics of their performance varied. Conclusion: Children with poor phonemic awareness not only lack sensitivity to acoustic details, but are also less able to recover linguistically relevant forms. This is contrary to one of the main current theories of the relation between spoken and written language development. PMID:21329941

  15. Modelling Errors in Automatic Speech Recognition for Dysarthric Speakers

    NASA Astrophysics Data System (ADS)

    Caballero Morales, Santiago Omar; Cox, Stephen J.

    2009-12-01

    Dysarthria is a motor speech disorder characterized by weakness, paralysis, or poor coordination of the muscles responsible for speech. Although automatic speech recognition (ASR) systems have been developed for disordered speech, factors such as low intelligibility and limited phonemic repertoire decrease speech recognition accuracy, making conventional speaker adaptation algorithms perform poorly on dysarthric speakers. In this work, rather than adapting the acoustic models, we model the errors made by the speaker and attempt to correct them. For this task, two techniques have been developed: (1) a set of "metamodels" that incorporate a model of the speaker's phonetic confusion matrix into the ASR process; (2) a cascade of weighted finite-state transducers at the confusion matrix, word, and language levels. Both techniques attempt to correct the errors made at the phonetic level and make use of a language model to find the best estimate of the correct word sequence. Our experiments show that both techniques outperform standard adaptation techniques.
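    The error-modelling idea (correcting the speaker's errors rather than adapting the acoustic models) can be illustrated with a toy noisy-channel decoder. This is a minimal sketch, not the authors' metamodels or WFST cascade. It assumes substitution-only errors (no insertions or deletions), a hypothetical speaker confusion matrix `CONFUSION`, and a hypothetical unigram lexicon `LEXICON`. Each candidate word is scored by its prior probability multiplied by the probability that its intended phones were recognized as the observed phones; the best-scoring word wins.

```python
import math

# Hypothetical confusion matrix for one speaker: P(recognized phone | intended phone).
CONFUSION = {
    "p": {"p": 0.6, "b": 0.4},
    "b": {"b": 0.7, "p": 0.3},
    "t": {"t": 0.8, "d": 0.2},
    "d": {"d": 0.5, "t": 0.5},
    "a": {"a": 0.9, "e": 0.1},
    "e": {"e": 0.8, "a": 0.2},
}

# Hypothetical lexicon: word -> (intended phone sequence, unigram prior).
LEXICON = {
    "pat": (["p", "a", "t"], 0.5),
    "bat": (["b", "a", "t"], 0.2),
    "bad": (["b", "a", "d"], 0.2),
    "bet": (["b", "e", "t"], 0.1),
}

FLOOR = 1e-6  # probability for confusions never observed for this speaker


def log_score(intended, recognized, prior):
    """log P(word) + sum of log P(recognized_i | intended_i)."""
    if len(intended) != len(recognized):
        return float("-inf")  # this sketch models substitutions only
    total = math.log(prior)
    for want, got in zip(intended, recognized):
        total += math.log(CONFUSION.get(want, {}).get(got, FLOOR))
    return total


def correct(recognized):
    """Return the lexicon word that best explains the recognized phones."""
    return max(LEXICON, key=lambda w: log_score(LEXICON[w][0], recognized, LEXICON[w][1]))
```

    With these made-up numbers, `correct(["b", "a", "t"])` returns "pat": this speaker's /p/ is often recognized as "b", and "pat" has the higher prior, so the error model overrides the literal recognizer output.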

  16. Speech articulator measurements using low power EM-wave sensors

    SciTech Connect

    Holzrichter, J.F.; Burnett, G.C.; Ng, L.C.; Lea, W.A.

    1998-01-01

    Very low-power electromagnetic (EM) wave sensors are being used to measure speech articulator motions as speech is produced. Glottal tissue oscillations, jaw, tongue, soft palate, and other organs have been measured. Previously, microwave imaging (e.g., using radar sensors) appears not to have been considered for such monitoring. Glottal tissue movements detected by radar sensors correlate well with those obtained by established laboratory techniques, and have been used to estimate a voiced excitation function for speech processing applications. The noninvasive access, coupled with the small size, low power, and high resolution of these new sensors, permits promising research and development applications in speech production, communication disorders, speech recognition and related topics. © 1998 Acoustical Society of America.

  17. Mismatch Negativity with Visual-only and Audiovisual Speech

    PubMed Central

    Ponton, Curtis W.; Bernstein, Lynne E.; Auer, Edward T.

    2009-01-01

    The functional organization of cortical speech processing is thought to be hierarchical, increasing in complexity and proceeding from primary sensory areas centrifugally. The current study used the mismatch negativity (MMN) obtained with electrophysiology (EEG) to investigate the early latency period of visual speech processing under both visual-only (VO) and audiovisual (AV) conditions. Current density reconstruction (CDR) methods were used to model the cortical MMN generator locations. MMNs were obtained with VO and AV speech stimuli at early latencies (approximately 82-87 ms peak in time waveforms relative to the acoustic onset) and in regions of the right lateral temporal and parietal cortices. Latencies were consistent with bottom-up processing of the visible stimuli. We suggest that a visual pathway extracts phonetic cues from visible speech, and that previously reported effects of AV speech in classical early auditory areas, given later reported latencies, could be attributable to modulatory feedback from visual phonetic processing. PMID:19404730
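    The MMN is conventionally read from a deviant-minus-standard difference wave, with its peak taken as the most negative point in a latency window. The sketch below shows only that step; it does not reproduce the study's current density reconstruction. The 50 to 250 ms search window and the toy waveforms are assumptions chosen for illustration.

```python
def difference_wave(deviant, standard):
    """Deviant-minus-standard ERP, computed sample by sample."""
    if len(deviant) != len(standard):
        raise ValueError("waveforms must have equal length")
    return [d - s for d, s in zip(deviant, standard)]


def peak_negativity(wave, times_ms, window=(50, 250)):
    """Latency (ms) and amplitude of the most negative sample inside a window.

    The default window is a hypothetical choice, not taken from the study.
    """
    candidates = [k for k, t in enumerate(times_ms) if window[0] <= t <= window[1]]
    if not candidates:
        raise ValueError("no samples fall inside the search window")
    k = min(candidates, key=lambda i: wave[i])
    return times_ms[k], wave[k]


# Toy averaged ERPs sampled every 20 ms (arbitrary microvolt units).
times = [0, 20, 40, 60, 80, 100]
standard = [0, 1, 2, 1, 0, 0]
deviant = [0, 1, 1, -1, -3, 0]
latency, amplitude = peak_negativity(difference_wave(deviant, standard), times)
```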

  18. Neural mechanisms underlying auditory feedback control of speech

    PubMed Central

    Reilly, Kevin J.; Guenther, Frank H.

    2013-01-01

    The neural substrates underlying auditory feedback control of speech were investigated using a combination of functional magnetic resonance imaging (fMRI) and computational modeling. Neural responses were measured while subjects spoke monosyllabic words under two conditions: (i) normal auditory feedback of their speech, and (ii) auditory feedback in which the first formant frequency of their speech was unexpectedly shifted in real time. Acoustic measurements showed compensation to the shift within approximately 135 ms of onset. Neuroimaging revealed increased activity in bilateral superior temporal cortex during shifted feedback, indicative of neurons coding mismatches between expected and actual auditory signals, as well as right prefrontal and Rolandic cortical activity. Structural equation modeling revealed increased influence of bilateral auditory cortical areas on right frontal areas during shifted speech, indicating that projections from auditory error cells in posterior superior temporal cortex to motor correction cells in right frontal cortex mediate auditory feedback control of speech. PMID:18035557

  19. Linguistic aspects of speech synthesis.

    PubMed Central

    Allen, J

    1995-01-01

    The conversion of text to speech is seen as an analysis of the input text to obtain a common underlying linguistic description, followed by a synthesis of the output speech waveform from this fundamental specification. Hence, the comprehensive linguistic structure serving as the substrate for an utterance must be discovered by analysis from the text. The pronunciation of individual words in unrestricted text is determined by morphological analysis or letter-to-sound conversion, followed by specification of the word-level stress contour. In addition, many text character strings, such as titles, numbers, and acronyms, are abbreviations for normal words, which must be derived. To further refine these pronunciations and to discover the prosodic structure of the utterance, word part of speech must be computed, followed by a phrase-level parsing. From this structure the prosodic structure of the utterance can be determined, which is needed in order to specify the durational framework and fundamental frequency contour of the utterance. In discourse contexts, several factors such as the specification of new and old information, contrast, and pronominal reference can be used to further modify the prosodic specification. When the prosodic correlates have been computed and the segmental sequence is assembled, a complete input suitable for speech synthesis has been determined. Lastly, multilingual systems utilizing rule frameworks are mentioned, and future directions are characterized. PMID:7479807

  20. Linguistic aspects of speech synthesis.

    PubMed

    Allen, J

    1995-10-24

    The conversion of text to speech is seen as an analysis of the input text to obtain a common underlying linguistic description, followed by a synthesis of the output speech waveform from this fundamental specification. Hence, the comprehensive linguistic structure serving as the substrate for an utterance must be discovered by analysis from the text. The pronunciation of individual words in unrestricted text is determined by morphological analysis or letter-to-sound conversion, followed by specification of the word-level stress contour. In addition, many text character strings, such as titles, numbers, and acronyms, are abbreviations for normal words, which must be derived. To further refine these pronunciations and to discover the prosodic structure of the utterance, word part of speech must be computed, followed by a phrase-level parsing. From this structure the prosodic structure of the utterance can be determined, which is needed in order to specify the durational framework and fundamental frequency contour of the utterance. In discourse contexts, several factors such as the specification of new and old information, contrast, and pronominal reference can be used to further modify the prosodic specification. When the prosodic correlates have been computed and the segmental sequence is assembled, a complete input suitable for speech synthesis has been determined. Lastly, multilingual systems utilizing rule frameworks are mentioned, and future directions are characterized. PMID:7479807
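    One front-end step the abstract mentions is expanding character strings such as numbers into normal words before pronunciation is determined. The sketch below handles only standalone integers from 0 to 999 in English; it is an illustrative assumption, not the rule framework the paper describes, and it ignores ordinals, years, titles, and acronyms.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0-999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch handles 0-999 only")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else "-" + ONES[ones])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + number_to_words(rest)


def normalize(text):
    """Replace standalone 1-3 digit numbers with words; longer digit runs are left alone."""
    return re.sub(r"\b\d{1,3}\b", lambda m: number_to_words(int(m.group())), text)
```

    The word-boundary pattern means a string like "1995" is left untouched, since a real system would need a separate year-reading rule for it.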

  1. Prosody in Infant-Directed Speech Is Similar across Western and Traditional Cultures

    ERIC Educational Resources Information Center

    Broesch, Tanya L.; Bryant, Gregory A.

    2015-01-01

    When speaking to infants, adults typically alter the acoustic properties of their speech in a variety of ways compared with how they speak to other adults; for example, they use higher pitch, increased pitch range, more pitch variability, and slower speech rate. Research shows that these vocal changes happen similarly across industrialized…

  2. Speech Modification by a Deaf Child through Dynamic Orometric Modeling and Feedback.

    ERIC Educational Resources Information Center

    Fletcher, Samuel G.; Hasegawa, Akira

    1983-01-01

    A three-and-a-half-year-old profoundly deaf girl, whose physiologic, acoustic, and phonetic data indicated poor speech production, rapidly learned goal articulation gestures (positional and timing features of speech) after visual articulatory modeling and feedback on tongue position with a microprocessor based instrument and video display.…

  3. Learning to Match Auditory and Visual Speech Cues: Social Influences on Acquisition of Phonological Categories

    ERIC Educational Resources Information Center

    Altvater-Mackensen, Nicole; Grossmann, Tobias

    2015-01-01

    Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues but also social cues that might foster language learning. Yet, both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential…

  4. Phonetic Category Cues in Adult-Directed Speech: Evidence from Three Languages with Distinct Vowel Characteristics

    ERIC Educational Resources Information Center

    Pons, Ferran; Biesanz, Jeremy C.; Kajikawa, Sachiyo; Fais, Laurel; Narayan, Chandan R.; Amano, Shigeaki; Werker, Janet F.

    2012-01-01

    Using an artificial language learning manipulation, Maye, Werker, and Gerken (2002) demonstrated that infants' speech sound categories change as a function of the distributional properties of the input. In a recent study, Werker et al. (2007) showed that Infant-directed Speech (IDS) input contains reliable acoustic cues that support distributional…

  5. Behavioral and Electrophysiological Evidence for Early and Automatic Detection of Phonological Equivalence in Variable Speech Inputs

    ERIC Educational Resources Information Center

    Kharlamov, Viktor; Campbell, Kenneth; Kazanina, Nina

    2011-01-01

    Speech sounds are not always perceived in accordance with their acoustic-phonetic content. For example, an early and automatic process of perceptual repair, which ensures conformity of speech inputs to the listener's native language phonology, applies to individual input segments that do not exist in the native inventory or to sound sequences that…

  6. Emotional and Physiological Responses of Fluent Listeners while Watching the Speech of Adults Who Stutter

    ERIC Educational Resources Information Center

    Guntupalli, Vijaya K.; Everhart, D. Erik; Kalinowski, Joseph; Nanjundeswaran, Chayadevie; Saltuklaroglu, Tim

    2007-01-01

    Background: People who stutter produce speech that is characterized by intermittent, involuntary part-word repetitions and prolongations. In addition to these signature acoustic manifestations, those who stutter often display repetitive and fixated behaviours outside the speech producing mechanism (e.g. in the head, arm, fingers, nares, etc.).…

  7. The Effect of Technology and Testing Environment on Speech Perception Using Telehealth with Cochlear Implant Recipients

    ERIC Educational Resources Information Center

    Goehring, Jenny L.; Hughes, Michelle L.; Baudhuin, Jacquelyn L.; Valente, Daniel L.; McCreery, Ryan W.; Diaz, Gina R.; Sanford, Todd; Harpster, Roger

    2012-01-01

    Purpose: In this study, the authors evaluated the effect of remote system and acoustic environment on speech perception via telehealth with cochlear implant recipients. Method: Speech perception was measured in quiet and in noise. Systems evaluated were Polycom visual concert (PVC) and a hybrid presentation system (HPS). Each system was evaluated…

  8. How may the basal ganglia contribute to auditory categorization and speech perception?

    PubMed Central

    Lim, Sung-Joo; Fiez, Julie A.; Holt, Lori L.

    2014-01-01

    Listeners must accomplish two complementary perceptual feats in extracting a message from speech. They must discriminate linguistically-relevant acoustic variability and generalize across irrelevant variability. Said another way, they must categorize speech. Since the mapping of acoustic variability is language-specific, these categories must be learned from experience. Thus, understanding how, in general, the auditory system acquires and represents categories can inform us about the toolbox of mechanisms available to speech perception. This perspective invites consideration of findings from cognitive neuroscience literatures outside of the speech domain as a means of constraining models of speech perception. Although neurobiological models of speech perception have mainly focused on cerebral cortex, research outside the speech domain is consistent with the possibility of significant subcortical contributions in category learning. Here, we review the functional role of one such structure, the basal ganglia. We examine research from animal electrophysiology, human neuroimaging, and behavior to consider characteristics of basal ganglia processing that may be advantageous for speech category learning. We also present emerging evidence for a direct role for basal ganglia in learning auditory categories in a complex, naturalistic task intended to model the incidental manner in which speech categories are acquired. To conclude, we highlight new research questions that arise in incorporating the broader neuroscience research literature in modeling speech perception, and suggest how understanding contributions of the basal ganglia can inform attempts to optimize training protocols for learning non-native speech categories in adulthood. PMID:25136291

  9. Objective Neural Indices of Speech-in-Noise Perception

    PubMed Central

    Anderson, Samira; Kraus, Nina

    2010-01-01

    Numerous factors contribute to understanding speech in noisy listening environments. There is a clinical need for objective biological assessment of auditory factors that contribute to the ability to hear speech in noise, factors that are free from the demands of attention and memory. Subcortical processing of complex sounds such as speech (auditory brainstem responses to speech and other complex stimuli [cABRs]) reflects the integrity of auditory function. Because cABRs physically resemble the evoking acoustic stimulus, they can provide objective indices of the neural transcription of specific acoustic elements (e.g., temporal, spectral) important for hearing speech. As with brainstem responses to clicks and tones, cABRs are clinically viable in individual subjects. Subcortical transcription of complex sounds is also clinically viable because of its known experience-dependence and role in auditory learning. Together with other clinical measures, cABRs can inform the underlying biological nature of listening and language disorders, inform treatment strategies, and provide an objective index of therapeutic outcomes. In this article, the authors review recent studies demonstrating the role of subcortical speech encoding in successful speech-in-noise perception. PMID:20724355

  10. Speech-related fatigue and fatigability in Parkinson's disease.

    PubMed

    Makashay, Matthew J; Cannard, Kevin R; Solomon, Nancy Pearl

    2015-01-01

    This study tested the assumption that speech is more susceptible to fatigue than normal in persons with dysarthria. After 1 h of speech-like exercises, participants with Parkinson's disease (PD) were expected to report increased perceptions of fatigue and demonstrate fatigability by producing less precise speech with corresponding acoustic changes compared to neurologically normal participants. Twelve adults with idiopathic PD and 13 neurologically normal adults produced sentences with multiple lingual targets before and after six 10-min blocks of fast syllable or word productions. Both groups reported increasing self-perceived fatigue over time, but trained listeners failed to detect systematic differences in articulatory precision or speech naturalness between sentences produced before and after speech-related exercises. Similarly, few systematic acoustic differences occurred. These findings do not support the hypothesis that dysarthric speakers are particularly susceptible to speech-related fatigue; instead, speech articulation generally appears to be resistant to fatigue induced by an hour of moderate functional exercises. PMID:25152085

  11. Speaker verification using combined acoustic and EM sensor signal processing

    SciTech Connect

    Ng, L C; Gable, T J; Holzrichter, J F

    2000-11-10

    Low-power EM radar-like sensors have made it possible to measure properties of the human speech production system in real time, without acoustic interference. This greatly enhances the quality and quantity of information for many speech-related applications. See Holzrichter, Burnett, Ng, and Lea, J. Acoust. Soc. Am. 103(1), 622 (1998). By combining the Glottal-EM-Sensor (GEMS) with the acoustic signals, we've demonstrated an almost 10-fold reduction in error rates from a speaker verification system experiment in a moderately noisy environment (-10 dB).

  12. Low Bandwidth Vocoding using EM Sensor and Acoustic Signal Processing

    SciTech Connect

    Ng, L C; Holzrichter, J F; Larson, P E

    2001-10-25

    Low-power EM radar-like sensors have made it possible to measure properties of the human speech production system in real-time, without acoustic interference [1]. By combining these data with the corresponding acoustic signal, we've demonstrated an almost 10-fold bandwidth reduction in speech compression, compared to a standard 2.4 kbps LPC10 protocol used in the STU-III (Secure Terminal Unit, third generation) telephone. This paper describes a potential EM sensor/acoustic based vocoder implementation.
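    To put the "almost 10-fold" figure in concrete terms, the bit budget can be worked out per frame. This assumes the standard LPC-10 parameters (8 kHz sampling, 180-sample frames of 22.5 ms, 54 bits per frame); the abstract does not say how the EM-assisted vocoder allocates its bits, so the reduced figure is only the average budget implied by a 10-fold cut.

```python
# LPC-10 (FS-1015) framing, as assumed here.
SAMPLE_RATE_HZ = 8000
FRAME_SAMPLES = 180  # 22.5 ms per frame at 8 kHz
BITS_PER_FRAME = 54


def bitrate_bps(bits_per_frame, frame_samples=FRAME_SAMPLES, fs=SAMPLE_RATE_HZ):
    """Bits per second for a coder that emits a fixed number of bits per frame."""
    return bits_per_frame * fs / frame_samples


lpc10_bps = bitrate_bps(BITS_PER_FRAME)  # 54 * 8000 / 180 = 2400.0
reduced_bps = lpc10_bps / 10  # about 240 b/s after a 10-fold reduction
# Average bits available per 22.5 ms frame at the reduced rate (about 5.4).
bits_per_frame_reduced = reduced_bps * FRAME_SAMPLES / SAMPLE_RATE_HZ
```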

  13. Auditory plasticity and speech motor learning

    PubMed Central

    Nasir, Sazzad M.; Ostry, David J.

    2009-01-01

    Is plasticity in sensory and motor systems linked? Here, in the context of speech motor learning and perception, we test the idea that sensory function is modified by motor learning and, in particular, that speech motor learning affects a speaker's auditory map. We assessed speech motor learning by using a robotic device that displaced the jaw and selectively altered somatosensory feedback during speech. We found that with practice speakers progressively corrected for the mechanical perturbation and after motor learning they also showed systematic changes in their perceptual classification of speech sounds. The perceptual shift was tied to motor learning. Individuals who displayed greater amounts of learning also showed greater perceptual change. Perceptual change was not observed in control subjects who produced the same movements in the absence of a force field, nor in subjects who experienced the force field but failed to adapt to the mechanical load. The perceptual effects observed here indicate the involvement of the somatosensory system in the neural processing of speech sounds and suggest that speech motor learning results in changes to auditory perceptual function. PMID:19884506

  14. Preparing an E-learning-based Speech Therapy (EST) efficacy study: Identifying suitable outcome measures to detect within-subject changes of speech intelligibility in dysarthric speakers.

    PubMed

    Beijer, L J; Rietveld, A C M; Ruiter, M B; Geurts, A C H

    2014-12-01

    We explored the suitability of perceptual and acoustic outcome measures to prepare E-learning based Speech Therapy (EST) efficacy tests regarding speech intelligibility in dysarthric speakers. Eight speakers with stroke (n=3), Parkinson's disease (n=4) and traumatic brain injury (n=1) participated in a 4-week EST trial. A repeated measures design was employed. Perceptual measures were (a) scale ratings for "ease of intelligibility" and "pleasantness" in continuous speech and (b) orthographic transcription scores of semantically unpredictable sentences. Acoustic measures were (c) "intensity during closure" (ΔIDC) in the occlusion phase of voiceless plosives, (d) changes in the vowel space of /a/, /e/ and /o/ and (e) the F0 variability in semantically unpredictable sentences. The only consistent finding concerned an increased (instead of the expected decreased) ΔIDC after EST, possibly caused by increased speech intensity without articulatory adjustments. The importance of suitable perceptual and acoustic measures for efficacy research is discussed. PMID:25025268
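    A vowel space built from /a/, /e/ and /o/ is often summarized as the area of the polygon those vowels span in the F1 by F2 plane. The abstract does not state how vowel-space change was quantified, so using area (via the shoelace formula) is an assumption here, and the formant values below are invented for illustration.

```python
def vowel_space_area(formants):
    """Area (Hz^2) of the polygon spanned by (F1, F2) points, via the shoelace formula.

    Points must be listed in order around the polygon.
    """
    n = len(formants)
    if n < 3:
        raise ValueError("need at least three vowels")
    twice_area = 0.0
    for k in range(n):
        x1, y1 = formants[k]
        x2, y2 = formants[(k + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical (F1, F2) values in Hz for /a/, /e/, /o/ before and after therapy.
before = [(700, 1200), (450, 1900), (500, 900)]
after = [(750, 1250), (430, 2000), (480, 850)]
change_hz2 = vowel_space_area(after) - vowel_space_area(before)
```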

  15. Acceptance speech.

    PubMed

    Carpenter, M

    1994-01-01

    In Bangladesh, the assistant administrator of USAID gave an acceptance speech at an awards ceremony on the occasion of the 25th anniversary of oral rehydration solution (ORS). The ceremony celebrated the key role of the International Centre for Diarrhoeal Disease Research, Bangladesh (ICDDR,B) in the discovery of ORS. Its research activities over the last 25 years have brought ORS to every village in the world, preventing more than a million deaths each year. ORS is the most important medical advance of the 20th century. It is affordable and client-oriented, a true appropriate technology. USAID has provided more than US$ 40 million to ICDDR,B for diarrheal disease and measles research, urban and rural applied family planning and maternal and child health research, and vaccine development. ICDDR,B began as the relatively small Cholera Research Laboratory and has grown into an acclaimed international center for health, family planning, and population research. It leads the world in diarrheal disease research. ICDDR,B is the leading center for applied health research in South Asia. It trains public health specialists from around the world. The government of Bangladesh and the international donor community have actively joined in support of ICDDR,B. The government applies the results of ICDDR,B research to its programs to improve the health and well-being of Bangladeshis. ICDDR,B now also studies acute respiratory diseases and measles. Population and health comprise one of USAID's four strategic priorities, the others being economic growth, environment, and democracy. USAID promotes people's participation in these four areas and in the design and implementation of development projects. USAID is committed to the use and improvement of ORS and to complementary strategies that further reduce diarrhea-related deaths. Continued collaboration with a strong user perspective and integrated services will lead to sustainable development. PMID:12345470

  16. Speech disorders - children

    MedlinePlus

    ... deficiency; Voice disorders; Vocal disorders; Disfluency; Communication disorder - speech disorder ... The following tests can help diagnose speech disorders: Denver ... Peabody Picture Test Revised A hearing test may also be done.

  17. Speech and Communication Disorders

    MedlinePlus

    ... speech. Causes include Hearing disorders and deafness Voice problems, such as dysphonia or those caused by cleft lip or palate Speech problems like stuttering Developmental disabilities Learning disorders Autism spectrum ...

  18. Speech disorders - children

    MedlinePlus

    ... person has problems creating or forming the speech sounds needed to communicate with others. Three common speech ... are disorders in which a person repeats a sound, word, or phrase. Stuttering may be the most ...

  19. Acoustics, Noise, and Buildings. Revised Edition 1969.

    ERIC Educational Resources Information Center

    Parkin, P. H.; Humphreys, H. R.

    The fundamental physical concepts needed in any appreciation of acoustical problems are discussed by a scientist and an architect. The major areas of interest are: (1) the nature of sound, (2) the behavior of sound in rooms, (3) the design of rooms for speech, (4) the design of rooms for music, (5) the design of studios, (6) the design of high…

  20. Experiments on the Acoustics of Whistling.

    ERIC Educational Resources Information Center

    Shadle, Christine H.

    1983-01-01

    The acoustics of speech production allows the prediction of resonances for a given vocal tract configuration. Combining these predictions with aerodynamic theory developed for mechanical whistles makes theories about human whistling more complete. Several experiments involving human whistling are reported which support the theory and indicate new…
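    The simplest case of predicting resonances from vocal tract configuration is a uniform tube closed at the glottis and open at the lips, whose resonances follow the quarter-wavelength rule F_n = (2n - 1)c / (4L). The sketch below computes those values; it is the textbook neutral-vowel model, not the whistling model the paper develops, and the 350 m/s speed of sound and 17.5 cm tract length are assumed typical values.

```python
def uniform_tube_resonances(length_m, count=3, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Uses F_n = (2n - 1) * c / (4 * L), the quarter-wavelength resonator model.
    """
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * n - 1) * speed_of_sound / (4 * length_m) for n in range(1, count + 1)]


# An assumed 17.5 cm vocal tract gives resonances near 500, 1500 and 2500 Hz.
formants = uniform_tube_resonances(0.175)
```

    Shortening the tube raises every resonance proportionally, which is one reason children's formants sit higher than adults'.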

  1. A Non-Invasive Imaging Approach to Understanding Speech Changes following Deep Brain Stimulation in Parkinson’s Disease

    PubMed Central

    Narayana, Shalini; Jacks, Adam; Robin, Donald A.; Poizner, Howard; Zhang, Wei; Franklin, Crystal; Liotti, Mario; Vogel, Deanie; Fox, Peter T.

    2009-01-01

    Purpose To explore the use of non-invasive functional imaging and “virtual” lesion techniques to study the neural mechanisms underlying motor speech disorders in Parkinson’s disease. Here, we report the use of Positron Emission Tomography (PET) and transcranial magnetic stimulation (TMS) to explain exacerbated speech impairment following subthalamic nucleus deep brain stimulation (STN-DBS) in a patient with Parkinson’s disease. Method Perceptual and acoustic speech measures as well as cerebral blood flow (CBF) during speech as measured by PET were obtained with STN-DBS on and off. TMS was applied to a region in the speech motor network found to be abnormally active during DBS. Speech disruption by TMS was compared both perceptually and acoustically with that resulting from DBS on. Results Speech production was perceptually inferior and acoustically less contrastive during left STN stimulation compared to no stimulation. Increased neural activity in left dorsal premotor cortex (PMd) was observed during DBS on. “Virtual” lesioning of this region resulted in speech characterized by decreased speech segment duration, increased pause duration, and decreased intelligibility. Conclusions This case report provides evidence that impaired speech production accompanying STN-DBS may result from unintended activation of PMd. Clinical application of functional imaging and TMS may lead to optimizing the delivery of STN-DBS to improve outcomes for speech production as well as general motor abilities. PMID:19029533

  2. Speech imagery recalibrates speech-perception boundaries.

    PubMed

    Scott, Mark

    2016-07-01

    The perceptual boundaries between speech sounds are malleable and can shift after repeated exposure to contextual information. This shift is known as recalibration. To date, the known inducers of recalibration are lexical (including phonotactic) information, lip-read information and reading. The experiments reported here are a proof-of-effect demonstration that speech imagery can also induce recalibration. PMID:27068050

  3. A pilot study about speech changes after partial Tucker's laryngectomy: the reduction of regressive voicing assimilation.

    PubMed

    Galant, C; Lagier, A; Vercasson, C; Santini, L; Dessi, P; Giovanni, A; Fakhry, N

    2015-12-01

    Partial frontolateral laryngectomy (PL) is performed to remove laryngeal tumors while preserving the main functions of the larynx. So far, the speech changes induced by difficulties of voicing and the alterations to the vocal tract due to PL have been seldom addressed. The goal of our study was to perform an acoustic analysis of regressive voicing assimilation (RVA) among patients after PL and to study its relationship with speaking rate. A retrospective study was conducted from January to April 2013. Eleven subjects treated by partial frontolateral laryngectomy and ten healthy subjects were included. Functional recordings of voice were analyzed and compared. For assimilation sequences we found a significant modification of voicing ratio in healthy subjects (p < 0.05) and in PL patients at accelerated speaking rate only (p < 0.05). The vowel duration is significantly modified only for healthy subjects. For all subjects (PL patients and healthy) the duration of C1 consonant was not significantly modified. Our results highlight the presence of RVA in healthy subjects, but also in PL patients in the rapid speaking mode. PMID:26156226

  4. Speech and Language Delay

    MedlinePlus

    Speech and Language Delay Overview: How do I know if my child has speech delay? Every child develops at his or her ... of the same age, the problem may be speech delay. Your doctor may think your child has ...

  5. Talking Speech Input.

    ERIC Educational Resources Information Center

    Berliss-Vincent, Jane; Whitford, Gigi

    2002-01-01

    This article presents both the factors involved in successful speech input use and the potential barriers that may suggest that other access technologies could be more appropriate for a given individual. Speech input options that are available are reviewed and strategies for optimizing use of speech recognition technology are discussed. (Contains…

  6. Speech 7 through 12.

    ERIC Educational Resources Information Center

    Nederland Independent School District, TX.

    GRADES OR AGES: Grades 7 through 12. SUBJECT MATTER: Speech. ORGANIZATION AND PHYSICAL APPEARANCE: Following the foreword, philosophy and objectives, this guide presents a speech curriculum. The curriculum covers junior high and Speech I, II, III (senior high). Thirteen units of study are presented for junior high, each unit is divided into…

  7. The Tao of Speech.

    ERIC Educational Resources Information Center

    Dance, Frank E. X.

    1981-01-01

    Argues that the study of speech may present the characteristics of a "tao"--a path leading to an increase in humane being. Calls for speech teachers to profess the primacy of speech: "...the source of life of the human mind, the source of the compassion of the human spirit." (PD)

  8. Free Speech Yearbook 1978.

    ERIC Educational Resources Information Center

    Phifer, Gregg, Ed.

    The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…

  9. Auditory perception bias in speech imitation

    PubMed Central

    Postma-Nilsenová, Marie; Postma, Eric

    2013-01-01

    In an experimental study, we explored the role of auditory perception bias in vocal pitch imitation. Psychoacoustic tasks involving a missing fundamental indicate that some listeners are attuned to the relationship between all the higher harmonics present in the signal, which supports their perception of the fundamental frequency (the primary acoustic correlate of pitch). Other listeners focus on the lowest harmonic constituents of the complex sound signal, which may hamper the perception of the fundamental. These two listener types are referred to as fundamental and spectral listeners, respectively. We hypothesized that the individual differences in speakers' capacity to imitate F0 found in earlier studies may at least partly be due to the capacity to extract information about F0 from the speech signal. Participants' auditory perception bias was determined with a standard missing fundamental perceptual test. Subsequently, speech data were collected in a shadowing task with two conditions, one with a full speech signal and one with high-pass filtered speech above 300 Hz. The results showed that perception bias toward fundamental frequency was related to the degree of F0 imitation. The effect was stronger in the condition with high-pass filtered speech. The experimental outcomes suggest advantages for fundamental listeners in communicative situations where F0 imitation is used as a behavioral cue. Future research needs to determine to what extent auditory perception bias may be related to other individual properties known to improve imitation, such as phonetic talent. PMID:24204361
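    Why F0 survives high-pass filtering can be shown with a signal that contains only higher harmonics. The sketch below builds a tone from the 2nd, 3rd and 4th harmonics of 200 Hz (400, 600 and 800 Hz, all above 300 Hz, so no energy at the fundamental) and recovers 200 Hz from the autocorrelation peak, because the waveform still repeats every 1/F0 seconds. This is a signal-processing illustration only, not the perceptual test the authors used; the sampling rate, duration and search range are assumed values.

```python
import math


def complex_tone(f0, harmonics, fs=8000, n_samples=800):
    """Sum of unit-amplitude sines at the given harmonic numbers of f0."""
    return [sum(math.sin(2 * math.pi * h * f0 * t / fs) for h in harmonics)
            for t in range(n_samples)]


def autocorr_f0(x, fs=8000, fmin=80.0, fmax=500.0):
    """Estimate F0 as fs / lag of the largest autocorrelation value in [fmin, fmax]."""
    min_lag = int(fs / fmax)
    max_lag = int(fs / fmin)
    best_lag, best_r = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


# Harmonics 2-4 of 200 Hz only: the 200 Hz component itself is absent.
estimate = autocorr_f0(complex_tone(200.0, [2, 3, 4]))
```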

  10. Physical properties of modification of speech signal fragments

    NASA Astrophysics Data System (ADS)

    Gusev, Mikhail N.

    2004-04-01

    This report describes the methods used to modify individual speech signal fragments during synthesis of speech from arbitrary text. Sounds fall into three groups according to the methods used to modify their frequency characteristics, and into two groups according to the methods needed to change their duration. Before a speaker's voice samples can be modified with these methods, they must be pre-marked, a step called segmentation. Allophones are used as the variable speech fragments. The modification methods described make it possible to form arbitrary speech sequences over a wide intonation range from a limited number of the speaker's voice patterns.

  11. Classroom acoustics: Three pilot studies

    NASA Astrophysics Data System (ADS)

    Smaldino, Joseph J.

    2005-04-01

    This paper summarizes three related pilot projects designed to focus on the possible effects of classroom acoustics on fine auditory discrimination as it relates to language acquisition, especially English as a second language. The first study investigated the influence of improving the signal-to-noise ratio on the differentiation of English phonemes. The results showed better differentiation with better signal-to-noise ratio. The second studied speech perception in noise by young adults for whom English was a second language. The outcome indicated that the second language learners required a better signal-to-noise ratio to perform equally to the native language participants. The last study surveyed the acoustic conditions of preschool and day care classrooms, wherein first and second language learning occurs. The survey suggested an unfavorable acoustic environment for language learning.

  12. Classroom acoustics and intervention strategies to enhance the learning environment

    NASA Astrophysics Data System (ADS)

    Savage, Christal

    The classroom can be an acoustically difficult environment in which to learn effectively, in part because of poor acoustical properties. Noise and reverberation have a substantial influence on room acoustics and, consequently, on speech intelligibility. The American Speech-Language-Hearing Association (ASHA, 1995) developed minimum standards for noise and reverberation in classrooms in order to provide an adequate listening environment. A lack of adherence to these standards may have undesirable consequences, including poor academic performance. The purpose of this capstone project is to develop a protocol to measure reverberation time and noise levels in elementary classrooms and to present educators with strategies to improve the learning environment. Noise level and reverberation will be measured and recorded in seven unoccupied third-grade classrooms in Lincoln Parish in North Louisiana. The recordings will be made at six specific distances in each classroom to simulate teacher and student positions, and will be compared to the American Speech-Language-Hearing Association standards for noise and reverberation. If discrepancies are observed, the primary investigator will serve as an auditory consultant for the school and its educators, recommending remediation and intervention strategies to improve these acoustical properties. The hypothesis of the study is that the classroom noise and reverberation will exceed the American Speech-Language-Hearing Association standards; therefore, the auditory consultant will provide strategies to improve those acoustical properties.

  13. Crossmodal Source Identification in Speech Perception

    PubMed Central

    Lachs, Lorin; Pisoni, David B.

    2011-01-01

    Four experiments examined the nature of multisensory speech information. In Experiment 1, participants were asked to match heard voices with dynamic visual-alone video clips of speakers' articulating faces. This cross-modal matching task was used to examine whether vocal source matching can be accomplished across sensory modalities. The results showed that observers could match speaking faces and voices, indicating that information about the speaker was available for cross-modal comparisons. In a series of follow-up experiments, several stimulus manipulations were used to determine some of the critical acoustic and optic patterns necessary for specifying cross-modal source information. The results showed that cross-modal source information was not available in static visual displays of faces and was not contingent on a prominent acoustic cue to vocal identity (f0). Furthermore, cross-modal matching was not possible when the acoustic signal was temporally reversed. PMID:21544262

  14. Speaker verification system using acoustic data and non-acoustic data

    DOEpatents

    Gable, Todd J.; Ng, Lawrence C.; Holzrichter, John F.; Burnett, Greg C.

    2006-03-21

    A method and system for speech characterization. One embodiment includes a method for speaker verification that includes collecting data from a speaker, wherein the data comprise acoustic data and non-acoustic data. The data are used to generate a template that includes a first set of "template" parameters. The method further includes receiving a real-time identity claim from a claimant, and using acoustic data and non-acoustic data from the identity claim to generate a second set of parameters. The method further includes comparing the first set of parameters to the second set of parameters to determine whether the claimant is the speaker. The first and second sets of parameters include at least one purely non-acoustic parameter, including a non-acoustic glottal shape parameter derived by averaging multiple glottal cycle waveforms.

  15. Measurement of acoustical characteristics of mosques in Saudi Arabia.

    PubMed

    Abdou, Adel A

    2003-03-01

    The study of mosque acoustics, with regard to acoustical characteristics, sound quality for speech intelligibility, and other applicable acoustic criteria, has been largely neglected. This study gives background on why mosques are designed as they are and how worship considerations influence their design. The acoustical characteristics of typically constructed contemporary mosques in Saudi Arabia were investigated using a well-known impulse-response method. Extensive field measurements were taken in 21 representative mosques of different sizes and architectural features in order to characterize their acoustical quality and to identify the impact of air conditioning, ceiling fans, and sound reinforcement systems on their acoustics. Objective room-acoustic indicators such as reverberation time (RT) and clarity (C50) were measured. Background noise (BN) was assessed with and without the operation of air conditioning and fans. The speech transmission index (STI) was also evaluated with and without the operation of existing sound reinforcement systems. The existence of acoustical deficiencies was confirmed and quantified. In addition to describing mosque acoustics, the study compares design goals to results obtained in practice and suggests acoustical target values for mosque design. The results show that acoustical quality in the investigated mosques deviates from optimum conditions when unoccupied, but is much better in the occupied condition. PMID:12656385

  16. Measurement of acoustical characteristics of mosques in Saudi Arabia

    NASA Astrophysics Data System (ADS)

    Abdou, Adel A.

    2003-03-01

    The study of mosque acoustics, with regard to acoustical characteristics, sound quality for speech intelligibility, and other applicable acoustic criteria, has been largely neglected. This study gives background on why mosques are designed as they are and how worship considerations influence their design. The acoustical characteristics of typically constructed contemporary mosques in Saudi Arabia were investigated using a well-known impulse-response method. Extensive field measurements were taken in 21 representative mosques of different sizes and architectural features in order to characterize their acoustical quality and to identify the impact of air conditioning, ceiling fans, and sound reinforcement systems on their acoustics. Objective room-acoustic indicators such as reverberation time (RT) and clarity (C50) were measured. Background noise (BN) was assessed with and without the operation of air conditioning and fans. The speech transmission index (STI) was also evaluated with and without the operation of existing sound reinforcement systems. The existence of acoustical deficiencies was confirmed and quantified. In addition to describing mosque acoustics, the study compares design goals to results obtained in practice and suggests acoustical target values for mosque design. The results show that acoustical quality in the investigated mosques deviates from optimum conditions when unoccupied, but is much better in the occupied condition.

  17. Neurophysiological evidence that musical training influences the recruitment of right hemispheric homologues for speech perception

    PubMed Central

    Jantzen, McNeel G.; Howe, Bradley M.; Jantzen, Kelly J.

    2014-01-01

    Musicians have a more accurate temporal and tonal representation of auditory stimuli than their non-musician counterparts (Musacchia et al., 2007; Parbery-Clark et al., 2009a; Zendel and Alain, 2009; Kraus and Chandrasekaran, 2010). Musicians who are adept at the production and perception of music are also more sensitive to key acoustic features of speech such as voice onset time and pitch. Together, these data suggest that musical training may enhance the processing of acoustic information for speech sounds. In the current study, we sought to provide neural evidence that musicians process speech and music in a similar way. We hypothesized that, for musicians, right hemisphere areas traditionally associated with music are also engaged in the processing of speech sounds. In contrast, we predicted that in non-musicians, processing of speech sounds would be localized to traditional left hemisphere language areas. Speech stimuli differing in voice onset time were presented using a dichotic listening paradigm. Subjects either indicated the aural location of a specified speech sound or identified a specific speech sound from a directed aural location. Musical training effects and the organization of acoustic features were reflected by activity in source generators of the P50. This included greater activation of the right middle temporal gyrus and superior temporal gyrus in musicians. The findings demonstrate recruitment of the right hemisphere in musicians for discriminating speech sounds and a putative broadening of their language network. Musicians appear to have an increased sensitivity to acoustic features and enhanced selective attention to temporal features of speech that is facilitated by musical training and supported, in part, by right hemisphere homologues of established speech processing regions of the brain. PMID:24624107

  18. Neurophysiological evidence that musical training influences the recruitment of right hemispheric homologues for speech perception.

    PubMed

    Jantzen, McNeel G; Howe, Bradley M; Jantzen, Kelly J

    2014-01-01

    Musicians have a more accurate temporal and tonal representation of auditory stimuli than their non-musician counterparts (Musacchia et al., 2007; Parbery-Clark et al., 2009a; Zendel and Alain, 2009; Kraus and Chandrasekaran, 2010). Musicians who are adept at the production and perception of music are also more sensitive to key acoustic features of speech such as voice onset time and pitch. Together, these data suggest that musical training may enhance the processing of acoustic information for speech sounds. In the current study, we sought to provide neural evidence that musicians process speech and music in a similar way. We hypothesized that, for musicians, right hemisphere areas traditionally associated with music are also engaged in the processing of speech sounds. In contrast, we predicted that in non-musicians, processing of speech sounds would be localized to traditional left hemisphere language areas. Speech stimuli differing in voice onset time were presented using a dichotic listening paradigm. Subjects either indicated the aural location of a specified speech sound or identified a specific speech sound from a directed aural location. Musical training effects and the organization of acoustic features were reflected by activity in source generators of the P50. This included greater activation of the right middle temporal gyrus and superior temporal gyrus in musicians. The findings demonstrate recruitment of the right hemisphere in musicians for discriminating speech sounds and a putative broadening of their language network. Musicians appear to have an increased sensitivity to acoustic features and enhanced selective attention to temporal features of speech that is facilitated by musical training and supported, in part, by right hemisphere homologues of established speech processing regions of the brain. PMID:24624107

  19. Topological Acoustics

    NASA Astrophysics Data System (ADS)

    Yang, Zhaoju; Gao, Fei; Shi, Xihang; Lin, Xiao; Gao, Zhen; Chong, Yidong; Zhang, Baile

    2015-03-01

    The manipulation of acoustic wave propagation in fluids has numerous applications, including some in everyday life. Acoustic technologies frequently develop in tandem with optics, using shared concepts such as waveguiding and metamedia. It is thus noteworthy that an entirely novel class of electromagnetic waves, known as "topological edge states," has recently been demonstrated. These are inspired by the electronic edge states occurring in topological insulators, and possess a striking and technologically promising property: the ability to travel in a single direction along a surface without backscattering, regardless of the existence of defects or disorder. Here, we develop an analogous theory of topological fluid acoustics, and propose a scheme for realizing topological edge states in an acoustic structure containing circulating fluids. The phenomenon of disorder-free one-way sound propagation, which does not occur in ordinary acoustic devices, may have novel applications for acoustic isolators, modulators, and transducers.

  20. Topological acoustics.

    PubMed

    Yang, Zhaoju; Gao, Fei; Shi, Xihang; Lin, Xiao; Gao, Zhen; Chong, Yidong; Zhang, Baile

    2015-03-20

    The manipulation of acoustic wave propagation in fluids has numerous applications, including some in everyday life. Acoustic technologies frequently develop in tandem with optics, using shared concepts such as waveguiding and metamedia. It is thus noteworthy that an entirely novel class of electromagnetic waves, known as "topological edge states," has recently been demonstrated. These are inspired by the electronic edge states occurring in topological insulators, and possess a striking and technologically promising property: the ability to travel in a single direction along a surface without backscattering, regardless of the existence of defects or disorder. Here, we develop an analogous theory of topological fluid acoustics, and propose a scheme for realizing topological edge states in an acoustic structure containing circulating fluids. The phenomenon of disorder-free one-way sound propagation, which does not occur in ordinary acoustic devices, may have novel applications for acoustic isolators, modulators, and transducers. PMID:25839273