Science.gov

Sample records for acoustic speech signal

  1. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-08-08

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  2. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2004-03-23

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  3. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-02-14

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  4. System And Method For Characterizing Voiced Excitations Of Speech And Acoustic Signals, Removing Acoustic Noise From Speech, And Synthesizing Speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-04-25

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  5. Denoising of human speech using combined acoustic and em sensor signal processing

    SciTech Connect

    Ng, L C; Burnett, G C; Holzrichter, J F; Gable, T J

    1999-11-29

    Low Power EM radar-like sensors have made it possible to measure properties of the human speech production system in real-time, without acoustic interference. This greatly enhances the quality and quantity of information for many speech related applications. See Holzrichter, Burnett, Ng, and Lea, J. Acoust. Soc. Am. 103 (1), 622 (1998). By using combined glottal EM-sensor and acoustic signals, segments of voiced, unvoiced, and no-speech can be reliably defined. Real-time denoising filters can be constructed to remove noise from the user's corresponding speech signal.
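
    As a minimal sketch of this idea (not the authors' implementation), an auxiliary glottal-sensor channel can gate the noise estimate for a simple spectral-subtraction filter applied to the acoustic channel; the function and parameter names (frame_len, em_threshold) are illustrative assumptions, written in Python with NumPy.

      import numpy as np

      def denoise_with_reference(acoustic, em_sensor, frame_len=512, em_threshold=0.01):
          # Frames where the EM channel is quiet are treated as "no speech" and update
          # the noise spectrum; every frame is then attenuated by a spectral-subtraction
          # gain. Overlap-add with a Hann window.
          hop = frame_len // 2
          window = np.hanning(frame_len)
          noise_psd = np.zeros(frame_len // 2 + 1)
          n_noise = 0
          out = np.zeros(len(acoustic))
          for start in range(0, len(acoustic) - frame_len, hop):
              frame = acoustic[start:start + frame_len] * window
              spec = np.fft.rfft(frame)
              if np.sqrt(np.mean(em_sensor[start:start + frame_len] ** 2)) < em_threshold:
                  n_noise += 1
                  noise_psd += (np.abs(spec) ** 2 - noise_psd) / n_noise
              gain = np.sqrt(np.maximum(1.0 - noise_psd / (np.abs(spec) ** 2 + 1e-12), 0.05))
              out[start:start + frame_len] += np.fft.irfft(gain * spec) * window
          return out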

  6. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2002-01-01

    Low power EM waves are used to detect motions of vocal tract tissues of the human speech system before, during, and after voiced speech. A voiced excitation function is derived. The excitation function provides speech production information to enhance speech characterization and to enable noise removal from human speech.

  7. Classifying acoustic signals into phoneme categories: average and dyslexic readers make use of complex dynamical patterns and multifractal scaling properties of the speech signal.

    PubMed

    Hasselman, Fred

    2015-01-01

    Several competing aetiologies of developmental dyslexia suggest that the problems with acquiring literacy skills are causally entailed by low-level auditory and/or speech perception processes. The purpose of this study is to evaluate the diverging claims about the specific deficient perceptual processes under conditions of strong inference. Theoretically relevant acoustic features were extracted from a set of artificial speech stimuli that lie on a /bAk/-/dAk/ continuum. The features were tested on their ability to enable a simple classifier (Quadratic Discriminant Analysis) to reproduce the observed classification performance of average and dyslexic readers in a speech perception experiment. The 'classical' features examined were based on component process accounts of developmental dyslexia such as the supposed deficit in Envelope Rise Time detection and the deficit in the detection of rapid changes in the distribution of energy in the frequency spectrum (formant transitions). Studies examining these temporal processing deficit hypotheses do not employ measures that quantify the temporal dynamics of stimuli. It is shown that measures based on quantification of the dynamics of complex, interaction-dominant systems (Recurrence Quantification Analysis and the multifractal spectrum) enable QDA to classify the stimuli almost identically as observed in dyslexic and average reading participants. It seems unlikely that participants used any of the features that are traditionally associated with accounts of (impaired) speech perception. The nature of the variables quantifying the temporal dynamics of the speech stimuli implies that the classification of speech stimuli cannot be regarded as a linear aggregate of component processes that each parse the acoustic signal independent of one another, as is assumed by the 'classical' aetiologies of developmental dyslexia. It is suggested that the results imply that the differences in speech perception performance between average and
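
    As a minimal sketch of the classification step described above (not the study's code), a Quadratic Discriminant Analysis classifier can be fit on per-stimulus acoustic features and its labels compared with listeners' responses; the features and labels below are random placeholders standing in for, e.g., RQA or multifractal measures.

      import numpy as np
      from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

      rng = np.random.default_rng(0)
      X = rng.normal(size=(40, 3))                    # 40 stimuli x 3 acoustic features (placeholder)
      y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # placeholder listener labels (0 = /bAk/, 1 = /dAk/)

      qda = QuadraticDiscriminantAnalysis().fit(X, y)
      agreement = (qda.predict(X) == y).mean()
      print(f"agreement with listener labels: {agreement:.2f}")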

  9. Classifying acoustic signals into phoneme categories: average and dyslexic readers make use of complex dynamical patterns and multifractal scaling properties of the speech signal

    PubMed Central

    2015-01-01

    Several competing aetiologies of developmental dyslexia suggest that the problems with acquiring literacy skills are causally entailed by low-level auditory and/or speech perception processes. The purpose of this study is to evaluate the diverging claims about the specific deficient perceptual processes under conditions of strong inference. Theoretically relevant acoustic features were extracted from a set of artificial speech stimuli that lie on a /bAk/-/dAk/ continuum. The features were tested on their ability to enable a simple classifier (Quadratic Discriminant Analysis) to reproduce the observed classification performance of average and dyslexic readers in a speech perception experiment. The ‘classical’ features examined were based on component process accounts of developmental dyslexia such as the supposed deficit in Envelope Rise Time detection and the deficit in the detection of rapid changes in the distribution of energy in the frequency spectrum (formant transitions). Studies examining these temporal processing deficit hypotheses do not employ measures that quantify the temporal dynamics of stimuli. It is shown that measures based on quantification of the dynamics of complex, interaction-dominant systems (Recurrence Quantification Analysis and the multifractal spectrum) enable QDA to classify the stimuli almost identically as observed in dyslexic and average reading participants. It seems unlikely that participants used any of the features that are traditionally associated with accounts of (impaired) speech perception. The nature of the variables quantifying the temporal dynamics of the speech stimuli implies that the classification of speech stimuli cannot be regarded as a linear aggregate of component processes that each parse the acoustic signal independent of one another, as is assumed by the ‘classical’ aetiologies of developmental dyslexia. It is suggested that the results imply that the differences in speech perception performance between

  10. Perceptual centres in speech - an acoustic analysis

    NASA Astrophysics Data System (ADS)

    Scott, Sophie Kerttu

    Perceptual centres, or P-centres, represent the perceptual moments of occurrence of acoustic signals - the 'beat' of a sound. P-centres underlie the perception and production of rhythm in perceptually regular speech sequences. P-centres have been modelled both in speech and non-speech (music) domains. The three aims of this thesis were (a) to test out current P-centre models to determine which best accounted for the experimental data, (b) to identify a candidate parameter to map P-centres onto (a local approach), as opposed to the previous global models which rely upon the whole signal to determine the P-centre, and (c) to develop a model of P-centre location which could be applied to speech and non-speech signals. The first aim was investigated by a series of experiments examining (a) whether different models could account for variation between speakers, using speech from different speakers, (b) whether rendering the amplitude-time plot of a speech signal affects the P-centre of the signal, and (c) whether increasing the amplitude at the offset of a speech signal alters P-centres in the production and perception of speech. The second aim was carried out by (a) manipulating the rise time of different speech signals to determine whether the P-centre was affected, and whether the type of speech sound ramped affected the P-centre shift, (b) manipulating the rise time and decay time of a synthetic vowel to determine whether the onset alteration had more effect on the P-centre than the offset manipulation, and (c) determining whether the duration of a vowel affected the P-centre if other attributes (amplitude, spectral content) were held constant. The third aim - modelling P-centres - was based on these results. The Frequency-dependent Amplitude Increase Model of P-centre location (FAIM) was developed using a modelling protocol, the APU GammaTone Filterbank and the speech from different speakers. The P-centres of the stimulus corpus were highly predicted by attributes of

  11. Acoustic Signal Processing

    NASA Astrophysics Data System (ADS)

    Hartmann, William M.; Candy, James V.

    Signal processing refers to the acquisition, storage, display, and generation of signals - also to the extraction of information from signals and the re-encoding of information. As such, signal processing in some form is an essential element in the practice of all aspects of acoustics. Signal processing algorithms enable acousticians to separate signals from noise, to perform automatic speech recognition, or to compress information for more efficient storage or transmission. Signal processing concepts are the building blocks used to construct models of speech and hearing. Now, in the 21st century, all signal processing is effectively digital signal processing. Widespread access to high-speed processing, massive memory, and inexpensive software make signal processing procedures of enormous sophistication and power available to anyone who wants to use them. Because advanced signal processing is now accessible to everybody, there is a need for primers that introduce basic mathematical concepts that underlie the digital algorithms. The present handbook chapter is intended to serve such a purpose.
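
    As a generic illustration of the kind of digital processing such a primer introduces (not code from the chapter), a short-time spectrogram of a toy signal can be computed with SciPy.

      import numpy as np
      from scipy.signal import spectrogram

      sr = 16000
      t = np.arange(0, 1.0, 1 / sr)
      x = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 1200 * t)   # toy two-tone signal
      freqs, times, Sxx = spectrogram(x, fs=sr, nperseg=512, noverlap=256)
      print(Sxx.shape)   # (frequency bins, time frames)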

  12. Speech recognition: Acoustic, phonetic and lexical

    NASA Astrophysics Data System (ADS)

    Zue, V. W.

    1985-10-01

    Our long-term research goal is the development and implementation of speaker-independent continuous speech recognition systems. It is our conviction that proper utilization of speech-specific knowledge is essential for advanced speech recognition systems. With this in mind, we have continued to make progress on the acquisition of acoustic-phonetic and lexical knowledge. We have completed the development of a continuous digit recognition system. The system was constructed to investigate the utilization of acoustic-phonetic knowledge in a speech recognition system. Significant developments of this study include a soft-failure procedure for lexical access and the discovery of a set of acoustic-phonetic features for verification. We have completed a study of the constraints provided by lexical stress on word recognition. We found that lexical stress information alone can, on the average, reduce the number of word candidates from a large dictionary by more than 80%. In conjunction with this study, we successfully developed a system that automatically determines the stress pattern of a word from the acoustic signal.
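
    A toy sketch of the lexical-stress constraint described above: once the stress pattern of an unknown word has been recovered from the acoustic signal, only dictionary entries with a matching pattern survive as candidates. The miniature lexicon and patterns below are invented for illustration.

      lexicon = {
          "record":   ["10", "01"],    # noun vs. verb stress patterns
          "computer": ["010"],
          "acoustic": ["010"],
          "signal":   ["10"],
      }

      def candidates(stress_pattern, lexicon):
          # keep only words whose stress pattern matches the one detected acoustically
          return [word for word, patterns in lexicon.items() if stress_pattern in patterns]

      print(candidates("010", lexicon))   # ['computer', 'acoustic']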

  13. Speech recognition: Acoustic, phonetic and lexical knowledge

    NASA Astrophysics Data System (ADS)

    Zue, V. W.

    1985-08-01

    During this reporting period we continued to make progress on the acquisition of acoustic-phonetic and lexical knowledge. We completed development of a continuous digit recognition system. The system was constructed to investigate the use of acoustic-phonetic knowledge in a speech recognition system. The significant achievements of this study include the development of a soft-failure procedure for lexical access and the discovery of a set of acoustic-phonetic features for verification. We completed a study of the constraints that lexical stress imposes on word recognition. We found that lexical stress information alone can, on the average, reduce the number of word candidates from a large dictionary by more than 80 percent. In conjunction with this study, we successfully developed a system that automatically determines the stress pattern of a word from the acoustic signal. We performed an acoustic study on the characteristics of nasal consonants and nasalized vowels. We have also developed recognition algorithms for nasal murmurs and nasalized vowels in continuous speech. We finished the preliminary development of a system that aligns a speech waveform with the corresponding phonetic transcription.

  14. Acoustic Markers of Prosodic Boundaries in Spanish Spontaneous Alaryngeal Speech

    ERIC Educational Resources Information Center

    Cuenca, M. H.; Barrio, M. M.

    2010-01-01

    Prosodic information aids segmentation of the continuous speech signal and thereby facilitates auditory speech processing. Durational and pitch variations are prosodic cues especially necessary to convey prosodic boundaries, but alaryngeal speakers have inconsistent control over acoustic parameters such as F0 and duration, being as a result noisy…

  15. Acoustics of Clear Speech: Effect of Instruction

    ERIC Educational Resources Information Center

    Lam, Jennifer; Tjaden, Kris; Wilding, Greg

    2012-01-01

    Purpose: This study investigated how different instructions for eliciting clear speech affected selected acoustic measures of speech. Method: Twelve speakers were audio-recorded reading 18 different sentences from the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1984). Sentences were produced in habitual, clear,…

  16. Study of acoustic correlates associate with emotional speech

    NASA Astrophysics Data System (ADS)

    Yildirim, Serdar; Lee, Sungbok; Lee, Chul Min; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Ebrahim; Narayanan, Shrikanth

    2004-10-01

    This study investigates the acoustic characteristics of four different emotions expressed in speech. The aim is to obtain detailed acoustic knowledge on how a speech signal is modulated by changes from neutral to a certain emotional state. Such knowledge is necessary for automatic emotion recognition and classification and for emotional speech synthesis. Speech data obtained from two semi-professional actresses are analyzed and compared. Each subject produces 211 sentences with four different emotions: neutral, sad, angry, and happy. We analyze changes in temporal and acoustic parameters such as magnitude and variability of segmental duration, fundamental frequency and the first three formant frequencies as a function of emotion. Acoustic differences among the emotions are also explored with mutual information computation, multidimensional scaling and acoustic likelihood comparison with normal speech. Results indicate that speech associated with anger and happiness is characterized by longer duration, shorter interword silence, and higher pitch and rms energy with wider ranges. Sadness is distinguished from other emotions by lower rms energy and longer interword silence. Interestingly, the differences in formant pattern between [happiness/anger] and [neutral/sadness] are better reflected in back vowels such as /a/ (/father/) than in front vowels. Detailed results on intra- and interspeaker variability will be reported.
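
    A minimal sketch of extracting two of the parameters analysed above (fundamental frequency and rms energy) with the librosa library; the file path is a placeholder, and formant tracking (e.g., with Praat) is not shown.

      import numpy as np
      import librosa

      y, sr = librosa.load("utterance.wav", sr=None)        # placeholder path
      f0, voiced_flag, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
      rms = librosa.feature.rms(y=y)[0]

      print("mean F0 over voiced frames (Hz):", np.nanmean(f0))
      print("mean rms energy:", rms.mean())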

  17. Analog Acoustic Expression in Speech Communication

    ERIC Educational Resources Information Center

    Shintel, Hadas; Nusbaum, Howard C.; Okrent, Arika

    2006-01-01

    We present the first experimental evidence of a phenomenon in speech communication we call "analog acoustic expression." Speech is generally thought of as conveying information in two distinct ways: discrete linguistic-symbolic units such as words and sentences represent linguistic meaning, and continuous prosodic forms convey information about…

  18. Speech Intelligibility Advantages using an Acoustic Beamformer Display

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Sunder, Kaushik; Godfroy, Martine; Otto, Peter

    2015-01-01

    A speech intelligibility test conforming to the Modified Rhyme Test of ANSI S3.2 "Method for Measuring the Intelligibility of Speech Over Communication Systems" was conducted using a prototype 12-channel acoustic beamformer system. The target speech material (signal) was identified against speech babble (noise), with calculated signal-noise ratios of 0, 5 and 10 dB. The signal was delivered at a fixed beam orientation of 135 deg (re 90 deg as the frontal direction of the array) and the noise at 135 deg (co-located) and 0 deg (separated). A significant improvement in intelligibility from 57% to 73% was found for spatial separation for the same signal-noise ratio (0 dB). Significant effects for improved intelligibility due to spatial separation were also found for higher signal-noise ratios (5 and 10 dB).
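
    The signal-noise ratios quoted above can be realised by scaling the babble before mixing; the following is a minimal sketch with placeholder arrays, not the study's stimuli.

      import numpy as np

      def mix_at_snr(signal, noise, snr_db):
          # scale the noise so that 10*log10(P_signal / P_noise) equals snr_db
          noise = noise[:len(signal)]
          scale = np.sqrt(np.mean(signal ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
          return signal + scale * noise

      rng = np.random.default_rng(1)
      speech = rng.normal(size=16000)    # placeholder target speech
      babble = rng.normal(size=16000)    # placeholder speech babble
      mixture = mix_at_snr(speech, babble, snr_db=5)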

  19. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.
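
    A minimal sketch of the deconvolution step named in the abstract (not the patented implementation): given an excitation estimate x (e.g., from the EM sensor) and the acoustic output y for one analysis frame, a frequency-domain transfer-function estimate can be formed; the regularisation constant eps is an assumption.

      import numpy as np

      def transfer_function(excitation, acoustic, eps=1e-8):
          # least-squares deconvolution in the frequency domain: H = Y X* / (|X|^2 + eps)
          X = np.fft.rfft(excitation)
          Y = np.fft.rfft(acoustic)
          return Y * np.conj(X) / (np.abs(X) ** 2 + eps)

      # usage: H = transfer_function(x_frame, y_frame); h = np.fft.irfft(H)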

  20. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.

  1. Evaluating a topographical mapping from speech acoustics to tongue positions

    SciTech Connect

    Hogden, J.; Heard, M.

    1995-05-01

    The continuity mapping algorithm---a procedure for learning to recover the relative positions of the articulators from speech signals---is evaluated using human speech data. The advantage of continuity mapping is that it is an unsupervised algorithm; that is, it can potentially be trained to make a mapping from speech acoustics to speech articulation without articulator measurements. The procedure starts by vector quantizing short windows of a speech signal so that each window is represented (encoded) by a single number. Next, multidimensional scaling is used to map quantization codes that were temporally close in the encoded speech to nearby points in a continuity map. Since speech sounds produced sufficiently close together in time must have been produced by similar articulator configurations, and speech sounds produced close together in time are close to each other in the continuity map, sounds produced by similar articulator positions should be mapped to similar positions in the continuity map. The data set used for evaluating the continuity mapping algorithm is comprised of simultaneously collected articulator and acoustic measurements made using an electromagnetic midsagittal articulometer on a human subject. Comparisons between measured articulator positions and those recovered using continuity mapping will be presented.
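
    The two stages described above can be sketched as follows (an illustrative reconstruction, not the authors' code): short windows are vector-quantised, and the quantisation codes are placed in a low-dimensional continuity map so that codes occurring close together in time end up close together in the map; here the placement uses multidimensional scaling on a temporal-adjacency dissimilarity.

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.manifold import MDS

      rng = np.random.default_rng(0)
      signal = rng.normal(size=8000)                             # placeholder speech signal
      frames = signal[: len(signal) // 80 * 80].reshape(-1, 80)  # short windows as feature vectors

      n_codes = 16
      codes = KMeans(n_clusters=n_codes, n_init=10, random_state=0).fit_predict(frames)

      # count how often code i is immediately followed by code j, then turn
      # frequent temporal neighbours into small dissimilarities
      counts = np.zeros((n_codes, n_codes))
      for a, b in zip(codes[:-1], codes[1:]):
          counts[a, b] += 1
          counts[b, a] += 1
      dissimilarity = 1.0 / (1.0 + counts)
      np.fill_diagonal(dissimilarity, 0.0)

      continuity_map = MDS(n_components=2, dissimilarity="precomputed",
                           random_state=0).fit_transform(dissimilarity)
      print(continuity_map.shape)                                # (n_codes, 2)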

  2. Investigation of the optimum acoustical conditions for speech using auralization

    NASA Astrophysics Data System (ADS)

    Yang, Wonyoung; Hodgson, Murray

    2001-05-01

    Speech intelligibility is mainly affected by reverberation and by signal-to-noise level difference, the difference between the speech-signal and background-noise levels at a receiver. An important question for the design of rooms for speech (e.g., classrooms) is, what are the optimal values of these factors? This question has been studied experimentally and theoretically. Experimental studies found zero optimal reverberation time, but theoretical predictions found nonzero reverberation times. These contradictory results are partly caused by the different ways of accounting for background noise. Background noise sources and their locations inside the room are the most detrimental factors in speech intelligibility. However, noise levels also interact with reverberation in rooms. In this project, two major room-acoustical factors for speech intelligibility were controlled using speech and noise sources of known relative output levels located in a virtual room with known reverberation. Speech intelligibility test signals were played in the virtual room and auralized for listeners. The Modified Rhyme Test (MRT) and babble noise were used to measure subjective speech intelligibility quality. Optimal reverberation times, and the optimal values of other speech intelligibility metrics, for normal-hearing people and for hard-of-hearing people, were identified and compared.

  3. Does Signal Degradation Affect Top-Down Processing of Speech?

    PubMed

    Wagner, Anita; Pals, Carina; de Blecourt, Charlotte M; Sarampalis, Anastasios; Başkent, Deniz

    2016-01-01

    Speech perception is formed based on both the acoustic signal and listeners' knowledge of the world and semantic context. Access to semantic information can facilitate interpretation of degraded speech, such as speech in background noise or the speech signal transmitted via cochlear implants (CIs). This paper focuses on the latter, and investigates the time course of understanding words, and how sentential context reduces listeners' dependency on the acoustic signal for natural and degraded speech via an acoustic CI simulation. In an eye-tracking experiment we combined recordings of listeners' gaze fixations with pupillometry, to capture effects of semantic information on both the time course and effort of speech processing. Normal-hearing listeners were presented with sentences with or without a semantically constraining verb (e.g., crawl) preceding the target (baby), and their ocular responses were recorded to four pictures, including the target, a phonological (bay) competitor and a semantic (worm) and an unrelated distractor. The results show that in natural speech, listeners' gazes reflect their uptake of acoustic information, and integration of preceding semantic context. Degradation of the signal leads to a later disambiguation of phonologically similar words, and to a delay in integration of semantic information. Complementary to this, the pupil dilation data show that early semantic integration reduces the effort in disambiguating phonologically similar words. Processing degraded speech comes with increased effort due to the impoverished nature of the signal. Delayed integration of semantic information further constrains listeners' ability to compensate for inaudible signals. PMID:27080670

  4. Acoustic characteristics of listener-constrained speech

    NASA Astrophysics Data System (ADS)

    Ashby, Simone; Cummins, Fred

    2003-04-01

    Relatively little is known about the acoustical modifications speakers employ to meet the various constraints-auditory, linguistic and otherwise-of their listeners. Similarly, the manner by which perceived listener constraints interact with speakers' adoption of specialized speech registers is poorly understood. Hyper- and Hypo-speech (H&H) theory offers a framework for examining the relationship between speech production and output-oriented goals for communication, suggesting that under certain circumstances speakers may attempt to minimize phonetic ambiguity by employing a ``hyperarticulated'' speaking style (Lindblom, 1990). It remains unclear, however, what the acoustic correlates of hyperarticulated speech are, and how, if at all, we might expect phonetic properties to change respective to different listener-constrained conditions. This paper is part of a preliminary investigation concerned with comparing the prosodic characteristics of speech produced across a range of listener constraints. Analyses are drawn from a corpus of read hyperarticulated speech data comprising eight adult, female speakers of English. Specialized registers include speech to foreigners, infant-directed speech, speech produced under noisy conditions, and human-machine interaction. The authors gratefully acknowledge financial support of the Irish Higher Education Authority, allocated to Fred Cummins for collaborative work with Media Lab Europe.

  5. Acoustic Evidence for Phonologically Mismatched Speech Errors

    ERIC Educational Resources Information Center

    Gormley, Andrea

    2015-01-01

    Speech errors are generally said to accommodate to their new phonological context. This accommodation has been validated by several transcription studies. The transcription methodology is not the best choice for detecting errors at this level, however, as this type of error can be difficult to perceive. This paper presents an acoustic analysis of…

  6. Start/End Delays of Voiced and Unvoiced Speech Signals

    SciTech Connect

    Herrnstein, A

    1999-09-24

    Recent experiments using low power EM-radar-like sensors (e.g., GEMS) have demonstrated a new method for measuring vocal fold activity and the onset times of voiced speech, as vocal fold contact begins to take place. Similarly, the end time of a voiced speech segment can be measured. Secondly, it appears that in most normal uses of American English speech, unvoiced-speech segments directly precede or directly follow voiced-speech segments. For many applications, it is useful to know typical duration times of these unvoiced speech segments. A corpus, assembled earlier, of spoken ''Timit'' words, phrases, and sentences, recorded using simultaneously measured acoustic and EM-sensor glottal signals from 16 male speakers, was used for this study. By inspecting the onset (or end) of unvoiced speech, using the acoustic signal, and the onset (or end) of voiced speech, using the EM sensor signal, the average duration times for unvoiced segments preceding onset of vocalization were found to be 300 ms, and for following segments, 500 ms. An unvoiced speech period is then defined in time, first by using the onset of the EM-sensed glottal signal as the onset-time marker for the voiced speech segment and end marker for the unvoiced segment. Then, by subtracting 300 ms from the onset time mark of voicing, the unvoiced speech segment start time is found. Similarly, the times for a following unvoiced speech segment can be found. While data of this nature have proven to be useful for work in our laboratory, a great deal of additional work remains to validate such data for use with general populations of users. These procedures have been useful for applying optimal processing algorithms over time segments of unvoiced, voiced, and non-speech acoustic signals. For example, these data appear to be of use in speaker validation, in vocoding, and in denoising algorithms.
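
    The timing rule reported above can be written down directly; the function below is a sketch that applies the average durations (300 ms before voicing onset, 500 ms after voicing offset) to EM-sensed voicing times, with names chosen for illustration.

      def speech_segments(voicing_onset_s, voicing_offset_s,
                          pre_unvoiced_s=0.300, post_unvoiced_s=0.500):
          # voiced interval from the EM sensor, flanked by average-length unvoiced intervals
          return {
              "unvoiced_before": (max(voicing_onset_s - pre_unvoiced_s, 0.0), voicing_onset_s),
              "voiced":          (voicing_onset_s, voicing_offset_s),
              "unvoiced_after":  (voicing_offset_s, voicing_offset_s + post_unvoiced_s),
          }

      print(speech_segments(1.20, 1.85))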

  7. Multifractal nature of unvoiced speech signals

    SciTech Connect

    Adeyemi, O.A.; Hartt, K.; Boudreaux-Bartels, G.F.

    1996-06-01

    A refinement is made in the nonlinear dynamic modeling of speech signals. Previous research successfully characterized speech signals as chaotic. Here, we analyze fricative speech signals using multifractal measures to determine various fractal regimes present in their chaotic attractors. Results support the hypothesis that speech signals have multifractal measures. © 1996 American Institute of Physics.

  8. Acoustic evidence for phonologically mismatched speech errors.

    PubMed

    Gormley, Andrea

    2015-04-01

    Speech errors are generally said to accommodate to their new phonological context. This accommodation has been validated by several transcription studies. The transcription methodology is not the best choice for detecting errors at this level, however, as this type of error can be difficult to perceive. This paper presents an acoustic analysis of speech errors that uncovers non-accommodated or mismatch errors. A mismatch error is a sub-phonemic error that results in an incorrect surface phonology. This type of error could arise during the processing of phonological rules, or it could be made at the motor level of implementation. The results of this work have important implications for both experimental and theoretical research. For experimentalists, it validates the tools used for error induction and the acoustic determination of errors free of the perceptual bias. For theorists, this methodology can be used to test the nature of the processes proposed in language production.

  9. Segment-based acoustic models for continuous speech recognition

    NASA Astrophysics Data System (ADS)

    Ostendorf, Mari; Rohlicek, J. R.

    1994-02-01

    In this work, we are interested in the problem of large vocabulary, speaker-independent continuous speech recognition, and primarily in the acoustic modeling component of this problem. In developing acoustic models for speech recognition, we have conflicting goals. On one hand, the models should be robust to inter- and intra-speaker variability, to the use of a different vocabulary in recognition than in training, and to the effects of moderately noisy environments. In order to accomplish this, we need to model gross features and global trends. On the other hand, the models must be sensitive and detailed enough to detect fine acoustic differences between similar words in a large vocabulary task. To answer these opposing demands requires improvements in acoustic modeling at several levels: the frame level (e.g., signal processing), the phoneme level (e.g., modeling feature dynamics), and the utterance level (e.g., defining a structural context for representing the intra-utterance dependence across phonemes). This project addresses the problem of acoustic modeling, specifically focusing on modeling at the segment level and above.

  10. Applications of Hilbert Spectral Analysis for Speech and Sound Signals

    NASA Technical Reports Server (NTRS)

    Huang, Norden E.

    2003-01-01

    A new method for analyzing nonlinear and nonstationary data has been developed, and the natural applications are to speech and sound signals. The key part of the method is the Empirical Mode Decomposition method, with which any complicated data set can be decomposed into a finite and often small number of Intrinsic Mode Functions (IMF). An IMF is defined as any function having the same numbers of zero-crossings and extrema, and also having symmetric envelopes defined by the local maxima and minima respectively. The IMF also admits a well-behaved Hilbert transform. This decomposition method is adaptive and, therefore, highly efficient. Since the decomposition is based on the local characteristic time scale of the data, it is applicable to nonlinear and nonstationary processes. With the Hilbert transform, the Intrinsic Mode Functions yield instantaneous frequencies as functions of time, which give sharp identifications of imbedded structures. This method can be used to process all acoustic signals. Specifically, it can process speech signals for speech synthesis, speaker identification and verification, speech recognition, and sound signal enhancement and filtering. Additionally, the acoustical signals from machinery are essentially the way the machines are talking to us; these signals, whether sound through air or vibration on the machines, can tell us the operating conditions of the machines. Thus, we can use the acoustic signal to diagnose the problems of machines.
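
    A minimal sketch of the Hilbert step of the method (the EMD stage itself is omitted; packages such as PyEMD provide it): the analytic signal of one intrinsic mode function yields an instantaneous frequency as a function of time. The toy chirp below stands in for an IMF.

      import numpy as np
      from scipy.signal import hilbert

      sr = 1000
      t = np.arange(0, 1.0, 1 / sr)
      imf = np.sin(2 * np.pi * (10 * t + 5 * t ** 2))        # toy chirp sweeping 10 -> 20 Hz

      analytic = hilbert(imf)
      phase = np.unwrap(np.angle(analytic))
      inst_freq = np.diff(phase) / (2 * np.pi) * sr          # instantaneous frequency in Hz
      print(inst_freq[:5])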

  11. Acoustic analysis of speech under stress.

    PubMed

    Sondhi, Savita; Khan, Munna; Vijay, Ritu; Salhan, Ashok K; Chouhan, Satish

    2015-01-01

    When a person is emotionally charged, stress could be discerned in his voice. This paper presents a simplified and non-invasive approach to detect psycho-physiological stress by monitoring the acoustic modifications during a stressful conversation. The voice database consists of audio clips from eight different popular FM broadcasts wherein the host of the show vexes the subjects, who are otherwise unaware of the charade. The audio clips are obtained from real-life stressful conversations (no simulated emotions). Analysis is done using PRAAT software to evaluate mean fundamental frequency (F0) and formant frequencies (F1, F2, F3, F4) in both the neutral and stressed states. Results suggest that F0 increases with stress; however, formant frequency decreases with stress. Comparison of Fourier and chirp spectra of short vowel segments shows that for relaxed speech the two spectra are similar; however, for stressed speech they differ in the high frequency range due to increased pitch modulation. PMID:26558301

  12. Infant Perception of Atypical Speech Signals

    ERIC Educational Resources Information Center

    Vouloumanos, Athena; Gelfand, Hanna M.

    2013-01-01

    The ability to decode atypical and degraded speech signals as intelligible is a hallmark of speech perception. Human adults can perceive sounds as speech even when they are generated by a variety of nonhuman sources including computers and parrots. We examined how infants perceive the speech-like vocalizations of a parrot. Further, we examined how…

  13. Acoustic Study of Acted Emotions in Speech

    NASA Astrophysics Data System (ADS)

    Wang, Rong

    An extensive set of carefully recorded utterances provided a speech database for investigating acoustic correlates among eight emotional states. Four professional actors and four professional actresses simulated the emotional states of joy, conversation, nervousness, anger, sadness, hate, fear, and depression. The values of 14 acoustic parameters were extracted from analyses of the simulated portrayals. Normalization of the parameters was made to reduce the talker-dependence. Correlates of emotion were investigated by means of principal components analysis. Sadness and depression were found to be "ambiguous" with respect to each other, but "unique" with respect to joy and anger in the principal components space. Joy, conversation, nervousness, anger, hate, and fear did not separate well in the space and so exhibited ambiguity with respect to one another. The different talkers expressed joy, anger, sadness, and depression more consistently than the other four emotions. The analysis results were compared with the results of a subjective study using the same speech database and considerable consistency between the two was found.
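
    A minimal sketch of the analysis pipeline described above (not the original code): per-utterance acoustic parameters are normalised and projected onto principal components so that the emotion categories can be inspected in a low-dimensional space; the feature matrix below is a random placeholder.

      import numpy as np
      from sklearn.preprocessing import StandardScaler
      from sklearn.decomposition import PCA

      rng = np.random.default_rng(0)
      features = rng.normal(size=(64, 14))      # 64 utterances x 14 acoustic parameters (placeholder)

      normalized = StandardScaler().fit_transform(features)       # stands in for talker normalization
      components = PCA(n_components=2).fit_transform(normalized)
      print(components.shape)                   # (64, 2) coordinates in the principal-components space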

  14. Acoustic differences among casual, conversational, and read speech

    NASA Astrophysics Data System (ADS)

    Pinnow, DeAnna

    Speech is a complex behavior that allows speakers to use many variations to satisfy the demands connected with multiple speaking environments. Speech research typically obtains speech samples in a controlled laboratory setting using read material, yet anecdotal observations of such speech, particularly from talkers with a speech and language impairment, have identified a "performance" effect in the produced speech which masks the characteristics of impaired speech outside of the lab (Goberman, Recker, & Parveen, 2010). The aim of the current study was to investigate acoustic differences among laboratory read, laboratory conversational, and casual speech through well-defined speech tasks in the laboratory and in talkers' natural environments. Eleven healthy research participants performed lab recording tasks (19 read sentences and a dialogue about their life) and collected natural-environment recordings of themselves over 3-day periods using portable recorders. Segments were analyzed for articulatory, voice, and prosodic acoustic characteristics using computer software and hand counting. The current study results indicate that lab-read speech was significantly different from casual speech: greater articulation range, improved voice quality measures, lower speech rate, and lower mean pitch. One implication of the results is that different laboratory techniques may be beneficial in obtaining speech samples that are more like casual speech, thus making it easier to correctly analyze abnormal speech characteristics with fewer errors.

  15. Acoustics in Halls for Speech and Music

    NASA Astrophysics Data System (ADS)

    Gade, Anders C.

    This chapter deals specifically with concepts, tools, and architectural variables of importance when designing auditoria for speech and music. The focus will be on cultivating the useful components of the sound in the room rather than on avoiding noise from outside or from installations, which is dealt with in Chap. 11. The chapter starts by presenting the subjective aspects of the room acoustic experience according to consensus at the time of writing. Then follows a description of their objective counterparts, the objective room acoustic parameters, among which the classical reverberation time measure is only one of many, but still of fundamental value. After explanations on how these parameters can be measured and predicted during the design phase, the remainder of the chapter deals with how the acoustic properties can be controlled by the architectural design of auditoria. This is done by presenting the influence of individual design elements as well as brief descriptions of halls designed for specific purposes, such as drama, opera, and symphonic concerts. Finally, some important aspects of loudspeaker installations in auditoria are briefly touched upon.

  16. Speaker verification using combined acoustic and EM sensor signal processing

    SciTech Connect

    Ng, L C; Gable, T J; Holzrichter, J F

    2000-11-10

    Low Power EM radar-like sensors have made it possible to measure properties of the human speech production system in real-time, without acoustic interference. This greatly enhances the quality and quantity of information for many speech related applications. See Holzrichter, Burnett, Ng, and Lea, J. Acoust. Soc. Am. 103 (1), 622 (1998). By combining the Glottal-EM-Sensor (GEMS) with the acoustic signals, we've demonstrated an almost 10-fold reduction in error rates from a speaker verification system experiment in a moderately noisy environment (-10 dB).

  17. Low Bandwidth Vocoding using EM Sensor and Acoustic Signal Processing

    SciTech Connect

    Ng, L C; Holzrichter, J F; Larson, P E

    2001-10-25

    Low-power EM radar-like sensors have made it possible to measure properties of the human speech production system in real-time, without acoustic interference [1]. By combining these data with the corresponding acoustic signal, we've demonstrated an almost 10-fold bandwidth reduction in speech compression, compared to a standard 2.4 kbps LPC10 protocol used in the STU-III (Secure Terminal Unit, third generation) telephone. This paper describes a potential EM sensor/acoustic based vocoder implementation.
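
    A minimal sketch of the acoustic side of such LPC-style coding (the EM-sensor excitation channel is not modelled): a 10th-order linear-prediction fit to one short frame, using librosa and SciPy; the file path is a placeholder.

      import librosa
      from scipy.signal import lfilter

      y, sr = librosa.load("utterance.wav", sr=8000)      # placeholder path; LPC10-style coders use 8 kHz
      frame = y[:240]                                     # one 30 ms frame
      a = librosa.lpc(frame, order=10)                    # prediction coefficients [1, a1, ..., a10]
      residual = lfilter(a, [1.0], frame)                 # prediction residual (excitation estimate)
      print(a)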

  18. Acoustic properties of naturally produced clear speech at normal speaking rates

    NASA Astrophysics Data System (ADS)

    Krause, Jean C.; Braida, Louis D.

    2004-01-01

    Sentences spoken ``clearly'' are significantly more intelligible than those spoken ``conversationally'' for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.
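
    One of the global-level measures named above (the share of long-term spectral energy in the 1000-3000 Hz band) can be sketched as follows; the file path is a placeholder and the exact analysis settings of the study are not reproduced.

      import librosa
      from scipy.signal import welch

      y, sr = librosa.load("clear_speech.wav", sr=None)     # placeholder path
      freqs, psd = welch(y, fs=sr, nperseg=2048)            # long-term average spectrum
      band = (freqs >= 1000) & (freqs <= 3000)
      ratio = psd[band].sum() / psd.sum()
      print(f"proportion of energy in the 1-3 kHz band: {ratio:.2%}")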

  19. Infant perception of atypical speech signals.

    PubMed

    Vouloumanos, Athena; Gelfand, Hanna M

    2013-05-01

    The ability to decode atypical and degraded speech signals as intelligible is a hallmark of speech perception. Human adults can perceive sounds as speech even when they are generated by a variety of nonhuman sources including computers and parrots. We examined how infants perceive the speech-like vocalizations of a parrot. Further, we examined how visual context influences infant speech perception. Nine-month-olds heard speech and nonspeech sounds produced by either a human or a parrot, concurrently with 1 of 2 visual displays: a static checkerboard or a static image of a human face. Using an infant-controlled looking task, we examined infants' preferences for speech and nonspeech sounds. Infants listened equally to parrot speech and nonspeech when paired with a checkerboard. However, in the presence of faces, infants listened longer to parrot speech than to nonspeech sounds, such that their preference for parrot speech was similar to their preference for human speech sounds. These data are consistent with the possibility that infants treat parrot speech similarly to human speech relative to nonspeech vocalizations but only in some visual contexts. Like adults, infants may perceive a range of signals as speech.

  20. The acoustic properties of bilingual infant-directed speech.

    PubMed

    Danielson, D Kyle; Seidl, Amanda; Onishi, Kristine H; Alamian, Golnoush; Cristia, Alejandrina

    2014-02-01

    Does the acoustic input for bilingual infants equal the conjunction of the input heard by monolinguals of each separate language? The present letter tackles this question, focusing on maternal speech addressed to 11-month-old infants, on the cusp of perceptual attunement. The acoustic characteristics of the point vowels /a,i,u/ were measured in the spontaneous infant-directed speech of French-English bilingual mothers, as well as in the speech of French and English monolingual mothers. Bilingual caregivers produced their two languages with acoustic prosodic separation equal to that of the monolinguals, while also conveying distinct spectral characteristics of the point vowels in their two languages.

  1. Detection and Classification of Whale Acoustic Signals

    NASA Astrophysics Data System (ADS)

    Xian, Yin

    vocalization data set. The word error rate of the DCTNet feature is similar to the MFSC in speech recognition tasks, suggesting that the convolutional network is able to reveal acoustic content of speech signals.

  2. School cafeteria noise-The impact of room acoustics and speech intelligibility on children's voice levels

    NASA Astrophysics Data System (ADS)

    Bridger, Joseph F.

    2002-05-01

    The impact of room acoustics and speech intelligibility conditions of different school cafeterias on the voice levels of children is examined. Methods of evaluating cafeteria designs and predicting noise levels are discussed. Children, like adults, are shown to modify their voice levels with changes in speech intelligibility. Reverberation and signal-to-noise ratio are the important acoustical factors affecting speech intelligibility. Children have much more difficulty than adults in conditions where noise and reverberation are present. To evaluate the relationship of voice level and speech intelligibility, a database of real sound levels and room acoustics data was generated from measurements and data recorded during visits to a variety of existing cafeterias under different occupancy conditions. The effects of speech intelligibility and room acoustics on children's voice levels are demonstrated. A new method is presented for predicting speech intelligibility conditions and resulting noise levels for the design of new cafeterias and renovation of existing facilities. Measurements are provided for an existing school cafeteria before and after new room acoustics treatments were added. This will be helpful for acousticians, architects, school systems, regulatory agencies, and Parent Teacher Associations seeking to create less noisy cafeteria environments.

  3. Clear Speech Variants: An Acoustic Study in Parkinson's Disease

    ERIC Educational Resources Information Center

    Lam, Jennifer; Tjaden, Kris

    2016-01-01

    Purpose: The authors investigated how different variants of clear speech affect segmental and suprasegmental acoustic measures of speech in speakers with Parkinson's disease and a healthy control group. Method: A total of 14 participants with Parkinson's disease and 14 control participants served as speakers. Each speaker produced 18 different…

  4. Mathematical model of acoustic speech production with mobile walls of the vocal tract

    NASA Astrophysics Data System (ADS)

    Lyubimov, N. A.; Zakharov, E. V.

    2016-03-01

    A mathematical speech production model is considered that describes acoustic oscillation propagation in a vocal tract with mobile walls. The wave field function satisfies the Helmholtz equation with boundary conditions of the third kind (impedance type). The impedance mode corresponds to a three-parameter pendulum oscillation model. The experimental research demonstrates the nonlinear character of how the mobility of the vocal tract walls influences the spectral envelope of a speech signal.
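
    In standard notation (assumed here, not taken from the paper), such a boundary-value problem reads

      \Delta p(\mathbf{x}) + k^2 p(\mathbf{x}) = 0 \quad \text{in the vocal-tract domain } \Omega,
      \frac{\partial p}{\partial n} + i k \beta(\mathbf{x})\, p = 0 \quad \text{on the walls } \partial\Omega,

    where p is the acoustic wave field (pressure), k = \omega / c is the wavenumber, and \beta is a dimensionless wall admittance through which the impedance, and hence the mobility, of the walls enters.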

  5. Acoustic assessment of speech privacy curtains in two nursing units

    PubMed Central

    Pope, Diana S.; Miller-Klein, Erik T.

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s’ standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption, and compact and more fragmented nursing unit floor plate shapes should be considered. PMID:26780959
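
    The link between added absorption and shorter reverberation time follows the standard Sabine relation T60 = 0.161 V / A, with V the room volume (cubic metres) and A the total absorption (square-metre sabins); the toy calculation below uses invented room numbers purely for illustration.

      def sabine_t60(volume_m3, surface_m2, mean_absorption_coeff):
          # Sabine's formula: T60 = 0.161 * V / A, with A = S * alpha
          return 0.161 * volume_m3 / (surface_m2 * mean_absorption_coeff)

      room_volume, room_surface = 60.0, 95.0                  # hypothetical patient room
      print(sabine_t60(room_volume, room_surface, 0.20))      # lower average absorption
      print(sabine_t60(room_volume, room_surface, 0.30))      # higher average absorption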

  6. Acoustic assessment of speech privacy curtains in two nursing units.

    PubMed

    Pope, Diana S; Miller-Klein, Erik T

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s' standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption, and compact and more fragmented nursing unit floor plate shapes should be considered. PMID:26780959

  7. Acoustic assessment of speech privacy curtains in two nursing units.

    PubMed

    Pope, Diana S; Miller-Klein, Erik T

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s' standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption and compact, more fragmented nursing unit floor plate shapes should be considered.

  8. Clinical investigation of speech signal features among patients with schizophrenia

    PubMed Central

    ZHANG, Jing; PAN, Zhongde; GUI, Chao; CUI, Donghong

    2016-01-01

    Background A new area of interest in the search for biomarkers for schizophrenia is the study of the acoustic parameters of speech called 'speech signal features'. Several of these features have been shown to be related to emotional responsiveness, a characteristic that is notably restricted in patients with schizophrenia, particularly those with prominent negative symptoms. Aim Assess the relationship of selected acoustic parameters of speech to the severity of clinical symptoms in patients with chronic schizophrenia and compare these characteristics between patients and matched healthy controls. Methods Ten speech signal features (six prosody features, formant bandwidth and amplitude, and two spectral features) were assessed using 15-minute speech samples obtained by smartphone from 26 inpatients with chronic schizophrenia (at enrollment and 1 week later) and from 30 healthy controls (at enrollment only). Clinical symptoms of the patients were also assessed at baseline and 1 week later using the Positive and Negative Syndrome Scale, the Scale for the Assessment of Negative Symptoms, and the Clinical Global Impression-Schizophrenia scale. Results In the patient group the symptoms were stable over the 1-week interval and the 1-week test-retest reliability of the 10 speech features was good (intraclass correlation coefficients [ICC] ranging from 0.55 to 0.88). Comparison of the speech features between patients and controls found no significant differences in the six prosody features or in the formant bandwidth and amplitude features, but the two spectral features were different: the Mel-frequency cepstral coefficient (MFCC) scores were significantly lower in the patient group than in the control group, and the linear prediction coding (LPC) scores were significantly higher in the patient group than in the control group. Within the patient group, 10 of the 170 associations between the 10 speech features considered and the 17 clinical parameters considered were
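
    As an illustration of the two spectral features highlighted above, the sketch below computes MFCC and linear-prediction coefficients from a speech recording. It is a minimal example assuming the librosa package; the file name, sampling rate, and analysis orders are placeholders, not the parameters used in the study.

    ```python
    import numpy as np
    import librosa

    # Load a speech sample (hypothetical file; 16 kHz mono assumed)
    y, sr = librosa.load("speech_sample.wav", sr=16000)

    # Mel-frequency cepstral coefficients (MFCC), averaged over frames
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    mfcc_mean = mfcc.mean(axis=1)

    # Linear prediction coefficients (LPC) of order 12 over the whole signal
    lpc = librosa.lpc(y, order=12)

    print("Mean MFCCs:", np.round(mfcc_mean, 2))
    print("LPC coefficients:", np.round(lpc, 3))
    ```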

  9. Clinical investigation of speech signal features among patients with schizophrenia

    PubMed Central

    ZHANG, Jing; PAN, Zhongde; GUI, Chao; CUI, Donghong

    2016-01-01

    Background A new area of interest in the search for biomarkers for schizophrenia is the study of the acoustic parameters of speech called 'speech signal features'. Several of these features have been shown to be related to emotional responsiveness, a characteristic that is notably restricted in patients with schizophrenia, particularly those with prominent negative symptoms. Aim Assess the relationship of selected acoustic parameters of speech to the severity of clinical symptoms in patients with chronic schizophrenia and compare these characteristics between patients and matched healthy controls. Methods Ten speech signal features (six prosody features, formant bandwidth and amplitude, and two spectral features) were assessed using 15-minute speech samples obtained by smartphone from 26 inpatients with chronic schizophrenia (at enrollment and 1 week later) and from 30 healthy controls (at enrollment only). Clinical symptoms of the patients were also assessed at baseline and 1 week later using the Positive and Negative Syndrome Scale, the Scale for the Assessment of Negative Symptoms, and the Clinical Global Impression-Schizophrenia scale. Results In the patient group the symptoms were stable over the 1-week interval and the 1-week test-retest reliability of the 10 speech features was good (intraclass correlation coefficients [ICC] ranging from 0.55 to 0.88). Comparison of the speech features between patients and controls found no significant differences in the six prosody features or in the formant bandwidth and amplitude features, but the two spectral features were different: the Mel-frequency cepstral coefficient (MFCC) scores were significantly lower in the patient group than in the control group, and the linear prediction coding (LPC) scores were significantly higher in the patient group than in the control group. Within the patient group, 10 of the 170 associations between the 10 speech features considered and the 17 clinical parameters considered were

  10. Optimizing acoustical conditions for speech intelligibility in classrooms

    NASA Astrophysics Data System (ADS)

    Yang, Wonyoung

    High speech intelligibility is imperative in classrooms where verbal communication is critical. However, the optimal acoustical conditions to achieve a high degree of speech intelligibility have previously been investigated with inconsistent results, and practical room-acoustical solutions to optimize the acoustical conditions for speech intelligibility have not been developed. This experimental study validated auralization for speech-intelligibility testing, investigated the optimal reverberation for speech intelligibility for both normal and hearing-impaired listeners using more realistic room-acoustical models, and proposed an optimal sound-control design for speech intelligibility based on the findings. The auralization technique was used to perform subjective speech-intelligibility tests. The validation study, comparing auralization results with those of real classroom speech-intelligibility tests, found that if the room to be auralized is not very absorptive or noisy, speech-intelligibility tests using auralization are valid. The speech-intelligibility tests were done in two different auralized sound fields---approximately diffuse and non-diffuse---using the Modified Rhyme Test and both normal and hearing-impaired listeners. A hybrid room-acoustical prediction program was used throughout the work, and it and a 1/8 scale-model classroom were used to evaluate the effects of ceiling barriers and reflectors. For both subject groups, in approximately diffuse sound fields, when the speech source was closer to the listener than the noise source, the optimal reverberation time was zero. When the noise source was closer to the listener than the speech source, the optimal reverberation time was 0.4 s (with another peak at 0.0 s) with relative output power levels of the speech and noise sources SNS = 5 dB, and 0.8 s with SNS = 0 dB. In non-diffuse sound fields, when the noise source was between the speaker and the listener, the optimal reverberation time was 0.6 s with

  11. Speech and melody recognition in binaurally combined acoustic and electric hearing

    NASA Astrophysics Data System (ADS)

    Kong, Ying-Yee; Stickney, Ginger S.; Zeng, Fan-Gang

    2005-03-01

    Speech recognition in noise and music perception is especially challenging for current cochlear implant users. The present study utilizes the residual acoustic hearing in the nonimplanted ear in five cochlear implant users to elucidate the role of temporal fine structure at low frequencies in auditory perception and to test the hypothesis that combined acoustic and electric hearing produces better performance than either mode alone. The first experiment measured speech recognition in the presence of competing noise. It was found that, although the residual low-frequency (<1000 Hz) acoustic hearing produced essentially no recognition for speech recognition in noise, it significantly enhanced performance when combined with the electric hearing. The second experiment measured melody recognition in the same group of subjects and found that, contrary to the speech recognition result, the low-frequency acoustic hearing produced significantly better performance than the electric hearing. It is hypothesized that listeners with combined acoustic and electric hearing might use the correlation between the salient pitch in low-frequency acoustic hearing and the weak pitch in the envelope to enhance segregation between signal and noise. The present study suggests the importance and urgency of accurately encoding the fine-structure cue in cochlear implants.

  12. Acoustically-Induced Electrical Signals

    NASA Astrophysics Data System (ADS)

    Brown, S. R.

    2014-12-01

    We have observed electrical signals excited by and moving along with an acoustic pulse propagating in a sandstone sample. Using resonance we are now studying the characteristics of this acousto-electric signal and determining its origin and the controlling physical parameters. Four rock samples with a range of porosities, permeabilities, and mineralogies were chosen: Berea, Boise, and Colton sandstones and Austin Chalk. Pore water salinity was varied from deionized water to sea water. Ag-AgCl electrodes were attached to the sample and were interfaced to a 4-wire electrical resistivity system. Under computer control, the acoustic signals were excited and the electrical response was recorded. We see strong acoustically-induced electrical signals in all samples, with the magnitude of the effect for each rock getting stronger as we move from the 1st to the 3rd harmonics in resonance. Given a particular fluid salinity, each rock has its own distinct sensitivity in the induced electrical effect. For example at the 2nd harmonic, Berea Sandstone produces the largest electrical signal per acoustic power input even though Austin Chalk and Boise Sandstone tend to resonate with much larger amplitudes at the same harmonic. Two effects are potentially responsible for this acoustically-induced electrical response: one the co-seismic seismo-electric effect and the other a strain-induced resistivity change known as the acousto-electric effect. We have designed experimental tests to separate these mechanisms. The tests show that the seismo-electric effect is dominant in our studies. We note that these experiments are in a fluid viscosity dominated seismo-electric regime, leading to a simple interpretation of the signals where the electric potential developed is proportional to the local acceleration of the rock. Toward a test of this theory we have measured the local time-varying acoustic strain in our samples using a laser vibrometer.

  13. Sensitivity to Structure in the Speech Signal by Children with Speech Sound Disorder and Reading Disability

    PubMed Central

    Johnson, Erin Phinney; Pennington, Bruce F.; Lowenstein, Joanna H.; Nittrouer, Susan

    2011-01-01

    Purpose Children with speech sound disorder (SSD) and reading disability (RD) have poor phonological awareness, a problem believed to arise largely from deficits in processing the sensory information in speech, specifically individual acoustic cues. However, such cues are details of acoustic structure. Recent theories suggest that listeners also need to be able to integrate those details to perceive linguistically relevant form. This study examined abilities of children with SSD, RD, and SSD+RD not only to process acoustic cues but also to recover linguistically relevant form from the speech signal. Method Ten- to 11-year-olds with SSD (n = 17), RD (n = 16), SSD+RD (n = 17), and Controls (n = 16) were tested to examine their sensitivity to (1) voice onset times (VOT); (2) spectral structure in fricative-vowel syllables; and (3) vocoded sentences. Results Children in all groups performed similarly with VOT stimuli, but children with disorders showed delays on other tasks, although the specifics of their performance varied. Conclusion Children with poor phonemic awareness not only lack sensitivity to acoustic details, but are also less able to recover linguistically relevant forms. This is contrary to one of the main current theories of the relation between spoken and written language development. PMID:21329941

  14. Acoustic Localization with Infrasonic Signals

    NASA Astrophysics Data System (ADS)

    Threatt, Arnesha; Elbing, Brian

    2015-11-01

    Numerous geophysical and anthropogenic events emit infrasonic frequencies (<20 Hz), including volcanoes, hurricanes, wind turbines and tornadoes. These sounds, which cannot be heard by the human ear, can be detected from large distances (in excess of 100 miles) due to low frequency acoustic signals having a very low decay rate in the atmosphere. Thus infrasound could be used for long-range, passive monitoring and detection of these events. An array of microphones separated by known distances can be used to locate a given source, which is known as acoustic localization. However, acoustic localization with infrasound is particularly challenging due to contamination from other signals, sensitivity to wind noise and producing a trusted source for system development. The objective of the current work is to create an infrasonic source using a propane torch wand or a subwoofer and locate the source using multiple infrasonic microphones. This presentation will present preliminary results from various microphone configurations used to locate the source.

  15. A speech processing study using an acoustic model of a multiple-channel cochlear implant

    NASA Astrophysics Data System (ADS)

    Xu, Ying

    1998-10-01

    A cochlear implant is an electronic device designed to provide sound information for adults and children who have bilateral profound hearing loss. The task of representing speech signals as electrical stimuli is central to the design and performance of cochlear implants. Studies have shown that the current speech- processing strategies provide significant benefits to cochlear implant users. However, the evaluation and development of speech-processing strategies have been complicated by hardware limitations and large variability in user performance. To alleviate these problems, an acoustic model of a cochlear implant with the SPEAK strategy is implemented in this study, in which a set of acoustic stimuli whose psychophysical characteristics are as close as possible to those produced by a cochlear implant are presented on normal-hearing subjects. To test the effectiveness and feasibility of this acoustic model, a psychophysical experiment was conducted to match the performance of a normal-hearing listener using model- processed signals to that of a cochlear implant user. Good agreement was found between an implanted patient and an age-matched normal-hearing subject in a dynamic signal discrimination experiment, indicating that this acoustic model is a reasonably good approximation of a cochlear implant with the SPEAK strategy. The acoustic model was then used to examine the potential of the SPEAK strategy in terms of its temporal and frequency encoding of speech. It was hypothesized that better temporal and frequency encoding of speech can be accomplished by higher stimulation rates and a larger number of activated channels. Vowel and consonant recognition tests were conducted on normal-hearing subjects using speech tokens processed by the acoustic model, with different combinations of stimulation rate and number of activated channels. The results showed that vowel recognition was best at 600 pps and 8 activated channels, but further increases in stimulation rate and

  16. Empirical mode decomposition for analyzing acoustical signals

    NASA Technical Reports Server (NTRS)

    Huang, Norden E. (Inventor)

    2005-01-01

    The present invention discloses a computer-implemented signal analysis method through the Hilbert-Huang Transformation (HHT) for analyzing acoustical signals, which are assumed to be nonlinear and nonstationary. Empirical Mode Decomposition (EMD) and Hilbert Spectral Analysis (HSA) are used to obtain the HHT. Essentially, the acoustical signal will be decomposed into its Intrinsic Mode Function components (IMFs). Once the invention decomposes the acoustic signal into its constituent components, all operations such as analyzing, identifying, and removing unwanted signals can be performed on these components. Upon transforming the IMFs into the Hilbert spectrum, the acoustical signal may be compared with other acoustical signals.
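
    The decomposition pipeline described in this abstract can be sketched roughly as follows, assuming the third-party PyEMD and SciPy packages rather than the patented implementation; the test signal is synthetic.

    ```python
    import numpy as np
    from PyEMD import EMD                 # third-party EMD implementation (assumption)
    from scipy.signal import hilbert

    # Synthetic two-tone test signal
    fs = 1000.0
    t = np.arange(0, 1.0, 1.0 / fs)
    x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

    # Empirical Mode Decomposition: signal -> intrinsic mode functions (IMFs)
    imfs = EMD()(x)

    # Hilbert spectral analysis: instantaneous amplitude and frequency per IMF
    analytic = hilbert(imfs, axis=-1)
    amplitude = np.abs(analytic)
    phase = np.unwrap(np.angle(analytic), axis=-1)
    inst_freq = np.diff(phase, axis=-1) * fs / (2 * np.pi)

    print(f"{imfs.shape[0]} IMFs extracted")
    print("Peak instantaneous frequency per IMF (Hz):", np.round(inst_freq.max(axis=1)))
    ```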

  17. Methods and apparatus for non-acoustic speech characterization and recognition

    SciTech Connect

    Holzrichter, J.F.

    1999-12-21

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  18. Methods and apparatus for non-acoustic speech characterization and recognition

    DOEpatents

    Holzrichter, John F.

    1999-01-01

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  19. Role of the middle ear muscle apparatus in mechanisms of speech signal discrimination

    NASA Technical Reports Server (NTRS)

    Moroz, B. S.; Bazarov, V. G.; Sachenko, S. V.

    1980-01-01

    A method of impedance reflexometry was used to examine 101 students with hearing impairment in order to clarify the interrelation between speech discrimination and the state of the middle ear muscles. Ability to discriminate speech signals depends to some extent on the functional state of intraaural muscles. Speech discrimination was greatly impaired in the absence of stapedial muscle acoustic reflex, in the presence of low thresholds of stimulation and in very small values of reflex amplitude increase. Discrimination was not impeded in positive AR, high values of relative thresholds and normal increase of reflex amplitude in response to speech signals with augmenting intensity.

  20. The Effects of Noise on Speech Recognition in Cochlear Implant Subjects: Predictions and Analysis Using Acoustic Models

    NASA Astrophysics Data System (ADS)

    Remus, Jeremiah J.; Collins, Leslie M.

    2005-12-01

    Cochlear implants can provide partial restoration of hearing, even with limited spectral resolution and loss of fine temporal structure, to severely deafened individuals. Studies have indicated that background noise has significant deleterious effects on the speech recognition performance of cochlear implant patients. This study investigates the effects of noise on speech recognition using acoustic models of two cochlear implant speech processors and several predictive signal-processing-based analyses. The results of a listening test for vowel and consonant recognition in noise are presented and analyzed using the rate of phonemic feature transmission for each acoustic model. Three methods for predicting patterns of consonant and vowel confusion that are based on signal processing techniques calculating a quantitative difference between speech tokens are developed and tested using the listening test results. Results of the listening test and confusion predictions are discussed in terms of comparisons between acoustic models and confusion prediction performance.

  1. Acoustic Speech Analysis Of Wayang Golek Puppeteer

    NASA Astrophysics Data System (ADS)

    Hakim, Faisal Abdul; Mandasari, Miranti Indar; Sarwono, Joko

    2010-12-01

    Actively disguised speech is one problem to be taken into account in forensic speaker verification or identification processes. The verification processes are usually carried out by comparison between unknown samples and known samples. Active disguising can occur in both samples. To simulate the conditions of speech disguising, voices of a Wayang Golek puppeteer were used. It is assumed that a wayang golek puppeteer is a master of disguise: he can manipulate his voice into many different characters' voices. This paper discusses the speech characteristics of two puppeteers. Comparison was made between each puppeteer's habitual voice and his manipulated voice.

  2. Signal-driven computations in speech processing.

    PubMed

    Peña, Marcela; Bonatti, Luca L; Nespor, Marina; Mehler, Jacques

    2002-10-18

    Learning a language requires both statistical computations to identify words in speech and algebraic-like computations to discover higher level (grammatical) structure. Here we show that these computations can be influenced by subtle cues in the speech signal. After a short familiarization to a continuous speech stream, adult listeners are able to segment it using powerful statistics, but they fail to extract the structural regularities included in the stream even when the familiarization is greatly extended. With the introduction of subliminal segmentation cues, however, these regularities can be rapidly captured.
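
    The statistical computation referred to above is essentially the tracking of transitional probabilities between adjacent syllables in the stream. A toy illustration, with an invented syllable sequence rather than the authors' materials, might look like this:

    ```python
    from collections import Counter

    # Invented continuous syllable stream (placeholder, not the study's stimuli)
    stream = ["pu", "li", "ki", "be", "ra", "ga", "pu", "li", "ki", "ta", "fo", "du",
              "be", "ra", "ga", "pu", "li", "ki"]

    # Count adjacent syllable pairs and occurrences of the first syllable of each pair
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])

    # Transitional probability P(next | current) = count(current, next) / count(current)
    trans_prob = {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

    for (a, b), p in sorted(trans_prob.items()):
        print(f"P({b} | {a}) = {p:.2f}")
    ```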

  3. Music, Speech and Hearing: A Course in Descriptive Acoustics.

    ERIC Educational Resources Information Center

    Strong, William J.; Dudley, J. Duane

    A general undergraduate education course in descriptive acoustics for students in music, speech, and environmental disciplines is described. Student background, course philosophy, and course format are presented. Resource materials, which include books, a study guide, films, audio tapes, demonstrations, walk-in laboratories, and an examination…

  4. Learning to perceptually organize speech signals in native fashion

    PubMed Central

    Nittrouer, Susan; Lowenstein, Joanna H.

    2010-01-01

    The ability to recognize speech involves sensory, perceptual, and cognitive processes. For much of the history of speech perception research, investigators have focused on the first and third of these, asking how much and what kinds of sensory information are used by normal and impaired listeners, as well as how effective amounts of that information are altered by “top-down” cognitive processes. This experiment focused on perceptual processes, asking what accounts for how the sensory information in the speech signal gets organized. Two types of speech signals processed to remove properties that could be considered traditional acoustic cues (amplitude envelopes and sine wave replicas) were presented to 100 listeners in five groups: native English-speaking (L1) adults, 7-, 5-, and 3-year-olds, and native Mandarin-speaking adults who were excellent second-language (L2) users of English. The L2 adults performed more poorly than L1 adults with both kinds of signals. Children performed more poorly than L1 adults but showed disproportionately better performance for the sine waves than for the amplitude envelopes compared to both groups of adults. Sentence context had similar effects across groups, so variability in recognition was attributed to differences in perceptual organization of the sensory information, presumed to arise from native language experience. PMID:20329861

  5. Acoustical conditions for speech communication in active elementary school classrooms

    NASA Astrophysics Data System (ADS)

    Sato, Hiroshi; Bradley, John

    2005-04-01

    Detailed acoustical measurements were made in 34 active elementary school classrooms with typical rectangular room shape in schools near Ottawa, Canada. There was an average of 21 students per classroom. The measurements were made to obtain accurate indications of the acoustical quality of conditions for speech communication during actual teaching activities. Mean speech and noise levels were determined from the distribution of recorded sound levels and the average speech-to-noise ratio was 11 dBA. Measured mid-frequency reverberation times (RT) during the same occupied conditions varied from 0.3 to 0.6 s, and were a little less than for the unoccupied rooms. RT values were not related to noise levels. Octave band speech and noise levels, useful-to-detrimental ratios, and Speech Transmission Index values were also determined. Key results included: (1) The average vocal effort of teachers corresponded to louder than Pearsons Raised voice level; (2) teachers increase their voice level to overcome ambient noise; (3) effective speech levels can be enhanced by up to 5 dB by early reflection energy; and (4) student activity is seen to be the dominant noise source, increasing average noise levels by up to 10 dBA during teaching activities. [Work supported by CLLRnet.]
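
    Reverberation times such as those reported above are commonly estimated from a measured room impulse response by Schroeder backward integration. A minimal sketch, assuming a mono impulse-response recording and the T20 extrapolation, is shown below; this is not the instrumentation used in the study.

    ```python
    import numpy as np
    from scipy.io import wavfile

    # Measured room impulse response (placeholder file; mono assumed)
    fs, h = wavfile.read("room_impulse_response.wav")
    h = h.astype(float)

    # Schroeder backward-integrated energy decay curve, in dB
    edc = np.cumsum(h[::-1] ** 2)[::-1]
    edc_db = 10.0 * np.log10(edc / edc.max())

    # Fit the -5 dB to -25 dB portion of the decay, extrapolate to -60 dB (T20 method)
    idx = np.where((edc_db <= -5.0) & (edc_db >= -25.0))[0]
    slope, intercept = np.polyfit(idx / fs, edc_db[idx], 1)
    rt60 = -60.0 / slope

    print(f"Estimated reverberation time (T20 -> RT60): {rt60:.2f} s")
    ```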

  6. Amplitude Modulations of Acoustic Communication Signals

    NASA Astrophysics Data System (ADS)

    Turesson, Hjalmar K.

    2011-12-01

    In human speech, amplitude modulations at 3 -- 8 Hz are important for discrimination and detection. Two different neurophysiological theories have been proposed to explain this effect. The first theory proposes that, as a consequence of neocortical synaptic dynamics, signals that are amplitude modulated at 3 -- 8 Hz are propagated better than un-modulated signals, or signals modulated above 8 Hz. This suggests that neural activity elicited by vocalizations modulated at 3 -- 8 Hz is optimally transmitted, and the vocalizations better discriminated and detected. The second theory proposes that 3 -- 8 Hz amplitude modulations interact with spontaneous neocortical oscillations. Specifically, vocalizations modulated at 3 -- 8 Hz entrain local populations of neurons, which in turn, modulate the amplitude of high frequency gamma oscillations. This suggests that vocalizations modulated at 3 -- 8 Hz should induce stronger cross-frequency coupling. Similar to human speech, we found that macaque monkey vocalizations also are amplitude modulated between 3 and 8 Hz. Humans and macaque monkeys share similarities in vocal production, implying that the auditory systems subserving perception of acoustic communication signals also share similarities. Based on the similarities between human speech and macaque monkey vocalizations, we addressed how amplitude modulated vocalizations are processed in the auditory cortex of macaque monkeys, and what behavioral relevance modulations may have. Recording single neuron activity, as well as, the activity of local populations of neurons allowed us to test both of the neurophysiological theories presented above. We found that single neuron responses to vocalizations amplitude modulated at 3 -- 8 Hz resulted in better stimulus discrimination than vocalizations lacking 3 -- 8 Hz modulations, and that the effect most likely was mediated by synaptic dynamics. In contrast, we failed to find support for the oscillation-based model proposing a
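
    One simple way to quantify the 3 -- 8 Hz amplitude modulation discussed above is to extract the Hilbert envelope of a recording and examine its modulation spectrum. The sketch below assumes a mono WAV file and illustrative filter settings, not the authors' analysis pipeline.

    ```python
    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import hilbert, butter, sosfiltfilt

    # Vocalization recording (placeholder file; mono assumed)
    fs, x = wavfile.read("vocalization.wav")
    x = x.astype(float)

    # Amplitude envelope via the Hilbert transform, smoothed below 30 Hz
    env = np.abs(hilbert(x))
    sos = butter(4, 30.0, btype="low", fs=fs, output="sos")
    env = sosfiltfilt(sos, env)

    # Modulation spectrum of the mean-removed envelope
    spec = np.abs(np.fft.rfft(env - env.mean()))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)

    band = (freqs >= 3.0) & (freqs <= 8.0)
    print("Fraction of modulation energy at 3-8 Hz:", spec[band].sum() / spec.sum())
    ```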

  7. Effects of human fatigue on speech signals

    NASA Astrophysics Data System (ADS)

    Stamoulis, Catherine

    2001-05-01

    Cognitive performance may be significantly affected by fatigue. In the case of critical personnel, such as pilots, monitoring human fatigue is essential to ensure safety and success of a given operation. One of the modalities that may be used for this purpose is speech, which is sensitive to respiratory changes and increased muscle tension of vocal cords, induced by fatigue. Age, gender, vocal tract length, physical and emotional state may significantly alter speech intensity, duration, rhythm, and spectral characteristics. In addition to changes in speech rhythm, fatigue may also affect the quality of speech, such as articulation. In a noisy environment, detecting fatigue-related changes in speech signals, particularly subtle changes at the onset of fatigue, may be difficult. Therefore, in a performance-monitoring system, speech parameters which are significantly affected by fatigue need to be identified and extracted from input signals. For this purpose, a series of experiments was performed under slowly varying cognitive load conditions and at different times of the day. The results of the data analysis are presented here.

  8. Speech recognition: Acoustic-phonetic knowledge acquisition and representation

    NASA Astrophysics Data System (ADS)

    Zue, Victor W.

    1988-09-01

    The long-term research goal is to develop and implement speaker-independent continuous speech recognition systems. It is believed that the proper utilization of speech-specific knowledge is essential for such advanced systems. This research is thus directed toward the acquisition, quantification, and representation of acoustic-phonetic and lexical knowledge, and the application of this knowledge to speech recognition algorithms. In addition, we are exploring new speech recognition alternatives based on artificial intelligence and connectionist techniques. We developed a statistical model for predicting the acoustic realization of stop consonants in various positions in the syllable template. A unification-based grammatical formalism was developed for incorporating this model into the lexical access algorithm. We provided an information-theoretic justification for the hierarchical structure of the syllable template. We analyzed segmented duration for vowels and fricatives in continuous speech. Based on contextual information, we developed durational models for vowels and fricatives that account for over 70 percent of the variance, using data from multiple, unknown speakers. We rigorously evaluated the ability of human spectrogram readers to identify stop consonants spoken by many talkers and in a variety of phonetic contexts. Incorporating the declarative knowledge used by the readers, we developed a knowledge-based system for stop identification. We achieved system performance comparable to that of the readers.

  9. Acoustic Characteristics of Ataxic Speech in Japanese Patients with Spinocerebellar Degeneration (SCD)

    ERIC Educational Resources Information Center

    Ikui, Yukiko; Tsukuda, Mamoru; Kuroiwa, Yoshiyuki; Koyano, Shigeru; Hirose, Hajime; Taguchi, Takahide

    2012-01-01

    Background: In English- and German-speaking countries, ataxic speech is often described as showing scanning based on acoustic impressions. Although the term "scanning" is generally considered to represent abnormal speech features including prosodic excess or insufficiency, any precise acoustic analysis of ataxic speech has not been performed in…

  10. Massively-parallel architectures for automatic recognition of visual speech signals. Annual report

    SciTech Connect

    Sejnowski, T.J.

    1988-10-12

    Significant progress was made in the primary objective of estimating the acoustic characteristics of speech from the visual speech signals. Neural networks were trained on a database of vowels. The raw images of faces, aligned and preprocessed, were used as input to these networks, which were trained to estimate the corresponding envelope of the acoustic spectrum. The performance of the networks was better than that of trained humans and comparable with that of optimized pattern classifiers. The approach avoids the problems of information loss through early categorization. The acoustic information the network extracts from the visual signal can be used to supplement the acoustic signal in noisy environments, such as cockpits. During the next year these results will be extended to diphthongs using recurrent neural networks and temporal sequences of input images.

  11. Fundamental frequency is critical to speech perception in noise in combined acoustic and electric hearing

    PubMed Central

    Carroll, Jeff; Tiaden, Stephanie; Zeng, Fan-Gang

    2011-01-01

    Cochlear implant (CI) users have been shown to benefit from residual low-frequency hearing, specifically in pitch related tasks. It remains unclear whether this benefit is dependent on fundamental frequency (F0) or other acoustic cues. Three experiments were conducted to determine the role of F0, as well as its frequency modulated (FM) and amplitude modulated (AM) components, in speech recognition with a competing voice. In simulated CI listeners, the signal-to-noise ratio was varied to estimate the 50% correct response. Simulation results showed that the F0 cue contributes to a significant proportion of the benefit seen with combined acoustic and electric hearing, and additionally that this benefit is due to the FM rather than the AM component. In actual CI users, sentence recognition scores were collected with either the full F0 cue containing both the FM and AM components or the 500-Hz low-pass speech cue containing the F0 and additional harmonics. The F0 cue provided a benefit similar to the low-pass cue for speech in noise, but not in quiet. Poorer CI users benefited more from the F0 cue than better users. These findings suggest that F0 is critical to improving speech perception in noise in combined acoustic and electric hearing. PMID:21973360
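
    A rough sketch of separating the F0 cue into its FM component (the F0 track itself) and its AM component (the slow amplitude envelope) is given below. It assumes librosa's YIN tracker and a placeholder file name, and is not the processing used in the study.

    ```python
    import numpy as np
    import librosa
    from scipy.signal import hilbert

    # Target sentence (placeholder file)
    y, sr = librosa.load("target_sentence.wav", sr=16000)

    # FM component of the F0 cue: frame-by-frame fundamental frequency (YIN)
    f0 = librosa.yin(y, fmin=75, fmax=400, sr=sr)

    # AM component: slowly varying amplitude envelope
    env = np.abs(hilbert(y))

    print("Median F0: %.1f Hz" % np.median(f0))
    print("Envelope modulation depth (std/mean): %.2f" % (env.std() / env.mean()))
    ```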

  12. Spatial acoustic signal processing for immersive communication

    NASA Astrophysics Data System (ADS)

    Atkins, Joshua

    Computing is rapidly becoming ubiquitous as users expect devices that can augment and interact naturally with the world around them. In these systems it is necessary to have an acoustic front-end that is able to capture and reproduce natural human communication. Whether the end point is a speech recognizer or another human listener, the reduction of noise, reverberation, and acoustic echoes presents necessary and complex challenges. The focus of this dissertation is to provide a general method for approaching these problems using spherical microphone and loudspeaker arrays. In this work, a theory of capturing and reproducing three-dimensional acoustic fields is introduced from a signal processing perspective. In particular, the decomposition of the spatial part of the acoustic field into an orthogonal basis of spherical harmonics provides not only a general framework for analysis, but also many processing advantages. The spatial sampling error limits the upper frequency range with which a sound field can be accurately captured or reproduced. In broadband arrays, the cost and complexity of using multiple transducers is an issue. This work provides a flexible optimization method for determining the location of array elements to minimize the spatial aliasing error. The low frequency array processing ability is also limited by the SNR, mismatch, and placement error of transducers. To address this, a robust processing method is introduced and used to design a reproduction system for rendering over arbitrary loudspeaker arrays or binaurally over headphones. In addition to the beamforming problem, the multichannel acoustic echo cancellation (MCAEC) issue is also addressed. A MCAEC must adaptively estimate and track the constantly changing loudspeaker-room-microphone response to remove the sound field presented over the loudspeakers from that captured by the microphones. In the multichannel case, the system is overdetermined and many adaptive schemes fail to converge to
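
    As a small illustration of the spherical-harmonic decomposition that underlies this kind of array processing, the sketch below builds a spherical-harmonic matrix for a set of microphone directions and projects the microphone signals onto it by least squares. The positions, harmonic order, and pressure values are invented for illustration.

    ```python
    import numpy as np
    from scipy.special import sph_harm

    rng = np.random.default_rng(0)
    order = 2                                    # maximum spherical-harmonic order
    n_mics = 16

    # Random microphone directions on the unit sphere
    azimuth = rng.uniform(0.0, 2.0 * np.pi, n_mics)          # azimuth in [0, 2*pi)
    colatitude = np.arccos(rng.uniform(-1.0, 1.0, n_mics))   # polar angle in [0, pi]

    # Spherical-harmonic matrix Y: one row per microphone, one column per (n, m)
    Y = np.column_stack([
        sph_harm(m, n, azimuth, colatitude)
        for n in range(order + 1)
        for m in range(-n, n + 1)
    ])

    # Encoding: least-squares projection of microphone pressures (one frequency bin,
    # placeholder values) onto the spherical-harmonic basis
    pressures = rng.normal(size=n_mics)
    coeffs, *_ = np.linalg.lstsq(Y, pressures, rcond=None)
    print("Spherical-harmonic coefficients:", np.round(coeffs, 3))
    ```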

  13. Predicting the intelligibility of deaf children's speech from acoustic measures

    NASA Astrophysics Data System (ADS)

    Uchanski, Rosalie M.; Geers, Ann E.; Brenner, Christine M.; Tobey, Emily A.

    2001-05-01

    A weighted combination of speech-acoustic measures may provide an objective assessment of speech intelligibility in deaf children that could be used to evaluate the benefits of sensory aids and rehabilitation programs. This investigation compared the accuracy of two different approaches, multiple linear regression and a simple neural net. These two methods were applied to identical sets of acoustic measures, including both segmental (e.g., voice-onset times of plosives, spectral moments of fricatives, second formant frequencies of vowels) and suprasegmental measures (e.g., sentence duration, number and frequency of intersentence pauses). These independent variables were obtained from digitized recordings of deaf children's imitations of 11 simple sentences. The dependent measure was the percentage of spoken words from the 36 McGarr Sentences understood by groups of naive listeners. The two predictive methods were trained on speech measures obtained from 123 out of 164 8- and 9-year-old deaf children who used cochlear implants. Then, predictions were obtained using speech measures from the remaining 41 children. Preliminary results indicate that multiple linear regression is a better predictor of intelligibility than the neural net, accounting for 79% as opposed to 65% of the variance in the data. [Work supported by NIH.]
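
    The comparison described above, multiple linear regression versus a small neural network mapping acoustic measures to an intelligibility score, can be sketched with scikit-learn as follows. The data here are random placeholders, not the children's speech measures, and the model settings are illustrative.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(164, 6))        # e.g., VOTs, spectral moments, F2, durations (placeholders)
    y = X @ rng.normal(size=6) + rng.normal(scale=0.5, size=164)   # intelligibility (arbitrary units)

    # Train on 123 children, hold out 41, mirroring the split described above
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=41, random_state=0)

    linreg = LinearRegression().fit(X_tr, y_tr)
    mlp = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0).fit(X_tr, y_tr)

    print("Linear regression R^2:", linreg.score(X_te, y_te))
    print("Neural net R^2:       ", mlp.score(X_te, y_te))
    ```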

  14. Tongue-Palate Contact Pressure, Oral Air Pressure, and Acoustics of Clear Speech

    ERIC Educational Resources Information Center

    Searl, Jeff; Evitts, Paul M.

    2013-01-01

    Purpose: The authors compared articulatory contact pressure (ACP), oral air pressure (Po), and speech acoustics for conversational versus clear speech. They also assessed the relationship of these measures to listener perception. Method: Twelve adults with normal speech produced monosyllables in a phrase using conversational and clear speech.…

  15. Language identification from visual-only speech signals

    PubMed Central

    Ronquest, Rebecca E.; Levi, Susannah V.; Pisoni, David B.

    2010-01-01

    Our goal in the present study was to examine how observers identify English and Spanish from visual-only displays of speech. First, we replicated the recent findings of Soto-Faraco et al. (2007) with Spanish and English bilingual and monolingual observers using different languages and a different experimental paradigm (identification). We found that prior linguistic experience affected response bias but not sensitivity (Experiment 1). In two additional experiments, we investigated the visual cues that observers use to complete the language-identification task. The results of Experiment 2 indicate that some lexical information is available in the visual signal but that it is limited. Acoustic analyses confirmed that our Spanish and English stimuli differed acoustically with respect to linguistic rhythmic categories. In Experiment 3, we tested whether this rhythmic difference could be used by observers to identify the language when the visual stimuli are temporally reversed, thereby eliminating lexical information but retaining rhythmic differences. The participants performed above chance even in the backward condition, suggesting that the rhythmic differences between the two languages may aid language identification in visual-only speech signals. The results of Experiments 3A and 3B also confirm previous findings that increased stimulus length facilitates language identification. Taken together, the results of these three experiments replicate earlier findings and also show that prior linguistic experience, lexical information, rhythmic structure, and utterance length influence visual-only language identification. PMID:20675804

  16. Language identification from visual-only speech signals.

    PubMed

    Ronquest, Rebecca E; Levi, Susannah V; Pisoni, David B

    2010-08-01

    Our goal in the present study was to examine how observers identify English and Spanish from visual-only displays of speech. First, we replicated the recent findings of Soto-Faraco et al. (2007) with Spanish and English bilingual and monolingual observers using different languages and a different experimental paradigm (identification). We found that prior linguistic experience affected response bias but not sensitivity (Experiment 1). In two additional experiments, we investigated the visual cues that observers use to complete the language-identification task. The results of Experiment 2 indicate that some lexical information is available in the visual signal but that it is limited. Acoustic analyses confirmed that our Spanish and English stimuli differed acoustically with respect to linguistic rhythmic categories. In Experiment 3, we tested whether this rhythmic difference could be used by observers to identify the language when the visual stimuli are temporally reversed, thereby eliminating lexical information but retaining rhythmic differences. The participants performed above chance even in the backward condition, suggesting that the rhythmic differences between the two languages may aid language identification in visual-only speech signals. The results of Experiments 3A and 3B also confirm previous findings that increased stimulus length facilitates language identification. Taken together, the results of these three experiments replicate earlier findings and also show that prior linguistic experience, lexical information, rhythmic structure, and utterance length influence visual-only language identification.

  17. Processing speech signal using auditory-like filterbank provides least uncertainty about articulatory gestures

    PubMed Central

    Ghosh, Prasanta Kumar; Goldstein, Louis M.; Narayanan, Shrikanth S.

    2011-01-01

    Understanding how the human speech production system is related to the human auditory system has been a perennial subject of inquiry. To investigate the production–perception link, in this paper, a computational analysis has been performed using the articulatory movement data obtained during speech production with concurrently recorded acoustic speech signals from multiple subjects in three different languages: English, Cantonese, and Georgian. The form of articulatory gestures during speech production varies across languages, and this variation is considered to be reflected in the articulatory position and kinematics. The auditory processing of the acoustic speech signal is modeled by a parametric representation of the cochlear filterbank which allows for realizing various candidate filterbank structures by changing the parameter value. Using mathematical communication theory, it is found that the uncertainty about the articulatory gestures in each language is maximally reduced when the acoustic speech signal is represented using the output of a filterbank similar to the empirically established cochlear filterbank in the human auditory system. Possible interpretations of this finding are discussed. PMID:21682422
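
    A gammatone filterbank is one common stand-in for an auditory-like (cochlear) filterbank of the kind discussed above. The sketch below, which is not the parametric filterbank used in the paper, passes a recording through a few gammatone channels using SciPy; the file name and centre frequencies are illustrative.

    ```python
    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import gammatone, lfilter

    # Speech recording (placeholder file; mono assumed, sample rate >= 8 kHz)
    fs, x = wavfile.read("speech.wav")
    x = x.astype(float)

    # A handful of roughly logarithmically spaced centre frequencies (Hz)
    centre_freqs = [100, 200, 400, 800, 1600, 3200]

    channels = []
    for fc in centre_freqs:
        b, a = gammatone(fc, "iir", fs=fs)    # 4th-order gammatone, IIR approximation
        channels.append(lfilter(b, a, x))

    channels = np.vstack(channels)            # one band-passed signal per channel
    print("Filterbank output shape:", channels.shape)
    ```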

  18. Adding articulatory features to acoustic features for automatic speech recognition

    SciTech Connect

    Zlokarnik, I.

    1995-05-01

    A hidden-Markov-model (HMM) based speech recognition system was evaluated that makes use of simultaneously recorded acoustic and articulatory data. The articulatory measurements were gathered by means of electromagnetic articulography and describe the movement of small coils fixed to the speakers' tongue and jaw during the production of German V1CV2 sequences [P. Hoole and S. Gfoerer, J. Acoust. Soc. Am. Suppl. 1 87, S123 (1990)]. Using the coordinates of the coil positions as an articulatory representation, acoustic and articulatory features were combined to make up an acoustic-articulatory feature vector. The discriminant power of this combined representation was evaluated for two subjects on a speaker-dependent isolated word recognition task. When the articulatory measurements were used both for training and testing the HMMs, the articulatory representation was capable of reducing the error rate of comparable acoustic-based HMMs by a relative percentage of more than 60%. In a separate experiment, the articulatory movements during the testing phase were estimated using a multilayer perceptron that performed an acoustic-to-articulatory mapping. Under these more realistic conditions, when articulatory measurements are only available during the training, the error rate could be reduced by a relative percentage of 18% to 25%.
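
    A toy version of the combined representation described above, concatenating per-frame acoustic and articulatory vectors and training a Gaussian HMM on the joint features, might look like the following. The data are random stand-ins and hmmlearn is an assumed dependency, not the original recognizer.

    ```python
    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    rng = np.random.default_rng(1)
    n_frames = 200
    acoustic = rng.normal(size=(n_frames, 13))       # e.g., 13 MFCCs per frame (placeholder)
    articulatory = rng.normal(size=(n_frames, 6))    # e.g., 3 coils x (x, y) positions (placeholder)

    # Joint acoustic-articulatory feature vector, frame by frame
    features = np.hstack([acoustic, articulatory])

    # One word model: a 5-state Gaussian HMM trained on the joint features
    model = GaussianHMM(n_components=5, covariance_type="diag", n_iter=20)
    model.fit(features)
    print("Log-likelihood of the training sequence:", model.score(features))
    ```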

  19. A 94-GHz Millimeter-Wave Sensor for Speech Signal Acquisition

    PubMed Central

    Li, Sheng; Tian, Ying; Lu, Guohua; Zhang, Yang; Lv, Hao; Yu, Xiao; Xue, Huijun; Zhang, Hua; Wang, Jianqi; Jing, Xijing

    2013-01-01

    High frequency millimeter-wave (MMW) radar-like sensors enable the detection of speech signals. This novel non-acoustic speech detection method has some special advantages not offered by traditional microphones, such as preventing strong-acoustic interference, high directional sensitivity with penetration, and long detection distance. A 94-GHz MMW radar sensor was employed in this study to test its speech acquisition ability. A 34-GHz zero intermediate frequency radar, a 34-GHz superheterodyne radar, and a microphone were also used for comparison purposes. A short-time phase-spectrum-compensation algorithm was used to enhance the detected speech. The results reveal that the 94-GHz radar sensor showed the highest sensitivity and obtained the highest speech quality subjective measurement score. This result suggests that the MMW radar sensor has better performance than a traditional microphone in terms of speech detection for detection distances longer than 1 m. As a substitute for the traditional speech acquisition method, this novel speech acquisition method demonstrates a large potential for many speech related applications. PMID:24284764

  20. A 94-GHz millimeter-wave sensor for speech signal acquisition.

    PubMed

    Li, Sheng; Tian, Ying; Lu, Guohua; Zhang, Yang; Lv, Hao; Yu, Xiao; Xue, Huijun; Zhang, Hua; Wang, Jianqi; Jing, Xijing

    2013-01-01

    High frequency millimeter-wave (MMW) radar-like sensors enable the detection of speech signals. This novel non-acoustic speech detection method has some special advantages not offered by traditional microphones, such as preventing strong-acoustic interference, high directional sensitivity with penetration, and long detection distance. A 94-GHz MMW radar sensor was employed in this study to test its speech acquisition ability. A 34-GHz zero intermediate frequency radar, a 34-GHz superheterodyne radar, and a microphone were also used for comparison purposes. A short-time phase-spectrum-compensation algorithm was used to enhance the detected speech. The results reveal that the 94-GHz radar sensor showed the highest sensitivity and obtained the highest speech quality subjective measurement score. This result suggests that the MMW radar sensor has better performance than a traditional microphone in terms of speech detection for detection distances longer than 1 m. As a substitute for the traditional speech acquisition method, this novel speech acquisition method demonstrates a large potential for many speech related applications.

  1. Speech Recognition and Acoustic Features in Combined Electric and Acoustic Stimulation

    ERIC Educational Resources Information Center

    Yoon, Yang-soo; Li, Yongxin; Fu, Qian-Jie

    2012-01-01

    Purpose: In this study, the authors aimed to identify speech information processed by a hearing aid (HA) that is additive to information processed by a cochlear implant (CI) as a function of signal-to-noise ratio (SNR). Method: Speech recognition was measured with CI alone, HA alone, and CI + HA. Ten participants were separated into 2 groups; good…

  2. Improved perception of speech in noise and Mandarin tones with acoustic simulations of harmonic coding for cochlear implants.

    PubMed

    Li, Xing; Nie, Kaibao; Imennov, Nikita S; Won, Jong Ho; Drennan, Ward R; Rubinstein, Jay T; Atlas, Les E

    2012-11-01

    Harmonic and temporal fine structure (TFS) information are important cues for speech perception in noise and music perception. However, due to the inherently coarse spectral and temporal resolution in electric hearing, the question of how to deliver harmonic and TFS information to cochlear implant (CI) users remains unresolved. A harmonic-single-sideband-encoder [(HSSE); Nie et al. (2008). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing; Lie et al., (2010). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing] strategy has been proposed that explicitly tracks the harmonics in speech and transforms them into modulators conveying both amplitude modulation and fundamental frequency information. For unvoiced speech, HSSE transforms the TFS into a slowly varying yet still noise-like signal. To investigate its potential, four- and eight-channel vocoder simulations of HSSE and the continuous-interleaved-sampling (CIS) strategy were implemented, respectively. Using these vocoders, five normal-hearing subjects' speech recognition performance was evaluated under different masking conditions; another five normal-hearing subjects' Mandarin tone identification performance was also evaluated. Additionally, the neural discharge patterns evoked by HSSE- and CIS-encoded Mandarin tone stimuli were simulated using an auditory nerve model. All subjects scored significantly higher with HSSE than with CIS vocoders. The modeling analysis demonstrated that HSSE can convey temporal pitch cues better than CIS. Overall, the results suggest that HSSE is a promising strategy to enhance speech perception with CIs. PMID:23145619
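
    For readers unfamiliar with vocoder simulations of cochlear implant processing, the sketch below implements a basic noise-excited channel vocoder in the spirit of the CIS simulations mentioned above. It is not the authors' HSSE or CIS implementation; the band edges, envelope cutoff, and file names are illustrative choices and a mono recording is assumed.

    ```python
    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import butter, sosfiltfilt, hilbert

    fs, x = wavfile.read("sentence.wav")          # hypothetical mono recording
    x = x.astype(float)
    rng = np.random.default_rng(0)

    edges = [100, 400, 800, 1600, 4000]           # 4 analysis bands (Hz)
    out = np.zeros_like(x)

    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, x)

        env = np.abs(hilbert(band))               # band envelope
        env_sos = butter(2, 50, btype="low", fs=fs, output="sos")
        env = sosfiltfilt(env_sos, env)           # smooth envelope below 50 Hz

        carrier = sosfiltfilt(band_sos, rng.normal(size=len(x)))   # band-limited noise carrier
        out += env * carrier

    out *= np.sqrt(np.sum(x ** 2) / np.sum(out ** 2))   # roughly match overall energy
    wavfile.write("vocoded.wav", fs, out.astype(np.int16))
    ```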

  3. Massively parallel network architectures for automatic recognition of visual speech signals. Final technical report

    SciTech Connect

    Sejnowski, T.J.; Goldstein, M.

    1990-01-01

    This research sought to produce a massively-parallel network architecture that could interpret speech signals from video recordings of human talkers. This report summarizes the project's results: (1) A corpus of video recordings from two human speakers was analyzed with image processing techniques and used as the data for this study; (2) We demonstrated that a feed forward network could be trained to categorize vowels from these talkers. The performance was comparable to that of the nearest neighbors techniques and to trained humans on the same data; (3) We developed a novel approach to sensory fusion by training a network to transform from facial images to short-time spectral amplitude envelopes. This information can be used to increase the signal-to-noise ratio and hence the performance of acoustic speech recognition systems in noisy environments; (4) We explored the use of recurrent networks to perform the same mapping for continuous speech. Results of this project demonstrate the feasibility of adding a visual speech recognition component to enhance existing speech recognition systems. Such a combined system could be used in noisy environments, such as cockpits, where improved communication is needed. This demonstration of presymbolic fusion of visual and acoustic speech signals is consistent with our current understanding of human speech perception.

  4. Simulation of CIS speech signal processing strategy based on electrical stimulating model of cochlear implant

    NASA Astrophysics Data System (ADS)

    Qian, Zheng; Yu, Dan

    2006-11-01

    During the operation of a cochlear implant, the speech signal processing strategy converts the original speech signal into weak current signals, which are then transmitted to the implanted electrode to stimulate the residual auditory nerve and restore the patient's hearing. The speech processing strategy is thus the key component determining the performance of a cochlear implant, yet methods for evaluating its validity are still lacking. In this paper, an electrical stimulation model of the cochlear implant is established first, and an acoustic simulation of the Continuous Interleaved Sampling (CIS) strategy is then carried out on this model. The synthesized signal simulates the speech that would be heard by a deaf listener using a cochlear implant. The identification ability of the CIS strategy can therefore be estimated by presenting this synthesized signal to normal-hearing listeners. Furthermore, detailed analyses of every step of the acoustic simulation are considered in order to improve the performance and parameter selection of the CIS strategy. This work should help cochlear implant users enhance their perception and understanding of speech.

  5. Automatic Speech Recognition from Neural Signals: A Focused Review

    PubMed Central

    Herff, Christian; Schultz, Tanja

    2016-01-01

    Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to either loud environments, bothering bystanders or incapabilities to produce speech (i.e., patients suffering from locked-in syndrome). For these reasons it would be highly desirable to not speak but to simply envision oneself to say words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques used from neural signals, we discuss the Brain-to-text system. PMID:27729844

  6. An acoustic comparison of two women's infant- and adult-directed speech

    NASA Astrophysics Data System (ADS)

    Andruski, Jean; Katz-Gershon, Shiri

    2003-04-01

    In addition to having prosodic characteristics that are attractive to infant listeners, infant-directed (ID) speech shares certain characteristics of adult-directed (AD) clear speech, such as increased acoustic distance between vowels, that might be expected to make ID speech easier for adults to perceive in noise than AD conversational speech. However, perceptual tests of two women's ID productions by Andruski and Bessega [J. Acoust. Soc. Am. 112, 2355] showed that this is not always the case. In a word identification task that compared ID speech with AD clear and conversational speech, one speaker's ID productions were less well-identified than AD clear speech, but better identified than AD conversational speech. For the second woman, ID speech was the least accurately identified of the three speech registers. For both speakers, hard words (infrequent words with many lexical neighbors) were also at an increased disadvantage relative to easy words (frequent words with few lexical neighbors) in speech registers that were less accurately perceived. This study will compare several acoustic properties of these women's productions, including pitch and formant-frequency characteristics. Results of the acoustic analyses will be examined with the original perceptual results to suggest reasons for differences in listeners' accuracy in identifying these two women's ID speech in noise.
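
    Acoustic comparisons like the one described above typically start from pitch and formant measurements. A minimal sketch using the praat-parselmouth package (an assumption, not the authors' toolchain) is shown below; the file name is a placeholder.

    ```python
    import numpy as np
    import parselmouth                         # praat-parselmouth (assumed dependency)

    snd = parselmouth.Sound("utterance.wav")   # placeholder recording

    # Pitch (F0) statistics over voiced frames
    pitch = snd.to_pitch()
    f0 = pitch.selected_array["frequency"]
    f0 = f0[f0 > 0]                            # keep voiced frames only
    print("Mean F0: %.1f Hz, F0 range: %.1f Hz" % (f0.mean(), f0.max() - f0.min()))

    # First and second formant frequencies sampled along the utterance
    formants = snd.to_formant_burg()
    times = np.linspace(snd.xmin, snd.xmax, 50)
    f1 = [formants.get_value_at_time(1, t) for t in times]
    f2 = [formants.get_value_at_time(2, t) for t in times]
    print("Median F1/F2: %.0f / %.0f Hz" % (np.nanmedian(f1), np.nanmedian(f2)))
    ```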

  7. Acoustic Analysis of the Voiced-Voiceless Distinction in Dutch Tracheoesophageal Speech

    ERIC Educational Resources Information Center

    Jongmans, Petra; Wempe, Ton G.; van Tinteren, Harm; Hilgers, Frans J. M.; Pols, Louis C. W.; van As-Brooks, Corina J.

    2010-01-01

    Purpose: Confusions between voiced and voiceless plosives and voiced and voiceless fricatives are common in Dutch tracheoesophageal (TE) speech. This study investigates (a) which acoustic measures are found to convey a correct voicing contrast in TE speech and (b) whether different measures are found in TE speech than in normal laryngeal (NL)…

  8. Perceptual and Acoustic Reliability Estimates for the Speech Disorders Classification System (SDCS)

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.

    2010-01-01

    A companion paper describes three extensions to a classification system for paediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). The SDCS uses perceptual and acoustic data reduction methods to obtain information on a speaker's speech, prosody, and voice. The present paper provides reliability estimates for…

  9. Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech.

    PubMed

    Leong, Victoria; Goswami, Usha

    2015-01-01

    When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72-82% (freely-read CDS) and 90-98% (rhythmically-regular CDS) stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across
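
    The sketch below illustrates the general idea of extracting amplitude-modulation components at roughly the stress (~2 Hz), syllable (~5 Hz), and phoneme (~20 Hz) timescales from a speech envelope. The band edges, filter orders, and envelope rate are rough assumptions for illustration and do not reproduce the published S-AMPH filterbank design.

```python
# Illustrative extraction of stress-, syllable-, and phoneme-rate
# amplitude modulations from a speech envelope (not the S-AMPH design).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, resample_poly

def am_hierarchy(speech, fs, env_fs=200):
    envelope = np.abs(hilbert(speech))                 # wideband amplitude envelope
    envelope = resample_poly(envelope, env_fs, fs)     # downsample envelope for low-rate filtering
    bands = {"stress": (0.9, 2.5), "syllable": (2.5, 12.0), "phoneme": (12.0, 40.0)}
    return {name: sosfiltfilt(butter(2, [lo / (env_fs / 2), hi / (env_fs / 2)],
                                     btype="band", output="sos"), envelope)
            for name, (lo, hi) in bands.items()}

# Toy amplitude-modulated noise standing in for child-directed speech.
fs = 16000
t = np.arange(2 * fs) / fs
toy = (1 + 0.6 * np.sin(2 * np.pi * 2 * t)) * (1 + 0.4 * np.sin(2 * np.pi * 5 * t)) * np.random.randn(len(t))
ams = am_hierarchy(toy, fs)
```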

  10. Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech

    PubMed Central

    Leong, Victoria; Goswami, Usha

    2015-01-01

    When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72–82% (freely-read CDS) and 90–98% (rhythmically-regular CDS) stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across

  11. Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech.

    PubMed

    Leong, Victoria; Goswami, Usha

    2015-01-01

    When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72-82% (freely-read CDS) and 90-98% (rhythmically-regular CDS) stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across

  12. Method and apparatus for obtaining complete speech signals for speech recognition applications

    NASA Technical Reports Server (NTRS)

    Abrash, Victor (Inventor); Cesari, Federico (Inventor); Franco, Horacio (Inventor); George, Christopher (Inventor); Zheng, Jing (Inventor)

    2009-01-01

    The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
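
    A minimal sketch of the circular-buffer idea described above follows: audio frames are recorded continuously into a fixed-length ring buffer so that frames captured before a user command can be prepended to the segment sent to the recognizer. The frame representation and buffer length are illustrative assumptions, not the patented implementation.

```python
# Ring buffer of audio frames for pre-command audio capture (sketch).
from collections import deque

class FrameRingBuffer:
    def __init__(self, max_frames=100):
        self.frames = deque(maxlen=max_frames)   # oldest frames are discarded automatically

    def push(self, frame):
        self.frames.append(frame)

    def snapshot(self, n_pre_frames):
        """Return the last n_pre_frames frames (audio preceding the command)."""
        return list(self.frames)[-n_pre_frames:]

# Usage: continuously push frames; on a "start recognition" command,
# prepend the buffered pre-command audio to the live stream.
buf = FrameRingBuffer(max_frames=100)
for i in range(250):
    buf.push(f"frame-{i}")            # stand-in for a block of audio samples
augmented_start = buf.snapshot(n_pre_frames=30)
```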

  13. Attentional modulation of informational masking on early cortical representations of speech signals.

    PubMed

    Zhang, Changxin; Arnott, Stephen R; Rabaglia, Cristina; Avivi-Reich, Meital; Qi, James; Wu, Xihong; Li, Liang; Schneider, Bruce A

    2016-01-01

    To recognize speech in a noisy auditory scene, listeners need to perceptually segregate the target talker's voice from other competing sounds (stream segregation). A number of studies have suggested that the attentional demands placed on listeners increase as the acoustic properties and informational content of the competing sounds become more similar to those of the target voice. Hence we would expect attentional demands to be considerably greater when speech is masked by speech than when it is masked by steady-state noise. To investigate the role of attentional mechanisms in the unmasking of speech sounds, event-related potentials (ERPs) were recorded to a syllable masked by noise or competing speech under both active (the participant was asked to respond when the syllable was presented) and passive (no response was required) listening conditions. The results showed that the long-latency auditory response to a syllable (/bi/), presented at different signal-to-masker ratios (SMRs), was similar in both passive and active listening conditions when the masker was a steady-state noise. In contrast, a switch from the passive listening condition to the active one, when the masker was two-talker speech, significantly enhanced the ERPs to the syllable. These results support the hypothesis that the need to engage attentional mechanisms in aid of scene analysis increases as the similarity (both acoustic and informational) between the target speech and the competing background sounds increases.

  14. Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing.

    PubMed

    Doelling, Keith B; Arnal, Luc H; Ghitza, Oded; Poeppel, David

    2014-01-15

    A growing body of research suggests that intrinsic neuronal slow (<10 Hz) oscillations in auditory cortex appear to track incoming speech and other spectro-temporally complex auditory signals. Within this framework, several recent studies have identified critical-band temporal envelopes as the specific acoustic feature being reflected by the phase of these oscillations. However, how this alignment between speech acoustics and neural oscillations might underpin intelligibility is unclear. Here we test the hypothesis that the 'sharpness' of temporal fluctuations in the critical band envelope acts as a temporal cue to speech syllabic rate, driving delta-theta rhythms to track the stimulus and facilitate intelligibility. We interpret our findings as evidence that sharp events in the stimulus cause cortical rhythms to re-align and parse the stimulus into syllable-sized chunks for further decoding. Using magnetoencephalographic recordings, we show that by removing temporal fluctuations that occur at the syllabic rate, envelope-tracking activity is reduced. By artificially reinstating these temporal fluctuations, envelope-tracking activity is regained. These changes in tracking correlate with intelligibility of the stimulus. Together, the results suggest that the sharpness of fluctuations in the stimulus, as reflected in the cochlear output, drives oscillatory activity to track and entrain to the stimulus at its syllabic rate. This process likely facilitates parsing of the stimulus into meaningful chunks appropriate for subsequent decoding, enhancing perception and intelligibility.
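
    The sketch below computes one simple proxy for envelope "sharpness": the half-wave-rectified derivative of a smoothed amplitude envelope, which peaks at abrupt acoustic edges such as syllable onsets. This is only an illustration inspired by the description above, not the authors' exact measure; the smoothing cutoff and the toy stimulus are assumptions.

```python
# Illustrative proxy for temporal-envelope "sharpness".
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def envelope_sharpness(speech, fs, smooth_hz=10.0):
    env = np.abs(hilbert(speech))
    sos = butter(2, smooth_hz / (fs / 2), btype="low", output="sos")
    env = sosfiltfilt(sos, env)                       # keep slow (< ~10 Hz) envelope fluctuations
    d_env = np.gradient(env) * fs                     # rate of change per second
    return np.clip(d_env, 0.0, None)                  # keep rising edges only

fs = 16000
t = np.arange(fs) / fs
toy = np.random.randn(len(t)) * (1 + np.sign(np.sin(2 * np.pi * 4 * t)))  # 4 Hz on/off noise bursts
sharp = envelope_sharpness(toy, fs)
```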

  15. Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing

    PubMed Central

    Doelling, Keith; Arnal, Luc; Ghitza, Oded; Poeppel, David

    2013-01-01

    A growing body of research suggests that intrinsic neuronal slow (< 10 Hz) oscillations in auditory cortex appear to track incoming speech and other spectro-temporally complex auditory signals. Within this framework, several recent studies have identified critical-band temporal envelopes as the specific acoustic feature being reflected by the phase of these oscillations. However, how this alignment between speech acoustics and neural oscillations might underpin intelligibility is unclear. Here we test the hypothesis that the ‘sharpness’ of temporal fluctuations in the critical band envelope acts as a temporal cue to speech syllabic rate, driving delta-theta rhythms to track the stimulus and facilitate intelligibility. We interpret our findings as evidence that sharp events in the stimulus cause cortical rhythms to re-align and parse the stimulus into syllable-sized chunks for further decoding. Using magnetoencephalographic recordings, we show that by removing temporal fluctuations that occur at the syllabic rate, envelope-tracking activity is reduced. By artificially reinstating these temporal fluctuations, envelope-tracking activity is regained. These changes in tracking correlate with intelligibility of the stimulus. Together, the results suggest that the sharpness of fluctuations in the stimulus, as reflected in the cochlear output, drives oscillatory activity to track and entrain to the stimulus at its syllabic rate. This process likely facilitates parsing of the stimulus into meaningful chunks appropriate for subsequent decoding, enhancing perception and intelligibility. PMID:23791839

  16. Moving to the Speed of Sound: Context Modulation of the Effect of Acoustic Properties of Speech

    ERIC Educational Resources Information Center

    Shintel, Hadas; Nusbaum, Howard C.

    2008-01-01

    Suprasegmental acoustic patterns in speech can convey meaningful information and affect listeners' interpretation in various ways, including through systematic analog mapping of message-relevant information onto prosody. We examined whether the effect of analog acoustic variation is governed by the acoustic properties themselves. For example, fast…

  17. Mandarin Speech Perception in Combined Electric and Acoustic Stimulation

    PubMed Central

    Li, Yongxin; Zhang, Guoping; Galvin, John J.; Fu, Qian-Jie

    2014-01-01

    For deaf individuals with residual low-frequency acoustic hearing, combined use of a cochlear implant (CI) and hearing aid (HA) typically provides better speech understanding than with either device alone. Because of coarse spectral resolution, CIs do not provide fundamental frequency (F0) information that contributes to understanding of tonal languages such as Mandarin Chinese. The HA can provide good representation of F0 and, depending on the range of aided acoustic hearing, first and second formant (F1 and F2) information. In this study, Mandarin tone, vowel, and consonant recognition in quiet and noise was measured in 12 adult Mandarin-speaking bimodal listeners with the CI-only and with the CI+HA. Tone recognition was significantly better with the CI+HA in noise, but not in quiet. Vowel recognition was significantly better with the CI+HA in quiet, but not in noise. There was no significant difference in consonant recognition between the CI-only and the CI+HA in quiet or in noise. There was a wide range in bimodal benefit, with improvements often greater than 20 percentage points in some tests and conditions. The bimodal benefit was compared to CI subjects’ HA-aided pure-tone average (PTA) thresholds between 250 and 2000 Hz; subjects were divided into two groups: “better” PTA (<50 dB HL) or “poorer” PTA (>50 dB HL). The bimodal benefit differed significantly between groups only for consonant recognition. The bimodal benefit for tone recognition in quiet was significantly correlated with CI experience, suggesting that bimodal CI users learn to better combine low-frequency spectro-temporal information from acoustic hearing with temporal envelope information from electric hearing. Given the small number of subjects in this study (n = 12), further research with Chinese bimodal listeners may provide more information regarding the contribution of acoustic and electric hearing to tonal language perception. PMID:25386962

  18. Acoustical Characteristics of Mastication Sounds: Application of Speech Analysis Techniques

    NASA Astrophysics Data System (ADS)

    Brochetti, Denise

    Food scientists have used acoustical methods to study characteristics of mastication sounds in relation to food texture. However, a model for analysis of the sounds has not been identified, and reliability of the methods has not been reported. Therefore, speech analysis techniques were applied to mastication sounds, and variation in measures of the sounds was examined. To meet these objectives, two experiments were conducted. In the first experiment, a digital sound spectrograph generated waveforms and wideband spectrograms of sounds by 3 adult subjects (1 male, 2 females) for initial chews of food samples differing in hardness and fracturability. Acoustical characteristics were described and compared. For all sounds, formants appeared in the spectrograms, and energy occurred across a 0 to 8000-Hz range of frequencies. Bursts characterized waveforms for peanut, almond, raw carrot, ginger snap, and hard candy. Duration and amplitude of the sounds varied with the subjects. In the second experiment, the spectrograph was used to measure the duration, amplitude, and formants of sounds for the initial 2 chews of cylindrical food samples (raw carrot, teething toast) differing in diameter (1.27, 1.90, 2.54 cm). Six adult subjects (3 males, 3 females) having normal occlusions and temporomandibular joints chewed the samples between the molar teeth and with the mouth open. Ten repetitions per subject were examined for each food sample. Analysis of estimates of variation indicated an inconsistent intrasubject variation in the acoustical measures. Food type and sample diameter also affected the estimates, indicating the variable nature of mastication. Generally, intrasubject variation was greater than intersubject variation. Analysis of ranks of the data indicated that the effect of sample diameter on the acoustical measures was inconsistent and depended on the subject and type of food. If inferences are to be made concerning food texture from acoustical measures of mastication

  19. Negative blood oxygen level dependent signals during speech comprehension.

    PubMed

    Rodriguez Moreno, Diana; Schiff, Nicholas D; Hirsch, Joy

    2015-05-01

    Speech comprehension studies have generally focused on the isolation and function of regions with positive blood oxygen level dependent (BOLD) signals with respect to a resting baseline. Although regions with negative BOLD signals in comparison to a resting baseline have been reported in language-related tasks, their relationship to regions of positive signals is not fully appreciated. Based on the emerging notion that the negative signals may represent an active function in language tasks, the authors test the hypothesis that negative BOLD signals during receptive language are more associated with comprehension than content-free versions of the same stimuli. Regions associated with comprehension of speech were isolated by comparing responses to passive listening to natural speech to two incomprehensible versions of the same speech: one that was digitally time reversed and one that was muffled by removal of high frequencies. The signal polarity was determined by comparing the BOLD signal during each speech condition to the BOLD signal during a resting baseline. As expected, stimulation-induced positive signals relative to resting baseline were observed in the canonical language areas with varying signal amplitudes for each condition. Negative BOLD responses relative to resting baseline were observed primarily in frontoparietal regions and were specific to the natural speech condition. However, the BOLD signal remained indistinguishable from baseline for the unintelligible speech conditions. Variations in connectivity between brain regions with positive and negative signals were also specifically related to the comprehension of natural speech. These observations of anticorrelated signals related to speech comprehension are consistent with emerging models of cooperative roles represented by BOLD signals of opposite polarity.

  20. How stable are acoustic metrics of contrastive speech rhythm?

    PubMed

    Wiget, Lukas; White, Laurence; Schuppler, Barbara; Grenon, Izabelle; Rauch, Olesya; Mattys, Sven L

    2010-03-01

    Acoustic metrics of contrastive speech rhythm, based on vocalic and intervocalic interval durations, are intended to capture stable typological differences between languages. They should consequently be robust to variation between speakers, sentence materials, and measurers. This paper assesses the impact of these sources of variation on the metrics %V (proportion of utterance comprised of vocalic intervals), VarcoV (rate-normalized standard deviation of vocalic interval duration), and nPVI-V (a measure of the durational variability between successive pairs of vocalic intervals). Five measurers analyzed the same corpus of speech: five sentences read by six speakers of Standard Southern British English. Differences between sentences were responsible for the greatest variation in rhythm scores. Inter-speaker differences were also a source of significant variability. However, there was relatively little variation due to segmentation differences between measurers following an agreed protocol. An automated phone alignment process was also used: Rhythm scores thus derived showed good agreement with the human measurers. A number of recommendations for researchers wishing to exploit contrastive rhythm metrics are offered in conclusion. PMID:20329856
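
    The sketch below computes the three rhythm metrics named above from a labelled sequence of interval durations, following their usual formulations: %V is the vocalic proportion of the utterance, VarcoV the rate-normalized standard deviation of vocalic durations, and nPVI-V the normalized pairwise variability index over successive vocalic intervals. The toy durations are hypothetical.

```python
# Contrastive rhythm metrics from (label, duration) intervals,
# where label is "V" (vocalic) or "C" (intervocalic).
import numpy as np

def rhythm_metrics(intervals):
    v = np.array([d for lab, d in intervals if lab == "V"], dtype=float)
    total = float(sum(d for _, d in intervals))
    pct_v = 100.0 * v.sum() / total                          # %V
    varco_v = 100.0 * v.std() / v.mean()                     # VarcoV
    npvi_v = 100.0 * np.mean([abs(a - b) / ((a + b) / 2.0)   # nPVI-V
                              for a, b in zip(v[:-1], v[1:])])
    return pct_v, varco_v, npvi_v

# Hypothetical durations in seconds for a short C-V sequence.
toy = [("C", 0.08), ("V", 0.12), ("C", 0.10), ("V", 0.06), ("C", 0.07), ("V", 0.15)]
print(rhythm_metrics(toy))
```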

  1. Discrimination of Speech Stimuli Based on Neuronal Response Phase Patterns Depends on Acoustics But Not Comprehension

    PubMed Central

    Poeppel, David

    2010-01-01

    Speech stimuli give rise to neural activity in the listener that can be observed as waveforms using magnetoencephalography. Although waveforms vary greatly from trial to trial due to activity unrelated to the stimulus, it has been demonstrated that spoken sentences can be discriminated based on theta-band (3–7 Hz) phase patterns in single-trial response waveforms. Furthermore, manipulations of the speech signal envelope and fine structure that reduced intelligibility were found to produce correlated reductions in discrimination performance, suggesting a relationship between theta-band phase patterns and speech comprehension. This study investigates the nature of this relationship, hypothesizing that theta-band phase patterns primarily reflect cortical processing of low-frequency (<40 Hz) modulations present in the acoustic signal and required for intelligibility, rather than processing exclusively related to comprehension (e.g., lexical, syntactic, semantic). Using stimuli that are quite similar to normal spoken sentences in terms of low-frequency modulation characteristics but are unintelligible (i.e., their time-inverted counterparts), we find that discrimination performance based on theta-band phase patterns is equal for both types of stimuli. Consistent with earlier findings, we also observe that whereas theta-band phase patterns differ across stimuli, power patterns do not. We use a simulation model of the single-trial response to spoken sentence stimuli to demonstrate that phase-locked responses to low-frequency modulations of the acoustic signal can account not only for the phase but also for the power results. The simulation offers insight into the interpretation of the empirical results with respect to phase-resetting and power-enhancement models of the evoked response. PMID:20484530
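
    To make the core analysis idea concrete, the sketch below band-passes single-trial responses into the theta band (3-7 Hz), takes the instantaneous Hilbert phase, and scores the similarity of two trials' phase patterns as the mean cosine of their phase difference. The study's classification procedure is more elaborate; filter order, sampling rate, and the similarity score here are illustrative assumptions.

```python
# Theta-band phase-pattern similarity between two single trials (sketch).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def theta_phase(trial, fs, band=(3.0, 7.0)):
    sos = butter(3, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band", output="sos")
    return np.angle(hilbert(sosfiltfilt(sos, trial)))

def phase_similarity(trial_a, trial_b, fs):
    pa, pb = theta_phase(trial_a, fs), theta_phase(trial_b, fs)
    return float(np.mean(np.cos(pa - pb)))   # 1 = identical phase pattern, ~0 = unrelated

fs = 250
t = np.arange(2 * fs) / fs
trial1 = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.randn(len(t))
trial2 = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.randn(len(t))
print(phase_similarity(trial1, trial2, fs))
```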

  2. Acoustic characteristics of Korean stops in Korean child-directed speech

    NASA Astrophysics Data System (ADS)

    Kim, Minjung; Stoel-Gammon, Carol

    2005-04-01

    A variety of cross-linguistic studies have documented that the acoustic properties of speech addressed to young children include exaggeration of pitch contours and acoustically salient features of phonetic units. It has been suggested that phonetic modifications of child-directed speech facilitate young children's speech perception by providing detailed phonetic information about the target word. While there are several studies reporting vowel modifications in speech to infants (i.e., hyper-articulated vowels), there has been relatively little research about consonant modifications in speech to young children (except for VOT). The present study examines acoustic properties of Korean stops in Korean mothers' speech to their children aged 29 to 38 months (N=6). Korean tense, lax, and aspirated stops are all voiceless in word-initial position, and are perceptually differentiated by several acoustic parameters including VOT, f0 of the following vowel, and the amplitude difference of the first and second harmonics at the voice onset of the following vowel. This study compares values of these parameters in Korean motherese to those in speech to adult Koreans from same speakers. Results focus on the acoustic properties of Korean stops in child-directed speech and how they are modified to help Korean young children learn the three-way phonetic contrast.

  3. Generation of Acoustic Signals from Buried Explosions

    NASA Astrophysics Data System (ADS)

    Bonner, J. L.; Reinke, R.; Waxler, R.; Lenox, E. A.

    2012-12-01

    Buried explosions generate both seismic and acoustic signals. The mechanism for the acoustic generation is generally assumed to be large ground motions above the source region that cause atmospheric pressure disturbances which can propagate locally or regionally depending on source size and weather conditions. In order to better understand the factors that control acoustic generation from buried explosions, we conducted a series of 200 lb explosions detonated in and above the dry alluvium and limestones of Kirtland AFB, New Mexico. In this experiment, nicknamed HUMBLE REDWOOD III, we detonated charges at heights of burst of 2 m (no crater) and depths of burst of 7 m (fully confined). The seismic and acoustic signals were recorded on a network of near-source (< 90 m) co-located accelerometer and overpressure sensors, circular rings of acoustic sensors at 0.3 and 1 km, and multiple seismic and infrasound sensors at local-to-regional distances. Near-source acoustic signals for the 200 lb buried explosion in limestone show an impulsive, short-duration (0.04 s) initial peak, followed by a broad duration (0.3 s) negative pressure trough, and finally a second positive pulse (0.18 s duration). The entire width of the acoustic signal generated by this small buried explosion is 0.5 s and results in a 2 Hz peak in spectral energy. High-velocity wind conditions quickly attenuate the signal with few observations beyond 1 km. We have attempted to model these acoustic waveforms by estimating near-source ground motion using synthetic spall seismograms. Spall seismograms have characteristics similar to the observed acoustics and usually include an initial positive-motion P wave, followed by -1 g acceleration due to the ballistic free fall of the near surface rock units, and end with positive accelerations due to "slapdown" of the material. Spall seismograms were synthesized using emplacement media parameters and high-speed video observations of the surface movements. We present a

  4. Tuned to the Signal: The Privileged Status of Speech for Young Infants

    ERIC Educational Resources Information Center

    Vouloumanos, Athena; Werker, Janet F.

    2004-01-01

    Do young infants treat speech as a special signal, compared with structurally similar non-speech sounds? We presented 2- to 7-month-old infants with nonsense speech sounds and complex non-speech analogues. The non-speech analogues retain many of the spectral and temporal properties of the speech signal, including the pitch contour information…

  5. Capacity of voiceband channel with speech signal interference

    NASA Astrophysics Data System (ADS)

    Wulich, D.; Goldfeld, L.

    1994-08-01

    An estimation of the capacity of a voiceband channel with speech signal interference and background Gaussian white noise has been made. The solution is based on the fact that over a time interval of tens of milliseconds the speech signal can be considered a stationary Gaussian process. In such a model the total interference is nonwhite but Gaussian, a situation for which the capacity can be found from the formulas given in the classical literature. The results are important where a voice signal acts as interference, for example in the crosstalk problem in telephone lines or in data-over-voice (DOV) systems where speech is transmitted simultaneously with the digitally modulated signal.
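
    As an illustration of the classical result referred to above, the sketch below evaluates the capacity of a band-limited channel with nonwhite Gaussian noise by water-filling a fixed transmit power budget over the noise power spectral density and integrating log2(1 + S(f)/N(f)) over the band. The noise PSD shape, bandwidth, and power budget are made-up values, not those of the paper.

```python
# Water-filling capacity of a voiceband channel with colored Gaussian noise (sketch).
import numpy as np

def waterfilling_capacity(noise_psd, df, total_power):
    lo, hi = 0.0, noise_psd.max() + total_power / (len(noise_psd) * df)
    for _ in range(100):                               # bisect for the water level mu
        mu = 0.5 * (lo + hi)
        used = np.sum(np.maximum(mu - noise_psd, 0.0)) * df
        lo, hi = (mu, hi) if used < total_power else (lo, mu)
    signal_psd = np.maximum(mu - noise_psd, 0.0)
    return np.sum(np.log2(1.0 + signal_psd / noise_psd)) * df   # bits per second

f = np.linspace(300.0, 3400.0, 1000)                   # voiceband frequencies (Hz)
df = f[1] - f[0]
noise_psd = 1e-6 + 5e-6 / (1.0 + ((f - 500.0) / 300.0) ** 2)    # speech-like low-frequency hump
print(waterfilling_capacity(noise_psd, df, total_power=0.01))
```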

  6. DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1

    NASA Astrophysics Data System (ADS)

    Garofolo, J. S.; Lamel, L. F.; Fisher, W. M.; Fiscus, J. G.; Pallett, D. S.

    1993-02-01

    The Texas Instruments/Massachusetts Institute of Technology (TIMIT) corpus of read speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems. TIMIT contains speech from 630 speakers representing 8 major dialect divisions of American English, each speaking 10 phonetically-rich sentences. The TIMIT corpus includes time-aligned orthographic, phonetic, and word transcriptions, as well as speech waveform data for each spoken sentence. The release of TIMIT contains several improvements over the Prototype CD-ROM released in December, 1988: (1) full 630-speaker corpus, (2) checked and corrected transcriptions, (3) word-alignment transcriptions, (4) NIST SPHERE-headered waveform files and header manipulation software, (5) phonemic dictionary, (6) new test and training subsets balanced for dialectal and phonetic coverage, and (7) more extensive documentation.

  7. Speech masking and cancelling and voice obscuration

    DOEpatents

    Holzrichter, John F.

    2013-09-10

    A non-acoustic sensor is used to measure a user's speech and then broadcasts an obscuring acoustic signal diminishing the user's vocal acoustic output intensity and/or distorting the voice sounds making them unintelligible to persons nearby. The non-acoustic sensor is positioned proximate or contacting a user's neck or head skin tissue for sensing speech production information.

  8. Formant Centralization Ratio: A Proposal for a New Acoustic Measure of Dysarthric Speech

    ERIC Educational Resources Information Center

    Sapir, Shimon; Ramig, Lorraine O.; Spielman, Jennifer L.; Fox, Cynthia

    2010-01-01

    Purpose: The vowel space area (VSA) has been used as an acoustic metric of dysarthric speech, but with varying degrees of success. In this study, the authors aimed to test an alternative metric to the VSA--the "formant centralization ratio" (FCR), which is hypothesized to more effectively differentiate dysarthric from healthy speech and register…

  9. Effects of Training on the Acoustic-Phonetic Representation of Synthetic Speech

    ERIC Educational Resources Information Center

    Francis, Alexander L.; Nusbaum, Howard C.; Fenn, Kimberly

    2007-01-01

    Purpose: Investigate training-related changes in acoustic-phonetic representation of consonants produced by a text-to-speech (TTS) computer speech synthesizer. Method: Forty-eight adult listeners were trained to better recognize words produced by a TTS system. Nine additional untrained participants served as controls. Before and after training,…

  10. Dimensional analysis of acoustically propagated signals

    NASA Technical Reports Server (NTRS)

    Hansen, Scott D.; Thomson, Dennis W.

    1993-01-01

    Traditionally, long-term measurements of atmospherically propagated sound signals have consisted of time series of multiminute averages. Only recently have continuous measurements with temporal resolution corresponding to turbulent time scales become available. With modern digital data acquisition systems we now have the capability to simultaneously record both acoustical and meteorological parameters with sufficient temporal resolution to allow us to examine in detail relationships between fluctuating sound and the meteorological variables, particularly wind and temperature, which locally determine the acoustic refractive index. The atmospheric acoustic propagation medium can be treated as a nonlinear dynamical system, a kind of signal processor whose innards depend on thermodynamic and turbulent processes in the atmosphere. The atmosphere is an inherently nonlinear dynamical system. In fact one simple model of atmospheric convection, the Lorenz system, may well be the most widely studied of all dynamical systems. In this paper we report some results from applying methods used to characterize nonlinear dynamical systems to acoustical signals propagated through the atmosphere. For example, we investigate whether or not it is possible to parameterize signal fluctuations in terms of fractal dimensions. For time series one such parameter is the limit capacity dimension. Nicolis and Nicolis were among the first to use the kinds of methods we have applied to study the properties of low-dimension global attractors.
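
    A rough sketch of one such dimension estimate follows: delay-embed a scalar time series, compute the Grassberger-Procaccia correlation sum C(r) over a range of radii, and fit the slope of log C(r) versus log r. The embedding parameters, radii, and toy series are illustrative assumptions; a serious analysis would require checking for a proper scaling region and using the limit capacity dimension or a related measure as the paper describes.

```python
# Correlation-dimension estimate via delay embedding (illustrative sketch).
import numpy as np
from scipy.spatial.distance import pdist

def correlation_dimension(x, dim=5, tau=4, radii=None):
    n = len(x) - (dim - 1) * tau
    emb = np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])
    d = pdist(emb)                                    # pairwise distances between embedded points
    if radii is None:
        radii = np.logspace(np.log10(np.percentile(d, 1)), np.log10(np.percentile(d, 50)), 10)
    c = np.array([np.mean(d < r) for r in radii])     # correlation sum C(r)
    slope, _ = np.polyfit(np.log(radii), np.log(c), 1)
    return slope

t = np.arange(0, 60, 0.05)
x = np.sin(t) + 0.5 * np.sin(2.7 * t)                 # toy quasi-periodic series
print(correlation_dimension(x))
```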

  11. Analysis of Acoustic Features in Speakers with Cognitive Disorders and Speech Impairments

    NASA Astrophysics Data System (ADS)

    Saz, Oscar; Simón, Javier; Rodríguez, W. Ricardo; Lleida, Eduardo; Vaquero, Carlos

    2009-12-01

    This work presents the results of an analysis of the acoustic features (formants and the three suprasegmental features: tone, intensity and duration) of vowel production in a group of 14 young speakers with different kinds of speech impairments due to physical and cognitive disorders. A corpus of unimpaired children's speech is used to determine the reference values for these features in speakers without any kind of speech impairment within the same domain as the impaired speakers, namely 57 isolated words. The signal processing to extract the formant and pitch values is based on a Linear Prediction Coefficients (LPC) analysis of the segments considered to be vowels in a Hidden Markov Model (HMM) based Viterbi forced alignment. Intensity and duration are also based on the outcome of the automated segmentation. As the main conclusion of the work, it is shown that the intelligibility of vowel production is lowered in impaired speakers even when the vowel is perceived as correct by human labelers. The decrease in intelligibility is due to a 30% increase in confusability in the formant map, a 50% reduction in the discriminative power of energy between stressed and unstressed vowels, and a 50% increase in the standard deviation of vowel length. On the other hand, impaired speakers keep good control of tone in the production of stressed and unstressed vowels.
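
    For readers unfamiliar with the LPC step mentioned above, the sketch below estimates formants from a single voiced frame: pre-emphasis, Hamming windowing, an autocorrelation (Yule-Walker) fit of an all-pole model, and formant candidates from the angles of polynomial roots with sufficiently narrow bandwidths. The frame length, LPC order, bandwidth threshold, and toy "vowel" are typical but illustrative assumptions, not the paper's pipeline.

```python
# LPC-based formant estimation for one voiced frame (sketch).
import numpy as np

def lpc_formants(frame, fs, order=12, max_bw=400.0):
    x = np.append(frame[0], frame[1:] - 0.97 * frame[:-1]) * np.hamming(len(frame))
    r = np.correlate(x, x, mode="full")[len(x) - 1: len(x) + order]   # autocorrelation lags 0..order
    # Solve the Yule-Walker equations R a = -r for the LPC coefficients.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.concatenate(([1.0], np.linalg.solve(R, -r[1:order + 1])))
    formants = []
    for z in (z for z in np.roots(a) if z.imag > 0):
        freq = np.angle(z) * fs / (2 * np.pi)
        bw = -np.log(np.abs(z)) * fs / np.pi
        if 90.0 < freq < fs / 2 - 100.0 and bw < max_bw:
            formants.append(freq)
    return sorted(formants)

fs = 10000
t = np.arange(int(0.03 * fs)) / fs
frame = sum(np.cos(2 * np.pi * f * t) for f in (700.0, 1200.0, 2600.0)) + 0.01 * np.random.randn(len(t))
print(lpc_formants(frame, fs))   # expect values near 700, 1200 and 2600 Hz
```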

  12. Speech intelligibility in rooms: Effect of prior listening exposure interacts with room acoustics.

    PubMed

    Zahorik, Pavel; Brandewie, Eugene J

    2016-07-01

    There is now converging evidence that a brief period of prior listening exposure to a reverberant room can influence speech understanding in that environment. Although the effect appears to depend critically on the amplitude modulation characteristic of the speech signal reaching the ear, the extent to which the effect may be influenced by room acoustics has not been thoroughly evaluated. This study seeks to fill this gap in knowledge by testing the effect of prior listening exposure or listening context on speech understanding in five different simulated sound fields, ranging from anechoic space to a room with broadband reverberation time (T60) of approximately 3 s. Although substantial individual variability in the effect was observed and quantified, the context effect was, on average, strongly room dependent. At threshold, the effect was minimal in anechoic space, increased to a maximum of 3 dB on average in moderate reverberation (T60 = 1 s), and returned to minimal levels again in high reverberation. This interaction suggests that the functional effects of prior listening exposure may be limited to sound fields with moderate reverberation (0.4 ≤ T60 ≤ 1 s). PMID:27475133

  13. Physical properties of modification of speech signal fragments

    NASA Astrophysics Data System (ADS)

    Gusev, Mikhail N.

    2004-04-01

    This report describes the methods used to modify individual fragments of the speech signal in the process of synthesizing speech from arbitrary text. Three groups of sounds differ in the methods used to modify their frequency characteristics, and two groups of sounds require different methods of changing duration. To modify samples of a speaker's voice with these methods, the samples must first be pre-marked, a process known as segmentation. Allophones are taken as the variable speech fragments. The modification methods described make it possible to form arbitrary speech sequences over a wide intonation range from a limited set of the speaker's voice patterns.

  14. Multilevel Analysis in Analyzing Speech Data

    ERIC Educational Resources Information Center

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  15. Speech perception of sine-wave signals by children with cochlear implants

    PubMed Central

    Nittrouer, Susan; Kuess, Jamie; Lowenstein, Joanna H.

    2015-01-01

    Children need to discover linguistically meaningful structures in the acoustic speech signal. Being attentive to recurring, time-varying formant patterns helps in that process. However, that kind of acoustic structure may not be available to children with cochlear implants (CIs), thus hindering development. The major goal of this study was to examine whether children with CIs are as sensitive to time-varying formant structure as children with normal hearing (NH) by asking them to recognize sine-wave speech. The same materials were presented as speech in noise, as well, to evaluate whether any group differences might simply reflect general perceptual deficits on the part of children with CIs. Vocabulary knowledge, phonemic awareness, and “top-down” language effects were all also assessed. Finally, treatment factors were examined as possible predictors of outcomes. Results showed that children with CIs were as accurate as children with NH at recognizing sine-wave speech, but poorer at recognizing speech in noise. Phonemic awareness was related to that recognition. Top-down effects were similar across groups. Having had a period of bimodal stimulation near the time of receiving a first CI facilitated these effects. Results suggest that children with CIs have access to the important time-varying structure of vocal-tract formants. PMID:25994709
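
    To illustrate what the sine-wave speech stimuli mentioned above look like computationally, the sketch below replaces each of the first three formant tracks with a single amplitude- and frequency-modulated sinusoid and sums the three. The formant and amplitude tracks here are made-up; in practice they come from formant analysis of a recorded utterance.

```python
# Sine-wave speech synthesis from formant tracks (sketch with made-up tracks).
import numpy as np

def sine_wave_speech(formant_tracks, amp_tracks, fs):
    out = np.zeros(len(formant_tracks[0]))
    for freqs, amps in zip(formant_tracks, amp_tracks):
        phase = 2 * np.pi * np.cumsum(freqs) / fs   # integrate frequency to get phase
        out += amps * np.sin(phase)
    return out / (np.max(np.abs(out)) + 1e-12)

fs = 16000
n = fs                                              # one second
t = np.arange(n) / fs
f1 = 500.0 + 200.0 * np.sin(2 * np.pi * 2.0 * t)    # hypothetical F1 track
f2 = 1500.0 + 400.0 * np.sin(2 * np.pi * 1.3 * t)   # hypothetical F2 track
f3 = np.full(n, 2500.0)                             # hypothetical F3 track
amps = [np.ones(n), 0.6 * np.ones(n), 0.3 * np.ones(n)]
sws = sine_wave_speech([f1, f2, f3], amps, fs)
```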

  16. The Use of Artificial Neural Networks to Estimate Speech Intelligibility from Acoustic Variables: A Preliminary Analysis.

    ERIC Educational Resources Information Center

    Metz, Dale Evan; And Others

    1992-01-01

    A preliminary scheme for estimating the speech intelligibility of hearing-impaired speakers from acoustic parameters, using a computerized artificial neural network to process mathematically the acoustic input variables, is outlined. Tests with 60 hearing-impaired speakers found the scheme to be highly accurate in identifying speakers separated by…

  17. Sine-wave and noise-vocoded sine-wave speech in a tone language: Acoustic details matter.

    PubMed

    Rosen, Stuart; Hui, Sze Ngar Catherine

    2015-12-01

    Sine-wave speech (SWS) is a highly simplified version of speech consisting only of frequency- and amplitude-modulated sinusoids representing the formants. That listeners can successfully understand SWS has led to claims that speech perception must be based on abstract properties of the stimuli far removed from their specific acoustic form. Here it is shown, in bilingual Cantonese/English listeners, that performance with Cantonese SWS is improved by noise vocoding, with no effect on English SWS utterances. This manipulation preserves the abstract informational structure in the signals but changes its surface form. The differential effects of noise vocoding likely arise from the fact that Cantonese is a tonal language and hence more reliant on fundamental frequency (F0) contours for its intelligibility. SWS does not preserve tonal information from the original speech but does have false tonal information signalled by the lowest frequency sinusoid. Noise vocoding SWS appears to minimise the tonal percept, which thus interferes less in the perception of Cantonese. It has no effect in English, which is minimally reliant on F0 variations for intelligibility. Therefore it is not only the informational structure of a sound that is important but also how its acoustic detail interacts with the phonological structure of a given language.

  18. Sine-wave and noise-vocoded sine-wave speech in a tone language: Acoustic details matter.

    PubMed

    Rosen, Stuart; Hui, Sze Ngar Catherine

    2015-12-01

    Sine-wave speech (SWS) is a highly simplified version of speech consisting only of frequency- and amplitude-modulated sinusoids representing the formants. That listeners can successfully understand SWS has led to claims that speech perception must be based on abstract properties of the stimuli far removed from their specific acoustic form. Here it is shown, in bilingual Cantonese/English listeners, that performance with Cantonese SWS is improved by noise vocoding, with no effect on English SWS utterances. This manipulation preserves the abstract informational structure in the signals but changes its surface form. The differential effects of noise vocoding likely arise from the fact that Cantonese is a tonal language and hence more reliant on fundamental frequency (F0) contours for its intelligibility. SWS does not preserve tonal information from the original speech but does have false tonal information signalled by the lowest frequency sinusoid. Noise vocoding SWS appears to minimise the tonal percept, which thus interferes less in the perception of Cantonese. It has no effect in English, which is minimally reliant on F0 variations for intelligibility. Therefore it is not only the informational structure of a sound that is important but also how its acoustic detail interacts with the phonological structure of a given language. PMID:26723325

  19. Study on adaptive compressed sensing & reconstruction of quantized speech signals

    NASA Astrophysics Data System (ADS)

    Yunyun, Ji; Zhen, Yang

    2012-12-01

    Compressed sensing (CS) has attracted growing interest in recent years because it performs sampling and compression of sparse signals simultaneously. Owing to their natural characteristics, speech signals can be considered approximately sparse or compressible in some domains, so applying compressed sensing to speech signals holds great promise. This paper addresses three aspects. Firstly, the sparsity and the sparsifying matrix for speech signals are analyzed, and an adaptive sparsifying matrix based on the long-term prediction of voiced speech signals is constructed. Secondly, a CS matrix called the two-block diagonal (TBD) matrix is constructed for speech signals based on existing block diagonal matrix theory, and its performance is found empirically to be superior to that of the dense Gaussian random matrix when the sparsifying matrix is the DCT basis. Finally, we consider the effect of quantization on the projections. Two corollaries about the impact of adaptive and nonadaptive quantization on reconstruction performance with two different matrices, the TBD matrix and the dense Gaussian random matrix, are derived. We find that adaptive quantization and the TBD matrix are two effective ways to mitigate the quantization effect on the reconstruction of speech signals in the CS framework.
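
    The sketch below illustrates the generic CS framework the paper builds on: a frame that is sparse in the DCT basis is measured with a dense Gaussian random matrix and recovered by iterative soft-thresholding (ISTA). It does not implement the paper's two-block diagonal matrix, adaptive sparsifying matrix, or quantization analysis; sizes, sparsity, and the regularization weight are illustrative assumptions.

```python
# Compressed sensing of a DCT-sparse frame recovered with ISTA (sketch).
import numpy as np
from scipy.fft import idct

def ista(A, y, lam=0.05, n_iter=300):
    step = 1.0 / np.linalg.norm(A, 2) ** 2            # 1 / Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        c = c - step * (A.T @ (A @ c - y))                           # gradient step
        c = np.sign(c) * np.maximum(np.abs(c) - step * lam, 0.0)     # soft threshold
    return c

n, m = 256, 96
rng = np.random.default_rng(0)
coeffs = np.zeros(n)
coeffs[rng.choice(n, 8, replace=False)] = rng.standard_normal(8)     # 8-sparse DCT coefficients
Psi = idct(np.eye(n), norm="ortho", axis=0)            # columns are time-domain DCT basis vectors
x = Psi @ coeffs                                       # frame that is sparse in the DCT domain
Phi = rng.standard_normal((m, n)) / np.sqrt(m)         # dense Gaussian measurement matrix
y = Phi @ x
c_hat = ista(Phi @ Psi, y)
print(np.linalg.norm(Psi @ c_hat - x) / np.linalg.norm(x))   # relative reconstruction error
```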

  20. A maximum likelihood approach to estimating articulator positions from speech acoustics

    SciTech Connect

    Hogden, J.

    1996-09-23

    This proposal presents an algorithm called maximum likelihood continuity mapping (MALCOM) which recovers the positions of the tongue, jaw, lips, and other speech articulators from measurements of the sound-pressure waveform of speech. MALCOM differs from other techniques for recovering articulator positions from speech in three critical respects: it does not require training on measured or modeled articulator positions, it does not rely on any particular model of sound propagation through the vocal tract, and it recovers a mapping from acoustics to articulator positions that is linearly, not topographically, related to the actual mapping from acoustics to articulation. The approach categorizes short-time windows of speech into a finite number of sound types, and assumes the probability of using any articulator position to produce a given sound type can be described by a parameterized probability density function. MALCOM then uses maximum likelihood estimation techniques to: (1) find the most likely smooth articulator path given a speech sample and a set of distribution functions (one distribution function for each sound type), and (2) change the parameters of the distribution functions to better account for the data. Using this technique improves the accuracy of articulator position estimates compared to continuity mapping -- the only other technique that learns the relationship between acoustics and articulation solely from acoustics. The technique has potential application to computer speech recognition, speech synthesis and coding, teaching the hearing impaired to speak, improving foreign language instruction, and teaching dyslexics to read. 34 refs., 7 figs.

  1. Language-specific developmental differences in speech production: a cross-language acoustic study.

    PubMed

    Li, Fangfang

    2012-01-01

    Speech productions of 40 English- and 40 Japanese-speaking children (aged 2-5) were examined and compared with the speech produced by 20 adult speakers (10 speakers per language). Participants were recorded while repeating words that began with "s" and "sh" sounds. Clear language-specific patterns in adults' speech were found, with English speakers differentiating "s" and "sh" in 1 acoustic dimension (i.e., spectral mean) and Japanese speakers differentiating the 2 categories in 3 acoustic dimensions (i.e., spectral mean, standard deviation, and onset F2 frequency). For both language groups, children's speech exhibited a gradual change from an early undifferentiated form to later differentiated categories. The separation processes, however, only occur in those acoustic dimensions used by adults in the corresponding languages.
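
    The sketch below computes the first two spectral moments of a fricative frame, spectral mean (centroid) and spectral standard deviation, which are the kinds of measures used above to separate "s" from "sh". The window and the synthetic noise frames are illustrative stand-ins for real fricative segments.

```python
# Spectral mean and standard deviation of a fricative-like frame (sketch).
import numpy as np

def spectral_moments(frame, fs):
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    p = spec / spec.sum()                              # treat the power spectrum as a distribution
    mean = np.sum(freqs * p)
    sd = np.sqrt(np.sum(((freqs - mean) ** 2) * p))
    return mean, sd

# Toy frames: differentiated noise for an "s"-like spectrum, smoothed noise for "sh".
fs = 22050
rng = np.random.default_rng(1)
noise = rng.standard_normal(1024)
s_like = np.diff(np.diff(noise))                            # crude high-frequency emphasis
sh_like = np.convolve(noise, np.ones(8) / 8, mode="same")   # crude low-pass
print(spectral_moments(s_like, fs), spectral_moments(sh_like, fs))
```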

  2. Tracheal activity recognition based on acoustic signals.

    PubMed

    Olubanjo, Temiloluwa; Ghovanloo, Maysam

    2014-01-01

    Tracheal activity recognition can play an important role in continuous health monitoring for wearable systems and facilitate the advancement of personalized healthcare. Neck-worn systems provide access to a unique set of health-related data that other wearable devices simply cannot obtain. Activities including breathing, chewing, clearing the throat, coughing, swallowing, speech and even heartbeat can be recorded from around the neck. In this paper, we explore tracheal activity recognition using a combination of promising acoustic features from related work and apply simple classifiers including K-NN and Naive Bayes. For wearable systems in which low power consumption is of primary concern, we show that with a sub-optimal sampling rate of 16 kHz, we have achieved average classification results in the range of 86.6% to 87.4% using 1-NN, 3-NN, 5-NN and Naive Bayes. All classifiers obtained the highest recognition rate, in the range of 97.2% to 99.4%, for speech classification. This is promising for mitigating privacy concerns associated with wearable systems overhearing the user's conversations.
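
    The classification stage described above can be sketched as follows: per-segment acoustic feature vectors fed to a k-nearest-neighbour classifier and scored with cross-validation (scikit-learn). The feature matrix here is a random, artificially separable stand-in for real tracheal-sound features such as MFCCs, and the class labels are illustrative.

```python
# k-NN classification of per-segment acoustic feature vectors (sketch).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_per_class, n_features = 40, 13
classes = ["breathing", "chewing", "cough", "swallow", "speech"]
X = np.vstack([rng.normal(loc=i, scale=1.0, size=(n_per_class, n_features))
               for i in range(len(classes))])          # separable toy clusters
y = np.repeat(classes, n_per_class)

for k in (1, 3, 5):
    clf = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{k}-NN mean accuracy: {scores.mean():.3f}")
```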

  3. Language Comprehension in Language-Learning Impaired Children Improved with Acoustically Modified Speech

    NASA Astrophysics Data System (ADS)

    Tallal, Paula; Miller, Steve L.; Bedi, Gail; Byma, Gary; Wang, Xiaoqin; Nagarajan, Srikantan S.; Schreiner, Christoph; Jenkins, William M.; Merzenich, Michael M.

    1996-01-01

    A speech processing algorithm was developed to create more salient versions of the rapidly changing elements in the acoustic waveform of speech that have been shown to be deficiently processed by language-learning impaired (LLI) children. LLI children received extensive daily training, over a 4-week period, with listening exercises in which all speech was translated into this synthetic form. They also received daily training with computer "games" designed to adaptively drive improvements in temporal processing thresholds. Significant improvements in speech discrimination and language comprehension abilities were demonstrated in two independent groups of LLI children.

  4. Effect of signal to noise ratio on the speech perception ability of older adults

    PubMed Central

    Shojaei, Elahe; Ashayeri, Hassan; Jafari, Zahra; Zarrin Dast, Mohammad Reza; Kamali, Koorosh

    2016-01-01

    Background: Speech perception ability depends on auditory and extra-auditory elements. The signal-to-noise ratio (SNR) is an extra-auditory element that affects the ability to follow speech normally and maintain a conversation. Difficulty perceiving speech in noise is a common complaint of the elderly. In this study, the importance of SNR magnitude as an extra-auditory effect on speech perception in noise was examined in the elderly. Methods: The speech perception in noise test (SPIN) was conducted at three SNRs in the presence of ipsilateral white noise on 25 elderly participants who had bilaterally normal low–mid frequency hearing thresholds. Participants were selected using a convenience (availability) sampling method. Cognitive screening was done using the Persian Mini Mental State Examination (MMSE) test. Results: Independent t-tests, ANOVA and the Pearson correlation coefficient were used for statistical analysis. There was a significant difference in word discrimination scores in silence and at the three SNRs in both ears (p≤0.047). Moreover, there was a significant difference in word discrimination scores for paired SNRs (0 and +5, 0 and +10, and +5 and +10; p≤0.04). No significant correlation was found between age and word recognition scores in silence and at the three SNRs in both ears (p≥0.386). Conclusion: Our results revealed that decreasing the signal level and increasing the competing noise considerably reduced speech perception ability in elderly listeners with normal low–mid frequency hearing thresholds. These results support the critical role of SNRs for speech perception ability in the elderly. Furthermore, our results revealed that normal-hearing elderly participants required compensatory strategies to maintain normal speech perception in challenging acoustic situations. PMID:27390712

  5. A Hybrid Acoustic and Pronunciation Model Adaptation Approach for Non-native Speech Recognition

    NASA Astrophysics Data System (ADS)

    Oh, Yoo Rhee; Kim, Hong Kook

    In this paper, we propose a hybrid model adaptation approach in which pronunciation and acoustic models are adapted by incorporating the pronunciation and acoustic variabilities of non-native speech in order to improve the performance of non-native automatic speech recognition (ASR). Specifically, the proposed hybrid model adaptation can be performed at either the state-tying or triphone-modeling level, depending on the level at which acoustic model adaptation is performed. In both methods, we first analyze the pronunciation variant rules of non-native speakers and then classify each rule as either a pronunciation variant or an acoustic variant. The state-tying level hybrid method then adapts pronunciation models and acoustic models by accommodating the pronunciation variants in the pronunciation dictionary and by clustering the states of triphone acoustic models using the acoustic variants, respectively. On the other hand, the triphone-modeling level hybrid method initially adapts pronunciation models in the same way as in the state-tying level hybrid method; however, for the acoustic model adaptation, the triphone acoustic models are then re-estimated based on the adapted pronunciation models and the states of the re-estimated triphone acoustic models are clustered using the acoustic variants. From the Korean-spoken English speech recognition experiments, it is shown that ASR systems employing the state-tying and triphone-modeling level adaptation methods can reduce the average word error rate (WER) by a relative 17.1% and 22.1%, respectively, for non-native speech when compared to a baseline ASR system.

  6. The acoustics for speech of eight auditoriums in the city of Sao Paulo

    NASA Astrophysics Data System (ADS)

    Bistafa, Sylvio R.

    2002-11-01

    Eight auditoriums with a proscenium type of stage, which usually operate as dramatic theaters in the city of Sao Paulo, were acoustically surveyed for their adequacy for unassisted speech. Reverberation times, early decay times, and speech levels were measured in different positions, together with objective measures of speech intelligibility. The measurements revealed reverberation time values rather uniform throughout the rooms, whereas significant variations with position were found in the values of the other acoustical measures. The early decay time was found to be better correlated with the objective measures of speech intelligibility than the reverberation time. The results from the objective measurements of speech intelligibility revealed that the speech transmission index STI, and its simplified version RaSTI, are strongly correlated with the early-to-late sound ratio C50 (1 kHz). However, it was found that the criterion value of acceptability of the latter is more easily met than that of the former. The results from these measurements make it possible to understand how the characteristics of the architectural design determine the acoustical quality for speech. Measurements of ST1-Gade were made as an attempt to validate it as an objective measure of "support" for the actor. Preliminary diagnostic results from ray-tracing simulations will also be presented.
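
    The early-to-late ratio C50 mentioned above is commonly computed from a measured room impulse response as ten times the log of the energy in the first 50 ms after the direct sound divided by the energy arriving later. The sketch below implements that ratio on a synthetic, exponentially decaying noise stand-in for a measured impulse response; the sampling rate and decay constant are illustrative.

```python
# Early-to-late ratio C50 from a room impulse response (sketch).
import numpy as np

def c50(impulse_response, fs, split_ms=50.0):
    h2 = impulse_response ** 2
    onset = int(np.argmax(h2))                         # align to the direct sound
    split = onset + int(round(split_ms * 1e-3 * fs))
    early, late = h2[onset:split].sum(), h2[split:].sum()
    return 10.0 * np.log10(early / late)

fs = 16000
t = np.arange(int(1.0 * fs)) / fs
rng = np.random.default_rng(2)
h = rng.standard_normal(len(t)) * np.exp(-t / 0.15)    # ~1 s decaying reverberant tail
h[0] = 5.0                                             # direct sound
print(f"C50 = {c50(h, fs):.1f} dB")
```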

  7. A magnetic resonance imaging study on the articulatory and acoustic speech parameters of Malay vowels

    PubMed Central

    2014-01-01

    The phonetic properties of six Malay vowels are investigated using magnetic resonance imaging (MRI) to visualize the vocal tract in order to obtain dynamic articulatory parameters during speech production. To resolve image blurring due to tongue movement during the scanning process, a method based on active contour extraction is used to track tongue contours. The proposed method efficiently tracks tongue contours despite the partial blurring of MRI images. Consequently, the articulatory parameters are effectively measured as tongue movement is observed, and the specific shape and position of the tongue are determined for all six uttered Malay vowels. Speech rehabilitation procedures demand some kind of visually perceivable prototype of speech articulation. To investigate the validity of the measured articulatory parameters based on the acoustic theory of speech production, an acoustic analysis of the vowels uttered by the subjects was performed. When the acoustic and articulatory parameters of the uttered speech were examined, a correlation between formant frequencies and articulatory parameters was observed. The experiments reported a positive correlation between the constriction location of the tongue body and the first formant frequency, as well as a negative correlation between the constriction location of the tongue tip and the second formant frequency. The results demonstrate that the proposed method is an effective tool for the dynamic study of speech production. PMID:25060583

  8. Identifying Potential Noise Sources within Acoustic Signals

    NASA Astrophysics Data System (ADS)

    Holcomb, Victoria; Lewalle, Jacques

    2013-11-01

    We test a new algorithm for its ability to detect sources of noise within a random background. The goal of these tests is to better understand how to identify sources within acoustic signals while simultaneously determining the strengths and weaknesses of the algorithm in question. Unlike previously published algorithms, the antenna method does not pinpoint events by looking for the most energetic portions of a signal. Instead, the algorithm searches for the ideal lag combinations between three signals by taking excerpts of possible events, and it identifies sources from the excerpt with the lowest calculated minimum distance between possible events. At the minimum distance, the events are close in time and frequency. This method can be compared to the cross-correlation and denoising methods to better understand its effectiveness. This work is supported in part by Spectral Energies LLC, under an SBIR grant from AFRL, as well as the Syracuse University MAE department.

  9. Evaluation of acoustical conditions for speech communication in working elementary school classrooms.

    PubMed

    Sato, Hiroshi; Bradley, John S

    2008-04-01

    Detailed acoustical measurements were made in 41 working elementary school classrooms near Ottawa, Canada to obtain more representative and more accurate indications of the acoustical quality of conditions for speech communication during actual teaching activities. This paper describes the room acoustics characteristics and noise environment of 27 traditional rectangular classrooms from the 41 measured rooms. The purpose of the work was to better understand how to improve speech communication between teachers and students. The study found that, on average, the students experienced teacher speech levels of 60.4 dBA, noise levels of 49.1 dBA, and a mean speech-to-noise ratio of 11 dBA during teaching activities. The mean reverberation time in the occupied classrooms was 0.41 s, which was 10% less than in the unoccupied rooms. The reverberation time measurements were used to determine the average absorption added by each student. Detailed analyses of early- and late-arriving speech sounds showed that these sound levels could be predicted quite accurately, and they suggest improved approaches to room acoustics design.

  10. Vowel Acoustics in Adults with Apraxia of Speech

    ERIC Educational Resources Information Center

    Jacks, Adam; Mathes, Katey A.; Marquardt, Thomas P.

    2010-01-01

    Purpose: To investigate the hypothesis that vowel production is more variable in adults with acquired apraxia of speech (AOS) relative to healthy individuals with unimpaired speech. Vowel formant frequency measures were selected as the specific target of focus. Method: Seven adults with AOS and aphasia produced 15 repetitions of 6 American English…

  11. Precategorical Acoustic Storage and the Perception of Speech

    ERIC Educational Resources Information Center

    Frankish, Clive

    2008-01-01

    Theoretical accounts of both speech perception and of short term memory must consider the extent to which perceptual representations of speech sounds might survive in relatively unprocessed form. This paper describes a novel version of the serial recall task that can be used to explore this area of shared interest. In immediate recall of digit…

  12. Acoustic properties of vowels in clear and conversational speech by female non-native English speakers

    NASA Astrophysics Data System (ADS)

    Li, Chi-Nin; So, Connie K.

    2005-04-01

    Studies have shown that talkers can improve the intelligibility of their speech when instructed to speak as if talking to a hearing-impaired person. The improvement of speech intelligibility is associated with specific acoustic-phonetic changes: increases in vowel duration and fundamental frequency (F0), a wider pitch range, and a shift in formant frequencies for F1 and F2. Most previous studies of clear speech production have been conducted with native speakers; research with second language speakers is much less common. The present study examined the acoustic properties of non-native English vowels produced in a clear speaking style. Five female Cantonese speakers and a comparison group of English speakers were recorded producing four vowels (/i u ae a/) in /bVt/ context in conversational and clear speech. Vowel durations, F0, pitch range, and the first two formants for each of the four vowels were measured. Analyses revealed that, for both groups of speakers, vowel duration, F0, pitch range, and F1 were greater in clear speech than in conversational speech. However, F2 was higher in conversational speech than in clear speech. The findings suggest that female non-native English speakers exhibit acoustic-phonetic patterns similar to those of native speakers when asked to produce English vowels clearly.

  13. Speech privacy and annoyance considerations in the acoustic environment of passenger cars of high-speed trains.

    PubMed

    Jeon, Jin Yong; Hong, Joo Young; Jang, Hyung Suk; Kim, Jae Hyeon

    2015-12-01

    It is necessary to consider not only the annoyance of interior noises but also speech privacy to achieve acoustic comfort in a passenger car of a high-speed train, because speech from other passengers can be annoying. This study aimed to explore an optimal acoustic environment to satisfy speech privacy and reduce annoyance in a passenger car. Two experiments were conducted using speech sources and compartment noise of a high-speed train with varying speech-to-noise ratios (SNRA) and background noise levels (BNL). Speech intelligibility was tested in experiment I, and in experiment II, perceived speech privacy, annoyance, and acoustic comfort of combined sounds with speech and background noise were assessed. The results show that speech privacy and annoyance were significantly influenced by the SNRA. In particular, the acoustic comfort was evaluated as acceptable when the SNRA was less than -6 dB for both speech privacy and noise annoyance. In addition, annoyance increased significantly as the BNL exceeded 63 dBA, whereas the effect of the BNL on speech privacy was not significant. These findings suggest that an optimal level of interior noise in a passenger car might exist between 59 and 63 dBA, taking normal speech levels into account.

  14. Education in acoustics and speech science using vocal-tract models.

    PubMed

    Arai, Takayuki

    2012-03-01

    Several vocal-tract models were reviewed, with special focus given to the sliding vocal-tract model [T. Arai, Acoust. Sci. Technol. 27(6), 384-388 (2006)]. All of the models have been shown to be excellent tools for teaching acoustics and speech science to elementary- through university-level students. The sliding three-tube model is based on Fant's three-tube model [G. Fant, Acoustic Theory of Speech Production (Mouton, The Hague, The Netherlands, 1970)] and consists of a long tube with a slider simulating tongue constriction. In this article, the design of the sliding vocal-tract model was reviewed. Then a science workshop was discussed in which children were asked to make their own sliding vocal-tract models using simple materials. It was also discussed how the sliding vocal-tract model compares to our other vocal-tract models, emphasizing how the model can be used to instruct students at higher levels, such as undergraduate and graduate education in acoustics and speech science. Through this discussion the vocal-tract models were shown to be a powerful tool for education in acoustics and speech science for students of all ages.

  15. Bird population density estimated from acoustic signals

    USGS Publications Warehouse

    Dawson, D.K.; Efford, M.G.

    2009-01-01

    Many animal species are detected primarily by sound. Although songs, calls and other sounds are often used for population assessment, as in bird point counts and hydrophone surveys of cetaceans, there are few rigorous methods for estimating population density from acoustic data. 2. The problem has several parts - distinguishing individuals, adjusting for individuals that are missed, and adjusting for the area sampled. Spatially explicit capture-recapture (SECR) is a statistical methodology that addresses jointly the second and third parts of the problem. We have extended SECR to use uncalibrated information from acoustic signals on the distance to each source. 3. We applied this extension of SECR to data from an acoustic survey of ovenbird Seiurus aurocapilla density in an eastern US deciduous forest with multiple four-microphone arrays. We modelled average power from spectrograms of ovenbird songs measured within a window of 0.7 s duration and frequencies between 4200 and 5200 Hz. 4. The resulting estimates of the density of singing males (0.19 ha-1, SE 0.03 ha-1) were consistent with estimates of the adult male population density from mist-netting (0.36 ha-1, SE 0.12 ha-1). The fitted model predicts sound attenuation of 0.11 dB m-1 (SE 0.01 dB m-1) in excess of losses from spherical spreading. 5. Synthesis and applications. Our method for estimating animal population density from acoustic signals fills a gap in the census methods available for visually cryptic but vocal taxa, including many species of bird and cetacean. The necessary equipment is simple and readily available; as few as two microphones may provide adequate estimates, given spatial replication. The method requires that individuals detected at the same place are acoustically distinguishable and all individuals vocalize during the recording interval, or that the per capita rate of vocalization is known. We believe these requirements can be met, with suitable field methods, for a significant
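
    The attenuation figure in point 4 describes losses beyond spherical spreading. A sketch of the implied received-level model, assuming a reference distance r_0 (the exact parameterization inside the fitted SECR model may differ):

        L(r) = L(r_0) - 20\log_{10}\!\left(\frac{r}{r_0}\right) - \alpha\,(r - r_0), \qquad \alpha \approx 0.11\ \mathrm{dB\,m^{-1}}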

  16. Perceiving speech in context: Compensation for contextual variability during acoustic cue encoding and categorization

    NASA Astrophysics Data System (ADS)

    Toscano, Joseph Christopher

    Several fundamental questions about speech perception concern how listeners understand spoken language despite considerable variability in speech sounds across different contexts (the problem of lack of invariance in speech). This contextual variability is caused by several factors, including differences between individual talkers' voices, variation in speaking rate, and effects of coarticulatory context. A number of models have been proposed to describe how the speech system handles differences across contexts. Critically, these models make different predictions about (1) whether contextual variability is handled at the level of acoustic cue encoding or categorization, (2) whether it is driven by feedback from category-level processes or interactions between cues, and (3) whether listeners discard fine-grained acoustic information to compensate for contextual variability. Separating the effects of cue- and category-level processing has been difficult because behavioral measures tap processes that occur well after initial cue encoding and are influenced by task demands and linguistic information. Recently, we have used the event-related brain potential (ERP) technique to examine cue encoding and online categorization. Specifically, we have looked at differences in the auditory N1 as a measure of acoustic cue encoding and the P3 as a measure of categorization. This allows us to examine multiple levels of processing during speech perception and can provide a useful tool for studying effects of contextual variability. Here, I apply this approach to determine the point in processing at which context has an effect on speech perception and to examine whether acoustic cues are encoded continuously. Several types of contextual variability (talker gender, speaking rate, and coarticulation), as well as several acoustic cues (voice onset time, formant frequencies, and bandwidths), are examined in a series of experiments. The results suggest that (1) at early stages of speech

  17. Acoustic signals generated in inclined granular flows

    NASA Astrophysics Data System (ADS)

    Tan, Danielle S.; Jenkins, James T.; Keast, Stephen C.; Sachse, Wolfgang H.

    2015-10-01

    Spontaneous avalanching in specific deserts produces a low-frequency sound known as "booming." This creates a puzzle, because avalanches down the face of a dune result in collisions between sand grains that occur at much higher frequencies. Reproducing this phenomenon in the laboratory permits a better understanding of the underlying mechanisms for the generation of such lower frequency acoustic emissions, which may also be relevant to other dry granular flows. Here we report measurements of low-frequency acoustical signals, produced by dried "sounding" sand (sand capable of booming in the desert) flowing down an inclined chute. The amplitude of the signal diminishes over time but reappears upon drying of the sand. We show that the presence of this sound in the experiments may provide supporting evidence for a previously published "waveguide" explanation for booming. Also, we propose a model based on kinetic theory for a sheared inclined flow in which the flowing layer exhibits "breathing" modes superimposed on steady shearing. The predicted oscillation frequency is of the same order of magnitude as the measurements, indicating that small perturbations can sustain low-frequency oscillations. However, the frequency is underestimated, which indicates that the stiffness has been underestimated. Also, the model predicts a discrete spectrum of frequencies, instead of the broadband spectrum measured experimentally.

  18. Detection of Obstructive sleep apnea in awake subjects by exploiting body posture effects on the speech signal.

    PubMed

    Kriboy, M; Tarasiuk, A; Zigel, Y

    2014-01-01

    Obstructive sleep apnea (OSA) is a common sleep disorder. OSA is associated with several anatomical and functional abnormalities of the upper airway. It was shown that these abnormalities in the upper airway are also likely to be the reason for an increased rate of apneic events in the supine position. Functional and structural changes in the vocal tract can affect the acoustic properties of speech. We hypothesize that acoustic properties of speech that are affected by body position may aid in distinguishing between OSA and non-OSA patients. We aimed to explore the possibility of differentiating OSA and non-OSA patients by analyzing the acoustic properties of their speech signal in upright sitting and supine positions. Thirty-five awake patients were recorded while pronouncing sustained vowels in the upright sitting and supine positions. Using a linear discriminant analysis (LDA) classifier, an accuracy of 84.6%, sensitivity of 92.7%, and specificity of 80.0% were achieved. This study provides proof of concept that it is possible to screen for OSA by analyzing and comparing speech properties acquired in upright sitting vs. supine positions. An acoustic-based screening system during wakefulness may address the growing need for a reliable OSA screening tool; further studies are needed to support these findings. PMID:25570924
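
    A minimal sketch of how the reported screening metrics could be obtained with a linear discriminant classifier; the features, labels, and cross-validation scheme below are illustrative placeholders (scikit-learn stands in for whatever implementation the authors used), not the study's actual data or pipeline.

        # Toy OSA screening sketch: LDA over posture-related acoustic features.
        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_predict
        from sklearn.metrics import confusion_matrix

        rng = np.random.default_rng(0)

        # 35 subjects x 4 features; each feature is imagined as the difference of an
        # acoustic measure (e.g., a formant shift) between sitting and supine vowels.
        X = rng.normal(size=(35, 4))
        y = np.array([1] * 18 + [0] * 17)        # 1 = OSA, 0 = non-OSA (toy labels)

        clf = LinearDiscriminantAnalysis()
        y_pred = cross_val_predict(clf, X, y, cv=5)

        tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
        print("accuracy   ", (tp + tn) / len(y))
        print("sensitivity", tp / (tp + fn))     # OSA patients correctly flagged
        print("specificity", tn / (tn + fp))     # non-OSA patients correctly passed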

  19. Aging Affects Neural Synchronization to Speech-Related Acoustic Modulations

    PubMed Central

    Goossens, Tine; Vercammen, Charlotte; Wouters, Jan; van Wieringen, Astrid

    2016-01-01

    As people age, speech perception problems become highly prevalent, especially in noisy situations. In addition to peripheral hearing and cognition, temporal processing plays a key role in speech perception. Temporal processing of speech features is mediated by synchronized activity of neural oscillations in the central auditory system. Previous studies indicate that both the degree and hemispheric lateralization of synchronized neural activity relate to speech perception performance. Based on these results, we hypothesize that impaired speech perception in older persons may, in part, originate from deviances in neural synchronization. In this study, auditory steady-state responses that reflect synchronized activity of theta, beta, low and high gamma oscillations (i.e., 4, 20, 40, and 80 Hz ASSR, respectively) were recorded in young, middle-aged, and older persons. As all participants had normal audiometric thresholds and were screened for (mild) cognitive impairment, differences in synchronized neural activity across the three age groups were likely to be attributed to age. Our data yield novel findings regarding theta and high gamma oscillations in the aging auditory system. At an older age, synchronized activity of theta oscillations is increased, whereas high gamma synchronization is decreased. In contrast to young persons who exhibit a right hemispheric dominance for processing of high gamma range modulations, older adults show a symmetrical processing pattern. These age-related changes in neural synchronization may very well underlie the speech perception problems in aging persons. PMID:27378906

  20. Low-frequency speech cues and simulated electric-acoustic hearing

    PubMed Central

    Brown, Christopher A.; Bacon, Sid P.

    2009-01-01

    The addition of low-frequency acoustic information to real or simulated electric stimulation (so-called electric-acoustic stimulation or EAS) often results in large improvements in intelligibility, particularly in competing backgrounds. This may reflect the availability of fundamental frequency (F0) information in the acoustic region. The contributions of F0 and the amplitude envelope (as well as voicing) of speech to simulated EAS was examined by replacing the low-frequency speech with a tone that was modulated in frequency to track the F0 of the speech, in amplitude with the envelope of the low-frequency speech, or both. A four-channel vocoder simulated electric hearing. Significant benefit over vocoder alone was observed with the addition of a tone carrying F0 or envelope cues, and both cues combined typically provided significantly more benefit than either alone. The intelligibility improvement over vocoder was between 24 and 57 percentage points, and was unaffected by the presence of a tone carrying these cues from a background talker. These results confirm the importance of the F0 of target speech for EAS (in simulation). They indicate that significant benefit can be provided by a tone carrying F0 and amplitude envelope cues. The results support a glimpsing account of EAS and argue against segregation. PMID:19275323
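
    A minimal numpy sketch of the three tone carriers described above (an F0-tracking tone, an envelope-modulated fixed-frequency tone, and a tone carrying both cues); the sampling rate, F0 contour, and envelope are synthetic placeholders rather than values derived from the study's speech materials.

        # Toy construction of tone carriers for simulated EAS.
        import numpy as np

        fs = 16000                                    # sample rate (Hz), assumed
        t = np.arange(0, 1.0, 1.0 / fs)

        f0 = 120 + 20 * np.sin(2 * np.pi * 2 * t)     # toy F0 contour (Hz)
        env = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))   # toy low-frequency amplitude envelope

        phase = 2 * np.pi * np.cumsum(f0) / fs        # integrate F0 -> instantaneous phase
        tone_f0_only    = np.sin(phase)               # frequency tracks F0, flat amplitude
        tone_env_only   = env * np.sin(2 * np.pi * 120 * t)  # fixed frequency, speech envelope
        tone_f0_and_env = env * np.sin(phase)         # both cues combined

        # Each tone would then be presented together with the vocoded (simulated
        # electric) channels in place of the original low-frequency acoustic speech.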

  1. Acoustic and Articulatory Features of Diphthong Production: A Speech Clarity Study

    ERIC Educational Resources Information Center

    Tasko, Stephen M.; Greilick, Kristin

    2010-01-01

    Purpose: The purpose of this study was to evaluate how speaking clearly influences selected acoustic and orofacial kinematic measures associated with diphthong production. Method: Forty-nine speakers, drawn from the University of Wisconsin X-Ray Microbeam Speech Production Database (J. R. Westbury, 1994), served as participants. Samples of clear…

  2. Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation.

    PubMed

    Kong, Ying-Yee; Donaldson, Gail; Somarowthu, Ala

    2015-05-01

    Low-frequency acoustic cues have been shown to improve speech perception in cochlear-implant listeners. However, the mechanisms underlying this benefit are still not well understood. This study investigated the extent to which low-frequency cues can facilitate listeners' use of linguistic knowledge in simulated electric-acoustic stimulation (EAS). Experiment 1 examined differences in the magnitude of EAS benefit at the phoneme, word, and sentence levels. Speech materials were processed via noise-channel vocoding and lowpass (LP) filtering. The amount of spectral degradation in the vocoder speech was varied by applying different numbers of vocoder channels. Normal-hearing listeners were tested on vocoder-alone, LP-alone, and vocoder + LP conditions. Experiment 2 further examined factors that underlie the context effect on EAS benefit at the sentence level by limiting the low-frequency cues to temporal envelope and periodicity (AM + FM). Results showed that EAS benefit was greater for higher-context than for lower-context speech materials even when the LP ear received only low-frequency AM + FM cues. Possible explanations for the greater EAS benefit observed with higher-context materials may lie in the interplay between perceptual and expectation-driven processes for EAS speech recognition, and/or the band-importance functions for different types of speech materials. PMID:25994712
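
    As a rough illustration of the noise-channel vocoding described above (band-pass analysis, envelope extraction, envelope-modulated noise carriers summed across channels), here is a minimal sketch; the channel count, frequency range, and filter orders are assumptions for illustration, not the processing parameters used in the study.

        # Toy noise-channel vocoder.
        import numpy as np
        from scipy.signal import butter, filtfilt, hilbert

        def noise_vocode(speech, fs, n_channels=8, f_lo=100.0, f_hi=7000.0):
            edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # log-spaced channel edges
            rng = np.random.default_rng(0)
            out = np.zeros_like(speech)
            for lo, hi in zip(edges[:-1], edges[1:]):
                b, a = butter(3, [lo, hi], btype="bandpass", fs=fs)
                band = filtfilt(b, a, speech)                  # analysis band
                env = np.abs(hilbert(band))                    # Hilbert envelope
                carrier = filtfilt(b, a, rng.standard_normal(len(speech)))  # band-limited noise
                out += env * carrier                           # re-impose the envelope
            return out / np.max(np.abs(out))                   # simple normalization

        # Example: vocode one second of toy "speech" (white noise stands in here).
        fs = 16000
        speech = np.random.default_rng(1).standard_normal(fs)
        vocoded = noise_vocode(speech, fs, n_channels=8)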

  3. Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation

    PubMed Central

    Kong, Ying-Yee; Donaldson, Gail; Somarowthu, Ala

    2015-01-01

    Low-frequency acoustic cues have been shown to improve speech perception in cochlear-implant listeners. However, the mechanisms underlying this benefit are still not well understood. This study investigated the extent to which low-frequency cues can facilitate listeners' use of linguistic knowledge in simulated electric-acoustic stimulation (EAS). Experiment 1 examined differences in the magnitude of EAS benefit at the phoneme, word, and sentence levels. Speech materials were processed via noise-channel vocoding and lowpass (LP) filtering. The amount of spectral degradation in the vocoder speech was varied by applying different numbers of vocoder channels. Normal-hearing listeners were tested on vocoder-alone, LP-alone, and vocoder + LP conditions. Experiment 2 further examined factors that underlie the context effect on EAS benefit at the sentence level by limiting the low-frequency cues to temporal envelope and periodicity (AM + FM). Results showed that EAS benefit was greater for higher-context than for lower-context speech materials even when the LP ear received only low-frequency AM + FM cues. Possible explanations for the greater EAS benefit observed with higher-context materials may lie in the interplay between perceptual and expectation-driven processes for EAS speech recognition, and/or the band-importance functions for different types of speech materials. PMID:25994712

  4. Contributions of electric and acoustic hearing to bimodal speech and music perception.

    PubMed

    Crew, Joseph D; Galvin, John J; Landsberger, David M; Fu, Qian-Jie

    2015-01-01

    Cochlear implant (CI) users have difficulty understanding speech in noisy listening conditions and perceiving music. Aided residual acoustic hearing in the contralateral ear can mitigate these limitations. The present study examined contributions of electric and acoustic hearing to speech understanding in noise and melodic pitch perception. Data was collected with the CI only, the hearing aid (HA) only, and both devices together (CI+HA). Speech reception thresholds (SRTs) were adaptively measured for simple sentences in speech babble. Melodic contour identification (MCI) was measured with and without a masker instrument; the fundamental frequency of the masker was varied to be overlapping or non-overlapping with the target contour. Results showed that the CI contributes primarily to bimodal speech perception and that the HA contributes primarily to bimodal melodic pitch perception. In general, CI+HA performance was slightly improved relative to the better ear alone (CI-only) for SRTs but not for MCI, with some subjects experiencing a decrease in bimodal MCI performance relative to the better ear alone (HA-only). Individual performance was highly variable, and the contribution of either device to bimodal perception was both subject- and task-dependent. The results suggest that individualized mapping of CIs and HAs may further improve bimodal speech and music perception. PMID:25790349

  5. Primary Progressive Apraxia of Speech: Clinical Features and Acoustic and Neurologic Correlates

    PubMed Central

    Strand, Edythe A.; Clark, Heather; Machulda, Mary; Whitwell, Jennifer L.; Josephs, Keith A.

    2015-01-01

    Purpose This study summarizes 2 illustrative cases of a neurodegenerative speech disorder, primary progressive apraxia of speech (AOS), as a vehicle for providing an overview of the disorder and an approach to describing and quantifying its perceptual features and some of its temporal acoustic attributes. Method Two individuals with primary progressive AOS underwent speech-language and neurologic evaluations on 2 occasions, ranging from 2.0 to 7.5 years postonset. Performance on several tests, tasks, and rating scales, as well as several acoustic measures, was compared over time within and between cases. Acoustic measures were compared with the performance of control speakers. Results Both patients initially presented with AOS as the only or predominant sign of disease and without aphasia or dysarthria. The presenting features and temporal progression were captured in an AOS Rating Scale, an Articulation Error Score, and temporal acoustic measures of utterance duration, syllable rates per second, rates of speechlike alternating motion and sequential motion, and a pairwise variability index measure. Conclusions AOS can be the predominant manifestation of neurodegenerative disease. Clinical ratings of its attributes and acoustic measures of some of its temporal characteristics can support its diagnosis and help quantify its salient characteristics and progression over time. PMID:25654422

  6. Acoustic signal detection of manatee calls

    NASA Astrophysics Data System (ADS)

    Niezrecki, Christopher; Phillips, Richard; Meyer, Michael; Beusse, Diedrich O.

    2003-04-01

    The West Indian manatee (Trichechus manatus latirostris) has become endangered partly because of a growing number of collisions with boats. A system that can warn boaters that manatees are present in the immediate vicinity could potentially reduce these collisions. In order to identify the presence of manatees, acoustic methods are employed. In this paper, three different detection algorithms are used to detect the calls of the West Indian manatee. The detection systems are tested in the laboratory using simulated manatee vocalizations from an audio compact disc. The detection method that provides the best overall performance is able to correctly identify approximately 96% of the manatee vocalizations; however, the system also yields a false positive rate of approximately 16%. The results of this work may ultimately lead to the development of a warning system that alerts boaters to the presence of manatees.

  7. The advantages of sound localization and speech perception of bilateral electric acoustic stimulation

    PubMed Central

    Moteki, Hideaki; Kitoh, Ryosuke; Tsukada, Keita; Iwasaki, Satoshi; Nishio, Shin-Ya

    2015-01-01

    Conclusion: Bilateral electric acoustic stimulation (EAS) effectively improved speech perception in noise and sound localization in patients with high-frequency hearing loss. Objective: To evaluate the efficacy of bilateral EAS for sound localization and speech perception in noise in two cases of high-frequency hearing loss. Methods: Two female patients, aged 38 and 45 years, respectively, received bilateral EAS sequentially. Pure-tone audiometry was performed preoperatively and postoperatively to evaluate hearing preservation in the lower frequencies. Speech perception outcomes in quiet and noise and sound localization were assessed with unilateral and bilateral EAS. Results: Residual hearing in the lower frequencies was well preserved after insertion of a FLEX24 electrode (24 mm) using the round window approach. After bilateral EAS, speech perception improved in quiet and even more so in noise. In addition, the sound localization ability of both cases with bilateral EAS improved remarkably. PMID:25423260

  8. Compression and its effect on the speech signal.

    PubMed

    Verschuure, J; Maas, A J; Stikvoort, E; de Jong, R M; Goedegebure, A; Dreschler, W A

    1996-04-01

    Compression systems are often used in hearing aids to increase wearing comfort. A patient has to readjust the gain of a linear hearing aid frequently because of the limited dynamic hearing range and changing acoustical conditions. A great deal of attention has been given to the static parameters of compression but very little to the dynamic parameters. We present a general method to describe the dynamic behavior of a compression system by comparing modulations at the output with modulations at the input. The method yields a single parameter describing the temporal characteristics of a compressor, the cut-off modulation frequency. In this paper its value is compared with known properties of running speech. A limitation of the method is its use of only small modulation depths, and the consequence of this limitation is tested. The method is applied to an experimental digital compressor developed by the authors, and the effects of some temporal parameters such as attack and release time are studied; it reveals the rather large effects that some of these parameters have on the effectiveness of a compressor for speech. The method is also used to analyze two generally accepted compression systems in hearing aids. The theoretical method is then compared to the effects of compression on the distribution of the amplitude envelope of running speech, and it is shown that single-channel compression systems do not reduce the distribution width of speech filtered in frequency bands. This finding questions the use of compression systems for fitting the speech banana into the dynamic hearing range of impaired listeners.
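
    A minimal sketch of the modulation-comparison idea described above: a weakly amplitude-modulated tone is passed through a simple compressor, and the ratio of output to input modulation depth is read off as a function of modulation frequency. The compressor model (static ratio with one-pole attack/release smoothing) and all settings below are generic stand-ins, not the authors' experimental digital compressor.

        # Toy modulation-transfer measurement for a simple compressor.
        import numpy as np

        fs = 16000
        ratio, attack_s, release_s = 3.0, 0.005, 0.050        # assumed compressor settings

        def compress(x):
            a_att = np.exp(-1.0 / (attack_s * fs))
            a_rel = np.exp(-1.0 / (release_s * fs))
            env = np.empty_like(x)
            level = 1e-6
            for i, v in enumerate(np.abs(x)):                 # attack/release envelope follower
                a = a_att if v > level else a_rel
                level = a * level + (1.0 - a) * v
                env[i] = level
            gain = (env / np.mean(env)) ** (1.0 / ratio - 1.0)  # static compression rule
            return x * gain

        def mod_depth(env):
            return (env.max() - env.min()) / (env.max() + env.min())

        m_in = 0.2                                            # small input modulation depth
        t = np.arange(0, 2.0, 1.0 / fs)
        for f_mod in (1, 2, 4, 8, 16, 32):
            x = (1 + m_in * np.sin(2 * np.pi * f_mod * t)) * np.sin(2 * np.pi * 1000 * t)
            y = compress(x)
            # crude output envelope: rectify, smooth, discard onset/offset transients
            env_out = np.convolve(np.abs(y), np.ones(160) / 160, mode="same")[fs // 2:-fs // 2]
            print(f"f_mod = {f_mod:>2d} Hz   modulation transfer ~ {mod_depth(env_out) / m_in:.2f}")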

  9. Removal of noise from noise-degraded speech signals

    NASA Astrophysics Data System (ADS)

    1989-06-01

    Techniques for the removal of noise from noise-degraded speech signals were reviewed and evaluated, with special emphasis on live radio and telephone communications and the extraction of information from similar noisy recordings. The related area of speech-enhancement devices for hearing-impaired people was also reviewed. Evaluation techniques were examined to determine their suitability, particularly for assessing changes in the performance of workers who might use noise-reduction equipment on a daily basis in the applications cited above. The main conclusion was that noise-reduction methods may be useful in improving the performance of human operators who extract information from noisy speech material, despite the lack of improvement found when conventional closed-response intelligibility tests were used to assess those methods.
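
    The report surveys noise-reduction methods in general rather than a single algorithm. As a concrete illustration of the class of techniques reviewed, the following is a minimal spectral-subtraction sketch (a classic single-channel approach, not the report's own method), with all parameters chosen arbitrarily and a noise-only lead-in assumed for estimating the noise spectrum.

        # Toy spectral subtraction: estimate the noise magnitude spectrum from the
        # first noise-only frames, subtract it frame by frame, keep a spectral floor.
        import numpy as np
        from scipy.signal import stft, istft

        def spectral_subtract(noisy, fs, noise_seconds=0.25, floor=0.05):
            f, t, Z = stft(noisy, fs=fs, nperseg=512)
            mag, phase = np.abs(Z), np.angle(Z)
            n_noise_frames = max(1, int(noise_seconds * fs / 256))   # hop = 256 (50% overlap)
            noise_mag = mag[:, :n_noise_frames].mean(axis=1, keepdims=True)
            clean_mag = np.maximum(mag - noise_mag, floor * mag)     # subtract, apply floor
            _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=512)
            return enhanced

        # Toy usage: a tone buried in white noise, preceded by a noise-only lead-in.
        fs = 8000
        t = np.arange(0, 2.0, 1.0 / fs)
        rng = np.random.default_rng(0)
        noisy = np.concatenate([rng.normal(0, 1, fs // 2),
                                np.sin(2 * np.pi * 440 * t) + rng.normal(0, 1, len(t))])
        enhanced = spectral_subtract(noisy, fs)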

  10. Vowel Acoustics in Dysarthria: Speech Disorder Diagnosis and Classification

    ERIC Educational Resources Information Center

    Lansford, Kaitlin L.; Liss, Julie M.

    2014-01-01

    Purpose: The purpose of this study was to determine the extent to which vowel metrics are capable of distinguishing healthy from dysarthric speech and among different forms of dysarthria. Method: A variety of vowel metrics were derived from spectral and temporal measurements of vowel tokens embedded in phrases produced by 45 speakers with…

  11. A model that predicts the binaural advantage to speech intelligibility from the mixed target and interferer signals.

    PubMed

    Cosentino, Stefano; Marquardt, Torsten; McAlpine, David; Culling, John F; Falk, Tiago H

    2014-02-01

    A model is presented that predicts the binaural advantage to speech intelligibility by analyzing the right and left recordings at the two ears containing mixed target and interferer signals. This auditory-inspired model implements an equalization-cancellation stage to predict the binaural unmasking (BU) component, in conjunction with a modulation-frequency estimation block to estimate the "better ear" effect (BE) component of the binaural advantage. The model's performance was compared to experimental data obtained under anechoic and reverberant conditions using a single speech-shaped noise interferer paradigm. The internal BU and BE components were compared to those of the speech intelligibility model recently proposed by Lavandier et al. [J. Acoust. Soc. Am. 131, 218-231 (2012)], which requires separate inputs for target and interferer. The data indicate that the proposed model provides comparably good predictions from a mixed-signals input under both anechoic and reverberant conditions.

  12. A novel radar sensor for the non-contact detection of speech signals.

    PubMed

    Jiao, Mingke; Lu, Guohua; Jing, Xijing; Li, Sheng; Li, Yanfeng; Wang, Jianqi

    2010-01-01

    Various speech detection sensors have been developed over the years, but they are limited by the loss of high-frequency speech energy and by restricted non-contact detection owing to their lack of penetrability. This paper proposes a novel millimeter-wave radar sensor to detect speech signals. The use of a high operating frequency and a superheterodyne receiver gives the radar sensor high sensitivity to small sound vibrations. In addition, the penetrability of microwaves allows the sensor to detect speech signals through nonmetal barriers. Results show that the sensor can detect high-frequency speech energy and that the speech quality is comparable to traditional microphone speech. Moreover, the sensor can detect speech signals through a nonmetal material of a certain thickness between the sensor and the subject. Thus, the novel speech sensor extends traditional speech detection techniques and offers an exciting alternative with broader application prospects.

  13. Contribution of Consonant Landmarks to Speech Recognition in Simulated Acoustic-Electric Hearing

    PubMed Central

    Chen, Fei; Loizou, Philipos C.

    2009-01-01

    Objectives The purpose of this study is to assess the contribution of information provided by obstruent consonants (e.g. stops, fricatives) to speech intelligibility in simulated acoustic-electric hearing. As a secondary objective, this study examines the performance of an objective measure that can potentially be used for predicting the intelligibility of vocoded speech. Design Noise-corrupted sentences are used in Exp. 1, in which the noise-corrupted obstruent consonants are replaced with clean obstruent consonants while leaving the sonorant sounds (vowels, semivowels and nasals) corrupted. In one condition, listeners have access only to the low-frequency (<600 Hz) acoustic portion of the clean consonant spectra; in another condition, listeners have access only to the higher-frequency (>600 Hz) vocoded portion of the clean consonant spectra; and in the third condition, they have access to both. In Exp. 2, we investigate a speech-coding strategy that selectively attenuates the low-frequency portion of the consonant spectra while leaving the vocoded portion corrupted by noise. Finally, using the data collected from Exps. 1 and 2, we evaluate the performance of an objective measure in terms of predicting the intelligibility of vocoded speech. This measure was originally designed to predict speech quality and has never been evaluated with vocoded speech. Results Significant improvements (about 30 percentage points) in intelligibility were noted in Exp. 1 in steady and two-talker masker conditions when the listeners had access to the clean obstruent consonants in both the acoustic and vocoded portions of the spectrum. The improvement was more evident at the low SNR levels (−5 and 0 dB). Further analysis indicated that it was access to the vocoded portion of the consonant spectra, rather than access to the low-frequency acoustic portion of the consonant spectra, that contributed the most to the large improvements in performance. In Exp. 2, a small (14 percentage points

  14. Effects of deafness on acoustic characteristics of American English tense/lax vowels in maternal speech to infants

    PubMed Central

    Kondaurova, Maria V.; Bergeson, Tonya R.; Dilley, Laura C.

    2012-01-01

    Recent studies have demonstrated that mothers exaggerate phonetic properties of infant-directed (ID) speech. However, these studies focused on a single acoustic dimension (frequency), whereas speech sounds are composed of multiple acoustic cues. Moreover, little is known about how mothers adjust phonetic properties of speech to children with hearing loss. This study examined mothers’ production of frequency and duration cues to the American English tense/lax vowel contrast in speech to profoundly deaf (N = 14) and normal-hearing (N = 14) infants, and to an adult experimenter. First and second formant frequencies and vowel duration of tense (/i/, /u/) and lax (/I/, /ʊ/) vowels were measured. Results demonstrated that for both infant groups mothers hyperarticulated the acoustic vowel space and increased vowel duration in ID speech relative to adult-directed speech. Mean F2 values were decreased for the /u/ vowel and increased for the /I/ vowel, and vowel duration was longer for the /i/, /u/, and /I/ vowels in ID speech. However, neither acoustic cue differed in speech to hearing-impaired versus normal-hearing infants. These results suggest that both the formant frequencies and the vowel duration that differentiate American English tense/lax vowel contrasts are modified in ID speech regardless of the hearing status of the addressee. PMID:22894224

  15. A Chimpanzee Recognizes Synthetic Speech With Significantly Reduced Acoustic Cues to Phonetic Content

    PubMed Central

    Heimbauer, Lisa A.; Beran, Michael J.; Owren, Michael J.

    2011-01-01

    Summary A long-standing debate concerns whether humans are specialized for speech perception [1–7], which some researchers argue is demonstrated by the ability to understand synthetic speech with significantly reduced acoustic cues to phonetic content [2–4,7]. We tested a chimpanzee (Pan troglodytes) that recognizes 128 spoken words [8,9], asking whether she could understand such speech. Three experiments presented 48 individual words, with the animal selecting a corresponding visuo-graphic symbol from among four alternatives. Experiment 1 tested spectrally reduced, noise-vocoded (NV) synthesis, originally developed to simulate input received by human cochlear-implant users [10]. Experiment 2 tested “impossibly unspeechlike” [3] sine-wave (SW) synthesis, which reduces speech to just three moving tones [11]. Although receiving only intermittent and non-contingent reward, the chimpanzee performed well above chance level, including when hearing synthetic versions for the first time. Recognition of SW words was least accurate, but improved in Experiment 3 when natural words in the same session were rewarded. The chimpanzee was more accurate with NV than SW versions, as were 32 human participants hearing these items. The chimpanzee's ability to spontaneously recognize acoustically reduced synthetic words suggests that experience rather than specialization is critical for speech-perception capabilities that some have suggested are uniquely human [12–14]. PMID:21723125

  16. Formant Centralization Ratio (FCR): A proposal for a new acoustic measure of dysarthric speech

    PubMed Central

    Sapir, Shimon; Ramig, Lorraine O.; Spielman, Jennifer L.; Fox, Cynthia

    2009-01-01

    Background and Aims The vowel space area (VSA) has been used as an acoustic metric of dysarthric speech, but with varying degrees of success. Here we test an alternative metric -- Formant centralization ratio (FCR) -- that is hypothesized to more effectively differentiate dysarthric from healthy speech and register treatment effects. Methods Speech recordings of 38 individuals with idiopathic Parkinson disease (IPD) and dysarthria (19 of whom received one month of intensive speech therapy (LSVT® LOUD)) and 14 healthy controls were acoustically analyzed. Vowels were extracted from short phrases. The same vowel-formant elements were used to construct the FCR, expressed as (F2u+F2a+F1i+F1u)/(F2i+F1a), the VSA, expressed as ABS((F1i*(F2a-F2u)+F1a*(F2u-F2i)+F1u*(F2i-F2a))/2), a logarithmically scaled version of the VSA (LnVSA), and the F2i/F2u ratio. Results Unlike the VSA and the LnVSA, the FCR and F2i/F2u robustly differentiated dysarthric from healthy speech and were not gender-sensitive. All metrics effectively registered treatment effects and were strongly correlated with each other. Conclusions Albeit preliminary, the present findings indicate that the FCR is a sensitive, valid and reliable acoustic metric for distinguishing dysarthric from normal speech and for monitoring treatment effects, probably so because of reduced sensitivity to inter-speaker variability and enhanced sensitivity to vowel centralization. PMID:19948755
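
    The metrics defined above translate directly into code; a minimal sketch using illustrative formant values (in Hz) rather than the study's measurements, where F1x/F2x denote the first and second formants of the corner vowels /i/, /u/, and /a/.

        import math

        def fcr(F1i, F1u, F1a, F2i, F2u, F2a):
            # Formant centralization ratio, as defined in the abstract
            return (F2u + F2a + F1i + F1u) / (F2i + F1a)

        def vsa(F1i, F1u, F1a, F2i, F2u, F2a):
            # Triangular vowel space area, as defined in the abstract
            return abs((F1i * (F2a - F2u) + F1a * (F2u - F2i) + F1u * (F2i - F2a)) / 2)

        # Illustrative formant values (Hz), not taken from the paper
        f = dict(F1i=300, F1u=350, F1a=750, F2i=2300, F2u=900, F2a=1300)
        print("FCR     =", round(fcr(**f), 3))
        print("VSA     =", round(vsa(**f), 1))
        print("LnVSA   =", round(math.log(vsa(**f)), 3))
        print("F2i/F2u =", round(f["F2i"] / f["F2u"], 3))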

  17. Subauditory Speech Recognition based on EMG/EPG Signals

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles; Lee, Diana Dee; Agabon, Shane; Lau, Sonie (Technical Monitor)

    2003-01-01

    Sub-vocal electromyogram/electropalatogram (EMG/EPG) signal classification is demonstrated as a method for silent speech recognition. Recorded electrode signals from the larynx and sublingual areas below the jaw are noise filtered and transformed into features using complex dual quad tree wavelet transforms. Feature sets for six sub-vocally pronounced words are trained using a trust region scaled conjugate gradient neural network. Real-time signals for previously unseen patterns are classified into categories suitable for primitive control of graphic objects. Feature construction, recognition accuracy, and an approach for extending the technique to a variety of real-world application areas are presented.
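
    A rough sketch of the pipeline outlined above (electrode signals, wavelet-based features, neural-network classification over six word classes); here a standard discrete wavelet decomposition (PyWavelets) and a scikit-learn multilayer perceptron stand in for the paper's complex dual quad tree wavelet transforms and trust-region scaled conjugate gradient network, and the signals themselves are synthetic placeholders.

        # Toy stand-in pipeline: wavelet band energies per channel, then an MLP classifier.
        import numpy as np
        import pywt
        from sklearn.neural_network import MLPClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        fs, n_words, n_trials, n_channels = 2000, 6, 30, 2    # all values assumed

        def wavelet_features(sig):
            coeffs = pywt.wavedec(sig, "db4", level=4)         # discrete wavelet decomposition
            return np.array([np.log(np.sum(c ** 2) + 1e-12) for c in coeffs])  # log band energies

        # Synthetic "recordings": one second per channel per trial, with a weak
        # word-dependent component buried in noise.
        X, y = [], []
        for word in range(n_words):
            for _ in range(n_trials):
                feats = []
                for ch in range(n_channels):
                    sig = rng.normal(size=fs) + (0.5 + 0.3 * word) * np.sin(
                        2 * np.pi * (5 + word) * np.arange(fs) / fs)
                    feats.append(wavelet_features(sig))
                X.append(np.concatenate(feats))
                y.append(word)
        X, y = np.array(X), np.array(y)

        clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
        print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())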

  18. Acoustic signalling reflects personality in a social mammal

    PubMed Central

    Friel, Mary; Kunc, Hansjoerg P.; Griffin, Kym; Asher, Lucy; Collins, Lisa M.

    2016-01-01

    Social interactions among individuals are often mediated through acoustic signals. If acoustic signals are consistent and related to an individual's personality, these consistent individual differences in signalling may be an important driver in social interactions. However, few studies in non-human mammals have investigated the relationship between acoustic signalling and personality. Here we show that acoustic signalling rate is repeatable and strongly related to personality in a highly social mammal, the domestic pig (Sus scrofa domestica). Furthermore, acoustic signalling varied between environments of differing quality, with males from a poor-quality environment having a reduced vocalization rate compared with females and males from an enriched environment. Such differences may be mediated by personality with pigs from a poor-quality environment having more reactive and more extreme personality scores compared with pigs from an enriched environment. Our results add to the evidence that acoustic signalling reflects personality in a non-human mammal. Signals reflecting personalities may have far reaching consequences in shaping the evolution of social behaviours as acoustic communication forms an integral part of animal societies. PMID:27429775

  19. Acoustic signalling reflects personality in a social mammal.

    PubMed

    Friel, Mary; Kunc, Hansjoerg P; Griffin, Kym; Asher, Lucy; Collins, Lisa M

    2016-06-01

    Social interactions among individuals are often mediated through acoustic signals. If acoustic signals are consistent and related to an individual's personality, these consistent individual differences in signalling may be an important driver in social interactions. However, few studies in non-human mammals have investigated the relationship between acoustic signalling and personality. Here we show that acoustic signalling rate is repeatable and strongly related to personality in a highly social mammal, the domestic pig (Sus scrofa domestica). Furthermore, acoustic signalling varied between environments of differing quality, with males from a poor-quality environment having a reduced vocalization rate compared with females and males from an enriched environment. Such differences may be mediated by personality with pigs from a poor-quality environment having more reactive and more extreme personality scores compared with pigs from an enriched environment. Our results add to the evidence that acoustic signalling reflects personality in a non-human mammal. Signals reflecting personalities may have far reaching consequences in shaping the evolution of social behaviours as acoustic communication forms an integral part of animal societies. PMID:27429775

  20. From prosodic structure to acoustic saliency: A fMRI investigation of speech rate, clarity, and emphasis

    NASA Astrophysics Data System (ADS)

    Golfinopoulos, Elisa

    Acoustic variability in fluent speech can arise at many stages in speech production planning and execution. For example, at the phonological encoding stage, the grouping of phonemes into syllables determines which segments are coarticulated and, by consequence, segment-level acoustic variation. Likewise phonetic encoding, which determines the spatiotemporal extent of articulatory gestures, will affect the acoustic detail of segments. Functional magnetic resonance imaging (fMRI) was used to measure brain activity of fluent adult speakers in four speaking conditions: fast, normal, clear, and emphatic (or stressed) speech. These speech manner changes typically result in acoustic variations that do not change the lexical or semantic identity of productions but do affect the acoustic saliency of phonemes, syllables and/or words. Acoustic responses recorded inside the scanner were assessed quantitatively using eight acoustic measures and sentence duration was used as a covariate of non-interest in the neuroimaging analysis. Compared to normal speech, emphatic speech was characterized acoustically by a greater difference between stressed and unstressed vowels in intensity, duration, and fundamental frequency, and neurally by increased activity in right middle premotor cortex and supplementary motor area, and bilateral primary sensorimotor cortex. These findings are consistent with right-lateralized motor planning of prosodic variation in emphatic speech. Clear speech involved an increase in average vowel and sentence durations and average vowel spacing, along with increased activity in left middle premotor cortex and bilateral primary sensorimotor cortex. These findings are consistent with an increased reliance on feedforward control, resulting in hyper-articulation, under clear as compared to normal speech. Fast speech was characterized acoustically by reduced sentence duration and average vowel spacing, and neurally by increased activity in left anterior frontal

  1. The effects of spectral and temporal parameters on perceived confirmation of an auditory non-speech signal.

    PubMed

    Bodendörfer, Xaver; Kortekaas, Reinier; Weingarten, Markus; Schlittmeier, Sabine

    2015-08-01

    In human-machine interactions, the confirmation of an action or input is a very important information for users. A paired comparison experiment explored the effects of four acoustic parameters on the perceived confirmation of auditory non-speech signals. Reducing the frequency-ratio and the pulse-to-pulse time between two successive pulses increased perceived confirmation. The effects of the parameters frequency and number of pulses were not clear-cut. The results provide information for designing auditory confirmation signals. It is shown that findings about the effects of certain parameters on the perceived urgency of warning signals cannot be easily inverted to perceived confirmation. PMID:26328737

  2. Knowledge and attitudes of teachers regarding the impact of classroom acoustics on speech perception and learning.

    PubMed

    Ramma, Lebogang

    2009-01-01

    This study investigated the knowledge and attitudes of primary school teachers regarding the impact of poor classroom acoustics on learners' speech perception and learning in class. Classrooms with excessive background noise and reflective surfaces can be a barrier to learning, and it is important that teachers are aware of this. There are currently limited research data on teachers' knowledge of classroom acoustics. Seventy teachers from three Johannesburg primary schools participated in this study. A structured, self-administered questionnaire was the primary data collection method. The findings showed that most of the participants did not have adequate knowledge of classroom acoustics, and most were also unaware of the impact that classrooms with poor acoustic environments can have on speech perception and learning. These results are discussed in relation to the practical implications of empowering teachers to manage the acoustic environment of their classrooms, the limitations of the study, and implications for future research.

  3. 4D time-frequency representation for binaural speech signal processing

    NASA Astrophysics Data System (ADS)

    Mikhael, Raed; Szu, Harold H.

    2006-04-01

    Hearing is the ability to detect and process auditory information, conveyed from the vibrating hair cells in the organ of Corti of each ear to the auditory cortex of the brain via the auditory nerve. The primary and secondary auditory cortices interact with one another to distinguish and correlate the received information across the varying spectrum of arriving frequencies. Binaural hearing is nature's way of employing the power inherent in working in pairs to process information, enhance sound perception, and reduce undesired noise. One ear might play a prominent role in sound recognition, while the other reinforces their perceived mutual information. Developing binaural hearing aid devices can be crucial in emulating the working powers of two ears and may be a step closer to significantly alleviating hearing loss of the inner ear. This can be accomplished by combining current speech research with already existing technologies such as RF communication between PDAs and Bluetooth. The Ear Level Instrument (ELI) developed by Micro-tech Hearing Instruments and Starkey Laboratories is a good example of digital bi-directional communication between a PDA/mobile phone and a Bluetooth device. The agreement and disagreement of auditory information arriving at the Bluetooth device can be classified as sound and noise, respectively. By finding common features of arriving sound using a four-coordinate system for sound analysis (a four-dimensional time-frequency representation), noise can be greatly reduced and hearing aids would become more efficient. Techniques developed by Szu within an Artificial Neural Network (ANN), Blind Source Separation (BSS), Adaptive Wavelets Transform (AWT), and Independent Component Analysis (ICA) hold many possibilities for the improvement of acoustic segmentation of phonemes, all of which will be discussed in this paper. The transmitted and perceived acoustic speech signal will improve, as the binaural hearing aid will emulate two ears in sound

  4. An eighth-scale speech source for subjective assessments in acoustic models

    NASA Astrophysics Data System (ADS)

    Orlowski, R. J.

    1981-08-01

    The design of a source is described which is suitable for making speech recordings in eighth-scale acoustic models of auditoria. An attempt was made to match the directionality of the source with the directionality of the human voice using data reported in the literature. A narrow aperture was required for the design which was provided by mounting an inverted conical horn over the diaphragm of a high frequency loudspeaker. Resonance problems were encountered with the use of a horn and a description is given of the electronic techniques adopted to minimize the effect of these resonances. Subjective and objective assessments on the completed speech source have proved satisfactory. It has been used in a modelling exercise concerned with the acoustic design of a theatre with a thrust-type stage.

  5. Acoustic-Phonetic Differences between Infant- and Adult-Directed Speech: The Role of Stress and Utterance Position

    ERIC Educational Resources Information Center

    Wang, Yuanyuan; Seidl, Amanda; Cristia, Alejandrina

    2015-01-01

    Previous studies have shown that infant-directed speech (IDS) differs from adult-directed speech (ADS) on a variety of dimensions. The aim of the current study was to investigate whether acoustic differences between IDS and ADS in English are modulated by prosodic structure. We compared vowels across the two registers (IDS, ADS) in both stressed…

  6. Speech after Radial Forearm Free Flap Reconstruction of the Tongue: A Longitudinal Acoustic Study of Vowel and Diphthong Sounds

    ERIC Educational Resources Information Center

    Laaksonen, Juha-Pertti; Rieger, Jana; Happonen, Risto-Pekka; Harris, Jeffrey; Seikaly, Hadi

    2010-01-01

    The purpose of this study was to use acoustic analyses to describe speech outcomes over the course of 1 year after radial forearm free flap (RFFF) reconstruction of the tongue. Eighteen Canadian English-speaking females and males with reconstruction for oral cancer had speech samples recorded (pre-operative, and 1 month, 6 months, and 1 year…

  7. Interpretation of acoustic signals from fluidized beds

    SciTech Connect

    Halow, J.S.; Daw, C.S.; Finney, C.E.A.; Nguyen, K.

    1996-12-31

    Rhythmic "whooshing" sounds associated with rising bubbles are a characteristic feature of many fluidized beds. Although clearly distinguishable to the ear, these sounds are rather complicated in detail and seem to contain a large background of apparently irrelevant stochastic noise. While it is clear that these sounds contain some information about bed dynamics, it is not obvious how this information can be interpreted in a meaningful way. In this presentation we describe a technique for processing bed sounds that appears to work well for beds with large particles operating in a slugging or near-slugging mode. We find that our processing algorithm allows us to determine important bubble/slug features from sound measurements alone, including slug location at any point in time, the average bubble frequency and frequency variation, and corresponding dynamic pressure drops at different bed locations. We also have been able to correlate a portion of the acoustic signal with particle impacts on surfaces and particle motions near the grid. We conclude from our observations that relatively simple sound measurements can provide much diagnostic information and could potentially be used for bed control. 5 refs., 4 figs.

  8. Thirty years of underwater acoustic signal processing in China

    NASA Astrophysics Data System (ADS)

    Li, Qihu

    2012-11-01

    Advances in technology and theory over 30 years of underwater acoustic signal processing and its applications in China are presented in this paper. The topics include research work in the fields of underwater acoustic signal modeling, acoustic field matching, ocean waveguides and internal waves, extraction and processing techniques for acoustic vector signal information, the space/time correlation characteristics of low-frequency acoustic channels, the invariant features of underwater target radiated noise, and the transmission technology of underwater voice/image data and its anti-interference techniques. Some frontier technologies in sonar design are also discussed, including large-aperture towed line array sonar, high-resolution synthetic aperture sonar, deep-sea sirens and deep-sea manned subsea vehicles, diver detection sonar, and a demonstration project of the national ocean monitoring system in China.

  9. Are you a good mimic? Neuro-acoustic signatures for speech imitation ability.

    PubMed

    Reiterer, Susanne M; Hu, Xiaochen; Sumathi, T A; Singh, Nandini C

    2013-01-01

    We investigated individual differences in speech imitation ability in late bilinguals using a neuro-acoustic approach. One hundred and thirty-eight German-English bilinguals matched on various behavioral measures were tested for "speech imitation ability" in a foreign language, Hindi, and categorized into "high" and "low ability" groups. Brain activations and speech recordings were obtained from 26 participants from the two extreme groups as they performed a functional neuroimaging experiment which required them to "imitate" sentences in three conditions: (A) German, (B) English, and (C) German with fake English accent. We used a recently developed acoustic analysis, the "articulation space," as a metric to compare the speech imitation abilities of the two groups. Across all three conditions, direct comparisons between the two groups revealed brain activations (FWE corrected, p < 0.05) that were more widespread, with significantly higher peak activity in the left supramarginal gyrus and postcentral areas, for the low ability group. The high ability group, on the other hand, showed significantly larger articulation space in all three conditions. In addition, articulation space also correlated positively with imitation ability (Pearson's r = 0.7, p < 0.01). Our results suggest that an expanded articulation space for high ability individuals allows access to a larger repertoire of sounds, thereby providing skilled imitators greater flexibility in pronunciation and language learning. PMID:24155739

  10. Are you a good mimic? Neuro-acoustic signatures for speech imitation ability

    PubMed Central

    Reiterer, Susanne M.; Hu, Xiaochen; Sumathi, T. A.; Singh, Nandini C.

    2013-01-01

    We investigated individual differences in speech imitation ability in late bilinguals using a neuro-acoustic approach. One hundred and thirty-eight German-English bilinguals matched on various behavioral measures were tested for “speech imitation ability” in a foreign language, Hindi, and categorized into “high” and “low ability” groups. Brain activations and speech recordings were obtained from 26 participants from the two extreme groups as they performed a functional neuroimaging experiment which required them to “imitate” sentences in three conditions: (A) German, (B) English, and (C) German with a fake English accent. We used a recently developed acoustic analysis, the “articulation space”, as a metric to compare the speech imitation abilities of the two groups. Across all three conditions, direct comparisons between the two groups revealed brain activations (FWE corrected, p < 0.05) that were more widespread, with significantly higher peak activity in the left supramarginal gyrus and postcentral areas, for the low ability group. The high ability group, on the other hand, showed a significantly larger articulation space in all three conditions. In addition, articulation space also correlated positively with imitation ability (Pearson's r = 0.7, p < 0.01). Our results suggest that an expanded articulation space for high ability individuals allows access to a larger repertoire of sounds, thereby providing skilled imitators greater flexibility in pronunciation and language learning. PMID:24155739

  11. Are you a good mimic? Neuro-acoustic signatures for speech imitation ability.

    PubMed

    Reiterer, Susanne M; Hu, Xiaochen; Sumathi, T A; Singh, Nandini C

    2013-01-01

    We investigated individual differences in speech imitation ability in late bilinguals using a neuro-acoustic approach. One hundred and thirty-eight German-English bilinguals matched on various behavioral measures were tested for "speech imitation ability" in a foreign language, Hindi, and categorized into "high" and "low ability" groups. Brain activations and speech recordings were obtained from 26 participants from the two extreme groups as they performed a functional neuroimaging experiment which required them to "imitate" sentences in three conditions: (A) German, (B) English, and (C) German with a fake English accent. We used a recently developed acoustic analysis, the "articulation space", as a metric to compare the speech imitation abilities of the two groups. Across all three conditions, direct comparisons between the two groups revealed brain activations (FWE corrected, p < 0.05) that were more widespread, with significantly higher peak activity in the left supramarginal gyrus and postcentral areas, for the low ability group. The high ability group, on the other hand, showed a significantly larger articulation space in all three conditions. In addition, articulation space also correlated positively with imitation ability (Pearson's r = 0.7, p < 0.01). Our results suggest that an expanded articulation space for high ability individuals allows access to a larger repertoire of sounds, thereby providing skilled imitators greater flexibility in pronunciation and language learning.

  12. The Effectiveness of Clear Speech as a Masker

    ERIC Educational Resources Information Center

    Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.

    2010-01-01

    Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…

  13. RAPID ACOUSTIC PROCESSING IN THE AUDITORY BRAINSTEM IS NOT RELATED TO CORTICAL ASYMMETRY FOR THE SYLLABLE RATE OF SPEECH

    PubMed Central

    Abrams, Daniel A.; Nicol, Trent; Zecker, Steven; Kraus, Nina

    2010-01-01

    Objective Temporal acuity in the auditory brainstem is correlated with left-dominant patterns of cortical asymmetry for processing rapid speech-sound stimuli. Here we investigate whether a similar relationship exists between brainstem processing of rapid speech components and cortical processing of syllable patterns in speech. Methods We measured brainstem and cortical evoked potentials in response to speech tokens in 23 children. We used established measures of auditory brainstem and cortical activity to examine functional relationships between these structures. Results We found no relationship between brainstem responses to fast acoustic elements of speech and right-dominant cortical processing of syllable patterns. Conclusions Brainstem processing of rapid elements in speech is not functionally related to rightward cortical asymmetry associated with the processing of syllable-rate features in speech. Viewed together with previous evidence linking brainstem timing with leftward cortical asymmetry for faster acoustic features, findings support the existence of distinct mechanisms for encoding rapid vs. slow elements of speech. Significance Results provide a fundamental advance in our knowledge of the segregation of subcortical input associated with cortical asymmetries for acoustic rate processing in the human auditory system. Implications of these findings for auditory perception, reading ability and development are discussed. PMID:20378402

  14. A Method of Speech Periodicity Enhancement Using Transform-domain Signal Decomposition

    PubMed Central

    Lee, Tan; Kleijn, W. Bastiaan; Kong, Ying-Yee

    2015-01-01

    Periodicity is an important property of speech signals. It is the basis of the signal’s fundamental frequency and the pitch of voice, which is crucial to speech communication. This paper presents a novel framework of periodicity enhancement for noisy speech. The enhancement is applied to the linear prediction residual of speech. The residual signal goes through a constant-pitch time warping process and two sequential lapped-frequency transforms, by which the periodic component is concentrated in certain transform coefficients. By emphasizing the respective transform coefficients, periodicity enhancement of noisy residual signal is achieved. The enhanced residual signal and estimated linear prediction filter parameters are used to synthesize the output speech. An adaptive algorithm is proposed for adjusting the weights for the periodic and aperiodic components. Effectiveness of the proposed approach is demonstrated via experimental evaluation. It is observed that harmonic structure of the original speech could be properly restored to improve the perceptual quality of enhanced speech. PMID:26150679
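
    As a rough illustration of the front end such a framework operates on, the sketch below extracts and re-synthesizes a linear-prediction residual. It is a minimal sketch only, assuming librosa and SciPy are available; the constant-pitch time warping, lapped-frequency transforms and adaptive weighting described in the abstract are not reproduced, and the function names are ours.

        # Minimal LP-residual front end (assumption: librosa and SciPy available).
        import numpy as np
        import scipy.signal
        import librosa

        def lp_residual(frame, order=12):
            """Analysis-filter one speech frame with A(z) to obtain the LP residual."""
            a = librosa.lpc(frame.astype(float), order=order)  # a[0] == 1
            return scipy.signal.lfilter(a, [1.0], frame), a

        def resynthesize(residual, a):
            """Re-synthesize speech from a (possibly enhanced) residual with 1/A(z)."""
            return scipy.signal.lfilter([1.0], a, residual)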

  15. Control of spoken vowel acoustics and the influence of phonetic context in human speech sensorimotor cortex.

    PubMed

    Bouchard, Kristofer E; Chang, Edward F

    2014-09-17

    Speech production requires the precise control of vocal tract movements to generate individual speech sounds (phonemes) which, in turn, are rapidly organized into complex sequences. Multiple productions of the same phoneme can exhibit substantial variability, some of which is inherent to control of the vocal tract and its biomechanics, and some of which reflects the contextual effects of surrounding phonemes ("coarticulation"). The role of the CNS in these aspects of speech motor control is not well understood. To address these issues, we recorded multielectrode cortical activity directly from human ventral sensory-motor cortex (vSMC) during the production of consonant-vowel syllables. We analyzed the relationship between the acoustic parameters of vowels (pitch and formants) and cortical activity on a single-trial level. We found that vSMC activity robustly predicted acoustic parameters across vowel categories (up to 80% of variance), as well as different renditions of the same vowel (up to 25% of variance). Furthermore, we observed significant contextual effects on vSMC representations of produced phonemes that suggest active control of coarticulation: vSMC representations for vowels were biased toward the representations of the preceding consonant, and conversely, representations for consonants were biased toward upcoming vowels. These results reveal that vSMC activity for phonemes is not invariant and provide insight into the cortical mechanisms of coarticulation.

  16. Control of Spoken Vowel Acoustics and the Influence of Phonetic Context in Human Speech Sensorimotor Cortex

    PubMed Central

    Bouchard, Kristofer E.

    2014-01-01

    Speech production requires the precise control of vocal tract movements to generate individual speech sounds (phonemes) which, in turn, are rapidly organized into complex sequences. Multiple productions of the same phoneme can exhibit substantial variability, some of which is inherent to control of the vocal tract and its biomechanics, and some of which reflects the contextual effects of surrounding phonemes (“coarticulation”). The role of the CNS in these aspects of speech motor control is not well understood. To address these issues, we recorded multielectrode cortical activity directly from human ventral sensory-motor cortex (vSMC) during the production of consonant-vowel syllables. We analyzed the relationship between the acoustic parameters of vowels (pitch and formants) and cortical activity on a single-trial level. We found that vSMC activity robustly predicted acoustic parameters across vowel categories (up to 80% of variance), as well as different renditions of the same vowel (up to 25% of variance). Furthermore, we observed significant contextual effects on vSMC representations of produced phonemes that suggest active control of coarticulation: vSMC representations for vowels were biased toward the representations of the preceding consonant, and conversely, representations for consonants were biased toward upcoming vowels. These results reveal that vSMC activity for phonemes is not invariant and provide insight into the cortical mechanisms of coarticulation. PMID:25232105

  17. How private is your consultation? Acoustic and audiological measures of speech privacy in the otolaryngology clinic.

    PubMed

    Clamp, Philip J; Grant, David G; Zapala, David A; Hawkins, David B

    2011-01-01

    The right to confidentiality is a central tenet of the doctor-patient relationship. In the United Kingdom this right to confidentiality is recognised in published GMC guidance. In the USA, the Health Insurance Portability and Accountability Act of 1996 (HIPAA) strengthened the legal requirement to protect patient information in all forms, and failure to do so now constitutes a federal offence. The aim of this study is to assess the acoustic privacy of an otolaryngology outpatient consultation room. Acoustic privacy was measured using the articulation index (AI) and Bamford-Kowal-Bench (BKB) speech discrimination tests. BKB speech tests were calibrated to normal conversational volume (50 dB SPL). Both AI and BKB were calculated in four positions around the ENT clinic: within the consultation room, outside the consulting room door, in the nearest waiting area chair and in the farthest waiting area chair. Tests were undertaken with the clinic room door closed and open to assess the effect on privacy. With the clinic room door closed, mean BKB scores in the nearest and farthest waiting area chairs were 51% and 41%, respectively. AI scores in the waiting area chairs were 0.03 and 0.02. With the clinic room door open, privacy was lost in both AI and BKB testing, with almost 100% of words discernible at normal talking levels. The results of this study highlight the poor level of speech privacy within a standard ENT outpatient department. AI is a poor predictor of privacy.

  18. Correlation of signals of thermal acoustic radiation

    NASA Astrophysics Data System (ADS)

    Anosov, A. A.; Passechnik, V. I.

    2003-03-01

    The spatial correlation function is measured for the pressure of thermal acoustic radiation from a source (a narrow plasticine plate) whose temperature is made both higher and lower than the temperature of the receiver. The spatial correlation function of the pressure of thermal acoustic radiation is found to be oscillatory in character. The oscillation amplitude is determined not by the absolute temperature of the source but by the temperature difference between the source and the receiver. The correlation function changes its sign when a source heated with respect to the receiver is replaced by a cooled one.

  19. Influence of Visual Echo and Visual Reverberation on Speech Fluency in Stutterers.

    ERIC Educational Resources Information Center

    Smolka, Elzbieta; Adamczyk, Bogdan

    1992-01-01

    The influence of visual signals (echo and reverberation) on speech fluency in 60 stutterers and nonstutterers was examined. Visual signals were found to exert a corrective influence on the speech of stutterers but less than the influence of acoustic stimuli. Use of visual signals in combination with acoustic and tactile signals is recommended. (DB)

  20. Teachers and Teaching: Speech Production Accommodations Due to Changes in the Acoustic Environment

    PubMed Central

    Hunter, Eric J.; Bottalico, Pasquale; Graetzer, Simone; Leishman, Timothy W.; Berardi, Mark L.; Eyring, Nathan G.; Jensen, Zachary R.; Rolins, Michael K.; Whiting, Jennifer K.

    2016-01-01

    School teachers have an elevated risk of voice problems due to the vocal demands in the workplace. This manuscript presents the results of three studies investigating teachers’ voice use at work. In the first study, 57 teachers were observed for 2 weeks (waking hours) to compare how they used their voice in the school environment and in non-school environments. In a second study, 45 participants performed a short vocal task in two different rooms: a variable acoustic room and an anechoic chamber. Subjects were taken back and forth between the two rooms. Each time they entered the variable acoustics room, the reverberation time and/or the background noise condition had been modified. In this latter study, subjects responded to questions about their vocal comfort and their perception of changes in the acoustic environment. In a third study, 20 untrained vocalists performed a simple vocal task in the following conditions: with and without background babble and with and without transparent plexiglass shields to increase the first reflection. Relationships were examined between [1] the results for the room acoustic parameters; [2] the subjects’ perception of the room; and [3] the recorded speech acoustics. Several differences between male and female subjects were found; some of those differences held for each room condition (at school vs. not at school, reverberation level, noise level, and early reflection). PMID:26949426

  1. Mesoscale variations in acoustic signals induced by atmospheric gravity waves.

    PubMed

    Chunchuzov, Igor; Kulichkov, Sergey; Perepelkin, Vitaly; Ziemann, Astrid; Arnold, Klaus; Kniffka, Anke

    2009-02-01

    The results of acoustic tomographic monitoring of the coherent structures in the lower atmosphere and the effects of these structures on acoustic signal parameters are analyzed in the present study. From the measurements of acoustic travel time fluctuations (periods 1 min-1 h) with distant receivers, the temporal fluctuations of the effective sound speed and wind speed are retrieved along different ray paths connecting an acoustic pulse source and several receivers. By using a coherence analysis of the fluctuations near spatially distanced ray turning points, the internal wave-associated fluctuations are filtered and their spatial characteristics (coherences, horizontal phase velocities, and spatial scales) are estimated. The capability of acoustic tomography in estimating wind shear near ground is shown. A possible mechanism describing the temporal modulation of the near-ground wind field by ducted internal waves in the troposphere is proposed.

  2. Effect of several acoustic cues on perceiving Mandarin retroflex affricates and fricatives in continuous speech.

    PubMed

    Zhu, Jian; Chen, Yaping

    2016-07-01

    Relatively little attention has been paid to the perception of the three-way contrast between unaspirated affricates, aspirated affricates and fricatives in Mandarin Chinese. This study reports two experiments that explore the acoustic cues relevant to the contrast between the Mandarin retroflex series /tʂ/, /tʂ(h)/ and /ʂ/ in continuous speech. Twenty participants performed two three-alternative forced-choice tasks, in which acoustic cues including closure, frication duration (FD), aspiration, and vocalic contexts (VCs) were systematically manipulated and presented in a carrier phrase. A subsequent classification tree analysis shows that FD distinguishes /tʂ/ from /tʂ(h)/ and /ʂ/, and that closure cues the affricate manner. Interactions between VC and individual cues are also found. The FD threshold for separating /ʂ/ and /tʂ/ is susceptible to the influence of the following vocalic segments, shifting to lower values if frication is followed by the low vowel /a/. On the other hand, while aspiration cues /tʂ(h)/ before /a/ and //, this acoustic cue is obscured by gesture continuation when /tʂ(h)/ precedes its homorganic approximant /ɻ/ in natural speech, which might cause potential confusion between /tʂ(h)/ and /ʂ/. PMID:27475170
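
    The cue-based analysis described above lends itself to a small classification-tree sketch. The one below is a hedged illustration only: the feature names, toy cue values and tree settings are ours, not the study's measurements.

        # Toy classification tree over hypothetical acoustic-cue measurements (ms).
        import numpy as np
        from sklearn.tree import DecisionTreeClassifier, export_text

        # columns: closure duration, frication duration, aspiration duration
        X = np.array([[60.0,  50.0,  0.0],    # unaspirated affricate
                      [55.0,  60.0, 70.0],    # aspirated affricate
                      [ 0.0, 130.0,  0.0]])   # fricative
        y = ["tʂ", "tʂʰ", "ʂ"]

        tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
        print(export_text(tree, feature_names=["closure", "frication", "aspiration"]))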

  3. Logopenic and nonfluent variants of primary progressive aphasia are differentiated by acoustic measures of speech production.

    PubMed

    Ballard, Kirrie J; Savage, Sharon; Leyton, Cristian E; Vogel, Adam P; Hornberger, Michael; Hodges, John R

    2014-01-01

    Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging since it hinges on expert-based evaluation of speech and language production. In this study acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold-standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r(2) = 0.47) with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, only affected in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy. Findings suggest that a measure of relative vowel duration from a polysyllable word repetition task

  4. Logopenic and Nonfluent Variants of Primary Progressive Aphasia Are Differentiated by Acoustic Measures of Speech Production

    PubMed Central

    Ballard, Kirrie J.; Savage, Sharon; Leyton, Cristian E.; Vogel, Adam P.; Hornberger, Michael; Hodges, John R.

    2014-01-01

    Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging since it hinges on expert-based evaluation of speech and language production. In this study acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold-standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r2 = 0.47) with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, only affected in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy. Findings suggest that a measure of relative vowel duration from a polysyllable word repetition task

  5. Pulse analysis of acoustic emission signals

    NASA Technical Reports Server (NTRS)

    Houghton, J. R.; Packman, P. F.

    1977-01-01

    A method for the signature analysis of pulses in the frequency domain and the time domain is presented. Fourier spectrum, Fourier transfer function, shock spectrum and shock spectrum ratio were examined in the frequency domain analysis and pulse shape deconvolution was developed for use in the time domain analysis. Comparisons of the relative performance of each analysis technique are made for the characterization of acoustic emission pulses recorded by a measuring system. To demonstrate the relative sensitivity of each of the methods to small changes in the pulse shape, signatures of computer modeled systems with analytical pulses are presented. Optimization techniques are developed and used to indicate the best design parameter values for deconvolution of the pulse shape. Several experiments are presented that test the pulse signature analysis methods on different acoustic emission sources. These include acoustic emission associated with (a) crack propagation, (b) ball dropping on a plate, (c) spark discharge, and (d) defective and good ball bearings. Deconvolution of the first few micro-seconds of the pulse train is shown to be the region in which the significant signatures of the acoustic emission event are to be found.

  6. Pulse analysis of acoustic emission signals

    NASA Technical Reports Server (NTRS)

    Houghton, J. R.; Packman, P. F.

    1977-01-01

    A method for the signature analysis of pulses in the frequency domain and the time domain is presented. Fourier spectrum, Fourier transfer function, shock spectrum and shock spectrum ratio were examined in the frequency domain analysis, and pulse shape deconvolution was developed for use in the time domain analysis. Comparisons of the relative performance of each analysis technique are made for the characterization of acoustic emission pulses recorded by a measuring system. To demonstrate the relative sensitivity of each of the methods to small changes in the pulse shape, signatures of computer modeled systems with analytical pulses are presented. Optimization techniques are developed and used to indicate the best design parameter values for deconvolution of the pulse shape. Several experiments are presented that test the pulse signature analysis methods on different acoustic emission sources. These include acoustic emissions associated with: (1) crack propagation, (2) ball dropping on a plate, (3) spark discharge and (4) defective and good ball bearings. Deconvolution of the first few micro-seconds of the pulse train is shown to be the region in which the significant signatures of the acoustic emission event are to be found.

  7. The role of metrical information in apraxia of speech. Perceptual and acoustic analyses of word stress.

    PubMed

    Aichert, Ingrid; Späth, Mona; Ziegler, Wolfram

    2016-02-01

    Several factors are known to influence speech accuracy in patients with apraxia of speech (AOS), e.g., syllable structure or word length. However, the impact of word stress has largely been neglected so far. More generally, the role of prosodic information at the phonetic encoding stage of speech production often remains unconsidered in models of speech production. This study aimed to investigate the influence of word stress on error production in AOS. Two-syllabic words with stress on the first (trochees) vs. the second syllable (iambs) were compared in 14 patients with AOS, three of them exhibiting pure AOS, and in a control group of six normal speakers. The patients produced significantly more errors on iambic than on trochaic words. A most prominent metrical effect was obtained for segmental errors. Acoustic analyses of word durations revealed a disproportionate advantage of the trochaic meter in the patients relative to the healthy controls. The results indicate that German apraxic speakers are sensitive to metrical information. It is assumed that metrical patterns function as prosodic frames for articulation planning, and that the regular metrical pattern in German, the trochaic form, has a facilitating effect on word production in patients with AOS. PMID:26792367

  8. Acoustic signal characteristics during IR laser ablation and their consequences for acoustic tissue discrimination

    NASA Astrophysics Data System (ADS)

    Nahen, Kester; Vogel, Alfred

    2000-06-01

    IR laser ablation of skin is accompanied by acoustic signals the characteristics of which are closely linked to the ablation dynamics. A discrimination between different tissue layers, for example necrotic and vital tissue during laser burn debridement, is therefore possible by an analysis of the acoustic signal. We were able to discriminate tissue layers by evaluating the acoustic energy. To get a better understanding of the tissue specificity of the ablation noise, we investigated the correlation between sample water content, ablation dynamics, and characteristics of the acoustic signal. A free running Er:YAG laser with a maximum pulse energy of 2 J and a spot diameter of 5 mm was used to ablate gelatin samples with different water content. The ablation noise in air was detected using a piezoelectric transducer with a bandwidth of 1 MHz, and the acoustic signal generated inside the ablated sample was measured simultaneously by a piezoelectric transducer in contact with the sample. Laser flash Schlieren photography was used to investigate the expansion velocity of the vapor plume and the velocity of the ejected material. We observed large differences between the ablation dynamics and material ejection velocity for gelatin samples with 70% and 90% water content. These differences cannot be explained by the small change of the gelatin absorption coefficient, but are largely related to differences in the mechanical properties of the sample. The different ablation dynamics are responsible for an increase of the acoustic energy by a factor of 10 for the sample with the higher water content.

  9. The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene.

    PubMed

    Rimmele, Johanna M; Zion Golumbic, Elana; Schröger, Erich; Poeppel, David

    2015-07-01

    Attending to one speaker in multi-speaker situations is challenging. One neural mechanism proposed to underlie the ability to attend to a particular speaker is phase-locking of low-frequency activity in auditory cortex to speech's temporal envelope ("speech-tracking"), which is more precise for attended speech. However, it is not known what brings about this attentional effect, and specifically if it reflects enhanced processing of the fine structure of attended speech. To investigate this question we compared attentional effects on speech-tracking of natural versus vocoded speech which preserves the temporal envelope but removes the fine structure of speech. Pairs of natural and vocoded speech stimuli were presented concurrently and participants attended to one stimulus and performed a detection task while ignoring the other stimulus. We recorded magnetoencephalography (MEG) and compared attentional effects on the speech-tracking response in auditory cortex. Speech-tracking of natural, but not vocoded, speech was enhanced by attention, whereas neural tracking of ignored speech was similar for natural and vocoded speech. These findings suggest that the more precise speech-tracking of attended natural speech is related to processing its fine structure, possibly reflecting the application of higher-order linguistic processes. In contrast, when speech is unattended its fine structure is not processed to the same degree and thus elicits less precise speech-tracking more similar to vocoded speech.

  10. Acoustic Aspects of Photoacoustic Signal Generation and Detection in Gases

    NASA Astrophysics Data System (ADS)

    Miklós, A.

    2015-09-01

    In this paper photoacoustic signal generation and detection in gases is investigated and discussed from the standpoint of acoustics. Four topics are considered: the effect of the absorption-desorption process of modulated and pulsed light on the heat power density released in the gas; the generation of the primary sound by the released heat in an unbounded medium; the excitation of an acoustic resonator by the primary sound; and finally, the generation of the measurable PA signal by a microphone. When light is absorbed by a molecule and the excess energy is relaxed by collisions with the surrounding molecules, the average kinetic energy, thus also the temperature of an ensemble of molecules (called "particle" in acoustics) will increase. In other words heat energy is added to the energy of the particle. The rate of the energy transfer is characterized by the heat power density. A simple two-level model of absorption-desorption is applied for describing the heat power generation process for modulated and pulsed illumination. Sound generation by a laser beam in an unbounded medium is discussed by means of the Green's function technique. It is shown that the duration of the generated sound pulse depends mostly on beam geometry. A photoacoustic signal is mostly detected in a photoacoustic cell composed of acoustic resonators, buffers, filters, etc. It is not easy to interpret the measured PA signal in such a complicated acoustic system. The acoustic response of a PA detector to different kinds of excitations (modulated cw, pulsed, periodic pulse train) is discussed. It is shown that acoustic resonators respond very differently to modulated cw excitation and to excitation by a pulse train. The microphone for detecting the PA signal is also a part of the acoustic system; its properties have to be taken into account by the design of a PA detector. The moving membrane of the microphone absorbs acoustic energy; thus, it may influence the resonance frequency and

  11. Wavelet-based ground vehicle recognition using acoustic signals

    NASA Astrophysics Data System (ADS)

    Choe, Howard C.; Karlsen, Robert E.; Gerhart, Grant R.; Meitzler, Thomas J.

    1996-03-01

    We present, in this paper, a wavelet-based acoustic signal analysis to remotely recognize military vehicles using their sound intercepted by acoustic sensors. Since expedited signal recognition is imperative in many military and industrial situations, we developed an algorithm that provides an automated, fast signal recognition once implemented in a real-time hardware system. This algorithm consists of wavelet preprocessing, feature extraction and compact signal representation, and a simple but effective statistical pattern matching. The current status of the algorithm does not require any training. The training is replaced by human selection of reference signals (e.g., squeak or engine exhaust sound) distinctive to each individual vehicle based on human perception. This allows a fast archiving of any new vehicle type in the database once the signal is collected. The wavelet preprocessing provides time-frequency multiresolution analysis using discrete wavelet transform (DWT). Within each resolution level, feature vectors are generated from statistical parameters and energy content of the wavelet coefficients. After applying our algorithm on the intercepted acoustic signals, the resultant feature vectors are compared with the reference vehicle feature vectors in the database using statistical pattern matching to determine the type of vehicle from where the signal originated. Certainly, statistical pattern matching can be replaced by an artificial neural network (ANN); however, the ANN would require training data sets and time to train the net. Unfortunately, this is not always possible for many real world situations, especially collecting data sets from unfriendly ground vehicles to train the ANN. Our methodology using wavelet preprocessing and statistical pattern matching provides robust acoustic signal recognition. We also present an example of vehicle recognition using acoustic signals collected from two different military ground vehicles. In this paper, we will
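
    A compact sketch of the kind of pipeline described above (discrete wavelet transform, per-level statistical features, nearest-reference matching) is given below. It assumes PyWavelets and NumPy; the wavelet, decomposition level and distance measure are illustrative choices, not the authors' settings.

        # DWT feature extraction and nearest-reference matching (illustrative sketch).
        import numpy as np
        import pywt

        def wavelet_features(signal, wavelet="db4", level=5):
            """Mean, standard deviation and energy of the coefficients at each DWT level."""
            feats = []
            for c in pywt.wavedec(signal, wavelet, level=level):
                feats.extend([np.mean(c), np.std(c), np.sum(c ** 2)])
            return np.array(feats)

        def classify(signal, reference_features):
            """Return the vehicle label whose reference feature vector is nearest."""
            f = wavelet_features(signal)
            return min(reference_features, key=lambda label: np.linalg.norm(f - reference_features[label]))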

  12. Advantages from bilateral hearing in speech perception in noise with simulated cochlear implants and residual acoustic hearing.

    PubMed

    Schoof, Tim; Green, Tim; Faulkner, Andrew; Rosen, Stuart

    2013-02-01

    Acoustic simulations were used to study the contributions of spatial hearing that may arise from combining a cochlear implant with either a second implant or contralateral residual low-frequency acoustic hearing. Speech reception thresholds (SRTs) were measured in twenty-talker babble. Spatial separation of speech and noise was simulated using a spherical head model. While low-frequency acoustic information contralateral to the implant simulation produced substantially better SRTs there was no effect of spatial cues on SRT, even when interaural differences were artificially enhanced. Simulated bilateral implants showed a significant head shadow effect, but no binaural unmasking based on interaural time differences, and weak, inconsistent overall spatial release from masking. There was also a small but significant non-spatial summation effect. It appears that typical cochlear implant speech processing strategies may substantially reduce the utility of spatial cues, even in the absence of degraded neural processing arising from auditory deprivation. PMID:23363118

  13. The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene

    PubMed Central

    Rimmele, Johanna M.; Golumbic, Elana Zion; Schröger, Erich; Poeppel, David

    2015-01-01

    Attending to one speaker in multi-speaker situations is challenging. One neural mechanism proposed to underlie the ability to attend to a particular speaker is phase-locking of low-frequency activity in auditory cortex to speech’s temporal envelope (“speech-tracking”), which is more precise for attended speech. However, it is not known what brings about this attentional effect, and specifically if it reflects enhanced processing of the fine structure of attended speech. To investigate this question we compared attentional effects on speech-tracking of natural vs. vocoded speech which preserves the temporal envelope but removes the fine-structure of speech. Pairs of natural and vocoded speech stimuli were presented concurrently and participants attended to one stimulus and performed a detection task while ignoring the other stimulus. We recorded magnetoencephalography (MEG) and compared attentional effects on the speech-tracking response in auditory cortex. Speech-tracking of natural, but not vocoded, speech was enhanced by attention, whereas neural tracking of ignored speech was similar for natural and vocoded speech. These findings suggest that the more precise speech tracking of attended natural speech is related to processing its fine structure, possibly reflecting the application of higher-order linguistic processes. In contrast, when speech is unattended its fine structure is not processed to the same degree and thus elicits less precise speech tracking more similar to vocoded speech. PMID:25650107

  14. Acoustic Feature Optimization Based on F-Ratio for Robust Speech Recognition

    NASA Astrophysics Data System (ADS)

    Sun, Yanqing; Zhou, Yu; Zhao, Qingwei; Yan, Yonghong

    This paper focuses on the problem of performance degradation in mismatched speech recognition. The F-Ratio analysis method is utilized to analyze the significance of different frequency bands for speech unit classification, and we find that frequencies around 1 kHz and 3 kHz, which are the upper bounds of the first and the second formants for most of the vowels, should be emphasized in comparison to the Mel-frequency cepstral coefficients (MFCC). The analysis result is further observed to be stable in several typical mismatched situations. Similar to the Mel-frequency scale, another frequency scale called the F-Ratio scale is thus proposed to optimize the filter bank design for the MFCC features and make each subband contain equal significance for speech unit classification. Under comparable conditions, the modified features give a relative 43.20% decrease in sentence error rate compared with the MFCC for emotion-affected speech recognition, 35.54% and 23.03% for noisy speech recognition at 15 dB and 0 dB SNR (signal-to-noise ratio) respectively, and 64.50% for the three years' 863 test data. The application of the F-Ratio analysis on the clean training set of the Aurora2 database demonstrates its robustness over languages, texts and sampling rates.
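
    The core of an F-Ratio band analysis can be written in a few lines; the sketch below computes a between-class to within-class variance ratio for each frequency band from labelled band-energy frames. It is a simplified illustration under our own assumptions, not the paper's implementation.

        # Per-band F-ratio from labelled band-energy frames (simplified sketch).
        import numpy as np

        def band_f_ratio(frames_by_class):
            """frames_by_class maps a class label to an array (n_frames, n_bands).
            Returns the F-ratio (between-class / within-class variance) per band."""
            class_means = np.array([f.mean(axis=0) for f in frames_by_class.values()])
            grand_mean = class_means.mean(axis=0)
            between = np.mean((class_means - grand_mean) ** 2, axis=0)
            within = np.mean([f.var(axis=0) for f in frames_by_class.values()], axis=0)
            return between / np.maximum(within, 1e-12)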

  15. Analysis of acoustic signals on welding and cutting

    SciTech Connect

    Morita, Takao; Ogawa, Yoji; Sumitomo, Takashi

    1995-12-31

    The sounds emitted during welding and cutting processes are closely related to the processing phenomena, and sometimes they provide useful information for evaluating the processing conditions. The acoustic signals from arc welding, plasma arc cutting, oxy-flame cutting, and water jet cutting are analyzed in detail in order to develop an effective signal processing algorithm. The sound from TIG arc welding has a typical line spectrum whose principal frequency is almost the same as that of the supplied electricity. A disturbance of the welding process appears clearly in the acoustic emission. The sound exposure level for CO2 or MIG welding is higher than that for TIG welding, and the relative intensity of the typical line spectrum caused by the supplied electricity becomes low. But a sudden transition of the welding condition produces an apparent change in the sound exposure level. On the contrary, the acoustics from cutting processes are much louder than those of arc welding and show more chaotic behavior, because the supplied fluid velocity and the arc temperature for cutting processes are much higher than those for welding processes. Therefore, a special technique is required to extract meaningful signals from the loud acoustic sounds. From a further point of view, the reduction of the acoustic exposure level becomes an important research theme with the growth of the application fields of cutting processes.

  16. Digital Signal Processing in Acoustics--Part 2.

    ERIC Educational Resources Information Center

    Davies, H.; McNeill, D. J.

    1986-01-01

    Reviews the potential of a data acquisition system for illustrating the nature and significance of ideas in digital signal processing. Focuses on the fast Fourier transform and the utility of its two-channel format, emphasizing cross-correlation and its two-microphone technique of acoustic intensity measurement. Includes programing format. (ML)
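
    The two-channel operations mentioned above rest on FFT-based cross-correlation; a minimal NumPy sketch is shown below (names and the circular-correlation convention are our own assumptions).

        # Circular cross-correlation of two equal-length channels via the FFT.
        import numpy as np

        def cross_correlation_fft(x, y):
            X = np.fft.rfft(x)
            Y = np.fft.rfft(y)
            return np.fft.irfft(X * np.conj(Y), n=len(x))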

  17. Atmospheric influence on volcano-acoustic signals

    NASA Astrophysics Data System (ADS)

    Matoza, Robin; de Groot-Hedlin, Catherine; Hedlin, Michael; Fee, David; Garcés, Milton; Le Pichon, Alexis

    2010-05-01

    Volcanoes are natural sources of infrasound, useful for studying infrasonic propagation in the atmosphere. Large, explosive volcanic eruptions typically produce signals that can be recorded at ranges of hundreds of kilometers propagating in atmospheric waveguides. In addition, sustained volcanic eruptions can produce smaller-amplitude repetitive signals recordable at >10 km range. These include repetitive impulsive signals and continuous tremor signals. The source functions of these signals can remain relatively invariant over timescales of weeks to months. Observed signal fluctuations from such persistent sources at an infrasound recording station may therefore be attributed to dynamic atmospheric propagation effects. We present examples of repetitive and sustained volcano infrasound sources at Mount St. Helens, Washington and Kilauea Volcano, Hawaii, USA. The data recorded at >10 km range show evidence of propagation effects induced by tropospheric variability at the mesoscale and microscale. Ray tracing and finite-difference simulations of the infrasound propagation produce qualitatively consistent results. However, the finite-difference simulations indicate that low-frequency effects such as diffraction, and scattering from topography may be important factors for infrasonic propagation at this scale.

  18. An acoustical assessment of pitch-matching accuracy in relation to speech frequency, speech frequency range, age and gender in preschool children

    NASA Astrophysics Data System (ADS)

    Trollinger, Valerie L.

    This study investigated acoustically measured singing accuracy in relation to speech fundamental frequency, speech fundamental frequency range, age and gender in preschool-aged children. Seventy subjects from Southeastern Pennsylvania; the San Francisco Bay Area, California; and Terre Haute, Indiana, participated in the study. Speech frequency was measured by having the subjects participate in spontaneous and guided speech activities with the researcher, with 18 diverse samples extracted from each subject's recording for acoustical analysis of fundamental frequency in Hz with the CSpeech computer program. The fundamental frequencies were averaged together to derive a mean speech frequency score for each subject. Speech range was calculated by subtracting the lowest fundamental frequency produced from the highest fundamental frequency produced, resulting in a speech range measured in Hz. Singing accuracy was measured by having the subjects each echo-sing six randomized patterns using the pitches Middle C, D, E, F♯, G and A (440), using the solfege syllables of Do and Re, which were recorded by a 5-year-old female model. For each subject, 18 samples of singing were recorded. All samples were analyzed with CSpeech for fundamental frequency. For each subject, deviation scores in Hz were derived by calculating the difference between what the model sang in Hz and what the subject sang in response in Hz. Individual scores for each child consisted of an overall mean total deviation frequency, mean frequency deviations for each pattern, and mean frequency deviation for each pitch. Pearson correlations, MANOVA and ANOVA analyses, multiple regressions and discriminant analysis revealed the following findings: (1) moderate but significant (p < .001) relationships emerged between mean speech frequency and the ability to sing the pitches E, F♯, G and A in the study; (2) mean speech frequency also emerged as the strongest

  19. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians.

    PubMed

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J; Adatia, Ian

    2016-01-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74%, compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% versus 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and could be used to screen for PH and encourage earlier specialist referral. PMID:27609672
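
    The general shape of the pipeline described (MFCC features, one Gaussian-mixture model per class, likelihood comparison) can be sketched as below. This is a hedged illustration assuming librosa and scikit-learn; file handling, segmentation, model order and validation are omitted and do not reflect the authors' implementation.

        # MFCC + GMM two-class pipeline (illustrative sketch, not the study's code).
        import numpy as np
        import librosa
        from sklearn.mixture import GaussianMixture

        def mfcc_frames(path, n_mfcc=13):
            y, sr = librosa.load(path, sr=None)
            return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, coeffs)

        def train(ph_files, normal_files, n_components=8):
            gmm_ph = GaussianMixture(n_components).fit(np.vstack([mfcc_frames(f) for f in ph_files]))
            gmm_no = GaussianMixture(n_components).fit(np.vstack([mfcc_frames(f) for f in normal_files]))
            return gmm_ph, gmm_no

        def classify(path, gmm_ph, gmm_no):
            x = mfcc_frames(path)
            return "PH" if gmm_ph.score(x) > gmm_no.score(x) else "normal"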

  20. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians

    PubMed Central

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

    2016-01-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74%, compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% versus 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and could be used to screen for PH and encourage earlier specialist referral. PMID:27609672

  1. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians

    NASA Astrophysics Data System (ADS)

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y.; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

    2016-09-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74%, compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% versus 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and could be used to screen for PH and encourage earlier specialist referral.

  2. The influence of source acceleration on acoustic signals

    NASA Technical Reports Server (NTRS)

    Kelly, Jeffrey J.; Wilson, Mark R.

    1993-01-01

    The effect of aircraft acceleration on acoustic signals is often ignored in both analytical studies and data reduction of flight test measurements. In this study, the influence of source acceleration on acoustic signals is analyzed using computer-simulated signals for an accelerating point source. Both rotating and translating sources are considered. Using a known signal allows an assessment of the influence of source acceleration on the received signal. Aircraft acceleration must also be considered in the measurement and reduction of flyover noise. Tracking of the aircraft over an array of microphones enables ensemble averaging of the acoustic signal, thus increasing the confidence in the measured data. This is only valid when both the altitude and velocity remain constant. For an accelerating aircraft, each microphone is exposed to differing flight velocities, Doppler shifts, and smear angles. Thus, averaging across the array in the normal manner is constrained by aircraft acceleration and microphone spacing. In this study, computer-simulated spectra containing acceleration are averaged across a 12-microphone array, mimicking a flight test with an accelerated profile in which noise data was obtained. Overlapped processing is performed in the flight test measurements in order to alleviate spectral smearing.
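
    The per-microphone Doppler history that makes normal ensemble averaging problematic follows from the usual moving-source relation; a small sketch (with hypothetical geometry and names) is given below.

        # Instantaneous Doppler-shifted frequency seen by one microphone.
        import numpy as np

        def received_frequency(f_source, src_pos, src_vel, mic_pos, c=343.0):
            r = mic_pos - src_pos
            v_radial = np.dot(src_vel, r) / np.linalg.norm(r)  # source speed toward the mic
            return f_source / (1.0 - v_radial / c)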

  3. Transient Auditory Storage of Acoustic Details Is Associated with Release of Speech from Informational Masking in Reverberant Conditions

    ERIC Educational Resources Information Center

    Huang, Ying; Huang, Qiang; Chen, Xun; Wu, Xihong; Li, Liang

    2009-01-01

    Perceptual integration of the sound directly emanating from the source with reflections needs both temporal storage and correlation computation of acoustic details. We examined whether the temporal storage is frequency dependent and associated with speech unmasking. In Experiment 1, a break in correlation (BIC) between interaurally correlated…

  4. Discrimination and Comprehension of Synthetic Speech by Students with Visual Impairments: The Case of Similar Acoustic Patterns

    ERIC Educational Resources Information Center

    Papadopoulos, Konstantinos; Argyropoulos, Vassilios S.; Kouroupetroglou, Georgios

    2008-01-01

    This study examined the perceptions held by sighted students and students with visual impairments of the intelligibility and comprehensibility of similar acoustic patterns produced by synthetic speech. It determined the types of errors the students made and compared the performance of the two groups on auditory discrimination and comprehension.

  5. Processing of Speech Signals for Physical and Sensory Disabilities

    NASA Astrophysics Data System (ADS)

    Levitt, Harry

    1995-10-01

    Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities.

  6. Processing of speech signals for physical and sensory disabilities.

    PubMed

    Levitt, H

    1995-10-24

    Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities. PMID:7479816

  7. Comparing the effects of reverberation and of noise on speech recognition in simulated electric-acoustic listening

    PubMed Central

    Helms Tillery, Kate; Brown, Christopher A.; Bacon, Sid P.

    2012-01-01

    Cochlear implant users report difficulty understanding speech in both noisy and reverberant environments. Electric-acoustic stimulation (EAS) is known to improve speech intelligibility in noise. However, little is known about the potential benefits of EAS in reverberation, or about how such benefits relate to those observed in noise. The present study used EAS simulations to examine these questions. Sentences were convolved with impulse responses from a model of a room whose estimated reverberation times were varied from 0 to 1 sec. These reverberated stimuli were then vocoded to simulate electric stimulation, or presented as a combination of vocoder plus low-pass filtered speech to simulate EAS. Monaural sentence recognition scores were measured in two conditions: reverberated speech and speech in a reverberated noise. The long-term spectrum and amplitude modulations of the noise were equated to the reverberant energy, allowing a comparison of the effects of the interferer (speech vs noise). Results indicate that, at least in simulation, (1) EAS provides significant benefit in reverberation; (2) the benefits of EAS in reverberation may be underestimated by those in a comparable noise; and (3) the EAS benefit in reverberation likely arises from partially preserved cues in this background accessible via the low-frequency acoustic component. PMID:22280603
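
    A minimal EAS simulation of the general kind described (noise-excited vocoder for the electric part plus low-pass filtered speech for the residual acoustic part) can be sketched as below. Band edges, filter orders and the 500 Hz cutoff are illustrative assumptions, not the study's parameters; a sampling rate of at least 16 kHz is assumed.

        # EAS simulation sketch: vocoded speech plus low-pass filtered speech.
        import numpy as np
        from scipy.signal import butter, sosfiltfilt, hilbert

        def noise_vocoder(x, fs, edges=(200, 500, 1000, 2000, 4000, 7000)):
            out = np.zeros(len(x))
            for lo, hi in zip(edges[:-1], edges[1:]):
                sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
                band = sosfiltfilt(sos, x)
                envelope = np.abs(hilbert(band))
                carrier = sosfiltfilt(sos, np.random.randn(len(x)))
                out += envelope * carrier
            return out

        def eas_simulation(x, fs, lp_cutoff=500):
            sos = butter(4, lp_cutoff, btype="lowpass", fs=fs, output="sos")
            acoustic = sosfiltfilt(sos, x)      # simulated residual acoustic hearing
            electric = noise_vocoder(x, fs)     # simulated electric (implant) channel
            return acoustic + electric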

  8. Time-forward speech intelligibility in time-reversed rooms

    PubMed Central

    Longworth-Reed, Laricia; Brandewie, Eugene; Zahorik, Pavel

    2009-01-01

    The effects of time-reversed room acoustics on word recognition abilities were examined using virtual auditory space techniques, which allowed for temporal manipulation of the room acoustics independent of the speech source signals. Two acoustical conditions were tested: one in which room acoustics were simulated in a realistic time-forward fashion and one in which the room acoustics were reversed in time, causing reverberation and acoustic reflections to precede the direct-path energy. Significant decreases in speech intelligibility—from 89% on average to less than 25%—were observed between the time-forward and time-reversed rooms. This result is not predictable using standard methods for estimating speech intelligibility based on the modulation transfer function of the room. It may instead be due to increased degradation of onset information in the speech signals when room acoustics are time-reversed. PMID:19173377
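
    The core manipulation described above, rendering speech through a room response played either forward or time-reversed, reduces to a convolution; a minimal sketch under that assumption (the speech and impulse-response arrays are placeholders):

    ```python
    # Sketch: auralize speech through a room impulse response (RIR) either
    # time-forward or time-reversed, so reflections precede the direct sound.
    import numpy as np
    from scipy.signal import fftconvolve

    def render(speech, rir, reverse_room=False):
        h = rir[::-1] if reverse_room else rir      # time-reverse the room, not the speech
        y = fftconvolve(speech, h)[: len(speech)]
        return y / (np.max(np.abs(y)) + 1e-12)      # normalize level for presentation

    # Usage (placeholder signals):
    # forward = render(speech, rir, reverse_room=False)
    # reversed_room = render(speech, rir, reverse_room=True)
    ```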

  9. Time-Frequency Feature Representation Using Multi-Resolution Texture Analysis and Acoustic Activity Detector for Real-Life Speech Emotion Recognition

    PubMed Central

    Wang, Kun-Ching

    2015-01-01

    The classification of emotional speech is mostly considered in speech-related research on human-computer interaction (HCI). The purpose of this paper is to present a novel feature extraction method based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for the characterization and classification of different emotions in a speech signal. The motivation is that emotions have different intensity values in different frequency bands. In terms of human visual perception, the texture properties of the multi-resolution spectrogram of emotional speech should therefore provide a good feature set for emotion classification, and multi-resolution texture analysis can discriminate between emotions more clearly than uniform-resolution analysis. In order to provide high accuracy of emotional discrimination, especially in real-life conditions, an acoustic activity detection (AAD) algorithm is applied within the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally occurring dialogs recorded in real-life call centers. Compared with traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and state-of-the-art features, the MRTII features also improve the correct classification rates of the proposed systems across different language databases. Experimental results show that the proposed MRTII-based feature information, inspired by human visual perception of the spectrogram image, provides significant classification performance for real-life emotion recognition in speech. PMID:25594590
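
    The abstract does not specify the MRTII features themselves; as a rough stand-in for the idea of texture analysis on a speech spectrogram, the sketch below computes a log-magnitude spectrogram at several resolutions and summarizes each with a local binary pattern (LBP) histogram. The window sizes, the LBP settings, and the use of LBP at all are assumptions for illustration only, not the paper's method.

    ```python
    # Illustrative sketch: treat the log spectrogram as an image and extract
    # texture histograms at several resolutions (a stand-in for the MRTII idea).
    import numpy as np
    from scipy.signal import spectrogram
    from skimage.feature import local_binary_pattern

    def texture_features(x, fs, n_ffts=(256, 512, 1024)):
        feats = []
        for n_fft in n_ffts:                       # one "resolution" per FFT size
            _, _, S = spectrogram(x, fs=fs, nperseg=n_fft, noverlap=n_fft // 2)
            img = 10.0 * np.log10(S + 1e-12)
            img = np.uint8(255 * (img - img.min()) / (np.ptp(img) + 1e-12))
            lbp = local_binary_pattern(img, P=8, R=1, method="uniform")
            hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
            feats.append(hist)
        return np.concatenate(feats)               # feature vector for a classifier
    ```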

  10. [A modified speech enhancement algorithm for electronic cochlear implant and its digital signal processing realization].

    PubMed

    Wang, Yulin; Tian, Xuelong

    2014-08-01

    In order to improve the speech quality and auditory perception of electronic cochlear implants under strong background noise, a speech enhancement system for the cochlear implant front end was constructed. With a digital signal processor (DSP) as its core, the system combines the DSP's multi-channel buffered serial port (McBSP) data transmission channel with the extended audio interface chip TLV320AIC10, so that high-speed speech signal acquisition and output are realized. Because the traditional speech enhancement method suffers from poor adaptability, slow convergence, and large steady-state error, the versiera function and the de-correlation principle were used to improve the existing adaptive filtering algorithm, which effectively enhanced the quality of voice communication. Test results verified the stability of the system and the de-noising performance of the algorithm, and showed that it can provide clearer speech signals for deaf or tinnitus patients.

  11. [A modified speech enhancement algorithm for electronic cochlear implant and its digital signal processing realization].

    PubMed

    Wang, Yulin; Tian, Xuelong

    2014-08-01

    In order to improve the speech quality and auditory perception of electronic cochlear implants under strong background noise, a speech enhancement system for the cochlear implant front end was constructed. With a digital signal processor (DSP) as its core, the system combines the DSP's multi-channel buffered serial port (McBSP) data transmission channel with the extended audio interface chip TLV320AIC10, so that high-speed speech signal acquisition and output are realized. Because the traditional speech enhancement method suffers from poor adaptability, slow convergence, and large steady-state error, the versiera function and the de-correlation principle were used to improve the existing adaptive filtering algorithm, which effectively enhanced the quality of voice communication. Test results verified the stability of the system and the de-noising performance of the algorithm, and showed that it can provide clearer speech signals for deaf or tinnitus patients. PMID:25464779

  12. [A modified speech enhancement algorithm for electronic cochlear implant and its digital signal processing realization].

    PubMed

    Wang, Yulin; Tian, Xuelong

    2014-08-01

    In order to improve the speech quality and auditory perception of electronic cochlear implants under strong background noise, a speech enhancement system for the cochlear implant front end was constructed. With a digital signal processor (DSP) as its core, the system combines the DSP's multi-channel buffered serial port (McBSP) data transmission channel with the extended audio interface chip TLV320AIC10, so that high-speed speech signal acquisition and output are realized. Because the traditional speech enhancement method suffers from poor adaptability, slow convergence, and large steady-state error, the versiera function and the de-correlation principle were used to improve the existing adaptive filtering algorithm, which effectively enhanced the quality of voice communication. Test results verified the stability of the system and the de-noising performance of the algorithm, and showed that it can provide clearer speech signals for deaf or tinnitus patients. PMID:25508410
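
    The abstract above does not give the improved algorithm itself; for orientation, here is a minimal two-input adaptive noise canceller using a plain normalized-LMS update. The versiera-shaped step-size control and the de-correlation step described in the abstract are not reproduced.

    ```python
    # Minimal normalized-LMS adaptive noise canceller (reference sketch only;
    # the paper's versiera-based step-size control is not implemented here).
    import numpy as np

    def nlms_cancel(primary, reference, n_taps=32, mu=0.5, eps=1e-6):
        """primary = speech + noise at the ear; reference = correlated noise pickup."""
        w = np.zeros(n_taps)
        out = np.zeros_like(primary)
        for n in range(n_taps, len(primary)):
            x = reference[n - n_taps:n][::-1]       # most recent reference samples
            y = w @ x                               # noise estimate
            e = primary[n] - y                      # error = enhanced speech sample
            w += (mu / (eps + x @ x)) * e * x       # normalized LMS update
            out[n] = e
        return out
    ```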

  13. What Information Is Necessary for Speech Categorization? Harnessing Variability in the Speech Signal by Integrating Cues Computed Relative to Expectations

    ERIC Educational Resources Information Center

    McMurray, Bob; Jongman, Allard

    2011-01-01

    Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model, the type of information subserving this mapping. This is crucial in speech perception where the signal is variable and context dependent. This study assessed the…

  14. A Critical Examination of the Statistic Used for Processing Speech Signals.

    ERIC Educational Resources Information Center

    Knox, Keith

    This paper assesses certain properties of human mental processes by focusing on the tactics utilized in perceiving speech signals. Topics discussed in the paper include the power spectrum approach to fluctuations and noise, with particular reference to biological structures; "l/f-like" fluctuations in speech and music and the functioning of a…

  15. Recognition of emotions in Mexican Spanish speech: an approach based on acoustic modelling of emotion-specific vowels.

    PubMed

    Caballero-Morales, Santiago-Omar

    2013-01-01

    An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). Then, estimation of the emotional state from a spoken sentence is performed by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, accuracy of 87-100% was achieved for the recognition of emotional state of Mexican Spanish speech.

  16. Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels

    PubMed Central

    Caballero-Morales, Santiago-Omar

    2013-01-01

    An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). Then, estimation of the emotional state from a spoken sentence is performed by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, accuracy of 87–100% was achieved for the recognition of emotional state of Mexican Spanish speech. PMID:23935410
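
    The decision stage described above reduces to counting emotion-tagged vowel symbols in the ASR output and reporting the most frequent emotion; a sketch under the assumption that the recognizer emits labels such as "a_anger" or "e_sadness" (the label format is hypothetical, not the paper's HMM symbol set).

    ```python
    # Sketch of the decision rule: count emotion-specific vowels in the ASR
    # output and report the most frequent emotion. The "vowel_emotion" label
    # format is a hypothetical stand-in for the paper's symbol set.
    from collections import Counter

    EMOTIONS = ("anger", "happiness", "neutral", "sadness")

    def estimate_emotion(phoneme_sequence):
        counts = Counter()
        for symbol in phoneme_sequence:
            if "_" in symbol:
                _, emotion = symbol.split("_", 1)
                if emotion in EMOTIONS:
                    counts[emotion] += 1
        return counts.most_common(1)[0][0] if counts else "neutral"

    # Example: estimate_emotion(["k", "a_anger", "s", "a_anger", "e_neutral"]) -> "anger"
    ```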

  17. Acoustic markers of prominence influence infants' and adults' segmentation of speech sequences.

    PubMed

    Bion, Ricardo A H; Benavides-Varela, Silvia; Nespor, Marina

    2011-03-01

    Two experiments investigated the way acoustic markers of prominence influence the grouping of speech sequences by adults and 7-month-old infants. In the first experiment, adults were familiarized with and asked to memorize sequences of adjacent syllables that alternated in either pitch or duration. During the test phase, participants heard pairs of syllables with constant pitch and duration and were asked whether the syllables had appeared adjacently during familiarization. Adults were better at remembering pairs of syllables in which, during familiarization, short syllables preceded long syllables or high-pitched syllables preceded low-pitched syllables. In the second experiment, infants were familiarized and tested with stimuli similar to those in the first experiment, and their preference for pairs of syllables was assessed using the head-turn preference paradigm. When familiarized with syllables alternating in pitch, infants showed a preference to listen to pairs of syllables that had high pitch in the first syllable. However, no preference was found when the familiarization stream alternated in duration. It is proposed that these perceptual biases help infants and adults find linguistic units in the continuous speech stream. While the bias for grouping based on pitch appears early in development, biases for durational grouping might rely on more extensive linguistic experience. PMID:21524015

  18. Statistical evidence that musical universals derive from the acoustic characteristics of human speech

    NASA Astrophysics Data System (ADS)

    Schwartz, David; Howe, Catharine; Purves, Dale

    2003-04-01

    Listeners of all ages and societies produce a similar consonance ordering of chromatic scale tone combinations. Despite intense interest in this perceptual phenomenon over several millennia, it has no generally accepted explanation in physical, psychological, or physiological terms. Here we show that the musical universal of consonance ordering can be understood in terms of the statistical relationship between a pattern of sound pressure at the ear and the possible generative sources of the acoustic energy pattern. Since human speech is the principal naturally occurring source of tone-evoking (i.e., periodic) sound energy for human listeners, we obtained normalized spectra from more than 100000 recorded speech segments. The probability distribution of amplitude/frequency combinations derived from these spectra predicts both the fundamental frequency ratios that define the chromatic scale intervals and the consonance ordering of chromatic scale tone combinations. We suggest that these observations reveal the statistical character of the perceptual process by which the auditory system guides biologically successful behavior in response to inherently ambiguous sound stimuli.

  19. A machine for neural computation of acoustical patterns with application to real time speech recognition

    NASA Astrophysics Data System (ADS)

    Mueller, P.; Lazzaro, J.

    1986-08-01

    400 analog electronic neurons have been assembled and connected for the analysis and recognition of acoustical patterns, including speech. Input to the net comes from a set of 18 band-pass filters (Qmax 300 dB/octave; 180 to 6000 Hz, log scale). The net is organized into two parts: the first performs, in real time, the decomposition of the input patterns into their primitives of energy, space (frequency), and time relations; the other decodes the set of primitives. 216 neurons are dedicated to pattern decomposition. The output of the individual filters is rectified and fed to two sets of 18 neurons in an opponent center-surround organization of synaptic connections ("on center" and "off center"). These units compute maxima and minima of energy at different frequencies. The next two sets of neurons compute the temporal boundaries ("on" and "off"), and the following two compute the movement of the energy maxima (formants) up or down the frequency axis. There are in addition "hyperacuity" units which expand the frequency resolution to 36, other units tuned to a particular range of duration of the "on center" units, and others tuned exclusively to very low energy sounds. In order to recognize speech sounds at the phoneme or diphone level, the set of primitives belonging to the phoneme is decoded such that only one neuron or a non-overlapping group of neurons fires when the sound pattern is present at the input. For display and translation into phonetic symbols, the output from these neurons is fed into an EPROM decoder and computer which displays in real time a phonetic representation of the speech input.

  20. A measure of aperiodicity content in a speech signal

    NASA Astrophysics Data System (ADS)

    Deshmukh, Om D.; Espy-Wilson, Carol Y.

    2003-04-01

    Most current aperiodicity detectors measure aperiodicity indirectly, in that the absence of a periodic component in a nonsilent region is termed aperiodicity. Such indirect measurements are inadequate and can be misleading, especially in cases like voiced fricatives or breathy vowels. This motivated us to develop a direct measure of aperiodicity that is independent of the periodicity measure. The speech signal is passed through a 60-channel gammatone auditory filterbank. The Average Magnitude Difference Function (AMDF) is computed on the envelope of each channel. The randomness in the distribution of the AMDF dips is the basis for the measure of aperiodicity, whereas the measure of periodicity is based on the occurrence of dips at multiple locations. The system was evaluated on the MOCHA database, which has simultaneous recordings of EGG data, and on the TIMIT database. Preliminary analysis shows that the aperiodicity and voicing accuracy on a per-frame basis are 95% and 90.3%, respectively. In voiced fricatives and voiced stops, high aperiodicity and high periodicity were detected in 21.6% of the frames. Note that not all these sounds necessarily had both sources prominent.
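
    A compact sketch of the AMDF computation that underlies the measure, applied to a single channel envelope; the dip-based randomness statistic is only hinted at here, and the gammatone filterbank itself is omitted. The frequency-range parameters are illustrative.

    ```python
    # Sketch: average magnitude difference function (AMDF) of a channel envelope,
    # plus a crude periodicity cue (depth of the strongest non-zero-lag dip).
    import numpy as np

    def amdf(envelope, max_lag):
        env = envelope - np.mean(envelope)
        return np.array([np.mean(np.abs(env[lag:] - env[:len(env) - lag]))
                         for lag in range(1, max_lag + 1)])

    def periodicity_strength(envelope, fs, f_lo=60.0, f_hi=400.0):
        """Deep, regularly spaced AMDF dips suggest periodicity; shallow or
        irregular dips suggest aperiodic energy."""
        d = amdf(envelope, int(fs / f_lo))
        lo = int(fs / f_hi)
        dip = np.argmin(d[lo:]) + lo                # strongest candidate period
        return 1.0 - d[dip] / (np.max(d) + 1e-12)   # 1 = strongly periodic
    ```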

  1. Physical and perceptual factors shape the neural mechanisms that integrate audiovisual signals in speech comprehension.

    PubMed

    Lee, HweeLing; Noppeney, Uta

    2011-08-01

    Face-to-face communication challenges the human brain to integrate information from auditory and visual senses with linguistic representations. Yet the roles of bottom-up physical (spectrotemporal structure) input and top-down linguistic constraints in shaping the neural mechanisms specialized for integrating audiovisual speech signals are currently unknown. Participants were presented with speech and sinewave speech analogs in visual, auditory, and audiovisual modalities. Before the fMRI study, they were trained to perceive physically identical sinewave speech analogs as speech (SWS-S) or nonspeech (SWS-N). Comparing audiovisual integration (interactions) of speech, SWS-S, and SWS-N revealed a posterior-anterior processing gradient within the left superior temporal sulcus/gyrus (STS/STG): Bilateral posterior STS/STG integrated audiovisual inputs regardless of spectrotemporal structure or speech percept; in left mid-STS, the integration profile was primarily determined by the spectrotemporal structure of the signals; more anterior STS regions discarded spectrotemporal structure and integrated audiovisual signals constrained by stimulus intelligibility and the availability of linguistic representations. In addition to this "ventral" processing stream, a "dorsal" circuitry encompassing posterior STS/STG and left inferior frontal gyrus differentially integrated audiovisual speech and SWS signals. Indeed, dynamic causal modeling and Bayesian model comparison provided strong evidence for a parallel processing structure encompassing a ventral and a dorsal stream with speech intelligibility training enhancing the connectivity between posterior and anterior STS/STG. In conclusion, audiovisual speech comprehension emerges in an interactive process with the integration of auditory and visual signals being progressively constrained by stimulus intelligibility along the STS and spectrotemporal structure in a dorsal fronto-temporal circuitry.

  2. A Comparison of Signal Enhancement Methods for Extracting Tonal Acoustic Signals

    NASA Technical Reports Server (NTRS)

    Jones, Michael G.

    1998-01-01

    The measurement of pure tone acoustic pressure signals in the presence of masking noise, often generated by mean flow, is a continual problem in the field of passive liner duct acoustics research. In support of the Advanced Subsonic Technology Noise Reduction Program, methods were investigated for conducting measurements of advanced duct liner concepts in harsh, aeroacoustic environments. This report presents the results of a comparison study of three signal extraction methods for acquiring quality acoustic pressure measurements in the presence of broadband noise (used to simulate the effects of mean flow). The performance of each method was compared to a baseline measurement of a pure tone acoustic pressure 3 dB above a uniform, broadband noise background.

  3. Signal processing methodologies for an acoustic fetal heart rate monitor

    NASA Technical Reports Server (NTRS)

    Pretlow, Robert A., III; Stoughton, John W.

    1992-01-01

    Research and development is presented of real time signal processing methodologies for the detection of fetal heart tones within a noise-contaminated signal from a passive acoustic sensor. A linear predictor algorithm is utilized for detection of the heart tone event and additional processing derives heart rate. The linear predictor is adaptively 'trained' in a least mean square error sense on generic fetal heart tones recorded from patients. A real time monitor system is described which outputs to a strip chart recorder for plotting the time history of the fetal heart rate. The system is validated in the context of the fetal nonstress test. Comparisons are made with ultrasonic nonstress tests on a series of patients. Comparative data provides favorable indications of the feasibility of the acoustic monitor for clinical use.
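
    A rough sketch of the detection idea, assuming a linear predictor whose coefficients are fit (in the least-squared-error sense) to recorded heart tones; frames of incoming signal that the predictor models well (low residual) are marked as heart-tone events and converted to a rate. Frame sizes and thresholds are placeholders, not the report's values.

    ```python
    # Sketch: linear-predictor-based heart tone detection and rate estimation.
    # Coefficients are least-squares fit to generic heart tone recordings;
    # frames the predictor fits well (low residual energy) are flagged as events.
    import numpy as np

    def fit_predictor(training_tone, order=12):
        X = np.column_stack([training_tone[order - k - 1:len(training_tone) - k - 1]
                             for k in range(order)])
        y = training_tone[order:]
        coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coeffs

    def residual_energy(frame, coeffs):
        order = len(coeffs)
        X = np.column_stack([frame[order - k - 1:len(frame) - k - 1] for k in range(order)])
        err = frame[order:] - X @ coeffs
        return np.sum(err ** 2) / (np.sum(frame[order:] ** 2) + 1e-12)

    def heart_rate_bpm(signal, fs, coeffs, frame_len=0.05, threshold=0.3):
        hop = int(frame_len * fs)
        flags = [residual_energy(signal[i * hop:(i + 1) * hop], coeffs) < threshold
                 for i in range(len(signal) // hop)]
        onsets = [i * hop / fs for i in range(1, len(flags)) if flags[i] and not flags[i - 1]]
        intervals = np.diff(onsets)
        return 60.0 / np.median(intervals) if len(intervals) else None
    ```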

  4. The effect of different open plan and enclosed classroom acoustic conditions on speech perception in Kindergarten children.

    PubMed

    Mealings, Kiri T; Demuth, Katherine; Buchholz, Jörg M; Dillon, Harvey

    2015-10-01

    Open plan classrooms, where several classes are in the same room, have recently re-emerged in Australian primary schools. This paper explores how the acoustics of four Kindergarten classrooms [an enclosed classroom (25 children), double classroom (44 children), fully open plan triple classroom (91 children), and a semi-open plan K-6 "21st century learning space" (205 children)] affect speech perception. Twenty-two to 23 5-6-year-old children in each classroom participated in an online four-picture choice speech perception test while adjacent classes engaged in quiet versus noisy activities. The noise levels recorded during the test were higher the larger the classroom, except in the noisy condition for the K-6 classroom, possibly due to acoustic treatments. Linear mixed effects models revealed children's performance accuracy and speed decreased as noise level increased. Additionally, children's speech perception abilities decreased the further away they were seated from the loudspeaker in noise levels above 50 dBA. These results suggest that fully open plan classrooms are not appropriate learning environments for critical listening activities with young children due to their high intrusive noise levels which negatively affect speech perception. If open plan classrooms are desired, they need to be acoustically designed to be appropriate for critical listening activities.

  5. Speaker height estimation from speech: Fusing spectral regression and statistical acoustic models.

    PubMed

    Hansen, John H L; Williams, Keri; Bořil, Hynek

    2015-08-01

    Estimating speaker height can assist in voice forensic analysis and provide additional side knowledge to benefit automatic speaker identification or acoustic model selection for automatic speech recognition. In this study, a statistical approach to height estimation that incorporates acoustic models within a non-uniform height bin width Gaussian mixture model structure as well as a formant analysis approach that employs linear regression on selected phones are presented. The accuracy and trade-offs of these systems are explored by examining the consistency of the results, location, and causes of error, as well as a combined fusion of the two systems, using data from the TIMIT corpus. Open set testing is also presented using the Multi-session Audio Research Project corpus and publicly available YouTube audio to examine the effect of channel mismatch between training and testing data and provide a realistic open domain testing scenario. The proposed algorithms achieve highly competitive performance relative to previously published literature. Although the different data partitioning in the literature and this study may prevent performance comparisons in absolute terms, the mean average error of 4.89 cm for males and 4.55 cm for females provided by the proposed algorithm on TIMIT utterances containing selected phones suggests a considerable estimation error decrease compared to past efforts.
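
    As a toy illustration of the formant-analysis branch, the sketch below fits an ordinary linear regression from per-speaker formant measurements on selected phones to speaker height; the feature choice and data are placeholders, and the paper's GMM-based statistical branch and the fusion step are not reproduced.

    ```python
    # Toy sketch of the formant-regression branch of height estimation:
    # ordinary least squares from per-speaker formant features to height (cm).
    # Features and data are placeholders; the GMM branch and fusion are omitted.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    def fit_height_model(formant_features, heights_cm):
        """formant_features: (n_speakers, n_features), e.g. mean F1-F4 of selected phones."""
        return LinearRegression().fit(formant_features, heights_cm)

    def mean_absolute_error_cm(model, features, heights_cm):
        return float(np.mean(np.abs(model.predict(features) - heights_cm)))
    ```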

  6. Speech timing and linguistic rhythm: on the acoustic bases of rhythm typologies.

    PubMed

    Rathcke, Tamara V; Smith, Rachel H

    2015-05-01

    Research into linguistic rhythm has been dominated by the idea that languages can be classified according to rhythmic templates, amenable to assessment by acoustic measures of vowel and consonant durations. This study tested predictions of two proposals explaining the bases of rhythmic typologies: the Rhythm Class Hypothesis which assumes that the templates arise from an extensive vs a limited use of durational contrasts, and the Control and Compensation Hypothesis which proposes that the templates are rooted in more vs less flexible speech production strategies. Temporal properties of segments, syllables and rhythmic feet were examined in two accents of British English, a "stress-timed" variety from Leeds, and a "syllable-timed" variety spoken by Panjabi-English bilinguals from Bradford. Rhythm metrics were calculated. A perception study confirmed that the speakers of the two varieties differed in their perceived rhythm. The results revealed that both typologies were informative in that to a certain degree, they predicted temporal patterns of the two varieties. None of the metrics tested was capable of adequately reflecting the temporal complexity found in the durational data. These findings contribute to the critical evaluation of the explanatory adequacy of rhythm metrics. Acoustic bases and limitations of the traditional rhythmic typologies are discussed.
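
    The rhythm metrics referred to above are standard in this literature; for instance, the normalized pairwise variability index (nPVI) over a sequence of interval durations is commonly computed as below. This is offered only as an illustration of the metric family; the specific metric set used in the study is not restated here.

    ```python
    # Sketch: normalized pairwise variability index (nPVI), one of the standard
    # rhythm metrics, computed over successive interval durations (e.g. vowels).
    import numpy as np

    def npvi(durations):
        d = np.asarray(durations, dtype=float)
        pairs = np.abs(d[1:] - d[:-1]) / ((d[1:] + d[:-1]) / 2.0)
        return 100.0 * np.mean(pairs)

    # Example: npvi([0.08, 0.15, 0.07, 0.12]) is higher for a "stress-timed"-like
    # alternation of short and long intervals than for nearly equal intervals.
    ```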

  7. Speaker height estimation from speech: Fusing spectral regression and statistical acoustic models.

    PubMed

    Hansen, John H L; Williams, Keri; Bořil, Hynek

    2015-08-01

    Estimating speaker height can assist in voice forensic analysis and provide additional side knowledge to benefit automatic speaker identification or acoustic model selection for automatic speech recognition. In this study, a statistical approach to height estimation that incorporates acoustic models within a non-uniform height bin width Gaussian mixture model structure as well as a formant analysis approach that employs linear regression on selected phones are presented. The accuracy and trade-offs of these systems are explored by examining the consistency of the results, location, and causes of error, as well as a combined fusion of the two systems, using data from the TIMIT corpus. Open set testing is also presented using the Multi-session Audio Research Project corpus and publicly available YouTube audio to examine the effect of channel mismatch between training and testing data and provide a realistic open domain testing scenario. The proposed algorithms achieve highly competitive performance relative to previously published literature. Although the different data partitioning in the literature and this study may prevent performance comparisons in absolute terms, the mean average error of 4.89 cm for males and 4.55 cm for females provided by the proposed algorithm on TIMIT utterances containing selected phones suggests a considerable estimation error decrease compared to past efforts. PMID:26328721

  8. Compensation for Coarticulation: Disentangling Auditory and Gestural Theories of Perception of Coarticulatory Effects in Speech

    ERIC Educational Resources Information Center

    Viswanathan, Navin; Magnuson, James S.; Fowler, Carol A.

    2010-01-01

    According to one approach to speech perception, listeners perceive speech by applying general pattern matching mechanisms to the acoustic signal (e.g., Diehl, Lotto, & Holt, 2004). An alternative is that listeners perceive the phonetic gestures that structured the acoustic signal (e.g., Fowler, 1986). The two accounts have offered different…

  9. Modeling of Acoustic Emission Signal Propagation in Waveguides

    PubMed Central

    Zelenyak, Andreea-Manuela; Hamstad, Marvin A.; Sause, Markus G. R.

    2015-01-01

    Acoustic emission (AE) testing is a widely used nondestructive testing (NDT) method to investigate material failure. When environmental conditions are harmful for the operation of the sensors, waveguides are typically mounted in between the inspected structure and the sensor. Such waveguides can be built from different materials or have different designs in accordance with the experimental needs. All these variations can cause changes in the acoustic emission signals in terms of modal conversion, additional attenuation or shift in frequency content. A finite element method (FEM) was used to model acoustic emission signal propagation in an aluminum plate with an attached waveguide and was validated against experimental data. The geometry of the waveguide is systematically changed by varying the radius and height to investigate the influence on the detected signals. Different waveguide materials were implemented, and changes of material properties as a function of temperature were taken into account. Developing the ability to model different waveguide options replaces the time-consuming and expensive trial-and-error alternative of experiments. Thus, the aim of this research has important implications for those who use waveguides for AE testing. PMID:26007731

  10. INSTRUMENTATION FOR SURVEYING ACOUSTIC SIGNALS IN NATURAL GAS TRANSMISSION LINES

    SciTech Connect

    John L. Loth; Gary J. Morris; George M. Palmer; Richard Guiler; Deepak Mehra

    2003-09-01

    In the U.S., natural gas is distributed through more than one million miles of high-pressure transmission pipelines. If all leaks and infringements could be detected quickly, it would enhance safety and U.S. energy security. Only low frequency acoustic waves appear to be detectable over distances up to 60 km where pipeline shut-off valves provide access to the inside of the pipeline. This paper describes a Portable Acoustic Monitoring Package (PAMP) developed to record and identify acoustic signals characteristic of: leaks, pump noise, valve and flow metering noise, third party infringement, manual pipeline water and gas blow-off, etc. This PAMP consists of a stainless steel 1/2 inch NPT plumbing tree rated for use on 1000 psi pipelines. Its instrumentation is designed to measure acoustic waves over the entire frequency range from zero to 16,000 Hz by means of four instruments: (1) microphone, (2) 3-inch water full range differential pressure transducer with 0.1% of range sensitivity, (3) a novel 3 inch to 100 inch water range amplifier, using an accumulator with needle valve and (4) a line-pressure transducer. The weight of the PAMP complete with all accessories is 36 pounds. This includes a remote control battery/switch box assembly on a 25-foot extension cord, a laptop data acquisition computer on a field table and a sun shield.

  11. Emotional recognition from the speech signal for a virtual education agent

    NASA Astrophysics Data System (ADS)

    Tickle, A.; Raghu, S.; Elshaw, M.

    2013-06-01

    This paper explores the extraction of features from the speech wave to perform intelligent emotion recognition. A feature extraction tool (openSMILE) was used to obtain a baseline set of 998 acoustic features from a set of emotional speech recordings from a microphone. The initial features were reduced to the most important ones so that recognition of emotions using a supervised neural network could be performed. Given that the future use of virtual education agents lies with making the agents more interactive, developing agents with the capability to recognise and adapt to the emotional state of humans is an important step.

  12. A self-organizing neural network architecture for auditory and speech perception with applications to acoustic and other temporal prediction problems

    NASA Astrophysics Data System (ADS)

    Cohen, Michael; Grossberg, Stephen

    1994-09-01

    This project is developing autonomous neural network models for the real-time perception and production of acoustic and speech signals. Our SPINET pitch model was developed to take real-time acoustic input and to simulate the key pitch data. SPINET was embedded into a model for auditory scene analysis, or how the auditory system separates sound sources in environments with multiple sources. The model groups frequency components based on pitch and spatial location cues and resonantly binds them within different streams. The model simulates psychophysical grouping data, such as how an ascending tone groups with a descending tone even if noise exists at the intersection point, and how a tone before and after a noise burst is perceived to continue through the noise. These resonant streams input to working memories, wherein phonetic percepts adapt to global speech rate. Computer simulations quantitatively generate the experimentally observed category boundary shifts for voiced stop pairs that have the same or different place of articulation, including why the interval to hear a double (geminate) stop is twice as long as that to hear two different stops. This model also uses resonant feedback, here between list categories and working memory.

  13. A Robust Approach For Acoustic Noise Suppression In Speech Using ANFIS

    NASA Astrophysics Data System (ADS)

    Martinek, Radek; Kelnar, Michal; Vanus, Jan; Bilik, Petr; Zidek, Jan

    2015-11-01

    The authors of this article deal with the implementation of a combination of fuzzy-system and artificial intelligence techniques in the application area of non-linear noise and interference suppression. The structure used is called an Adaptive Neuro Fuzzy Inference System (ANFIS). This system finds practical use mainly in audio telephone (mobile) communication in noisy environments (transport, production halls, sports matches, etc.). Experimental methods based on the two-input adaptive noise cancellation concept were clearly outlined. Within the experiments carried out, the authors created, based on the ANFIS structure, a comprehensive system for adaptive suppression of unwanted background interference that occurs in audio communication and degrades the audio signal. The system designed has been tested on real voice signals. This article presents the investigation and comparison of three distinct approaches to noise cancellation in speech: LMS (least mean squares) and RLS (recursive least squares) adaptive filtering and ANFIS. A careful review of the literature indicated the importance of non-linear adaptive algorithms over linear ones in noise cancellation. It was concluded that the ANFIS approach had the overall best performance as it efficiently cancelled noise even in highly noise-degraded speech. Results were drawn from the successful experimentation; subjective tests were used to analyse comparative performance, while objective tests were used to validate them. Implementation of the algorithms was experimentally carried out in Matlab to justify the claims and determine their relative performances.
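
    Of the three approaches compared above, the RLS canceller is the most compact to write down; a minimal sketch of a two-input RLS noise canceller follows. The ANFIS structure itself is not reproduced here, and the forgetting factor and filter length are illustrative assumptions.

    ```python
    # Minimal two-input recursive-least-squares (RLS) noise canceller, one of the
    # baselines compared above. Forgetting factor and tap count are illustrative.
    import numpy as np

    def rls_cancel(primary, reference, n_taps=16, lam=0.999, delta=100.0):
        w = np.zeros(n_taps)
        P = np.eye(n_taps) * delta                  # inverse correlation estimate
        out = np.zeros_like(primary)
        for n in range(n_taps, len(primary)):
            x = reference[n - n_taps:n][::-1]
            Px = P @ x
            k = Px / (lam + x @ Px)                 # gain vector
            e = primary[n] - w @ x                  # a priori error = enhanced sample
            w = w + k * e
            P = (P - np.outer(k, Px)) / lam
            out[n] = e
        return out
    ```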

  14. Acoustic source characteristics, across-formant integration, and speech intelligibility under competitive conditions.

    PubMed

    Roberts, Brian; Summers, Robert J; Bailey, Peter J

    2015-06-01

    An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics--for example, fundamental frequency (F0) differences between formants promote their segregation. This study explored the role of more radical differences in source characteristics. Three-formant (F1+F2+F3) synthetic speech analogues were derived from natural sentences. In Experiment 1, F1+F3 were generated by passing a harmonic glottal source (F0 = 140 Hz) through second-order resonators (H1+H3); in Experiment 2, F1+F3 were tonal (sine-wave) analogues (T1+T3). F2 could take either form (H2 or T2). In some conditions, the target formants were presented alone, either monaurally or dichotically (left ear = F1+F3; right ear = F2). In others, they were accompanied by a competitor for F2 (F1+F2C+F3; F2), which listeners must reject to optimize recognition. Competitors (H2C or T2C) were created using the time-reversed frequency and amplitude contours of F2. Dichotic presentation of F2 and F2C ensured that the impact of the competitor arose primarily through informational masking. In the absence of F2C, the effect of a source mismatch between F1+F3 and F2 was relatively modest. When F2C was present, intelligibility was lowest when F2 was tonal and F2C was harmonic, irrespective of which type matched F1+F3. This finding suggests that source type and context, rather than similarity, govern the phonetic contribution of a formant. It is proposed that wideband harmonic analogues are more effective informational maskers than narrowband tonal analogues, and so become dominant in across-frequency integration of phonetic information when placed in competition. PMID:25751040

  15. Effects of a music therapy voice protocol on speech intelligibility, vocal acoustic measures, and mood of individuals with Parkinson's disease.

    PubMed

    Haneishi, E

    2001-01-01

    This study examined the effects of a Music Therapy Voice Protocol (MTVP) on speech intelligibility, vocal intensity, maximum vocal range, maximum duration of sustained vowel phonation, vocal fundamental frequency, vocal fundamental frequency variability, and mood of individuals with Parkinson's disease. Four female patients, who demonstrated voice and speech problems, served as their own controls and participated in baseline assessment (study pretest), a series of MTVP sessions involving vocal and singing exercises, and final evaluation (study posttest). In the study pre- and posttests, data for speech intelligibility and all acoustic variables were collected. Statistically significant increases were found in speech intelligibility, as rated by caregivers, and in vocal intensity from study pretest to posttest, as shown by paired-samples t-tests. In addition, before and after each MTVP session (session pre- and posttests), self-rated mood scores and selected acoustic variables were collected. No significant differences were found in any of the variables from the session pretests to posttests, across the entire treatment period, or in their interactions, as shown by two-way ANOVAs with repeated measures. Although not significant, the mean of the mood scores in session posttests (M = 8.69) was higher than that in session pretests (M = 7.93). PMID:11796078

  16. Low-Frequency Acoustic Signals Propagation in Buried Pipelines

    NASA Astrophysics Data System (ADS)

    Ovchinnikov, A. L.; Lapshin, B. M.

    2016-01-01

    The article deals with issues concerning the propagation of acoustic signals in large-diameter oil pipelines caused by mechanical action on the pipe body. Various mechanisms of signal attenuation are discussed. It is shown that calculating the attenuation from internal energy losses alone, i.e., viscosity, thermal conductivity, and friction between the liquid and the pipeline wall, yields values that are too low. The results of experimental studies, carried out on an existing pipeline with a diameter of 1200 mm, are shown. It is experimentally proved that the main mechanism of signal attenuation is energy emission into the environment. The numerical values of the attenuation coefficients, 0.14-0.18 dB/m for the 1200 mm diameter pipeline in the frequency range from 50 Hz to 500 Hz, are determined.

  17. Adaptive plasticity in wild field cricket's acoustic signaling.

    PubMed

    Bertram, Susan M; Harrison, Sarah J; Thomson, Ian R; Fitzsimmons, Lauren P

    2013-01-01

    Phenotypic plasticity can be adaptive when phenotypes are closely matched to changes in the environment. In crickets, rhythmic fluctuations in the biotic and abiotic environment regularly result in diel rhythms in density of sexually active individuals. Given that density strongly influences the intensity of sexual selection, we asked whether crickets exhibit plasticity in signaling behavior that aligns with these rhythmic fluctuations in the socio-sexual environment. We quantified the acoustic mate signaling behavior of wild-caught males of two cricket species, Gryllus veletis and G. pennsylvanicus. Crickets exhibited phenotypically plastic mate signaling behavior, with most males signaling more often and more attractively during the times of day when mating activity is highest in the wild. Most male G. pennsylvanicus chirped more often and louder, with shorter interpulse durations, pulse periods, chirp durations, and interchirp durations, and at slightly higher carrier frequencies during the time of the day that mating activity is highest in the wild. Similarly, most male G. veletis chirped more often, with more pulses per chirp, longer interpulse durations, pulse periods, and chirp durations, shorter interchirp durations, and at lower carrier frequencies during the time of peak mating activity in the wild. Among-male variation in signaling plasticity was high, with some males signaling in an apparently maladaptive manner. Body size explained some of the among-male variation in G. pennsylvanicus plasticity but not G. veletis plasticity. Overall, our findings suggest that crickets exhibit phenotypically plastic mate attraction signals that closely match the fluctuating socio-sexual context they experience.

  18. Precursory acoustic signals and ground deformation in volcanic explosions

    NASA Astrophysics Data System (ADS)

    Bowman, D. C.; Kim, K.; Anderson, J.; Lees, J. M.; Taddeucci, J.; Graettinger, A. H.; Sonder, I.; Valentine, G.

    2013-12-01

    We investigate precursory acoustic signals that appear prior to volcanic explosions in real and experimental settings. Acoustic records of a series of experimental blasts designed to mimic maar explosions show precursory energy 0.02 to 0.05 seconds before the high amplitude overpressure arrival. These blasts consisted of 1 to 1/3 lb charges detonated in unconsolidated granular material at depths between 0.5 and 1 m, and were performed during the Buffalo Man Made Maars experiment in Springville, New York, USA. The preliminary acoustic arrival is 1 to 2 orders of magnitude lower in amplitude compared to the main blast wave. The waveforms vary from blast to blast, perhaps reflecting the different explosive yields and burial depths of each shot. Similar arrivals are present in some infrasound records at Santiaguito volcano, Guatemala, where they precede the main blast signal by about 2 seconds and are about 1 order of magnitude weaker. Precursory infrasound has also been described at Sakurajima volcano, Japan (Yokoo et al, 2013; Bull. Volc. Soc. Japan, 58, 163-181) and Suwanosejima volcano, Japan (Yokoo and Iguchi, 2010; JVGR, 196, 287-294), where it is attributed to rapid deformation of the vent region. Vent deformation has not been directly observed at these volcanoes because of the difficulty of visually observing the crater floor. However, particle image velocimetry of video records at Santiaguito has revealed rapid and widespread ground motion just prior to eruptions (Johnson et al, 2008; Nature, 456, 377-381) and may be the cause of much of the infrasound recorded at that volcano (Johnson and Lees, 2010; GRL, 37, L22305). High speed video records of the blasts during the Man Made Maars experiment also show rapid deformation of the ground immediately before the explosion plume breaches the surface. We examine the connection between source yield, burial depths, ground deformation, and the production of initial acoustic phases for each simulated maar explosion. We

  19. Automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Espy-Wilson, Carol

    2005-04-01

    Great strides have been made in the development of automatic speech recognition (ASR) technology over the past thirty years. Most of this effort has been centered around the extension and improvement of Hidden Markov Model (HMM) approaches to ASR. Current commercially-available and industry systems based on HMMs can perform well for certain situational tasks that restrict variability such as phone dialing or limited voice commands. However, the holy grail of ASR systems is performance comparable to humans; in other words, the ability to automatically transcribe unrestricted conversational speech spoken by an infinite number of speakers under varying acoustic environments. This goal is far from being reached. Key to the success of ASR is effective modeling of variability in the speech signal. This tutorial will review the basics of ASR and the various ways in which our current knowledge of speech production, speech perception and prosody can be exploited to improve robustness at every level of the system.

  20. Acoustic changes in the production of lexical stress during Lombard speech.

    PubMed

    Arciuli, Joanne; Simpson, Briony S; Vogel, Adam P; Ballard, Kirrie J

    2014-06-01

    The Lombard effect describes the phenomenon of individuals increasing their vocal intensity when speaking in the presence of background noise. Here, we conducted an investigation of the production of lexical stress during Lombard speech. Participants (N = 27) produced the same sentences in three conditions: one quiet condition and two noise conditions at 70 dB (white noise; multi-talker babble). Manual acoustic analyses (syllable duration, vowel intensity, and vowel fundamental frequency) were completed for repeated productions of two trisyllabic words with opposing patterns of lexical stress (weak-strong; strong-weak) in each of the three conditions. In total, 324 productions were analysed (12 utterances per participant). Results revealed that, rather than increasing vocal intensity equally across syllables, participants alter the degree of stress contrastivity when speaking in noise. This was especially evident in the production of strong-weak lexical stress where there was an increase in contrastivity across syllables in terms of intensity and fundamental frequency. This preliminary study paves the way for further research that is needed to establish these findings using a larger set of multisyllabic stimuli.

  1. Aerodynamic and acoustical measures of speech, operatic, and Broadway vocal styles in a professional female singer.

    PubMed

    Stone, R E; Cleveland, Thomas F; Sundberg, P Johan; Prokop, Jan

    2003-09-01

    Understanding how the voice is used in different styles of singing is commonly based on intuitive descriptions offered by performers who are proficient in only one style. Such descriptions are debatable, lack reproducibility, and lack scientifically derived explanations of the characteristics. We undertook acoustic and aerodynamic analyses of a female subject with professional experience in both operatic and Broadway styles of singing, who sang examples in these two styles. How representative the examples are of the respective styles was investigated by means of a listening test. Further, as a reference point, we compared the styles with her speech. Variation in styles associated with pitch and vocal loudness was investigated for various parameters: subglottal pressure, closed quotient, glottal leakage, H1-H2 difference (the level difference between the two lowest partials of the source spectrum), and glottal compliance (the ratio between the air volume displaced in a glottal pulse and the subglottal pressure). Formant frequencies, long-term-average spectrum, and vibrato characteristics were also studied. Characteristics of operatic style emerge as distinctly different from Broadway style, the latter being more similar to speaking. PMID:14513952

  2. Computational principles underlying the recognition of acoustic signals in insects.

    PubMed

    Clemens, Jan; Hennig, R Matthias

    2013-08-01

    Many animals produce pulse-like signals during acoustic communication. These signals exhibit structure on two time scales: they consist of trains of pulses that are often broadcast in packets, so-called chirps. Temporal parameters of the pulse and of the chirp are decisive for female preference. Despite these signals being produced by animals from many different taxa (e.g. frogs, grasshoppers, crickets, bushcrickets, flies), a general framework for their evaluation is still lacking. We propose such a framework, based on a simple and physiologically plausible model. The model consists of feature detectors, whose time-varying output is averaged over the signal and then linearly combined to yield the behavioral preference. We fitted this model to large data sets collected in two species of crickets and found that Gabor filters--known from visual and auditory physiology--explain the preference functions in these two species very well. We further explored the properties of Gabor filters and found a systematic relationship between parameters of the filters and the shape of preference functions. Although these Gabor filters were relatively short, they were also able to explain aspects of the preference for signal parameters on the longer time scale due to the integration step in our model. Our framework explains a wide range of phenomena associated with female preference for a widespread class of signals in an intuitive and physiologically plausible fashion. This approach thus constitutes a valuable tool to understand the functioning and evolution of communication systems in many species.
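
    A minimal sketch of the model class described above: a temporal Gabor filter applied to the stimulus envelope, its output rectified, averaged over the signal, and combined linearly into a preference score. The filter parameters, the small filter bank, and the rectifying nonlinearity are placeholders, not the fitted values from the study.

    ```python
    # Sketch of the model class: Gabor feature detector -> time average -> linear
    # readout of preference. Parameters are placeholders, not the fitted values.
    import numpy as np

    def gabor_kernel(fs, duration=0.04, freq=30.0, sigma=0.01, phase=0.0):
        t = np.arange(-duration / 2, duration / 2, 1.0 / fs)
        return np.exp(-t ** 2 / (2 * sigma ** 2)) * np.cos(2 * np.pi * freq * t + phase)

    def preference_score(envelope, fs, weights=(1.0, -0.5), bias=0.0):
        """envelope: rectified amplitude envelope of the song (pulse pattern)."""
        features = []
        for i in range(len(weights)):
            kern = gabor_kernel(fs, freq=30.0 * (i + 1))    # a small filter bank
            out = np.convolve(envelope, kern, mode="same")
            features.append(np.mean(np.maximum(out, 0.0)))  # rectify, then average
        return float(np.dot(weights, features) + bias)
    ```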

  3. Calculation of selective filters of a device for primary analysis of speech signals

    NASA Astrophysics Data System (ADS)

    Chudnovskii, L. S.; Ageev, V. M.

    2014-07-01

    The amplitude-frequency responses of filters for primary analysis of speech signals, which have a low quality factor and a high rolloff factor in the high-frequency range, are calculated using the linear theory of speech production and psychoacoustic measurement data. The frequency resolution of the filter system for a sinusoidal signal is 40-200 Hz. The modulation-frequency resolution of amplitude- and frequency-modulated signals is 3-6 Hz. The aforementioned features of the calculated filters are close to the amplitude-frequency responses of biological auditory systems at the level of the eighth nerve.

  4. Age-related Changes in Acoustic Modifications of Mandarin Maternal Speech to Preverbal Infants and Five-Year-Old Children: A Longitudinal Study

    PubMed Central

    Liu, Huei-Mei; Tsao, Feng-Ming; Kuhl, Patricia K.

    2010-01-01

    Acoustic-phonetic exaggeration of infant-directed speech (IDS) is well documented, but few studies address whether these features are modified with a child's age. Mandarin-speaking mothers were recorded while addressing an adult and their child at two ages (7-12 months and 5 years) to examine the acoustic-phonetic differences between IDS and child–directed speech (CDS). CDS exhibits an exaggeration pattern resembling that of IDS—expanded vowel space, longer vowels, higher pitch, and greater lexical tone differences—when compared to ADS. Longitudinal analysis demonstrated that the extent of acoustic exaggeration is significantly smaller in CDS than in IDS. Age-related changes in maternal speech provide some support for the hypothesis that mothers adjust their speech directed toward children as a function of the child's language ability. PMID:19232142
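
    The "expanded vowel space" comparison above is typically quantified as the area enclosed by vowel tokens in the F1-F2 plane; a small sketch of one common way to compute it (convex-hull area) follows, with the formant values as placeholder inputs. This is a generic illustration, not necessarily the measure used in the study.

    ```python
    # Sketch: vowel space area as the convex-hull area of (F2, F1) vowel tokens,
    # one common way to quantify an "expanded vowel space" in IDS vs ADS.
    import numpy as np
    from scipy.spatial import ConvexHull

    def vowel_space_area(f1_hz, f2_hz):
        points = np.column_stack([f2_hz, f1_hz])
        hull = ConvexHull(points)
        return hull.volume    # for 2-D points, .volume is the enclosed area (Hz^2)

    # Example: compare vowel_space_area(...) for infant-directed vs adult-directed tokens.
    ```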

  5. Experimental investigation of the effects of the acoustical conditions in a simulated classroom on speech recognition and learning in children a

    PubMed Central

    Valente, Daniel L.; Plevinsky, Hallie M.; Franco, John M.; Heinrichs-Graham, Elizabeth C.; Lewis, Dawna E.

    2012-01-01

    The potential effects of acoustical environment on speech understanding are especially important as children enter school where students’ ability to hear and understand complex verbal information is critical to learning. However, this ability is compromised because of widely varied and unfavorable classroom acoustics. The extent to which unfavorable classroom acoustics affect children’s performance on longer learning tasks is largely unknown as most research has focused on testing children using words, syllables, or sentences as stimuli. In the current study, a simulated classroom environment was used to measure comprehension performance of two classroom learning activities: a discussion and lecture. Comprehension performance was measured for groups of elementary-aged students in one of four environments with varied reverberation times and background noise levels. The reverberation time was either 0.6 or 1.5 s, and the signal-to-noise level was either +10 or +7 dB. Performance is compared to adult subjects as well as to sentence-recognition in the same condition. Significant differences were seen in comprehension scores as a function of age and condition; both increasing background noise and reverberation degraded performance in comprehension tasks compared to minimal differences in measures of sentence-recognition. PMID:22280587

  6. Modern Techniques in Acoustical Signal and Image Processing

    SciTech Connect

    Candy, J V

    2002-04-04

    Acoustical signal processing problems can lead to some complex and intricate techniques to extract the desired information from noisy, sometimes inadequate, measurements. The challenge is to formulate a meaningful strategy that is aimed at performing the processing required even in the face of uncertainties. This strategy can be as simple as a transformation of the measured data to another domain for analysis or as complex as embedding a full-scale propagation model into the processor. The aims of both approaches are the same--to extract the desired information and reject the extraneous, that is, develop a signal processing scheme to achieve this goal. In this paper, we briefly discuss this underlying philosophy from a "bottom-up" approach, enabling the problem to dictate the solution rather than vice versa.

  7. Study on demodulated signal distribution and acoustic pressure phase sensitivity of a self-interfered distributed acoustic sensing system

    NASA Astrophysics Data System (ADS)

    Shang, Ying; Yang, Yuan-Hong; Wang, Chen; Liu, Xiao-Hui; Wang, Chang; Peng, Gang-Ding

    2016-06-01

    We propose a demodulated signal distribution theory for a self-interfered distributed acoustic sensing system. The distribution region of Rayleigh backscattering including the acoustic sensing signal in the sensing fiber is investigated theoretically under different combinations of the path difference and pulse width. Additionally, we determine the optimal combination of path difference and pulse width to obtain the maximum phase change per unit length. We experimentally test this theory and realize a good acoustic pressure phase sensitivity of -150 dB re rad/(μPa·m) of fiber in the frequency range from 200 Hz to 1 kHz.

  8. Signal Restoration of Non-stationary Acoustic Signals in the Time Domain

    NASA Technical Reports Server (NTRS)

    Babkin, Alexander S.

    1988-01-01

    Signal restoration is a method of transforming a nonstationary signal acquired by a ground-based microphone into an equivalent stationary signal. The benefit of signal restoration is a simplification of the flight test requirements, because it can dispense with the need to acquire acoustic data with another aircraft flying in concert with the rotorcraft. The data quality is also generally improved because contamination of the signal by propeller and wind noise is not present. The restoration methodology can also be combined with other data acquisition methods, such as a multiple linear microphone array, for further improvement of the test results. The methodology and software are presented for performing the signal restoration in the time domain. The method has no restrictions on flight path geometry or flight regimes. The only requirement is that the aircraft spatial position be known relative to the microphone location and synchronized with the acoustic data. The restoration process assumes that the moving source radiates a stationary signal, which is then transformed into a nonstationary signal by various modulation processes. The restoration contains only the modulation due to the source motion.
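
    One compact reading of the time-domain idea, offered only as a hedged sketch: for each source-time sample, compute the propagation delay from the known aircraft position to the microphone and interpolate the microphone recording at the corresponding reception time, undoing the motion-induced modulation. The geometry inputs are placeholders; amplitude (spreading) corrections and the report's specific modulation model are not reproduced.

    ```python
    # Sketch of time-domain restoration: resample the ground microphone signal at
    # the reception times implied by the known, time-varying source-to-microphone
    # distance, removing the motion-induced (Doppler-like) modulation.
    import numpy as np

    def restore(mic_signal, fs, source_positions, mic_position, c=343.0):
        """source_positions: (N, 3) aircraft positions sampled at fs, synchronized
        with mic_signal; mic_position: (3,). Returns the de-modulated signal."""
        t_emit = np.arange(len(source_positions)) / fs
        ranges = np.linalg.norm(source_positions - mic_position, axis=1)
        t_receive = t_emit + ranges / c                   # when each emission arrives
        t_mic = np.arange(len(mic_signal)) / fs
        return np.interp(t_receive, t_mic, mic_signal)    # sample at arrival times
    ```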

  9. Language-Specific Developmental Differences in Speech Production: A Cross-Language Acoustic Study

    ERIC Educational Resources Information Center

    Li, Fangfang

    2012-01-01

    Speech productions of 40 English- and 40 Japanese-speaking children (aged 2-5) were examined and compared with the speech produced by 20 adult speakers (10 speakers per language). Participants were recorded while repeating words that began with "s" and "sh" sounds. Clear language-specific patterns in adults' speech were found, with English…

  10. Joint spatial-spectral feature space clustering for speech activity detection from ECoG signals.

    PubMed

    Kanas, Vasileios G; Mporas, Iosif; Benz, Heather L; Sgarbas, Kyriakos N; Bezerianos, Anastasios; Crone, Nathan E

    2014-04-01

    Brain-machine interfaces for speech restoration have been extensively studied for more than two decades. The success of such a system will depend in part on selecting the best brain recording sites and signal features corresponding to speech production. The purpose of this study was to detect speech activity automatically from electrocorticographic signals based on joint spatial-frequency clustering of the ECoG feature space. For this study, the ECoG signals were recorded while a subject performed two different syllable repetition tasks. We found that the optimal frequency resolution to detect speech activity from ECoG signals was 8 Hz, achieving 98.8% accuracy by employing support vector machines as a classifier. We also defined the cortical areas that held the most information about the discrimination of speech and nonspeech time intervals. Additionally, the results shed light on the distinct cortical areas associated with the two syllables repetition tasks and may contribute to the development of portable ECoG-based communication.
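
    A rough sketch of the classification pipeline described above, assuming band-power features at an 8 Hz frequency resolution and a support vector machine classifier; the electrode selection, the spatial-spectral clustering step, and all parameter values are placeholders rather than the study's settings.

    ```python
    # Sketch: band-power features (8 Hz bins) per ECoG channel + SVM classifier
    # for speech vs. non-speech intervals. Parameters are illustrative only.
    import numpy as np
    from scipy.signal import welch
    from sklearn.svm import SVC

    def band_power_features(epoch, fs, bin_hz=8.0, f_max=200.0):
        """epoch: (n_channels, n_samples). Returns a flat feature vector."""
        feats = []
        for ch in epoch:
            f, pxx = welch(ch, fs=fs, nperseg=int(fs))    # ~1 Hz native resolution
            edges = np.arange(0.0, f_max + bin_hz, bin_hz)
            for lo, hi in zip(edges[:-1], edges[1:]):
                feats.append(np.log(np.mean(pxx[(f >= lo) & (f < hi)]) + 1e-20))
        return np.asarray(feats)

    def train_detector(epochs, labels, fs):
        X = np.vstack([band_power_features(e, fs) for e in epochs])
        return SVC(kernel="rbf", C=1.0).fit(X, np.asarray(labels))
    ```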

  11. Articulatory Mediation of Speech Perception: A Causal Analysis of Multi-Modal Imaging Data

    ERIC Educational Resources Information Center

    Gow, David W., Jr.; Segawa, Jennifer A.

    2009-01-01

    The inherent confound between the organization of articulation and the acoustic-phonetic structure of the speech signal makes it exceptionally difficult to evaluate the competing claims of motor and acoustic-phonetic accounts of how listeners recognize coarticulated speech. Here we use Granger causation analysis of high spatiotemporal resolution…

  12. Adaptive Plasticity in Wild Field Cricket’s Acoustic Signaling

    PubMed Central

    Bertram, Susan M.; Harrison, Sarah J.; Thomson, Ian R.; Fitzsimmons, Lauren P.

    2013-01-01

    Phenotypic plasticity can be adaptive when phenotypes are closely matched to changes in the environment. In crickets, rhythmic fluctuations in the biotic and abiotic environment regularly result in diel rhythms in density of sexually active individuals. Given that density strongly influences the intensity of sexual selection, we asked whether crickets exhibit plasticity in signaling behavior that aligns with these rhythmic fluctuations in the socio-sexual environment. We quantified the acoustic mate signaling behavior of wild-caught males of two cricket species, Gryllus veletis and G. pennsylvanicus. Crickets exhibited phenotypically plastic mate signaling behavior, with most males signaling more often and more attractively during the times of day when mating activity is highest in the wild. Most male G. pennsylvanicus chirped more often and louder, with shorter interpulse durations, pulse periods, chirp durations, and interchirp durations, and at slightly higher carrier frequencies during the time of the day that mating activity is highest in the wild. Similarly, most male G. veletis chirped more often, with more pulses per chirp, longer interpulse durations, pulse periods, and chirp durations, shorter interchirp durations, and at lower carrier frequencies during the time of peak mating activity in the wild. Among-male variation in signaling plasticity was high, with some males signaling in an apparently maladaptive manner. Body size explained some of the among-male variation in G. pennsylvanicus plasticity but not G. veletis plasticity. Overall, our findings suggest that crickets exhibit phenotypically plastic mate attraction signals that closely match the fluctuating socio-sexual context they experience. PMID:23935965

  13. Interactions between distal speech rate, linguistic knowledge, and speech environment.

    PubMed

    Morrill, Tuuli; Baese-Berk, Melissa; Heffner, Christopher; Dilley, Laura

    2015-10-01

    During lexical access, listeners use both signal-based and knowledge-based cues, and information from the linguistic context can affect the perception of acoustic speech information. Recent findings suggest that the various cues used in lexical access are implemented with flexibility and may be affected by information from the larger speech context. We conducted 2 experiments to examine effects of a signal-based cue (distal speech rate) and a knowledge-based cue (linguistic structure) on lexical perception. In Experiment 1, we manipulated distal speech rate in utterances where an acoustically ambiguous critical word was either obligatory for the utterance to be syntactically well formed (e.g., Conner knew that bread and butter (are) both in the pantry) or optional (e.g., Don must see the harbor (or) boats). In Experiment 2, we examined identical target utterances as in Experiment 1 but changed the distribution of linguistic structures in the fillers. The results of the 2 experiments demonstrate that speech rate and linguistic knowledge about critical word obligatoriness can both influence speech perception. In addition, it is possible to alter the strength of a signal-based cue by changing information in the speech environment. These results provide support for models of word segmentation that include flexible weighting of signal-based and knowledge-based cues.

  14. Cognitive Bias for Learning Speech Sounds From a Continuous Signal Space Seems Nonlinguistic

    PubMed Central

    de Boer, Bart

    2015-01-01

    When learning language, humans have a tendency to produce more extreme distributions of speech sounds than those observed most frequently: In rapid, casual speech, vowel sounds are centralized, yet cross-linguistically, peripheral vowels occur almost universally. We investigate whether adults’ generalization behavior reveals selective pressure for communication when they learn skewed distributions of speech-like sounds from a continuous signal space. The domain-specific hypothesis predicts that the emergence of sound categories is driven by a cognitive bias to make these categories maximally distinct, resulting in more skewed distributions in participants’ reproductions. However, our participants showed more centered distributions, which goes against this hypothesis, indicating that there are no strong innate linguistic biases that affect learning these speech-like sounds. The centralization behavior can be explained by a lack of communicative pressure to maintain categories. PMID:27648212

  15. Cognitive Bias for Learning Speech Sounds From a Continuous Signal Space Seems Nonlinguistic

    PubMed Central

    de Boer, Bart

    2015-01-01

    When learning language, humans have a tendency to produce more extreme distributions of speech sounds than those observed most frequently: In rapid, casual speech, vowel sounds are centralized, yet cross-linguistically, peripheral vowels occur almost universally. We investigate whether adults’ generalization behavior reveals selective pressure for communication when they learn skewed distributions of speech-like sounds from a continuous signal space. The domain-specific hypothesis predicts that the emergence of sound categories is driven by a cognitive bias to make these categories maximally distinct, resulting in more skewed distributions in participants’ reproductions. However, our participants showed more centered distributions, which goes against this hypothesis, indicating that there are no strong innate linguistic biases that affect learning these speech-like sounds. The centralization behavior can be explained by a lack of communicative pressure to maintain categories.

  16. Cognitive Bias for Learning Speech Sounds From a Continuous Signal Space Seems Nonlinguistic.

    PubMed

    van der Ham, Sabine; de Boer, Bart

    2015-10-01

    When learning language, humans have a tendency to produce more extreme distributions of speech sounds than those observed most frequently: In rapid, casual speech, vowel sounds are centralized, yet cross-linguistically, peripheral vowels occur almost universally. We investigate whether adults' generalization behavior reveals selective pressure for communication when they learn skewed distributions of speech-like sounds from a continuous signal space. The domain-specific hypothesis predicts that the emergence of sound categories is driven by a cognitive bias to make these categories maximally distinct, resulting in more skewed distributions in participants' reproductions. However, our participants showed more centered distributions, which goes against this hypothesis, indicating that there are no strong innate linguistic biases that affect learning these speech-like sounds. The centralization behavior can be explained by a lack of communicative pressure to maintain categories. PMID:27648212

  17. Signal processing and tracking of arrivals in ocean acoustic tomography.

    PubMed

    Dzieciuch, Matthew A

    2014-11-01

    The signal processing for ocean acoustic tomography experiments has been improved to account for the scattering of the individual arrivals. The scattering reduces signal coherence over time, bandwidth, and space. In the typical experiment, scattering is caused by the random internal-wave field and results in pulse spreading (over arrival-time and arrival-angle) and wander. The estimator-correlator is an effective procedure that improves the signal-to-noise ratio of travel-time estimates and also provides an estimate of signal coherence. The estimator-correlator smoothes the arrival pulse at the expense of resolution. After an arrival pulse has been measured, it must be associated with a model arrival, typically a ray arrival. For experiments with thousands of transmissions, this is a tedious task that is error-prone when done manually. An error metric that accounts for peak amplitude as well as travel-time and arrival-angle can be defined. The Viterbi algorithm can then be adapted to the task of automated peak tracking. Repeatable, consistent results are produced that are superior to a manual tracking procedure. The tracking can be adjusted by tuning the error metric in a logical, quantifiable manner. PMID:25373953
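
    A minimal sketch of the kind of dynamic-programming (Viterbi-style) peak tracking described above, assuming a simple quadratic penalty on travel-time jumps between transmissions; the actual error metric in the paper also weights peak amplitude and arrival angle, which are omitted here.

```python
import numpy as np

def track_arrivals(peaks, jump_penalty=1.0):
    """
    Viterbi-style tracking of one arrival across transmissions.
    peaks[t] is an array of candidate travel times at transmission t; the cost of
    moving between candidates is a quadratic penalty on the travel-time jump.
    Returns the index of the chosen peak at each transmission.
    """
    T = len(peaks)
    cost = [np.zeros(len(peaks[0]))]
    back = []
    for t in range(1, T):
        step = (peaks[t][:, None] - peaks[t - 1][None, :]) ** 2 * jump_penalty
        total = cost[-1][None, :] + step            # shape (n_t, n_{t-1})
        back.append(total.argmin(axis=1))
        cost.append(total.min(axis=1))
    # Backtrace the cheapest path from the last transmission.
    path = [int(cost[-1].argmin())]
    for t in range(T - 2, -1, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy example: a slowly wandering arrival plus spurious candidates.
peaks = [np.array([10.0, 12.5]), np.array([10.1, 14.0]), np.array([9.9, 10.2, 13.0])]
print(track_arrivals(peaks))    # expected to follow the ~10 s arrival
```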

  18. Signal processing and tracking of arrivals in ocean acoustic tomography.

    PubMed

    Dzieciuch, Matthew A

    2014-11-01

    The signal processing for ocean acoustic tomography experiments has been improved to account for the scattering of the individual arrivals. The scattering reduces signal coherence over time, bandwidth, and space. In the typical experiment, scattering is caused by the random internal-wave field and results in pulse spreading (over arrival-time and arrival-angle) and wander. The estimator-correlator is an effective procedure that improves the signal-to-noise ratio of travel-time estimates and also provides an estimate of signal coherence. The estimator-correlator smoothes the arrival pulse at the expense of resolution. After an arrival pulse has been measured, it must be associated with a model arrival, typically a ray arrival. For experiments with thousands of transmissions, this is a tedious task that is error-prone when done manually. An error metric that accounts for peak amplitude as well as travel-time and arrival-angle can be defined. The Viterbi algorithm can then be adapted to the task of automated peak tracking. Repeatable, consistent results are produced that are superior to a manual tracking procedure. The tracking can be adjusted by tuning the error metric in a logical, quantifiable manner.

  19. Extended amplification of acoustic signals by amphibian burrows.

    PubMed

    Muñoz, Matías I; Penna, Mario

    2016-07-01

    Animals relying on acoustic signals for communication must cope with the constraints imposed by the environment for sound propagation. A resource to improve signal broadcast is the use of structures that favor the emission or the reception of sounds. We conducted playback experiments to assess the effect of the burrows occupied by the frogs Eupsophus emiliopugini and E. calcaratus on the amplitude of outgoing vocalizations. In addition, we evaluated the influence of these cavities on the reception of externally generated sounds potentially interfering with conspecific communication, namely, the vocalizations emitted by four syntopic species of anurans (E. emiliopugini, E. calcaratus, Batrachyla antartandica, and Pleurodema thaul) and the nocturnal owls Strix rufipes and Glaucidium nanum. Eupsophus advertisement calls emitted from within the burrows experienced average amplitude gains of 3-6 dB at 100 cm from the burrow openings. Likewise, the incoming vocalizations of amphibians and birds were amplified on average above 6 dB inside the cavities. The amplification of internally broadcast Eupsophus vocalizations favors signal detection by nearby conspecifics. Reciprocally, the amplification of incoming conspecific and heterospecific signals facilitates the detection of neighboring males and the monitoring of the levels of potentially interfering biotic noise by resident frogs, respectively. PMID:27209276

  20. Applications of sub-audible speech recognition based upon electromyographic signals

    NASA Technical Reports Server (NTRS)

    Jorgensen, C. Charles (Inventor); Betts, Bradley J. (Inventor)

    2009-01-01

    Method and system for generating electromyographic or sub-audible signals ("SAWPs") and for transmitting and recognizing the SAWPs that represent the original words and/or phrases. The SAWPs may be generated in an environment that interferes excessively with normal speech or that requires stealth communications, and may be transmitted using encoded, enciphered or otherwise transformed signals that are less subject to signal distortion or degradation in the ambient environment.

  1. Acoustic and perceptual evaluation of category goodness of /t/ and /k/ in typical and misarticulated children's speech.

    PubMed

    Strömbergsson, Sofia; Salvi, Giampiero; House, David

    2015-06-01

    This investigation explores perceptual and acoustic characteristics of children's successful and unsuccessful productions of /t/ and /k/, with a specific aim of exploring perceptual sensitivity to phonetic detail, and the extent to which this sensitivity is reflected in the acoustic domain. Recordings were collected from 4- to 8-year-old children with a speech sound disorder (SSD) who misarticulated one of the target plosives, and compared to productions recorded from peers with typical speech development (TD). Perceptual responses were registered on a visual-analog scale ranging from "clear [t]" to "clear [k]." Statistical models of prototypical productions were built, based on spectral moments and discrete cosine transform features, and used in the scoring of SSD productions. In the perceptual evaluation, "clear substitutions" were rated as less prototypical than correct productions. Moreover, target-appropriate productions of /t/ and /k/ produced by children with SSD were rated as less prototypical than those produced by TD peers. The acoustic modeling could, to a large extent, discriminate between the gross categories /t/ and /k/, and scored the SSD utterances on a continuous scale that was largely consistent with the category of production. However, none of the methods exhibited the same sensitivity to phonetic detail as the human listeners.

  2. Disordered speech disrupts conversational entrainment: a study of acoustic-prosodic entrainment and communicative success in populations with communication challenges.

    PubMed

    Borrie, Stephanie A; Lubold, Nichola; Pon-Barry, Heather

    2015-01-01

    Conversational entrainment, a pervasive communication phenomenon in which dialogue partners adapt their behaviors to align more closely with one another, is considered essential for successful spoken interaction. While well-established in other disciplines, this phenomenon has received limited attention in the field of speech pathology and the study of communication breakdowns in clinical populations. The current study examined acoustic-prosodic entrainment, as well as a measure of communicative success, in three distinctly different dialogue groups: (i) healthy native vs. healthy native speakers (Control), (ii) healthy native vs. foreign-accented speakers (Accented), and (iii) healthy native vs. dysarthric speakers (Disordered). Dialogue group comparisons revealed significant differences in how the groups entrain on particular acoustic-prosodic features, including pitch, intensity, and jitter. Most notably, the Disordered dialogues were characterized by significantly less acoustic-prosodic entrainment than the Control dialogues. Further, a positive relationship between entrainment indices and communicative success was identified. These results suggest that the study of conversational entrainment in speech pathology will have essential implications for both scientific theory and clinical application in this domain.

  3. Acoustic Emission Signals in Thin Plates Produced by Impact Damage

    NASA Technical Reports Server (NTRS)

    Prosser, William H.; Gorman, Michael R.; Humes, Donald H.

    1999-01-01

    Acoustic emission (AE) signals created by impact sources in thin aluminum and graphite/epoxy composite plates were analyzed. Two different impact velocity regimes were studied. Low-velocity (less than 0.21 km/s) impacts were created with an airgun firing spherical steel projectiles (4.5 mm diameter). High-velocity (1.8 to 7 km/s) impacts were generated with a two-stage light-gas gun firing small cylindrical nylon projectiles (1.5 mm diameter). Both the impact velocity and impact angle were varied. The impacts did not penetrate the aluminum plates at either low or high velocities. For high-velocity impacts in composites, there were both impacts that fully penetrated the plate as well as impacts that did not. All impacts generated very large amplitude AE signals (1-5 V at the sensor), which propagated as plate (extensional and/or flexural) modes. In the low-velocity impact studies, the signal was dominated by a large flexural mode with only a small extensional mode component detected. As the impact velocity was increased within the low velocity regime, the overall amplitudes of both the extensional and flexural modes increased. In addition, a relative increase in the amplitude of high-frequency components of the flexural mode was also observed. Signals caused by high-velocity impacts that did not penetrate the plate contained both a large extensional and flexural mode component of comparable amplitudes. The signals also contained components of much higher frequency and were easily differentiated from those caused by low-velocity impacts. An interesting phenomenon was observed in that the large flexural mode component, seen in every other case, was absent from the signal when the impact particle fully penetrated through the composite plates.

  4. Spectral models of additive and modulation noise in speech and phonatory excitation signals

    NASA Astrophysics Data System (ADS)

    Schoentgen, Jean

    2003-01-01

    The article presents spectral models of additive and modulation noise in speech. The purpose is to learn about the causes of noise in the spectra of normal and disordered voices and to gauge whether the spectral properties of the perturbations of the phonatory excitation signal can be inferred from the spectral properties of the speech signal. The approach to modeling consists of deducing the Fourier series of the perturbed speech, assuming that the Fourier series of the noise and of the clean monocycle-periodic excitation are known. The models explain published data, take into account the effects of supraglottal tremor, demonstrate the modulation distortion owing to vocal tract filtering, establish conditions under which noise cues of different speech signals may be compared, and predict the impossibility of inferring the spectral properties of the frequency modulating noise from the spectral properties of the frequency modulation noise (e.g., phonatory jitter and frequency tremor). The general conclusion is that only phonatory frequency modulation noise is spectrally relevant. Other types of noise in speech are either epiphenomenal, or their spectral effects are masked by the spectral effects of frequency modulation noise.

  5. Audiovisual Speech Perception in Children with Developmental Language Disorder in Degraded Listening Conditions

    ERIC Educational Resources Information Center

    Meronen, Auli; Tiippana, Kaisa; Westerholm, Jari; Ahonen, Timo

    2013-01-01

    Purpose: The effect of the signal-to-noise ratio (SNR) on the perception of audiovisual speech in children with and without developmental language disorder (DLD) was investigated by varying the noise level and the sound intensity of acoustic speech. The main hypotheses were that the McGurk effect (in which incongruent visual speech alters the…

  6. Temporal coherence of acoustic signals in a fluctuating ocean.

    PubMed

    Voronovich, Alexander G; Ostashev, Vladimir E; Colosi, John A

    2011-06-01

    Temporal coherence of acoustic signals propagating in a fluctuating ocean is important for many practical applications and has been studied intensively experimentally. However, only a few theoretical formulations of temporal coherence exist. In the present paper, a three-dimensional (3D) modal theory of sound propagation in a fluctuating ocean is used to derive closed-form equations for the spatial-temporal coherence function of a broadband signal. The theory is applied to the analysis of the temporal coherence of a monochromatic signal propagating in an ocean perturbed by linear internal waves obeying the Garrett-Munk (G-M) spectral model. In particular, the temporal coherence function is calculated for propagation ranges up to 10^4 km and for five sound frequencies: 12, 25, 50, 75, and 100 Hz. Then, the dependence of the coherence time (i.e., the value of the time lag at which the temporal coherence decreases by a factor of e) on range and frequency is studied. The results obtained are compared with experimental data and predictions of the path-integral theory.
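
    The coherence-time definition used above (the lag at which temporal coherence falls by a factor of e) can be illustrated with a short sketch that estimates it from the normalized autocorrelation of a complex baseband signal; the phase-drifting test signal below is synthetic and only stands in for a received tone.

```python
import numpy as np

def coherence_time(x, fs):
    """Lag at which the normalized temporal coherence |<x(t) x*(t+tau)>| first
    drops below 1/e, for a complex signal sampled at rate fs."""
    n = len(x)
    acf = np.correlate(x, x, mode="full")[n - 1:]   # numpy conjugates the second argument
    coh = np.abs(acf) / np.abs(acf[0])
    below = np.nonzero(coh < 1.0 / np.e)[0]
    return below[0] / fs if below.size else np.inf

# Toy example: a unit-amplitude signal with a randomly drifting phase has a finite coherence time.
rng = np.random.default_rng(1)
fs = 100.0
phase = np.cumsum(0.05 * rng.standard_normal(10_000))
x = np.exp(1j * phase)
print(f"coherence time ~ {coherence_time(x, fs):.1f} s")
```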

  7. Explosive activity at Mt. Yasur volcano: characterization of acoustic signals

    NASA Astrophysics Data System (ADS)

    Spina, L.; Taddeucci, J.; Scarlato, P.; Freda, C.; Gresta, S.

    2012-04-01

    Mt. Yasur (Vanuatu Islands) is an active volcano characterized by persistent Strombolian to mild Vulcanian explosive activity, well known to generate a broad variety of air pressure waves. Between 9 and 12 July 2011, we recorded explosive activity from the three active vents of Mt. Yasur by means of a multiparametric station, comprising thermal and visual high-speed cameras and two ECM microphones recording both infrasonic and sonic signals at 10 kHz sampling frequency. A total of 106 major acoustic events, lasting on average 5 seconds (up to 20 s in some ash-rich explosions), correspond to visually recorded explosions at the vents and exhibit a surprisingly broad waveform variability. Major events are interspersed with minor transients with strongly repetitive waveforms typical of puffing activity. Spectral analyses were computed on both major events and whole traces. Analysis of major events, carried out using a 5.12 s long window, reveals peak frequencies mostly below 5 Hz, with only a few events displaying notable energy content in the sonic band (up to ca. 100 Hz). Peak-to-peak amplitudes as well as RMS values (evaluated from event start to end) were computed on both raw and filtered (above and below 20 Hz) signals. Spectrograms of the whole traces, computed using 1.28, 2.56, and 5.12 s long windows with 50% overlap, clearly outline the frequency content of major events and the occurrence of puffing events. We also evaluated the peak frequency of each spectrum of the spectrogram, in order to detect spectral variation of the puffing signal. Considering their great variability, we classified the major events on the basis of their spectral content rather than waveform, grouping together all events having similar spectra by cross-correlating them. Three spectral families cover most of the dataset, as follows: 1) variable and irregularly shaped spectra, with energy mainly below 4 Hz; 2) monochromatic events, with simple spectra corresponding in the time domain to
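
    A small sketch of the spectrogram analysis described above: 50%-overlapping windows and extraction of the peak frequency of each frame. The 10 kHz sampling rate and 5.12 s window follow the abstract, but the signal below is synthetic.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 10_000                      # 10 kHz sampling, as in the abstract
win = int(5.12 * fs)             # 5.12 s analysis window
rng = np.random.default_rng(2)

# Synthetic stand-in for an acoustic trace: intermittent low-frequency events in broadband noise.
t = np.arange(60 * fs) / fs
x = rng.standard_normal(t.size) + 5 * np.sin(2 * np.pi * 2.5 * t) * (t % 10 < 5)

f, frame_times, Sxx = spectrogram(x, fs=fs, nperseg=win, noverlap=win // 2)
peak_freq = f[Sxx.argmax(axis=0)]          # peak frequency of each frame
print(peak_freq[:5], "Hz")
```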

  8. Acoustic emission source localization based on distance domain signal representation

    NASA Astrophysics Data System (ADS)

    Gawronski, M.; Grabowski, K.; Russek, P.; Staszewski, W. J.; Uhl, T.; Packo, P.

    2016-04-01

    Acoustic emission is a vital non-destructive testing technique and is widely used in industry for damage detection, localisation and characterization. The latter two aspects are particularly challenging, as AE data are typically noisy. Moreover, elastic waves generated by an AE event propagate through a structural path and are significantly distorted. This effect is particularly prominent for thin elastic plates, in which dispersion results in severe localisation and characterization issues. Traditional Time Difference of Arrival localisation methods typically fail when signals are highly dispersive. Hence, algorithms capable of dispersion compensation are sought. This paper presents a method based on the Time-Distance Domain Transform for accurate AE event localisation. The source location is found by solving a minimization problem. The proposed technique transforms the time signal into the distance-domain response that would be recorded at the source. Only basic elastic material properties and the plate thickness are used in the approach, avoiding arbitrary parameter tuning.
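
    The minimization step can be illustrated with a simplified arrival-time-difference formulation (not the paper's distance-domain transform): given sensor positions, an assumed wave speed, and measured arrival times, the source position is found by minimizing a least-squares misfit. All positions, speeds, and noise levels below are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical sensor layout (metres) and an assumed plate wave speed.
sensors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
c = 5000.0                                   # m/s, assumed group velocity

def misfit(p, arrival_times):
    """Sum of squared differences between measured and predicted arrival-time differences."""
    d = np.linalg.norm(sensors - p, axis=1) / c
    return np.sum(((arrival_times - arrival_times[0]) - (d - d[0])) ** 2)

# Simulate a source, add timing noise, then recover the source location.
true_src = np.array([0.3, 0.7])
times = np.linalg.norm(sensors - true_src, axis=1) / c
times += np.random.default_rng(3).normal(0, 1e-6, size=times.size)

res = minimize(misfit, x0=np.array([0.5, 0.5]), args=(times,), method="Nelder-Mead")
print("estimated source position:", res.x)
```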

  9. Designing acoustics for linguistically diverse classrooms: Effects of background noise, reverberation and talker foreign accent on speech comprehension by native and non-native English-speaking listeners

    NASA Astrophysics Data System (ADS)

    Peng, Zhao Ellen

    The current classroom acoustics standard (ANSI S12.60-2010) recommends that core learning spaces not exceed a background noise level (BNL) of 35 dBA and a reverberation time (RT) of 0.6 seconds, based on speech intelligibility performance mainly by the native English-speaking population. Existing literature has not correlated these recommended values well with student learning outcomes. With a growing population of non-native English speakers in American classrooms, the special needs for perceiving degraded speech among non-native listeners, whether due to realistic room acoustics or talker foreign accent, have not been addressed in the current standard. This research seeks to investigate the effects of BNL and RT on the comprehension of English speech from native English and native Mandarin Chinese talkers as perceived by native and non-native English listeners, and to provide acoustic design guidelines to supplement the existing standard. This dissertation presents two studies on the effects of RT and BNL on more realistic classroom learning experiences. How do native and non-native English-speaking listeners perform on speech comprehension tasks under adverse acoustic conditions, if the English speech is produced by talkers of native English (Study 1) versus native Mandarin Chinese (Study 2)? Speech comprehension materials were played back in a listening chamber to individual listeners: native and non-native English-speaking in Study 1; native English, native Mandarin Chinese, and other non-native English-speaking in Study 2. Each listener was screened for baseline English proficiency level, and completed dual tasks simultaneously involving speech comprehension and adaptive dot-tracing under 15 acoustic conditions, comprising three BNL conditions (RC-30, 40, and 50) and five RT scenarios (0.4 to 1.2 seconds). The results show that BNL and RT negatively affect both objective performance and subjective perception of speech comprehension, more severely for non

  10. Acoustic Analysis of Clear Versus Conversational Speech in Individuals with Parkinson Disease

    ERIC Educational Resources Information Center

    Goberman, A.M.; Elmer, L.W.

    2005-01-01

    A number of studies have been devoted to the examination of clear versus conversational speech in non-impaired speakers. The purpose of these previous studies has been primarily to help increase speech intelligibility for the benefit of hearing-impaired listeners. The goal of the present study was to examine differences between conversational and…

  11. A Computational Model of Word Segmentation from Continuous Speech Using Transitional Probabilities of Atomic Acoustic Events

    ERIC Educational Resources Information Center

    Rasanen, Okko

    2011-01-01

    Word segmentation from continuous speech is a difficult task that is faced by human infants when they start to learn their native language. Several studies indicate that infants might use several different cues to solve this problem, including intonation, linguistic stress, and transitional probabilities between subsequent speech sounds. In this…

  12. Acoustic Analysis of the Speech of Children with Cochlear Implants: A Longitudinal Study

    ERIC Educational Resources Information Center

    Liker, Marko; Mildner, Vesna; Sindija, Branka

    2007-01-01

    The aim of the study was to analyse the speech of the children with cochlear implants, and compare it with the speech of hearing controls. We focused on three categories of Croatian sounds: vowels (F1 and F2 frequencies), fricatives (noise frequencies of /s/ and /[esh]/ ), and affricates (total duration and the pattern of stop-fricative components…

  13. Prosodic Contrasts in Ironic Speech

    ERIC Educational Resources Information Center

    Bryant, Gregory A.

    2010-01-01

    Prosodic features in spontaneous speech help disambiguate implied meaning not explicit in linguistic surface structure, but little research has examined how these signals manifest themselves in real conversations. Spontaneously produced verbal irony utterances generated between familiar speakers in conversational dyads were acoustically analyzed…

  14. A Fibre Bragg Grating Sensor as a Receiver for Acoustic Communications Signals

    PubMed Central

    Wild, Graham; Hinckley, Steven

    2011-01-01

    A Fibre Bragg Grating (FBG) acoustic sensor is used as a receiver for acoustic communications signals. Acoustic transmissions were generated in aluminium and Carbon Fibre Composite (CFC) panels. The FBG receiver was coupled to the bottom surface opposite a piezoelectric transmitter. For the CFC, a second FBG was embedded within the layup for comparison. We show the transfer function, frequency response, and transient response of the acoustic communications channels. In addition, the FBG receiver was used to detect Phase Shift Keying (PSK) communications signals, which was shown to be the most robust method in a highly resonant communications channel. PMID:22346585

  15. Signal processing for passive detection and classification of underwater acoustic signals

    NASA Astrophysics Data System (ADS)

    Chung, Kil Woo

    2011-12-01

    This dissertation examines signal processing for passive detection, classification and tracking of underwater acoustic signals for improving port security and the security of coastal and offshore operations. First, we consider the problem of passive acoustic detection of a diver in a shallow water environment. A frequency-domain multi-band matched-filter approach to swimmer detection is presented. The idea is to break the frequency content of the hydrophone signals into multiple narrow frequency bands, followed by a time-averaged (about half a second) energy calculation over each band. Then, spectra composed of such energy samples over the chosen frequency bands are correlated to form a decision variable. The frequency bands with the highest signal-to-noise ratio are used for detection. The performance of the proposed approach is demonstrated for experimental data collected for a diver in the Hudson River. We also propose a new referenceless frequency-domain multi-band detector which, unlike other reference-based detectors, does not require a diver-specific signature. Instead, our detector matches a general feature of the diver spectrum in the high frequency range: the spectrum is roughly periodic in time and approximately flat when the diver exhales. The performance of the proposed approach is demonstrated using experimental data collected from the Hudson River. Moreover, we present detection, classification and tracking of small vessel signals. Hydroacoustic sensors can be applied for the detection of noise generated by vessels, and this noise can be used for vessel detection, classification and tracking. This dissertation presents recent improvements aimed at the measurement and separation of ship DEMON (Detection of Envelope Modulation on Noise) acoustic signatures in busy harbor conditions. Ship signature measurements were conducted in the Hudson River and NY Harbor. The DEMON spectra demonstrated much better temporal stability compared with the full ship
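
    A rough sketch of the multi-band energy approach described above: the hydrophone spectrum is split into narrow bands, band energies are averaged over roughly half a second, and consecutive band-energy spectra are correlated to form a decision variable. The band count, frequency range, STFT settings, and the noise-only test signal are placeholders, not the dissertation's parameters.

```python
import numpy as np
from scipy.signal import stft

def band_energy_series(x, fs, n_bands=32, fmax=4000.0, avg_time=0.5):
    """Time series of per-band energies, averaged over roughly avg_time seconds."""
    f, t, Z = stft(x, fs=fs, nperseg=1024)
    P = np.abs(Z) ** 2
    edges = np.linspace(0.0, fmax, n_bands + 1)
    bands = np.vstack([P[(f >= lo) & (f < hi)].sum(axis=0) for lo, hi in zip(edges, edges[1:])])
    hop = max(1, int(avg_time / (t[1] - t[0])))
    return np.array([bands[:, i:i + hop].mean(axis=1)
                     for i in range(0, bands.shape[1] - hop, hop)])

def detection_statistic(band_series):
    """Correlation of consecutive band-energy spectra: high when a repetitive source is present."""
    a, b = band_series[:-1], band_series[1:]
    a = (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)
    b = (b - b.mean(axis=1, keepdims=True)) / b.std(axis=1, keepdims=True)
    return (a * b).mean(axis=1)

fs = 16_000
rng = np.random.default_rng(4)
x = rng.standard_normal(30 * fs)                 # stand-in for a hydrophone record (noise only)
print(detection_statistic(band_energy_series(x, fs))[:5])
```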

  16. On-Line Acoustic and Semantic Interpretation of Talker Information

    ERIC Educational Resources Information Center

    Creel, Sarah C.; Tumlin, Melanie A.

    2011-01-01

    Recent work demonstrates that listeners utilize talker-specific information in the speech signal to inform real-time language processing. However, there are multiple representational levels at which this may take place. Listeners might use acoustic cues in the speech signal to access the talker's identity and information about what they tend to…

  17. Disordered speech disrupts conversational entrainment: a study of acoustic-prosodic entrainment and communicative success in populations with communication challenges

    PubMed Central

    Borrie, Stephanie A.; Lubold, Nichola; Pon-Barry, Heather

    2015-01-01

    Conversational entrainment, a pervasive communication phenomenon in which dialogue partners adapt their behaviors to align more closely with one another, is considered essential for successful spoken interaction. While well-established in other disciplines, this phenomenon has received limited attention in the field of speech pathology and the study of communication breakdowns in clinical populations. The current study examined acoustic-prosodic entrainment, as well as a measure of communicative success, in three distinctly different dialogue groups: (i) healthy native vs. healthy native speakers (Control), (ii) healthy native vs. foreign-accented speakers (Accented), and (iii) healthy native vs. dysarthric speakers (Disordered). Dialogue group comparisons revealed significant differences in how the groups entrain on particular acoustic–prosodic features, including pitch, intensity, and jitter. Most notably, the Disordered dialogues were characterized by significantly less acoustic-prosodic entrainment than the Control dialogues. Further, a positive relationship between entrainment indices and communicative success was identified. These results suggest that the study of conversational entrainment in speech pathology will have essential implications for both scientific theory and clinical application in this domain. PMID:26321996

  18. Integrated Spacesuit Audio System Enhances Speech Quality and Reduces Noise

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng Arden; Chen, Jingdong; Chen, Shaoyan Sharyl

    2009-01-01

    A new approach has been proposed for increasing astronaut comfort and speech capture. Currently, the special design of a spacesuit forms an extreme acoustic environment making it difficult to capture clear speech without compromising comfort. The proposed Integrated Spacesuit Audio (ISA) system is to incorporate the microphones into the helmet and use software to extract voice signals from background noise.

  19. Acoustic correlates of vowel intelligibility in clear and conversational speech for young normal-hearing and elderly hearing-impaired listeners.

    PubMed

    Ferguson, Sarah Hargus; Quené, Hugo

    2014-06-01

    The present investigation carried out acoustic analyses of vowels in clear and conversational speech produced by 41 talkers. Mixed-effects models were then deployed to examine relationships among acoustic and perceptual data for these vowels. Acoustic data include vowel duration, steady-state formant frequencies, and two measures of dynamic formant movement. Perceptual data consist of vowel intelligibility in noise for young normal-hearing and elderly hearing-impaired listeners, as reported by Ferguson in 2004 and 2012 [J. Acoust. Soc. Am. 116, 2365-2373 (2004); J. Speech Lang. Hear. Res. 55, 779-790 (2012)], respectively. Significant clear speech effects were observed for all acoustic metrics, although not all measures changed for all vowels and considerable talker variability was observed. Mixed-effects analyses revealed that the contribution of duration and steady-state formant information to vowel intelligibility differed for the two listener groups. This outcome is consistent with earlier research suggesting that hearing loss, and possibly aging, alters the way acoustic cues are used for identifying vowels.

  20. Effects of Semantic Context and Feedback on Perceptual Learning of Speech Processed through an Acoustic Simulation of a Cochlear Implant

    ERIC Educational Resources Information Center

    Loebach, Jeremy L.; Pisoni, David B.; Svirsky, Mario A.

    2010-01-01

    The effect of feedback and materials on perceptual learning was examined in listeners with normal hearing who were exposed to cochlear implant simulations. Generalization was most robust when feedback paired the spectrally degraded sentences with their written transcriptions, promoting mapping between the degraded signal and its acoustic-phonetic…

  1. Filtering of Acoustic Signals within the Hearing Organ

    PubMed Central

    Ramamoorthy, Sripriya; Chen, Fangyi; Jacques, Steven L.; Wang, Ruikang; Choudhury, Niloy; Fridberger, Anders

    2014-01-01

    The detection of sound by the mammalian hearing organ involves a complex mechanical interplay among different cell types. The inner hair cells, which are the primary sensory receptors, are stimulated by the structural vibrations of the entire organ of Corti. The outer hair cells are thought to modulate these sound-evoked vibrations to enhance hearing sensitivity and frequency resolution, but it remains unclear whether other structures also contribute to frequency tuning. In the current study, sound-evoked vibrations were measured at the stereociliary side of inner and outer hair cells and their surrounding supporting cells, using optical coherence tomography interferometry in living anesthetized guinea pigs. Our measurements demonstrate the presence of multiple vibration modes as well as significant differences in frequency tuning and response phase among different cell types. In particular, the frequency tuning at the inner hair cells differs from other cell types, causing the locus of maximum inner hair cell activation to be shifted toward the apex of the cochlea compared with the outer hair cells. These observations show that additional processing and filtering of acoustic signals occur within the organ of Corti before inner hair cell excitation, representing a departure from established theories. PMID:24990925

  2. Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language

    PubMed Central

    Narayanan, Shrikanth; Georgiou, Panayiotis G.

    2013-01-01

    The expression and experience of human behavior are complex and multimodal and characterized by individual and contextual heterogeneity and variability. Speech and spoken language communication cues offer an important means for measuring and modeling human behavior. Observational research and practice across a variety of domains from commerce to healthcare rely on speech- and language-based informatics for crucial assessment and diagnostic information and for planning and tracking response to an intervention. In this paper, we describe some of the opportunities as well as emerging methodologies and applications of human behavioral signal processing (BSP) technology and algorithms for quantitatively understanding and modeling typical, atypical, and distressed human behavior with a specific focus on speech- and language-based communicative, affective, and social behavior. We describe the three important BSP components of acquiring behavioral data in an ecologically valid manner across laboratory to real-world settings, extracting and analyzing behavioral cues from measured data, and developing models offering predictive and decision-making support. We highlight both the foundational speech and language processing building blocks as well as the novel processing and modeling opportunities. Using examples drawn from specific real-world applications ranging from literacy assessment and autism diagnostics to psychotherapy for addiction and marital well being, we illustrate behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion. PMID:24039277

  3. Analysis of speech signals' characteristics based on MF-DFA with moving overlapping windows

    NASA Astrophysics Data System (ADS)

    Zhao, Huan; He, Shaofang

    2016-01-01

    In this paper, multi-fractal characteristics of speech signals are analyzed based on MF-DFA. It is found that the multi-fractal features are strongly influenced by frame length and noise, and that they differ slightly across speech frames. Secondly, motivated by the use of framing and frame shift to ensure continuity and smooth transitions in speech signal processing, an advanced MF-DFA (MF-DFA with forward-moving overlapping windows) is proposed. The length of the moving overlapping windows is determined by a parameter θ. Given the value of the time scale s, we obtain MF-DFA with maximally overlapping windows when θ = 1/s and MF-DFA with half-overlapping windows when θ = 1/2. Moreover, when θ = 1 we recover standard MF-DFA. Numerical experiments and analysis illustrate that the multi-fractal characteristics based on AMF-DFA outperform those of MF-DFA and MF-DMA in stability, noise immunity and discrimination.
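
    A simplified sketch of the fluctuation function in MF-DFA with a window step controlled by θ, as described above (θ = 1 reproducing the standard non-overlapping segmentation); this is not the authors' implementation, and the detrending order and test signal are arbitrary choices.

```python
import numpy as np

def fluctuation(x, s, q=2.0, theta=1.0, order=1):
    """
    Simplified MF-DFA fluctuation F_q(s) with window step = max(1, round(theta * s)).
    theta = 1 gives non-overlapping MF-DFA segments; theta = 1/s gives the
    maximally overlapping (sample-by-sample) variant described in the abstract.
    """
    profile = np.cumsum(x - np.mean(x))              # integrated, mean-removed series
    step = max(1, int(round(theta * s)))
    t = np.arange(s)
    variances = []
    for i in range(0, len(profile) - s + 1, step):
        seg = profile[i:i + s]
        coeffs = np.polyfit(t, seg, order)           # local polynomial trend
        variances.append(np.mean((seg - np.polyval(coeffs, t)) ** 2))
    variances = np.asarray(variances)
    if q == 0:
        return np.exp(0.5 * np.mean(np.log(variances)))
    return np.mean(variances ** (q / 2.0)) ** (1.0 / q)

# Toy usage: compare the fluctuation of a correlated (random-walk-like) signal at two scales.
rng = np.random.default_rng(5)
x = np.cumsum(rng.standard_normal(4096))
print(fluctuation(x, s=64, q=2), fluctuation(x, s=256, q=2))
```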

  4. Optimization of speech in noise with three signal processing algorithms for normal-hearing and hearing-impaired subjects

    NASA Astrophysics Data System (ADS)

    Franck, Bas A. M.; Dreschler, Wouter A.; Lyzenga, Johannes

    2002-05-01

    In this study a three-dimensional Simplex procedure was applied to optimize speech in noise by a combination of signal processing algorithms for different acoustic conditions and hearing losses. The algorithms used to span the three dimensions are noise reduction, spectral tilting, and spectral enhancement, respectively. Additionally, we studied the algorithms for their main effects and interaction effects within the optimization process. The subjects were asked to evaluate two consecutive, differently processed sentences on listening comfort. Three different noise types and two signal-to-noise ratios (S/N) were used. Three groups of subjects participated: normal hearing, normal hearing with simulated impaired auditory filtering (by spectral smearing), and sensorineurally hearing-impaired subjects. For the normal-hearing group we applied S/N=0 dB. For the hearing-impaired and the simulated hearing-impaired subjects we applied S/N=5 dB. We will discuss the similarities and differences in the response patterns of the three groups. Also, the individual preferences will be related to the hearing capacity, and to the type of interfering noise. Finally, we will discuss differences in the perceptual features that are used to judge listening comfort of the fragments by normal-hearing and hearing-impaired subjects.
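
    The three-dimensional Simplex search can be sketched as a Nelder-Mead optimization over three processing parameters. In the study the search is driven by paired-comparison listening judgments; the analytic objective below is only a stand-in for those judgments, and the parameter names and target values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def listener_score(params):
    """Placeholder objective standing in for listening-comfort judgments (lower is better)."""
    noise_reduction, spectral_tilt, enhancement = params
    return ((noise_reduction - 0.6) ** 2
            + (spectral_tilt - 0.2) ** 2
            + (enhancement - 0.4) ** 2
            + 0.1 * noise_reduction * enhancement)   # a mild interaction term

res = minimize(listener_score, x0=[0.0, 0.0, 0.0], method="Nelder-Mead")
print("preferred settings (noise reduction, tilt, enhancement):", np.round(res.x, 2))
```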

  5. Sub-Audible Speech Recognition Based upon Electromyographic Signals

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles C. (Inventor); Lee, Diana D. (Inventor); Agabon, Shane T. (Inventor)

    2012-01-01

    Method and system for processing and identifying a sub-audible signal formed by a source of sub-audible sounds. Sequences of samples of sub-audible sound patterns ("SASPs") for known words/phrases in a selected database are received for overlapping time intervals, and Signal Processing Transforms ("SPTs") are formed for each sample, as part of a matrix of entry values. The matrix is decomposed into contiguous, non-overlapping two-dimensional cells of entries, and neural net analysis is applied to estimate reference sets of weight coefficients that provide sums with optimal matches to reference sets of values. The reference sets of weight coefficients are used to determine a correspondence between a new (unknown) word/phrase and a word/phrase in the database.

  6. Acoustic Variations in Adductor Spasmodic Dysphonia as a Function of Speech Task.

    ERIC Educational Resources Information Center

    Sapienza, Christine M.; Walton, Suzanne; Murry, Thomas

    1999-01-01

    Acoustic phonatory events were identified in 14 women diagnosed with adductor spasmodic dysphonia (ADSD), a focal laryngeal dystonia that disturbs phonatory function, and compared with those of 14 age-matched women with no vocal dysfunction. Findings indicated ADSD subjects produced more aberrant acoustic events than controls during tasks of…

  7. Ecology of acoustic signalling and the problem of masking interference in insects.

    PubMed

    Schmidt, Arne K D; Balakrishnan, Rohini

    2015-01-01

    The efficiency of long-distance acoustic signalling of insects in their natural habitat is constrained in several ways. Acoustic signals are not only subjected to changes imposed by the physical structure of the habitat such as attenuation and degradation but also to masking interference from co-occurring signals of other acoustically communicating species. Masking interference is likely to be a ubiquitous problem in multi-species assemblages, but successful communication in natural environments under noisy conditions suggests powerful strategies to deal with the detection and recognition of relevant signals. In this review we present recent work on the role of the habitat as a driving force in shaping insect signal structures. In the context of acoustic masking interference, we discuss the ecological niche concept and examine the role of acoustic resource partitioning in the temporal, spatial and spectral domains as sender strategies to counter masking. We then examine the efficacy of different receiver strategies: physiological mechanisms such as frequency tuning, spatial release from masking and gain control as useful strategies to counteract acoustic masking. We also review recent work on the effects of anthropogenic noise on insect acoustic communication and the importance of insect sounds as indicators of biodiversity and ecosystem health.

  8. Study of acoustic emission signals during fracture shear deformation

    NASA Astrophysics Data System (ADS)

    Ostapchuk, A. A.; Pavlov, D. V.; Markov, V. K.; Krasheninnikov, A. V.

    2016-07-01

    We study acoustic manifestations of different regimes of shear deformation of a fracture filled with a thin layer of granular material. It is established that the observed acoustic portrait is determined by the structure of the fracture at the mesolevel. Joint analysis of the activity of acoustic pulses and their spectral characteristics makes it possible to construct the pattern of internal evolutionary processes occurring in the thin layer of the interblock contact and consider the fracture deformation process as the evolution of a self-organizing system.

  9. Nonlinear Statistical Modeling of Speech

    NASA Astrophysics Data System (ADS)

    Srinivasan, S.; Ma, T.; May, D.; Lazarou, G.; Picone, J.

    2009-12-01

    Contemporary approaches to speech and speaker recognition decompose the problem into four components: feature extraction, acoustic modeling, language modeling and search. Statistical signal processing is an integral part of each of these components, and Bayes Rule is used to merge these components into a single optimal choice. Acoustic models typically use hidden Markov models based on Gaussian mixture models for state output probabilities. This popular approach suffers from an inherent assumption of linearity in speech signal dynamics. Language models often employ a variety of maximum entropy techniques, but can employ many of the same statistical techniques used for acoustic models. In this paper, we focus on introducing nonlinear statistical models to the feature extraction and acoustic modeling problems as a first step towards speech and speaker recognition systems based on notions of chaos and strange attractors. Our goal in this work is to improve the generalization and robustness properties of a speech recognition system. Three nonlinear invariants are proposed for feature extraction: Lyapunov exponents, correlation fractal dimension, and correlation entropy. We demonstrate an 11% relative improvement on speech recorded under noise-free conditions, but show a comparable degradation occurs for mismatched training conditions on noisy speech. We conjecture that the degradation is due to difficulties in estimating invariants reliably from noisy data. To circumvent these problems, we introduce two dynamic models to the acoustic modeling problem: (1) a linear dynamic model (LDM) that uses a state space-like formulation to explicitly model the evolution of hidden states using an autoregressive process, and (2) a data-dependent mixture of autoregressive (MixAR) models. Results show that LDM and MixAR models can achieve comparable performance with HMM systems while using significantly fewer parameters. Currently we are developing Bayesian parameter estimation and

  10. Estimation of the Tool Condition by Applying the Wavelet Transform to Acoustic Emission Signals

    SciTech Connect

    Gomez, M. P.; Piotrkowski, R.; Ruzzante, J. E.; D'Attellis, C. E.

    2007-03-21

    This work continues the search for parameters to evaluate the tool condition in machining processes. The selected sensing technique is acoustic emission, applied to a turning process on steel samples. The obtained signals are studied using the wavelet transform. The tool wear level is quantified as a percentage of the final wear specified by the Standard ISO 3685. The amplitude and the relevant scale obtained from the acoustic emission signals could be related to the wear level.
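
    A small sketch of relating wavelet-scale energy to a signal, in the spirit of the feature extraction described above, assuming the PyWavelets package is available; the AE-like burst, sampling rate, and wavelet choice are illustrative only.

```python
import numpy as np
import pywt

def scale_energies(signal, wavelet="db4", level=5):
    """Relative energy per wavelet scale; the dominant scale and its amplitude are
    the kind of features related to tool wear in the abstract."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    return energies / energies.sum()

# Toy AE-like burst: a decaying oscillation in noise, assuming 1 MHz sampling.
rng = np.random.default_rng(6)
t = np.arange(4096) / 1e6
burst = np.exp(-t * 2e4) * np.sin(2 * np.pi * 150e3 * t)
x = burst + 0.1 * rng.standard_normal(t.size)
print(np.round(scale_energies(x), 3))
```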

  11. Synergy of seismic, acoustic, and video signals in blast analysis

    SciTech Connect

    Anderson, D.P.; Stump, B.W.; Weigand, J.

    1997-09-01

    The range of mining applications from hard rock quarrying to coal exposure to mineral recovery leads to a great variety of blasting practices. A common characteristic of many of the sources is that they are detonated at or near the earth's surface and thus can be recorded by camera or video. Although the primary interest is in the seismic waveforms that these blasts generate, the visual observations of the blasts provide important constraints that can be applied to the physical interpretation of the seismic source function. In particular, high speed images can provide information on detonation times of individual charges, the timing and amount of mass movement during the blasting process and, in some instances, evidence of wave propagation away from the source. All of these characteristics can be valuable in interpreting the equivalent seismic source function for a set of mine explosions and quantifying the relative importance of the different processes. This paper documents work done at the Los Alamos National Laboratory and Southern Methodist University to take standard Hi-8 video of mine blasts, recover digital images from them, and combine them with ground motion records for interpretation. The steps in the data acquisition, processing, display, and interpretation are outlined. The authors conclude that the combination of video with seismic and acoustic signals can be a powerful diagnostic tool for the study of blasting techniques and seismology. A low-cost system for generating similar diagnostics using a consumer-grade video camera and direct-to-disk video hardware is proposed. One application is verification of the Comprehensive Test Ban Treaty.

  12. Graph-based sensor fusion for classification of transient acoustic signals.

    PubMed

    Srinivas, Umamahesh; Nasrabadi, Nasser M; Monga, Vishal

    2015-03-01

    Advances in acoustic sensing have enabled the simultaneous acquisition of multiple measurements of the same physical event via co-located acoustic sensors. We exploit the inherent correlation among such multiple measurements for acoustic signal classification, to identify the launch/impact of munitions (i.e., rockets, mortars). Specifically, we propose a probabilistic graphical model framework that can explicitly learn the class conditional correlations between the cepstral features extracted from these different measurements. Additionally, we employ symbolic dynamic filtering-based features, which offer improvements over the traditional cepstral features in terms of robustness to signal distortions. Experiments on real acoustic data sets show that our proposed algorithm outperforms conventional classifiers as well as the recently proposed joint sparsity models for multisensor acoustic classification. Additionally, our proposed algorithm is less sensitive than competing approaches to insufficient training samples. PMID:25014986
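
    The cepstral features mentioned above can be illustrated with a short sketch that computes low-order real-cepstrum coefficients per frame; the frame length, coefficient count, and test signal are placeholders, and the graphical-model fusion itself is not shown.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum of one windowed frame: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    return np.fft.irfft(np.log(np.abs(spectrum) + 1e-12))

def cepstral_features(x, frame_len=1024, n_coeffs=20):
    """Low-order cepstral coefficients per half-overlapping frame, a common transient-signal feature set."""
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len, frame_len // 2)]
    return np.array([real_cepstrum(f)[:n_coeffs] for f in frames])

rng = np.random.default_rng(7)
x = rng.standard_normal(16_000)                 # stand-in for one sensor's recording
print(cepstral_features(x).shape)               # (n_frames, n_coeffs)
```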

  13. Speech-on-speech masking with variable access to the linguistic content of the masker speech.

    PubMed

    Calandruccio, Lauren; Dhar, Sumitrajit; Bradlow, Ann R

    2010-08-01

    It has been reported that listeners can benefit from a release in masking when the masker speech is spoken in a language that differs from the target speech compared to when the target and masker speech are spoken in the same language [Freyman, R. L. et al. (1999). J. Acoust. Soc. Am. 106, 3578-3588; Van Engen, K., and Bradlow, A. (2007), J. Acoust. Soc. Am. 121, 519-526]. It is unclear whether listeners benefit from this release in masking due to the lack of linguistic interference of the masker speech, from acoustic and phonetic differences between the target and masker languages, or a combination of these differences. In the following series of experiments, listeners' sentence recognition was evaluated using speech and noise maskers that varied in the amount of linguistic content, including native-English, Mandarin-accented English, and Mandarin speech. Results from three experiments indicated that the majority of differences observed between the linguistic maskers could be explained by spectral differences between the masker conditions. However, when the recognition task increased in difficulty, i.e., at a more challenging signal-to-noise ratio, a greater decrease in performance was observed for the maskers with more linguistically relevant information than what could be explained by spectral differences alone. PMID:20707455

  14. Direct classification of all American English phonemes using signals from functional speech motor cortex

    NASA Astrophysics Data System (ADS)

    Mugler, Emily M.; Patton, James L.; Flint, Robert D.; Wright, Zachary A.; Schuele, Stephan U.; Rosenow, Joshua; Shih, Jerry J.; Krusienski, Dean J.; Slutzky, Marc W.

    2014-06-01

    Objective. Although brain-computer interfaces (BCIs) can be used in several different ways to restore communication, communicative BCI has not approached the rate or efficiency of natural human speech. Electrocorticography (ECoG) has precise spatiotemporal resolution that enables recording of brain activity distributed over a wide area of cortex, such as during speech production. In this study, we sought to decode elements of speech production using ECoG. Approach. We investigated words that contain the entire set of phonemes in the General American accent using ECoG with four subjects. Using a linear classifier, we evaluated the degree to which individual phonemes within each word could be correctly identified from the cortical signal. Main results. We classified phonemes with up to 36% accuracy when classifying all phonemes and up to 63% accuracy for a single phoneme. Further, misclassified phonemes follow the articulatory organization described in the phonology literature, aiding classification of whole words. Precise temporal alignment to phoneme onset was crucial for classification success. Significance. We identified specific spatiotemporal features that aid classification, which could guide future applications. Word identification was equivalent to information transfer rates as high as 3.0 bits/s (33.6 words/min), supporting pursuit of speech articulation for BCI control.
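
    An information transfer rate of the kind reported above can be illustrated with the standard Wolpaw bits-per-selection formula; the class count, accuracy, and selection rate below are illustrative assumptions, not the study's exact figures.

```python
import math

def bits_per_selection(n_classes, accuracy):
    """Wolpaw information-transfer rate per selection (bits), a standard BCI metric."""
    if accuracy <= 1.0 / n_classes:
        return 0.0
    p = accuracy
    return (math.log2(n_classes)
            + p * math.log2(p)
            + (1 - p) * math.log2((1 - p) / (n_classes - 1)))

# Illustrative numbers only: assume 39 phoneme classes, 36% accuracy, two selections per second.
bps = bits_per_selection(39, 0.36)
print(f"{bps:.2f} bits/selection -> {2 * bps:.2f} bits/s at two selections per second")
```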

  15. Acoustic emission signal classification for gearbox failure detection

    NASA Astrophysics Data System (ADS)

    Shishino, Jun

    The purpose of this research is to develop a methodology and technique to determine the optimal number of clusters in acoustic emission (AE) data obtained from a ground test stand of a rotating H-60 helicopter tail gearbox, using mathematical algorithms and visual inspection. Signs of fatigue crack growth were observed in the AE signals once the optimal number of clusters in a data set had been identified. Previous research determined the number of clusters by visually inspecting AE plots over a number of iterations. This research focuses on finding the optimal number of clusters in the data set using mathematical algorithms and then confirming the result by visual verification. The AE data were acquired from a ground test stand that simulates the tail end of an H-60 Seahawk at Naval Air Station Patuxent River, Maryland. The acquired data were filtered to eliminate hits with durations greater than 100,000 μs and hits with zero energy, in order to investigate the failure mechanisms occurring on the output bevel gear. From the filtered data, different AE signal parameters were chosen for iterative analysis to determine which clustering algorithm and number of clusters performed best. The clustering algorithms utilized are the Kohonen Self-Organizing Map (SOM), k-means, and the Gaussian Mixture Model (GMM). For each clustering run, three cluster-validity criteria were applied to suggest the optimal number of clusters: the Davies-Bouldin, Silhouette, and Tou criteria. After the criteria had suggested the optimal number of clusters for each data set, visual verification of the AE plots and statistical analysis of each cluster were performed. By examining the AE plots and the statistical analysis, the optimal number of clusters in the data set and the most effective clustering algorithm were determined. Along with the optimal number of clusters and effective clustering algorithm, the mechanisms
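
    A minimal sketch of clustering AE-like features and comparing cluster counts with two of the validity criteria named above (Davies-Bouldin and silhouette; the Tou criterion and the SOM are not included), using scikit-learn; the feature data are synthetic stand-ins for AE hit parameters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import davies_bouldin_score, silhouette_score

rng = np.random.default_rng(8)
# Stand-in for AE hit features (e.g., amplitude, energy, duration, counts) after filtering.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 4)) for c in (0.0, 3.0, 6.0)])

for k in range(2, 7):
    labels_km = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    labels_gmm = GaussianMixture(n_components=k, random_state=0).fit_predict(X)
    print(k,
          round(davies_bouldin_score(X, labels_km), 2),   # lower is better
          round(silhouette_score(X, labels_km), 2),       # higher is better
          round(silhouette_score(X, labels_gmm), 2))
```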

  16. Revisiting Neil Armstrong's Moon-Landing Quote: Implications for Speech Perception, Function Word Reduction, and Acoustic Ambiguity.

    PubMed

    Baese-Berk, Melissa M; Dilley, Laura C; Schmidt, Stephanie; Morrill, Tuuli H; Pitt, Mark A

    2016-01-01

    Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said "one small step for a man," instead of "one small step for man." What he said is unclear in part because function words like "a" can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of "for" and "for a" in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of "for" and "for a." The results suggest that the distributions of "for" and "for a" overlap substantially, both in terms of temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word "a" varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate. PMID:27603209

  17. Revisiting Neil Armstrong's Moon-Landing Quote: Implications for Speech Perception, Function Word Reduction, and Acoustic Ambiguity

    PubMed Central

    Baese-Berk, Melissa M.; Dilley, Laura C.; Schmidt, Stephanie; Morrill, Tuuli H.; Pitt, Mark A.

    2016-01-01

    Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said "one small step for a man," instead of "one small step for man." What he said is unclear in part because function words like "a" can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of "for" and "for a" in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of "for" and "for a." The results suggest that the distributions of "for" and "for a" overlap substantially, both in terms of temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word "a" varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate. PMID:27603209

  18. Revisiting Neil Armstrong's Moon-Landing Quote: Implications for Speech Perception, Function Word Reduction, and Acoustic Ambiguity.

    PubMed

    Baese-Berk, Melissa M; Dilley, Laura C; Schmidt, Stephanie; Morrill, Tuuli H; Pitt, Mark A

    2016-01-01

    Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said "one small step for a man," instead of "one small step for man." What he said is unclear in part because function words like "a" can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of "for" and "for a" in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of "for" and "for a." The results suggest that the distributions of "for" and "for a" overlap substantially, both in terms of temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word "a" varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate.

  19. Improving robustness of speech recognition systems

    NASA Astrophysics Data System (ADS)

    Mitra, Vikramjit

    2010-11-01

    Current Automatic Speech Recognition (ASR) systems fall well short of human speech recognition performance because they lack robustness against speech variability and noise contamination. The goal of this dissertation is to investigate these critical robustness issues, put forth different ways to address them, and finally present an ASR architecture based upon these robustness criteria. Acoustic variations adversely affect the performance of current phone-based ASR systems, in which speech is modeled as 'beads-on-a-string', where the beads are the individual phone units. While phone units are distinct in the cognitive domain, they vary in the physical domain, and their variation arises from a combination of factors including speech style and speaking rate; a phenomenon commonly known as 'coarticulation'. Traditional ASR systems address such coarticulatory variations by using contextualized phone units such as triphones. Articulatory phonology accounts for coarticulatory variations by modeling speech as a constellation of constricting actions known as articulatory gestures. In such a framework, speech variations such as coarticulation and lenition are accounted for by gestural overlap in time and gestural reduction in space. To realize a gesture-based ASR system, articulatory gestures have to be inferred from the acoustic signal. At the initial stage of this research, a study was performed using synthetically generated speech to obtain proof-of-concept that articulatory gestures can indeed be recognized from the speech signal. It was observed that having vocal tract constriction trajectories (TVs) as an intermediate representation facilitated the gesture recognition task from the speech signal. Presently no natural speech database contains articulatory gesture annotation; hence an automated iterative time-warping architecture is proposed that can annotate any natural speech database with articulatory gestures and TVs. Two natural

  20. Lexico-semantic and acoustic-phonetic processes in the perception of noise-vocoded speech: implications for cochlear implantation

    PubMed Central

    McGettigan, Carolyn; Rosen, Stuart; Scott, Sophie K.

    2014-01-01

    Noise-vocoding is a transformation which, when applied to speech, severely reduces spectral resolution and eliminates periodicity, yielding a stimulus that sounds “like a harsh whisper” (Scott et al., 2000, p. 2401). This process simulates a cochlear implant, where the activity of many thousand hair cells in the inner ear is replaced by direct stimulation of the auditory nerve by a small number of tonotopically-arranged electrodes. Although a cochlear implant offers a powerful means of restoring some degree of hearing to profoundly deaf individuals, the outcomes for spoken communication are highly variable (Moore and Shannon, 2009). Some variability may arise from differences in peripheral representation (e.g., the degree of residual nerve survival) but some may reflect differences in higher-order linguistic processing. In order to explore this possibility, we used noise-vocoding to explore speech recognition and perceptual learning in normal-hearing listeners tested across several levels of the linguistic hierarchy: segments (consonants and vowels), single words, and sentences. Listeners improved significantly on all tasks across two test sessions. In the first session, individual differences analyses revealed two independently varying sources of variability: one lexico-semantic in nature and implicating the recognition of words and sentences, and the other an acoustic-phonetic factor associated with words and segments. However, consequent to learning, by the second session there was a more uniform covariance pattern concerning all stimulus types. A further analysis of phonetic feature recognition allowed greater insight into learning-related changes in perception and showed that, surprisingly, participants did not make full use of cues that were preserved in the stimuli (e.g., vowel duration). We discuss these findings in relation to cochlear implantation, and suggest auditory training strategies to maximize speech recognition performance in the absence of

  1. Comparison of Acoustic and Kinematic Approaches to Measuring Utterance-Level Speech Variability

    ERIC Educational Resources Information Center

    Howell, Peter; Anderson, Andrew J.; Bartrip, Jon; Bailey, Eleanor

    2009-01-01

    Purpose: The spatiotemporal index (STI) is one measure of variability. As currently implemented, kinematic data are used, requiring equipment that cannot be used with some patient groups or in scanners. An experiment is reported that addressed whether STI can be extended to an audio measure of sound pressure of the speech envelope over time that…

  2. The Exploitation of Subphonemic Acoustic Detail in L2 Speech Segmentation

    ERIC Educational Resources Information Center

    Shoemaker, Ellenor

    2014-01-01

    The current study addresses an aspect of second language (L2) phonological acquisition that has received little attention to date--namely, the acquisition of allophonic variation as a word boundary cue. The role of subphonemic variation in the segmentation of speech by native speakers has been indisputably demonstrated; however, the acquisition of…

  3. Impact of Aberrant Acoustic Properties on the Perception of Sound Quality in Electrolarynx Speech

    ERIC Educational Resources Information Center

    Meltzner, Geoffrey S.; Hillman, Robert E.

    2005-01-01

    A large percentage of patients who have undergone laryngectomy to treat advanced laryngeal cancer rely on an electrolarynx (EL) to communicate verbally. Although serviceable, EL speech is plagued by shortcomings in both sound quality and intelligibility. This study sought to better quantify the relative contributions of previously identified…

  4. Limited condition dependence of male acoustic signals in the grasshopper Chorthippus biguttulus

    PubMed Central

    Franzke, Alexandra; Reinhold, Klaus

    2012-01-01

    In many animal species, male acoustic signals serve to attract a mate and therefore often play a major role for male mating success. Male body condition is likely to be correlated with male acoustic signal traits, which signal male quality and provide choosy females indirect benefits. Environmental factors such as food quantity or quality can influence male body condition and therefore possibly lead to condition-dependent changes in the attractiveness of acoustic signals. Here, we test whether stressing food plants influences acoustic signal traits of males via condition-dependent expression of these traits. We examined four male song characteristics, which are vital for mate choice in females of the grasshopper Chorthippus biguttulus. Only one of the examined acoustic traits, loudness, was significantly altered by changing body condition because of drought- and moisture-related stress of food plants. No condition dependence could be observed for syllable to pause ratio, gap duration within syllables, and onset accentuation. We suggest that food plant stress and therefore food plant quality led to shifts in loudness of male grasshopper songs via body condition changes. The other three examined acoustic traits of males do not reflect male body condition induced by food plant quality. PMID:22957192

  5. Design of acoustic logging signal source of imitation based on field programmable gate array

    NASA Astrophysics Data System (ADS)

    Zhang, K.; Ju, X. D.; Lu, J. Q.; Men, B. Y.

    2014-08-01

    An imitation acoustic logging signal source, based on a Field Programmable Gate Array (FPGA), is designed and realized to improve the efficiency of examining and repairing acoustic logging tools during research and field application, and to inspect and verify acoustic receiving circuits and the corresponding algorithms. The design comprises hardware and software, with an FPGA as the control core of the hardware. Four signals are generated by first reading Random Access Memory (RAM) data stored inside the FPGA and then processing the data through digital-to-analog conversion, amplification, smoothing, and so on. The software is written in VHDL, a hardware description language, to program the FPGA. Experiments illustrate that the signal-to-noise ratio of the source is high, the waveforms are stable, and its amplitude, frequency, and delay adjustments accord with the characteristics of real acoustic logging waveforms. These adjustments can be used to imitate the influence on received sonic logging waveforms of factors such as the spacing and span of acoustic tools, the sonic speeds of different layers and fluids, and the acoustic attenuation of different cementation planes.
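
    The record describes a hardware (FPGA/VHDL) implementation; purely as an illustration of the adjustable amplitude, frequency, and delay it mentions, the hypothetical sketch below synthesizes a damped-sinusoid arrival in software.

    ```python
    import numpy as np

    def imitation_logging_waveform(fs=1.0e6, duration=2.0e-3, amplitude=1.0,
                                   freq=20.0e3, delay=200.0e-6, decay=5.0e3):
        """Damped sinusoid standing in for one received acoustic logging arrival;
        amplitude, freq, and delay play the role of the source's adjustments."""
        t = np.arange(0.0, duration, 1.0 / fs)
        wave = np.zeros_like(t)
        active = t >= delay
        tau = t[active] - delay
        wave[active] = amplitude * np.exp(-decay * tau) * np.sin(2.0 * np.pi * freq * tau)
        return t, wave
    ```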

  6. A New Method to Represent Speech Signals Via Predefined Signature and Envelope Sequences

    NASA Astrophysics Data System (ADS)

    Güz, Ümit; Gürkan, Hakan; Yarman, Binboga Sıddık

    2006-12-01

    A novel systematic procedure referred to as "SYMPES" to model speech signals is introduced. The structure of SYMPES is based on the creation of so-called predefined "signature" and "envelope" sets. These sets are speaker and language independent. Once the speech signals are divided into frames of selected lengths, each frame sequence is reconstructed by combining a gain factor with a signature and an envelope properly assigned from the predefined signature and envelope sets, respectively. Examples are given to exhibit the implementation of SYMPES. It is shown that, for the same compression ratio or better, SYMPES yields considerably better speech quality than commercially available coders such as G.726 (ADPCM) at 16 kbps and voice-excited LPC-10E (FS1015).
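
    The record does not reproduce the reconstruction formula, so the following is only a plausible reading of it: each frame is approximated by a gain factor applied to the element-wise product of one predefined envelope and one predefined signature, with the best pair chosen by a least-squares fit. All names below are illustrative.

    ```python
    import numpy as np

    def reconstruct_frame(frame, signatures, envelopes):
        """Pick the (signature, envelope) pair and gain that best approximate
        'frame' in the least-squares sense; all vectors share the frame length."""
        best_err, best_approx = np.inf, None
        for s in signatures:
            for e in envelopes:
                basis = e * s                                   # element-wise product
                gain = float(frame @ basis) / float(basis @ basis)
                err = float(np.sum((frame - gain * basis) ** 2))
                if err < best_err:
                    best_err, best_approx = err, gain * basis
        return best_approx
    ```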

  7. A unique method to study acoustic transmission through ducts using signal synthesis and averaging of acoustic pulses

    NASA Technical Reports Server (NTRS)

    Salikuddin, M.; Ramakrishnan, R.; Ahuja, K. K.; Brown, W. H.

    1981-01-01

    An acoustic impulse technique using a loudspeaker driver is developed to measure the acoustic properties of a duct/nozzle system. A signal synthesis method is used to generate a desired single pulse with a flat spectrum. The convolution of the desired signal and the inverse Fourier transform of the reciprocal of the driver's response is then fed to the driver. A signal averaging process eliminates the jet mixing noise from the mixture of jet noise and the internal noise, thereby allowing very low intensity signals to be measured accurately, even for high velocity jets. A theoretical analysis is carried out to predict the incident sound field; this is used to help determine the number and locations of the in-duct measurement points so as to account for the contributions of higher-order modes present in the incident field. The impulse technique is validated by comparing experimentally determined acoustic characteristics of a duct-nozzle system with similar results obtained by the impedance tube method. Absolute agreement in the comparisons was poor, but the overall shapes of the time histories and spectral distributions were much alike.
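
    The drive-signal synthesis described above (convolving the desired pulse with the inverse Fourier transform of the reciprocal of the driver response) is equivalent to a frequency-domain division; a minimal sketch follows, with a regularization term that is my assumption rather than part of the record.

    ```python
    import numpy as np

    def synthesize_drive_signal(desired_pulse, driver_impulse_response):
        """Pre-equalize the desired pulse by the driver's frequency response so
        that the loudspeaker output approximates the desired single pulse."""
        n = len(desired_pulse)
        H = np.fft.rfft(driver_impulse_response, n)
        eps = 1e-3 * np.max(np.abs(H))   # regularization (assumption, avoids division by ~0)
        drive = np.fft.irfft(np.fft.rfft(desired_pulse, n) / (H + eps), n)
        return drive
    ```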

  8. Diagnostics of DC and Induction Motors Based on the Analysis of Acoustic Signals

    NASA Astrophysics Data System (ADS)

    Glowacz, A.

    2014-10-01

    In this paper, a non-invasive method for early fault diagnostics of electric motors is proposed. The method uses acoustic signals generated by electric motors, from which essential features are extracted. A plan for the study of the acoustic signals of electric motors was proposed. Experiments were carried out on a faultless induction motor, an induction motor with one faulty rotor bar, an induction motor with two faulty rotor bars, a faultless Direct Current (DC) motor, and a DC motor with shorted rotor coils. The signal processing and classification methods investigated were log area ratio coefficients, Multiple Signal Classification (MUSIC), the Nearest Neighbour classifier, and the Bayes classifier. A pattern creation process was carried out using 40 samples of sound, and 130 five-second test samples were used in the identification process. The proposed approach should also reduce maintenance costs and the number of faulty motors in industry.
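
    As an illustration of the classification stage only, the sketch below fits the two classifier types named in the record (nearest neighbour, and a Bayes classifier here taken to be Gaussian naive Bayes) to pre-computed acoustic feature vectors; the feature extraction (e.g. log area ratio coefficients) is assumed to happen elsewhere.

    ```python
    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier

    def train_fault_classifiers(train_features: np.ndarray, train_labels: np.ndarray):
        """Fit a 1-nearest-neighbour classifier and a Gaussian naive Bayes model
        on acoustic feature vectors extracted from motor sound samples."""
        knn = KNeighborsClassifier(n_neighbors=1).fit(train_features, train_labels)
        bayes = GaussianNB().fit(train_features, train_labels)
        return knn, bayes
    ```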

  9. Call transmission efficiency in native and invasive anurans: competing hypotheses of divergence in acoustic signals.

    PubMed

    Llusia, Diego; Gómez, Miguel; Penna, Mario; Márquez, Rafael

    2013-01-01

    Invasive species are a leading cause of the current biodiversity decline, and hence examining the major traits favouring invasion is a key and long-standing goal of invasion biology. Despite the prominent role of the advertisement calls in sexual selection and reproduction, very little attention has been paid to the features of acoustic communication of invasive species in nonindigenous habitats and their potential impacts on native species. Here we compare for the first time the transmission efficiency of the advertisement calls of native and invasive species, searching for competitive advantages for acoustic communication and reproduction of introduced taxa, and providing insights into competing hypotheses in evolutionary divergence of acoustic signals: acoustic adaptation vs. morphological constraints. Using sound propagation experiments, we measured the attenuation rates of pure tones (0.2-5 kHz) and playback calls (Lithobates catesbeianus and Pelophylax perezi) across four distances (1, 2, 4, and 8 m) and over two substrates (water and soil) in seven Iberian localities. All factors considered (signal type, distance, substrate, and locality) affected transmission efficiency of acoustic signals, which was maximized with lower frequency sounds, shorter distances, and over water surface. Despite being broadcast in nonindigenous habitats, the advertisement calls of invasive L. catesbeianus were propagated more efficiently than those of the native species, in both aquatic and terrestrial substrates, and in most of the study sites. This implies the absence of an optimal relationship between native environments and the propagation of acoustic signals in anurans, in contrast to what is predicted by the acoustic adaptation hypothesis, and it might render these vertebrates particularly vulnerable to intrusion of invasive species producing low frequency signals, such as L. catesbeianus. Our findings suggest that mechanisms optimizing sound transmission in native habitat can play a less

  10. Call Transmission Efficiency in Native and Invasive Anurans: Competing Hypotheses of Divergence in Acoustic Signals

    PubMed Central

    Llusia, Diego; Gómez, Miguel; Penna, Mario; Márquez, Rafael

    2013-01-01

    Invasive species are a leading cause of the current biodiversity decline, and hence examining the major traits favouring invasion is a key and long-standing goal of invasion biology. Despite the prominent role of the advertisement calls in sexual selection and reproduction, very little attention has been paid to the features of acoustic communication of invasive species in nonindigenous habitats and their potential impacts on native species. Here we compare for the first time the transmission efficiency of the advertisement calls of native and invasive species, searching for competitive advantages for acoustic communication and reproduction of introduced taxa, and providing insights into competing hypotheses in evolutionary divergence of acoustic signals: acoustic adaptation vs. morphological constraints. Using sound propagation experiments, we measured the attenuation rates of pure tones (0.2–5 kHz) and playback calls (Lithobates catesbeianus and Pelophylax perezi) across four distances (1, 2, 4, and 8 m) and over two substrates (water and soil) in seven Iberian localities. All factors considered (signal type, distance, substrate, and locality) affected transmission efficiency of acoustic signals, which was maximized with lower frequency sounds, shorter distances, and over water surface. Despite being broadcast in nonindigenous habitats, the advertisement calls of invasive L. catesbeianus were propagated more efficiently than those of the native species, in both aquatic and terrestrial substrates, and in most of the study sites. This implies the absence of an optimal relationship between native environments and the propagation of acoustic signals in anurans, in contrast to what is predicted by the acoustic adaptation hypothesis, and it might render these vertebrates particularly vulnerable to intrusion of invasive species producing low frequency signals, such as L. catesbeianus. Our findings suggest that mechanisms optimizing sound transmission in native habitat can play a

  11. Speech Signal and Facial Image Processing for Obstructive Sleep Apnea Assessment

    PubMed Central

    Espinoza-Cuadros, Fernando; Fernández-Pozo, Rubén; Toledano, Doroteo T.; Alcázar-Ramírez, José D.; López-Gonzalo, Eduardo; Hernández-Gómez, Luis A.

    2015-01-01

    Obstructive sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). OSA is generally diagnosed through a costly procedure requiring an overnight stay of the patient at the hospital. This has led to proposing less costly procedures based on the analysis of patients' facial images and voice recordings to help in OSA detection and severity assessment. In this paper we investigate the use of both image and speech processing to estimate the apnea-hypopnea index, AHI (which describes the severity of the condition), over a population of 285 male Spanish subjects suspected to suffer from OSA and referred to a Sleep Disorders Unit. Photographs and voice recordings were collected in a supervised but not highly controlled way trying to test a scenario close to an OSA assessment application running on a mobile device (i.e., smartphones or tablets). Spectral information in speech utterances is modeled by a state-of-the-art low-dimensional acoustic representation, called i-vector. A set of local craniofacial features related to OSA are extracted from images after detecting facial landmarks using Active Appearance Models (AAMs). Support vector regression (SVR) is applied on facial features and i-vectors to estimate the AHI. PMID:26664493
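
    A minimal sketch of the final regression step, assuming scikit-learn; the i-vectors and craniofacial features are taken as already extracted, and the SVR hyperparameters are illustrative, not those used in the study.

    ```python
    import numpy as np
    from sklearn.svm import SVR

    def fit_ahi_regressor(ivectors: np.ndarray, facial_features: np.ndarray,
                          ahi: np.ndarray) -> SVR:
        """Regress the apnea-hypopnea index on concatenated speech (i-vector)
        and craniofacial feature vectors, one row per subject."""
        X = np.hstack([ivectors, facial_features])
        return SVR(kernel="rbf", C=1.0, epsilon=0.5).fit(X, ahi)
    ```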

  12. Dolphin's echolocation signals in a complicated acoustic environment

    NASA Astrophysics Data System (ADS)

    Ivanov, M. P.

    2004-07-01

    Echolocation abilities of a dolphin ( Tursiops truncatus ponticus) were investigated in laboratory conditions. The experiment was carried out in an open cage using an acoustic control over the behavior of the animal detecting underwater objects in a complicated acoustic environment. Targets of different strength were used as test objects. The dolphin was found to be able to detect objects at distances exceeding 650 m. For the target location, the dolphin used both single-pulse and multipulse echolocation modes. Time characteristics of echolocation pulses and time sequences of pulses as functions of the distance to the target were obtained.

  13. System and method for investigating sub-surface features of a rock formation with acoustic sources generating coded signals

    SciTech Connect

    Vu, Cung Khac; Nihei, Kurt; Johnson, Paul A; Guyer, Robert; Ten Cate, James A; Le Bas, Pierre-Yves; Larmat, Carene S

    2014-12-30

    A system and a method for investigating rock formations include generating, by a first acoustic source, a first acoustic signal comprising a first plurality of pulses, each pulse including a first modulated signal at a central frequency; and generating, by a second acoustic source, a second acoustic signal comprising a second plurality of pulses. A receiver arranged within the borehole receives a detected signal including a signal generated by a non-linear mixing process from the first and second acoustic signals in a non-linear mixing zone within the intersection volume. The method also includes processing the received signal to extract the signal generated by the non-linear mixing process over noise or over signals generated by a linear interaction process, or both.

  14. Two-microphone spatial filtering provides speech reception benefits for cochlear implant users in difficult acoustic environments

    PubMed Central

    Goldsworthy, Raymond L.; Delhorne, Lorraine A.; Desloge, Joseph G.; Braida, Louis D.

    2014-01-01

    This article introduces and provides an assessment of a spatial-filtering algorithm based on two closely-spaced (∼1 cm) microphones in a behind-the-ear shell. The evaluated spatial-filtering algorithm used fast (∼10 ms) temporal-spectral analysis to determine the location of incoming sounds and to enhance sounds arriving from straight ahead of the listener. Speech reception thresholds (SRTs) were measured for eight cochlear implant (CI) users using consonant and vowel materials under three processing conditions: An omni-directional response, a dipole-directional response, and the spatial-filtering algorithm. The background noise condition used three simultaneous time-reversed speech signals as interferers located at 90°, 180°, and 270°. Results indicated that the spatial-filtering algorithm can provide speech reception benefits of 5.8 to 10.7 dB SRT compared to an omni-directional response in a reverberant room with multiple noise sources. Given the observed SRT benefits, coupled with an efficient design, the proposed algorithm is promising as a CI noise-reduction solution. PMID:25096120

  15. An information processing method for acoustic emission signal inspired from musical staff

    NASA Astrophysics Data System (ADS)

    Zheng, Wei; Wu, Chunxian

    2016-01-01

    This study proposes a musical-staff-inspired signal processing method for standard description expressions for discrete signals and describing the integrated characteristics of acoustic emission (AE) signals. The method maps various AE signals with complex environments into the normalized musical space. Four new indexes are proposed to comprehensively describe the signal. Several key features, such as contour, amplitude, and signal changing rate, are quantitatively expressed in a normalized musical space. The processed information requires only a small storage space to maintain high fidelity. The method is illustrated by using experiments on sandstones and computed tomography (CT) scanning to determine its validity for AE signal processing.

  16. MOOD STATE PREDICTION FROM SPEECH OF VARYING ACOUSTIC QUALITY FOR INDIVIDUALS WITH BIPOLAR DISORDER

    PubMed Central

    Gideon, John; Provost, Emily Mower; McInnis, Melvin

    2016-01-01

    Speech contains patterns that can be altered by the mood of an individual. There is an increasing focus on automated and distributed methods to collect and monitor speech from large groups of patients suffering from mental health disorders. However, as the scope of these collections increases, the variability in the data also increases. This variability is due in part to the range in the quality of the devices, which in turn affects the quality of the recorded data, negatively impacting the accuracy of automatic assessment. It is necessary to mitigate variability effects in order to expand the impact of these technologies. This paper explores speech collected from phone recordings for analysis of mood in individuals with bipolar disorder. Two different phones with varying amounts of clipping, loudness, and noise are employed. We describe methodologies for use during preprocessing, feature extraction, and data modeling to correct these differences and make the devices more comparable. The results demonstrate that these pipeline modifications result in statistically significantly higher performance, which highlights the potential of distributed mental health systems. PMID:27570493

  17. Acoustic Analyses of Speech Sounds and Rhythms in Japanese- and English-Learning Infants

    PubMed Central

    Yamashita, Yuko; Nakajima, Yoshitaka; Ueda, Kazuo; Shimada, Yohko; Hirsh, David; Seno, Takeharu; Smith, Benjamin Alexander

    2013-01-01

    The purpose of this study was to explore developmental changes in spectral fluctuations and temporal periodicity in Japanese- and English-learning infants. Three age groups (15, 20, and 24 months) were selected, because infants diversify their phonetic inventories with age. Natural speech of the infants was recorded. We utilized a critical-band-filter bank, which simulated the frequency resolution of the adult auditory periphery. First, the correlations between the power fluctuations of the critical-band outputs were summarized by factor analysis, in order to see how the critical bands should be connected to each other if a listener is to differentiate sounds in infants' speech. In the following analysis, we analyzed the temporal fluctuations of the factor scores by calculating autocorrelations. The present analysis identified, at 24 months of age in both linguistic environments, the same three factors that had been observed in adult speech. These three factors were shifted to a higher frequency range, corresponding to the smaller vocal tract size of the infants. The results suggest that the vocal tract structures of the infants had developed to an adult-like configuration by 24 months of age in both language environments. The proportion of utterances with a periodic nature over shorter timescales increased with age in both environments. This trend was clearer in the Japanese environment. PMID:23450824

  18. The Auditory-Brainstem Response to Continuous, Non-repetitive Speech Is Modulated by the Speech Envelope and Reflects Speech Processing

    PubMed Central

    Reichenbach, Chagit S.; Braiman, Chananel; Schiff, Nicholas D.; Hudspeth, A. J.; Reichenbach, Tobias

    2016-01-01

    The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the ABR is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function. PMID:27303286

  19. Measuring glottal activity during voiced speech using a tuned electromagnetic resonating collar sensor

    NASA Astrophysics Data System (ADS)

    Brown, D. R., III; Keenaghan, K.; Desimini, S.

    2005-11-01

    Non-acoustic speech sensors can be employed to obtain measurements of one or more aspects of the speech production process, such as glottal activity, even in the presence of background noise. These sensors have a long history of clinical applications and have also recently been applied to the problem of denoising speech signals recorded in acoustically noisy environments (Ng et al 2000 Proc. Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP) (Istanbul, Turkey) vol 1, pp 229-32). Recently, researchers developed a new non-acoustic speech sensor based primarily on a tuned electromagnetic resonator collar (TERC) (Brown et al 2004 Meas. Sci. Technol. 15 1291). The TERC sensor measures glottal activity by sensing small changes in the dielectric properties of the glottis that result from voiced speech. This paper builds on the seminal work in Brown et al (2004). The primary contributions of this paper are (i) a description of a new single-mode TERC sensor design addressing the comfort and complexity issues of the original sensor, (ii) a complete description of new external interface systems used to obtain long-duration recordings from the TERC sensor and (iii) more extensive experimental results and analysis for the single-mode TERC sensor including spectrograms of speech containing both voiced and unvoiced speech segments in quiet and acoustically noisy environments. The experimental results demonstrate that the single-mode TERC sensor is able to detect glottal activity up to the fourth harmonic and is also insensitive to acoustic background noise.

  20. Quantifying the Effect of Compression Hearing Aid Release Time on Speech Acoustics and Intelligibility

    ERIC Educational Resources Information Center

    Jenstad, Lorienne M.; Souza, Pamela E.

    2005-01-01

    Compression hearing aids have the inherent, and often adjustable, feature of release time from compression. Research to date does not provide a consensus on how to choose or set release time. The current study had 2 purposes: (a) a comprehensive evaluation of the acoustic effects of release time for a single-channel compression system in quiet and…

  1. Depression Diagnoses and Fundamental Frequency-Based Acoustic Cues in Maternal Infant-Directed Speech

    ERIC Educational Resources Information Center

    Porritt, Laura L.; Zinser, Michael C.; Bachorowski, Jo-Anne; Kaplan, Peter S.

    2014-01-01

    F0-based acoustic measures were extracted from a brief, sentence-final target word spoken during structured play interactions between mothers and their 3- to 14-month-old infants and were analyzed based on demographic variables and DSM-IV Axis-I clinical diagnoses and their common modifiers. F0 range (ΔF0) was…

  2. Linguistic Emphasis in Maternal Speech to Preschool Language Learners with Language Impairments: An Acoustical Perspective.

    ERIC Educational Resources Information Center

    Scheffel, Debora L.; Ingrisano, Dennis R-S

    2000-01-01

    The nature of linguistic emphasis was studied from audio recordings of 29 mother-child dyads. Nineteen dyads involved mothers interacting with their 4-year-olds who evidenced language impairments. Approximately 84 percent of children evidencing language impairments could be so classified based on the acoustic variables associated with maternal use…

  3. Exploration of Acoustic Features for Automatic Vowel Discrimination in Spontaneous Speech

    ERIC Educational Resources Information Center

    Tyson, Na'im R.

    2012-01-01

    In an attempt to understand what acoustic/auditory feature sets motivated transcribers towards certain labeling decisions, I built machine learning models that were capable of discriminating between canonical and non-canonical vowels excised from the Buckeye Corpus. Specifically, I wanted to model when the dictionary form and the transcribed-form…

  4. Intelligibility of Telephone Speech for the Hearing Impaired When Various Microphones Are Used for Acoustic Coupling.

    ERIC Educational Resources Information Center

    Janota, Claus P.; Janota, Jeanette Olach

    1991-01-01

    Various candidate microphones were evaluated for acoustic coupling of hearing aids to a telephone receiver. Results from testing by 9 hearing-impaired adults found comparable listening performance with a pressure gradient microphone at a 10 decibel higher level of interfering noise than with a normal pressure-sensitive microphone. (Author/PB)

  5. Copula filtration of spoken language signals on the background of acoustic noise

    NASA Astrophysics Data System (ADS)

    Kolchenko, Lilia V.; Sinitsyn, Rustem B.

    2010-09-01

    This paper is devoted to the filtering of spoken-language signals against a background of acoustic noise. Signal filtering is performed with the help of a nonlinear analogue of the correlation function, the copula. The copula is estimated using kernel estimates of the cumulative distribution function. At the second stage, we suggest a new procedure of adaptive filtering. Silence and sound intervals are detected before filtering with the help of a nonparametric algorithm. The results are confirmed by experimental processing of spoken-language signals.
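
    Only the first stage of the method (mapping the samples through estimated cumulative distribution functions before forming the copula) is sketched below; empirical ranks stand in for the kernel CDF estimate mentioned in the record.

    ```python
    import numpy as np

    def copula_transform(x: np.ndarray, y: np.ndarray):
        """Map two signal segments onto the unit square via their empirical CDFs,
        a rank-based stand-in for kernel CDF estimation."""
        def ecdf_values(v):
            ranks = np.argsort(np.argsort(v))
            return (ranks + 1.0) / (len(v) + 1.0)
        return ecdf_values(x), ecdf_values(y)
    ```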

  6. Speech, Speech!

    ERIC Educational Resources Information Center

    McComb, Gordon

    1982-01-01

    Discussion focuses on the nature of computer-generated speech and voice synthesis today. State-of-the-art devices for home computers are called text-to-speech (TTS) systems. Details about the operation and use of TTS synthesizers are provided, and the time saving in programing over previous methods is emphasized. (MP)

  7. [Shape acoustical recognition and characteristics of sonar signals by the dolphin T. truncatus].

    PubMed

    Dziedzic, A; Alcuri, G

    1977-10-17

    During the shape acoustical recognition process, signal processing reveals two phases in the T. truncatus sonar emission. In the course of the first phase, the wide-band signals are invariant; during the second phase, near the end of the approach, their temporal and spectral characteristics change along with the shape of the objects to be identified.

  8. Characterizing, synthesizing, and/or canceling out acoustic signals from sound sources

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    2007-03-13

    A system for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate and animate sound sources. Electromagnetic sensors monitor excitation sources in sound producing systems, such as animate sound sources such as the human voice, or from machines, musical instruments, and various other structures. Acoustical output from these sound producing systems is also monitored. From such information, a transfer function characterizing the sound producing system is generated. From the transfer function, acoustical output from the sound producing system may be synthesized or canceled. The systems disclosed enable accurate calculation of transfer functions relating specific excitations to specific acoustical outputs. Knowledge of such signals and functions can be used to effect various sound replication, sound source identification, and sound cancellation applications.

  9. Real-time GMAW quality classification using an artificial neural network with airborne acoustic signals as inputs

    SciTech Connect

    Matteson, A.; Morris, R.; Tate, R.

    1993-12-31

    The acoustic signal produced by the gas metal arc welding (GMAW) arc contains information about the behavior of the arc column, the molten pool, and droplet transfer. It is possible to detect some defect-producing conditions from the acoustic signal of the GMAW arc. An intelligent sensor, called the Weld Acoustic Monitor (WAM), has been developed to take advantage of this acoustic information in order to provide real-time quality assessment information for process control. The WAM makes use of an Artificial Neural Network (ANN) to classify the characteristic arc acoustic signals of acceptable and unacceptable welds. The ANN used in the Weld Acoustic Monitor developed its own set of rules for this classification problem by learning from a database of known GMAW acoustic signals.

  10. Sources and Radiation Patterns of Volcano-Acoustic Signals Investigated with Field-Scale Chemical Explosions

    NASA Astrophysics Data System (ADS)

    Bowman, D. C.; Lees, J. M.; Taddeucci, J.; Graettinger, A. H.; Sonder, I.; Valentine, G.

    2014-12-01

    We investigate the processes that give rise to complex acoustic signals during volcanic blasts by monitoring buried chemical explosions with infrasound and audio range microphones, strong motion sensors, and high speed imagery. Acoustic waveforms vary with scaled depth of burial (SDOB, units in meters per cube root of joules), ranging from high amplitude, impulsive, gas expansion dominated signals at low SDOB to low amplitude, longer duration, ground motion dominated signals at high SDOB. Typically, the sudden upward acceleration of the substrate above the blast produces the first acoustic arrival, followed by a second pulse due to the eruption of pressurized gas at the surface. Occasionally, a third overpressure occurs when displaced material decelerates upon impact with the ground. The transition between ground-motion-dominated and gas-release-dominated acoustics occurs between SDOB values of about 0.0038 and 0.0018, respectively. For example, one explosion registering an SDOB of 0.0031 produced two overpressure pulses of approximately equal amplitude, one due to ground motion, the other to gas release. Recorded volcano infrasound has also identified distinct ground motion and gas release components during explosions at Sakurajima, Santiaguito, and Karymsky volcanoes. Our results indicate that infrasound records may provide a proxy for the depth and energy of these explosions. Furthermore, while magma fragmentation models indicate the possibility of several explosions during a single vulcanian eruption (Alidibirov, Bull Volc., 1994), our results suggest that a single explosion can also produce complex acoustic signals. Thus acoustic records alone cannot be used to distinguish between single explosions and multiple closely-spaced blasts at volcanoes. Results from a series of lateral blasts during the 2014 field experiment further indicate whether vent geometry can produce directional acoustic radiation patterns like those observed at Tungurahua volcano (Kim et al., GJI, 2012). Beside

  11. The evolutionary origins of ritualized acoustic signals in caterpillars.

    PubMed

    Scott, Jaclyn L; Kawahara, Akito Y; Skevington, Jeffrey H; Yen, Shen-Horn; Sami, Abeer; Smith, Myron L; Yack, Jayne E

    2010-01-01

    Animal communication signals can be highly elaborate, and researchers have long sought explanations for their evolutionary origins. For example, how did signals such as the tail-fan display of a peacock, a firefly flash or a wolf howl evolve? Animal communication theory holds that many signals evolved from non-signalling behaviours through the process of ritualization. Empirical evidence for ritualization is limited, as it is necessary to examine living relatives with varying degrees of signal evolution within a phylogenetic framework. We examine the origins of vibratory territorial signals in caterpillars using comparative and molecular phylogenetic methods. We show that a highly ritualized vibratory signal--anal scraping--originated from a locomotory behaviour--walking. Furthermore, comparative behavioural analysis supports the hypothesis that ritualized vibratory signals derive from physical fighting behaviours. Thus, contestants signal their opponents to avoid the cost of fighting. Our study provides experimental evidence for the origins of a complex communication signal, through the process of ritualization.

  12. Speech signal filtration using double-density dual-tree complex wavelet transform

    NASA Astrophysics Data System (ADS)

    Yasin, A. S.; Pavlova, O. N.; Pavlov, A. N.

    2016-08-01

    We consider the task of improving the quality of speech-signal denoising in the presence of additive noise by means of the double-density dual-tree complex wavelet transform (DDCWT), as compared with the standard method of wavelet filtering based on a multiscale analysis using the discrete wavelet transform (DWT) with real basis functions such as Daubechies wavelets. It is shown that the use of DDCWT instead of DWT provides a significant increase in the mean opinion score (MOS) rating at high additive noise levels and makes it possible to reduce the number of decomposition levels needed for the subsequent correction of wavelet coefficients.
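
    The baseline against which DDCWT is compared, standard DWT wavelet shrinkage, can be sketched with PyWavelets as follows; the wavelet choice, decomposition level, and universal soft threshold are typical defaults, not the settings from the paper.

    ```python
    import numpy as np
    import pywt

    def dwt_denoise(signal: np.ndarray, wavelet: str = "db8", level: int = 4) -> np.ndarray:
        """Multiscale DWT denoising: soft-threshold the detail coefficients with a
        universal threshold estimated from the finest-scale coefficients."""
        coeffs = pywt.wavedec(signal, wavelet, level=level)
        sigma = np.median(np.abs(coeffs[-1])) / 0.6745
        threshold = sigma * np.sqrt(2.0 * np.log(len(signal)))
        coeffs[1:] = [pywt.threshold(c, threshold, mode="soft") for c in coeffs[1:]]
        return pywt.waverec(coeffs, wavelet)[: len(signal)]
    ```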

  13. Applications for Subvocal Speech

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles; Betts, Bradley

    2007-01-01

    A research and development effort now underway is directed toward the use of subvocal speech for communication in settings in which (1) acoustic noise could interfere excessively with ordinary vocal communication and/or (2) acoustic silence or secrecy of communication is required. By "subvocal speech" is meant sub-audible electromyographic (EMG) signals, associated with speech, that are acquired from the surface of the larynx and lingual areas of the throat. Topics addressed in this effort include recognition of the subvocal EMG signals that represent specific original words or phrases; transformation (including encoding and/or enciphering) of the signals into forms that are less vulnerable to distortion, degradation, and/or interception; and reconstruction of the original words or phrases at the receiving end of a communication link. Potential applications include ordinary verbal communications among hazardous-material-cleanup workers in protective suits, workers in noisy environments, divers, and firefighters, and secret communications among law-enforcement officers and military personnel in combat and other confrontational situations.

  14. Pulse analysis of acoustic emission signals. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Houghton, J. R.

    1976-01-01

    A method for the signature analysis of pulses in the frequency domain and the time domain is presented. The Fourier spectrum, Fourier transfer function, shock spectrum, and shock spectrum ratio are examined in the frequency domain analysis, and pulse shape deconvolution is developed for use in the time domain analysis. To demonstrate the relative sensitivity of each of the methods to small changes in the pulse shape, signatures of computer modeled systems with analytical pulses are presented. Optimization techniques are developed and used to indicate the best design parameter values for deconvolution of the pulse shape. Several experiments are presented that test the pulse signature analysis methods on different acoustic emission sources. These include acoustic emissions associated with: (1) crack propagation, (2) ball dropping on a plate, (3) spark discharge and (4) defective and good ball bearings.

  15. Surface Roughness Evaluation Based on Acoustic Emission Signals in Robot Assisted Polishing

    PubMed Central

    de Agustina, Beatriz; Marín, Marta María; Teti, Roberto; Rubio, Eva María

    2014-01-01

    The polishing process is the most common technology used in applications where a high level of surface quality is demanded. The automation of polishing processes is especially difficult due to the high level of skill and dexterity that is required. Much of this difficulty arises because of the lack of reliable data on the effect of the polishing parameters on the resulting surface roughness. An experimental study was developed to evaluate the surface roughness obtained during Robot Assisted Polishing processes by the analysis of acoustic emission signals in the frequency domain. The aim is to identify a trend in one or more features calculated from the acoustic emission signals detected along the process. Such an evaluation was made with the objective of collecting valuable information for establishing end-point detection for the polishing process. As a main conclusion, it can be affirmed that acoustic emission (AE) signals can be considered useful for monitoring the state of the polishing process. PMID:25405509
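
    A minimal sketch of the kind of frequency-domain features that could be tracked against surface roughness; the specific feature choices below are illustrative, not those reported in the record.

    ```python
    import numpy as np

    def ae_spectral_features(ae_burst: np.ndarray, fs: float) -> dict:
        """Simple frequency-domain descriptors of a windowed AE burst."""
        windowed = ae_burst * np.hanning(len(ae_burst))
        power = np.abs(np.fft.rfft(windowed)) ** 2
        freqs = np.fft.rfftfreq(len(ae_burst), 1.0 / fs)
        return {
            "peak_frequency_hz": float(freqs[np.argmax(power)]),
            "spectral_centroid_hz": float(np.sum(freqs * power) / np.sum(power)),
            "rms": float(np.sqrt(np.mean(ae_burst ** 2))),
        }
    ```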

  16. Surface roughness evaluation based on acoustic emission signals in robot assisted polishing.

    PubMed

    de Agustina, Beatriz; Marín, Marta María; Teti, Roberto; Rubio, Eva María

    2014-11-14

    The polishing process is the most common technology used in applications where a high level of surface quality is demanded. The automation of polishing processes is especially difficult due to the high level of skill and dexterity that is required. Much of this difficulty arises because of the lack of reliable data on the effect of the polishing parameters on the resulting surface roughness. An experimental study was developed to evaluate the surface roughness obtained during Robot Assisted Polishing processes by the analysis of acoustic emission signals in the frequency domain. The aim is to identify a trend in one or more features calculated from the acoustic emission signals detected along the process. Such an evaluation was made with the objective of collecting valuable information for establishing end-point detection for the polishing process. As a main conclusion, it can be affirmed that acoustic emission (AE) signals can be considered useful for monitoring the state of the polishing process.

  17. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    ERIC Educational Resources Information Center

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  18. The effect of habitat acoustics on common marmoset vocal signal transmission.

    PubMed

    Morrill, Ryan J; Thomas, A Wren; Schiel, Nicola; Souto, Antonio; Miller, Cory T

    2013-09-01

    Noisy acoustic environments present several challenges for the evolution of acoustic communication systems. Among the most significant is the need to limit degradation of spectro-temporal signal structure in order to maintain communicative efficacy. This can be achieved by selecting for several potentially complementary processes. Selection can act on behavioral mechanisms permitting signalers to control the timing and occurrence of signal production to avoid acoustic interference. Likewise, the signal itself may be the target of selection, biasing the evolution of its structure to comprise acoustic features that avoid interference from ambient noise or degrade minimally in the habitat. Here, we address the latter topic for common marmoset (Callithrix jacchus) long-distance contact vocalizations, known as phee calls. Our aim was to test whether this vocalization is specifically adapted for transmission in a species-typical forest habitat, the Atlantic forests of northeastern Brazil. We combined seasonal analyses of ambient habitat acoustics with experiments in which pure tones, clicks, and vocalizations were broadcast and rerecorded at different distances to characterize signal degradation in the habitat. Ambient sound was analyzed from intervals throughout the day and over rainy and dry seasons, showing temporal regularities across varied timescales. Broadcast experiment results indicated that the tone and click stimuli showed the typically inverse relationship between frequency and signaling efficacy. Although marmoset phee calls degraded over distance with marked predictability compared with artificial sounds, they did not otherwise appear to be specially designed for increased transmission efficacy or minimal interference in this habitat. We discuss these data in the context of other similar studies and evidence of potential behavioral mechanisms for avoiding acoustic interference in order to maintain effective vocal communication in common marmosets.

  19. The Effect of Habitat Acoustics on Common Marmoset Vocal Signal Transmission

    PubMed Central

    MORRILL, RYAN J.; THOMAS, A. WREN; SCHIEL, NICOLA; SOUTO, ANTONIO; MILLER, CORY T.

    2013-01-01

    Noisy acoustic environments present several challenges for the evolution of acoustic communication systems. Among the most significant is the need to limit degradation of spectro-temporal signal structure in order to maintain communicative efficacy. This can be achieved by selecting for several potentially complementary processes. Selection can act on behavioral mechanisms permitting signalers to control the timing and occurrence of signal production to avoid acoustic interference. Likewise, the signal itself may be the target of selection, biasing the evolution of its structure to comprise acoustic features that avoid interference from ambient noise or degrade minimally in the habitat. Here, we address the latter topic for common marmoset (Callithrix jacchus) long-distance contact vocalizations, known as phee calls. Our aim was to test whether this vocalization is specifically adapted for transmission in a species-typical forest habitat, the Atlantic forests of northeastern Brazil. We combined seasonal analyses of ambient habitat acoustics with experiments in which pure tones, clicks, and vocalizations were broadcast and rerecorded at different distances to characterize signal degradation in the habitat. Ambient sound was analyzed from intervals throughout the day and over rainy and dry seasons, showing temporal regularities across varied timescales. Broadcast experiment results indicated that the tone and click stimuli showed the typically inverse relationship between frequency and signaling efficacy. Although marmoset phee calls degraded over distance with marked predictability compared with artificial sounds, they did not otherwise appear to be specially designed for increased transmission efficacy or minimal interference in this habitat. We discuss these data in the context of other similar studies and evidence of potential behavioral mechanisms for avoiding acoustic interference in order to maintain effective vocal communication in common marmosets. PMID

  20. System and method for investigating sub-surface features of a rock formation with acoustic sources generating conical broadcast signals

    SciTech Connect

    Vu, Cung Khac; Skelt, Christopher; Nihei, Kurt; Johnson, Paul A.; Guyer, Robert; Ten Cate, James A.; Le Bas, Pierre -Yves; Larmat, Carene S.

    2015-08-18

    A method of interrogating a formation includes generating a first conical acoustic signal at a first frequency and a second conical acoustic signal at a second frequency, each between approximately 500 Hz and 500 kHz, such that the signals intersect in a desired intersection volume outside the borehole. The method further includes receiving a difference signal returning to the borehole that results from a non-linear mixing of the signals in a mixing zone within the intersection volume.

  1. Improving the speech intelligibility in classrooms

    NASA Astrophysics Data System (ADS)

    Lam, Choi Ling Coriolanus

    of the reverberation time, the indoor ambient noise (or background noise level), the signal-to-noise ratio, and the speech transmission index, it aims to establish a guideline for improving the speech intelligibility in classrooms for any country and any environmental conditions. The study showed that the acoustical conditions of most of the measured classrooms in Hong Kong are unsatisfactory. The selection of materials inside a classroom is important for improving speech intelligibility at the design stage, especially the acoustic ceiling, in order to shorten the reverberation time inside the classroom. The signal-to-noise ratio should be higher than 11 dB(A) to achieve over 70% speech perception, for either tonal or non-tonal languages, without the use of an address system. These unexpected results call for revising the standard design and devising acceptable standards for classrooms in Hong Kong. A method is also demonstrated for assessing classrooms in other cities with similar environmental conditions.

  2. Frequency Characteristics of Acoustic Emission Signals from Cementitious Waste-forms with Encapsulated Al

    SciTech Connect

    Spasova, Lyubka M.; Ojovan, Michael I.

    2007-07-01

    Acoustic emission (AE) signals were continuously recorded and their intrinsic frequency characteristics examined in order to evaluate the mechanical performance of cementitious wasteform samples with encapsulated Al waste. The primary frequency in the power spectrum of the detected acoustic waves, and its range of intensity, were potentially related to the appearance of different micro-mechanical events caused by Al corrosion within the encapsulating cement system. In addition, the process of cement matrix hardening was shown to be a source of AE signals characterized by a substantially higher primary frequency (above 2 MHz) than those due to Al corrosion (below 40 kHz) and cement cracking (above 100 kHz). (authors)
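
    The frequency bands quoted above (corrosion below 40 kHz, cracking above 100 kHz, hardening above 2 MHz) suggest a simple peak-picking analysis of each AE burst's power spectrum. The Python sketch below is illustrative only and is not the authors' processing chain; the band thresholds are taken from the abstract, while the windowing and classification logic are assumptions.

```python
import numpy as np

def primary_frequency(signal, fs):
    """Return the dominant (primary) frequency of an AE burst, in Hz.

    Illustrative only: computes the power spectrum with an FFT and
    picks the bin with the largest power (DC bin excluded).
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal * np.hanning(len(signal)))
    power = np.abs(spectrum) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    peak_bin = np.argmax(power[1:]) + 1      # skip the DC component
    return freqs[peak_bin]

# Example: a 150 kHz burst sampled at 5 MHz falls in the "cracking-like" band
fs = 5e6
t = np.arange(0, 1e-3, 1.0 / fs)
burst = np.exp(-t * 2e4) * np.sin(2 * np.pi * 150e3 * t)
f0 = primary_frequency(burst, fs)
label = "hardening" if f0 > 2e6 else "cracking" if f0 > 100e3 else "corrosion"
print(f"primary frequency ~ {f0/1e3:.1f} kHz -> {label}")
```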

  3. Beeping and piping: characterization of two mechano-acoustic signals used by honey bees in swarming.

    PubMed

    Schlegel, Thomas; Visscher, P Kirk; Seeley, Thomas D

    2012-12-01

    Of the many signals used by honey bees during the process of swarming, two of them--the stop signal and the worker piping signal--are not easily distinguished for both are mechano-acoustic signals produced by scout bees who press their bodies against other bees while vibrating their wing muscles. To clarify the acoustic differences between these two signals, we recorded both signals from the same swarm and at the same time, and compared them in terms of signal duration, fundamental frequency, and frequency modulation. Stop signals and worker piping signals differ in all three variables: duration, 174 ± 64 vs. 602 ± 377 ms; fundamental frequency, 407 vs. 451 Hz; and frequency modulation, absent vs. present. While it remains unclear which differences the bees use to distinguish the two signals, it is clear that they do so for the signals have opposite effects. Stop signals cause inhibition of actively dancing scout bees whereas piping signals cause excitation of quietly resting non-scout bees. PMID:23149930

  4. Search for acoustic signals from high energy cascades

    NASA Technical Reports Server (NTRS)

    Bell, R.; Bowen, T.

    1985-01-01

    High energy cosmic ray secondaries can be detected by means of the cascades they produce when they pass through matter. When the charged particles of these cascades ionize the matter they are traveling through, the heat produced and resulting thermal expansion causes a thermoacoustic wave. These sound waves travel at about one hundred-thousandth the speed of light, and should allow an array of acoustic transducers to resolve structure in the cascade to about 1 cm without high speed electronics or segmentation of the detector.

  5. Search for acoustic signals from high energy cascades

    NASA Astrophysics Data System (ADS)

    Bell, R.; Bowen, T.

    1985-08-01

    High energy cosmic ray secondaries can be detected by means of the cascades they produce when they pass through matter. When the charged particles of these cascades ionize the matter they are traveling through, the heat produced and resulting thermal expansion causes a thermoacoustic wave. These sound waves travel at about one hundred-thousandth the speed of light, and should allow an array of acoustic transducers to resolve structure in the cascade to about 1 cm without high speed electronics or segmentation of the detector.

  6. Research on power-law acoustic transient signal detection based on wavelet transform

    NASA Astrophysics Data System (ADS)

    Han, Jian-hui; Yang, Ri-jie; Wang, Wei

    2007-11-01

    To address the characteristics of the acoustic transient signals emitted when anti-submarine weapons (torpedoes, aerial sonobuoys, rocket-assisted depth charges, etc.) are dropped into water, namely short duration, low SNR, abruptness, and instability, a new detection method is proposed that builds on the traditional power-law detector. First, a wavelet transform is used to de-noise the signal, removing random spectral components and improving the SNR; a power-law detector is then applied to detect the transient. Simulation results show that the method effectively extracts the envelope characteristics of transient signals at low SNR, and that the WT-power-law detector markedly outperforms the traditional power-law detector.
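
    A minimal sketch of the WT-power-law idea described above: wavelet soft-threshold denoising followed by a Nuttall-style power-law statistic on the FFT magnitudes. This is not the authors' implementation; the choice of wavelet ('db4'), decomposition level, exponent nu, and the universal threshold are all assumptions.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_denoise(x, wavelet="db4", level=4):
    """Soft-threshold wavelet denoising with the universal threshold."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745          # noise estimate from finest scale
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(x)]

def power_law_statistic(x, nu=2.5):
    """Power-law detection statistic: sum_k |X_k|^(2*nu) over FFT bins."""
    mag2 = np.abs(np.fft.rfft(x)) ** 2
    mag2 /= mag2.mean()                                      # normalize to unit mean power
    return np.sum(mag2 ** nu)

def detect_transient(x, threshold):
    """De-noise first, then threshold the power-law statistic.

    The threshold would normally be set from the noise-only distribution.
    """
    return power_law_statistic(wavelet_denoise(x)) > threshold
```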

  7. Evaluating the role of spectral and envelope characteristics in the intelligibility advantage of clear speech

    PubMed Central

    Krause, Jean C.; Braida, Louis D.

    2009-01-01

    In adverse listening conditions, talkers can increase their intelligibility by speaking clearly [Picheny, M.A., et al. (1985). J. Speech Hear. Res. 28, 96–103; Payton, K. L., et al. (1994). J. Acoust. Soc. Am. 95, 1581–1592]. This modified speaking style, known as clear speech, is typically spoken more slowly than conversational speech [Picheny, M. A., et al. (1986). J. Speech Hear. Res. 29, 434–446; Uchanski, R. M., et al. (1996). J. Speech Hear. Res. 39, 494–509]. However, talkers can produce clear speech at normal rates (clear∕normal speech) with training [Krause, J. C., and Braida, L. D. (2002). J. Acoust. Soc. Am. 112, 2165–2172] suggesting that clear speech has some inherent acoustic properties, independent of rate, that contribute to its improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. Two global-level properties of clear∕normal speech that appear likely to be associated with improved intelligibility are increased energy in the 1000–3000-Hz range of long-term spectra and increased modulation depth of low-frequency modulations of the intensity envelope [Krause, J. C., and Braida, L. D. (2004). J. Acoust. Soc. Am. 115, 362–378]. In an attempt to isolate the contributions of these two properties to intelligibility, signal processing transformations were developed to manipulate each of these aspects of conversational speech independently. Results of intelligibility testing with hearing-impaired listeners and normal-hearing listeners in noise suggest that (1) increasing energy between 1000 and 3000 Hz does not fully account for the intelligibility benefit of clear∕normal speech, and (2) simple filtering of the intensity envelope is generally detrimental to intelligibility. While other manipulations of the intensity envelope are required to determine conclusively the role of this factor in intelligibility, it is also likely that additional properties important for

  8. [Quantification and improvement of speech transmission performance using headphones in acoustic stimulated functional magnetic resonance imaging].

    PubMed

    Yamamura, Ken ichiro; Takatsu, Yasuo; Miyati, Tosiaki; Kimura, Tetsuya

    2014-10-01

    Functional magnetic resonance imaging (fMRI) has made a major contribution to the understanding of higher brain function, but fMRI with auditory stimulation, used in planning brain tumor surgery, is often inaccurate because the sounds used in the task may not be correctly transmitted to the subject owing to acoustic noise. This prompted us to devise a method of quantifying sound transmission ability from the accuracy rate for 67 syllables, classified into three types. We evaluated this with and without acoustic noise during imaging. We also improved the structure of the headphones and compared their sound transmission ability with that of the conventional headphones attached to an MRI device (a GE Signa HDxt 3.0 T). The 95 percent upper confidence limit (UCL) was used as the threshold for hearing accuracy for both headphone models. There was a statistically significant difference between the conventional model and the improved model during imaging (p < 0.01), with the accuracy rate of the improved model being 16 percent higher. Twenty-nine syllables were accurate at the 95% UCL with the improved model, compared with 22 with the conventional model. This study showed the evaluation system to be useful for correctly identifying syllables during fMRI.

  9. Effects of speech style, room acoustics, and vocal fatigue on vocal effort.

    PubMed

    Bottalico, Pasquale; Graetzer, Simone; Hunter, Eric J

    2016-05-01

    Vocal effort is a physiological measure that accounts for changes in voice production as vocal loading increases. It has been quantified in terms of sound pressure level (SPL). This study investigates how vocal effort is affected by speaking style, room acoustics, and short-term vocal fatigue. Twenty subjects were recorded while reading a text at normal and loud volumes in anechoic, semi-reverberant, and reverberant rooms in the presence of classroom babble noise. The acoustics in each environment were modified by creating a strong first reflection in the talker position. After each task, the subjects answered questions addressing their perception of the vocal effort, comfort, control, and clarity of their own voice. Variation in SPL for each subject was measured per task. It was found that SPL and self-reported effort increased in the loud style and decreased when the reflective panels were present and when reverberation time increased. Self-reported comfort and control decreased in the loud style, while self-reported clarity increased when panels were present. The lowest magnitude of vocal fatigue was experienced in the semi-reverberant room. The results indicate that early reflections may be used to reduce vocal effort without modifying reverberation time.

  10. Beeping and piping: characterization of two mechano-acoustic signals used by honey bees in swarming

    NASA Astrophysics Data System (ADS)

    Schlegel, Thomas; Visscher, P. Kirk; Seeley, Thomas D.

    2012-12-01

    Of the many signals used by honey bees during the process of swarming, two of them—the stop signal and the worker piping signal—are not easily distinguished for both are mechano-acoustic signals produced by scout bees who press their bodies against other bees while vibrating their wing muscles. To clarify the acoustic differences between these two signals, we recorded both signals from the same swarm and at the same time, and compared them in terms of signal duration, fundamental frequency, and frequency modulation. Stop signals and worker piping signals differ in all three variables: duration, 174 ± 64 vs. 602 ± 377 ms; fundamental frequency, 407 vs. 451 Hz; and frequency modulation, absent vs. present. While it remains unclear which differences the bees use to distinguish the two signals, it is clear that they do so for the signals have opposite effects. Stop signals cause inhibition of actively dancing scout bees whereas piping signals cause excitation of quietly resting non-scout bees.

  11. Acoustic evidence for the development of gestural coordination in the speech of 2-year-olds: a longitudinal study.

    PubMed

    Goodell, E W; Studdert-Kennedy, M

    1993-08-01

    Studies of child phonology have often assumed that young children first master a repertoire of phonemes and then build their lexicon by forming combinations of these abstract, contrastive units. However, evidence from children's systematic errors suggests that children first build a repertoire of words as integral sequences of gestures and then gradually differentiate these sequences into their gestural and segmental components. Recently, experimental support for this position has been found in the acoustic records of the speech of 3-, 5-, and 7-year-old children, suggesting that even in older children some phonemes have not yet fully segregated as units of gestural organization and control. The present longitudinal study extends this work to younger children (22- and 32-month-olds). Results demonstrate clear differences in the duration and coordination of gestures between children and adults, and a clear shift toward the patterns of adult speakers during roughly the third year of life. Details of the child-adult differences and developmental changes vary from one aspect of an utterance to another. PMID:8377484

  12. Speech communications in noise

    NASA Technical Reports Server (NTRS)

    1984-01-01

    The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.

  13. Speech recognition and understanding

    SciTech Connect

    Vintsyuk, T.K.

    1983-05-01

    This article discusses the automatic processing of speech signals with the aim of finding a sequence of words (speech recognition) or a concept (speech understanding) being transmitted by the speech signal. The goal of the research is to develop an automatic typewriter that will automatically edit and type text under voice control. A dynamic programming method is proposed in which all possible class signals are stored, after which the presented signal is compared to all the stored signals during the recognition phase. Topics considered include element-by-element recognition of words of speech, learning speech recognition, phoneme-by-phoneme speech recognition, the recognition of connected speech, understanding connected speech, and prospects for designing speech recognition and understanding systems. An application of the composition dynamic programming method to the solution of basic problems in the recognition and understanding of speech is presented.
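
    The stored-template comparison described above is essentially dynamic time warping (DTW). A minimal sketch follows, assuming feature sequences (e.g., per-frame spectra or MFCCs) have already been extracted; it is not the composition dynamic programming method of the article.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two feature sequences.

    a, b: arrays of shape (frames, features), e.g. MFCC frames.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(test_features, templates):
    """Return the label of the stored template closest to the test utterance."""
    return min(templates, key=lambda label: dtw_distance(test_features, templates[label]))
```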

  14. Subjective evaluation of speech and noise in learning environments in the realm of classroom acoustics: Results from laboratory and field experiments

    NASA Astrophysics Data System (ADS)

    Meis, Markus; Nocke, Christian; Hofmann, Simone; Becker, Bernhard

    2005-04-01

    The impact of different acoustical conditions in learning environments on noise annoyance and the evaluation of speech quality was tested in a series of three experiments. In Experiment 1 (n=79), the auralization of seven classrooms with reverberation times from 0.55 to 3.21 s (averaged between 250 Hz and 2 kHz) served to develop a Semantic Differential for evaluating a simulated teacher's voice. Four factors were found: acoustical comfort, roughness, sharpness, and loudness. In Experiment 2, the effects of two classroom renovations were examined from a holistic perspective. The rooms were treated acoustically with acoustic ceilings (RT=0.5 s [250 Hz-2 kHz]) and sound-muffling floor materials, as well as non-acoustically with a new lighting system and color design. The results indicate that pupils (n=61) in renovated classrooms judged the simulated voice more positively, were less annoyed by the noise in the classrooms, and were more motivated to participate in the lessons. In Experiment 3, the sound environments of six different lecture rooms (RT=0.8 to 1.39 s [250 Hz-2 kHz]) at two universities in Oldenburg were evaluated by 321 students during lectures. The evidence supports the assumption that acoustical comfort is frequency dependent in rooms with higher reverberation times.

  15. The Role of the Listener's State in Speech Perception

    ERIC Educational Resources Information Center

    Viswanathan, Navin

    2009-01-01

    Accounts of speech perception disagree on whether listeners perceive the acoustic signal (Diehl, Lotto, & Holt, 2004) or the vocal tract gestures that produce the signal (e.g., Fowler, 1986). In this dissertation, I outline a research program using a phenomenon called "perceptual compensation for coarticulation" (Mann, 1980) to examine this…

  16. Speech intelligibility and recall of first and second language words heard at different signal-to-noise ratios.

    PubMed

    Hygge, Staffan; Kjellberg, Anders; Nöstl, Anatole

    2015-01-01

    Free recall of spoken words in Swedish (the native tongue) and English was assessed in two signal-to-noise ratio (SNR) conditions (+3 and +12 dB), with and without half of the heard words being repeated back orally directly after presentation [shadowing, speech intelligibility (SI)]. A total of 24 word lists with 12 words each were presented in English and in Swedish to Swedish-speaking college students. Pre-experimental measures of working memory capacity (operation span, OSPAN) were taken. A basic hypothesis was that recall of the words would be impaired when encoding the words required more processing resources, thereby depleting working memory resources. This would be the case when the SNR was low or when the language was English. A low SNR was also expected to impair SI, but we wanted to compare the sizes of the SNR effects on SI and recall. A low working memory capacity score was expected to further add to the negative effects of SNR and language on both SI and recall. The results indicated that SNR had strong effects on both SI and recall, but also that the effect size was larger for recall than for SI. Language had a main effect on recall, but not on SI. The shadowing procedure had different effects on recall of the early and late parts of the word lists. Working memory capacity was unimportant for the effects on SI and recall. Thus, recall appears to be a more sensitive indicator than SI of the acoustics of learning, which has implications for building codes and recommendations concerning classrooms and other workplaces where both hearing and learning are important. PMID:26441765

  17. The Modulation Transfer Function for Speech Intelligibility

    PubMed Central

    Elliott, Taffeta M.; Theunissen, Frédéric E.

    2009-01-01

    We systematically determined which spectrotemporal modulations in speech are necessary for comprehension by human listeners. Speech comprehension has been shown to be robust to spectral and temporal degradations, but the specific relevance of particular degradations is arguable due to the complexity of the joint spectral and temporal information in the speech signal. We applied a novel modulation filtering technique to recorded sentences to restrict acoustic information quantitatively and to obtain a joint spectrotemporal modulation transfer function for speech comprehension, the speech MTF. For American English, the speech MTF showed the criticality of low modulation frequencies in both time and frequency. Comprehension was significantly impaired when temporal modulations <12 Hz or spectral modulations <4 cycles/kHz were removed. More specifically, the MTF was bandpass in temporal modulations and low-pass in spectral modulations: temporal modulations from 1 to 7 Hz and spectral modulations <1 cycles/kHz were the most important. We evaluated the importance of spectrotemporal modulations for vocal gender identification and found a different region of interest: removing spectral modulations between 3 and 7 cycles/kHz significantly increases gender misidentifications of female speakers. The determination of the speech MTF furnishes an additional method for producing speech signals with reduced bandwidth but high intelligibility. Such compression could be used for audio applications such as file compression or noise removal and for clinical applications such as signal processing for cochlear implants. PMID:19266016
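
    A rough sketch of how a joint spectrotemporal modulation spectrum can be computed from a spectrogram via a 2-D Fourier transform. The windowing parameters and normalization here are assumptions, and this is not the authors' modulation filtering procedure.

```python
import numpy as np
from scipy.signal import spectrogram

def modulation_spectrum(x, fs, nperseg=256, noverlap=192):
    """Joint spectrotemporal modulation spectrum of a speech signal.

    Returns temporal modulation frequencies (Hz), spectral modulation
    frequencies (cycles/kHz), and the 2-D modulation power.
    """
    f, t, S = spectrogram(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    logS = np.log(S + 1e-12)                       # log power spectrogram
    M = np.fft.fftshift(np.abs(np.fft.fft2(logS - logS.mean())) ** 2)
    dt = t[1] - t[0]                               # frame step (s)
    df = (f[1] - f[0]) / 1000.0                    # bin spacing (kHz)
    wt = np.fft.fftshift(np.fft.fftfreq(logS.shape[1], d=dt))   # Hz
    wf = np.fft.fftshift(np.fft.fftfreq(logS.shape[0], d=df))   # cycles/kHz
    return wt, wf, M
```

    To mimic the study's key manipulation, one would zero modulation components outside a region of interest (for example, keep temporal modulations below roughly 12 Hz and spectral modulations below roughly 4 cycles/kHz) and invert the transform to resynthesize the filtered speech.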

  18. Development of an Acoustic Signal Analysis Tool “Auto-F” Based on the Temperament Scale

    NASA Astrophysics Data System (ADS)

    Modegi, Toshio

    The MIDI interface was originally designed for electronic musical instruments, but we consider that this music-note-based coding concept can be extended to general acoustic signal description. We proposed applying MIDI technology to the coding of bio-medical auscultation sound signals, such as heart sounds, for retrieving medical records and performing telemedicine. We have since extended our encoding targets to include vocal sounds, natural sounds, and electronic bio-signals such as ECG, using the Generalized Harmonic Analysis method. Currently, we are trying to separate the vocal sounds included in popular songs and to encode the vocal sounds and background instrumental sounds into separate MIDI channels. We are also trying to extract articulation parameters, such as MIDI pitch-bend parameters, in order to reproduce natural acoustic sounds using a GM-standard MIDI tone generator. In this paper, we present the overall algorithm of our acoustic signal analysis tool, based on this research, which can analyze given time-based signals on the musical temperament scale. The prominent feature of this tool is that it produces high-precision MIDI codes, which reproduce signals similar to the given source signal on a GM-standard MIDI tone generator, and that it provides the analysis results as text in XML format.
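
    The temperament-scale coding described above rests on the equal-tempered mapping between frequency and MIDI note number, with pitch bend carrying the residual. A minimal sketch follows; the +/- 2 semitone bend range and the rounding policy are assumptions, not details of the Auto-F tool.

```python
import math

def freq_to_midi(freq_hz, bend_range_semitones=2.0):
    """Map a frequency onto the equal-temperament MIDI scale.

    Returns (note_number, pitch_bend) where pitch_bend is the 14-bit
    MIDI value (0..16383, 8192 = no bend) needed to hit the exact
    frequency, assuming a +/- 2 semitone bend range.
    """
    semitones = 69.0 + 12.0 * math.log2(freq_hz / 440.0)    # A4 = note 69
    note = int(round(semitones))
    deviation = semitones - note                             # fractional semitones
    bend = int(round(8192 + deviation / bend_range_semitones * 8192))
    return note, max(0, min(16383, bend))

# Example: 452 Hz falls between A4 and A#4, so note 69 plus an upward bend
print(freq_to_midi(452.0))
```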

  19. Mode tomography using signals from the Long Range Ocean Acoustic Propagation EXperiment (LOAPEX)

    NASA Astrophysics Data System (ADS)

    Chandrayadula, Tarun K.

    Ocean acoustic tomography uses acoustic signals to infer the environmental properties of the ocean. The procedure consists of low-frequency acoustic transmissions at mid-water depths to receivers located at ranges of hundreds of kilometers. The arrival times of the signal at the receiver are then inverted for the sound speed of the background environment. Using this principle, experiments such as the 2004 Long Range Ocean Acoustic Propagation EXperiment (LOAPEX) have used acoustic signals recorded across Vertical Line Arrays (VLAs) to infer the Sound Speed Profile (SSP) across depth. The acoustic signals across the VLAs can be represented in terms of orthonormal basis functions called modes. The lower modes of the basis set, concentrated around mid-water, propagate longer distances and can be inverted for mesoscale effects such as currents and eddies. In spite of these advantages, mode tomography has received less attention. One important reason is that internal waves in the ocean cause significant amplitude and travel-time fluctuations in the modes, which introduce errors in travel-time estimates. The absence of a statistical model and the lack of signal processing techniques for internal wave effects have precluded the modes from being used in tomographic inversions. This thesis estimates a statistical model for modes affected by internal waves and then uses the estimated model to design appropriate signal processing methods to obtain tomographic observables for the low modes. To estimate the statistical model, this thesis uses both the LOAPEX signals and numerical simulations. The statistical model describes the amplitude and phase coherence across different frequencies for modes at different ranges. The model suggests that Matched Subspace Detectors (MSDs) based on the amplitude statistics of the modes are the optimum detectors for making travel-time estimates for modes up to 250 km. The mean of the
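
    A sketch of the modal projection step implied above: the pressure field measured across a VLA is projected onto normal-mode depth functions to obtain modal amplitudes. The mode shapes are assumed to be supplied (for example, by a normal-mode code for the background SSP), and the simple rectangle-rule inner product ignores density weighting and finite-aperture effects.

```python
import numpy as np

def mode_amplitudes(pressure, modes, dz):
    """Project a VLA pressure snapshot onto normal-mode depth functions.

    pressure : complex array, shape (n_depths,)  -- field sampled at the array
    modes    : array, shape (n_depths, n_modes)  -- mode shapes psi_n(z),
               assumed orthonormal with respect to the depth integral
    dz       : hydrophone spacing in depth (m)

    Returns the complex modal amplitudes a_n, approximating the integral
    of p(z) * psi_n(z) dz by a discrete sum over the array depths.
    """
    return modes.T @ pressure * dz
```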

  20. A user's guide for the signal processing software for image and speech compression developed in the Communications and Signal Processing Laboratory (CSPL), version 1

    NASA Technical Reports Server (NTRS)

    Kumar, P.; Lin, F. Y.; Vaishampayan, V.; Farvardin, N.

    1986-01-01

    A complete documentation of the software developed in the Communication and Signal Processing Laboratory (CSPL) during the period July 1985 to March 1986 is provided. Utility programs and subroutines that were developed for a user-friendly image and speech processing environment are described. Additional programs for data compression of image and speech type signals are included, as are programs for zero-memory and block transform quantization in the presence of channel noise. Finally, several routines for simulating the performance of image compression algorithms are included.

  1. Acoustic Signal Processing for Pipe Condition Assessment (WaterRF Report 4360)

    EPA Science Inventory

    Unique to prestressed concrete cylinder pipe (PCCP), individual wire breaks create an excitation in the pipe wall that may vary in response to the remaining compression of the pipe core. This project was designed to improve acoustic signal processing for pipe condition assessment...

  2. Zipf's Law in Short-Time Timbral Codings of Speech, Music, and Environmental Sound Signals

    PubMed Central

    Haro, Martín; Serrà, Joan; Herrera, Perfecto; Corral, Álvaro

    2012-01-01

    Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis of the intrinsic characteristics of the most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that the speech and music databases have specific, distinctive code-words, while in the case of the environmental sounds these database-specific code-words are not present. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common simple generative mechanism for all considered sound sources. PMID:22479497
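
    A minimal sketch of the rank-frequency analysis described above: count code-word occurrences, sort by frequency, and fit the Zipf exponent as the slope of the log-log curve. The plain least-squares fit is an assumption; the paper's fitting procedure may differ.

```python
import numpy as np
from collections import Counter

def zipf_exponent(codewords):
    """Fit a Zipf (rank-frequency) law to a stream of timbral code-words.

    codewords: iterable of hashable code-words (e.g. quantized short-time
    spectra). Returns (ranks, frequencies, exponent), where the exponent is
    the negative slope of the log-log rank-frequency curve.
    """
    counts = np.array(sorted(Counter(codewords).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(counts) + 1, dtype=float)
    slope, _ = np.polyfit(np.log(ranks), np.log(counts), 1)
    return ranks, counts, -slope

# A Zipfian stream should yield an exponent close to 1.
```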

  3. ICRA noises: artificial noise signals with speech-like spectral and temporal properties for hearing instrument assessment. International Collegium for Rehabilitative Audiology.

    PubMed

    Dreschler, W A; Verschuure, H; Ludvigsen, C; Westermann, S

    2001-01-01

    Current standards involving technical specification of hearing aids provide limited possibilities for assessing the influence of the spectral and temporal characteristics of the input signal, and these characteristics have a significant effect on the output signal of many recent types of hearing aids. This is particularly true of digital hearing instruments, which typically include non-linear amplification in multiple channels. Furthermore, these instruments often incorporate additional non-linear functions such as "noise reduction" and "feedback cancellation". The output signal produced by a non-linear hearing instrument relates to the characteristics of the input signal in a complex manner. Therefore, the choice of input signal significantly influences the outcome of any acoustic or psychophysical assessment of a non-linear hearing instrument. For this reason, the International Collegium for Rehabilitative Audiology (ICRA) has introduced a collection of noise signals that can be used for hearing aid testing (including real-ear measurements) and psychophysical evaluation. This paper describes the design criteria, the realisation process, and the final selection of nine test signals on a CD. Also, the spectral and temporal characteristics of these signals are documented. The ICRA noises provide a well-specified set of speech-like noises with spectra shaped according to gender and vocal effort, and with different amounts of speech modulation simulating one or more speakers. These noises can be applied as well-specified background noise in psychophysical experiments. They can also serve as test signals for the evaluation of digital hearing aids with noise reduction. It is demonstrated that the ICRA noises show the effectiveness of the noise reduction schemes. Based on these initial measurements, some initial steps are proposed to develop a standard method of technical specification of noise reduction based on the modulation characteristics. For this purpose, the

  4. Antifade sonar employs acoustic field diversity to recover signals from multipath fading

    SciTech Connect

    Lubman, D.

    1996-04-01

    Co-located pressure and particle motion (PM) hydrophones together with four-channel diversity combiners may be used to recover signals from multipath fading. Multipath fading is important in both shallow and deep water propagation and can be an important source of signal loss. The acoustic field diversity concept arises from the notion of conservation of signal energy and the observation that in rooms at least, the total acoustic energy density is the sum of potential energy (scalar field-sound pressure) and kinetic energy (vector field-sound PM) portions. One pressure hydrophone determines acoustic potential energy density at a point. In principle, three PM sensors (displacement, velocity, or acceleration) directed along orthogonal axes describe the kinetic energy density at a point. For a single plane wave, the time-averaged potential and kinetic field energies are identical everywhere. In multipath interference, however, potential and kinetic field energies at a point are partitioned unequally, depending mainly on relative signal phases. Thus, when pressure signals are in deep fade, abundant kinetic field signal energy may be available at that location. Performance benefits require a degree of uncorrelated fading between channels. The expectation of nearly uncorrelated fading is motivated from room theory. Performance benefits for sonar limited by independent Rayleigh fading are suggested by analogy to antifade radio. Average SNR can be improved by several decibels, holding time on target is multiplied manifold, and the bit error rate for data communication is reduced substantially. © 1996 American Institute of Physics.

  5. Cortical asymmetries in speech perception: what’s wrong, what’s right, and what’s left?

    PubMed Central

    McGettigan, Carolyn; Scott, Sophie K.

    2014-01-01

    Over the last 30 years hemispheric asymmetries in speech perception have been construed within a domain general framework, where preferential processing of speech is due to left lateralized, non-linguistic acoustic sensitivities. A prominent version of this argument holds that the left temporal lobe selectively processes rapid/temporal information in sound. Acoustically, this is a poor characterization of speech and there has been little empirical support for a left-hemisphere selectivity for these cues. In sharp contrast, the right temporal lobe is demonstrably sensitive to specific acoustic properties. We suggest that acoustic accounts of speech sensitivities need to be informed by the nature of the speech signal, and that a simple domain general/specific dichotomy may be incorrect. PMID:22521208

  6. Information-bearing acoustic change outperforms duration in predicting intelligibility of full-spectrum and noise-vocoded sentences.

    PubMed

    Stilp, Christian E

    2014-03-01

    Recent research has demonstrated a strong relationship between information-bearing acoustic changes in the speech signal and speech intelligibility. The availability of information-bearing acoustic changes reliably predicts intelligibility of full-spectrum [Stilp and Kluender (2010). Proc. Natl. Acad. Sci. U.S.A. 107(27), 12387-12392] and noise-vocoded sentences amid noise interruption [Stilp et al. (2013). J. Acoust. Soc. Am. 133(2), EL136-EL141]. However, other research reports that proportion of signal duration preserved also predicts intelligibility of noise-interrupted speech. These factors have only ever been investigated independently, obscuring whether one better explains speech perception. The present experiments manipulated both factors to answer this question. A broad range of sentence durations (160-480 ms) containing high or low information-bearing acoustic changes were replaced by speech-shaped noise in noise-vocoded (Experiment 1) and full-spectrum sentences (Experiment 2). Sentence intelligibility worsened with increasing noise replacement, but in both experiments, information-bearing acoustic change was a statistically superior predictor of performance. Perception relied more heavily on information-bearing acoustic changes in poorer listening conditions (in spectrally degraded sentences and amid increasing noise replacement). Highly linear relationships between measures of information and performance suggest that exploiting information-bearing acoustic change is a shared principle underlying perception of acoustically rich and degraded speech. Results demonstrate the explanatory power of information-theoretic approaches for speech perception.
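
    A rough proxy, in the spirit of the cochlea-scaled measures cited above, for quantifying information-bearing acoustic change as the spectral distance between successive short-time slices. The 16-ms slice length and Euclidean distance are assumptions; this is not the published metric.

```python
import numpy as np
from scipy.signal import spectrogram

def spectral_change_profile(x, fs, slice_ms=16):
    """Rough proxy for information-bearing acoustic change.

    Computes the Euclidean distance between successive short-time
    magnitude-spectrum slices; larger values indicate more acoustic
    change (and, by hypothesis, more potential information).
    """
    nperseg = int(fs * slice_ms / 1000)
    _, _, S = spectrogram(x, fs=fs, nperseg=nperseg, noverlap=0)
    S = np.sqrt(S)                                   # magnitude-like slices
    return np.linalg.norm(np.diff(S, axis=1), axis=0)
```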

  7. Prediction and constraint in audiovisual speech perception

    PubMed Central

    Peelle, Jonathan E.; Sommers, Mitchell S.

    2015-01-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported

  8. Hidden Markov models in automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Wrzoskowicz, Adam

    1993-11-01

    This article describes a method for constructing an automatic speech recognition system based on hidden Markov models (HMMs). The author discusses the basic concepts of HMM theory and the application of these models to the analysis and recognition of speech signals. The author provides algorithms which make it possible to train the ASR system and recognize signals on the basis of distinct stochastic models of selected speech sound classes. The author describes the specific components of the system and the procedures used to model and recognize speech. The author discusses problems associated with the choice of optimal signal detection and parameterization characteristics and their effect on the performance of the system. The author presents different options for the choice of speech signal segments and their consequences for the ASR process. The author gives special attention to the use of lexical, syntactic, and semantic information for the purpose of improving the quality and efficiency of the system. The author also describes an ASR system developed by the Speech Acoustics Laboratory of the IBPT PAS. The author discusses the results of experiments on the effect of noise on the performance of the ASR system and describes methods of constructing HMM's designed to operate in a noisy environment. The author also describes a language for human-robot communications which was defined as a complex multilevel network from an HMM model of speech sounds geared towards Polish inflections. The author also added mandatory lexical and syntactic rules to the system for its communications vocabulary.
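
    A minimal sketch of Viterbi decoding for a discrete-observation HMM of the kind used to model speech sound classes; it illustrates the recognition step only and is not the IBPT PAS system described above.

```python
import numpy as np

def viterbi(obs, log_pi, log_A, log_B):
    """Most likely HMM state sequence for a discrete observation sequence.

    obs    : list of observation symbol indices
    log_pi : (S,) log initial-state probabilities
    log_A  : (S, S) log transition probabilities, log_A[i, j] = log P(j | i)
    log_B  : (S, V) log emission probabilities
    """
    S, T = len(log_pi), len(obs)
    delta = np.zeros((T, S))
    back = np.zeros((T, S), dtype=int)
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A          # scores[i, j]: from i to j
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(S)] + log_B[:, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):                       # backtrack
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```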

  9. Acoustic and aerodynamic characteristics of Country-Western, Operatic and Broadway singing styles compared to speech

    NASA Astrophysics Data System (ADS)

    Stone, Robert E.; Cleveland, Thomas F.; Sundberg, P. Johan

    2003-04-01

    Acoustic and aerodynamic measures were used to objectively describe characteristics of Country-Western (C-W) singing in a group of six premier performers in a series of studies, and of operatic and Broadway singing in a female subject with professional experience in both styles. For comparison, the same measures were also applied to the individuals while speaking the same material as sung. Changes in pitch and vocal loudness were investigated for various dependent variables, including subglottal pressure, closed quotient, glottal leakage, the H1-H2 difference (the level difference between the two lowest partials of the source spectrum), glottal compliance (the ratio between the air volume displaced in a glottal pulse and the subglottal pressure), formant frequencies, the long-term-average spectrum, and vibrato characteristics (in operatic versus Broadway singing). Data from the C-W singers suggest that they use higher subglottal pressures in singing than in speaking, and that the change in vocal intensity for a doubling of subglottal pressure is less than that reported for classical singers. Several measures were similar for both speaking and C-W singing. Whereas the results provide objective specification of differences between the operatic and Broadway styles of singing, the latter appears similar in many features to a conversational speaking style.

  10. Data quality enhancement and knowledge discovery from relevant signals in acoustic emission

    NASA Astrophysics Data System (ADS)

    Mejia, Felipe; Shyu, Mei-Ling; Nanni, Antonio

    2015-10-01

    The increasing popularity of structural health monitoring has brought with it a growing need for automated data management and data analysis tools. Of great importance are filters that can systematically detect unwanted signals in acoustic emission datasets. This study presents a semi-supervised data mining scheme that detects data belonging to unfamiliar distributions. This type of outlier detection scheme is useful for detecting the presence of new acoustic emission sources, given a training dataset of unwanted signals. In addition to classifying new observations (herein referred to as "outliers") within a dataset, the scheme generates a decision tree that classifies sub-clusters within the outlier context set. The obtained tree can be interpreted as a series of characterization rules for newly observed data, and these rules can potentially describe the basic structure of different modes within the outlier distribution. The data mining scheme is first validated on a synthetic dataset, and an attempt is made to confirm the algorithm's ability to discriminate outlier acoustic emission sources in a controlled pencil-lead-break experiment. Finally, the scheme is applied to data from two fatigue crack-growth steel specimens, where it is shown that the extracted rules adequately describe crack-growth-related acoustic emission sources while filtering out background "noise." The results show promising performance in filter generation, thereby allowing analysts to extract, characterize, and focus only on meaningful signals.
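
    An analogous pipeline sketched with standard tools: an outlier detector trained on the "unwanted" signals flags unfamiliar AE hits, the flagged hits are clustered, and a shallow decision tree summarizes the clusters as rules. IsolationForest and KMeans are stand-ins chosen for illustration; the paper's detector, clustering method, and features are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

def characterize_outliers(train_unwanted, new_features, feature_names, n_clusters=3):
    """Flag AE hits that do not resemble the 'unwanted' training data,
    then summarize the flagged hits with decision-tree rules.

    train_unwanted : (n, d) array of features of known unwanted signals
    new_features   : (m, d) array of features of newly observed AE hits
    """
    detector = IsolationForest(random_state=0).fit(train_unwanted)
    is_outlier = detector.predict(new_features) == -1        # -1 = unfamiliar
    outliers = new_features[is_outlier]
    if len(outliers) < n_clusters:
        return is_outlier, None
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(outliers)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(outliers, labels)
    return is_outlier, export_text(tree, feature_names=list(feature_names))
```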

  11. Acoustic tweezers for studying intracellular calcium signaling in SKBR-3 human breast cancer cells.

    PubMed

    Hwang, Jae Youn; Yoon, Chi Woo; Lim, Hae Gyun; Park, Jin Man; Yoon, Sangpil; Lee, Jungwoo; Shung, K Kirk

    2015-12-01

    Extracellular matrix proteins such as fibronectin (FNT) play crucial roles in cell proliferation, adhesion, and migration. For a better understanding of these associated cellular activities, various microscopic manipulation tools have been used to study their intracellular signaling pathways. Recently, it has become apparent that acoustic tweezers may possess similar capabilities for such studies. We therefore demonstrate that our newly developed acoustic tweezers, built around a high-frequency lithium niobate ultrasonic transducer, have the potential to study intracellular calcium signaling elicited by FNT binding to human breast cancer cells (SKBR-3). It is found that intracellular calcium elevations in SKBR-3 cells, initially occurring at the microbead-contacted spot and eventually spreading over the entire cell, are elicited by attaching an acoustically trapped FNT-coated microbead. Interestingly, they are suppressed by either extracellular calcium elimination or phospholipase C (PLC) inhibition. This suggests that our acoustic tweezers may serve as an alternative tool in the study of intracellular signaling by FNT-binding activities.

  12. Some Interactions of Speech Rate, Signal Distortion, and Certain Linguistic Factors in Listening Comprehension. Professional Paper No. 39-68.

    ERIC Educational Resources Information Center

    Sticht, Thomas G.

    This experiment was designed to determine the relative effects of speech rate and signal distortion due to the time-compression process on listening comprehension. In addition, linguistic factors--including sequencing of random words into story form, and inflection and phraseology--were qualitatively considered for their effects on listening…

  13. Multi-scale morphology analysis of acoustic emission signal and quantitative diagnosis for bearing fault

    NASA Astrophysics Data System (ADS)

    Wang, Wen-Jing; Cui, Ling-Li; Chen, Dao-Yun

    2016-04-01

    Monitoring of potential bearing faults in operation is of critical importance to the safe operation of high-speed trains. One of the major challenges is how to differentiate signals relevant to the operational condition of the bearings from noise emitted by the surrounding environment. In this work, we report a procedure for analyzing acoustic emission signals collected from rolling bearings for the diagnosis of bearing health conditions by examining their morphological pattern spectrum (MPS) through a multi-scale morphology analysis procedure. The results show that acoustic emission signals resulting from a given type of bearing fault share rather similar MPS curves. Further examination of the sample entropy and Lempel-Ziv complexity of the MPS curves suggests that these two parameters can be utilized to determine damage modes.
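
    A sketch of a morphological pattern spectrum (granulometry) for a 1-D AE record: grey-scale openings with flat structuring elements of increasing length, recording the signal "area" removed at each scale. The structuring-element shape and normalization are assumptions and may differ from the paper's MPS definition.

```python
import numpy as np
from scipy.ndimage import grey_opening

def pattern_spectrum(signal, max_scale=50):
    """Morphological pattern spectrum (granulometry) of a 1-D AE signal.

    Applies grey-scale openings with flat structuring elements of
    increasing length and measures how much signal 'area' is removed at
    each scale. Returns an array PS[k] for k = 1 .. max_scale.
    """
    x = np.abs(np.asarray(signal, dtype=float))      # work on the rectified signal
    total = x.sum()
    prev = total
    ps = []
    for k in range(1, max_scale + 1):
        opened = grey_opening(x, size=2 * k + 1)
        cur = opened.sum()
        ps.append((prev - cur) / total)              # fraction removed at this scale
        prev = cur
    return np.array(ps)
```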

  14. Temperature and Pressure Dependence of Signal Amplitudes for Electrostriction Laser-Induced Thermal Acoustics

    NASA Technical Reports Server (NTRS)

    Herring, Gregory C.

    2015-01-01

    The relative signal strength of electrostriction-only (no thermal grating) laser-induced thermal acoustics (LITA) in gas-phase air is reported as a function of temperature T and pressure P. Measurements were made in the free stream of a variable Mach number supersonic wind tunnel, where T and P are varied simultaneously as Mach number is varied. Using optical heterodyning, the measured signal amplitude (related to the optical reflectivity of the acoustic grating) was averaged for each of 11 flow conditions and compared to the expected theoretical dependence of a pure-electrostriction LITA process, where the signal is proportional to sqrt(P^2/T^3), i.e., P/T^(3/2).

  15. Robust Speech Rate Estimation for Spontaneous Speech

    PubMed Central

    Wang, Dagen; Narayanan, Shrikanth S.

    2010-01-01

    In this paper, we propose a direct method for speech rate estimation from acoustic features without requiring any automatic speech transcription. We compare various spectral and temporal signal analysis and smoothing strategies to better characterize the underlying syllable structure and thereby derive speech rate. The proposed algorithm extends the method of spectral subband correlation by including temporal correlation and by using prominent spectral subbands to improve the signal correlation essential for syllable detection. Furthermore, to address some of the practical robustness issues of previously proposed methods, we introduce novel components into the algorithm, such as the use of pitch confidence for filtering spurious syllable envelope peaks, a magnifying window for tackling neighboring-syllable smearing, and relative peak measure thresholds for pseudo-peak rejection. We also describe an automated approach for learning algorithm parameters from data, and find the optimal settings through Monte Carlo simulations and parameter sensitivity analysis. Final experimental evaluations are conducted on a portion of the Switchboard corpus for which manual phonetic segmentation information and published results for direct comparison are available. The results show a correlation coefficient of 0.745 with respect to the ground truth based on manual segmentation. This result is about a 17% improvement over the current best single estimator and an 11% improvement over the multi-estimator evaluated on the same Switchboard database. PMID:20428476
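
    A much-simplified, single-band version of the envelope-peak idea underlying the algorithm above: band-limit the signal, take the smoothed Hilbert envelope, and count prominent peaks as syllable nuclei. The band edges, smoothing cutoff, and peak-picking thresholds are assumptions; the paper's subband correlation, pitch confidence, and magnifying window are omitted.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks, hilbert

def speech_rate(x, fs, min_syllable_gap=0.15):
    """Crude syllable-rate estimate (syllables/second) from the envelope."""
    b, a = butter(4, [300 / (fs / 2), 3000 / (fs / 2)], btype="band")
    band = filtfilt(b, a, x)                         # keep a broad speech band
    env = np.abs(hilbert(band))                      # amplitude envelope
    b, a = butter(2, 10 / (fs / 2))                  # smooth below ~10 Hz
    env = filtfilt(b, a, env)
    peaks, _ = find_peaks(env,
                          distance=int(min_syllable_gap * fs),
                          prominence=0.1 * env.max())
    return len(peaks) / (len(x) / fs)
```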

  16. [Research on Time-frequency Characteristics of Magneto-acoustic Signal of Different Thickness Medium Based on Wave Summing Method].

    PubMed

    Zhang, Shunqi; Yin, Tao; Ma, Ren; Liu, Zhipeng

    2015-08-01

    Functional imaging of the electrical characteristics of biological tissue based on the magneto-acoustic effect provides valuable information for early tumor diagnosis; analysis of the time and frequency characteristics of the magneto-acoustic signal is therefore important for image reconstruction. This paper proposes a wave-summing method based on the Green's function solution for the acoustic source of the magneto-acoustic effect. Simulations and analyses under quasi-1D transmission conditions were carried out on the time and frequency characteristics of magneto-acoustic signals from models of different thickness, and the simulated signals were verified experimentally. The simulations showed that the time-frequency characteristics of the magneto-acoustic signal reflect the thickness of the sample: thin samples (less than one wavelength of the pulse) and thick samples (greater than one wavelength) yield different summed waveforms and frequency characteristics, owing to the difference in summing thickness. The experimental results confirmed the theoretical analysis and simulations. This research lays a foundation for reconstructing acoustic sources and conductivity in media of different thickness in magneto-acoustic imaging.

  17. Use of amplitude modulation cues recovered from frequency modulation for cochlear implant users when original speech cues are severely degraded.

    PubMed

    Won, Jong Ho; Shim, Hyun Joon; Lorenzi, Christian; Rubinstein, Jay T

    2014-06-01

    Won et al. (J Acoust Soc Am 132:1113-1119, 2012) reported that cochlear implant (CI) speech processors generate amplitude-modulation (AM) cues recovered from broadband speech frequency modulation (FM) and that CI users can use these cues for speech identification in quiet. The present study was designed to extend this finding for a wide range of listening conditions, where the original speech cues were severely degraded by manipulating either the acoustic signals or the speech processor. The manipulation of the acoustic signals included the presentation of background noise, simulation of reverberation, and amplitude compression. The manipulation of the speech processor included changing the input dynamic range and the number of channels. For each of these conditions, multiple levels of speech degradation were tested. Speech identification was measured for CI users and compared for stimuli having both AM and FM information (intact condition) or FM information only (FM condition). Each manipulation degraded speech identification performance for both intact and FM conditions. Performance for the intact and FM conditions became similar for stimuli having the most severe degradations. Identification performance generally overlapped for the intact and FM conditions. Moreover, identification performance for the FM condition was better than chance performance even at the maximum level of distortion. Finally, significant correlations were found between speech identification scores for the intact and FM conditions. Altogether, these results suggest that despite poor frequency selectivity, CI users can make efficient use of AM cues recovered from speech FM in difficult listening situations.

  18. Ductile Deformation of Dehydrating Serpentinite Evidenced by Acoustic Signal Monitoring

    NASA Astrophysics Data System (ADS)

    Gasc, J.; Hilairet, N.; Wang, Y.; Schubnel, A. J.

    2012-12-01

    Serpentinite dehydration is believed to be responsible for triggering earthquakes at intermediate depths (i.e., 60-300 km) in subduction zones. Based on experimental results, some authors have proposed mechanisms that explain how brittle deformation can occur despite high pressure and temperature conditions [1]. However, reproducing microseismicity in the laboratory associated with the deformation of dehydrating serpentinite remains challenging. A recent study showed that, even for fast dehydration kinetics, ductile deformation could take place rather than brittle faulting in the sample [2]. This latter study was conducted in a multi-anvil apparatus without the ability to control differential stress during dehydration. We have since conducted controlled deformation experiments in the deformation-DIA (D-DIA) on natural serpentinite samples at sector 13 (GSECARS) of the APS. Monochromatic radiation was used with both a 2D MAR-CCD detector and a CCD camera to determine the stress and the strain of the sample during the deformation process [3]. In addition, an Acoustic Emission (AE) recording setup was used to monitor the microseismicity from the sample, using piezo-ceramic transducers glued on the basal truncation of the anvils. The use of six independent transducers allows locating the AEs and calculating the corresponding focal mechanisms. The samples were deformed at strain rates of 10-5-10-4 s-1 under confining pressures of 3-5 GPa. Dehydration was triggered during the deformation by heating the samples at rates ranging from 5 to 60 K/min. Before the onset of the dehydration, X-ray diffraction data showed that the serpentinite sustained ~1 GPa of stress which plummeted when dehydration occurred. Although AEs were recorded during the compression and decompression stages, no AEs ever accompanied this stress drop, suggesting ductile deformation of the samples. Hence, unlike many previous studies, no evidence for fluid embrittlement and anticrack generation was found

  19. Neural Oscillations Carry Speech Rhythm through to Comprehension

    PubMed Central

    Peelle, Jonathan E.; Davis, Matthew H.

    2012-01-01

    A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners’ processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging – particularly electroencephalography (EEG) and magnetoencephalography (MEG) – point to phase locking by ongoing cortical oscillations to low-frequency information (~4–8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain. PMID:22973251

  20. Acoustic cardiac signals analysis: a Kalman filter-based approach.

    PubMed

    Salleh, Sheik Hussain; Hussain, Hadrina Sheik; Swee, Tan Tian; Ting, Chee-Ming; Noor, Alias Mohd; Pipatsart, Surasak; Ali, Jalil; Yupapin, Preecha P

    2012-01-01

    Auscultation of the heart is accompanied by both electrical activity and sound, and provides clues for diagnosing many cardiac abnormalities. Unfortunately, detecting relevant symptoms and making a diagnosis based on heart sound heard through a stethoscope is difficult: the heart sounds are of short duration and separated from one another by less than 30 ms, and the cost of false positives is wasted time and emotional anxiety for both patient and GP. Many heart diseases cause changes in heart sound, waveform, and additional murmurs before other signs and symptoms appear, and heart-sound auscultation is the primary test conducted by GPs. These sounds are generated primarily by the turbulent flow of blood in the heart, and their analysis requires a quiet environment with minimum ambient noise. To address these issues, a technique for denoising and estimating the biomedical heart signal is proposed in this investigation. The performance of such a filter naturally depends on prior information about the statistical properties of the signal and the background noise. This paper proposes Kalman filtering for denoising the heart sound. The cycles of heart sounds are assumed to follow a first-order Gauss-Markov process and are observed with additive measurement noise. The model is formulated in state-space form to enable the use of a Kalman filter to estimate the clean cycles of heart sounds; the estimates obtained by Kalman filtering are optimal in the mean-squared sense.
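
    A minimal scalar sketch of the proposed Kalman filtering: the heart-sound cycle is modeled as a first-order Gauss-Markov process observed in additive noise, and the filter returns the minimum-mean-square estimate. The parameters a, q, and r are illustrative placeholders, not values from the paper.

```python
import numpy as np

def kalman_denoise(z, a=0.95, q=1e-4, r=1e-2):
    """Scalar Kalman filter for a first-order Gauss-Markov signal.

    Model: x[k] = a*x[k-1] + w[k],  w ~ N(0, q)
           z[k] = x[k]     + v[k],  v ~ N(0, r)
    Returns the filtered (denoised) estimate of x.
    """
    x_hat, P = 0.0, 1.0
    out = np.empty(len(z))
    for k, zk in enumerate(z):
        # predict
        x_hat = a * x_hat
        P = a * a * P + q
        # update with the new measurement
        K = P / (P + r)
        x_hat = x_hat + K * (zk - x_hat)
        P = (1.0 - K) * P
        out[k] = x_hat
    return out
```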

  1. Speech research: Studies on the nature of speech, instrumentation for its investigation, and practical applications

    NASA Astrophysics Data System (ADS)

    Liberman, A. M.

    1982-03-01

    This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: Speech perception and memory coding in relation to reading ability; The use of orthographic structure by deaf adults: Recognition of finger-spelled letters; Exploring the information support for speech; The stream of speech; Using the acoustic signal to make inferences about place and duration of tongue-palate contact; Patterns of human interlimb coordination emerge from the properties of nonlinear limit cycle oscillatory processes: Theory and data; Motor control: Which themes do we orchestrate?; Exploring the nature of motor control in Down's syndrome; Periodicity and auditory memory: A pilot study; Reading skill and language skill: On the role of sign order and morphological structure in memory for American Sign Language sentences; Perception of nasal consonants with special reference to Catalan; and Speech production characteristics of the hearing impaired.

  2. Biological invasions and the acoustic niche: the effect of bullfrog calls on the acoustic signals of white-banded tree frogs.

    PubMed

    Both, Camila; Grant, Taran

    2012-10-23

    Invasive species are known to affect native species in a variety of ways, but the effect of acoustic invaders has not been examined previously. We simulated an invasion of the acoustic niche by exposing calling native male white-banded tree frogs (Hypsiboas albomarginatus) to recorded invasive American bullfrog (Lithobates catesbeianus) calls. In response, tree frogs immediately shifted calls to significantly higher frequencies. In the post-stimulus period, they continued to use higher frequencies while also decreasing signal duration. Acoustic signals are the primary basis of mate selection in many anurans, suggesting that such changes could negatively affect the reproductive success of native species. The effects of bullfrog vocalizations on acoustic communities are expected to be especially severe due to their broad frequency band, which masks the calls of multiple species simultaneously. PMID:22675139

  3. Biological invasions and the acoustic niche: the effect of bullfrog calls on the acoustic signals of white-banded tree frogs

    PubMed Central

    Both, Camila; Grant, Taran

    2012-01-01

    Invasive species are known to affect native species in a variety of ways, but the effect of acoustic invaders has not been examined previously. We simulated an invasion of the acoustic niche by exposing calling native male white-banded tree frogs (Hypsiboas albomarginatus) to recorded invasive American bullfrog (Lithobates catesbeianus) calls. In response, tree frogs immediately shifted calls to significantly higher frequencies. In the post-stimulus period, they continued to use higher frequencies while also decreasing signal duration. Acoustic signals are the primary basis of mate selection in many anurans, suggesting that such changes could negatively affect the reproductive success of native species. The effects of bullfrog vocalizations on acoustic communities are expected to be especially severe due to their broad frequency band, which masks the calls of multiple species simultaneously. PMID:22675139

  4. Biological invasions and the acoustic niche: the effect of bullfrog calls on the acoustic signals of white-banded tree frogs.

    PubMed

    Both, Camila; Grant, Taran

    2012-10-23

    Invasive species are known to affect native species in a variety of ways, but the effect of acoustic invaders has not been examined previously. We simulated an invasion of the acoustic niche by exposing calling native male white-banded tree frogs (Hypsiboas albomarginatus) to recorded invasive American bullfrog (Lithobates catesbeianus) calls. In response, tree frogs immediately shifted calls to significantly higher frequencies. In the post-stimulus period, they continued to use higher frequencies while also decreasing signal duration. Acoustic signals are the primary basis of mate selection in many anurans, suggesting that such changes could negatively affect the reproductive success of native species. The effects of bullfrog vocalizations on acoustic communities are expected to be especially severe due to their broad frequency band, which masks the calls of multiple species simultaneously.

  5. Extraction of fault component from abnormal sound in diesel engines using acoustic signals

    NASA Astrophysics Data System (ADS)

    Dayong, Ning; Changle, Sun; Yongjun, Gong; Zengmeng, Zhang; Jiaoyi, Hou

    2016-06-01

    In this paper a method for extracting fault components from abnormal acoustic signals and automatically diagnosing diesel engine faults is presented. The method, named the dislocation superimposed method (DSM), is based on the improved random decrement technique (IRDT), a differential function (DF), and correlation analysis (CA). The aim of DSM is to linearly superpose multiple segments of the abnormal acoustic signal, exploiting the waveform similarity of the fault components. The method uses the sample points at which the abnormal sound first appears as the starting position of each segment. In this study the abnormal sound was of the shock fault type; thus, a starting-position search method based on gradient variance was adopted. A coefficient of similarity between two equally sized signals is defined, and by comparing against this similarity coefficient the extracted fault component can be judged automatically. The results show that this method is capable of accurately extracting the fault component from abnormal acoustic signals induced by shock-type faults, and that the extracted component can be used to identify the fault type.
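
    An illustrative sketch of the dislocation-superimposed idea: locate onsets of the abnormal sound, align equal-length windows at those onsets, average them to reinforce the repeating fault component, and score two windows with a normalized correlation coefficient. The threshold, window length, and function names are placeholders, not taken from the paper.

```python
import numpy as np

def onset_indices(x, thresh):
    """Indices where |x| first crosses `thresh` after being below it."""
    above = np.abs(x) > thresh
    return np.flatnonzero(above[1:] & ~above[:-1]) + 1

def superpose_segments(x, onsets, win):
    """Average equal-length windows aligned at each onset (the superposition step)."""
    segs = [x[i:i + win] for i in onsets if i + win <= len(x)]
    return np.mean(segs, axis=0) if segs else np.zeros(win)

def similarity(a, b):
    """Normalized correlation between two equal-length windows (|rho| <= 1)."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))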

  6. Speech coding

    SciTech Connect

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained to be the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of speech signal getting corrupted by noise, cross-talk and distortion Long haul transmissions which use repeaters to compensate for the loss in signal strength on transmission links also increase the associated noise and distortion. On the other hand digital transmission is relatively immune to noise, cross-talk and distortion primarily because of the capability to faithfully regenerate digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link Hence from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modem requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding is often referred to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques that is often interchangeably used with speech coding is the term voice coding. This term is more generic in the sense that the

  7. A Comprehensive Noise Robust Speech Parameterization Algorithm Using Wavelet Packet Decomposition-Based Denoising and Speech Feature Representation Techniques

    NASA Astrophysics Data System (ADS)

    Kotnik, Bojan; Kačič, Zdravko

    2007-12-01

    This paper concerns the problem of automatic speech recognition in noise-intense and adverse environments. The main goal of the proposed work is the definition, implementation, and evaluation of a novel noise robust speech signal parameterization algorithm. The proposed procedure is based on time-frequency speech signal representation using wavelet packet decomposition. A new modified soft thresholding algorithm based on time-frequency adaptive threshold determination was developed to efficiently reduce the level of additive noise in the input noisy speech signal. A two-stage Gaussian mixture model (GMM)-based classifier was developed to perform speech/nonspeech as well as voiced/unvoiced classification. The adaptive topology of the wavelet packet decomposition tree based on voiced/unvoiced detection was introduced to separately analyze voiced and unvoiced segments of the speech signal. The main feature vector consists of a combination of log-root compressed wavelet packet parameters, and autoregressive parameters. The final output feature vector is produced using a two-staged feature vector postprocessing procedure. In the experimental framework, the noisy speech databases Aurora 2 and Aurora 3 were applied together with corresponding standardized acoustical model training/testing procedures. The automatic speech recognition performance achieved using the proposed noise robust speech parameterization procedure was compared to the standardized mel-frequency cepstral coefficient (MFCC) feature extraction procedures ETSI ES 201 108 and ETSI ES 202 050.
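
    A simplified wavelet-packet denoising step in the spirit of this abstract, using PyWavelets: decompose the noisy frame, soft-threshold the terminal-node coefficients, and reconstruct. The paper's time-frequency adaptive threshold is replaced here by a single universal threshold, and the wavelet ('db8') and depth (4) are assumptions.

```python
import numpy as np
import pywt

def wp_denoise(x, wavelet="db8", level=4):
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, mode="symmetric",
                            maxlevel=level)
    leaves = wp.get_level(level, order="freq")
    # universal threshold from a robust noise estimate (MAD of the finest leaf)
    sigma = np.median(np.abs(leaves[-1].data)) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))
    for node in leaves:
        node.data = pywt.threshold(node.data, thr, mode="soft")
    return wp.reconstruct(update=True)[: len(x)]
```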

  8. Classroom Acoustics: The Problem, Impact, and Solution.

    ERIC Educational Resources Information Center

    Berg, Frederick S.; And Others

    1996-01-01

    This article describes aspects of classroom acoustics that interfere with the ability of listeners to understand speech. It considers impacts on students and teachers and offers four possible solutions: noise control, signal control without amplification, individual amplification systems, and sound field amplification systems. (Author/DB)

  9. Formant-frequency variation and informational masking of speech by extraneous formants: evidence against dynamic and speech-specific acoustical constraints.

    PubMed

    Roberts, Brian; Summers, Robert J; Bailey, Peter J

    2014-08-01

    How speech is separated perceptually from other speech remains poorly understood. Recent research indicates that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This study explored the effects of manipulating the depth and pattern of that variation. Three formants (F1+F2+F3) constituting synthetic analogues of natural sentences were distributed across the 2 ears, together with a competitor for F2 (F2C) that listeners must reject to optimize recognition (left = F1+F2C; right = F2+F3). The frequency contours of F1 - F3 were each scaled to 50% of their natural depth, with little effect on intelligibility. Competitors were created either by inverting the frequency contour of F2 about its geometric mean (a plausibly speech-like pattern) or using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Adding a competitor typically reduced intelligibility; this reduction depended on the depth of F2C variation, being greatest for 100%-depth, intermediate for 50%-depth, and least for 0%-depth (constant) F2Cs. This suggests that competitor impact depends on overall depth of frequency variation, not depth relative to that for the target formants. The absence of tuning (i.e., no minimum in intelligibility for the 50% case) suggests that the ability to reject an extraneous formant does not depend on similarity in the depth of formant-frequency variation. Furthermore, triangle-wave competitors were as effective as their more speech-like counterparts, suggesting that the selection of formants from the ensemble also does not depend on speech-specific constraints. PMID:24842068

  10. Near- Source, Seismo-Acoustic Signals Accompanying a NASCAR Race at the Texas Motor Speedway

    NASA Astrophysics Data System (ADS)

    Stump, B. W.; Hayward, C.; Underwood, R.; Howard, J. E.; MacPhail, M. D.; Golden, P.; Endress, A.

    2014-12-01

    Near-source, seismo-acoustic observations provide a unique opportunity to characterize urban sources, remotely sense human activities, including vehicular traffic, and monitor large engineering structures. Energy separately coupled into the solid earth and atmosphere provides constraints on not only the location of these sources but also the physics of the generating process. Conditions and distances at which these observations can be made are dependent upon not only local geological conditions but also atmospheric conditions at the time of the observations. In order to address this range of topics, an empirical, seismo-acoustic study was undertaken in and around the Texas Motor Speedway in the Dallas-Ft. Worth area during the first week of April 2014, at which time a range of activities associated with a series of NASCAR races occurred. Nine seismic sensors were deployed around the 1.5-mile track for purposes of documenting the direct-coupled seismic energy from the passage of the cars and other vehicles on the track. Six infrasound sensors were deployed on a rooftop in a rectangular array configuration designed to provide high frequency beam forming for acoustic signals. Finally, a five-element infrasound array was deployed outside the track in order to characterize how the signals propagate away from the sources in the near-source region. Signals recovered from within the track were able to track and characterize the motion of a variety of vehicles during the race weekend including individual racecars. Seismic data sampled at 1000 sps documented strong Doppler effects as the cars approached and moved away from individual sensors. Faint seismic signals arrived at seismic velocity, but local acoustic-to-seismic coupling, as supported by the acoustic observations, generated the majority of seismic signals. Actual seismic ground motions were small, as demonstrated by the dominance of regional seismic signals from a magnitude 4.0 earthquake that arrived at

  11. Ultrasonic speech translator and communications system

    DOEpatents

    Akerman, M.A.; Ayers, C.W.; Haynes, H.D.

    1996-07-23

    A wireless communication system undetectable by radio frequency methods for converting audio signals, including human voice, to electronic signals in the ultrasonic frequency range, transmitting the ultrasonic signal by way of acoustical pressure waves across a carrier medium, including gases, liquids, or solids, and reconverting the ultrasonic acoustical pressure waves back to the original audio signal. The ultrasonic speech translator and communication system includes an ultrasonic transmitting device and an ultrasonic receiving device. The ultrasonic transmitting device accepts as input an audio signal such as human voice input from a microphone or tape deck. The ultrasonic transmitting device frequency modulates an ultrasonic carrier signal with the audio signal producing a frequency modulated ultrasonic carrier signal, which is transmitted via acoustical pressure waves across a carrier medium such as gases, liquids or solids. The ultrasonic receiving device converts the frequency modulated ultrasonic acoustical pressure waves to a frequency modulated electronic signal, demodulates the audio signal from the ultrasonic carrier signal, and conditions the demodulated audio signal to reproduce the original audio signal at its output. 7 figs.
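
    A short sketch of the frequency-modulation scheme described in the patent: modulate an ultrasonic carrier with an audio signal, then demodulate by tracking the instantaneous frequency of the received wave. The carrier frequency, deviation, and sample rate below are illustrative choices, not values taken from the patent.

```python
import numpy as np
from scipy.signal import hilbert

FS = 192_000       # sample rate high enough to represent an ultrasonic carrier
FC = 40_000        # ultrasonic carrier frequency (Hz)
KF = 5_000         # frequency deviation per unit audio amplitude (Hz)

def fm_modulate(audio):
    t = np.arange(len(audio)) / FS
    phase = 2 * np.pi * FC * t + 2 * np.pi * KF * np.cumsum(audio) / FS
    return np.cos(phase)

def fm_demodulate(rx):
    z = hilbert(rx)                                   # analytic signal
    inst_freq = np.diff(np.unwrap(np.angle(z))) * FS / (2 * np.pi)
    return (inst_freq - FC) / KF                      # recovered audio (one sample shorter)
```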

  12. Ultrasonic speech translator and communications system

    DOEpatents

    Akerman, M. Alfred; Ayers, Curtis W.; Haynes, Howard D.

    1996-01-01

    A wireless communication system undetectable by radio frequency methods for converting audio signals, including human voice, to electronic signals in the ultrasonic frequency range, transmitting the ultrasonic signal by way of acoustical pressure waves across a carrier medium, including gases, liquids, or solids, and reconverting the ultrasonic acoustical pressure waves back to the original audio signal. The ultrasonic speech translator and communication system (20) includes an ultrasonic transmitting device (100) and an ultrasonic receiving device (200). The ultrasonic transmitting device (100) accepts as input (115) an audio signal such as human voice input from a microphone (114) or tape deck. The ultrasonic transmitting device (100) frequency modulates an ultrasonic carrier signal with the audio signal producing a frequency modulated ultrasonic carrier signal, which is transmitted via acoustical pressure waves across a carrier medium such as gases, liquids or solids. The ultrasonic receiving device (200) converts the frequency modulated ultrasonic acoustical pressure waves to a frequency modulated electronic signal, demodulates the audio signal from the ultrasonic carrier signal, and conditions the demodulated audio signal to reproduce the original audio signal at its output (250).

  13. Particle swarm optimization based feature enhancement and feature selection for improved emotion recognition in speech and glottal signals.

    PubMed

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform-based timbral texture features, and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization-based clustering (PSOC) and wrapper-based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were utilized to gauge the proposed method. Extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature.

  14. Particle Swarm Optimization Based Feature Enhancement and Feature Selection for Improved Emotion Recognition in Speech and Glottal Signals

    PubMed Central

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform-based timbral texture features, and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization-based clustering (PSOC) and wrapper-based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were utilized to gauge the proposed method. Extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature. PMID:25799141

  15. Early recognition of speech

    PubMed Central

    Remez, Robert E; Thomas, Emily F

    2013-01-01

    Classic research on the perception of speech sought to identify minimal acoustic correlates of each consonant and vowel. In explaining perception, this view designated momentary components of an acoustic spectrum as cues to the recognition of elementary phonemes. This conceptualization of speech perception is untenable given the findings of phonetic sensitivity to modulation independent of the acoustic and auditory form of the carrier. The empirical key is provided by studies of the perceptual organization of speech, a low-level integrative function that finds and follows the sensory effects of speech amid concurrent events. These projects have shown that the perceptual organization of speech is keyed to modulation; is fast, unlearned, nonsymbolic, and indifferent to short-term auditory properties; and requires attention. The ineluctably multisensory nature of speech perception also imposes conditions that distinguish language among cognitive systems. WIREs Cogn Sci 2013, 4:213–223. doi: 10.1002/wcs.1213 PMID:23926454

  16. Time delay and Doppler estimation for wideband acoustic signals in multipath environments.

    PubMed

    Jiang, Xue; Zeng, Wen-Jun; Li, Xi-Lin

    2011-08-01

    Estimation of the parameters of a multipath underwater acoustic channel is of great interest for a variety of applications. This paper proposes a high-resolution method for jointly estimating the multipath time delays, Doppler scales, and attenuation amplitudes of a time-varying acoustical channel. The proposed method formulates the estimation of the channel parameters as a sparse representation problem. With the ℓ1-norm as the measure of sparsity, the proposed method makes use of the basis pursuit (BP) criterion to find the sparse solution. The ill-conditioning can be effectively reduced by the ℓ1-norm regularization. Unlike many existing methods that are only applicable to narrowband signals, the proposed method can handle both narrowband and wideband signals. Simulation results are provided to verify the performance and effectiveness of the proposed algorithm, indicating that it achieves super-resolution in both the delay and Doppler domains and is robust to noise.
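
    A toy version of the sparse-channel idea described above: build a dictionary whose columns are delayed copies of the known probe signal and recover the sparse vector of path amplitudes with an ℓ1-regularized least-squares fit (Lasso used as a stand-in for basis pursuit). Doppler scaling is omitted to keep the sketch short; the delay grid and regularization weight are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso

def delay_dictionary(probe, n_obs, max_delay):
    """Columns are the probe shifted by 0 .. max_delay-1 samples."""
    cols = []
    for d in range(max_delay):
        col = np.zeros(n_obs)
        col[d:d + len(probe)] = probe[: max(0, n_obs - d)]
        cols.append(col)
    return np.column_stack(cols)

def estimate_delays(received, probe, max_delay=200, alpha=0.01):
    A = delay_dictionary(probe, len(received), max_delay)
    fit = Lasso(alpha=alpha, max_iter=5000).fit(A, received)
    taps = fit.coef_                       # sparse amplitudes indexed by delay
    return np.flatnonzero(np.abs(taps) > 1e-3), taps
```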

  17. Seismo-acoustic signals associated with degassing explosions recorded at Shishaldin Volcano, Alaska, 2003-2004

    USGS Publications Warehouse

    Petersen, T.

    2007-01-01

    In summer 2003, a Chaparral Model 2 microphone was deployed at Shishaldin Volcano, Aleutian Islands, Alaska. The pressure sensor was co-located with a short-period seismometer on the volcano’s north flank at a distance of 6.62 km from the active summit vent. The seismo-acoustic data exhibit a correlation between impulsive acoustic signals (1–2 Pa) and long-period (LP, 1–2 Hz) earthquakes. Since it last erupted in 1999, Shishaldin has been characterized by sustained seismicity consisting of many hundreds to two thousand LP events per day. The activity is accompanied by up to ∼200 m high discrete gas puffs exiting the small summit vent, but no significant eruptive activity has been confirmed. The acoustic waveforms possess similarity throughout the data set (July 2003–November 2004) indicating a repetitive source mechanism. The simplicity of the acoustic waveforms, the impulsive onsets with relatively short (∼10–20 s) gradually decaying codas and the waveform similarities suggest that the acoustic pulses are generated at the fluid–air interface within an open-vent system. SO2 measurements have revealed a low SO2 flux, suggesting a hydrothermal system with magmatic gases leaking through. This hypothesis is supported by the steady-state nature of Shishaldin’s volcanic system since 1999. Time delays between the seismic LP and infrasound onsets were acquired from a representative day of seismo-acoustic data. A simple model was used to estimate source depths. The short seismo-acoustic delay times have revealed that the seismic and acoustic sources are co-located at a depth of 240±200 m below the crater rim. This shallow depth is confirmed by resonance of the upper portion of the open conduit, which produces standing waves with f=0.3 Hz in the acoustic waveform codas. The infrasound data has allowed us to relate Shishaldin’s LP earthquakes to degassing explosions, created by gas volume ruptures from a fluid–air interface.

  18. Lateralization of acoustic signals by dichotically listening budgerigars (Melopsittacus undulatus).

    PubMed

    Welch, Thomas E; Dent, Micheal L

    2011-10-01

    Sound localization allows humans and animals to determine the direction of objects to seek or avoid and indicates the appropriate position to direct visual attention. Interaural time differences (ITDs) and interaural level differences (ILDs) are two primary cues that humans use to localize or lateralize sound sources. There is limited information about behavioral cue sensitivity in animals, especially animals with poor sound localization acuity and small heads, like budgerigars. ITD and ILD thresholds were measured behaviorally in dichotically listening budgerigars equipped with headphones in an identification task. Budgerigars were less sensitive than humans and cats, and more similar to rabbits, barn owls, and monkeys, in their abilities to lateralize dichotic signals. Threshold ITDs were relatively constant for pure tones below 4 kHz, and were immeasurable at higher frequencies. Threshold ILDs were relatively constant over a wide range of frequencies, similar to humans. Thresholds in both experiments were best for broadband noise stimuli. These lateralization results are generally consistent with the free field localization abilities of these birds, and add support to the idea that budgerigars may be able to enhance their cues to directional hearing (e.g., via connected interaural pathways) beyond what would be expected based on head size. PMID:21973385

  19. Tracking the Speech Signal--Time-Locked MEG Signals during Perception of Ultra-Fast and Moderately Fast Speech in Blind and in Sighted Listeners

    ERIC Educational Resources Information Center

    Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

    2013-01-01

    Blind people can learn to understand speech at ultra-high syllable rates (ca. 20 syllables/s), a capability associated with hemodynamic activation of the central-visual system. To further elucidate the neural mechanisms underlying this skill, magnetoencephalographic (MEG) measurements during listening to sentence utterances were cross-correlated…

  20. Circuit for echo and noise suppression of acoustic signals transmitted through a drill string

    DOEpatents

    Drumheller, D.S.; Scott, D.D.

    1993-12-28

    An electronic circuit for digitally processing analog electrical signals produced by at least one acoustic transducer is presented. In a preferred embodiment of the present invention, a novel digital time-delay circuit is utilized, which employs an array of first-in-first-out (FIFO) microchips. Also, a bandpass filter is used at the input of this circuit to isolate drill-string noise and eliminate high-frequency output. 20 figures.
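
    A software analogue of the two stages named in this patent abstract: a bandpass filter to isolate the drill-string acoustic band, followed by a FIFO-style delay line used for a simple first-order echo-suppression step. The passband edges, sample rate, delay length, and reflection gain are illustrative placeholders, not values from the patent.

```python
from collections import deque
import numpy as np
from scipy.signal import butter, lfilter

FS = 2000.0                                          # assumed sample rate (Hz)

def bandpass(x, lo=400.0, hi=800.0, order=4):
    b, a = butter(order, [lo / (FS / 2), hi / (FS / 2)], btype="band")
    return lfilter(b, a, x)

class FifoDelay:
    """Fixed-length first-in-first-out delay line, one sample per push."""
    def __init__(self, n_samples):
        self.buf = deque([0.0] * n_samples, maxlen=n_samples)
    def push(self, sample):
        out = self.buf[0]            # oldest sample leaves the FIFO
        self.buf.append(sample)
        return out

def suppress_echo(x, delay_samples, reflection_gain=0.5):
    """Subtract a delayed, scaled copy of the filtered input (first-order echo model)."""
    d = FifoDelay(delay_samples)
    return np.array([s - reflection_gain * d.push(s) for s in bandpass(x)])
```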

  1. Speech perception as an active cognitive process

    PubMed Central

    Heald, Shannon L. M.; Nusbaum, Howard C.

    2014-01-01

    One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process, which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing, whether by masking noise in the environment or by hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine cognitive resources recruited during perception including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augmentation or therapy. PMID

  2. Punch stretching process monitoring using acoustic emission signal analysis. II - Application of frequency domain deconvolution

    NASA Technical Reports Server (NTRS)

    Liang, Steven Y.; Dornfeld, David A.; Nickerson, Jackson A.

    1987-01-01

    The coloring effect on the acoustic emission signal due to the frequency response of the data acquisition/processing instrumentation may bias the interpretation of AE signal characteristics. In this paper, a frequency domain deconvolution technique, which involves the identification of the instrumentation transfer functions and multiplication of the AE signal spectrum by the inverse of these system functions, has been carried out. In this way, the change in AE signal characteristics can be better interpreted as the result of the change in only the states of the process. The punch stretching process was used as an example to demonstrate the application of the technique. Results showed that, through the deconvolution, the frequency characteristics of AE signals generated during the stretching became more distinctive and could be used more effectively as tools for process monitoring.
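
    A minimal sketch of the frequency-domain deconvolution step: divide the AE spectrum by the instrumentation transfer function, with a small regularization term so bands where the instrument response is near zero are not amplified. The impulse response h of the sensor/amplifier chain is assumed to come from a separate calibration measurement; the regularization constant is a placeholder.

```python
import numpy as np

def deconvolve(ae_signal, h, eps=1e-3):
    """Remove the instrumentation coloring from an AE record."""
    n = len(ae_signal)
    X = np.fft.rfft(ae_signal, n)
    H = np.fft.rfft(h, n)
    # Wiener-style regularized inverse: H* / (|H|^2 + eps)
    inv = np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.fft.irfft(X * inv, n)
```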

  3. Problems Associated with Statistical Pattern Recognition of Acoustic Emission Signals in a Compact Tension Fatigue Specimen

    NASA Technical Reports Server (NTRS)

    Hinton, Yolanda L.

    1999-01-01

    Acoustic emission (AE) data were acquired during fatigue testing of an aluminum 2024-T4 compact tension specimen using a commercially available AE system. AE signals from crack extension were identified and separated from noise spikes, signals that reflected from the specimen edges, and signals that saturated the instrumentation. A commercially available software package was used to train a statistical pattern recognition system to classify the signals. The software trained a network to recognize signals with a 91-percent accuracy when compared with the researcher's interpretation of the data. Reasons for the discrepancies are examined and it is postulated that additional preprocessing of the AE data to focus on the extensional wave mode and eliminate other effects before training the pattern recognition system will result in increased accuracy.

  4. Non-invasive estimation of static and pulsatile intracranial pressure from transcranial acoustic signals.

    PubMed

    Levinsky, Alexandra; Papyan, Surik; Weinberg, Guy; Stadheim, Trond; Eide, Per Kristian

    2016-05-01

    The aim of the present study was to examine whether a method for estimation of non-invasive ICP (nICP) from transcranial acoustic (TCA) signals mixed with head-generated sounds estimates the static and pulsatile invasive ICP (iICP). For that purpose, simultaneous iICP and mixed TCA signals were obtained from patients undergoing continuous iICP monitoring as part of clinical management. The ear probe placed in the right outer ear channel sent a TCA signal with fixed frequency (621 Hz) that was picked up by the left ear probe along with acoustic signals generated by the intracranial compartment. Based on a mathematical model of the association between mixed TCA and iICP, the static and pulsatile nICP values were determined. A total of 39 patients were included in the study; the total numbers of observations for prediction of static and pulsatile iICP were 5789 and 6791, respectively. The results demonstrated a good agreement between iICP/nICP observations, with mean differences of 0.39 mmHg and 0.53 mmHg for static and pulsatile ICP, respectively. In summary, in this cohort of patients, mixed TCA signals estimated the static and pulsatile iICP with rather good accuracy. Further studies are required to validate whether mixed TCA signals may become useful for measurement of nICP. PMID:26997563

  5. Acoustic effects of the ATOC signal (75 Hz, 195 dB) on dolphins and whales.

    PubMed

    Au, W W; Nachtigall, P E; Pawloski, J L

    1997-05-01

    The Acoustic Thermometry of Ocean Climate (ATOC) program of Scripps Institution of Oceanography and the Applied Physics Laboratory, University of Washington, will broadcast a low-frequency 75-Hz phase modulated acoustic signal over ocean basins in order to study ocean temperatures on a global scale and examine the effects of global warming. One of the major concerns is the possible effect of the ATOC signal on marine life, especially on dolphins and whales. In order to address this issue, the hearing sensitivity of a false killer whale (Pseudorca crassidens) and a Risso's dolphin (Grampus griseus) to the ATOC sound was measured behaviorally. A staircase procedure with the signal levels being changed in 1-dB steps was used to measure the animals' threshold to the actual ATOC coded signal. The results indicate that small odontocetes such as the Pseudorca and Grampus swimming directly above the ATOC source will not hear the signal unless they dive to a depth of approximately 400 m. A sound propagation analysis suggests that the sound-pressure level at ranges greater than 0.5 km will be less than 130 dB for depths down to about 500 m. Several species of baleen whales produce sounds much greater than 170-180 dB. With the ATOC source on the axis of the deep sound channel (greater than 800 m), the ATOC signal will probably have minimal physical and physiological effects on cetaceans.

  6. Acoustics

    NASA Technical Reports Server (NTRS)

    Goodman, Jerry R.; Grosveld, Ferdinand

    2007-01-01

    The acoustics environment in space operations is important to maintain at manageable levels so that the crewperson can remain safe, functional, effective, and reasonably comfortable. High acoustic levels can produce temporary or permanent hearing loss, or cause other physiological symptoms such as auditory pain, headaches, discomfort, strain in the vocal cords, or fatigue. Noise is defined as undesirable sound. Excessive noise may result in psychological effects such as irritability, inability to concentrate, decrease in productivity, annoyance, errors in judgment, and distraction. A noisy environment can also result in the inability to sleep, or to sleep well. Elevated noise levels can affect the ability to communicate, understand what is being said, hear what is going on in the environment, degrade crew performance and operations, and create habitability concerns. Superfluous noise emissions can also create the inability to hear alarms or other important auditory cues such as equipment malfunctioning. Recent space flight experience, evaluations of the requirements in crew habitable areas, and lessons learned (Goodman 2003; Allen and Goodman 2003; Pilkinton 2003; Grosveld et al. 2003) show the importance of maintaining an acceptable acoustics environment. This is best accomplished by having a high-quality set of limits/requirements early in the program, the "designing in" of acoustics in the development of hardware and systems, and by monitoring, testing and verifying the levels to ensure that they are acceptable.

  7. The Acoustic Structure and Information Content of Female Koala Vocal Signals

    PubMed Central

    Charlton, Benjamin D.

    2015-01-01

    Determining the information content of animal vocalisations can give valuable insights into the potential functions of vocal signals. The source-filter theory of vocal production allows researchers to examine the information content of mammal vocalisations by linking variation in acoustic features with variation in relevant physical characteristics of the caller. Here I used a source-filter theory approach to classify female koala vocalisations into different call-types, and determine which acoustic features have the potential to convey important information about the caller to other conspecifics. A two-step cluster analysis classified female calls into bellows, snarls and tonal rejection calls. Additional results revealed that female koala vocalisations differed in their potential to provide information about a given caller’s phenotype that may be of importance to receivers. Female snarls did not contain reliable acoustic cues to the caller’s identity and age. In contrast, female bellows and tonal rejection calls were individually distinctive, and the tonal rejection calls of older female koalas had consistently lower mean, minimum and maximum fundamental frequency. In addition, female bellows were significantly shorter in duration and had higher fundamental frequency, formant frequencies, and formant frequency spacing than male bellows. These results indicate that female koala vocalisations have the potential to signal the caller’s identity, age and sex. I go on to discuss the anatomical basis for these findings, and consider the possible functional relevance of signalling this type of information in the koala’s natural habitat. PMID:26465340

  8. The Acoustic Structure and Information Content of Female Koala Vocal Signals.

    PubMed

    Charlton, Benjamin D

    2015-01-01

    Determining the information content of animal vocalisations can give valuable insights into the potential functions of vocal signals. The source-filter theory of vocal production allows researchers to examine the information content of mammal vocalisations by linking variation in acoustic features with variation in relevant physical characteristics of the caller. Here I used a source-filter theory approach to classify female koala vocalisations into different call-types, and determine which acoustic features have the potential to convey important information about the caller to other conspecifics. A two-step cluster analysis classified female calls into bellows, snarls and tonal rejection calls. Additional results revealed that female koala vocalisations differed in their potential to provide information about a given caller's phenotype that may be of importance to receivers. Female snarls did not contain reliable acoustic cues to the caller's identity and age. In contrast, female bellows and tonal rejection calls were individually distinctive, and the tonal rejection calls of older female koalas had consistently lower mean, minimum and maximum fundamental frequency. In addition, female bellows were significantly shorter in duration and had higher fundamental frequency, formant frequencies, and formant frequency spacing than male bellows. These results indicate that female koala vocalisations have the potential to signal the caller's identity, age and sex. I go on to discuss the anatomical basis for these findings, and consider the possible functional relevance of signalling this type of information in the koala's natural habitat.

  9. Gearbox fault diagnosis based on deep random forest fusion of acoustic and vibratory signals

    NASA Astrophysics Data System (ADS)

    Li, Chuan; Sanchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego; Vásquez, Rafael E.

    2016-08-01

    Fault diagnosis is an effective tool to guarantee safe operations in gearboxes. Acoustic and vibratory measurements in such mechanical devices are both sensitive to the existence of faults. This work addresses the use of a deep random forest fusion (DRFF) technique to improve fault diagnosis performance for gearboxes by using measurements of an acoustic emission (AE) sensor and an accelerometer that are used for monitoring the gearbox condition simultaneously. The statistical parameters of the wavelet packet transform (WPT) are first produced from the AE signal and the vibratory signal, respectively. Two deep Boltzmann machines (DBMs) are then developed for deep representations of the WPT statistical parameters. A random forest is finally suggested to fuse the outputs of the two DBMs as the integrated DRFF model. The proposed DRFF technique is evaluated using gearbox fault diagnosis experiments under different operational conditions, and achieves a classification rate of 97.68% for 11 different condition patterns. Compared to other peer algorithms, the addressed method exhibits the best performance. The results indicate that the deep learning fusion of acoustic and vibratory signals may improve fault diagnosis capabilities for gearboxes.
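
    A reduced sketch of the fusion pipeline: wavelet-packet energy features are extracted from the AE and vibration channels, concatenated, and classified with a random forest. The deep Boltzmann machine stage of the paper is omitted here, and the wavelet, depth, and forest size are illustrative assumptions.

```python
import numpy as np
import pywt
from sklearn.ensemble import RandomForestClassifier

def wpt_energies(x, wavelet="db4", level=3):
    """Energy of each terminal wavelet-packet node as a simple WPT feature set."""
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, maxlevel=level)
    return np.array([np.sum(node.data ** 2)
                     for node in wp.get_level(level, order="freq")])

def fused_features(ae_sig, vib_sig):
    return np.concatenate([wpt_energies(ae_sig), wpt_energies(vib_sig)])

# usage with a list of (ae, vib) recordings and integer condition labels:
# X = np.array([fused_features(ae, vib) for ae, vib in recordings])
# clf = RandomForestClassifier(n_estimators=200).fit(X, labels)
```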

  10. Moisture estimation in power transformer oil using acoustic signals and spectral kurtosis

    NASA Astrophysics Data System (ADS)

    Leite, Valéria C. M. N.; Veloso, Giscard F. C.; Borges da Silva, Luiz Eduardo; Lambert-Torres, Germano; Borges da Silva, Jonas G.; Onofre Pereira Pinto, João

    2016-03-01

    The aim of this paper is to present a new technique for estimating the contamination by moisture in power transformer insulating oil based on the spectral kurtosis analysis of the acoustic signals of partial discharges (PDs). Basically, in this approach, the spectral kurtosis of the PD acoustic signal is calculated and the correlation between its maximum value and the moisture percentage is explored to find a function that calculates the moisture percentage. The function can be easily implemented in DSP, FPGA, or any other type of embedded system for online moisture monitoring. To evaluate the proposed approach, an experiment is assembled with a piezoelectric sensor attached to a tank, which is filled with insulating oil samples contaminated by different levels of moisture. A device generating electrical discharges is submerged into the oil to simulate the occurrence of PDs. Detected acoustic signals are processed using fast kurtogram algorithm to extract spectral kurtosis values. The obtained data are used to find the fitting function that relates the water contamination to the maximum value of the spectral kurtosis. Experimental results show that the proposed method is suitable for online monitoring system of power transformers.
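
    A minimal stand-in for the kurtogram step described above: estimate the spectral kurtosis of a partial-discharge acoustic record from an STFT (kurtosis of the magnitude across time in each frequency bin), take its maximum, and relate that maximum to moisture with a simple polynomial fit. The window length and fit order are assumptions, and this is not the fast kurtogram algorithm itself.

```python
import numpy as np
from scipy.signal import stft
from scipy.stats import kurtosis

def max_spectral_kurtosis(x, fs, nperseg=256):
    f, t, Z = stft(x, fs=fs, nperseg=nperseg)
    sk = kurtosis(np.abs(Z), axis=1, fisher=True)   # kurtosis over time, per bin
    return float(np.max(sk))

# calibration against oil samples of known moisture percentage:
# sk_vals = [max_spectral_kurtosis(sig, fs) for sig in pd_signals]
# coeffs = np.polyfit(sk_vals, moisture_percent, deg=2)
# estimate = np.polyval(coeffs, max_spectral_kurtosis(new_signal, fs))
```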

  11. The Acoustic Structure and Information Content of Female Koala Vocal Signals.

    PubMed

    Charlton, Benjamin D

    2015-01-01

    Determining the information content of animal vocalisations can give valuable insights into the potential functions of vocal signals. The source-filter theory of vocal production allows researchers to examine the information content of mammal vocalisations by linking variation in acoustic features with variation in relevant physical characteristics of the caller. Here I used a source-filter theory approach to classify female koala vocalisations into different call-types, and determine which acoustic features have the potential to convey important information about the caller to other conspecifics. A two-step cluster analysis classified female calls into bellows, snarls and tonal rejection calls. Additional results revealed that female koala vocalisations differed in their potential to provide information about a given caller's phenotype that may be of importance to receivers. Female snarls did not contain reliable acoustic cues to the caller's identity and age. In contrast, female bellows and tonal rejection calls were individually distinctive, and the tonal rejection calls of older female koalas had consistently lower mean, minimum and maximum fundamental frequency. In addition, female bellows were significantly shorter in duration and had higher fundamental frequency, formant frequencies, and formant frequency spacing than male bellows. These results indicate that female koala vocalisations have the potential to signal the caller's identity, age and sex. I go on to discuss the anatomical basis for these findings, and consider the possible functional relevance of signalling this type of information in the koala's natural habitat. PMID:26465340

  12. Design Foundations for Content-Rich Acoustic Interfaces: Investigating Audemes as Referential Non-Speech Audio Cues

    ERIC Educational Resources Information Center

    Ferati, Mexhid Adem

    2012-01-01

    To access interactive systems, blind and visually impaired users can leverage their auditory senses by using non-speech sounds. The current structure of non-speech sounds, however, is geared toward conveying user interface operations (e.g., opening a file) rather than large theme-based information (e.g., a history passage) and, thus, is ill-suited…

  13. Acoustic effects of the ATOC signal (75 Hz, 195 dB) on dolphins and whales

    SciTech Connect

    Au, W.W.; Nachtigall, P.E.; Pawloski, J.L.

    1997-05-01

    The Acoustic Thermometry of Ocean Climate (ATOC) program of Scripps Institution of Oceanography and the Applied Physics Laboratory, University of Washington, will broadcast a low-frequency 75-Hz phase modulated acoustic signal over ocean basins in order to study ocean temperatures on a global scale and examine the effects of global warming. One of the major concerns is the possible effect of the ATOC signal on marine life, especially on dolphins and whales. In order to address this issue, the hearing sensitivity of a false killer whale (Pseudorca crassidens) and a Risso's dolphin (Grampus griseus) to the ATOC sound was measured behaviorally. A staircase procedure with the signal levels being changed in 1-dB steps was used to measure the animals' threshold to the actual ATOC coded signal. The results indicate that small odontocetes such as the Pseudorca and Grampus swimming directly above the ATOC source will not hear the signal unless they dive to a depth of approximately 400 m. A sound propagation analysis suggests that the sound-pressure level at ranges greater than 0.5 km will be less than 130 dB for depths down to about 500 m. Several species of baleen whales produce sounds much greater than 170–180 dB. With the ATOC source on the axis of the deep sound channel (greater than 800 m), the ATOC signal will probably have minimal physical and physiological effects on cetaceans. © 1997 Acoustical Society of America.

  14. Speech and noise levels for predicting the degree of speech security

    NASA Astrophysics Data System (ADS)

    Bradley, John S.; Gover, Bradford N.

    2005-09-01

    A meeting room is speech secure when it is difficult or impossible for an eavesdropper to overhear speech from within. The degree of security could range from less stringent conditions of being barely able to understand a few words from the meeting room, to higher levels, where transmitted speech would be completely inaudible. This paper reports on measurements to determine the statistical distribution of speech levels in meeting rooms and the distribution of ambient noise levels just outside meeting rooms. To select the required transmission characteristics for a meeting room wall, one would first decide on an acceptable level of risk, in terms of the probability of a speech security lapse occurring. This leads to the selection of a combination of a speech level in the meeting room and a noise level nearby that would occur together with this probability. The combination of appropriate estimates of meeting room speech levels and nearby ambient noise levels, together with the sound transmission characteristics of the intervening partition, makes it possible to calculate signal/noise ratio indices related to speech security [J. Acoust. Soc. Am. 116(6), 3480-3490 (2004)]. The value of these indices indicates if adequate speech security will be achieved.

  15. Quadratic Time-Frequency Analysis of Hydroacoustic Signals as Applied to Acoustic Emissions of Large Whales

    NASA Astrophysics Data System (ADS)

    Le Bras, Ronan; Victor, Sucic; Damir, Malnar; Götz, Bokelmann

    2014-05-01

    In order to enrich the set of attributes in setting up a large database of whale signals, as envisioned in the Baleakanta project, we investigate methods of time-frequency analysis. The purpose of establishing the database is to increase and refine knowledge of the emitted signal and of its propagation characteristics, leading to a better understanding of the animal migrations in a non-invasive manner and to characterize acoustic propagation in oceanic media. The higher resolution for signal extraction and a better separation from other signals and noise will be used for various purposes, including improved signal detection and individual animal identification. The quadratic class of time-frequency distributions (TFDs) is the most popular set of time-frequency tools for analysis and processing of non-stationary signals. The two best-known and most-studied members of this class are the spectrogram and the Wigner-Ville distribution. However, to be used efficiently, i.e. to have highly concentrated signal components while significantly suppressing interference and noise simultaneously, TFDs need to be optimized first. The optimization method used in this paper is based on the Cross-Wigner-Ville distribution, and unlike similar approaches it does not require prior information on the analysed signal. The method is applied to whale signals, which, just like the majority of other real-life signals, can generally be classified as multicomponent non-stationary signals, and hence time-frequency techniques are a natural choice for their representation, analysis, and processing. We present processed data from a set containing hundreds of individual calls. The TFD optimization method results in a high-resolution time-frequency representation of the signals. It allows for a simple extraction of signal components from the TFD's dominant ridges. The local peaks of those ridges can then be used to estimate the instantaneous frequency of the signal components, which in turn can be used as
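
    A simple ridge-following example on a spectrogram (the most basic member of the quadratic TFD class named above): the peak frequency of each frame gives a coarse instantaneous-frequency track for a call. The optimized cross-Wigner-Ville procedure of the paper is not reproduced here; FFT size and overlap are arbitrary choices.

```python
import numpy as np
from scipy.signal import spectrogram

def dominant_ridge(x, fs, nperseg=1024, noverlap=768):
    """Return frame times and the frequency of the dominant spectral ridge."""
    f, t, S = spectrogram(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    ridge_freq = f[np.argmax(S, axis=0)]     # peak frequency in each frame
    return t, ridge_freq                     # coarse instantaneous-frequency estimate
```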

  16. Behavioral assessment of acoustic parameters relevant to signal recognition and preference in a vocal fish.

    PubMed

    McKibben, J R; Bass, A H

    1998-12-01

    Acoustic signal recognition depends on the receiver's processing of the physical attributes of a sound. This study takes advantage of the simple communication sounds produced by plainfin midshipman fish to examine effects of signal variation on call recognition and preference. Nesting male midshipman generate both long duration (> 1 min) sinusoidal-like "hums" and short duration "grunts." The hums of neighboring males often overlap, creating beat waveforms. Presentation of humlike, single tone stimuli, but not grunts or noise, elicited robust attraction (phonotaxis) by gravid females. In two-choice tests, females differentiated and chose between acoustic signals that differed in duration, frequency, amplitude, and fine temporal content. Frequency preferences were temperature dependent, in accord with the known temperature dependence of hum fundamental frequency. Concurrent hums were simulated with two-tone beat stimuli, either presented from a single speaker or produced more naturally by interference between adjacent sources. Whereas certain single-source beats reduced stimulus attractiveness, beats which resolved into unmodulated tones at their sources did not affect preference. These results demonstrate that phonotactic assessment of stimulus relevance can be applied in a teleost fish, and that multiple signal parameters can affect receiver response in a vertebrate with relatively simple communication signals. PMID:9857511

  17. Seismo-acoustic Signals Recorded at KSIAR, the Infrasound Array Installed at PS31

    NASA Astrophysics Data System (ADS)

    Kim, T. S.; Che, I. Y.; Jeon, J. S.; Chi, H. C.; Kang, I. B.

    2014-12-01

    One of the International Monitoring System (IMS) primary seismic stations, PS31, called the Korea Seismic Research Station (KSRS), was installed near Wonju, Korea, in the 1970s. It has been operated by the US Air Force Technical Applications Center (AFTAC) for more than 40 years. KSRS is composed of 26 seismic sensors, including 19 short-period, 6 long-period, and 1 broadband seismometer. The 19 short-period sensors were used to build an array with a 10-km aperture, while the 6 long-period sensors were used for a relatively long-period array with a 40-km aperture. After KSRS was certified as an IMS station in 2006 by the Comprehensive Nuclear Test Ban Treaty Organization (CTBTO), the Korea Institute of Geoscience and Mineral Resources (KIGAM), which is the Korea National Data Center, began to take over responsibility for the operation and maintenance of KSRS from AFTAC. In April of 2014, KIGAM installed an infrasound array, KSIAR, at the existing four short-period seismic stations of KSRS, the sites KS05, KS06, KS07 and KS16. The collocated KSIAR changed KSRS from a seismic array into a seismo-acoustic array. The aperture of KSIAR is 3.3 km. KSIAR also has a 100-m small-aperture infrasound array at KS07. The infrasound data from KSIAR, except those from site KS06, are transmitted in real time to KIGAM over a VPN and internet line. An initial analysis of seismo-acoustic signals originating from local and regional distance ranges has been performed since May 2014. The analysis, using an array-processing method called Progressive Multi-Channel Correlation (PMCC), detected seismo-acoustic signals caused by various sources, including small explosions related to the construction of local tunnels and roads. Some of them were not found in KIGAM's automatic bulletin. The seismo-acoustic signals recorded by KSIAR supply useful information for discriminating local and regional man-made events from natural events.

  18. Studies of horizontal refraction and scattering of low-frequency acoustic signals using a modal approach in signal processing of NPAL data

    NASA Astrophysics Data System (ADS)

    Voronovich, Alexander G.; Ostashev, Vladimir E.

    2003-04-01

    In our previous paper [J. Acoust. Soc. Am. 112, 2232], we obtained a time dependence of the horizontal refraction angle (HRA) of acoustic signals propagating over a range of about 4000 km in the ocean. This dependence was computed by processing of acoustic signals recorded during the North Pacific Acoustic Laboratory (NPAL) experiment using a ray-type approach. In the present paper, we consider the results obtained in signal processing of the same data using a modal approach. In this approach, the acoustic field is represented as a sum of local acoustic modes with amplitudes depending on a frequency and arrival angle. We obtained a time dependence of HRA for a time interval of about a year. Time evolution of HRA exhibits long-period variations which could be associated with seasonal trends in the sound speed profiles. The results are consistent with those obtained by the ray approach. Different horizontal angles within arrivals were impossible to resolve due to sound scattering by internal waves. A theoretical estimate of the angular width of the acoustic signals in a horizontal plane was obtained. It appears to be consistent with the observed variance of HRA data. [Work supported by ONR.] a)J. A. Colosi, B. D. Cornuelle, B. D. Dushaw, M. A. Dzieciuch, B. M. Howe, J. A. Mercer, R. C. Spindel, and P. F. Worcester.

  19. Signal Processing Methods for Removing the Effects of Whole Body Vibration upon Speech

    NASA Technical Reports Server (NTRS)

    Bitner, Rachel M.; Begault, Durand R.

    2014-01-01

    Humans may be exposed to whole-body vibration in environments where clear speech communications are crucial, particularly during the launch phases of space flight and in high-performance aircraft. Prior research has shown that high levels of vibration cause a decrease in speech intelligibility. However, the effects of whole-body vibration upon speech are not well understood, and no attempt has been made to restore speech distorted by whole-body vibration. In this paper, a model for speech under whole-body vibration is proposed and a method to remove its effect is described. The method described reduces the perceptual effects of vibration, yields higher ASR accuracy scores, and may significantly improve intelligibility. Possible applications include incorporation within communication systems to improve radio communications in environments such as spaceflight, aviation, or off-road vehicle operations.

  20. Acoustic analysis of trill sounds.

    PubMed

    Dhananjaya, N; Yegnanarayana, B; Bhaskararao, Peri

    2012-04-01

    In this paper, the acoustic-phonetic characteristics of steady apical trills--trill sounds produced by the periodic vibration of the apex of the tongue--are studied. Signal processing methods, namely, zero-frequency filtering and zero-time liftering of speech signals, are used to analyze the excitation source and the resonance characteristics of the vocal tract system, respectively. Although it is natural to expect the effect of trilling on the resonances of the vocal tract system, it is interesting to note that trilling influences the glottal source of excitation as well. The excitation characteristics derived using zero-frequency filtering of speech signals are glottal epochs, strength of impulses at the glottal epochs, and instantaneous fundamental frequency of the glottal vibration. Analysis based on zero-time liftering of speech signals is used to study the dynamic resonance characteristics of the vocal tract system during the production of trill sounds. Qualitative analysis of trill sounds in different vowel contexts, and the acoustic cues that may help in spotting trills in continuous speech, are discussed. PMID:22501086
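
    A hedged sketch of zero-frequency filtering for epoch extraction, as commonly described in the literature: difference the speech, pass it twice through an ideal 0-Hz resonator, remove the slowly growing trend with a local-mean subtraction whose window (here 10 ms) should be of the order of the average pitch period, and take positive-going zero crossings as approximate glottal epochs. Window size and the two-pass trend removal are assumptions, not parameters quoted in this paper.

```python
import numpy as np
from scipy.signal import lfilter

def zff_epochs(s, fs, win_ms=10.0):
    """Approximate glottal epoch locations (sample indices) via zero-frequency filtering."""
    x = np.diff(s, prepend=s[0])                      # remove DC / slow bias
    y = lfilter([1.0], [1.0, -2.0, 1.0], x)           # first 0-Hz resonator
    y = lfilter([1.0], [1.0, -2.0, 1.0], y)           # second 0-Hz resonator
    win = max(3, int(fs * win_ms / 1000.0) | 1)       # odd window ~ pitch period
    kern = np.ones(win) / win
    for _ in range(2):                                # trend removal, two passes
        y = y - np.convolve(y, kern, mode="same")
    return np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0)) + 1
```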

  1. Generation of desired signals from acoustic drivers. [for aircraft engine internal noise propagation experiment

    NASA Technical Reports Server (NTRS)

    Ramakrishnan, R.; Salikuddin, M.; Ahuja, K. K.

    1982-01-01

    A procedure to control transient signal generation is developed for the study of internal noise propagation from aircraft engines. A simple algorithm incorporating transform techniques is used to produce signals of any desired waveform from acoustic drivers. The accurate driver response is then calculated, and from this the limiting frequency characteristics are determined and the undesirable frequencies where the driver response is poor are eliminated from the analysis. A synthesized signal is then produced by convolving the inverse of the response function with the desired signal. Although the shape of the synthesized signal is in general quite awkward, the driver generates the desired signal when this synthesized signal is fed into the driver. The results of operating the driver in two environments, in a free field and in a duct, are presented in order to show the impedance matching effect of the driver. In addition, results using a high frequency cut-off value as a parameter are presented in order to demonstrate the extent of the applicability of the synthesis procedure. It is concluded that the desired signals can be generated through the signal synthesis procedure.
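
    A minimal Python sketch of the synthesis idea described above: the desired waveform is deconvolved with the driver response in the frequency domain, bands where the response is poor or above a cut-off are discarded, and the resulting synthesized signal reproduces the desired waveform when passed through the driver. The driver response model, cut-off, and regularization are illustrative assumptions rather than measured quantities.

      import numpy as np

      fs, n = 50000, 4096
      f = np.fft.rfftfreq(n, 1.0 / fs)
      t = np.arange(n) / fs

      # Desired transient waveform: a short tone burst (illustrative).
      desired = np.sin(2 * np.pi * 2000 * t) * np.exp(-((t - 0.01) / 0.002) ** 2)

      # Assumed driver frequency response (in practice this would be measured):
      # a band-pass-like response rolling off at low and high frequencies.
      H = (f / (f + 300.0)) / (1.0 + (f / 8000.0) ** 4)

      # Deconvolve with a high-frequency cut-off and a floor on |H| so the inverse
      # does not blow up where the driver response is poor.
      D = np.fft.rfft(desired)
      eps = 1e-3 * np.max(np.abs(H))
      S = np.where((np.abs(H) > eps) & (f < 10000.0), D / (H + 1e-12), 0.0)
      synthesized = np.fft.irfft(S, n)        # drive signal fed to the acoustic driver

      # Sanity check: passing the synthesized signal through the driver model should
      # approximately reproduce the desired waveform within the retained band.
      reproduced = np.fft.irfft(np.fft.rfft(synthesized) * H, n)
      print(np.max(np.abs(reproduced - desired)))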

  2. Acoustic alarm signalling facilitates predator protection of treehoppers by mutualist ant bodyguards

    PubMed Central

    Morales, Manuel A; Barone, Jennifer L; Henry, Charles S

    2008-01-01

    Mutualism is a net positive interaction that includes varying degrees of both costs and benefits. Because tension between the costs and benefits of mutualism can lead to evolutionary instability, identifying mechanisms that regulate investment between partners is critical to understanding the evolution and maintenance of mutualism. Recently, studies have highlighted the importance of interspecific signalling as one mechanism for regulating investment between mutualist partners. Here, we provide evidence for interspecific alarm signalling in an insect protection mutualism and we demonstrate a functional link between this acoustic signalling and efficacy of protection. The treehopper Publilia concava Say (Hemiptera: Membracidae) is an insect that provides ants with a carbohydrate-rich excretion called honeydew in return for protection from predators. Adults of this species produce distinct vibrational signals in the context of predator encounters. In laboratory trials, putative alarm signal production significantly increased following initial contact with ladybeetle predators (primarily Harmonia axyridis Pallas, Coleoptera: Coccinellidae), but not following initial contact with ants. In field trials, playback of a recorded treehopper alarm signal resulted in a significant increase in both ant activity and the probability of ladybeetle discovery by ants relative to both silence and treehopper courtship signal controls. Our results show that P. concava treehoppers produce alarm signals in response to predator threat and that this signalling can increase effectiveness of predator protection by ants. PMID:18480015

  3. Acoustic signal perception in a noisy habitat: lessons from synchronising insects.

    PubMed

    Hartbauer, M; Siegert, M E; Fertschai, I; Römer, H

    2012-06-01

    Acoustically communicating animals often have to cope with ambient noise that has the potential to interfere with the perception of conspecific signals. Here we use the synchronous display of mating signals in males of the tropical katydid Mecopoda elongata in order to assess the influence of nocturnal rainforest noise on signal perception. Loud background noise may disturb chorus synchrony either by masking the signals of males or by interaction of noisy events with the song oscillator. Phase-locked synchrony of males was studied under various signal-to-noise ratios (SNRs) using either native noise or the audio component of noise (<9 kHz). Synchronous entrainment was lost at an SNR of -3 dB when native noise was used, whereas with the audio component alone, 50% of chirp periods still matched the pacer period at an SNR of -7 dB. Since the chirp period of solo singing males remained almost unaffected by noise, our results suggest that masking interference limits chorus synchrony by rendering conspecific signals ambiguous. Further, entrainment with periodic artificial signals indicates that synchrony is achieved by ignoring heterospecific signals and attending to a conspecific signal period. Additionally, the encoding of conspecific chirps was studied in an auditory neuron under the same background noise regimes.

  4. Temporal patterns in the acoustic signals of beaked whales at Cross Seamount.

    PubMed

    Johnston, D W; McDonald, M; Polovina, J; Domokos, R; Wiggins, S; Hildebrand, J

    2008-04-23

    Seamounts may influence the distribution of marine mammals through a combination of increased ocean mixing, enhanced local productivity and greater prey availability. To study the effects of seamounts on the presence and acoustic behaviour of cetaceans, we deployed a high-frequency acoustic recording package on the summit of Cross Seamount during April through October 2005. The most frequently detected cetacean vocalizations were echolocation sounds similar to those produced by ziphiid and mesoplodont beaked whales together with buzz-type signals consistent with prey-capture attempts. Beaked whale signals occurred almost entirely at night throughout the six-month deployment. Measurements of prey presence with a Simrad EK-60 fisheries acoustics echo sounder indicate that Cross Seamount may enhance local productivity in near-surface waters. Concentrations of micronekton were aggregated over the seamount in near-surface waters at night, and dense concentrations of nekton were detected across the surface of the summit. Our results suggest that seamounts may provide enhanced foraging opportunities for beaked whales during the night through a combination of increased productivity, vertical migrations by micronekton and local retention of prey. Furthermore, the summit of the seamount may act as a barrier against which whales concentrate prey. PMID:18252660

  5. Extruded Bread Classification on the Basis of Acoustic Emission Signal With Application of Artificial Neural Networks

    NASA Astrophysics Data System (ADS)

    Świetlicka, Izabela; Muszyński, Siemowit; Marzec, Agata

    2015-04-01

    The presented work covers the problem of developing a method of extruded bread classification with the application of artificial neural networks. Extruded flat graham, corn, and rye breads differing in water activity were used. The breads were subjected to the compression test with simultaneous registration of the acoustic signal. The amplitude-time records were analyzed both in the time and frequency domains. Acoustic emission signal parameters - single energy, counts, amplitude, and duration - were determined for the breads at four water activities: initial (0.362 for rye, 0.377 for corn, and 0.371 for graham bread), 0.432, 0.529, and 0.648. For classification and clustering, radial basis function (RBF) networks and self-organizing maps (Kohonen networks) were used. Artificial neural networks were examined with respect to their ability to classify or to cluster samples according to the bread type, the water activity value, and both of them. The best results were achieved by the radial basis function network in classification according to water activity (88%), while the self-organizing maps network yielded 81% during bread type clustering.
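
    A minimal Python sketch of a radial basis function network classifier of the kind used in the study, operating on feature vectors of acoustic emission descriptors (energy, counts, amplitude, duration). The synthetic data, number of centres, and kernel width are illustrative assumptions, not values from the experiments.

      import numpy as np

      rng = np.random.default_rng(0)

      # Synthetic stand-in for AE feature vectors [energy, counts, amplitude, duration]
      # from three bread types (illustrative data only).
      X = np.vstack([rng.normal(m, 0.5, size=(60, 4)) for m in (0.0, 2.0, 4.0)])
      y = np.repeat(np.arange(3), 60)

      def train_rbf(X, y, n_centers=15, sigma=1.0):
          centers = X[rng.choice(len(X), n_centers, replace=False)]
          d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
          Phi = np.exp(-d2 / (2 * sigma ** 2))            # Gaussian hidden-layer activations
          T = np.eye(y.max() + 1)[y]                      # one-hot class targets
          W, *_ = np.linalg.lstsq(Phi, T, rcond=None)     # linear output weights
          return centers, sigma, W

      def predict_rbf(model, X):
          centers, sigma, W = model
          d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
          Phi = np.exp(-d2 / (2 * sigma ** 2))
          return np.argmax(Phi @ W, axis=1)

      model = train_rbf(X, y)
      print("training accuracy:", np.mean(predict_rbf(model, X) == y))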

  6. Temporal patterns in the acoustic signals of beaked whales at Cross Seamount.

    PubMed

    Johnston, D W; McDonald, M; Polovina, J; Domokos, R; Wiggins, S; Hildebrand, J

    2008-04-23

    Seamounts may influence the distribution of marine mammals through a combination of increased ocean mixing, enhanced local productivity and greater prey availability. To study the effects of seamounts on the presence and acoustic behaviour of cetaceans, we deployed a high-frequency acoustic recording package on the summit of Cross Seamount during April through October 2005. The most frequently detected cetacean vocalizations were echolocation sounds similar to those produced by ziphiid and mesoplodont beaked whales together with buzz-type signals consistent with prey-capture attempts. Beaked whale signals occurred almost entirely at night throughout the six-month deployment. Measurements of prey presence with a Simrad EK-60 fisheries acoustics echo sounder indicate that Cross Seamount may enhance local productivity in near-surface waters. Concentrations of micronekton were aggregated over the seamount in near-surface waters at night, and dense concentrations of nekton were detected across the surface of the summit. Our results suggest that seamounts may provide enhanced foraging opportunities for beaked whales during the night through a combination of increased productivity, vertical migrations by micronekton and local retention of prey. Furthermore, the summit of the seamount may act as a barrier against which whales concentrate prey.

  7. Incident signal power comparison for localization of concurrent multiple acoustic sources.

    PubMed

    Salvati, Daniele; Canazza, Sergio

    2014-01-01

    In this paper, a method to solve the localization of concurrent multiple acoustic sources in large open spaces is presented. The problem of multisource localization in far-field conditions is to correctly associate the direction of arrival (DOA) estimated by a network array system to the same source. The use of systems implementing a Bayesian filter is a traditional approach to address the problem of localization in a multisource acoustic scenario. However, in a real noisy open space the acoustic sources are often discontinuous with numerous short-duration events, and thus the filtering methods may have difficulty tracking the multiple sources. Incident signal power comparison (ISPC) is proposed to compute the DOA association. ISPC is based on identifying the incident signal power (ISP) of the sources on a microphone array using beamforming methods and comparing the ISP between different arrays using spectral distance (SD) measurement techniques. This method solves the ambiguities, due to the presence of simultaneous sources, by identifying sounds through the minimization of an error criterion on SD measures of DOA combinations. The experimental results were conducted in an outdoor real noisy environment and the ISPC performance is reported using different beamforming techniques and SD functions. PMID:24701179
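
    A minimal Python sketch of the association step: each array steers a delay-and-sum beamformer at its candidate DOAs to obtain incident signal power spectra, and DOAs from two arrays are paired by minimizing a log-spectral distance over the possible pairings. The array geometry, test signals, and the particular spectral distance are illustrative assumptions; the paper evaluates several beamformers and SD functions.

      import itertools
      import numpy as np

      rng = np.random.default_rng(1)
      fs, n, c, d, m = 16000, 4096, 343.0, 0.1, 4           # 4 mics per array, 0.1 m spacing
      freqs = np.fft.rfftfreq(n, 1.0 / fs)

      def steering(theta):
          """Far-field inter-microphone delays (s) of a linear array for DOA theta (rad)."""
          return np.arange(m) * d * np.sin(theta) / c

      def array_signals(sources, doas):
          """Frequency-domain snapshots of one array for the given sources and DOAs."""
          out = np.zeros((m, len(freqs)), dtype=complex)
          for s, th in zip(sources, doas):
              S = np.fft.rfft(s)
              out += S[None, :] * np.exp(-2j * np.pi * freqs[None, :] * steering(th)[:, None])
          return out

      def isp_spectrum(snapshots, theta):
          """Delay-and-sum beamformer steered at theta -> incident signal power spectrum."""
          w = np.exp(2j * np.pi * freqs[None, :] * steering(theta)[:, None]) / m
          return np.abs((w * snapshots).sum(axis=0)) ** 2 + 1e-12

      def log_spectral_distance(p, q):
          return np.sqrt(np.mean((10 * np.log10(p / q)) ** 2))

      # Two sources with different spectral colouring, seen at different DOAs per array.
      s1 = np.convolve(rng.standard_normal(n), np.ones(8) / 8, mode="same")   # low-pass-ish
      s2 = np.diff(rng.standard_normal(n + 1))                                # high-pass-ish
      doas_a, doas_b = [0.3, -0.5], [0.1, 0.6]
      snap_a, snap_b = array_signals([s1, s2], doas_a), array_signals([s1, s2], doas_b)

      spec_a = [isp_spectrum(snap_a, th) for th in doas_a]
      spec_b = [isp_spectrum(snap_b, th) for th in doas_b]
      cost = np.array([[log_spectral_distance(pa, pb) for pb in spec_b] for pa in spec_a])
      best = min(itertools.permutations(range(2)),
                 key=lambda p: sum(cost[i, p[i]] for i in range(2)))
      print("DOA association (array A index -> array B index):", best)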

  8. Incident Signal Power Comparison for Localization of Concurrent Multiple Acoustic Sources

    PubMed Central

    2014-01-01

    In this paper, a method to solve the localization of concurrent multiple acoustic sources in large open spaces is presented. The problem of multisource localization in far-field conditions is to correctly associate the direction of arrival (DOA) estimated by a network array system to the same source. The use of systems implementing a Bayesian filter is a traditional approach to address the problem of localization in a multisource acoustic scenario. However, in a real noisy open space the acoustic sources are often discontinuous with numerous short-duration events, and thus the filtering methods may have difficulty tracking the multiple sources. Incident signal power comparison (ISPC) is proposed to compute the DOA association. ISPC is based on identifying the incident signal power (ISP) of the sources on a microphone array using beamforming methods and comparing the ISP between different arrays using spectral distance (SD) measurement techniques. This method solves the ambiguities, due to the presence of simultaneous sources, by identifying sounds through the minimization of an error criterion on SD measures of DOA combinations. The experimental results were conducted in an outdoor real noisy environment and the ISPC performance is reported using different beamforming techniques and SD functions. PMID:24701179

  9. Multisensor multipulse Linear Predictive Coding (LPC) analysis in noise for medium rate speech transmission

    NASA Astrophysics Data System (ADS)

    Preuss, R. D.

    1985-12-01

    The theory of multipulse linear predictive coding (LPC) analysis is extended to include the possible presence of acoustic noise, as for a telephone near a busy road. Models are developed assuming two signals are provided: the primary signal is the output of a microphone which samples the combined acoustic fields of the noise and the speech, while the secondary signal is the output of a microphone which samples the acoustic field of the noise alone. Analysis techniques to extract the multipulse LPC parameters from these two signals are developed; these techniques are developed as approximations to maximum likelihood analysis for the given model.
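
    A minimal Python sketch of one simple way to exploit the secondary (noise-only) microphone in LPC analysis: subtract the noise autocorrelation from the primary-channel autocorrelation before the Levinson-Durbin recursion. This textbook noise-compensated LPC scheme is shown only for illustration; it is not the approximate maximum-likelihood multipulse analysis developed in the paper.

      import numpy as np

      def autocorr(x, order):
          r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
          return r / len(x)

      def levinson(r, order):
          """Levinson-Durbin recursion: LPC coefficients a (a[0] = 1) from autocorrelation r."""
          a = np.zeros(order + 1)
          a[0] = 1.0
          err = r[0]
          for i in range(1, order + 1):
              k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
              a_prev = a[:i][::-1].copy()              # [a_{i-1}, ..., a_0]
              a[1:i + 1] = a[1:i + 1] + k * a_prev
              err *= (1.0 - k * k)
          return a, err

      fs, order = 8000, 10
      t = np.arange(0, 0.032, 1.0 / fs)
      rng = np.random.default_rng(2)
      speech = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)  # toy frame
      noise = 0.8 * rng.standard_normal(len(t))
      primary = speech + noise          # microphone sampling speech plus noise
      secondary = noise                 # microphone sampling the noise field alone

      # Compensated autocorrelation: subtract the noise-only estimate before LPC analysis.
      r_comp = autocorr(primary, order) - autocorr(secondary, order)
      r_comp[0] = max(r_comp[0], 1e-6)  # keep the zero-lag term positive
      a, err = levinson(r_comp, order)
      print("noise-compensated LPC coefficients:", np.round(a, 3))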

  10. Brain-Computer Interfaces for Speech Communication

    PubMed Central

    Brumberg, Jonathan S.; Nieto-Castanon, Alfonso; Kennedy, Philip R.; Guenther, Frank H.

    2010-01-01

    This paper briefly reviews current silent speech methodologies for normal and disabled individuals. Current techniques utilizing electromyographic (EMG) recordings of vocal tract movements are useful for physically healthy individuals but fail for tetraplegic individuals who do not have accurate voluntary control over the speech articulators. Alternative methods utilizing EMG from other body parts (e.g., hand, arm, or facial muscles) or electroencephalography (EEG) can provide capable silent communication to severely paralyzed users, though current interfaces are extremely slow relative to normal conversation rates and require constant attention to a computer screen that provides visual feedback and/or cueing. We present a novel approach to the problem of silent speech via an intracortical microelectrode brain computer interface (BCI) to predict intended speech information directly from the activity of neurons involved in speech production. The predicted speech is synthesized and acoustically fed back to the user with a delay under 50 ms. We demonstrate that the Neurotrophic Electrode used in the BCI is capable of providing useful neural recordings for over 4 years, a necessary property for BCIs that need to remain viable over the lifespan of the user. Other design considerations include neural decoding techniques based on previous research involving BCIs for computer cursor or robotic arm control via prediction of intended movement kinematics from motor cortical signals in monkeys and humans. Initial results from a study of continuous speech production with instantaneous acoustic feedback show the BCI user was able to improve his control over an artificial speech synthesizer both within and across recording sessions. The success of this initial trial validates the potential of the intracortical microelectrode-based approach for providing a speech prosthesis that can allow much more rapid communication rates. PMID:20204164

  11. Brain-Computer Interfaces for Speech Communication.

    PubMed

    Brumberg, Jonathan S; Nieto-Castanon, Alfonso; Kennedy, Philip R; Guenther, Frank H

    2010-04-01

    This paper briefly reviews current silent speech methodologies for normal and disabled individuals. Current techniques utilizing electromyographic (EMG) recordings of vocal tract movements are useful for physically healthy individuals but fail for tetraplegic individuals who do not have accurate voluntary control over the speech articulators. Alternative methods utilizing EMG from other body parts (e.g., hand, arm, or facial muscles) or electroencephalography (EEG) can provide capable silent communication to severely paralyzed users, though current interfaces are extremely slow relative to normal conversation rates and require constant attention to a computer screen that provides visual feedback and/or cueing. We present a novel approach to the problem of silent speech via an intracortical microelectrode brain computer interface (BCI) to predict intended speech information directly from the activity of neurons involved in speech production. The predicted speech is synthesized and acoustically fed back to the user with a delay under 50 ms. We demonstrate that the Neurotrophic Electrode used in the BCI is capable of providing useful neural recordings for over 4 years, a necessary property for BCIs that need to remain viable over the lifespan of the user. Other design considerations include neural decoding techniques based on previous research involving BCIs for computer cursor or robotic arm control via prediction of intended movement kinematics from motor cortical signals in monkeys and humans. Initial results from a study of continuous speech production with instantaneous acoustic feedback show the BCI user was able to improve his control over an artificial speech synthesizer both within and across recording sessions. The success of this initial trial validates the potential of the intracortical microelectrode-based approach for providing a speech prosthesis that can allow much more rapid communication rates.

  12. Acoustics in educational settings. Subcommittee on Acoustics in Educational Settings of the Bioacoustics Standards and Noise Standards Committee American Speech-Language-Hearing Association.

    PubMed

    1995-03-01

    The Americans with Disabilities Act (enacted July 26, 1990) has brought into focus the need for removing barriers and improving accessibility of all buildings and facilities. It is clear that the definition of barrier must be expanded to include not only structural features that limit physical accessibility, but also acoustical barriers that limit access to communication and information. Acoustical interference caused by inappropriate levels of background noise and reverberation presents a barrier to learning and communication in educational settings and school-sponsored extracurricular activities, particularly for students with hearing loss or other language/learning concerns. ASHA has provided these guidelines and acoustical improvement strategies in order to assist communication-related professionals, teachers, school officials, architects, contractors, state education agencies, and others in developing the best possible learning environment for all students. Additional research on both the acoustical characteristics of learning environments and the communication requirements of learners is encouraged. PMID:7696882

  13. Demodulation of acoustic telemetry binary phase shift keying signal based on high-order Duffing system

    NASA Astrophysics Data System (ADS)

    Yan, Bing-Nan; Liu, Chong-Xin; Ni, Jun-Kang; Zhao, Liang

    2016-10-01

    In order to grasp the downhole situation immediately, logging while drilling (LWD) technology is adopted. One of the LWD technologies, called acoustic telemetry, can be successfully applied to modern drilling. It is critical for acoustic telemetry technology that the signal is successfully transmitted to the ground. In this paper, binary phase shift keying (BPSK) is used to modulate carrier waves for the transmission and a new BPSK demodulation scheme based on Duffing chaos is investigated. Firstly, a high-order system is given in order to enhance the signal detection capability and it is realized through building a virtual circuit using an electronic workbench (EWB). Secondly, a new BPSK demodulation scheme is proposed based on the intermittent chaos phenomena of the new Duffing system. Finally, a system variable crossing zero-point equidistance method is proposed to obtain the phase difference between the system and the BPSK signal. It is then determined whether the digital signal transmitted from the bottom of the well is ‘0’ or ‘1’. The simulation results show that the demodulation method is feasible. Project supported by the National Natural Science Foundation of China (Grant No. 51177117) and the National Key Science & Technology Special Projects, China (Grant No. 2011ZX05021-005).
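
    A minimal Python sketch of the detection principle behind Duffing-based BPSK demodulators: a Duffing oscillator driven near its chaotic-to-periodic threshold behaves differently depending on whether a weak sinusoid added to the reference drive is in phase or in antiphase with it, and a spread statistic of the Poincare section can serve as the state indicator. The second-order equation, parameter values, and the statistic below are illustrative assumptions that would need calibration; the paper uses a higher-order system and a zero-crossing equidistance criterion.

      import numpy as np

      def duffing_poincare_spread(sig_amp, phase, f_ref=0.826, k=0.5, omega=1.0,
                                  n_periods=200, transient=100, steps_per_period=314):
          """Integrate x'' + k*x' - x + x^3 = f_ref*cos(w*t) + sig_amp*cos(w*t + phase)
          with RK4 and return the spread of the Poincare section (x sampled once per
          drive period).  A small spread indicates a periodic state, a large spread a
          chaotic one; the decision threshold must be calibrated for the chosen f_ref."""
          period = 2.0 * np.pi / omega
          dt = period / steps_per_period
          def deriv(state, t):
              x, v = state
              force = f_ref * np.cos(omega * t) + sig_amp * np.cos(omega * t + phase)
              return np.array([v, -k * v + x - x ** 3 + force])
          state, t, section = np.array([0.1, 0.0]), 0.0, []
          for p in range(n_periods):
              for _ in range(steps_per_period):
                  k1 = deriv(state, t)
                  k2 = deriv(state + 0.5 * dt * k1, t + 0.5 * dt)
                  k3 = deriv(state + 0.5 * dt * k2, t + 0.5 * dt)
                  k4 = deriv(state + dt * k3, t + dt)
                  state = state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
                  t += dt
              if p >= transient:
                  section.append(state[0])
          return np.std(section)

      # A weak BPSK chip either in phase (bit '1') or in antiphase (bit '0') with the
      # internal reference drive shifts the effective forcing and hence the oscillator state.
      print("in phase: ", duffing_poincare_spread(0.02, 0.0))
      print("antiphase:", duffing_poincare_spread(0.02, np.pi))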

  14. A hardware model of the auditory periphery to transduce acoustic signals into neural activity

    PubMed Central

    Tateno, Takashi; Nishikawa, Jun; Tsuchioka, Nobuyoshi; Shintaku, Hirofumi; Kawano, Satoyuki

    2013-01-01

    To improve the performance of cochlear implants, we have integrated a microdevice into a model of the auditory periphery with the goal of creating a microprocessor. We constructed an artificial peripheral auditory system using a hybrid model in which polyvinylidene difluoride was used as a piezoelectric sensor to convert mechanical stimuli into electric signals. To produce frequency selectivity, the slit on a stainless steel base plate was designed such that the local resonance frequency of the membrane over the slit reflected the transfer function. In the acoustic sensor, electric signals were generated based on the piezoelectric effect from local stress in the membrane. The electrodes on the resonating plate produced relatively large electric output signals. The signals were fed into a computer model that mimicked some functions of inner hair cells, inner hair cell–auditory nerve synapses, and auditory nerve fibers. In general, the responses of the model to pure-tone burst and complex stimuli accurately represented the discharge rates of high-spontaneous-rate auditory nerve fibers across a range of frequencies greater than 1 kHz and middle to high sound pressure levels. Thus, the model provides a tool to understand information processing in the peripheral auditory system and a basic design for connecting artificial acoustic sensors to the peripheral auditory nervous system. Finally, we discuss the need for stimulus control with an appropriate model of the auditory periphery based on auditory brainstem responses that were electrically evoked by different temporal pulse patterns with the same pulse number. PMID:24324432

  15. A hardware model of the auditory periphery to transduce acoustic signals into neural activity.

    PubMed

    Tateno, Takashi; Nishikawa, Jun; Tsuchioka, Nobuyoshi; Shintaku, Hirofumi; Kawano, Satoyuki

    2013-01-01

    To improve the performance of cochlear implants, we have integrated a microdevice into a model of the auditory periphery with the goal of creating a microprocessor. We constructed an artificial peripheral auditory system using a hybrid model in which polyvinylidene difluoride was used as a piezoelectric sensor to convert mechanical stimuli into electric signals. To produce frequency selectivity, the slit on a stainless steel base plate was designed such that the local resonance frequency of the membrane over the slit reflected the transfer function. In the acoustic sensor, electric signals were generated based on the piezoelectric effect from local stress in the membrane. The electrodes on the resonating plate produced relatively large electric output signals. The signals were fed into a computer model that mimicked some functions of inner hair cells, inner hair cell-auditory nerve synapses, and auditory nerve fibers. In general, the responses of the model to pure-tone burst and complex stimuli accurately represented the discharge rates of high-spontaneous-rate auditory nerve fibers across a range of frequencies greater than 1 kHz and middle to high sound pressure levels. Thus, the model provides a tool to understand information processing in the peripheral auditory system and a basic design for connecting artificial acoustic sensors to the peripheral auditory nervous system. Finally, we discuss the need for stimulus control with an appropriate model of the auditory periphery based on auditory brainstem responses that were electrically evoked by different temporal pulse patterns with the same pulse number. PMID:24324432

  16. Wavelet Transform Of Acoustic Signal From A Ranque- Hilsch Vortex Tube

    NASA Astrophysics Data System (ADS)

    Istihat, Y.; Wisnoe, W.

    2015-09-01

    This paper presents the frequency analysis of flow in a Ranque-Hilsch Vortex Tube (RHVT) obtained from the acoustic signal recorded by microphones in an isolated formation setup. A Data Acquisition System (DAS) that incorporates an Analog to Digital Converter (ADC) with a laptop computer has been used to acquire the wave data. Different inlet pressures (20, 30, 40, 50 and 60 psi) are supplied and temperature differences are recorded. Frequencies produced by the RHVT are experimentally measured and analyzed by means of the Wavelet Transform (WT). A Morlet wavelet is used, and the relation between pressure variation, temperature, and frequency is studied. The acoustic data have been analyzed using Matlab® and a time-frequency analysis (scalogram) is presented. Results show that pressure is proportional to frequency inside the RHVT, with two distinct working frequencies pronounced between 4 and 8 kHz.
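
    A minimal Python sketch of a Morlet-wavelet scalogram computation of the kind described, using scipy.signal.cwt (available in SciPy releases before 1.15; PyWavelets offers an equivalent). The synthetic test signal and analysis band are illustrative assumptions standing in for the recorded microphone data.

      import numpy as np
      from scipy.signal import cwt, morlet2

      fs = 44100
      t = np.arange(0, 0.2, 1.0 / fs)
      # Toy stand-in for a microphone record: two tones in the 4-8 kHz band plus noise.
      sig = (np.sin(2 * np.pi * 4500 * t) + 0.7 * np.sin(2 * np.pi * 7200 * t)
             + 0.3 * np.random.default_rng(3).standard_normal(len(t)))

      w = 6.0                                        # Morlet centre-frequency parameter
      freqs = np.linspace(1000, 10000, 120)          # analysis frequencies, Hz
      widths = w * fs / (2 * np.pi * freqs)          # wavelet scales for those frequencies
      scalogram = np.abs(cwt(sig, morlet2, widths, w=w))   # |CWT|, shape (freqs, samples)

      # The ridge of the time-averaged scalogram picks out the dominant working frequency.
      ridge = freqs[np.argmax(scalogram.mean(axis=1))]
      print("dominant frequency (Hz):", ridge)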

  17. Study of Doppler Shift Correction for Underwater Acoustic Communication Using Orthogonal Signal Division Multiplexing

    NASA Astrophysics Data System (ADS)

    Ebihara, Tadashi; Mizutani, Keiichi

    2011-07-01

    In this study, we apply Doppler shift correction schemes to underwater acoustic (UWA) communication with orthogonal signal division multiplexing (OSDM) to achieve stable communication in underwater acoustic channels. Three Doppler correction schemes, which exploit the guard interval, are applied to UWA communication with OSDM and evaluated in simulations. Through a simulation in which only the Doppler effect is considered, we confirmed that by adapting these schemes to UWA communication with OSDM, we can correct large Doppler shifts, covering the usual speeds of vehicles and ships. Moreover, by considering both the Doppler effect and channel reverberation, we propose the best possible combination of Doppler correction schemes for UWA communication with OSDM. The results suggest that UWA communication with OSDM may achieve high-quality communication when both channel reverberation and large Doppler shifts are taken into account.

  18. The effect of artificial rain on backscattered acoustic signal: first measurements

    NASA Astrophysics Data System (ADS)

    Titchenko, Yuriy; Karaev, Vladimir; Meshkov, Evgeny; Goldblat, Vladimir

    The influence of rain on the characteristics of ultrasonic and microwave signals backscattered by the water surface is considered. The influence of rain on the backscattering of electromagnetic waves has been investigated in laboratory and field experiments, for example [1-3]. Raindrops have a significant impact on microwave backscattering and degrade the accuracy of wave spectrum measurements made with a string wave gauge. This occurs because of the presence of raindrops in the atmosphere and the modification of the water surface. For measurements of water surface characteristics during precipitation we propose to use an acoustic system. This allows us to obtain water surface parameters independently of precipitation in the atmosphere. Measurements of the significant wave height of the water surface using underwater acoustic systems are well known [4, 5]. Moreover, the variance of the orbital velocity can be measured using these systems. However, these methods cannot be used to measure the slope variance and the other second statistical moments of the water surface that are required for analyzing the radar backscatter signal. An original Doppler underwater acoustic wave gauge allows direct measurement of the surface roughness characteristics that affect the backscattering of electromagnetic waves of the same wavelength [6]. The acoustic wave gauge is a Doppler ultrasonic sonar fixed near the bottom on a floating disk. Measurements are carried out with the sonar antennas oriented vertically towards the water surface. The first experiments were conducted with the first model of the acoustic wave gauge. The acoustic wave gauge (8 mm wavelength) is equipped with a transceiving antenna with a wide symmetrical antenna pattern. The gauge allows us to measure the Doppler spectrum and the cross section of the backscattered signal. The variance of the vertical component of the orbital velocity can be retrieved from the Doppler spectrum with high accuracy. The results of laboratory and field experiments during artificial rain are presented.

  19. Classroom acoustics: Three pilot studies

    NASA Astrophysics Data System (ADS)

    Smaldino, Joseph J.

    2005-04-01

    This paper summarizes three related pilot projects designed to focus on the possible effects of classroom acoustics on fine auditory discrimination as it relates to language acquisition, especially English as a second language. The first study investigated the influence of improving the signal-to-noise ratio on the differentiation of English phonemes. The results showed better differentiation with better signal-to-noise ratio. The second studied speech perception in noise by young adults for whom English was a second language. The outcome indicated that the second language learners required a better signal-to-noise ratio to perform equally to the native language participants. The last study surveyed the acoustic conditions of preschool and day care classrooms, wherein first and second language learning occurs. The survey suggested an unfavorable acoustic environment for language learning.

  20. A Novel Method for Speech Acquisition and Enhancement by 94 GHz Millimeter-Wave Sensor.

    PubMed

    Chen, Fuming; Li, Sheng; Li, Chuantao; Liu, Miao; Li, Zhao; Xue, Huijun; Jing, Xijing; Wang, Jianqi

    2015-12-31

    In order to improve the speech acquisition ability of a non-contact method, a 94 GHz millimeter wave (MMW) radar sensor was employed to detect speech signals. This novel non-contact speech acquisition method was shown to have high directional sensitivity, and to be immune to strong acoustical disturbance. However, MMW radar speech is often degraded by combined sources of noise, which mainly include harmonic, electrical circuit and channel noise. In this paper, an algorithm combining empirical mode decomposition (EMD) and mutual information entropy (MIE) was proposed for enhancing the perceptibility and intelligibility of radar speech. Firstly, the radar speech signal was adaptively decomposed into oscillatory components called intrinsic mode functions (IMFs) by EMD. Secondly, MIE was used to determine the number of reconstructive components, and then an adaptive threshold was employed to remove the noise from the radar speech. The experimental results show that human speech can be effectively acquired by a 94 GHz MMW radar sensor when the detection distance is 20 m. Moreover, the noise of the radar speech is greatly suppressed and the speech sounds become more pleasant to human listeners after being enhanced by the proposed algorithm, suggesting that this novel speech acquisition and enhancement method will provide a promising alternative for various applications associated with speech detection.
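
    A minimal Python sketch of the EMD-based denoising idea: decompose the signal into intrinsic mode functions, use a mutual-information measure between consecutive IMFs to choose how many of the leading, noise-dominated IMFs to discard, and reconstruct from the rest. It uses the third-party PyEMD package (pip install EMD-signal); the histogram mutual-information estimator, the selection rule, and the test signal are illustrative assumptions rather than the exact algorithm of the paper.

      import numpy as np
      from PyEMD import EMD          # pip install EMD-signal

      def mutual_information(x, y, bins=64):
          """Histogram-based mutual information estimate between two signals."""
          pxy, _, _ = np.histogram2d(x, y, bins=bins, density=True)
          pxy = pxy / pxy.sum()
          px = pxy.sum(axis=1, keepdims=True)
          py = pxy.sum(axis=0, keepdims=True)
          nz = pxy > 0
          return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))

      fs = 8000
      t = np.arange(0, 1.0, 1.0 / fs)
      clean = np.sin(2 * np.pi * 200 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))   # toy "speech"
      noisy = clean + 0.4 * np.random.default_rng(4).standard_normal(len(t))        # noisy record

      imfs = EMD()(noisy)                                    # intrinsic mode functions
      mi = [mutual_information(imfs[i], imfs[i + 1]) for i in range(len(imfs) - 1)]
      cut = int(np.argmin(mi)) + 1      # assumed boundary: noise-dominated vs. signal IMFs
      enhanced = imfs[cut:].sum(axis=0) # reconstruct from the signal-dominated IMFs

      print("IMFs:", len(imfs), " discarded:", cut,
            " SNR gain (dB):", 10 * np.log10(np.mean((noisy - clean) ** 2)
                                             / np.mean((enhanced - clean) ** 2)))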

  1. A Novel Method for Speech Acquisition and Enhancement by 94 GHz Millimeter-Wave Sensor

    PubMed Central

    Chen, Fuming; Li, Sheng; Li, Chuantao; Liu, Miao; Li, Zhao; Xue, Huijun; Jing, Xijing; Wang, Jianqi

    2015-01-01

    In order to improve the speech acquisition ability of a non-contact method, a 94 GHz millimeter wave (MMW) radar sensor was employed to detect speech signals. This novel non-contact speech acquisition method was shown to have high directional sensitivity, and to be immune to strong acoustical disturbance. However, MMW radar speech is often degraded by combined sources of noise, which mainly include harmonic, electrical circuit and channel noise. In this paper, an algorithm combining empirical mode decomposition (EMD) and mutual information entropy (MIE) was proposed for enhancing the perceptibility and intelligibility of radar speech. Firstly, the radar speech signal was adaptively decomposed into oscillatory components called intrinsic mode functions (IMFs) by EMD. Secondly, MIE was used to determine the number of reconstructive components, and then an adaptive threshold was employed to remove the noise from the radar speech. The experimental results show that human speech can be effectively acquired by a 94 GHz MMW radar sensor when the detection distance is 20 m. Moreover, the noise of the radar speech is greatly suppressed and the speech sounds become more pleasant to human listeners after being enhanced by the proposed algorithm, suggesting that this novel speech acquisition and enhancement method will provide a promising alternative for various applications associated with speech detection. PMID:26729126

  2. Perceptual organization of speech signals by children with and without dyslexia

    PubMed Central

    Nittrouer, Susan; Lowenstein, Joanna H.

    2013-01-01

    Developmental dyslexia is a condition in which children encounter difficulty learning to read in spite of adequate instruction. Although considerable effort has been expended trying to identify the source of the problem, no single solution has been agreed upon. The current study explored a new hypothesis, that developmental dyslexia may be due to faulty perceptual organization of linguistically relevant sensory input. To test that idea, sentence-length speech signals were processed to create either sine-wave or noise-vocoded analogs. Seventy children between 8 and 11 years of age, with and without dyslexia, participated. Children with dyslexia were selected to have phonological awareness deficits, although those without such deficits were retained in the study. The processed sentences were presented for recognition, and measures of reading, phonological awareness, and expressive vocabulary were collected. Results showed that children with dyslexia, regardless of phonological subtype, had poorer recognition scores than children without dyslexia for both kinds of degraded sentences. Older children with dyslexia recognized the sine-wave sentences better than younger children with dyslexia, but no such effect of age was found for the vocoded materials. Recognition scores were used as predictor variables in regression analyses with reading, phonological awareness, and vocabulary measures used as dependent variables. Scores for both sorts of sentence materials were strong predictors of performance on all three dependent measures when all children were included, but only performance for the sine-wave materials explained significant proportions of variance when only children with dyslexia were included. Finally, matching young, typical readers with older children with dyslexia on reading abilities did not mitigate the group difference in recognition of vocoded sentences. Conclusions were that children with dyslexia have difficulty organizing linguistically relevant sensory input.

  3. Phylogenetic signal in the acoustic parameters of the advertisement calls of four clades of anurans

    PubMed Central

    2013-01-01

    Background: Anuran vocalizations, especially their advertisement calls, are largely species-specific and can be used to identify taxonomic affiliations. Because anurans are not vocal learners, their vocalizations are generally assumed to have a strong genetic component. This suggests that the degree of similarity between advertisement calls may be related to large-scale phylogenetic relationships. To test this hypothesis, advertisement calls from 90 species belonging to four large clades (Bufo, Hylinae, Leptodactylus, and Rana) were analyzed. Phylogenetic distances were estimated based on the DNA sequences of the 12S mitochondrial ribosomal RNA gene, and, for a subset of 49 species, on the rhodopsin gene. Mean values for five acoustic parameters (coefficient of variation of root-mean-square amplitude, dominant frequency, spectral flux, spectral irregularity, and spectral flatness) were computed for each species. We then tested for phylogenetic signal on the body-size-corrected residuals of these five parameters, using three statistical tests (Moran’s I, Mantel, and Blomberg’s K) and three models of genetic distance (pairwise distances, Abouheif’s proximities, and the variance-covariance matrix derived from the phylogenetic tree). Results: A significant phylogenetic signal was detected for most acoustic parameters on the 12S dataset, across statistical tests and genetic distance models, both for the entire sample of 90 species and within clades in several cases. A further analysis on a subset of 49 species using genetic distances derived from rhodopsin and from 12S broadly confirmed the results obtained on the larger sample, indicating that the phylogenetic signals observed in these acoustic parameters can be detected using a variety of genetic distance models derived either from a variable mitochondrial sequence or from a conserved nuclear gene. Conclusions: We found a robust relationship, in a large number of species, between anuran phylogenetic relatedness and

  4. Cortical processing of degraded speech sounds: effects of distortion type and continuity.

    PubMed

    Miettinen, Ismo; Alku, Paavo; Yrttiaho, Santeri; May, Patrick J C; Tiitinen, Hannu

    2012-04-01

    Human speech perception is highly resilient to acoustic distortions. In addition to distortions from external sound sources, degradation of the acoustic structure of the sound itself can substantially reduce the intelligibility of speech. The degradation of the internal structure of speech happens, for example, when the digital representation of the signal is impoverished by reducing its amplitude resolution. Further, the perception of speech is also influenced by whether the distortion is transient, coinciding with speech, or is heard continuously in the background. However, the complex effects of the acoustic structure and continuity of the distortion on the cortical processing of degraded speech are unclear. In the present magnetoencephalography study, we investigated how the cortical processing of degraded speech sounds as measured through the auditory N1m response is affected by variation of both the distortion type (internal, external) and the continuity of distortion (transient, continuous). We found that when the distortion was continuous, the N1m was significantly delayed, regardless of the type of distortion. The N1m amplitude, in turn, was affected only when speech sounds were degraded with transient internal distortion, which resulted in larger response amplitudes. The results suggest that external and internal distortions of speech result in divergent patterns of activity in the auditory cortex, and that the effects are modulated by the temporal continuity of the distortion.

  5. Estimates of the prevalence of anomalous signal losses in the Yellow Sea derived from acoustic and oceanographic computer model simulations

    NASA Astrophysics Data System (ADS)

    Chin-Bing, Stanley A.; King, David B.; Warn-Varnas, Alex C.; Lamb, Kevin G.; Hawkins, James A.; Teixeira, Marvi

    2002-05-01

    The results from collocated oceanographic and acoustic simulations in a region of the Yellow Sea near the Shandong peninsula have been presented [Chin-Bing et al., J. Acoust. Soc. Am. 108, 2577 (2000)]. In that work, the tidal flow near the peninsula was used to initialize a 2.5-dimensional ocean model [K. G. Lamb, J. Geophys. Res. 99, 843-864 (1994)] that subsequently generated internal solitary waves (solitons). The validity of these soliton simulations was established by matching satellite imagery taken over the region. Acoustic propagation simulations through this soliton field produced results similar to the anomalous signal loss measured by Zhou, Zhang, and Rogers [J. Acoust. Soc. Am. 90, 2042-2054 (1991)]. Analysis of the acoustic interactions with the solitons also confirmed the hypothesis that the loss mechanism involved acoustic mode coupling. Recently we have attempted to estimate the prevalence of these anomalous signal losses in this region. These estimates were made from simulating acoustic effects over an 80 hour space-time evolution of soliton packets. Examples will be presented that suggest the conditions necessary for anomalous signal loss may be more prevalent than previously thought. [Work supported by ONR/NRL and by a High Performance Computing DoD grant.]

  6. Influence of attenuation on acoustic emission signals in carbon fiber reinforced polymer panels.

    PubMed

    Asamene, Kassahun; Hudson, Larry; Sundaresan, Mannur

    2015-05-01

    Influence of attenuation on acoustic emission (AE) signals in Carbon Fiber Reinforced Polymer (CFRP) crossply and quasi-isotropic panels is examined in this paper. Attenuation coefficients of the fundamental antisymmetric (A0) and symmetric (S0) wave modes were determined experimentally along different directions for the two types of CFRP panels. In the frequency range from 100 kHz to 500 kHz, the A0 mode undergoes significantly greater changes due to material related attenuation compared to the S0 mode. Moderate to strong changes in the attenuation levels were noted with propagation directions. Such mode and frequency dependent attenuation introduces major changes in the characteristics of AE signals depending on the position of the AE sensor relative to the source. Results from finite element simulations of a microscopic damage event in the composite laminates are used to illustrate attenuation related changes in modal and frequency components of AE signals.

  7. Speaker specificity in speech perception: the importance of what is and is not in the signal

    NASA Astrophysics Data System (ADS)

    Dahan, Delphine; Scarborough, Rebecca A.

    2005-09-01

    In some American English dialects, /ae/ before /g/ (but not before /k/) raises to a vowel approaching [E], in effect reducing phonetic overlap between (e.g.) ``bag'' and ``back.'' Here, participants saw four written words on a computer screen (e.g., ``bag,'' ``back,'' ``dog,'' ``dock'') and heard a spoken word. Their task was to indicate which word they heard. Participants' eye movements to the written words were recorded. Participants in the ``ae-raising'' group heard identity-spliced ``bag''-like words containing the raised vowel [E]; participants in the ``control'' group heard cross-spliced ``bag''-like words containing standard [ae]. Acoustically identical ``back''-like words were subsequently presented to both groups. The ae-raising-group participants identified ``back''-like words faster and more accurately, and made fewer fixations to the competitor ``bag,'' than control-group participants did. Thus, exposure to ae-raised realizations of ``bag'' facilitated the identification of ``back'' because of the reduced fit between the input and the altered representation of the competing hypothesis ``bag.'' This demonstrates that listeners evaluate the spoken input with respect to what is, but also what is not, in the signal, and that this evaluation involves speaker-specific representations. [Work supported by NSF Human and Social Dynamics 0433567.]

  8. Comparison of a short-time speech-based intelligibility metric to the speech transmission index and intelligibility data

    PubMed Central

    Payton, Karen L.; Shrestha, Mona

    2013-01-01

    Several algorithms have been shown to generate a metric corresponding to the Speech Transmission Index (STI) using speech as a probe stimulus [e.g., Goldsworthy and Greenberg, J. Acoust. Soc. Am. 116, 3679–3689 (2004)]. The time-domain approaches work well on long speech segments and have the added potential to be used for short-time analysis. This study investigates the performance of the Envelope Regression (ER) time-domain STI method as a function of window length, in acoustically degraded environments with multiple talkers and speaking styles. The ER method is compared with a short-time Theoretical STI, derived from octave-band signal-to-noise ratios and reverberation times. For windows as short as 0.3 s, the ER method tracks short-time Theoretical STI changes in stationary speech-shaped noise, fluctuating restaurant babble and stationary noise plus reverberation. The metric is also compared to intelligibility scores on conversational speech and speech articulated clearly but at normal speaking rates (Clear/Norm) in stationary noise. Correlation between the metric and intelligibility scores is high and, consistent with the subject scores, the metrics are higher for Clear/Norm speech than for conversational speech and higher for the first word in a sentence than for the last word. PMID:24180791
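
    A minimal Python sketch of the general envelope-regression idea behind speech-based STI metrics: band-filter the clean probe and the degraded speech, extract intensity envelopes, regress the degraded envelope on the clean one to obtain an apparent modulation transfer and hence an apparent signal-to-noise ratio per band, and map the band values to a transmission index. The filter bank, the regression-to-SNR mapping, and the equal band weights are simplified illustrative assumptions, not the exact ER formulation evaluated in the paper.

      import numpy as np
      from scipy.signal import butter, sosfiltfilt, hilbert

      def band_intensity_envelope(x, fs, f_lo, f_hi, env_cut=50.0):
          sos = butter(4, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
          intensity = np.abs(hilbert(sosfiltfilt(sos, x))) ** 2
          sos_env = butter(4, env_cut, btype="lowpass", fs=fs, output="sos")
          return np.clip(sosfiltfilt(sos_env, intensity), 0.0, None)

      def envelope_regression_sti(clean, degraded, fs,
                                  centers=(125, 250, 500, 1000, 2000, 4000)):
          tis = []
          for fc in centers:
              x = band_intensity_envelope(clean, fs, fc / np.sqrt(2), fc * np.sqrt(2))
              y = band_intensity_envelope(degraded, fs, fc / np.sqrt(2), fc * np.sqrt(2))
              slope = np.dot(x - x.mean(), y - y.mean()) / np.dot(x - x.mean(), x - x.mean())
              m = np.clip(slope * x.mean() / y.mean(), 1e-4, 1 - 1e-4)   # apparent MTF
              snr = np.clip(10 * np.log10(m / (1 - m)), -15.0, 15.0)     # apparent SNR (dB)
              tis.append((snr + 15.0) / 30.0)                            # transmission index
          return float(np.mean(tis))    # equal band weights for simplicity

      fs = 16000
      t = np.arange(0, 2.0, 1.0 / fs)
      rng = np.random.default_rng(5)
      # Toy "speech": a noise carrier with slow multiplicative modulation.
      clean = rng.standard_normal(len(t)) * (1 + 0.8 * np.sin(2 * np.pi * 4 * t))
      degraded = clean + 1.0 * rng.standard_normal(len(t))               # additive-noise channel
      print("ER-style STI estimate:", round(envelope_regression_sti(clean, degraded, fs), 3))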

  9. Pipe wall damage detection by electromagnetic acoustic transducer generated guided waves in absence of defect signals.

    PubMed

    Vasiljevic, Milos; Kundu, Tribikram; Grill, Wolfgang; Twerdowski, Evgeny

    2008-05-01

    Most investigators emphasize the importance of detecting the reflected signal from the defect to determine if the pipe wall has any damage and to predict the damage location. However, often the small signal from the defect is hidden behind the other arriving wave modes and signal noise. To overcome the difficulties associated with the identification of the small defect signal in the time history plots, in this paper the time history is analyzed well after the arrival of the first defect signal, and after different wave modes have propagated multiple times through the pipe. It is shown that the defective pipe can be clearly identified by analyzing these late arriving diffuse ultrasonic signals. Multiple reflections and scattering of the propagating wave modes by the defect and pipe ends do not hamper the defect detection capability; on the contrary, it apparently stabilizes the signal and makes it easier to distinguish the defective pipe from the defect-free pipe. This paper also highlights difficulties associated with the interpretation of the recorded time histories due to mode conversion by the defect. The design of electro-magnetic acoustic transducers used to generate and receive the guided waves in the pipe is briefly described in the paper.

  10. Noise affects the shape of female preference functions for acoustic signals.

    PubMed

    Reichert, Michael S; Ronacher, Bernhard

    2015-02-01

    The shape of female mate preference functions influences the speed and direction of sexual signal evolution. However, the expression of female preferences is modulated by interactions between environmental conditions and the female's sensory processing system. Noise is an especially relevant environmental condition because it interferes directly with the neural processing of signals. Although noise is therefore likely a significant force in the evolution of communication systems, little is known about its effects on preference function shape. In the grasshopper Chorthippus biguttulus, female preferences for male calling song characteristics are likely to be affected by noise because its auditory system is sensitive to fine temporal details of songs. We measured female preference functions for variation in male song characteristics in several levels of masking noise and found strong effects of noise on preference function shape. The overall responsiveness to signals in noise generally decreased. Preference strength increased for some signal characteristics and decreased for others, largely corresponding to expectations based on neurophysiological studies of acoustic signal processing. These results suggest that different signal characteristics will be favored under different noise conditions, and thus that signal evolution may proceed differently depending on the extent and temporal patterning of environmental noise.

  11. Comparison of voice acquisition methodologies in speech research.

    PubMed

    Vogel, Adam P; Maruff, Paul

    2008-11-01

    The use of voice acoustic techniques has the potential to extend beyond work devoted purely to speech or vocal pathology. For this to occur, however, researchers and clinicians will require acquisition technologies that provide fast, accurate, and cost-effective methods for recording data. Therefore, the present study aimed to compare industry-standard techniques for acquiring high-quality acoustic signals (e.g., hard drive and solid-state recorder) with widely available and easy-to-use, computer-based (standard laptop) data-acquisition methods. Speech samples were simultaneously acquired from 15 healthy controls using all three methods and were analyzed using identical analysis techniques. Data from all three acquisition methods were directly compared using a variety of acoustic correlates. The results suggested that selected acoustic measures (e.g., f0, noise-to-harmonic ratio, number of pauses) were accurately obtained using all three methods; however, minimum recording standards were required for widely used measures of perturbation.

  12. Signal diversification in Oecanthus tree crickets is shaped by energetic, morphometric, and acoustic trade-offs.

    PubMed

    Symes, L B; Ayres, M P; Cowdery, C P; Costello, R A

    2015-06-01

    Physiology, physics, and ecological interactions can generate trade-offs within species, but may also shape divergence among species. We tested whether signal divergence in Oecanthus tree crickets is shaped by acoustic, energetic, and behavioral trade-offs. We found that species with faster pulse rates, produced by opening and closing wings up to twice as many times per second, did not have higher metabolic costs of calling. The relatively constant energetic cost across species is explained by trade-offs between the duration and repetition rate of acoustic signals-species with fewer stridulatory teeth closed their wings more frequently such that the number of teeth struck per second of calling and the resulting duty cycle were relatively constant across species. Further trade-offs were evident in relationships between signals and body size. Calling was relatively inexpensive for small males, permitting them to call for much of the night, but at low amplitude. Large males produced much louder calls, reaching up to four times more area, but the energetic costs increased substantially with increasing size and the time spent calling dropped to only 20% of the night. These trade-offs indicate that the trait combinations that arise in these species represent a limited subset of conceivable trait combinations.

  13. Signal diversification in Oecanthus tree crickets is shaped by energetic, morphometric, and acoustic trade-offs.

    PubMed

    Symes, L B; Ayres, M P; Cowdery, C P; Costello, R A

    2015-06-01

    Physiology, physics, and ecological interactions can generate trade-offs within species, but may also shape divergence among species. We tested whether signal divergence in Oecanthus tree crickets is shaped by acoustic, energetic, and behavioral trade-offs. We found that species with faster pulse rates, produced by opening and closing wings up to twice as many times per second, did not have higher metabolic costs of calling. The relatively constant energetic cost across species is explained by trade-offs between the duration and repetition rate of acoustic signals-species with fewer stridulatory teeth closed their wings more frequently such that the number of teeth struck per second of calling and the resulting duty cycle were relatively constant across species. Further trade-offs were evident in relationships between signals and body size. Calling was relatively inexpensive for small males, permitting them to call for much of the night, but at low amplitude. Large males produced much louder calls, reaching up to four times more area, but the energetic costs increased substantially with increasing size and the time spent calling dropped to only 20% of the night. These trade-offs indicate that the trait combinations that arise in these species represent a limited subset of conceivable trait combinations. PMID:25903317

  14. The potential influence of morphology on the evolutionary divergence of an acoustic signal

    PubMed Central

    Pitchers, W. R.; Klingenberg, C.P.; Tregenza, Tom; Hunt, J.; Dworkin, I.

    2014-01-01

    The evolution of acoustic behaviour and that of the morphological traits mediating its production are often coupled. Lack of variation in the underlying morphology of signalling traits has the potential to constrain signal evolution. This relationship is particularly likely in field crickets, where males produce acoustic advertisement signals to attract females by stridulating with specialized structures on their forewings. In this study, we characterise the size and geometric shape of the forewings of males from six allopatric populations of the black field cricket (Teleogryllus commodus) known to have divergent advertisement calls. We sample from each of these populations using both wild-caught and common-garden reared cohorts, allowing us to test for multivariate relationships between wing morphology and call structure. We show that the allometry of shape has diverged across populations. However, there was a surprisingly small amount of covariation between wing shape and call structure within populations. Given the importance of male size for sexual selection in crickets, the divergence we observe among populations has the potential to influence the evolution of advertisement calls in this species. PMID:25223712

  15. Long Recording Sequences: How to Track the Intra-Individual Variability of Acoustic Signals

    PubMed Central

    Lengagne, Thierry; Gomez, Doris; Josserand, Rémy; Voituron, Yann

    2015-01-01

    Recently developed acoustic technologies - like automatic recording units - allow the recording of long sequences in natural environments. These devices are used for biodiversity surveys but they could also help researchers to estimate global signal variability at various (individual, population, species) scales. While sexually-selected signals are expected to show a low intra-individual variability at relatively short time scales, this variability has never been estimated so far. Yet, measuring signal variability in controlled conditions should prove useful to understand sexual selection processes, and should help design acoustic sampling schedules and analyse long call recordings. We here use the overall call production of 36 male treefrogs (Hyla arborea) during one night to evaluate within-individual variability in call dominant frequency and to test the efficiency of different sampling methods at capturing such variability. Our results confirm that using a low number of calls underestimates call dominant frequency variation by about 35% in the tree frog, and suggest that this variability is better assessed by using 2 or 3 short, well-distributed recordings than by using samples made of consecutive calls. Hence, 3 well-distributed 2-minute records (beginning, middle and end of the calling period) are sufficient to capture on average all the nightly variability, whereas a sample of 10 000 consecutive calls captures only 86% of it. From a biological point of view, the call dominant frequency variability observed in H. arborea (116 Hz on average, but up to 470 Hz over the course of the night for one male) calls into question its reliability in mate quality assessment. Automatic acoustic recording units will provide long call sequences in the near future, and it will then be possible to confirm such results on large samples recorded in more complex field conditions. PMID:25970183

  16. Auditory-tactile echo-reverberating stuttering speech corrector

    NASA Astrophysics Data System (ADS)

    Kuniszyk-Jozkowiak, Wieslawa; Adamczyk, Bogdan

    1997-02-01

    The work presents the construction of a device which transforms speech sounds into acoustical and tactile signals of echo and reverberation. Research has been done on the influence of the echo and reverberation, which are transmitted as acoustic and tactile stimuli, on speech fluency. Introducing the echo or reverberation into the auditory feedback circuit results in a reduction of stuttering. Somewhat smaller, but still significant, corrective effects are observed while using the tactile channel for transmitting the signals. The use of joined auditory and tactile channels increases the effects of their corrective influence on the stutterers' speech. The results of the experiment justify the use of the tactile channel in the stutterers' therapy.

  17. Mate preference in the painted goby: the influence of visual and acoustic courtship signals.

    PubMed

    Amorim, M Clara P; da Ponte, Ana Nunes; Caiano, Manuel; Pedroso, Silvia S; Pereira, Ricardo; Fonseca, Paulo J

    2013-11-01

    We tested the hypothesis that females of a small vocal marine fish with exclusive paternal care, the painted goby, prefer high parental-quality mates such as large or high-condition males. We tested the effect of male body size and of male visual and acoustic courtship behaviour (playback experiments) on female mating preferences by measuring the time spent near one of two alternative stimuli. Females did not show a preference for male size but preferred males that showed higher levels of courtship, a trait known to advertise condition (fat reserves). Also, time spent near the preferred male depended on male courtship effort. Playback experiments showed that when sound was combined with visual stimuli (a male confined in a small aquarium placed near each speaker), females spent more time near the male associated with courtship sound than with the control male (associated with white noise or silence). Although male visual courtship effort also affected female preference in the pre-playback period, this effect decreased during playback and disappeared in the post-playback period. Courtship sound stimuli alone did not elicit female preference relative to a control. Taken together, the results suggest that visual and, mainly, acoustic courtship displays are subject to mate preference and may advertise parental quality in this species. Our results indicate that visual and acoustic signals interplay in a complex fashion and highlight the need to examine how different sensory modalities affect mating preferences in fish and other vertebrates. PMID:23948469

  18. Multichannel signal processing at Bell Labs Acoustics Research-Sampled by a postdoc

    NASA Astrophysics Data System (ADS)

    Kellermann, Walter

    2001-05-01

    In the mid-1980s, the first large microphone arrays for audio capture were designed and realized by Jim Flanagan and Gary Elko. After the author joined Bell Labs in 1989, the first real-time digital beamformer for teleconferencing applications was implemented and formed a starting point for the development of several novel beamforming techniques. In parallel, multichannel loudspeaker systems were already being investigated, and research on acoustic echo cancellation, small-aperture directional microphones, and sensor technology complemented the research scenario aiming at seamless hands-free acoustic communication. Arrays of many sensors and loudspeakers for sampling the spatial domain, combined with advanced signal processing, sparked new concepts that are still fueling ongoing research around the world, including the author's own research group. Here, robust adaptive beamforming has found its way from large-scale arrays into many applications using smaller apertures. Blind source separation algorithms allow for effective spatial filtering without a priori information on source positions. Full-duplex communication using multiple channels for both reproduction and recording is enabled by multichannel acoustic echo cancellation combined with beamforming. Recently, wave domain adaptive filtering, a new concept for handling many sensors and many loudspeakers, has been verified for arrays that may well remind some observers of former Bell Labs projects.

  19. Neural Mechanisms for Acoustic Signal Detection under Strong Masking in an Insect

    PubMed Central

    Römer, Heiner

    2015-01-01

    Communication is fundamental for our understanding of behavior. In the acoustic modality, natural scenes for communication in humans and animals are often very noisy, decreasing the chances for signal detection and discrimination. We investigated the mechanisms enabling selective hearing under natural noisy conditions for auditory receptors and interneurons of an insect. In the studied katydid Mecopoda elongata, species-specific calling songs (chirps) are strongly masked by signals of another species, both communicating in sympatry. The spectral properties of the two signals are similar and differ only in a small frequency band at 2 kHz present in the chirping species. Receptors sharply tuned to 2 kHz are completely unaffected by the masking signal of the other species, whereas receptors tuned to higher audio and ultrasonic frequencies show complete masking. Intracellular recordings of identified interneurons revealed two mechanisms providing response selectivity to the chirp. (1) Response selectivity: several identified interneurons exhibit remarkably selective responses to the chirps, even at signal-to-noise ratios of −21 dB, because they are sharply tuned to 2 kHz. Their dendritic arborizations indicate selective connectivity with low-frequency receptors tuned to 2 kHz. (2) Novelty detection: a second group of interneurons is broadly tuned but, because of strong stimulus-specific adaptation to the masker spectrum and “novelty detection” of the 2 kHz band present only in the conspecific signal, these interneurons start to respond selectively to the chirp shortly after the onset of the continuous masker. Both mechanisms provide the sensory basis for hearing at unfavorable signal-to-noise ratios. SIGNIFICANCE STATEMENT Animal and human acoustic communication may suffer from the same “cocktail party problem,” when communication happens in noisy social groups. We address solutions for this problem in a model system of two katydids, where one

  20. Neural Mechanisms for Acoustic Signal Detection under Strong Masking in an Insect.

    PubMed

    Kostarakos, Konstantinos; Römer, Heiner

    2015-07-22

    Communication is fundamental for our understanding of behavior. In the acoustic modality, natural scenes for communication in humans and animals are often very noisy, decreasing the chances for signal detection and discrimination. We investigated the mechanisms enabling selective hearing under natural noisy conditions for auditory receptors and interneurons of an insect. In the studied katydid Mecopoda elongata, species-specific calling songs (chirps) are strongly masked by signals of another species, both communicating in sympatry. The spectral properties of the two signals are similar and differ only in a small frequency band at 2 kHz present in the chirping species. Receptors sharply tuned to 2 kHz are completely unaffected by the masking signal of the other species, whereas receptors tuned to higher audio and ultrasonic frequencies show complete masking. Intracellular recordings of identified interneurons revealed two mechanisms providing response selectivity to the chirp. (1) Response selectivity: several identified interneurons exhibit remarkably selective responses to the chirps, even at signal-to-noise ratios of -21 dB, because they are sharply tuned to 2 kHz. Their dendritic arborizations indicate selective connectivity with low-frequency receptors tuned to 2 kHz. (2) Novelty detection: a second group of interneurons is broadly tuned but, because of strong stimulus-specific adaptation to the masker spectrum and "novelty detection" of the 2 kHz band present only in the conspecific signal, these interneurons start to respond selectively to the chirp shortly after the onset of the continuous masker. Both mechanisms provide the sensory basis for hearing at unfavorable signal-to-noise ratios. Significance statement: Animal and human acoustic communication may suffer from the same "cocktail party problem," when communication happens in noisy social groups. We address solutions for this problem in a model system of two katydids, where one species

  1. Differential Diagnosis of Severe Speech Disorders Using Speech Gestures

    ERIC Educational Resources Information Center

    Bahr, Ruth Huntley

    2005-01-01

    The differentiation of childhood apraxia of speech from severe phonological disorder is a common clinical problem. This article reports on an attempt to describe speech errors in children with childhood apraxia of speech on the basis of gesture use and acoustic analyses of articulatory gestures. The focus was on the movement of articulators and…

  2. Neural mechanisms underlying auditory feedback control of speech.

    PubMed

    Tourville, Jason A; Reilly, Kevin J; Guenther, Frank H

    2008-02-01

    The neural substrates underlying auditory feedback control of speech were investigated using a combination of functional magnetic resonance imaging (fMRI) and computational modeling. Neural responses were measured while subjects spoke monosyllabic words under two conditions: (i) normal auditory feedback of their speech and (ii) auditory feedback in which the first formant frequency of their speech was unexpectedly shifted in real time. Acoustic measurements showed compensation for the shift within approximately 136 ms of onset. Neuroimaging revealed increased activity in bilateral superior temporal cortex during shifted feedback, indicative of neurons coding mismatches between expected and actual auditory signals, as well as right prefrontal and Rolandic cortical activity. Structural equation modeling revealed increased influence of bilateral auditory cortical areas on right frontal areas during shifted speech, indicating that projections from auditory error cells in posterior superior temporal cortex to motor correction cells in right frontal cortex mediate auditory feedback control of speech.

  3. Optical observations of meteors generating infrasound-I: Acoustic signal identification and phenomenology

    NASA Astrophysics Data System (ADS)

    Silber, Elizabeth A.; Brown, Peter G.

    2014-11-01

    We analyse infrasound signals from 71 bright meteors/fireballs simultaneously detected by video to investigate the phenomenology and characteristics of meteor-generated near-field infrasound (<300 km) and shock production. A taxonomy for meteor-generated infrasound signal classification has been developed using the time-pressure signal of the infrasound arrivals. Based on the location along the meteor trail where the infrasound signal originates, we find most signals are associated with cylindrical shocks, with about a quarter of events evidencing spherical shocks associated with fragmentation episodes and optical flares. The video data indicate that all events with ray launch angles >117° from the trajectory heading are most likely generated by a spherical shock, while infrasound produced by the meteors with ray launch angles ≤117° can be attributed to both a cylindrical line source and a spherical shock. We find that meteors preferentially produce infrasound toward the end of their trails, with a smaller number showing a preference for mid-trail production. Meteors producing multiple infrasound arrivals show a strong infrasound source height skewness to the end of trails and are much more likely to be associated with optical flares. We find that about 1% of all our optically-recorded meteors have associated detected infrasound and estimate that regional meteor infrasound events should occur on the order of once per week and dominate in numbers over infrasound associated with more energetic (but rarer) bolides. While a significant fraction of our meteors generating infrasound (~1/4 of single arrivals) are produced by fragmentation events, we find no instances where acoustic radiation is detectable more than about 60° beyond the ballistic regime at our meteoroid sizes (grams to tens of kilograms), emphasizing the strong anisotropy in acoustic radiation for meteors, which are dominated by cylindrical line source geometry, even in the presence of fragmentation.

  4. Multiple target tracking and classification improvement using data fusion at node level using acoustic signals

    NASA Astrophysics Data System (ADS)

    Damarla, T. R.; Whipps, Gene

    2005-05-01

    Target tracking and classification using passive acoustic signals is difficult at best, as the signals are contaminated by wind noise, multi-path effects, and road conditions, and are generally not deterministic. In addition, microphone characteristics, such as sensitivity, vary with the weather conditions. The problem is further compounded if there are multiple targets, especially if some are measured with higher signal-to-noise ratios (SNRs) than the others and they share spectral information. At the U.S. Army Research Laboratory we have conducted several field experiments with a convoy of two, three, four and five vehicles traveling on different road surfaces, namely gravel, asphalt, and dirt roads. The largest convoy comprises two tracked vehicles and three wheeled vehicles. Two of the wheeled vehicles are heavy trucks and one is a light vehicle. We used a super-resolution direction-of-arrival estimator, specifically the minimum variance distortionless response, to compute the bearings of the targets. In order to classify the targets, we modeled the acoustic signals emanating from the targets as a set of coupled harmonics, which are related to the engine-firing rate, and subsequently used a multivariate Gaussian classifier. Independent of the classifier, we find tracking of wheeled vehicles to be intermittent, as the signals from vehicles with high SNR dominate those of the much quieter wheeled vehicles. We used several fusion techniques to combine tracking and classification results to improve the final tracking and classification estimates. We will present the improvements (or losses) made in tracking and classification of all targets. Although improvements in the estimates for tracked vehicles are not noteworthy, significant improvements are seen in the case of wheeled vehicles. We will present the fusion algorithm used.
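
    The bearing-estimation step named above, a minimum variance distortionless response (MVDR) spatial spectrum, can be sketched for a uniform linear microphone array as follows. The array geometry, analysis frequency and source angles are assumptions made for the example, not the parameters of the field experiments.

      import numpy as np
      from scipy.signal import find_peaks

      c, f = 343.0, 100.0                 # speed of sound (m/s) and analysis frequency (Hz) -- assumed
      n_mics, d = 8, 1.0                  # 8-element uniform line array with 1 m spacing -- assumed
      rng = np.random.default_rng(1)

      def steering(theta_deg):
          """Narrowband steering vector for a plane wave arriving from theta_deg."""
          delays = np.arange(n_mics) * d * np.sin(np.deg2rad(theta_deg)) / c
          return np.exp(-2j * np.pi * f * delays)

      # Simulate array snapshots from a loud and a quiet source plus sensor noise.
      n_snap = 400
      s = np.vstack([1.0 * rng.standard_normal(n_snap),     # loud (tracked) vehicle
                     0.2 * rng.standard_normal(n_snap)])    # quiet (wheeled) vehicle
      A = np.column_stack([steering(-20.0), steering(35.0)])
      noise = 0.05 * (rng.standard_normal((n_mics, n_snap)) + 1j * rng.standard_normal((n_mics, n_snap)))
      x = A @ s + noise

      R_inv = np.linalg.inv(x @ x.conj().T / n_snap)        # inverse sample covariance

      angles = np.arange(-90, 91)
      p = np.array([1.0 / np.real(steering(a).conj() @ R_inv @ steering(a)) for a in angles])
      peaks, _ = find_peaks(p)
      top_two = peaks[np.argsort(p[peaks])[-2:]]
      print("estimated bearings (deg):", sorted(angles[top_two]))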

  5. Clustering reveals cavitation-related acoustic emission signals from dehydrating branches.

    PubMed

    Vergeynst, Lidewei L; Sause, Markus G R; De Baerdemaeker, Niels J F; De Roo, Linus; Steppe, Kathy

    2016-06-01

    The formation of air emboli in the xylem during drought is one of the key processes leading to plant mortality due to loss in hydraulic conductivity, and strongly fuels the interest in quantifying vulnerability to cavitation. The acoustic emission (AE) technique can be used to measure hydraulic conductivity losses and construct vulnerability curves. For years, it has been believed that all the AE signals are produced by the formation of gas emboli in the xylem sap under tension. More recent experiments, however, demonstrate that gas emboli formation cannot explain all the signals detected during drought, suggesting that different sources of AE exist. This complicates the use of the AE technique to measure emboli formation in plants. We therefore analysed AE waveforms measured with broadband sensors on branches of grapevine (Vitis vinifera L. 'Chardonnay') during bench dehydration, and applied an automated clustering algorithm in order to find natural clusters of AE signals. We used AE features and AE activity patterns during consecutive dehydration phases to identify the different AE sources. Based on the frequency spectrum of the signals, we distinguished three different types of AE signals, of which the cluster with high 100-200 kHz frequency content was strongly correlated with cavitation. Our results indicate that cavitation-related AE signals can be filtered from other AE sources, which presents a promising avenue toward quantifying xylem embolism in plants under laboratory and field conditions. PMID:27095256
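
    A minimal sketch of the kind of frequency-based clustering described above, not the authors' actual pipeline: each AE hit is reduced to a normalized band-energy spectrum and the spectra are grouped with k-means. The sampling rate, band edges and cluster count are assumptions for illustration.

      import numpy as np
      from sklearn.cluster import KMeans

      FS = 1_000_000                                           # 1 MHz AE sampling rate -- assumed
      BANDS = [(0, 100e3), (100e3, 200e3), (200e3, 400e3)]     # Hz, illustrative band edges

      def band_energy_features(waveform):
          """Normalized energy in a few frequency bands of one AE hit."""
          spec = np.abs(np.fft.rfft(waveform)) ** 2
          freqs = np.fft.rfftfreq(len(waveform), d=1 / FS)
          feats = np.array([spec[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in BANDS])
          return feats / feats.sum()

      def cluster_hits(waveforms, n_clusters=3):
          """Group AE hits into natural clusters based on their spectral content."""
          X = np.vstack([band_energy_features(w) for w in waveforms])
          return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)

      # Synthetic example: ten low-frequency and ten high-frequency decaying bursts.
      rng = np.random.default_rng(2)
      t = np.arange(1024) / FS
      hits = [np.sin(2 * np.pi * 50e3 * t) * np.exp(-2e4 * t) + 0.01 * rng.standard_normal(t.size)
              for _ in range(10)]
      hits += [np.sin(2 * np.pi * 150e3 * t) * np.exp(-2e4 * t) + 0.01 * rng.standard_normal(t.size)
               for _ in range(10)]
      print(cluster_hits(hits, n_clusters=2))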

  6. System and method for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources

    DOEpatents

    Holzrichter, John F; Burnett, Greg C; Ng, Lawrence C

    2013-05-21

    A system and method for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources is disclosed. Propagating wave electromagnetic sensors monitor excitation sources in sound producing systems, such as machines, musical instruments, and various other structures. Acoustical output from these sound producing systems is also monitored. From such information, a transfer function characterizing the sound producing system is generated. From the transfer function, acoustical output from the sound producing system may be synthesized or canceled. The methods disclosed enable accurate calculation of matched transfer functions relating specific excitations to specific acoustical outputs. Knowledge of such signals and functions can be used to effect various sound replication, sound source identification, and sound cancellation applications.

  7. System and method for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources

    DOEpatents

    Holzrichter, John F.; Burnett, Greg C.; Ng, Lawrence C.

    2007-10-16

    A system and method for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources is disclosed. Propagating wave electromagnetic sensors monitor excitation sources in sound producing systems, such as machines, musical instruments, and various other structures. Acoustical output from these sound producing systems is also monitored. From such information, a transfer function characterizing the sound producing system is generated. From the transfer function, acoustical output from the sound producing system may be synthesized or canceled. The methods disclosed enable accurate calculation of matched transfer functions relating specific excitations to specific acoustical outputs. Knowledge of such signals and functions can be used to effect various sound replication, sound source identification, and sound cancellation applications.

  8. System and method for characterizing synthesizing and/or canceling out acoustic signals from inanimate sound sources

    DOEpatents

    Holzrichter, John F.; Burnett, Greg C.; Ng, Lawrence C.

    2003-01-01

    A system and method for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources is disclosed. Propagating wave electromagnetic sensors monitor excitation sources in sound producing systems, such as machines, musical instruments, and various other structures. Acoustical output from these sound producing systems is also monitored. From such information, a transfer function characterizing the sound producing system is generated. From the transfer function, acoustical output from the sound producing system may be synthesized or canceled. The methods disclosed enable accurate calculation of matched transfer functions relating specific excitations to specific acoustical outputs. Knowledge of such signals and functions can be used to effect various sound replication, sound source identification, and sound cancellation applications.

  9. Auditory perception bias in speech imitation

    PubMed Central

    Postma-Nilsenová, Marie; Postma, Eric

    2013-01-01

    In an experimental study, we explored the role of auditory perception bias in vocal pitch imitation. Psychoacoustic tasks involving a missing fundamental indicate that some listeners are attuned to the relationship between all the higher harmonics present in the signal, which supports their perception of the fundamental frequency (the primary acoustic correlate of pitch). Other listeners focus on the lowest harmonic constituents of the complex sound signal, which may hamper the perception of the fundamental. These two listener types are referred to as fundamental and spectral listeners, respectively. We hypothesized that the individual differences in speakers' capacity to imitate F0 found in earlier studies may at least partly be due to the capacity to extract information about F0 from the speech signal. Participants' auditory perception bias was determined with a standard missing fundamental perceptual test. Subsequently, speech data were collected in a shadowing task with two conditions, one with a full speech signal and one with speech high-pass filtered above 300 Hz. The results showed that perception bias toward the fundamental frequency was related to the degree of F0 imitation. The effect was stronger in the condition with high-pass filtered speech. The experimental outcomes suggest advantages for fundamental listeners in communicative situations where F0 imitation is used as a behavioral cue. Future research needs to determine to what extent auditory perception bias may be related to other individual properties known to improve imitation, such as phonetic talent. PMID:24204361
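
    The high-pass-filtered condition described above (speech filtered above 300 Hz, so that the fundamental itself is attenuated and must be inferred from the harmonics) can be approximated with a sketch like the one below. The 300 Hz cut-off follows the abstract; the file names and filter order are placeholders.

      import numpy as np
      from scipy.io import wavfile
      from scipy.signal import butter, sosfiltfilt

      def highpass_speech(path_in, path_out, cutoff_hz=300.0, order=6):
          """Remove energy below cutoff_hz (including F0 for most voices)."""
          fs, x = wavfile.read(path_in)                  # file name is a placeholder
          x = x.astype(np.float64)
          sos = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
          y = sosfiltfilt(sos, x, axis=0)                # zero-phase filtering
          y = 0.9 * y / np.max(np.abs(y))                # simple re-normalization
          wavfile.write(path_out, fs, (y * 32767).astype(np.int16))

      # highpass_speech("shadowing_stimulus.wav", "shadowing_stimulus_hp300.wav")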

  10. Experimental Research Into Generation of Acoustic Emission Signals in the Process of Friction of Hadfield Steel Single Crystals

    NASA Astrophysics Data System (ADS)

    Lychagin, D. V.; Filippov, A. V.; Novitskaia, O. S.; Kolubaev, E. A.; Sizova, O. V.

    2016-08-01

    The results of experimental research into dry sliding friction of Hadfield steel single crystals involving registration of acoustic emission are presented in the paper. The images of friction surfaces of Hadfield steel single crystals and wear grooves of the counterbody surface, made after completion of three serial experiments conducted under similar conditions and friction regimes, are given. The relation of the acoustic emission waveform envelope to the changing friction factor is revealed. Amplitude-frequency characteristics of acoustic emission signal frames are determined on the basis of the Fast Fourier Transform and the Short-Time Fourier Transform during the run-in stage of the tribo-units and in the process of stable friction.

  11. The contribution of auditory temporal processing to the separation of competing speech signals in listeners with normal hearing

    NASA Astrophysics Data System (ADS)

    Adam, Trudy J.; Pichora-Fuller, Kathy

    2002-05-01

    The hallmark of auditory function in aging adults is difficulty listening in a background of competing talkers, even when hearing sensitivity in quiet is good. Age-related physiological changes may contribute by introducing small timing errors (jitter) to the neural representation of sound, compromising the fidelity of the signal's fine temporal structure. This may preclude the association of spectral features to form an accurate percept of one complex stimulus, distinct from competing sounds. For simple voiced speech (vowels), the separation of two competing stimuli can be achieved on the basis of their respective harmonic (temporal) structures. Fundamental frequency (F0) differences in competing stimuli facilitate their segregation. This benefit was hypothesized to rely on the adequate temporal representation of the speech signal(s). Auditory aging was simulated via the desynchronization (~0.25-ms jitter) of the spectral bands of synthesized vowels. The perceptual benefit of F0 difference for the identification of concurrent vowel pairs was examined for intact and jittered vowels in young adults with normal hearing thresholds. Results suggest a role for reduced signal fidelity in the perceptual difficulties encountered in noisy everyday environments by aging listeners. [Work generously supported by the Michael Smith Foundation for Health Research.]
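
    The jitter manipulation described above could be simulated along the lines of the sketch below: the vowel is split into a bank of band-pass channels and each band is delayed by a small random amount on the order of 0.25 ms before the bands are summed again. The band spacing, filter order and toy stimulus are assumptions, not the exact synthesis parameters of the study.

      import numpy as np
      from scipy.signal import butter, sosfilt

      def jitter_bands(x, fs, band_edges, max_jitter_s=0.25e-3, seed=0):
          """Desynchronize spectral bands with small random delays (simulated neural jitter)."""
          rng = np.random.default_rng(seed)
          out = np.zeros_like(x, dtype=float)
          for lo, hi in band_edges:
              sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
              band = sosfilt(sos, x)
              shift = int(round(rng.uniform(0, max_jitter_s) * fs))    # delay in samples
              out += np.r_[np.zeros(shift), band[:len(band) - shift]]
          return out

      # Toy /a/-like vowel: 120 Hz fundamental plus harmonics, sampled at 16 kHz.
      fs = 16_000
      t = np.arange(int(0.3 * fs)) / fs
      vowel = sum(np.sin(2 * np.pi * k * 120 * t) / k for k in range(1, 30))
      edges = [(lo, lo + 500) for lo in range(50, 3550, 500)]           # 500 Hz wide bands
      jittered = jitter_bands(vowel, fs, edges)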

  12. Talker variability in audio-visual speech perception.

    PubMed

    Heald, Shannon L M; Nusbaum, Howard C

    2014-01-01

    A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories, and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker's face, speech recognition is improved under adverse listening (e.g., noise or distortion) conditions that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to the audio-only condition. These results suggest that seeing a talker's face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred. PMID:25076919

  13. Perception and the temporal properties of speech

    NASA Astrophysics Data System (ADS)

    Gordon, Peter C.

    1991-11-01

    Four experiments addressing the role of attention in phonetic perception are reported. The first experiment shows that the relative importance of two cues to the voicing distinction changes when subjects must perform an arithmetic distractor task at the same time as identifying a speech stimulus. The voice onset time cue loses phonetic significance when subjects are distracted, while the F0 onset frequency cue does not. The second experiment shows a similar pattern for two cues to the distinction between the vowels /i/ (as in 'beat') and /I/ (as in 'bit'). Together these experiments indicate that careful attention to speech perception is necessary for strong acoustic cues to achieve their full phonetic impact, while weaker acoustic cues achieve their full phonetic impact without close attention. Experiment 3 shows that this pattern is obtained when the distractor task places little demand on verbal short-term memory. Experiment 4 provides a large data set for testing formal models of the role of attention in speech perception. Attention is shown to influence the signal-to-noise ratio in phonetic encoding. This principle is instantiated in a network model in which the role of attention is to reduce noise in the phonetic encoding of acoustic cues. Implications of this work for understanding speech perception and general theories of the role of attention in perception are discussed.

  14. Maternal depression and the learning-promoting effects of infant-directed speech: Roles of maternal sensitivity, depression diagnosis, and speech acoustic cues.

    PubMed

    Kaplan, Peter S; Danko, Christina M; Cejka, Anna M; Everhart, Kevin D

    2015-11-01

    The hypothesis that the associative learning-promoting effects of infant-directed speech (IDS) depend on infants' social experience was tested in a conditioned-attention paradigm with a cumulative sample of 4- to 14-month-old infants. Following six forward pairings of a brief IDS segment and a photographic slide of a smiling female face, infants of clinically depressed mothers exhibited evidence of having acquired significantly weaker voice-face associations than infants of non-depressed mothers. Regression analyses revealed that maternal depression was significantly related to infant learning even after demographic correlates of depression, antidepressant medication use, and extent of pitch modulation in maternal IDS had been taken into account. However, after maternal depression had been accounted for, maternal emotional availability, coded by blind raters from separate play interactions, accounted for significant further increments in the proportion of variance accounted for in infant learning scores. Both maternal depression and maternal insensitivity negatively, and additively, predicted poor learning.

  15. Silent katydid females are at higher risk of bat predation than acoustically signalling katydid males.

    PubMed

    Raghuram, Hanumanthan; Deb, Rittik; Nandi, Diptarup; Balakrishnan, Rohini

    2015-01-01

    Males that produce conspicuous mate attraction signals are often at high risk of predation from eavesdropping predators. Females of such species typically search for signalling males and their higher motility may also place them at risk. The relative predation risk faced by males and females in the context of mate-finding using long-distance signals has rarely been investigated. In this study, we show, using a combination of diet analysis and behavioural experiments, that katydid females, who do not produce acoustic signals, are at higher risk of predation from a major bat predator, Megaderma spasma, than calling males. Female katydids were represented in much higher numbers than males in the culled remains beneath roosts of M. spasma. Playback experiments using katydid calls revealed that male calls were approached in only about one-third of the trials overall, whereas tethered, flying katydids were always approached and attacked. Our results question the idea that necessary costs of mate-finding, including risk of predation, are higher in signalling males than in searching females.

  16. Silent katydid females are at higher risk of bat predation than acoustically signalling katydid males

    PubMed Central

    Raghuram, Hanumanthan; Deb, Rittik; Nandi, Diptarup; Balakrishnan, Rohini

    2015-01-01

    Males that produce conspicuous mate attraction signals are often at high risk of predation from eavesdropping predators. Females of such species typically search for signalling males and their higher motility may also place them at risk. The relative predation risk faced by males and females in the context of mate-finding using long-distance signals has rarely been investigated. In this study, we show, using a combination of diet analysis and behavioural experiments, that katydid females, who do not produce acoustic signals, are at higher risk of predation from a major bat predator, Megaderma spasma, than calling males. Female katydids were represented in much higher numbers than males in the culled remains beneath roosts of M. spasma. Playback experiments using katydid calls revealed that male calls were approached in only about one-third of the trials overall, whereas tethered, flying katydids were always approached and attacked. Our results question the idea that necessary costs of mate-finding, including risk of predation, are higher in signalling males than in searching females. PMID:25429019

  17. When males whistle at females: complex FM acoustic signals in cockroaches

    NASA Astrophysics Data System (ADS)

    Sueur, Jérôme; Aubin, Thierry

    2006-10-01

    Male cockroaches of the species Elliptorhina chopardi expel air through a pair of modified abdominal spiracles during courtship. This air expulsion simultaneously produces air-borne and substrate-borne vibrations. We describe and compare these two types of vibrations in detail. Our analysis of the air-borne signals shows that males can produce three categories of signals with distinct temporal and frequency parameters. “Pure whistles” consist of two independent, rapidly frequency-modulated harmonic series whose harmonics can cross each other. “Noisy whistles” also possess two independent voices but include a noisy broad-band frequency part in the middle. Hiss sounds are more noise-like, being made of a broad-band frequency spectrum. All three call types are unusually high in dominant frequency (>5 kHz) for cockroaches. The substrate-borne signals are categorised similarly. Some harmonics of the substrate-borne signals were filtered out, however, concentrating the acoustic energy in fewer frequency bands. Our analysis shows that cockroach signals are complex, with fast frequency modulations and two distinct voices. These results also raise the question of what system could potentially receive and decode the information contained within such complex sounds.

  18. Silent katydid females are at higher risk of bat predation than acoustically signalling katydid males.

    PubMed

    Raghuram, Hanumanthan; Deb, Rittik; Nandi, Diptarup; Balakrishnan, Rohini

    2015-01-01

    Males that produce conspicuous mate attraction signals are often at high risk of predation from eavesdropping predators. Females of such species typically search for signalling males and their higher motility may also place them at risk. The relative predation risk faced by males and females in the context of mate-finding using long-distance signals has rarely been investigated. In this study, we show, using a combination of diet analysis and behavioural experiments, that katydid females, who do not produce acoustic signals, are at higher risk of predation from a major bat predator, Megaderma spasma, than calling males. Female katydids were represented in much higher numbers than males in the culled remains beneath roosts of M. spasma. Playback experiments using katydid calls revealed that male calls were approached in only about one-third of the trials overall, whereas tethered, flying katydids were always approached and attacked. Our results question the idea that necessary costs of mate-finding, including risk of predation, are higher in signalling males than in searching females. PMID:25429019

  19. Processing of simple and complex acoustic signals in a tonotopically organized ear

    PubMed Central

    Hummel, Jennifer; Wolf, Konstantin; Kössl, Manfred; Nowotny, Manuela

    2014-01-01

    Processing of complex signals in the hearing organ remains poorly understood. This paper aims to contribute to this topic by presenting investigations on the mechanical and neuronal response of the hearing organ of the tropical bushcricket species Mecopoda elongata to simple pure tone signals as well as to the conspecific song as a complex acoustic signal. The high-frequency hearing organ of bushcrickets, the crista acustica (CA), is tonotopically tuned to frequencies between about 4 and 70 kHz. Laser Doppler vibrometer measurements revealed a strong and dominant low-frequency-induced motion of the CA when stimulated with either pure tone or complex stimuli. Consequently, the high-frequency distal area of the CA is more strongly deflected by low-frequency-induced waves than by high-frequency-induced waves. This low-frequency dominance will have strong effects on the processing of complex signals. Therefore, we additionally studied the neuronal response of the CA to native and frequency-manipulated chirps. Again, we found a dominant influence of low-frequency components within the conspecific song, indicating that the mechanical vibration pattern highly determines the neuronal response of the sensory cells. Thus, we conclude that the encoding of communication signals is modulated by ear mechanics. PMID:25339727

  20. Processing of simple and complex acoustic signals in a tonotopically organized ear.

    PubMed

    Hummel, Jennifer; Wolf, Konstantin; Kössl, Manfred; Nowotny, Manuela

    2014-12-01

    Processing of complex signals in the hearing organ remains poorly understood. This paper aims to contribute to this topic by presenting investigations on the mechanical and neuronal response of the hearing organ of the tropical bushcricket species Mecopoda elongata to simple pure tone signals as well as to the conspecific song as a complex acoustic signal. The high-frequency hearing organ of bushcrickets, the crista acustica (CA), is tonotopically tuned to frequencies between about 4 and 70 kHz. Laser Doppler vibrometer measurements revealed a strong and dominant low-frequency-induced motion of the CA when stimulated with either pure tone or complex stimuli. Consequently, the high-frequency distal area of the CA is more strongly deflected by low-frequency-induced waves than by high-frequency-induced waves. This low-frequency dominance will have strong effects on the processing of complex signals. Therefore, we additionally studied the neuronal response of the CA to native and frequency-manipulated chirps. Again, we found a dominant influence of low-frequency components within the conspecific song, indicating that the mechanical vibration pattern highly determines the neuronal response of the sensory cells. Thus, we conclude that the encoding of communication signals is modulated by ear mechanics.

  1. Monitoring Rock Failure Processes Using the Hilbert-Huang Transform of Acoustic Emission Signals

    NASA Astrophysics Data System (ADS)

    Zhang, Ji; Peng, Weihong; Liu, Fengyu; Zhang, Haixiang; Li, Zhijian

    2016-02-01

    Rock fracturing generates acoustic emission (AE) signals that have statistical parameters referred to as AE signal parameters (AESP). Identification of rock fracturing or the failure process stage using such data raises several challenges. This study proposes a Hilbert-Huang transform-based AE processing approach to capture the time-frequency characteristics of both AE signals and AESP during rock failure processes. The damage occurring in tested rock specimens can be illustrated through analysis using this method. In this study, the specimens were 25 × 60 × 150 mm³ in size and were compressed at a displacement rate of 0.05 mm/min until failure. The recorded data included force and displacement, AE signals, and AESP. The AESP in the last third of the strain range period and 14 typical moments of strong AE signals were selected for further investigation. These results show that AE signals and AESP can be jointly used for identification of deformation stages. The transition between linear and nonlinear deformation stages was found to last for a short period in this process. The instantaneous frequency of the AE effective energy rate increased linearly from 0.5 to 1.5 Hz. Attenuation of elastic waves spreading in rock samples developed with deformation, as illustrated in the Hilbert spectra of AE signals. This attenuation is frequency dependent. Furthermore, AE signals in the softening process showed a complex frequency distribution attributed to the mechanical properties of the tested specimen. The results indicate that rock failure is predictable. The novel technology applied in this study is feasible for analysis of the entire deformation process, including softening and failure processes.
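
    A bare-bones Hilbert-Huang transform of a single AE frame might look like the sketch below, using the third-party PyEMD package (distributed as EMD-signal) for the empirical mode decomposition and SciPy's analytic signal for instantaneous amplitude and frequency. The synthetic frame, sampling rate and package choice are assumptions, not the authors' implementation.

      import numpy as np
      from scipy.signal import hilbert
      from PyEMD import EMD                 # pip install EMD-signal -- assumed dependency

      def hht(frame, fs):
          """Empirical mode decomposition plus instantaneous amplitude/frequency per IMF."""
          imfs = EMD().emd(frame)                                # intrinsic mode functions
          spectra = []
          for imf in imfs:
              analytic = hilbert(imf)
              amp = np.abs(analytic)                             # instantaneous amplitude
              phase = np.unwrap(np.angle(analytic))
              inst_freq = np.diff(phase) * fs / (2 * np.pi)      # instantaneous frequency, Hz
              spectra.append((amp, inst_freq))
          return imfs, spectra

      # Toy AE-like frame: a decaying high-frequency burst riding on a slower oscillation.
      fs = 2_000_000
      t = np.arange(4096) / fs
      frame = np.sin(2 * np.pi * 150e3 * t) * np.exp(-1e4 * t) + 0.3 * np.sin(2 * np.pi * 20e3 * t)
      imfs, spectra = hht(frame, fs)
      print(f"{len(imfs)} intrinsic mode functions extracted")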

  2. Level variations in speech: Effect on masking release in hearing-impaired listeners.

    PubMed

    Reed, Charlotte M; Desloge, Joseph G; Braida, Louis D; Perez, Zachary D; Léger, Agnès C

    2016-07-01

    Acoustic speech is marked by time-varying changes in the amplitude envelope that may pose difficulties for hearing-impaired listeners. Removal of these variations (e.g., by the Hilbert transform) could improve speech reception for such listeners, particularly in fluctuating interference. Léger, Reed, Desloge, Swaminathan, and Braida [(2015b). J. Acoust. Soc. Am. 138, 389-403] observed that a normalized measure of masking release obtained for hearing-impaired listeners using speech processed to preserve temporal fine-structure (TFS) cues was larger than that for unprocessed or envelope-based speech. This study measured masking release for two other speech signals in which level variations were minimal: peak clipping and TFS processing of an envelope signal. Consonant identification was measured for hearing-impaired listeners in backgrounds of continuous and fluctuating speech-shaped noise. The normalized masking release obtained using speech with normal variations in overall level was substantially less than that observed using speech processed to achieve highly restricted level variations. These results suggest that the performance of hearing-impaired listeners in fluctuating noise may be improved by signal processing that leads to a decrease in stimulus level variations. PMID:27475136
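
    The envelope/fine-structure decomposition referred to above can be illustrated per frequency band with the analytic signal: the Hilbert envelope carries the level variations, and the cosine of the instantaneous phase is a roughly constant-level temporal-fine-structure (TFS) carrier. This is a generic sketch of the decomposition, not the processing chain of the cited study.

      import numpy as np
      from scipy.signal import butter, sosfiltfilt, hilbert

      def envelope_and_tfs(x, fs, lo, hi):
          """Split one band of a signal into its Hilbert envelope and TFS carrier."""
          sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
          band = sosfiltfilt(sos, x)
          analytic = hilbert(band)
          envelope = np.abs(analytic)               # slow level variations
          tfs = np.cos(np.angle(analytic))          # fine structure at near-constant level
          return envelope, tfs

      # Example: 1-2 kHz band of a toy amplitude-modulated tone sampled at 16 kHz.
      fs = 16_000
      t = np.arange(fs) / fs
      x = np.sin(2 * np.pi * 1500 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
      env, tfs = envelope_and_tfs(x, fs, 1000, 2000)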

  3. Selection of a voice for a speech signal for personalized warnings: the effect of speaker's gender and voice pitch.

    PubMed

    Machado, Sheron; Duarte, Emília; Teles, Júlia; Reis, Lara; Rebelo, Francisco

    2012-01-01

    There is an increasing interest in multimodal technology-based warnings, namely those conveying speech-warning statements. This type of warning may be tailored to the situation as well as to the target user's characteristics. However, more information is needed on how to design these warnings in a way that ensures intelligibility, promotes compliance and reduces the potential for annoyance. In this context, this paper reports an exploratory study whose main purpose was to assist the selection of a synthesized voice for a subsequent compliance study with personalized (i.e., using the person's name) technology-based warnings using Virtual Reality. Participants were asked to listen to speech signals, generated with a speech synthesizer and post-processed to change the perceived pitch, and then evaluated them by completing the MOS-X questionnaire. After that, the participants ranked the voices according to their preference. The effects of the speaker's gender and voice pitch on both ratings and ranking were assessed. The preference of male and female listeners for a talker's voice gender was also investigated. The results show that participants mostly preferred the high-pitched female voice as their first choice; this voice also received the highest overall score on the MOS-X questionnaire. No significant influence of the participants' gender was found on the assessed measures.

  4. Room Acoustics

    NASA Astrophysics Data System (ADS)

    Kuttruff, Heinrich; Mommertz, Eckard

    The traditional task of room acoustics is to create or formulate conditions which ensure the best possible propagation of sound in a room from a sound source to a listener. Thus, objects of room acoustics are in particular assembly halls of all kinds, such as auditoria and lecture halls, conference rooms, theaters, concert halls or churches. It should be pointed out at the outset that these conditions depend essentially on whether speech or music is to be transmitted: for speech, the criterion for transmission quality is good intelligibility, whereas for music the success of room-acoustical efforts depends on factors that cannot be quantified as easily, not least the listening habits of the audience. In any case, absolutely "good acoustics" of a room do not exist.

  5. A methodology to condition distorted acoustic emission signals to identify fracture timing from human cadaver spine impact tests.

    PubMed

    Arun, Mike W J; Yoganandan, Narayan; Stemper, Brian D; Pintar, Frank A

    2014-12-01

    While studies have used acoustic sensors to determine fracture initiation time in biomechanical experiments, a systematic procedure for processing the acoustic signals has not been established. The objective of this study was to develop a methodology for conditioning distorted acoustic emission data using signal processing techniques to identify fracture initiation time. The methodology was developed from testing a human cadaver lumbar spine column. Acoustic sensors were glued to all vertebrae, high-rate impact loading was applied, load-time histories were recorded (load cell), and fracture was documented using CT. Compression fracture occurred at L1 while the other vertebrae were intact. FFTs of the raw voltage-time traces were used to determine an optimum frequency range associated with high decibel levels. Signals were bandpass filtered in this range. A bursting pattern was found in the fractured vertebra, while signals from the other vertebrae were silent. The bursting time was associated with the time of fracture initiation. Force at fracture was determined using this time and the force-time data. The methodology is independent of parameters selected a priori, such as a fixed voltage level, a preset bandpass frequency, or reliance on the force-time signal alone, and allows determination of force based on the time identified during signal processing. The methodology can be used for different body regions in cadaver experiments.
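
    A simplified version of the conditioning chain described above (band selection from the FFT, band-pass filtering, and reading the force at the time of the first acoustic burst) is sketched below. The frequency band, threshold rule and pre-impact window are placeholders chosen for illustration, not the study's values.

      import numpy as np
      from scipy.signal import butter, sosfiltfilt, hilbert

      def fracture_force(ae_voltage, force, fs, band=(100e3, 300e3), thresh_factor=8.0):
          """Return (t_fracture, force_at_fracture) from one AE channel and the load cell."""
          # 1. Band-pass the AE trace in the band identified offline from its FFT.
          sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
          filtered = sosfiltfilt(sos, ae_voltage)

          # 2. Detect the first burst: envelope exceeding a multiple of the pre-impact noise floor.
          env = np.abs(hilbert(filtered))
          noise_floor = np.median(env[: int(0.01 * len(env))])   # first 1% assumed pre-impact
          burst_idx = np.argmax(env > thresh_factor * noise_floor)

          # 3. Read the force at that instant (force assumed sampled synchronously with the AE).
          return burst_idx / fs, force[burst_idx]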

  6. Multi-bearing defect detection with trackside acoustic signal based on a pseudo time-frequency analysis and Dopplerlet filter

    NASA Astrophysics Data System (ADS)

    Zhang, Haibin; Lu, Siliang; He, Qingbo; Kong, Fanrang

    2016-03-01

    The diagnosis of train bearing defects based on the acoustic signal acquired by a trackside microphone plays a significant role in the transport system. However, the wayside acoustic signal suffers from Doppler distortion due to the high train speed and also contains multi-source signals from different train bearings. This paper proposes a novel solution to overcome these two difficulties in trackside acoustic diagnosis. In the method, a pseudo time-frequency analysis (PTFA) based on an improved Dopplerlet transform (IDT) is presented to acquire the time centers for the different bearings. With these time centers, we design a series of Dopplerlet filters (DF) in the time-frequency domain that operate on the signal's time-frequency distribution (TFD) obtained with the short-time Fourier transform (STFT). An inverse STFT (ISTFT) is then utilized to obtain the separated signal for each sound source, i.e., each bearing. A resampling step based on certain motion parameters then eliminates the Doppler effect, and finally the diagnosis can be made effectively from the envelope spectrum of each separated signal. With the effectiveness of the technique validated by both simulated and experimental cases, the proposed wayside acoustic diagnostic scheme is expected to be applicable to wayside defective-bearing detection.
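
    The separation step (gate the time-frequency distribution around one bearing's pass-by time, then invert the STFT) can be sketched generically with SciPy. The Gaussian time gate below is only a stand-in for the Dopplerlet filters of the paper, and the window length, mask width and synthetic signal are assumptions.

      import numpy as np
      from scipy.signal import stft, istft

      def separate_source(x, fs, t_center, t_width=0.2, nperseg=1024):
          """Keep the part of the TFD centred on one pass-by time and resynthesize it."""
          f, t, Z = stft(x, fs=fs, nperseg=nperseg)
          mask = np.exp(-0.5 * ((t - t_center) / t_width) ** 2)   # Gaussian gate in time
          _, x_sep = istft(Z * mask[np.newaxis, :], fs=fs, nperseg=nperseg)
          return x_sep

      # Synthetic example: two bearings passing the microphone near 1.0 s and 2.5 s.
      fs = 20_000
      t = np.arange(int(3.5 * fs)) / fs
      x = (np.sin(2 * np.pi * 3000 * t) * np.exp(-((t - 1.0) / 0.15) ** 2)
           + np.sin(2 * np.pi * 4500 * t) * np.exp(-((t - 2.5) / 0.15) ** 2))
      first_bearing = separate_source(x, fs, t_center=1.0)   # envelope analysis would follow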

  7. Temporal lobe regions engaged during normal speech comprehension.

    PubMed

    Crinion, Jennifer T; Lambon-Ralph, Matthew A; Warburton, Elizabeth A; Howard, David; Wise, Richard J S

    2003-05-01

    Processing of speech is obligatory. Thus, during normal speech comprehension, the listener is aware of the overall meaning of the speaker's utterance without the need to direct attention to individual linguistic and paralinguistic (intonational, prosodic, etc.) features contained within the speech signal. However, most functional neuroimaging studies of speech perception have used metalinguistic tasks that required the subjects to attend to specific features of the stimuli. Such tasks have demanded a forced-choice decision and a motor response from the subjects, which will engage frontal systems and may include unpredictable top-down modulation of the signals observed in one or more of the temporal lobe neural systems engaged during speech perception. This study contrasted the implicit comprehension of simple narrative speech with listening to reversed versions of the narratives: the latter are as acoustically complex as speech but are unintelligible in terms of both linguistic and paralinguistic information. The result demonstrated that normal comprehension, free of task demands that do not form part of everyday discourse, engages regions distributed between the two temporal lobes, more widely on the left. In particular, comprehension is dependent on anterolateral and ventral left temporal regions, as suggested by observations on patients with semantic dementia, as well as posterior regions described in studies on aphasic stroke patients. The only frontal contribution was confined to the ventrolateral left prefrontal cortex, compatible with observations that comprehension of simple speech is preserved in patients with left posterior frontal infarction. PMID:12690058

  8. Laminar cortical dynamics of conscious speech perception: neural model of phonemic restoration using subsequent context in noise.

    PubMed

    Grossberg, Stephen; Kazerounian, Sohrob

    2011-07-01

    How are laminar circuits of neocortex organized to generate conscious speech and language percepts? How does the brain restore information that is occluded by noise, or absent from an acoustic signal, by integrating contextual information over many milliseconds to disambiguate noise-occluded acoustical signals? How are speech and language heard in the correct temporal order, despite the influence of contexts that may occur many milliseconds before or after each perceived word? A neural model describes key mechanisms in forming conscious speech percepts, and quantitatively simulates a critical example of contextual disambiguation of speech and language; namely, phonemic restoration. Here, a phoneme deleted from a speech stream is perceptually restored when it is replaced by broadband noise, even when the disambiguating context occurs after the phoneme was presented. The model describes how the laminar circuits within a hierarchy of cortical processing stages may interact to generate a conscious speech percept that is embodied by a resonant wave of activation that occurs between acoustic features, acoustic item chunks, and list chunks. Chunk-mediated gating allows speech to be heard in the correct temporal order, even when what is heard depends upon future context.

  9. Neuronal Spoken Word Recognition: The Time Course of Processing Variation in the Speech Signal

    ERIC Educational Resources Information Center

    Schild, Ulrike; Roder, Brigitte; Friedrich, Claudia K.

    2012-01-01

    Recent neurobiological studies revealed evidence for lexical representations that are not specified for the coronal place of articulation (PLACE; Friedrich, Eulitz, & Lahiri, 2006; Friedrich, Lahiri, & Eulitz, 2008). Here we tested when these types of underspecified representations influence neuronal speech recognition. In a unimodal…

  10. Is Birdsong More Like Speech or Music?

    PubMed

    Shannon, Robert V

    2016-04-01

    Music and speech share many acoustic cues, but not all are equally important. For example, harmonic pitch is essential for music but not for speech. When birds communicate, is their song more like speech or music? A new study contrasting pitch and spectral patterns shows that birds perceive their song more in the way that humans perceive speech. PMID:26944220

  11. ON THE NATURE OF SPEECH SCIENCE.

    ERIC Educational Resources Information Center

    PETERSON, GORDON E.

    In this article the nature of the discipline of speech science is considered and the various basic and applied areas of the discipline are discussed. The basic areas encompass the various processes of the physiology of speech production, the acoustical characteristics of speech, including the speech wave types and the information-bearing acoustic…

  12. The broadband social acoustic signaling behavior of spinner and spotted dolphins.

    PubMed

    Lammers, Marc O; Au, Whitlow W L; Herzing, Denise L

    2003-09-01

    Efforts to study the social acoustic signaling behavior of delphinids have traditionally been restricted to audio-range (<20 kHz) analyses. To explore the occurrence of communication signals at ultrasonic frequencies, broadband recordings of whistles and burst pulses were obtained from two commonly studied species of delphinids, the Hawaiian spinner dolphin (Stenella longirostris) and the Atlantic spotted dolphin (Stenella frontalis). Signals were quantitatively analyzed to establish their full bandwidth, to identify distinguishing characteristics between each species, and to determine how often they occur beyond the range of human hearing. Fundamental whistle contours were found to extend beyond 20 kHz only rarely among spotted dolphins, but with some regularity in spinner dolphins. Harmonics were present in the majority of whistles and varied considerably in their number, occurrence, and amplitude. Many whistles had harmonics that extended past 50 kHz and some reached as high as 100 kHz. The relative amplitude of harmonics and the high hearing sensitivity of dolphins to equivalent frequencies suggest that harmonics are biologically relevant spectral features. The burst pulses of both species were found to be predominantly ultrasonic, often with little or no energy below 20 kHz. The findings presented reveal that the social signals produced by spinner and spotted dolphins span the full range of their hearing sensitivity, are spectrally quite varied, and in the case of burst pulses are probably produced more frequently than reported by audio-range analyses. PMID:14514216

  13. The broadband social acoustic signaling behavior of spinner and spotted dolphins

    NASA Astrophysics Data System (ADS)

    Lammers, Marc O.; Au, Whitlow W. L.; Herzing, Denise L.

    2003-09-01

    Efforts to study the social acoustic signaling behavior of delphinids have traditionally been restricted to audio-range (<20 kHz) analyses. To explore the occurrence of communication signals at ultrasonic frequencies, broadband recordings of whistles and burst pulses were obtained from two commonly studied species of delphinids, the Hawaiian spinner dolphin (Stenella longirostris) and the Atlantic spotted dolphin (Stenella frontalis). Signals were quantitatively analyzed to establish their full bandwidth, to identify distinguishing characteristics between each species, and to determine how often they occur beyond the range of human hearing. Fundamental whistle contours were found to extend beyond 20 kHz only rarely among spotted dolphins, but with some regularity in spinner dolphins. Harmonics were present in the majority of whistles and varied considerably in their number, occurrence, and amplitude. Many whistles had harmonics that extended past 50 kHz and some reached as high as 100 kHz. The relative amplitude of harmonics and the high hearing sensitivity of dolphins to equivalent frequencies suggest that harmonics are biologically relevant spectral features. The burst pulses of both species were found to be predominantly ultrasonic, often with little or no energy below 20 kHz. The findings presented reveal that the social signals produced by spinner and spotted dolphins span the full range of their hearing sensitivity, are spectrally quite varied, and in the case of burst pulses are probably produced more frequently than reported by audio-range analyses.

  14. Deciphering acoustic emission signals in drought stressed branches: the missing link between source and sensor

    PubMed Central

    Vergeynst, Lidewei L.; Sause, Markus G. R.; Hamstad, Marvin A.; Steppe, Kathy

    2015-01-01

    When drought occurs in plants, acoustic emission (AE) signals can be detected, but the actual causes of these signals are still unknown. By analyzing the waveforms of the measured signals, it should, however, be possible to trace the characteristics of the AE source and obtain information about the underlying physiological processes. A problem encountered during this analysis is that the waveform changes significantly from source to sensor, and the lack of knowledge about wave propagation impedes research progress in this field. We used finite element modeling and the well-known pencil lead break source to investigate wave propagation in a branch. A cylindrical rod of polyvinyl chloride (PVC) was first used to identify the theoretical propagation modes. Two wave propagation modes could be distinguished, and we used the finite element model to interpret their behavior in terms of source position for both the PVC rod and a wooden rod. Both wave propagation modes were also identified in drying-induced signals from woody branches, and we used the obtained insights to provide recommendations for further AE research in plant science. PMID:26191070

  15. Divergence of Acoustic Signals in a Widely Distributed Frog: Relevance of Inter-Male Interactions

    PubMed Central

    Velásquez, Nelson A.; Opazo, Daniel; Díaz, Javier; Penna, Mario

    2014-01-01

    Divergence of acoustic signals at a geographic scale results from diverse evolutionary forces acting in parallel and directly affecting inter-male vocal interactions among disjunct populations. Pleurodema thaul is a frog with an extensive latitudinal distribution in Chile, along which males' advertisement calls exhibit substantial variation. Using the playback paradigm, we studied the evoked vocal responses of males from three populations of P. thaul in Chile, from the northern, central and southern parts of its distribution. In each population, males were stimulated with standard synthetic calls having the acoustic structure of local and foreign populations. Males of both the northern and central populations displayed strong vocal responses when confronted with the synthetic call of their own population, giving weaker responses to the call of the southern population. The southern population gave stronger responses to calls of the northern population than to the local call. Furthermore, males in all populations were stimulated with synthetic calls for which the dominant frequency, pulse rate and modulation depth were varied parametrically. Individuals from the northern and central populations gave lower responses to a synthetic call devoid of amplitude modulation relative to stimuli containing modulation depths between 30 and 100%, whereas the southern population responded similarly to all stimuli in this series. Geographic variation in the evoked vocal responses of males of P. thaul underlines the importance of inter-male interactions in driving the divergence of acoustic traits and contributes evidence for a role of intra-sexual selection in the evolution of the sound communication system of this anuran. PMID:24489957

  16. Hierarchical Organization of Auditory and Motor Representations in Speech Perception: Evidence from Searchlight Similarity Analysis.

    PubMed

    Evans, Samuel; Davis, Matthew H

    2015-12-01

    How humans extract the identity of speech sounds from highly variable acoustic signals remains unclear. Here, we use searchlight representational similarity analysis (RSA) to localize and characterize neural representations of syllables at different levels of the hierarchically organized temporo-frontal pathways for speech perception. We asked participants to listen to spoken syllables that differed considerably in their surface acoustic form by changing speaker and degrading surface acoustics using noise-vocoding and sine wave synthesis while we recorded neural responses with functional magnetic resonance imaging. We found evidence for a graded hierarchy of abstraction across the brain. At the peak of the hierarchy, neural representations in somatomotor cortex encoded syllable identity but not surface acoustic form; at the base of the hierarchy, primary auditory cortex showed the reverse. In contrast, bilateral temporal cortex exhibited an intermediate response, encoding both syllable identity and the surface acoustic form of speech. Regions of somatomotor cortex associated with encoding syllable identity in perception were also engaged when producing the same syllables in a separate session. These findings are consistent with a hierarchical account of how variable acoustic signals are transformed into abstract representations of the identity of speech sounds. PMID:26157026
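
    The core computation behind searchlight RSA can be summarized for a single searchlight as follows: build a representational dissimilarity matrix (RDM) from the local voxel response patterns and rank-correlate it with a model RDM coding, say, syllable identity. The sketch below uses random placeholder data and toy dimensions; it is a schematic of the technique, not the study's analysis pipeline.

      # Minimal RSA sketch for one searchlight, using placeholder data.
      import numpy as np
      from scipy.spatial.distance import pdist
      from scipy.stats import spearmanr

      rng = np.random.default_rng(0)

      n_conditions, n_voxels = 16, 100            # e.g. 4 syllables x 4 surface forms
      syllable = np.repeat(np.arange(4), 4)       # syllable identity per condition
      patterns = rng.standard_normal((n_conditions, n_voxels))   # placeholder patterns

      # Neural RDM: correlation distance between condition response patterns.
      neural_rdm = pdist(patterns, metric="correlation")

      # Model RDM: 0 when two conditions share a syllable identity, 1 otherwise.
      model_rdm = pdist(syllable[:, None], metric=lambda a, b: float(a[0] != b[0]))

      # Searchlight statistic: rank correlation between the neural and model RDMs.
      rho, _ = spearmanr(neural_rdm, model_rdm)
      print(f"RDM correlation with the syllable-identity model: {rho:.3f}")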

  17. Hierarchical Organization of Auditory and Motor Representations in Speech Perception: Evidence from Searchlight Similarity Analysis

    PubMed Central

    Evans, Samuel; Davis, Matthew H.

    2015-01-01

    How humans extract the identity of speech sounds from highly variable acoustic signals remains unclear. Here, we use searchlight representational similarity analysis (RSA) to localize and characterize neural representations of syllables at different levels of the hierarchically organized temporo-frontal pathways for speech perception. We asked participants to listen to spoken syllables that differed considerably in their surface acoustic form by changing speaker and degrading surface acoustics using noise-vocoding and sine wave synthesis while we recorded neural responses with functional magnetic resonance imaging. We found evidence for a graded hierarchy of abstraction across the brain. At the peak of the hierarchy, neural representations in somatomotor cortex encoded syllable identity but not surface acoustic form; at the base of the hierarchy, primary auditory cortex showed the reverse. In contrast, bilateral temporal cortex exhibited an intermediate response, encoding both syllable identity and the surface acoustic form of speech. Regions of somatomotor cortex associated with encoding syllable identity in perception were also engaged when producing the same syllables in a separate session. These findings are consistent with a hierarchical account of how variable acoustic signals are transformed into abstract representations of the identity of speech sounds. PMID:26157026

  18. The Natural Statistics of Audiovisual Speech

    PubMed Central

    Chandrasekaran, Chandramouli; Trubanova, Andrea; Stillittano, Sébastien; Caplier, Alice; Ghazanfar, Asif A.

    2009-01-01

    Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain where it can guide the selection of appropriate actions. To simplify this process, it has been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input space characterization is unavailable is human speech. We do not understand what signals our brain has to actively piece together from an audiovisual speech stream to arrive at a percept versus what is already embedded in the signal structure of the stream itself. In essence, we do not have a clear understanding of the natural statistics of audiovisual speech. In the present study, we identified the following major statistical features of audiovisual speech. First, we observed robust correlations and close temporal correspondence between the area of the mouth opening and the acoustic envelope. Second, we found the strongest correlation between the area of the mouth opening and vocal tract resonances. Third, we observed that both the area of the mouth opening and the voice envelope are temporally modulated in the 2–7 Hz frequency range. Finally, we show that the timing of mouth movements relative to the onset of th