Science.gov

Sample records for acoustic speech signal

  1. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2004-03-23

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  2. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-02-14

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  3. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-08-08

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  4. System And Method For Characterizing Voiced Excitations Of Speech And Acoustic Signals, Removing Acoustic Noise From Speech, And Synthesizi

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-04-25

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  5. Denoising of human speech using combined acoustic and em sensor signal processing

    SciTech Connect

    Ng, L C; Burnett, G C; Holzrichter, J F; Gable, T J

    1999-11-29

    Low-power EM radar-like sensors have made it possible to measure properties of the human speech production system in real time, without acoustic interference. This greatly enhances the quality and quantity of information for many speech-related applications. See Holzrichter, Burnett, Ng, and Lea, J. Acoust. Soc. Am. 103(1), 622 (1998). By using combined glottal EM-sensor and acoustic signals, segments of voiced speech, unvoiced speech, and no speech can be reliably defined. Real-time denoising filters can then be constructed to remove noise from the user's corresponding speech signal.
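
    The record above does not give implementation details, but the basic idea - using the EM-sensor voicing signal to decide when the microphone is picking up noise only, and updating a noise estimate during those intervals - can be sketched as a simple gated spectral subtraction. The sketch below is an illustration under that assumption, not the authors' algorithm; the frame length, spectral floor, and use of SciPy are arbitrary choices.

    ```python
    # Hypothetical sketch: EM-sensor-gated spectral subtraction (not the published method).
    import numpy as np
    from scipy.signal import stft, istft

    def denoise_with_voicing_gate(audio, voiced_mask, fs=16000, nperseg=512):
        """audio: noisy microphone signal; voiced_mask: per-sample bool derived from the EM sensor."""
        f, t, Z = stft(audio, fs=fs, nperseg=nperseg)
        hop = nperseg // 2
        # Mark an STFT frame as "no-speech" if the EM sensor shows no voicing anywhere in it.
        frame_voiced = np.array([voiced_mask[i * hop:i * hop + nperseg].any()
                                 for i in range(Z.shape[1])])
        if (~frame_voiced).any():
            noise_mag = np.abs(Z[:, ~frame_voiced]).mean(axis=1, keepdims=True)
        else:
            noise_mag = np.zeros((Z.shape[0], 1))
        # Subtract the noise magnitude estimate, keeping a small spectral floor.
        mag = np.abs(Z)
        clean_mag = np.maximum(mag - noise_mag, 0.05 * mag)
        _, x_clean = istft(clean_mag * np.exp(1j * np.angle(Z)), fs=fs, nperseg=nperseg)
        return x_clean[:len(audio)]
    ```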

  6. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2002-01-01

    Low power EM waves are used to detect motions of vocal tract tissues of the human speech system before, during, and after voiced speech. A voiced excitation function is derived. The excitation function provides speech production information to enhance speech characterization and to enable noise removal from human speech.

  7. Estimation of glottal source features from the spectral envelope of the acoustic speech signal

    NASA Astrophysics Data System (ADS)

    Torres, Juan Felix

    Speech communication encompasses diverse types of information, including phonetics, affective state, voice quality, and speaker identity. From a speech production standpoint, the acoustic speech signal can be mainly divided into glottal source and vocal tract components, which play distinct roles in rendering the various types of information it contains. Most deployed speech analysis systems, however, do not explicitly represent these two components as distinct entities, as their joint estimation from the acoustic speech signal becomes an ill-defined blind deconvolution problem. Nevertheless, because of the desire to understand glottal behavior and how it relates to perceived voice quality, there has been continued interest in explicitly estimating the glottal component of the speech signal. To this end, several inverse filtering (IF) algorithms have been proposed, but they are unreliable in practice because of the blind formulation of the separation problem. In an effort to develop a method that can bypass the challenging IF process, this thesis proposes a new glottal source information extraction method that relies on supervised machine learning to transform smoothed spectral representations of speech, which are already used in some of the most widely deployed and successful speech analysis applications, into a set of glottal source features. A transformation method based on Gaussian mixture regression (GMR) is presented and compared to current IF methods in terms of feature similarity, reliability, and speaker discrimination capability on a large speech corpus, and potential representations of the spectral envelope of speech are investigated for their ability to represent glottal source variation in a predictable manner. The proposed system was found to produce glottal source features that reasonably matched their IF counterparts in many cases, while being less susceptible to spurious errors. The development of the proposed method entailed a study into the aspects
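
    The abstract names the transformation (Gaussian mixture regression from spectral-envelope features to glottal source features) without giving its form. A minimal GMR sketch, assuming a joint full-covariance GMM over stacked input/output vectors and scikit-learn for the fit, is shown below; the variable names and component count are illustrative, not the thesis' configuration.

    ```python
    # Illustrative Gaussian mixture regression (GMR): predict glottal features y from
    # spectral-envelope features x via the conditional mean E[y | x] under a joint GMM.
    import numpy as np
    from scipy.stats import multivariate_normal
    from sklearn.mixture import GaussianMixture

    def fit_gmr(X, Y, n_components=8):
        """Fit a full-covariance GMM on the joint vectors [x, y]; returns (model, input dim)."""
        gmm = GaussianMixture(n_components=n_components, covariance_type="full")
        gmm.fit(np.hstack([X, Y]))
        return gmm, X.shape[1]

    def gmr_predict(gmm, dx, X_new):
        means, covs, w = gmm.means_, gmm.covariances_, gmm.weights_
        preds = []
        for x in X_new:
            # Responsibility of each component given the input x alone.
            resp = np.array([w[k] * multivariate_normal.pdf(x, means[k, :dx], covs[k, :dx, :dx])
                             for k in range(len(w))])
            resp /= resp.sum() + 1e-300
            # Mixture of per-component conditional means E[y | x, component k].
            y_hat = np.zeros(means.shape[1] - dx)
            for k in range(len(w)):
                mu_x, mu_y = means[k, :dx], means[k, dx:]
                S_xx, S_yx = covs[k, :dx, :dx], covs[k, dx:, :dx]
                y_hat += resp[k] * (mu_y + S_yx @ np.linalg.solve(S_xx, x - mu_x))
            preds.append(y_hat)
        return np.array(preds)
    ```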

  8. Continuous perception and graded categorization: Electrophysiological evidence for a linear relationship between the acoustic signal and perceptual encoding of speech

    PubMed Central

    Toscano, Joseph C.; McMurray, Bob; Dennhardt, Joel; Luck, Steven J.

    2012-01-01

    Speech sounds are highly variable, yet listeners readily extract information from them and transform continuous acoustic signals into meaningful categories during language comprehension. A central question is whether perceptual encoding captures continuous acoustic detail in a one-to-one fashion or whether it is affected by categories. We addressed this in an event-related potential (ERP) experiment in which listeners categorized spoken words that varied along a continuous acoustic dimension (voice onset time; VOT) in an auditory oddball task. We found that VOT effects were present through a late stage of perceptual processing (N1 component, ca. 100 ms poststimulus) and were independent of categories. In addition, effects of within-category differences in VOT were present at a post-perceptual categorization stage (P3 component, ca. 450 ms poststimulus). Thus, at perceptual levels, acoustic information is encoded continuously, independent of phonological information. Further, at phonological levels, fine-grained acoustic differences are preserved along with category information. PMID:20935168

  9. Classifying acoustic signals into phoneme categories: average and dyslexic readers make use of complex dynamical patterns and multifractal scaling properties of the speech signal

    PubMed Central

    2015-01-01

    Several competing aetiologies of developmental dyslexia suggest that the problems with acquiring literacy skills are causally entailed by low-level auditory and/or speech perception processes. The purpose of this study is to evaluate the diverging claims about the specific deficient perceptual processes under conditions of strong inference. Theoretically relevant acoustic features were extracted from a set of artificial speech stimuli that lie on a /bAk/-/dAk/ continuum. The features were tested on their ability to enable a simple classifier (Quadratic Discriminant Analysis) to reproduce the observed classification performance of average and dyslexic readers in a speech perception experiment. The ‘classical’ features examined were based on component process accounts of developmental dyslexia such as the supposed deficit in Envelope Rise Time detection and the deficit in the detection of rapid changes in the distribution of energy in the frequency spectrum (formant transitions). Studies examining these temporal processing deficit hypotheses do not employ measures that quantify the temporal dynamics of stimuli. It is shown that measures based on quantification of the dynamics of complex, interaction-dominant systems (Recurrence Quantification Analysis and the multifractal spectrum) enable QDA to classify the stimuli almost identically as observed in dyslexic and average reading participants. It seems unlikely that participants used any of the features that are traditionally associated with accounts of (impaired) speech perception. The nature of the variables quantifying the temporal dynamics of the speech stimuli implies that the classification of speech stimuli cannot be regarded as a linear aggregate of component processes that each parse the acoustic signal independent of one another, as is assumed by the ‘classical’ aetiologies of developmental dyslexia. It is suggested that the results imply that the differences in speech perception performance between

  10. Classifying acoustic signals into phoneme categories: average and dyslexic readers make use of complex dynamical patterns and multifractal scaling properties of the speech signal.

    PubMed

    Hasselman, Fred

    2015-01-01

    Several competing aetiologies of developmental dyslexia suggest that the problems with acquiring literacy skills are causally entailed by low-level auditory and/or speech perception processes. The purpose of this study is to evaluate the diverging claims about the specific deficient perceptual processes under conditions of strong inference. Theoretically relevant acoustic features were extracted from a set of artificial speech stimuli that lie on a /bAk/-/dAk/ continuum. The features were tested on their ability to enable a simple classifier (Quadratic Discriminant Analysis) to reproduce the observed classification performance of average and dyslexic readers in a speech perception experiment. The 'classical' features examined were based on component process accounts of developmental dyslexia such as the supposed deficit in Envelope Rise Time detection and the deficit in the detection of rapid changes in the distribution of energy in the frequency spectrum (formant transitions). Studies examining these temporal processing deficit hypotheses do not employ measures that quantify the temporal dynamics of stimuli. It is shown that measures based on quantification of the dynamics of complex, interaction-dominant systems (Recurrence Quantification Analysis and the multifractal spectrum) enable QDA to classify the stimuli almost identically as observed in dyslexic and average reading participants. It seems unlikely that participants used any of the features that are traditionally associated with accounts of (impaired) speech perception. The nature of the variables quantifying the temporal dynamics of the speech stimuli implies that the classification of speech stimuli cannot be regarded as a linear aggregate of component processes that each parse the acoustic signal independent of one another, as is assumed by the 'classical' aetiologies of developmental dyslexia. It is suggested that the results imply that the differences in speech perception performance between average and
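
    The classifier named in both records, quadratic discriminant analysis, is a standard method; a minimal sketch with scikit-learn on placeholder features (e.g., rise time, formant-transition slope, an RQA measure, and multifractal spectrum width per stimulus) might look like the following. The synthetic data and feature names are illustrative assumptions, not the study's materials.

    ```python
    # Hypothetical QDA sketch: classify /bAk/-/dAk/ stimuli from a small set of acoustic features.
    import numpy as np
    from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 4))   # placeholder features: rise time, formant slope, RQA, multifractal width
    X[20:] += 1.5                  # make the two synthetic classes separable for the demo
    y = np.repeat([0, 1], 20)      # placeholder labels: /bAk/ vs /dAk/ responses

    qda = QuadraticDiscriminantAnalysis()
    scores = cross_val_score(qda, X, y, cv=5)   # how well these features reproduce the labelling
    print("mean cross-validated accuracy:", scores.mean())
    ```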

  11. Perceptual centres in speech - an acoustic analysis

    NASA Astrophysics Data System (ADS)

    Scott, Sophie Kerttu

    Perceptual centres, or P-centres, represent the perceptual moments of occurrence of acoustic signals - the 'beat' of a sound. P-centres underlie the perception and production of rhythm in perceptually regular speech sequences. P-centres have been modelled both in speech and non-speech (music) domains. The three aims of this thesis were (a) to test current P-centre models to determine which best accounted for the experimental data, (b) to identify a candidate parameter onto which P-centres could be mapped (a local approach), as opposed to the previous global models which rely upon the whole signal to determine the P-centre, and (c) to develop a model of P-centre location which could be applied to speech and non-speech signals. The first aim was investigated by a series of experiments in which (a) speech from different speakers was examined to determine whether different models could account for variation between speakers, (b) the amplitude-time plot of a speech signal was altered to determine whether this affects the P-centre of the signal, and (c) the amplitude at the offset of a speech signal was increased to determine whether this alters P-centres in the production and perception of speech. The second aim was carried out by (a) manipulating the rise time of different speech signals to determine whether the P-centre was affected, and whether the type of speech sound ramped affected the P-centre shift, (b) manipulating the rise time and decay time of a synthetic vowel to determine whether the onset alteration had more effect on the P-centre than the offset manipulation, and (c) determining whether the duration of a vowel affected the P-centre when other attributes (amplitude, spectral content) were held constant. The third aim - modelling P-centres - was based on these results. The Frequency-dependent Amplitude Increase Model of P-centre location (FAIM) was developed using a modelling protocol, the APU GammaTone Filterbank and the speech from different speakers. The P-centres of the stimuli corpus were highly predicted by attributes of

  12. Speech recognition: Acoustic, phonetic and lexical

    NASA Astrophysics Data System (ADS)

    Zue, V. W.

    1985-10-01

    Our long-term research goal is the development and implementation of speaker-independent continuous speech recognition systems. It is our conviction that proper utilization of speech-specific knowledge is essential for advanced speech recognition systems. With this in mind, we have continued to make progress on the acquisition of acoustic-phonetic and lexical knowledge. We have completed the development of a continuous digit recognition system. The system was constructed to investigate the utilization of acoustic-phonetic knowledge in a speech recognition system. Significant developments of this study include a soft-failure procedure for lexical access and the discovery of a set of acoustic-phonetic features for verification. We have completed a study of the constraints provided by lexical stress on word recognition. We found that lexical stress information alone can, on the average, reduce the number of word candidates from a large dictionary by more than 80%. In conjunction with this study, we successfully developed a system that automatically determines the stress pattern of a word from the acoustic signal.

  13. Speech recognition: Acoustic, phonetic and lexical knowledge

    NASA Astrophysics Data System (ADS)

    Zue, V. W.

    1985-08-01

    During this reporting period we continued to make progress on the acquisition of acoustic-phonetic and lexical knowledge. We completed development of a continuous digit recognition system. The system was constructed to investigate the use of acoustic-phonetic knowledge in a speech recognition system. The significant achievements of this study include the development of a soft-failure procedure for lexical access and the discovery of a set of acoustic-phonetic features for verification. We completed a study of the constraints that lexical stress imposes on word recognition. We found that lexical stress information alone can, on the average, reduce the number of word candidates from a large dictionary by more than 80 percent. In conjunction with this study, we successfully developed a system that automatically determines the stress pattern of a word from the acoustic signal. We performed an acoustic study on the characteristics of nasal consonants and nasalized vowels. We have also developed recognition algorithms for nasal murmurs and nasalized vowels in continuous speech. We finished the preliminary development of a system that aligns a speech waveform with the corresponding phonetic transcription.

  14. Acoustic Markers of Prosodic Boundaries in Spanish Spontaneous Alaryngeal Speech

    ERIC Educational Resources Information Center

    Cuenca, M. H.; Barrio, M. M.

    2010-01-01

    Prosodic information aids segmentation of the continuous speech signal and thereby facilitates auditory speech processing. Durational and pitch variations are prosodic cues especially necessary to convey prosodic boundaries, but alaryngeal speakers have inconsistent control over acoustic parameters such as F0 and duration, being as a result noisy…

  15. Acoustics of Clear Speech: Effect of Instruction

    ERIC Educational Resources Information Center

    Lam, Jennifer; Tjaden, Kris; Wilding, Greg

    2012-01-01

    Purpose: This study investigated how different instructions for eliciting clear speech affected selected acoustic measures of speech. Method: Twelve speakers were audio-recorded reading 18 different sentences from the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1984). Sentences were produced in habitual, clear,…

  16. Study of acoustic correlates associate with emotional speech

    NASA Astrophysics Data System (ADS)

    Yildirim, Serdar; Lee, Sungbok; Lee, Chul Min; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Ebrahim; Narayanan, Shrikanth

    2004-10-01

    This study investigates the acoustic characteristics of four different emotions expressed in speech. The aim is to obtain detailed acoustic knowledge on how a speech signal is modulated by changes from neutral to a certain emotional state. Such knowledge is necessary for automatic emotion recognition and classification and emotional speech synthesis. Speech data obtained from two semi-professional actresses are analyzed and compared. Each subject produces 211 sentences with four different emotions: neutral, sad, angry, and happy. We analyze changes in temporal and acoustic parameters such as magnitude and variability of segmental duration, fundamental frequency, and the first three formant frequencies as a function of emotion. Acoustic differences among the emotions are also explored with mutual information computation, multidimensional scaling, and acoustic likelihood comparison with normal speech. Results indicate that speech associated with anger and happiness is characterized by longer duration, shorter interword silence, and higher pitch and rms energy with wider ranges. Sadness is distinguished from other emotions by lower rms energy and longer interword silence. Interestingly, the differences in formant pattern between [happiness/anger] and [neutral/sadness] are better reflected in back vowels such as /a/ (as in 'father') than in front vowels. Detailed results on intra- and interspeaker variability will be reported.
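
    As a rough illustration of the kinds of per-utterance measurements listed above (duration, fundamental frequency, and rms energy statistics), the following sketch uses librosa; the library, the pitch-range bounds, and the summary statistics are assumptions, not the study's actual analysis pipeline.

    ```python
    # Illustrative extraction of simple prosodic statistics from one utterance (not the study's code).
    import numpy as np
    import librosa

    def prosodic_summary(path):
        y, sr = librosa.load(path, sr=None)
        f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=75, fmax=500, sr=sr)  # F0 track
        rms = librosa.feature.rms(y=y)[0]                                         # short-time rms energy
        return {
            "duration_s": len(y) / sr,
            "f0_mean_hz": float(np.nanmean(f0)),
            "f0_range_hz": float(np.nanmax(f0) - np.nanmin(f0)),
            "rms_mean": float(rms.mean()),
            "rms_range": float(rms.max() - rms.min()),
        }
    ```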

  17. Speech Intelligibility Advantages using an Acoustic Beamformer Display

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Sunder, Kaushik; Godfroy, Martine; Otto, Peter

    2015-01-01

    A speech intelligibility test conforming to the Modified Rhyme Test of ANSI S3.2 "Method for Measuring the Intelligibility of Speech Over Communication Systems" was conducted using a prototype 12-channel acoustic beamformer system. The target speech material (signal) was identified against speech babble (noise), with calculated signal-noise ratios of 0, 5 and 10 dB. The signal was delivered at a fixed beam orientation of 135 deg (re 90 deg as the frontal direction of the array) and the noise at 135 deg (co-located) and 0 deg (separated). A significant improvement in intelligibility from 57% to 73% was found for spatial separation for the same signal-noise ratio (0 dB). Significant effects for improved intelligibility due to spatial separation were also found for higher signal-noise ratios (5 and 10 dB).

  18. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced speech, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

  19. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced speech, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.
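
    The deconvolution step described in these patent abstracts - removing the EM-derived excitation from the acoustic output to obtain a per-frame transfer function - corresponds, in the frequency domain, to a regularized spectral division. The sketch below is a generic illustration of that idea, not the patented algorithm; the window and regularization constant are arbitrary.

    ```python
    # Generic frame-wise deconvolution sketch: transfer function H(f) = S(f) / E(f), regularized.
    import numpy as np

    def frame_transfer_function(acoustic_frame, excitation_frame, eps=1e-3):
        """Estimate H(f) for one frame from the acoustic output s[n] and the excitation e[n]."""
        w = np.hanning(len(acoustic_frame))
        S = np.fft.rfft(acoustic_frame * w)
        E = np.fft.rfft(excitation_frame * w)
        # Tikhonov-style regularization keeps the division stable where |E(f)| is small.
        return S * np.conj(E) / (np.abs(E) ** 2 + eps)
    ```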

  20. Evaluating a topographical mapping from speech acoustics to tongue positions

    SciTech Connect

    Hogden, J.; Heard, M.

    1995-05-01

    The continuity mapping algorithm, a procedure for learning to recover the relative positions of the articulators from speech signals, is evaluated using human speech data. The advantage of continuity mapping is that it is an unsupervised algorithm; that is, it can potentially be trained to make a mapping from speech acoustics to speech articulation without articulator measurements. The procedure starts by vector quantizing short windows of a speech signal so that each window is represented (encoded) by a single number. Next, multidimensional scaling is used to map quantization codes that were temporally close in the encoded speech to nearby points in a continuity map. Since speech sounds produced sufficiently close together in time must have been produced by similar articulator configurations, and speech sounds produced close together in time are close to each other in the continuity map, sounds produced by similar articulator positions should be mapped to similar positions in the continuity map. The data set used for evaluating the continuity mapping algorithm is comprised of simultaneously collected articulator and acoustic measurements made using an electromagnetic midsagittal articulometer on a human subject. Comparisons between measured articulator positions and those recovered using continuity mapping will be presented.
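
    The pipeline sketched in the abstract (vector-quantize short windows, then use multidimensional scaling to place codes that occur close together in time near each other) can be mocked up with off-the-shelf tools. The window length, codebook size, and adjacency-count dissimilarity below are assumptions for illustration, not the authors' implementation.

    ```python
    # Schematic continuity-mapping sketch: VQ codes -> temporal-adjacency dissimilarity -> MDS.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.manifold import MDS

    def continuity_map(speech, win=160, n_codes=32, n_dims=2):
        # 1) Vector-quantize short windows of the signal (each window -> one code).
        frames = np.array([speech[i:i + win] for i in range(0, len(speech) - win, win)])
        codes = KMeans(n_clusters=n_codes, n_init=10).fit_predict(frames)
        # 2) Count how often two codes occur in consecutive windows (+1 smoothing).
        counts = np.ones((n_codes, n_codes))
        for a, b in zip(codes[:-1], codes[1:]):
            counts[a, b] += 1
            counts[b, a] += 1
        # 3) Codes that frequently follow one another get small dissimilarities.
        dissim = 1.0 / counts
        np.fill_diagonal(dissim, 0.0)
        # 4) MDS embeds the codes so temporally close codes end up spatially close.
        embedding = MDS(n_components=n_dims, dissimilarity="precomputed").fit_transform(dissim)
        return embedding, codes
    ```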

  1. Acoustic modeling of the speech organ

    NASA Astrophysics Data System (ADS)

    Kacprowski, J.

    The state of research on acoustic modeling of phonational and articulatory speech producing elements is reviewed. Consistent with the physical interpretation of the speech production process, the acoustic theory of speech production is expressed as the product of three factors: laryngeal involvement, sound transmission, and emanations from the mouth and/or nose. Each of these factors is presented in the form of a simplified mathematical description which provides the theoretical basis for the formation of physical models of the appropriate functional members of this complex bicybernetic system. Vocal tract wall impedance, vocal tract synthesizers, laryngeal dysfunction, vowel nasalization, resonance circuits, and sound wave propagation are discussed.
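
    The "product of three factors" referred to here is the classical source-filter formulation of speech production; in conventional frequency-domain notation (the symbols below are the standard ones, not necessarily the review's) it reads:

    ```latex
    % Classical source-filter decomposition of the radiated speech spectrum
    P(\omega) \;=\; \underbrace{U_g(\omega)}_{\text{glottal (laryngeal) source}}
    \cdot \underbrace{H(\omega)}_{\text{vocal-tract transmission}}
    \cdot \underbrace{R(\omega)}_{\text{radiation from mouth/nose}}
    ```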

  2. Investigation of the optimum acoustical conditions for speech using auralization

    NASA Astrophysics Data System (ADS)

    Yang, Wonyoung; Hodgson, Murray

    2001-05-01

    Speech intelligibility is mainly affected by reverberation and by signal-to-noise level difference, the difference between the speech-signal and background-noise levels at a receiver. An important question for the design of rooms for speech (e.g., classrooms) is, what are the optimal values of these factors? This question has been studied experimentally and theoretically. Experimental studies found zero optimal reverberation time, but theoretical predictions found nonzero reverberation times. These contradictory results are partly caused by the different ways of accounting for background noise. Background noise sources and their locations inside the room are the most detrimental factors in speech intelligibility. However, noise levels also interact with reverberation in rooms. In this project, two major room-acoustical factors for speech intelligibility were controlled using speech and noise sources of known relative output levels located in a virtual room with known reverberation. Speech intelligibility test signals were played in the virtual room and auralized for listeners. The Modified Rhyme Test (MRT) and babble noise were used to measure subjective speech intelligibility quality. Optimal reverberation times, and the optimal values of other speech intelligibility metrics, for normal-hearing people and for hard-of-hearing people, were identified and compared.

  3. Benefits to Speech Perception in Noise From the Binaural Integration of Electric and Acoustic Signals in Simulated Unilateral Deafness

    PubMed Central

    Ma, Ning; Morris, Saffron; Kitterick, Pádraig Thomas

    2016-01-01

    Objectives: This study used vocoder simulations with normal-hearing (NH) listeners to (1) measure their ability to integrate speech information from an NH ear and a simulated cochlear implant (CI), and (2) investigate whether binaural integration is disrupted by a mismatch in the delivery of spectral information between the ears arising from a misalignment in the mapping of frequency to place. Design: Eight NH volunteers participated in the study and listened to sentences embedded in background noise via headphones. Stimuli presented to the left ear were unprocessed. Stimuli presented to the right ear (referred to as the CI-simulation ear) were processed using an eight-channel noise vocoder with one of the three processing strategies. An Ideal strategy simulated a frequency-to-place map across all channels that matched the delivery of spectral information between the ears. A Realistic strategy created a misalignment in the mapping of frequency to place in the CI-simulation ear where the size of the mismatch between the ears varied across channels. Finally, a Shifted strategy imposed a similar degree of misalignment in all channels, resulting in consistent mismatch between the ears across frequency. The ability to report key words in sentences was assessed under monaural and binaural listening conditions and at signal to noise ratios (SNRs) established by estimating speech-reception thresholds in each ear alone. The SNRs ensured that the monaural performance of the left ear never exceeded that of the CI-simulation ear. The advantages of binaural integration were calculated by comparing binaural performance with monaural performance using the CI-simulation ear alone. Thus, these advantages reflected the additional use of the experimentally constrained left ear and were not attributable to better-ear listening. Results: Binaural performance was as accurate as, or more accurate than, monaural performance with the CI-simulation ear alone. When both ears supported a
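
    The eight-channel noise vocoder used in this simulation study is a standard technique: band-pass filter the speech, extract each band's envelope, and use it to modulate band-limited noise. A generic sketch follows; the channel edges, filter orders, and envelope cutoff are assumptions rather than the study's parameters, and a frequency-to-place mismatch (as in the Shifted strategy) could be simulated by offsetting the carrier bands relative to the analysis bands.

    ```python
    # Generic noise-vocoder sketch (CI simulation): envelope-modulated noise in N bands.
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def noise_vocode(x, fs, n_bands=8, lo=100.0, hi=7000.0):
        edges = np.geomspace(lo, hi, n_bands + 1)          # assumed log-spaced channel edges
        rng = np.random.default_rng(0)
        out = np.zeros(len(x))
        for k in range(n_bands):
            b, a = butter(4, [edges[k], edges[k + 1]], btype="bandpass", fs=fs)
            band = filtfilt(b, a, x)
            env = np.abs(hilbert(band))                    # band envelope
            be, ae = butter(2, 50.0, btype="lowpass", fs=fs)
            env = filtfilt(be, ae, env)                    # smooth the envelope (assumed 50 Hz cutoff)
            carrier = filtfilt(b, a, rng.standard_normal(len(x)))   # noise limited to the same band
            out += env * carrier
        return out / (np.max(np.abs(out)) + 1e-12)         # normalize to avoid clipping
    ```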

  4. Does Signal Degradation Affect Top-Down Processing of Speech?

    PubMed

    Wagner, Anita; Pals, Carina; de Blecourt, Charlotte M; Sarampalis, Anastasios; Başkent, Deniz

    2016-01-01

    Speech perception is formed based on both the acoustic signal and listeners' knowledge of the world and semantic context. Access to semantic information can facilitate interpretation of degraded speech, such as speech in background noise or the speech signal transmitted via cochlear implants (CIs). This paper focuses on the latter, and investigates the time course of understanding words, and how sentential context reduces listeners' dependency on the acoustic signal for natural and degraded speech via an acoustic CI simulation. In an eye-tracking experiment we combined recordings of listeners' gaze fixations with pupillometry, to capture effects of semantic information on both the time course and effort of speech processing. Normal-hearing listeners were presented with sentences with or without a semantically constraining verb (e.g., crawl) preceding the target (baby), and their ocular responses were recorded to four pictures, including the target, a phonological (bay) competitor, a semantic (worm) distractor, and an unrelated distractor. The results show that in natural speech, listeners' gazes reflect their uptake of acoustic information, and integration of preceding semantic context. Degradation of the signal leads to a later disambiguation of phonologically similar words, and to a delay in integration of semantic information. Complementary to this, the pupil dilation data show that early semantic integration reduces the effort in disambiguating phonologically similar words. Processing degraded speech comes with increased effort due to the impoverished nature of the signal. Delayed integration of semantic information further constrains listeners' ability to compensate for inaudible signals. PMID:27080670

  5. Acoustic characteristics of listener-constrained speech

    NASA Astrophysics Data System (ADS)

    Ashby, Simone; Cummins, Fred

    2003-04-01

    Relatively little is known about the acoustical modifications speakers employ to meet the various constraints - auditory, linguistic and otherwise - of their listeners. Similarly, the manner by which perceived listener constraints interact with speakers' adoption of specialized speech registers is poorly understood. Hyper- and Hypo-speech (H&H) theory offers a framework for examining the relationship between speech production and output-oriented goals for communication, suggesting that under certain circumstances speakers may attempt to minimize phonetic ambiguity by employing a "hyperarticulated" speaking style (Lindblom, 1990). It remains unclear, however, what the acoustic correlates of hyperarticulated speech are, and how, if at all, we might expect phonetic properties to change respective to different listener-constrained conditions. This paper is part of a preliminary investigation concerned with comparing the prosodic characteristics of speech produced across a range of listener constraints. Analyses are drawn from a corpus of read hyperarticulated speech data comprising eight adult, female speakers of English. Specialized registers include speech to foreigners, infant-directed speech, speech produced under noisy conditions, and human-machine interaction. The authors gratefully acknowledge financial support of the Irish Higher Education Authority, allocated to Fred Cummins for collaborative work with Media Lab Europe.

  6. Start/End Delays of Voiced and Unvoiced Speech Signals

    SciTech Connect

    Herrnstein, A

    1999-09-24

    Recent experiments using low power EM-radar like sensors (e.g., GEMS) have demonstrated a new method for measuring vocal fold activity and the onset times of voiced speech, as vocal fold contact begins to take place. Similarly, the end time of a voiced speech segment can be measured. Secondly, it appears that in most normal uses of American English speech, unvoiced-speech segments directly precede or directly follow voiced-speech segments. For many applications, it is useful to know typical duration times of these unvoiced speech segments. A corpus of spoken TIMIT words, phrases, and sentences, assembled earlier and recorded using simultaneously measured acoustic and EM-sensor glottal signals from 16 male speakers, was used for this study. By inspecting the onset (or end) of unvoiced speech using the acoustic signal, and the onset (or end) of voiced speech using the EM sensor signal, the average duration times for unvoiced segments preceding onset of vocalization were found to be 300 ms, and for following segments, 500 ms. An unvoiced speech period is then defined in time, first by using the onset of the EM-sensed glottal signal as the onset-time marker for the voiced speech segment and the end marker for the unvoiced segment. Then, by subtracting 300 ms from the onset-time mark of voicing, the unvoiced speech segment start time is found. Similarly, the times for a following unvoiced speech segment can be found. While data of this nature have proven to be useful for work in our laboratory, a great deal of additional work remains to validate such data for use with general populations of users. These procedures have been useful for applying optimal processing algorithms over time segments of unvoiced, voiced, and non-speech acoustic signals. For example, these data appear to be of use in speaker validation, in vocoding, and in denoising algorithms.

  7. Multifractal nature of unvoiced speech signals

    SciTech Connect

    Adeyemi, O.A.; Hartt, K.; Boudreaux-Bartels, G.F.

    1996-06-01

    A refinement is made in the nonlinear dynamic modeling of speech signals. Previous research successfully characterized speech signals as chaotic. Here, we analyze fricative speech signals using multifractal measures to determine various fractal regimes present in their chaotic attractors. Results support the hypothesis that speech signals have multifractal measures. © 1996 American Institute of Physics.

  8. Acoustic Evidence for Phonologically Mismatched Speech Errors

    ERIC Educational Resources Information Center

    Gormley, Andrea

    2015-01-01

    Speech errors are generally said to accommodate to their new phonological context. This accommodation has been validated by several transcription studies. The transcription methodology is not the best choice for detecting errors at this level, however, as this type of error can be difficult to perceive. This paper presents an acoustic analysis of…

  9. Acoustic characterization of developmental speech disorders

    NASA Astrophysics Data System (ADS)

    Bunnell, H. Timothy; Polikoff, James; McNicholas, Jane; Walter, Rhonda; Winn, Matthew

    2001-05-01

    A novel approach to classifying children with developmental speech delays (DSD) involving /r/ was developed. The approach first derives an acoustic classification of /r/ tokens based on their forced Viterbi alignment to a five-state hidden Markov model (HMM) of normally articulated /r/. Children with DSD are then classified in terms of the proportion of their /r/ productions that fall into each broad acoustic class. This approach was evaluated using 953 examples of /r/ as produced by 18 DSD children and an approximately equal number of /r/ tokens produced by a much larger number of normally articulating children. The acoustic classification identified three broad categories of /r/ that differed substantially in how they aligned to the normal speech /r/ HMM. Additionally, these categories tended to partition tokens uttered by DSD children from those uttered by normally articulating children. Similarities among the DSD children and average normal child measured in terms of the proportion of their /r/ productions that fell into each of the three broad acoustic categories were used to perform a hierarchical clustering. This clustering revealed groupings of DSD children who tended to approach /r/ production in one of several acoustically distinct manners.

  10. Applications of Hilbert Spectral Analysis for Speech and Sound Signals

    NASA Technical Reports Server (NTRS)

    Huang, Norden E.

    2003-01-01

    A new method for analyzing nonlinear and nonstationary data has been developed, and the natural applications are to speech and sound signals. The key part of the method is the Empirical Mode Decomposition method, with which any complicated data set can be decomposed into a finite and often small number of Intrinsic Mode Functions (IMF). An IMF is defined as any function having the same number of zero-crossings and extrema, and also having symmetric envelopes defined by the local maxima and minima respectively. The IMF also admits a well-behaved Hilbert transform. This decomposition method is adaptive and therefore highly efficient. Since the decomposition is based on the local characteristic time scale of the data, it is applicable to nonlinear and nonstationary processes. With the Hilbert transform, the Intrinsic Mode Functions yield instantaneous frequencies as functions of time, which give sharp identifications of embedded structures. This invention can be used to process all acoustic signals. Specifically, it can process speech signals for speech synthesis, speaker identification and verification, speech recognition, and sound signal enhancement and filtering. Additionally, the acoustic signals from machinery are essentially the way the machines talk to us: whether carried through the air or as vibration on the machine itself, they can tell us the operating condition of the machine. Thus, the acoustic signal can be used to diagnose machine problems.
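
    The final step of the method, applying the Hilbert transform to each Intrinsic Mode Function to obtain instantaneous frequency and amplitude, can be illustrated on its own. The sketch below assumes the IMF has already been extracted by EMD and uses SciPy's analytic-signal routine.

    ```python
    # Instantaneous frequency and amplitude of one (already extracted) IMF via the analytic signal.
    import numpy as np
    from scipy.signal import hilbert

    def instantaneous_frequency(imf, fs):
        analytic = hilbert(imf)                            # analytic signal: imf + j * Hilbert{imf}
        phase = np.unwrap(np.angle(analytic))              # continuous instantaneous phase
        inst_freq = np.diff(phase) * fs / (2.0 * np.pi)    # Hz; one sample shorter than the input
        amplitude = np.abs(analytic)
        return inst_freq, amplitude
    ```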

  11. Sparse representation in speech signal processing

    NASA Astrophysics Data System (ADS)

    Lee, Te-Won; Jang, Gil-Jin; Kwon, Oh-Wook

    2003-11-01

    We review the sparse representation principle for processing speech signals. A transformation for encoding the speech signals is learned such that the resulting coefficients are as independent as possible. We use independent component analysis with an exponential prior to learn a statistical representation for speech signals. This representation leads to extremely sparse priors that can be used for encoding speech signals for a variety of purposes. We review applications of this method for speech feature extraction, automatic speech recognition and speaker identification. Furthermore, this method is also suited for tackling the difficult problem of separating two sounds given only a single microphone.
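
    A rough sketch of learning such a basis for short speech frames is shown below. FastICA stands in for the exponential-prior ICA described in the abstract, and the frame length and component count are arbitrary, so this is only a stand-in for the authors' method.

    ```python
    # Rough sketch: learn an ICA basis for short speech frames (FastICA as a stand-in).
    import numpy as np
    from sklearn.decomposition import FastICA

    def learn_speech_basis(speech, frame_len=160, n_components=40):
        hop = frame_len // 2
        frames = np.array([speech[i:i + frame_len]
                           for i in range(0, len(speech) - frame_len, hop)])
        ica = FastICA(n_components=n_components, max_iter=1000)
        codes = ica.fit_transform(frames)   # coefficients per frame; sparse-ish under this model
        basis = ica.mixing_                 # learned basis functions (columns)
        return codes, basis
    ```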

  12. Signals voice biofeedback for speech fluency disorders

    NASA Astrophysics Data System (ADS)

    Martin, Jose Francisco; Fernandez-Ramos, Raquel; Romero-Sanchez, Jorge; Rios, Francisco

    2003-04-01

    Knowledge about the mechanisms of voice production, as well as the parameters obtained from them, allows us to present solutions for coding, transmission, and the establishment of properties that distinguish between the responsible physiological mechanisms. In this work we are interested in the evaluation of syllabic sequences in continuous speech. This evaluation is useful for phoniatrics and logopaedics (speech therapy) applications focused on the measurement and control of speech fluency. We are also interested in studying and evaluating sequential programming and muscular coordination. The main objective of our work is therefore the study of production mechanisms, models, evaluation methods, and the introduction of a reliable algorithm to catalogue and classify the phenomena of rhythm and speech fluency. In this paper, we present an algorithm for syllabic analysis based on the Short Time Energy concept. First, the algorithm extracts the syllabic intervals of speech and silence over time, which are then compared with normality intervals. Second, it feeds back to the patient, in real time, visual and acoustic signals indicating the degree of mismatch with the normality model. This methodology is useful for improving fluency disorders. We present an ASIC microelectronic solution for the syllabic analyser and a portable prototype to be used at the clinical level as well as an individualized tool for the patient.
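
    A minimal version of the Short Time Energy based extraction of speech/silence intervals described above could look like the following; the frame size, hop, and relative threshold are assumptions, not the values used in the ASIC implementation.

    ```python
    # Minimal short-time-energy (STE) sketch: extract speech/silence intervals for fluency timing.
    import numpy as np

    def speech_silence_intervals(x, fs, frame_ms=20, hop_ms=10, rel_threshold=0.1):
        frame, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
        energy = np.array([np.sum(x[i:i + frame] ** 2)
                           for i in range(0, len(x) - frame, hop)])
        active = energy > rel_threshold * energy.max()   # speech wherever STE exceeds the threshold
        intervals, start = [], None
        for i, is_speech in enumerate(active):
            if is_speech and start is None:
                start = i
            elif not is_speech and start is not None:
                intervals.append((start * hop / fs, i * hop / fs))   # (onset_s, offset_s)
                start = None
        if start is not None:
            intervals.append((start * hop / fs, len(x) / fs))
        return intervals
    ```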

  13. Infant Perception of Atypical Speech Signals

    ERIC Educational Resources Information Center

    Vouloumanos, Athena; Gelfand, Hanna M.

    2013-01-01

    The ability to decode atypical and degraded speech signals as intelligible is a hallmark of speech perception. Human adults can perceive sounds as speech even when they are generated by a variety of nonhuman sources including computers and parrots. We examined how infants perceive the speech-like vocalizations of a parrot. Further, we examined how…

  14. Acoustic analysis of speech under stress.

    PubMed

    Sondhi, Savita; Khan, Munna; Vijay, Ritu; Salhan, Ashok K; Chouhan, Satish

    2015-01-01

    When a person is emotionally charged, stress could be discerned in his voice. This paper presents a simplified and a non-invasive approach to detect psycho-physiological stress by monitoring the acoustic modifications during a stressful conversation. Voice database consists of audio clips from eight different popular FM broadcasts wherein the host of the show vexes the subjects who are otherwise unaware of the charade. The audio clips are obtained from real-life stressful conversations (no simulated emotions). Analysis is done using PRAAT software to evaluate mean fundamental frequency (F0) and formant frequencies (F1, F2, F3, F4) both in neutral and stressed state. Results suggest that F0 increases with stress; however, formant frequency decreases with stress. Comparison of Fourier and chirp spectra of short vowel segment shows that for relaxed speech, the two spectra are similar; however, for stressed speech, they differ in the high frequency range due to increased pitch modulation. PMID:26558301

  15. Acoustic assessment of erygmophonic speech of Moroccan laryngectomized patients

    PubMed Central

    Ouattassi, Naouar; Benmansour, Najib; Ridal, Mohammed; Zaki, Zouheir; Bendahhou, Karima; Nejjari, Chakib; Cherkaoui, Abdeljabbar; El Alami, Mohammed Nouredine El Amine

    2015-01-01

    Introduction: Acoustic evaluation of alaryngeal voices is among the most prominent issues in the speech analysis field. In fact, many methods have been developed to date to substitute for classic perceptual evaluation. The aim of this study is to present our experience in the objective assessment of erygmophonic speech and to discuss the most widely used methods of acoustic speech appraisal. Through a prospective case-control study, we measured acoustic parameters of speech quality during one year of erygmophonic rehabilitation therapy of Moroccan laryngectomized patients. Methods: We assessed acoustic parameters of erygmophonic speech samples from eleven laryngectomized patients throughout the speech rehabilitation therapy. Acoustic parameters were obtained by the perturbation analysis method and by linear predictive coding algorithms, as well as through the broadband spectrogram. Results: Using perturbation analysis methods, we found erygmophonic voice to be significantly poorer than normal speech, and it exhibits higher formant frequency values. However, erygmophonic voice also shows higher and extremely variable error values that were greater than the acceptable level, which casts doubt on the reliability of the results of those analytic methods. Conclusion: Acoustic parameters for objective evaluation of alaryngeal voices should allow a reliable representation of the perceptual evaluation of the quality of speech. This requirement has not been fulfilled by the common methods used so far. Therefore, acoustical assessment of erygmophonic speech needs more investigation. PMID:26587121

  16. Acoustic Study of Acted Emotions in Speech

    NASA Astrophysics Data System (ADS)

    Wang, Rong

    An extensive set of carefully recorded utterances provided a speech database for investigating acoustic correlates among eight emotional states. Four professional actors and four professional actresses simulated the emotional states of joy, conversation, nervousness, anger, sadness, hate, fear, and depression. The values of 14 acoustic parameters were extracted from analyses of the simulated portrayals. Normalization of the parameters was made to reduce the talker-dependence. Correlates of emotion were investigated by means of principal components analysis. Sadness and depression were found to be "ambiguous" with respect to each other, but "unique" with respect to joy and anger in the principal components space. Joy, conversation, nervousness, anger, hate, and fear did not separate well in the space and so exhibited ambiguity with respect to one another. The different talkers expressed joy, anger, sadness, and depression more consistently than the other four emotions. The analysis results were compared with the results of a subjective study using the same speech database and considerable consistency between the two was found.

  17. Acoustic differences among casual, conversational, and read speech

    NASA Astrophysics Data System (ADS)

    Pinnow, DeAnna

    Speech is a complex behavior that allows speakers to use many variations to satisfy the demands connected with multiple speaking environments. Speech research typically obtains speech samples in a controlled laboratory setting using read material, yet anecdotal observations of such speech, particularly from talkers with a speech and language impairment, have identified a "performance" effect in the produced speech which masks the characteristics of impaired speech outside of the lab (Goberman, Recker, & Parveen, 2010). The aim of the current study was to investigate acoustic differences among laboratory read, laboratory conversational, and casual speech through well-defined speech tasks in the laboratory and in talkers' natural environments. Eleven healthy research participants performed lab recording tasks (19 read sentences and a dialogue about their life) and collected natural-environment recordings of themselves over 3-day periods using portable recorders. Segments were analyzed for articulatory, voice, and prosodic acoustic characteristics using computer software and hand counting. The current study results indicate that lab-read speech was significantly different from casual speech: greater articulation range, improved voice quality measures, lower speech rate, and lower mean pitch. One implication of the results is that different laboratory techniques may be beneficial in obtaining speech samples that are more like casual speech, thus making it easier to correctly analyze abnormal speech characteristics with fewer errors.

  18. Acoustics in Halls for Speech and Music

    NASA Astrophysics Data System (ADS)

    Gade, Anders C.

    This chapter deals specifically with concepts, tools, and architectural variables of importance when designing auditoria for speech and music. The focus will be on cultivating the useful components of the sound in the room rather than on avoiding noise from outside or from installations, which is dealt with in Chap. 11. The chapter starts by presenting the subjective aspects of the room acoustic experience according to consensus at the time of writing. Then follows a description of their objective counterparts, the objective room acoustic parameters, among which the classical reverberation time measure is only one of many, but still of fundamental value. After explanations on how these parameters can be measured and predicted during the design phase, the remainder of the chapter deals with how the acoustic properties can be controlled by the architectural design of auditoria. This is done by presenting the influence of individual design elements as well as brief descriptions of halls designed for specific purposes, such as drama, opera, and symphonic concerts. Finally, some important aspects of loudspeaker installations in auditoria are briefly touched upon.

  19. Acoustic properties of naturally produced clear speech at normal speaking rates

    NASA Astrophysics Data System (ADS)

    Krause, Jean C.; Braida, Louis D.

    2004-01-01

    Sentences spoken ``clearly'' are significantly more intelligible than those spoken ``conversationally'' for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.

  20. Speaker verification using combined acoustic and EM sensor signal processing

    SciTech Connect

    Ng, L C; Gable, T J; Holzrichter, J F

    2000-11-10

    Low-power EM radar-like sensors have made it possible to measure properties of the human speech production system in real time, without acoustic interference. This greatly enhances the quality and quantity of information for many speech-related applications. See Holzrichter, Burnett, Ng, and Lea, J. Acoust. Soc. Am. 103(1), 622 (1998). By combining the glottal EM sensor (GEMS) signal with the acoustic signal, we have demonstrated an almost 10-fold reduction in error rates in a speaker verification experiment under a moderately noisy environment (-10 dB).

  1. Low Bandwidth Vocoding using EM Sensor and Acoustic Signal Processing

    SciTech Connect

    Ng, L C; Holzrichter, J F; Larson, P E

    2001-10-25

    Low-power EM radar-like sensors have made it possible to measure properties of the human speech production system in real-time, without acoustic interference [1]. By combining these data with the corresponding acoustic signal, we've demonstrated an almost 10-fold bandwidth reduction in speech compression, compared to a standard 2.4 kbps LPC10 protocol used in the STU-III (Secure Terminal Unit, third generation) telephone. This paper describes a potential EM sensor/acoustic based vocoder implementation.

  2. Segmentation of the speech signal based on changes in energy distribution in the spectrum

    NASA Astrophysics Data System (ADS)

    Jassem, W.; Kudzdela, H.; Domagala, P.

    1983-08-01

    A simple algorithm is proposed for automatic phonetic segmentation of the acoustic speech signal on the MERA 303 desk-top minicomputer. The algorithm is verified with Polish linguistic material spoken by two subjects. The proposed algorithm detects approximately 80 percent of the boundaries between enunciated segments correctly, a result no worse than that obtained using more complex methods. Speech recognition programs are discussed as speech perception models, and the nature of categorical perception of human speech sounds is examined.
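
    The segmentation criterion described here, a change in the distribution of energy across the spectrum, can be approximated with a spectral-flux style boundary detector. The sketch below is a generic reconstruction of that idea, not the MERA 303 implementation; the frame length and peak threshold are assumptions.

    ```python
    # Generic sketch: candidate phonetic boundaries from changes in the spectral energy distribution.
    import numpy as np
    from scipy.signal import stft

    def spectral_change_boundaries(x, fs, nperseg=256, threshold=0.3):
        _, times, Z = stft(x, fs=fs, nperseg=nperseg)
        spec = np.abs(Z)
        spec /= spec.sum(axis=0, keepdims=True) + 1e-12     # energy *distribution* in each frame
        flux = np.sqrt((np.diff(spec, axis=1) ** 2).sum(axis=0))   # frame-to-frame change
        boundary_times = times[1:]                          # each flux value ends at this frame
        idx = np.where((flux[1:-1] > flux[:-2]) &
                       (flux[1:-1] > flux[2:]) &
                       (flux[1:-1] > threshold * flux.max()))[0] + 1
        return boundary_times[idx]                          # candidate boundary times in seconds
    ```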

  3. Detection and Classification of Whale Acoustic Signals

    NASA Astrophysics Data System (ADS)

    Xian, Yin

    vocalization data set. The word error rate of the DCTNet feature is similar to the MFSC in speech recognition tasks, suggesting that the convolutional network is able to reveal acoustic content of speech signals.

  4. Digital signal processing in acoustics. I

    NASA Astrophysics Data System (ADS)

    Davies, H.; McNeil, D. J.

    1985-11-01

    Digital signal processing techniques have gained steadily in importance over the past few years in many areas of science and engineering and have transformed the character of instrumentation used in laboratory and plant. This is particularly marked in acoustics, which has both benefited from the developments in signal processing and provided significant stimulus for these developments. As a result acoustical techniques are now used in a very wide range of applications and acoustics is one area in which digital signal processing is exploited to its limits. For example, the development of fast algorithms for computing Fourier transforms and the associated developments in hardware have led to remarkable advances in the use of spectral analysis as a means of investigating the nature and characteristics of acoustic sources. Speech research has benefited considerably in this respect, and, in a rather more technological application, spectral analysis of machinery noise provides information about changes in machine condition which may indicate imminent failure. More recently the observation that human and animal muscles emit low intensity noise suggests that spectral analysis of this noise may yield information about muscle structure and performance.

  5. Age-Related Changes in Acoustic Characteristics of Adult Speech

    ERIC Educational Resources Information Center

    Torre, Peter, III; Barlow, Jessica A.

    2009-01-01

    This paper addresses effects of age and sex on certain acoustic properties of speech, given conflicting findings on such effects reported in prior research. The speech of 27 younger adults (15 women, 12 men; mean age 25.5 years) and 59 older adults (32 women, 27 men; mean age 75.2 years) was evaluated for identification of differences for sex and…

  6. Mathematical model of acoustic speech production with mobile walls of the vocal tract

    NASA Astrophysics Data System (ADS)

    Lyubimov, N. A.; Zakharov, E. V.

    2016-03-01

    A mathematical speech production model is considered that describes acoustic oscillation propagation in a vocal tract with mobile walls. The wave field function satisfies the Helmholtz equation with boundary conditions of the third kind (impedance type). The impedance mode corresponds to a three-parameter pendulum oscillation model. The experimental research demonstrates the nonlinear character of how the mobility of the vocal tract walls influences the spectral envelope of a speech signal.

  7. Clinical investigation of speech signal features among patients with schizophrenia

    PubMed Central

    ZHANG, Jing; PAN, Zhongde; GUI, Chao; CUI, Donghong

    2016-01-01

    Background A new area of interest in the search for biomarkers for schizophrenia is the study of the acoustic parameters of speech called 'speech signal features'. Several of these features have been shown to be related to emotional responsiveness, a characteristic that is notably restricted in patients with schizophrenia, particularly those with prominent negative symptoms. Aim Assess the relationship of selected acoustic parameters of speech to the severity of clinical symptoms in patients with chronic schizophrenia and compare these characteristics between patients and matched healthy controls. Methods Ten speech signal features (six prosody features, formant bandwidth and amplitude, and two spectral features) were assessed using 15-minute speech samples obtained by smartphone from 26 inpatients with chronic schizophrenia (at enrollment and 1 week later) and from 30 healthy controls (at enrollment only). Clinical symptoms of the patients were also assessed at baseline and 1 week later using the Positive and Negative Syndrome Scale, the Scale for the Assessment of Negative Symptoms, and the Clinical Global Impression-Schizophrenia scale. Results In the patient group the symptoms were stable over the 1-week interval and the 1-week test-retest reliability of the 10 speech features was good (intraclass correlation coefficients [ICC] ranging from 0.55 to 0.88). Comparison of the speech features between patients and controls found no significant differences in the six prosody features or in the formant bandwidth and amplitude features, but the two spectral features were different: the Mel-frequency cepstral coefficient (MFCC) scores were significantly lower in the patient group than in the control group, and the linear prediction coding (LPC) scores were significantly higher in the patient group than in the control group. Within the patient group, 10 of the 170 associations between the 10 speech features considered and the 17 clinical parameters considered were…
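
    The two spectral feature types named above (MFCC and linear prediction coefficients) can be extracted with standard tools. The sketch below is a minimal illustration, not the study's pipeline; it assumes librosa is available, a mono 16 kHz recording, and an illustrative file name.

```python
import librosa
import numpy as np

# Load a speech sample (the path is a placeholder)
y, sr = librosa.load("speech_sample.wav", sr=16000, mono=True)

# Mel-frequency cepstral coefficients, summarized over the recording
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
mfcc_summary = mfcc.mean(axis=1)

# Linear prediction coefficients for one short frame (25 ms at 16 kHz)
frame = y[:400]
lpc = librosa.lpc(frame, order=12)

print(mfcc_summary.shape, lpc.shape)
```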

  8. Acoustic assessment of speech privacy curtains in two nursing units.

    PubMed

    Pope, Diana S; Miller-Klein, Erik T

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s' standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption and compact, more fragmented nursing unit floor plate shapes should be considered. PMID:26780959

  9. Acoustic assessment of speech privacy curtains in two nursing units

    PubMed Central

    Pope, Diana S.; Miller-Klein, Erik T.

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s’ standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption and compact, more fragmented nursing unit floor plate shapes should be considered. PMID:26780959

  10. Speech recognition: Acoustic phonetic and lexical knowledge representation

    NASA Astrophysics Data System (ADS)

    Zue, V. W.

    1983-02-01

    The purpose of this program is to develop a speech data base facility under which the acoustic characteristics of speech sounds in various contexts can be studied conveniently; investigate the phonological properties of a large lexicon of, say, 10,000 words and determine to what extent the phonotactic constraints can be utilized in speech recognition; study the acoustic cues that are used to mark word boundaries; develop a test bed in the form of a large-vocabulary, IWR system to study the interactions of acoustic, phonetic and lexical knowledge; and develop a limited continuous speech recognition system with the goal of recognizing any English word from its spelling in order to assess the interactions of higher-level knowledge sources.

  11. Speech recognition: Acoustic phonetic and lexical knowledge representation

    NASA Astrophysics Data System (ADS)

    Zue, V. W.

    1984-02-01

    The purpose of this program is to develop a speech data base facility under which the acoustic characteristics of speech sounds in various contexts can be studied conveniently; investigate the phonological properties of a large lexicon of, say, 10,000 words and determine to what extent the phonotactic constraints can be utilized in speech recognition; study the acoustic cues that are used to mark word boundaries; develop a test bed in the form of a large-vocabulary, IWR system to study the interactions of acoustic, phonetic and lexical knowledge; and develop a limited continuous speech recognition system with the goal of recognizing any English word from its spelling in order to assess the interactions of higher-level knowledge sources.

  12. Classification of stop place in consonant-vowel contexts using feature extrapolation of acoustic-phonetic features in telephone speech.

    PubMed

    Lee, Jung-Won; Choi, Jeung-Yoon; Kang, Hong-Goo

    2012-02-01

    Knowledge-based speech recognition systems extract acoustic cues from the signal to identify speech characteristics. For channel-deteriorated telephone speech, acoustic cues, especially those for stop consonant place, are expected to be degraded or absent. To investigate the use of knowledge-based methods in degraded environments, feature extrapolation of acoustic-phonetic features based on Gaussian mixture models is examined. This process is applied to a stop place detection module that uses burst release and vowel onset cues for consonant-vowel tokens of English. Results show that classification performance is enhanced in telephone channel-degraded speech, with extrapolated acoustic-phonetic features reaching or exceeding performance using estimated Mel-frequency cepstral coefficients (MFCCs). Results also show acoustic-phonetic features may be combined with MFCCs for best performance, suggesting these features provide information complementary to MFCCs. PMID:22352523

  13. Optimizing acoustical conditions for speech intelligibility in classrooms

    NASA Astrophysics Data System (ADS)

    Yang, Wonyoung

    High speech intelligibility is imperative in classrooms where verbal communication is critical. However, the optimal acoustical conditions to achieve a high degree of speech intelligibility have previously been investigated with inconsistent results, and practical room-acoustical solutions to optimize the acoustical conditions for speech intelligibility have not been developed. This experimental study validated auralization for speech-intelligibility testing, investigated the optimal reverberation for speech intelligibility for both normal and hearing-impaired listeners using more realistic room-acoustical models, and proposed an optimal sound-control design for speech intelligibility based on the findings. The auralization technique was used to perform subjective speech-intelligibility tests. The validation study, comparing auralization results with those of real classroom speech-intelligibility tests, found that if the room to be auralized is not very absorptive or noisy, speech-intelligibility tests using auralization are valid. The speech-intelligibility tests were done in two different auralized sound fields---approximately diffuse and non-diffuse---using the Modified Rhyme Test and both normal and hearing-impaired listeners. A hybrid room-acoustical prediction program was used throughout the work, and it and a 1/8 scale-model classroom were used to evaluate the effects of ceiling barriers and reflectors. For both subject groups, in approximately diffuse sound fields, when the speech source was closer to the listener than the noise source, the optimal reverberation time was zero. When the noise source was closer to the listener than the speech source, the optimal reverberation time was 0.4 s (with another peak at 0.0 s) with relative output power levels of the speech and noise sources SNS = 5 dB, and 0.8 s with SNS = 0 dB. In non-diffuse sound fields, when the noise source was between the speaker and the listener, the optimal reverberation time was 0.6 s with

  14. Speech and melody recognition in binaurally combined acoustic and electric hearing

    NASA Astrophysics Data System (ADS)

    Kong, Ying-Yee; Stickney, Ginger S.; Zeng, Fan-Gang

    2005-03-01

    Speech recognition in noise and music perception are especially challenging for current cochlear implant users. The present study utilizes the residual acoustic hearing in the nonimplanted ear in five cochlear implant users to elucidate the role of temporal fine structure at low frequencies in auditory perception and to test the hypothesis that combined acoustic and electric hearing produces better performance than either mode alone. The first experiment measured speech recognition in the presence of competing noise. It was found that, although the residual low-frequency (<1000 Hz) acoustic hearing produced essentially no speech recognition in noise on its own, it significantly enhanced performance when combined with the electric hearing. The second experiment measured melody recognition in the same group of subjects and found that, contrary to the speech recognition result, the low-frequency acoustic hearing produced significantly better performance than the electric hearing. It is hypothesized that listeners with combined acoustic and electric hearing might use the correlation between the salient pitch in low-frequency acoustic hearing and the weak pitch in the envelope to enhance segregation between signal and noise. The present study suggests the importance and urgency of accurately encoding the fine-structure cue in cochlear implants.

  15. Sensitivity to Structure in the Speech Signal by Children with Speech Sound Disorder and Reading Disability

    PubMed Central

    Johnson, Erin Phinney; Pennington, Bruce F.; Lowenstein, Joanna H.; Nittrouer, Susan

    2011-01-01

    Purpose Children with speech sound disorder (SSD) and reading disability (RD) have poor phonological awareness, a problem believed to arise largely from deficits in processing the sensory information in speech, specifically individual acoustic cues. However, such cues are details of acoustic structure. Recent theories suggest that listeners also need to be able to integrate those details to perceive linguistically relevant form. This study examined abilities of children with SSD, RD, and SSD+RD not only to process acoustic cues but also to recover linguistically relevant form from the speech signal. Method Ten- to 11-year-olds with SSD (n = 17), RD (n = 16), SSD+RD (n = 17), and Controls (n = 16) were tested to examine their sensitivity to (1) voice onset times (VOT); (2) spectral structure in fricative-vowel syllables; and (3) vocoded sentences. Results Children in all groups performed similarly with VOT stimuli, but children with disorders showed delays on other tasks, although the specifics of their performance varied. Conclusion Children with poor phonemic awareness not only lack sensitivity to acoustic details, but are also less able to recover linguistically relevant forms. This is contrary to one of the main current theories of the relation between spoken and written language development. PMID:21329941

  16. Preserved Acoustic Hearing in Cochlear Implantation Improves Speech Perception

    PubMed Central

    Sheffield, Sterling W.; Jahn, Kelly; Gifford, René H.

    2015-01-01

    Background With improved surgical techniques and electrode design, an increasing number of cochlear implant (CI) recipients have preserved acoustic hearing in the implanted ear, thereby resulting in bilateral acoustic hearing. There are currently no guidelines, however, for clinicians with respect to audiometric criteria and the recommendation of amplification in the implanted ear. The acoustic bandwidth necessary to obtain speech perception benefit from acoustic hearing in the implanted ear is unknown. Additionally, it is important to determine if, and in which listening environments, acoustic hearing in both ears provides more benefit than hearing in just one ear, even with limited residual hearing. Purpose The purposes of this study were to (1) determine whether acoustic hearing in an ear with a CI provides as much speech perception benefit as an equivalent bandwidth of acoustic hearing in the non-implanted ear, and (2) determine whether acoustic hearing in both ears provides more benefit than hearing in just one ear. Research Design A repeated-measures, within-participant design was used to compare performance across listening conditions. Study Sample Seven adults with CIs and bilateral residual acoustic hearing (hearing preservation) were recruited for the study. Data Collection and Analysis Consonant-nucleus-consonant word recognition was tested in four conditions: CI alone, CI + acoustic hearing in the nonimplanted ear, CI + acoustic hearing in the implanted ear, and CI + bilateral acoustic hearing. A series of low-pass filters were used to examine the effects of acoustic bandwidth through an insert earphone with amplification. Benefit was defined as the difference among conditions. The benefit of bilateral acoustic hearing was tested in both diffuse and single-source background noise. Results were analyzed using repeated-measures analysis of variance. Results Similar benefit was obtained for equivalent acoustic frequency bandwidth in either ear. Acoustic…

  17. Acoustically-Induced Electrical Signals

    NASA Astrophysics Data System (ADS)

    Brown, S. R.

    2014-12-01

    We have observed electrical signals excited by and moving along with an acoustic pulse propagating in a sandstone sample. Using resonance we are now studying the characteristics of this acousto-electric signal and determining its origin and the controlling physical parameters. Four rock samples with a range of porosities, permeabilities, and mineralogies were chosen: Berea, Boise, and Colton sandstones and Austin Chalk. Pore water salinity was varied from deionized water to sea water. Ag-AgCl electrodes were attached to the sample and were interfaced to a 4-wire electrical resistivity system. Under computer control, the acoustic signals were excited and the electrical response was recorded. We see strong acoustically-induced electrical signals in all samples, with the magnitude of the effect for each rock getting stronger as we move from the 1st to the 3rd harmonics in resonance. Given a particular fluid salinity, each rock has its own distinct sensitivity in the induced electrical effect. For example at the 2nd harmonic, Berea Sandstone produces the largest electrical signal per acoustic power input even though Austin Chalk and Boise Sandstone tend to resonate with much larger amplitudes at the same harmonic. Two effects are potentially responsible for this acoustically-induced electrical response: one the co-seismic seismo-electric effect and the other a strain-induced resistivity change known as the acousto-electric effect. We have designed experimental tests to separate these mechanisms. The tests show that the seismo-electric effect is dominant in our studies. We note that these experiments are in a fluid viscosity dominated seismo-electric regime, leading to a simple interpretation of the signals where the electric potential developed is proportional to the local acceleration of the rock. Toward a test of this theory we have measured the local time-varying acoustic strain in our samples using a laser vibrometer.

  18. Characteristic Extraction of Speech Signal Using Wavelet

    NASA Astrophysics Data System (ADS)

    Moriai, Shogo; Hanazaki, Izumi

    In the analysis-synthesis coding of speech signals, achieving high quality at low bit rates depends on the extraction of characteristic parameters in pre-processing. Precise extraction of the fundamental frequency, one of the parameters of the source information, guarantees quality in speech synthesis. Its extraction is difficult, however, because of the influence of consonants, the non-periodicity of vocal cord vibration, the wide range of the fundamental frequency, etc. In this paper, we propose a new fundamental frequency extraction method for speech signals using the wavelet transform, with a criterion based on the harmonic structure of the signal.

  19. Local thresholding de-noise speech signal

    NASA Astrophysics Data System (ADS)

    Luo, Haitao

    2013-07-01

    A speech signal is de-noised if it is noisy. A wavelet is constructed according to Daubechies' method, and a wavelet packet is derived from the constructed scaling and wavelet functions. The noisy speech signal is decomposed by the wavelet packet. Algorithms are developed to detect the beginning and ending points of speech. A polynomial function is constructed for local thresholding. Different strategies are applied to de-noise and compress the coefficients of the decomposed terminal nodes. The wavelet packet tree is then reconstructed, the audio file is rebuilt from the reconstructed data, and the effectiveness of the different strategies is compared.
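
    A minimal PyWavelets sketch of the decompose-threshold-reconstruct outline above is given below. The db8 wavelet, the depth of 4, and the universal soft threshold are illustrative assumptions; the paper's local polynomial thresholding and endpoint detection are not reproduced here.

```python
import numpy as np
import pywt

def wp_denoise(x, wavelet="db8", level=4):
    """Denoise a signal by soft-thresholding wavelet packet coefficients
    at the terminal nodes and reconstructing from the modified tree."""
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, mode="symmetric", maxlevel=level)
    for node in wp.get_level(level, order="natural"):
        coeffs = node.data
        # Universal threshold from a robust noise estimate (an assumption,
        # not the paper's polynomial local-thresholding rule)
        sigma = np.median(np.abs(coeffs)) / 0.6745
        thr = sigma * np.sqrt(2 * np.log(len(coeffs) + 1))
        node.data = pywt.threshold(coeffs, thr, mode="soft")
    return wp.reconstruct(update=False)[:len(x)]

# Illustrative use: a noisy 200 Hz tone standing in for noisy speech
fs = 8000
clean = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)
noisy = clean + 0.3 * np.random.randn(fs)
denoised = wp_denoise(noisy)
```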

  20. Acoustic Localization with Infrasonic Signals

    NASA Astrophysics Data System (ADS)

    Threatt, Arnesha; Elbing, Brian

    2015-11-01

    Numerous geophysical and anthropogenic events emit infrasonic frequencies (<20 Hz), including volcanoes, hurricanes, wind turbines and tornadoes. These sounds, which cannot be heard by the human ear, can be detected from large distances (in excess of 100 miles) due to low frequency acoustic signals having a very low decay rate in the atmosphere. Thus infrasound could be used for long-range, passive monitoring and detection of these events. An array of microphones separated by known distances can be used to locate a given source, which is known as acoustic localization. However, acoustic localization with infrasound is particularly challenging due to contamination from other signals, sensitivity to wind noise and producing a trusted source for system development. The objective of the current work is to create an infrasonic source using a propane torch wand or a subwoofer and locate the source using multiple infrasonic microphones. This presentation will present preliminary results from various microphone configurations used to locate the source.
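
    The core operation in such localization is estimating the time difference of arrival (TDOA) between microphone pairs. The sketch below shows a two-microphone, far-field bearing estimate from a cross-correlation peak; the microphone spacing, sampling rate, and synthetic infrasonic tone are illustrative assumptions, not the presenters' setup.

```python
import numpy as np

def tdoa_bearing(sig_a, sig_b, fs, mic_spacing_m, c=343.0):
    """Estimate source bearing from the time difference of arrival
    between two microphones, found via cross-correlation."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = np.argmax(corr) - (len(sig_b) - 1)     # lag in samples between the channels
    tau = lag / fs                               # time difference in seconds
    # Far-field assumption: tau = d * sin(theta) / c
    sin_theta = np.clip(tau * c / mic_spacing_m, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_theta))

# Synthetic 10 Hz infrasonic tone, delayed by 5 ms at microphone B
fs = 1000
t = np.arange(0, 2, 1 / fs)
source = np.sin(2 * np.pi * 10 * t)
mic_a = source
mic_b = np.roll(source, 5)                       # 5-sample (5 ms) delay
print(tdoa_bearing(mic_a, mic_b, fs, mic_spacing_m=5.0))
```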

  1. Acoustic Analysis of Speech of Cochlear Implantees and Its Implications

    PubMed Central

    Patadia, Rajesh; Govale, Prajakta; Rangasayee, R.; Kirtane, Milind

    2012-01-01

    Objectives Cochlear implantees have improved speech production skills compared with those using hearing aids, as reflected in their acoustic measures. When compared to normal hearing controls, implanted children had fronted vowel space and their /s/ and /∫/ noise frequencies overlapped. Acoustic analysis of speech provides an objective index of perceived differences in speech production which can be precursory in planning therapy. The objective of this study was to compare acoustic characteristics of speech in cochlear implantees with those of normal hearing age matched peers to understand implications. Methods Group 1 consisted of 15 children with prelingual bilateral severe-profound hearing loss (age, 5-11 years; implanted between 4-10 years). Behind-the-ear hearing aids were used prior to implantation; both before and after implantation, subjects received at least 1 year of aural intervention. Group 2 consisted of 15 normal hearing age matched peers. Sustained productions of vowels and words with selected consonants were recorded. Using Praat software for acoustic analysis, digitized speech tokens were measured for F1, F2, and F3 of vowels; centre frequency (Hz) and energy concentration (dB) in burst; voice onset time (VOT in ms) for stops; centre frequency (Hz) of noise in /s/; and rise time (ms) for affricates. A t-test was used to find significant differences between groups. Results Significant differences were found in VOT for /b/, F1 and F2 of /e/, and F3 of /u/. No significant differences were found for centre frequency of burst, energy concentration for stops, centre frequency of noise in /s/, or rise time for affricates. These findings suggest that the auditory feedback provided by cochlear implants enables subjects to monitor their production of speech sounds. Conclusion Acoustic analysis of speech is an essential method for discerning characteristics which have or have not been improved by cochlear implantation and thus for planning intervention. PMID:22701768
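
    Formant measurements of the kind described can be scripted through parselmouth, the Python interface to Praat. The sketch below reads a sustained vowel and reports F1-F3 at its midpoint; the file name and the choice of the midpoint are illustrative assumptions, not the study's protocol.

```python
import parselmouth

# Load a sustained vowel recording (the path is a placeholder)
snd = parselmouth.Sound("vowel_e.wav")

# Burg-method formant analysis with Praat's default settings
formants = snd.to_formant_burg()

t_mid = snd.duration / 2
f1 = formants.get_value_at_time(1, t_mid)
f2 = formants.get_value_at_time(2, t_mid)
f3 = formants.get_value_at_time(3, t_mid)
print(f"F1={f1:.0f} Hz, F2={f2:.0f} Hz, F3={f3:.0f} Hz")
```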

  2. Methods and apparatus for non-acoustic speech characterization and recognition

    DOEpatents

    Holzrichter, John F.

    1999-01-01

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  3. Methods and apparatus for non-acoustic speech characterization and recognition

    SciTech Connect

    Holzrichter, J.F.

    1999-12-21

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  4. Role of the middle ear muscle apparatus in mechanisms of speech signal discrimination

    NASA Technical Reports Server (NTRS)

    Moroz, B. S.; Bazarov, V. G.; Sachenko, S. V.

    1980-01-01

    A method of impedance reflexometry was used to examine 101 students with hearing impairment in order to clarify the interrelation between speech discrimination and the state of the middle ear muscles. The ability to discriminate speech signals depends to some extent on the functional state of the intraaural muscles. Speech discrimination was greatly impaired when the stapedial muscle acoustic reflex was absent, when stimulation thresholds were low, and when the increase in reflex amplitude was very small. Discrimination was not impeded when the acoustic reflex was present, relative thresholds were high, and reflex amplitude increased normally in response to speech signals of increasing intensity.

  5. Speech recognition: Acoustic-phonetic knowledge acquisition and representation

    NASA Astrophysics Data System (ADS)

    Zue, Victor W.

    1987-09-01

    A long-term research goal is the development and implementation of speaker-independent continuous speech recognition systems. It is believed that the proper utilization of speech-specific knowledge is essential for such advanced systems. Research is thus directed toward the acquisition of acoustic-phonetic and lexical knowledge, and the application of this knowledge to speech recognition algorithms. Investigation into the contextual variations of speech sounds has continued, emphasizing the role of the syllable in these variations. Analysis revealed that the acoustic realization of a stop depends greatly on its position within a syllable. In order to represent and utilize this information in speech recognition, a hierarchical syllable description has been adopted that enables us to specify the constraints in terms of an immediate constituent grammar. We will continue to quantify the effect of context on the acoustic realization of phonemes using larger constituent units such as syllables. In addition, a grammar will be developed to describe the relationship between phonemes and acoustic segments, and a parser that will make use of this grammar for phonetic recognition and lexical access.

  6. Empirical mode decomposition for analyzing acoustical signals

    NASA Technical Reports Server (NTRS)

    Huang, Norden E. (Inventor)

    2005-01-01

    The present invention discloses a computer-implemented signal analysis method through the Hilbert-Huang Transformation (HHT) for analyzing acoustical signals, which are assumed to be nonlinear and nonstationary. Empirical Mode Decomposition (EMD) and Hilbert Spectral Analysis (HSA) are used to obtain the HHT. Essentially, the acoustical signal is decomposed into Intrinsic Mode Function components (IMFs). Once the invention decomposes the acoustic signal into its constituent components, all operations such as analyzing, identifying, and removing unwanted signals can be performed on these components. Upon transforming the IMFs into the Hilbert spectrum, the acoustical signal may be compared with other acoustical signals.
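
    A rough sketch of the same decomposition idea is shown below using the third-party PyEMD package (an assumption; this is not the patented implementation) together with scipy's Hilbert transform to obtain per-IMF instantaneous attributes. The toy signal is illustrative.

```python
import numpy as np
from scipy.signal import hilbert
from PyEMD import EMD   # third-party "EMD-signal" package; an assumption, not the patent's code

fs = 1000
t = np.arange(0, 1, 1 / fs)
# Toy nonstationary signal: a chirp plus a low-frequency component
x = np.sin(2 * np.pi * (50 + 30 * t) * t) + 0.5 * np.sin(2 * np.pi * 3 * t)

# Empirical Mode Decomposition into Intrinsic Mode Functions (IMFs)
imfs = EMD()(x)

# Hilbert spectral analysis: instantaneous amplitude and frequency per IMF
for i, imf in enumerate(imfs):
    analytic = hilbert(imf)
    amplitude = np.abs(analytic)
    inst_freq = np.diff(np.unwrap(np.angle(analytic))) * fs / (2 * np.pi)
    print(f"IMF {i}: mean instantaneous frequency {inst_freq.mean():.1f} Hz")
```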

  7. Acoustic Speech Analysis Of Wayang Golek Puppeteer

    NASA Astrophysics Data System (ADS)

    Hakim, Faisal Abdul; Mandasari, Miranti Indar; Sarwono, Joko

    2010-12-01

    Actively disguised speech is one problem to be taken into account in forensic speaker verification or identification processes. The verification processes are usually carried out by comparison between unknown samples and known samples. Active disguising can occur in both kinds of samples. To simulate the condition of speech disguising, the voices of wayang golek puppeteers were used. A wayang golek puppeteer is assumed to be a master of disguise: he can manipulate his voice into many different types of characters' voices. This paper discusses the speech characteristics of 2 puppeteers. Comparison was made between each puppeteer's habitual voice and his manipulated voice.

  8. Learning to perceptually organize speech signals in native fashion.

    PubMed

    Nittrouer, Susan; Lowenstein, Joanna H

    2010-03-01

    The ability to recognize speech involves sensory, perceptual, and cognitive processes. For much of the history of speech perception research, investigators have focused on the first and third of these, asking how much and what kinds of sensory information are used by normal and impaired listeners, as well as how effective amounts of that information are altered by "top-down" cognitive processes. This experiment focused on perceptual processes, asking what accounts for how the sensory information in the speech signal gets organized. Two types of speech signals processed to remove properties that could be considered traditional acoustic cues (amplitude envelopes and sine wave replicas) were presented to 100 listeners in five groups: native English-speaking (L1) adults, 7-, 5-, and 3-year-olds, and native Mandarin-speaking adults who were excellent second-language (L2) users of English. The L2 adults performed more poorly than L1 adults with both kinds of signals. Children performed more poorly than L1 adults but showed disproportionately better performance for the sine waves than for the amplitude envelopes compared to both groups of adults. Sentence context had similar effects across groups, so variability in recognition was attributed to differences in perceptual organization of the sensory information, presumed to arise from native language experience. PMID:20329861

  9. Learning to perceptually organize speech signals in native fashion1

    PubMed Central

    Nittrouer, Susan; Lowenstein, Joanna H.

    2010-01-01

    The ability to recognize speech involves sensory, perceptual, and cognitive processes. For much of the history of speech perception research, investigators have focused on the first and third of these, asking how much and what kinds of sensory information are used by normal and impaired listeners, as well as how effective amounts of that information are altered by “top-down” cognitive processes. This experiment focused on perceptual processes, asking what accounts for how the sensory information in the speech signal gets organized. Two types of speech signals processed to remove properties that could be considered traditional acoustic cues (amplitude envelopes and sine wave replicas) were presented to 100 listeners in five groups: native English-speaking (L1) adults, 7-, 5-, and 3-year-olds, and native Mandarin-speaking adults who were excellent second-language (L2) users of English. The L2 adults performed more poorly than L1 adults with both kinds of signals. Children performed more poorly than L1 adults but showed disproportionately better performance for the sine waves than for the amplitude envelopes compared to both groups of adults. Sentence context had similar effects across groups, so variability in recognition was attributed to differences in perceptual organization of the sensory information, presumed to arise from native language experience. PMID:20329861

  10. Effects of human fatigue on speech signals

    NASA Astrophysics Data System (ADS)

    Stamoulis, Catherine

    2001-05-01

    Cognitive performance may be significantly affected by fatigue. In the case of critical personnel, such as pilots, monitoring human fatigue is essential to ensure safety and success of a given operation. One of the modalities that may be used for this purpose is speech, which is sensitive to respiratory changes and increased muscle tension of vocal cords, induced by fatigue. Age, gender, vocal tract length, physical and emotional state may significantly alter speech intensity, duration, rhythm, and spectral characteristics. In addition to changes in speech rhythm, fatigue may also affect the quality of speech, such as articulation. In a noisy environment, detecting fatigue-related changes in speech signals, particularly subtle changes at the onset of fatigue, may be difficult. Therefore, in a performance-monitoring system, speech parameters which are significantly affected by fatigue need to be identified and extracted from input signals. For this purpose, a series of experiments was performed under slowly varying cognitive load conditions and at different times of the day. The results of the data analysis are presented here.

  11. An Acoustic Measure for Word Prominence in Spontaneous Speech

    PubMed Central

    Wang, Dagen; Narayanan, Shrikanth

    2010-01-01

    An algorithm for automatic speech prominence detection is reported in this paper. We describe a comparative analysis on various acoustic features for word prominence detection and report results using a spoken dialog corpus with manually assigned prominence labels. The focus is on features such as spectral intensity and speech rate that are directly extracted from speech based on a correlation-based approach without requiring explicit linguistic or phonetic knowledge. Additionally, various pitch-based measures are studied with respect to their discriminating ability for prominence detection. A parametric scheme for modeling pitch plateau is proposed and this feature alone is found to outperform the traditional local pitch statistics. Two sets of experiments are used to explore the usefulness of the acoustic score generated using these features. The first set focuses on a more traditional way of word prominence detection based on a manually-tagged corpus. A 76.8% classification accuracy was achieved on a corpus of role-playing spoken dialogs. Due to difficulties in manually tagging speech prominence into discrete levels (categories), the second set of experiments focuses on evaluating the score indirectly. Specifically, through experiments on the Switchboard corpus, it is shown that the proposed acoustic score can discriminate between content word and function words in a statistically significant way. The relation between speech prominence and content/function words is also explored. Since prominent words tend to be predominantly content words, and since content words can be automatically marked from text-derived part of speech (POS) information, it is shown that the proposed acoustic score can be indirectly cross-validated through POS information. PMID:20454538

  12. [Influence of human personal features on acoustic correlates of speech emotional intonation characteristics].

    PubMed

    Dmitrieva, E S; Gel'man, V Ia; Zaĭtseva, K A; Orlov, A M

    2009-01-01

    A comparative study of acoustic correlates of emotional intonation was conducted on two types of speech material: sensible speech utterances and short meaningless words. The corpus of speech signals with different emotional intonations (happy, angry, frightened, sad and neutral) was created using the actor's method of simulating emotions. Native Russian speakers aged 20-70 years (both professional actors and non-actors) participated in the study. In the corpus, the following characteristics were analyzed: mean values and standard deviations of the power, fundamental frequency, frequencies of the first and second formants, and utterance duration. Comparison of each emotional intonation with "neutral" utterances showed the greatest deviations in the fundamental frequency and the frequency of the first formant. The direction of these deviations was independent of the semantic content of the speech utterance and its duration, and of the speaker's age, gender, and acting experience, though the personal features of the speakers affected the absolute values of these frequencies. PMID:19947529

  13. Acoustic emission and signal analysis

    NASA Astrophysics Data System (ADS)

    Rao, A. K.

    1990-01-01

    A review is given of the acoustic emission (AE) phenomenon and its applications in NDE and geological rock mechanics. Typical instrumentation used in AE signal detection, data acquisition, processing, and analysis is discussed. The parameters used in AE signal analysis are outlined, and current methods of AE signal analysis procedures are discussed. A literature review is presented on the pattern classification of AE signals. A discussion then follows on the application of AE in aircraft component monitoring, with an experiment described which focuses on in-flight AE monitoring during fatigue crack growth in an aero engine mount. A pattern recognition approach is detailed for the classification of the experimental data. The approach subjects each of the data files to a cluster analysis by the threshold-k-means scheme. The technique is shown to classify the data successfully.

  14. Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

    NASA Astrophysics Data System (ADS)

    Heracleous, Panikos; Kaino, Tomomi; Saruwatari, Hiroshi; Shikano, Kiyohiro

    2006-12-01

    We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible) speech, but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech transform, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved a word accuracy of [value not available in this record] for a 20 k dictation task for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone with very promising results.

  15. Auditory-visual speech perception and synchrony detection for speech and nonspeech signals

    PubMed Central

    Conrey, Brianna; Pisoni, David B.

    2012-01-01

    Previous research has identified a “synchrony window” of several hundred milliseconds over which auditory-visual (AV) asynchronies are not reliably perceived. Individual variability in the size of this AV synchrony window has been linked with variability in AV speech perception measures, but it was not clear whether AV speech perception measures are related to synchrony detection for speech only or for both speech and nonspeech signals. An experiment was conducted to investigate the relationship between measures of AV speech perception and AV synchrony detection for speech and nonspeech signals. Variability in AV synchrony detection for both speech and nonspeech signals was found to be related to variability in measures of auditory-only (A-only) and AV speech perception, suggesting that temporal processing for both speech and nonspeech signals must be taken into account in explaining variability in A-only and multisensory speech perception. PMID:16838548

  16. Speech recognition: Acoustic-phonetic knowledge acquisition and representation

    NASA Astrophysics Data System (ADS)

    Zue, Victor W.

    1988-09-01

    The long-term research goal is to develop and implement speaker-independent continuous speech recognition systems. It is believed that the proper utilization of speech-specific knowledge is essential for such advanced systems. This research is thus directed toward the acquisition, quantification, and representation of acoustic-phonetic and lexical knowledge, and the application of this knowledge to speech recognition algorithms. In addition, we are exploring new speech recognition alternatives based on artificial intelligence and connectionist techniques. We developed a statistical model for predicting the acoustic realization of stop consonants in various positions in the syllable template. A unification-based grammatical formalism was developed for incorporating this model into the lexical access algorithm. We provided an information-theoretic justification for the hierarchical structure of the syllable template. We analyzed segment durations for vowels and fricatives in continuous speech. Based on contextual information, we developed durational models for vowels and fricatives that account for over 70 percent of the variance, using data from multiple, unknown speakers. We rigorously evaluated the ability of human spectrogram readers to identify stop consonants spoken by many talkers and in a variety of phonetic contexts. Incorporating the declarative knowledge used by the readers, we developed a knowledge-based system for stop identification and achieved system performance comparable to that of the readers.

  17. Massively-parallel architectures for automatic recognition of visual speech signals. Annual report

    SciTech Connect

    Sejnowski, T.J.

    1988-10-12

    Significant progress was made in the primary objective of estimating the acoustic characteristics of speech from the visual speech signals. Neural networks were trained on a data base of vowels. The raw images of faces, aligned and preprocessed, were used as input to these networks, which were trained to estimate the corresponding envelope of the acoustic spectrum. The performance of the networks was better than that of trained humans and was comparable with optimized pattern classifiers. The approach avoids the problems of information loss through early categorization. The acoustic information the network extracts from the visual signal can be used to supplement the acoustic signal in noisy environments, such as cockpits. During the next year these results will be extended to diphthongs using recurrent neural networks and temporal sequences of input images.

  18. Acoustic Characteristics of Ataxic Speech in Japanese Patients with Spinocerebellar Degeneration (SCD)

    ERIC Educational Resources Information Center

    Ikui, Yukiko; Tsukuda, Mamoru; Kuroiwa, Yoshiyuki; Koyano, Shigeru; Hirose, Hajime; Taguchi, Takahide

    2012-01-01

    Background: In English- and German-speaking countries, ataxic speech is often described as showing scanning based on acoustic impressions. Although the term "scanning" is generally considered to represent abnormal speech features including prosodic excess or insufficiency, any precise acoustic analysis of ataxic speech has not been performed in…

  19. Prediction of acoustic feature parameters using myoelectric signals.

    PubMed

    Lee, Ki-Seung

    2010-07-01

    It is well-known that a clear relationship exists between human voices and myoelectric signals (MESs) from the area of the speaker's mouth. In this study, we utilized this information to implement a speech synthesis scheme in which MES alone was used to predict the parameters characterizing the vocal-tract transfer function of specific speech signals. Several feature parameters derived from MES were investigated to find the optimal feature for maximization of the mutual information between the acoustic and the MES features. After the optimal feature was determined, an estimation rule for the acoustic parameters was proposed, based on a minimum mean square error (MMSE) criterion. In a preliminary study, 60 isolated words were used for both objective and subjective evaluations. The results showed that the average Euclidean distance between the original and predicted acoustic parameters was reduced by about 30% compared with the average Euclidean distance of the original parameters. The intelligibility of the synthesized speech signals using the predicted features was also evaluated. A word-level identification ratio of 65.5% and a syllable-level identification ratio of 73% were obtained through a listening test. PMID:20172775

  20. Fundamental frequency is critical to speech perception in noise in combined acoustic and electric hearing

    PubMed Central

    Carroll, Jeff; Tiaden, Stephanie; Zeng, Fan-Gang

    2011-01-01

    Cochlear implant (CI) users have been shown to benefit from residual low-frequency hearing, specifically in pitch related tasks. It remains unclear whether this benefit is dependent on fundamental frequency (F0) or other acoustic cues. Three experiments were conducted to determine the role of F0, as well as its frequency modulated (FM) and amplitude modulated (AM) components, in speech recognition with a competing voice. In simulated CI listeners, the signal-to-noise ratio was varied to estimate the 50% correct response. Simulation results showed that the F0 cue contributes to a significant proportion of the benefit seen with combined acoustic and electric hearing, and additionally that this benefit is due to the FM rather than the AM component. In actual CI users, sentence recognition scores were collected with either the full F0 cue containing both the FM and AM components or the 500-Hz low-pass speech cue containing the F0 and additional harmonics. The F0 cue provided a benefit similar to the low-pass cue for speech in noise, but not in quiet. Poorer CI users benefited more from the F0 cue than better users. These findings suggest that F0 is critical to improving speech perception in noise in combined acoustic and electric hearing. PMID:21973360
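
    The 500-Hz low-pass speech cue described above can be approximated with a simple filtering step. The sketch below is a generic scipy illustration, not the study's stimulus-generation code; the filter order and the toy harmonic signal are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lowpass_500(x, fs):
    """Keep only the spectral region below 500 Hz (F0 plus low harmonics),
    approximating the low-pass speech cue used in combined-hearing studies."""
    sos = butter(8, 500, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

fs = 16000
t = np.arange(0, 1, 1 / fs)
# Toy "voiced" signal: a 120 Hz fundamental plus harmonics up to 3 kHz
speech_like = sum(np.sin(2 * np.pi * 120 * k * t) / k for k in range(1, 26))
low_cue = lowpass_500(speech_like, fs)
```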

  1. Predicting the intelligibility of deaf children's speech from acoustic measures

    NASA Astrophysics Data System (ADS)

    Uchanski, Rosalie M.; Geers, Ann E.; Brenner, Christine M.; Tobey, Emily A.

    2001-05-01

    A weighted combination of speech-acoustic measures may provide an objective assessment of speech intelligibility in deaf children that could be used to evaluate the benefits of sensory aids and rehabilitation programs. This investigation compared the accuracy of two different approaches, multiple linear regression and a simple neural net. These two methods were applied to identical sets of acoustic measures, including both segmental (e.g., voice-onset times of plosives, spectral moments of fricatives, second formant frequencies of vowels) and suprasegmental measures (e.g., sentence duration, number and frequency of intersentence pauses). These independent variables were obtained from digitized recordings of deaf children's imitations of 11 simple sentences. The dependent measure was the percentage of spoken words from the 36 McGarr Sentences understood by groups of naive listeners. The two predictive methods were trained on speech measures obtained from 123 out of 164 8- and 9-year-old deaf children who used cochlear implants. Then, predictions were obtained using speech measures from the remaining 41 children. Preliminary results indicate that multiple linear regression is a better predictor of intelligibility than the neural net, accounting for 79% as opposed to 65% of the variance in the data. [Work supported by NIH.]
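
    The two predictors named above can be contrasted with scikit-learn. The sketch below uses random placeholder data (not the study's measures), mirrors only the 123/41 train-test division that is described, and treats the feature set, network size, and split seed as illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# Placeholder data: rows = children, columns = acoustic measures
# (e.g., VOTs, fricative spectral moments, F2 values, durations, pause counts)
rng = np.random.default_rng(0)
X = rng.normal(size=(164, 12))
y = rng.uniform(0, 100, size=164)            # % of words understood by naive listeners

# Roughly mirror the 123 / 41 division described above
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=41, random_state=0)

linear = LinearRegression().fit(X_tr, y_tr)
net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0).fit(X_tr, y_tr)

print("linear R^2:", linear.score(X_te, y_te))
print("net    R^2:", net.score(X_te, y_te))
```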

  2. Prolonged Speech and Modification of Stuttering: Perceptual, Acoustic, and Electroglottographic Data.

    ERIC Educational Resources Information Center

    Packman, Ann; And Others

    1994-01-01

    This study investigated changes in the speech patterns of young adult male subjects when stuttering was modified by deliberately prolonging speech. Three subjects showed clinically significant stuttering reductions when using prolonged speech to reduce their stuttering. Resulting speech was perceptually stutter free. Acoustic and…

  3. Tongue-Palate Contact Pressure, Oral Air Pressure, and Acoustics of Clear Speech

    ERIC Educational Resources Information Center

    Searl, Jeff; Evitts, Paul M.

    2013-01-01

    Purpose: The authors compared articulatory contact pressure (ACP), oral air pressure (Po), and speech acoustics for conversational versus clear speech. They also assessed the relationship of these measures to listener perception. Method: Twelve adults with normal speech produced monosyllables in a phrase using conversational and clear speech.…

  4. Language identification from visual-only speech signals

    PubMed Central

    Ronquest, Rebecca E.; Levi, Susannah V.; Pisoni, David B.

    2010-01-01

    Our goal in the present study was to examine how observers identify English and Spanish from visual-only displays of speech. First, we replicated the recent findings of Soto-Faraco et al. (2007) with Spanish and English bilingual and monolingual observers using different languages and a different experimental paradigm (identification). We found that prior linguistic experience affected response bias but not sensitivity (Experiment 1). In two additional experiments, we investigated the visual cues that observers use to complete the language-identification task. The results of Experiment 2 indicate that some lexical information is available in the visual signal but that it is limited. Acoustic analyses confirmed that our Spanish and English stimuli differed acoustically with respect to linguistic rhythmic categories. In Experiment 3, we tested whether this rhythmic difference could be used by observers to identify the language when the visual stimuli are temporally reversed, thereby eliminating lexical information but retaining rhythmic differences. The participants performed above chance even in the backward condition, suggesting that the rhythmic differences between the two languages may aid language identification in visual-only speech signals. The results of Experiments 3A and 3B also confirm previous findings that increased stimulus length facilitates language identification. Taken together, the results of these three experiments replicate earlier findings and also show that prior linguistic experience, lexical information, rhythmic structure, and utterance length influence visual-only language identification. PMID:20675804

  5. Spatial acoustic signal processing for immersive communication

    NASA Astrophysics Data System (ADS)

    Atkins, Joshua

    Computing is rapidly becoming ubiquitous as users expect devices that can augment and interact naturally with the world around them. In these systems it is necessary to have an acoustic front-end that is able to capture and reproduce natural human communication. Whether the end point is a speech recognizer or another human listener, the reduction of noise, reverberation, and acoustic echoes poses necessary and complex challenges. The focus of this dissertation is to provide a general method for approaching these problems using spherical microphone and loudspeaker arrays. In this work, a theory of capturing and reproducing three-dimensional acoustic fields is introduced from a signal processing perspective. In particular, the decomposition of the spatial part of the acoustic field into an orthogonal basis of spherical harmonics provides not only a general framework for analysis, but also many processing advantages. The spatial sampling error limits the upper frequency range with which a sound field can be accurately captured or reproduced. In broadband arrays, the cost and complexity of using multiple transducers is an issue. This work provides a flexible optimization method for determining the location of array elements to minimize the spatial aliasing error. The low-frequency array processing ability is also limited by the SNR, mismatch, and placement error of transducers. To address this, a robust processing method is introduced and used to design a reproduction system for rendering over arbitrary loudspeaker arrays or binaurally over headphones. In addition to the beamforming problem, the multichannel acoustic echo cancellation (MCAEC) issue is also addressed. A MCAEC must adaptively estimate and track the constantly changing loudspeaker-room-microphone response to remove the sound field presented over the loudspeakers from that captured by the microphones. In the multichannel case, the system is overdetermined and many adaptive schemes fail to converge to…

  6. Processing speech signal using auditory-like filterbank provides least uncertainty about articulatory gestures

    PubMed Central

    Ghosh, Prasanta Kumar; Goldstein, Louis M.; Narayanan, Shrikanth S.

    2011-01-01

    Understanding how the human speech production system is related to the human auditory system has been a perennial subject of inquiry. To investigate the production–perception link, in this paper, a computational analysis has been performed using the articulatory movement data obtained during speech production with concurrently recorded acoustic speech signals from multiple subjects in three different languages: English, Cantonese, and Georgian. The form of articulatory gestures during speech production varies across languages, and this variation is considered to be reflected in the articulatory position and kinematics. The auditory processing of the acoustic speech signal is modeled by a parametric representation of the cochlear filterbank which allows for realizing various candidate filterbank structures by changing the parameter value. Using mathematical communication theory, it is found that the uncertainty about the articulatory gestures in each language is maximally reduced when the acoustic speech signal is represented using the output of a filterbank similar to the empirically established cochlear filterbank in the human auditory system. Possible interpretations of this finding are discussed. PMID:21682422

  7. A 94-GHz Millimeter-Wave Sensor for Speech Signal Acquisition

    PubMed Central

    Li, Sheng; Tian, Ying; Lu, Guohua; Zhang, Yang; Lv, Hao; Yu, Xiao; Xue, Huijun; Zhang, Hua; Wang, Jianqi; Jing, Xijing

    2013-01-01

    High frequency millimeter-wave (MMW) radar-like sensors enable the detection of speech signals. This novel non-acoustic speech detection method has some special advantages not offered by traditional microphones, such as resistance to strong acoustic interference, high directional sensitivity with penetration, and long detection distance. A 94-GHz MMW radar sensor was employed in this study to test its speech acquisition ability. A 34-GHz zero intermediate frequency radar, a 34-GHz superheterodyne radar, and a microphone were also used for comparison purposes. A short-time phase-spectrum-compensation algorithm was used to enhance the detected speech. The results reveal that the 94-GHz radar sensor showed the highest sensitivity and obtained the highest speech quality subjective measurement score. This result suggests that the MMW radar sensor has better performance than a traditional microphone in terms of speech detection for detection distances longer than 1 m. As a substitute for the traditional speech acquisition method, this novel speech acquisition method demonstrates a large potential for many speech related applications. PMID:24284764

  8. A 94-GHz millimeter-wave sensor for speech signal acquisition.

    PubMed

    Li, Sheng; Tian, Ying; Lu, Guohua; Zhang, Yang; Lv, Hao; Yu, Xiao; Xue, Huijun; Zhang, Hua; Wang, Jianqi; Jing, Xijing

    2013-01-01

    High frequency millimeter-wave (MMW) radar-like sensors enable the detection of speech signals. This novel non-acoustic speech detection method has some special advantages not offered by traditional microphones, such as preventing strong-acoustic interference, high directional sensitivity with penetration, and long detection distance. A 94-GHz MMW radar sensor was employed in this study to test its speech acquisition ability. A 34-GHz zero intermediate frequency radar, a 34-GHz superheterodyne radar, and a microphone were also used for comparison purposes. A short-time phase-spectrum-compensation algorithm was used to enhance the detected speech. The results reveal that the 94-GHz radar sensor showed the highest sensitivity and obtained the highest speech quality subjective measurement score. This result suggests that the MMW radar sensor has better performance than a traditional microphone in terms of speech detection for detection distances longer than 1 m. As a substitute for the traditional speech acquisition method, this novel speech acquisition method demonstrates a large potential for many speech related applications. PMID:24284764

  9. Speech Recognition and Acoustic Features in Combined Electric and Acoustic Stimulation

    ERIC Educational Resources Information Center

    Yoon, Yang-soo; Li, Yongxin; Fu, Qian-Jie

    2012-01-01

    Purpose: In this study, the authors aimed to identify speech information processed by a hearing aid (HA) that is additive to information processed by a cochlear implant (CI) as a function of signal-to-noise ratio (SNR). Method: Speech recognition was measured with CI alone, HA alone, and CI + HA. Ten participants were separated into 2 groups; good…

  10. Improved perception of speech in noise and Mandarin tones with acoustic simulations of harmonic coding for cochlear implants.

    PubMed

    Li, Xing; Nie, Kaibao; Imennov, Nikita S; Won, Jong Ho; Drennan, Ward R; Rubinstein, Jay T; Atlas, Les E

    2012-11-01

    Harmonic and temporal fine structure (TFS) information are important cues for speech perception in noise and music perception. However, due to the inherently coarse spectral and temporal resolution in electric hearing, the question of how to deliver harmonic and TFS information to cochlear implant (CI) users remains unresolved. A harmonic-single-sideband-encoder [(HSSE); Nie et al. (2008). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing; Lie et al., (2010). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing] strategy has been proposed that explicitly tracks the harmonics in speech and transforms them into modulators conveying both amplitude modulation and fundamental frequency information. For unvoiced speech, HSSE transforms the TFS into a slowly varying yet still noise-like signal. To investigate its potential, four- and eight-channel vocoder simulations of HSSE and the continuous-interleaved-sampling (CIS) strategy were implemented, respectively. Using these vocoders, five normal-hearing subjects' speech recognition performance was evaluated under different masking conditions; another five normal-hearing subjects' Mandarin tone identification performance was also evaluated. Additionally, the neural discharge patterns evoked by HSSE- and CIS-encoded Mandarin tone stimuli were simulated using an auditory nerve model. All subjects scored significantly higher with HSSE than with CIS vocoders. The modeling analysis demonstrated that HSSE can convey temporal pitch cues better than CIS. Overall, the results suggest that HSSE is a promising strategy to enhance speech perception with CIs. PMID:23145619
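
    For readers unfamiliar with vocoder simulations of CI processing, the sketch below is a minimal generic noise vocoder in Python (band-pass analysis, envelope extraction, noise-carrier resynthesis). It illustrates only the CIS-style baseline, not the HSSE strategy; the channel count, band edges, and envelope cutoff are assumptions chosen for illustration.

      import numpy as np
      from scipy.signal import butter, filtfilt

      def noise_vocode(x, fs, n_channels=8, lo=80.0, hi=6000.0, env_cut=50.0):
          # Generic n-channel noise vocoder: the envelope of each analysis band
          # is re-imposed on band-limited noise and the bands are summed.
          edges = np.geomspace(lo, hi, n_channels + 1)
          rng = np.random.default_rng(0)
          out = np.zeros(len(x))
          for f1, f2 in zip(edges[:-1], edges[1:]):
              b, a = butter(3, [f1 / (fs / 2), f2 / (fs / 2)], btype="band")
              band = filtfilt(b, a, x)                       # analysis band
              be, ae = butter(2, env_cut / (fs / 2))
              env = filtfilt(be, ae, np.abs(band))           # rectified, low-passed envelope
              carrier = filtfilt(b, a, rng.standard_normal(len(x)))
              out += env * carrier
          return out / (np.max(np.abs(out)) + 1e-12)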

  11. Adding articulatory features to acoustic features for automatic speech recognition

    SciTech Connect

    Zlokarnik, I.

    1995-05-01

    A hidden-Markov-model (HMM) based speech recognition system was evaluated that makes use of simultaneously recorded acoustic and articulatory data. The articulatory measurements were gathered by means of electromagnetic articulography and describe the movement of small coils fixed to the speakers' tongue and jaw during the production of German V₁CV₂ sequences [P. Hoole and S. Gfoerer, J. Acoust. Soc. Am. Suppl. 1 87, S123 (1990)]. Using the coordinates of the coil positions as an articulatory representation, acoustic and articulatory features were combined to make up an acoustic-articulatory feature vector. The discriminant power of this combined representation was evaluated for two subjects on a speaker-dependent isolated word recognition task. When the articulatory measurements were used both for training and testing the HMMs, the articulatory representation was capable of reducing the error rate of comparable acoustic-based HMMs by a relative percentage of more than 60%. In a separate experiment, the articulatory movements during the testing phase were estimated using a multilayer perceptron that performed an acoustic-to-articulatory mapping. Under these more realistic conditions, when articulatory measurements are only available during the training, the error rate could be reduced by a relative percentage of 18% to 25%.

  12. Massively parallel network architectures for automatic recognition of visual speech signals. Final technical report

    SciTech Connect

    Sejnowski, T.J.; Goldstein, M.

    1990-01-01

    This research sought to produce a massively-parallel network architecture that could interpret speech signals from video recordings of human talkers. This report summarizes the project's results: (1) A corpus of video recordings from two human speakers was analyzed with image processing techniques and used as the data for this study; (2) We demonstrated that a feed forward network could be trained to categorize vowels from these talkers. The performance was comparable to that of the nearest neighbors techniques and to trained humans on the same data; (3) We developed a novel approach to sensory fusion by training a network to transform from facial images to short-time spectral amplitude envelopes. This information can be used to increase the signal-to-noise ratio and hence the performance of acoustic speech recognition systems in noisy environments; (4) We explored the use of recurrent networks to perform the same mapping for continuous speech. Results of this project demonstrate the feasibility of adding a visual speech recognition component to enhance existing speech recognition systems. Such a combined system could be used in noisy environments, such as cockpits, where improved communication is needed. This demonstration of presymbolic fusion of visual and acoustic speech signals is consistent with our current understanding of human speech perception.

  13. A Visual or Tactile Signal Makes Auditory Speech Detection More Efficient by Reducing Uncertainty

    PubMed Central

    Tjan, Bosco S.; Chao, Ewen; Bernstein, Lynne E.

    2014-01-01

    Acoustic speech is easier to detect in noise when the talker can be seen. This finding could be explained by integration of multisensory inputs or refinement of auditory processing from visual guidance. In two experiments, we studied two-interval forced choice detection of an auditory “ba” in acoustic noise, paired with various visual and tactile stimuli that were identically presented in both observation intervals. Detection thresholds were reduced under the multisensory conditions versus the auditory-only condition, even though the visual and/or tactile stimuli alone could not inform the correct response. Results were analyzed relative to an ideal observer for which intrinsic (internal) noise and efficiency were independent contributors to detection sensitivity. Across experiments, intrinsic noise was unaffected by the multisensory stimuli, arguing against the merging (integrating) of multisensory inputs into a unitary speech signal; but sampling efficiency was increased to varying degrees, supporting refinement of knowledge about the auditory stimulus. The steepness of the psychometric functions decreased with increasing sampling efficiency, suggesting that the “task-irrelevant” visual and tactile stimuli reduced uncertainty about the acoustic signal. Visible speech was not superior for enhancing auditory speech detection. Our results reject multisensory neuronal integration and speech-specific neural processing as explanations for enhanced auditory speech detection under noisy conditions. Instead, our results support a more rudimentary form of multisensory interaction – the otherwise task-irrelevant sensory systems inform the auditory system about when to listen. PMID:24400652

  14. Acoustic and Perceptual Consequences of Clear and Loud Speech

    PubMed Central

    Tjaden, Kris; Richards, Emily; Kuo, Christina; Wilding, Greg; Sussman, Joan

    2014-01-01

    Objective Several issues concerning F2 slope in dysarthria were addressed by obtaining speech acoustic measures and judgments of intelligibility for sentences produced in Habitual, Clear and Loud conditions by speakers with Parkinson's disease (PD) and healthy controls. Patients and Methods Acoustic measures of average and maximum F2 slope for diphthongs, duration and intensity were obtained. Listeners judged intelligibility using a visual analog scale. Differences in measures among groups and conditions as well as relationships among measures were examined. Results Average and maximum F2 slope metrics were strongly correlated, but only average F2 slope consistently differed among groups and conditions, with shallower slopes for the PD group and steeper slopes for Clear speech versus Habitual and Loud. Clear and Loud speech were also characterized by lengthened durations, increased intensity and improved intelligibility versus Habitual. F2 slope and intensity were unrelated, and F2 slope was a significant predictor of intelligibility. Conclusion Average diphthong F2 slope was more sensitive than maximum F2 slope to articulatory mechanism involvement in mild dysarthria in PD. F2 slope holds promise as an objective measure of treatment-related changes in the articulatory mechanism for therapeutic techniques that focus on articulation. PMID:24504015

  15. Learning Speech Variability in Discriminative Acoustic Model Adaptation

    NASA Astrophysics Data System (ADS)

    Sato, Shoei; Oku, Takahiro; Homma, Shinichi; Kobayashi, Akio; Imai, Toru

    We present a new discriminative method of acoustic model adaptation that deals with task-dependent speech variability. We have focused on differences in expressions or speaking styles between tasks and set the objective of this method as improving the recognition accuracy of indistinctly pronounced phrases dependent on a speaking style. The adaptation appends subword models for frequently observable variants of subwords in the task. To find the task-dependent variants, low-confidence words are statistically selected from words with higher frequency in the task's adaptation data by using their word lattices. HMM parameters of subword models dependent on the words are discriminatively trained by using linear transforms with a minimum phoneme error (MPE) criterion. For the MPE training, subword accuracy discriminating between the variants and the originals is also investigated. In speech recognition experiments, the proposed adaptation with the subword variants reduced the word error rate by 12.0% relative in a Japanese conversational broadcast task.

  16. Suppressed alpha oscillations predict intelligibility of speech and its acoustic details.

    PubMed

    Obleser, Jonas; Weisz, Nathan

    2012-11-01

    Modulations of human alpha oscillations (8-13 Hz) accompany many cognitive processes, but their functional role in auditory perception has proven elusive: Do oscillatory dynamics of alpha reflect acoustic details of the speech signal and are they indicative of comprehension success? Acoustically presented words were degraded in acoustic envelope and spectrum in an orthogonal design, and electroencephalogram responses in the frequency domain were analyzed in 24 participants, who rated word comprehensibility after each trial. First, the alpha power suppression during and after a degraded word depended monotonically on spectral and, to a lesser extent, envelope detail. The magnitude of this alpha suppression exhibited an additional and independent influence on later comprehension ratings. Second, source localization of alpha suppression yielded superior parietal, prefrontal, as well as anterior temporal brain areas. Third, multivariate classification of the time-frequency pattern across participants showed that patterns of late posterior alpha power allowed best for above-chance classification of word intelligibility. Results suggest that both magnitude and topography of late alpha suppression in response to single words can indicate a listener's sensitivity to acoustic features and the ability to comprehend speech under adverse listening conditions. PMID:22100354

  17. Suppressed Alpha Oscillations Predict Intelligibility of Speech and its Acoustic Details

    PubMed Central

    Weisz, Nathan

    2012-01-01

    Modulations of human alpha oscillations (8–13 Hz) accompany many cognitive processes, but their functional role in auditory perception has proven elusive: Do oscillatory dynamics of alpha reflect acoustic details of the speech signal and are they indicative of comprehension success? Acoustically presented words were degraded in acoustic envelope and spectrum in an orthogonal design, and electroencephalogram responses in the frequency domain were analyzed in 24 participants, who rated word comprehensibility after each trial. First, the alpha power suppression during and after a degraded word depended monotonically on spectral and, to a lesser extent, envelope detail. The magnitude of this alpha suppression exhibited an additional and independent influence on later comprehension ratings. Second, source localization of alpha suppression yielded superior parietal, prefrontal, as well as anterior temporal brain areas. Third, multivariate classification of the time–frequency pattern across participants showed that patterns of late posterior alpha power allowed best for above-chance classification of word intelligibility. Results suggest that both magnitude and topography of late alpha suppression in response to single words can indicate a listener's sensitivity to acoustic features and the ability to comprehend speech under adverse listening conditions. PMID:22100354

  18. Effect of acoustic fine structure cues on the recognition of auditory-only and audiovisual speech.

    PubMed

    Meister, Hartmut; Fuersen, Katrin; Schreitmueller, Stefan; Walger, Martin

    2016-06-01

    This study addressed the hypothesis that an improvement in speech recognition due to combined envelope and fine structure cues is greater in the audiovisual than the auditory modality. Normal hearing listeners were presented with envelope vocoded speech in combination with low-pass filtered speech. The benefit of adding acoustic low-frequency fine structure to acoustic envelope cues was significantly greater for audiovisual than for auditory-only speech. It is suggested that this is due to complementary information of the different acoustic and visual cues. The results have potential implications for the assessment of bimodal cochlear implant fittings or electroacoustic stimulation. PMID:27369134

  19. Method and apparatus for obtaining complete speech signals for speech recognition applications

    NASA Technical Reports Server (NTRS)

    Abrash, Victor (Inventor); Cesari, Federico (Inventor); Franco, Horacio (Inventor); George, Christopher (Inventor); Zheng, Jing (Inventor)

    2009-01-01

    The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
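
    A minimal Python sketch of the pre-roll idea described in the abstract is shown below: the most recent frames are kept in a fixed-length ring so that speech beginning shortly before the user command is still available to the recognizer. The frame size and buffer depth are arbitrary assumptions, and this is not the patented implementation.

      from collections import deque

      class PrerollBuffer:
          # Keep the most recent audio frames so that speech starting slightly
          # before the push-to-talk command is not lost.
          def __init__(self, n_frames=50):            # e.g. 50 x 10-ms frames = 0.5 s
              self.ring = deque(maxlen=n_frames)      # oldest frames drop off automatically

          def push(self, frame):                      # called for every captured frame
              self.ring.append(frame)

          def start_utterance(self):
              # Frames preceding the user command, to be prepended to the live stream.
              return list(self.ring)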

  20. Acoustic Predictors of Intelligibility for Segmentally Interrupted Speech: Temporal Envelope, Voicing, and Duration

    ERIC Educational Resources Information Center

    Fogerty, Daniel

    2013-01-01

    Purpose: Temporal interruption limits the perception of speech to isolated temporal glimpses. An analysis was conducted to determine the acoustic parameter that best predicts speech recognition from temporal fragments that preserve different types of speech information--namely, consonants and vowels. Method: Young listeners with normal hearing…

  1. Perceptual and Acoustic Reliability Estimates for the Speech Disorders Classification System (SDCS)

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.

    2010-01-01

    A companion paper describes three extensions to a classification system for paediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). The SDCS uses perceptual and acoustic data reduction methods to obtain information on a speaker's speech, prosody, and voice. The present paper provides reliability estimates for…

  2. Emotional speech acoustic model for Malay: iterative versus isolated unit training.

    PubMed

    Mustafa, Mumtaz Begum; Ainon, Raja Noor

    2013-10-01

    The ability of speech synthesis system to synthesize emotional speech enhances the user's experience when using this kind of system and its related applications. However, the development of an emotional speech synthesis system is a daunting task in view of the complexity of human emotional speech. The more recent state-of-the-art speech synthesis systems, such as the one based on hidden Markov models, can synthesize emotional speech with acceptable naturalness with the use of a good emotional speech acoustic model. However, building an emotional speech acoustic model requires adequate resources including segment-phonetic labels of emotional speech, which is a problem for many under-resourced languages, including Malay. This research shows how it is possible to build an emotional speech acoustic model for Malay with minimal resources. To achieve this objective, two forms of initialization methods were considered: iterative training using the deterministic annealing expectation maximization algorithm and the isolated unit training. The seed model for the automatic segmentation is a neutral speech acoustic model, which was transformed to target emotion using two transformation techniques: model adaptation and context-dependent boundary refinement. Two forms of evaluation have been performed: an objective evaluation measuring the prosody error and a listening evaluation to measure the naturalness of the synthesized emotional speech. PMID:24116440

  3. Acoustic Analysis of the Voiced-Voiceless Distinction in Dutch Tracheoesophageal Speech

    ERIC Educational Resources Information Center

    Jongmans, Petra; Wempe, Ton G.; van Tinteren, Harm; Hilgers, Frans J. M.; Pols, Louis C. W.; van As-Brooks, Corina J.

    2010-01-01

    Purpose: Confusions between voiced and voiceless plosives and voiced and voiceless fricatives are common in Dutch tracheoesophageal (TE) speech. This study investigates (a) which acoustic measures are found to convey a correct voicing contrast in TE speech and (b) whether different measures are found in TE speech than in normal laryngeal (NL)…

  4. Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech

    PubMed Central

    Leong, Victoria; Goswami, Usha

    2015-01-01

    When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72–82% (freely-read CDS) and 90–98% (rhythmically-regular CDS) stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across
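
    The following Python sketch illustrates the general idea of separating a speech amplitude envelope into slow modulation bands near the stress, syllable, and onset-rime rates. It is only a schematic stand-in for the S-AMPH model: the band edges and filter orders are assumptions, and the model's PCA-derived spectral channels are omitted.

      import numpy as np
      from scipy.signal import hilbert, butter, filtfilt

      def am_bands(x, fs):
          # Band-pass the wideband amplitude envelope around three
          # phonologically relevant modulation rates (illustrative edges).
          env = np.abs(hilbert(x))
          bands = {}
          for name, (f1, f2) in {"stress": (0.9, 2.5),
                                 "syllable": (2.5, 12.0),
                                 "onset-rime": (12.0, 40.0)}.items():
              b, a = butter(2, [f1 / (fs / 2), f2 / (fs / 2)], btype="band")
              bands[name] = filtfilt(b, a, env)
          return bands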

  5. Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing

    PubMed Central

    Doelling, Keith; Arnal, Luc; Ghitza, Oded; Poeppel, David

    2013-01-01

    A growing body of research suggests that intrinsic neuronal slow (< 10 Hz) oscillations in auditory cortex appear to track incoming speech and other spectro-temporally complex auditory signals. Within this framework, several recent studies have identified critical-band temporal envelopes as the specific acoustic feature being reflected by the phase of these oscillations. However, how this alignment between speech acoustics and neural oscillations might underpin intelligibility is unclear. Here we test the hypothesis that the ‘sharpness’ of temporal fluctuations in the critical band envelope acts as a temporal cue to speech syllabic rate, driving delta-theta rhythms to track the stimulus and facilitate intelligibility. We interpret our findings as evidence that sharp events in the stimulus cause cortical rhythms to re-align and parse the stimulus into syllable-sized chunks for further decoding. Using magnetoencephalographic recordings, we show that by removing temporal fluctuations that occur at the syllabic rate, envelope-tracking activity is reduced. By artificially reinstating these temporal fluctuations, envelope-tracking activity is regained. These changes in tracking correlate with intelligibility of the stimulus. Together, the results suggest that the sharpness of fluctuations in the stimulus, as reflected in the cochlear output, drive oscillatory activity to track and entrain to the stimulus, at its syllabic rate. This process likely facilitates parsing of the stimulus into meaningful chunks appropriate for subsequent decoding, enhancing perception and intelligibility. PMID:23791839

  6. Internet-oriented visualization with audio presentation of speech signals

    NASA Astrophysics Data System (ADS)

    Braun, Jerome J.; Levkowitz, Haim

    1998-05-01

    Visualization of speech signals, including the capability to visualize the waveforms while simultaneously hearing the speech, is among the essential requirements in speech processing research. In tasks related to labeling of speech signals, visualization activities may have to be performed by multiple users upon a centralized collection of speech data. When speech labeling activities involve perceptual issues, the human factors issues including functionality tradeoffs are particularly important, since the user's burden (tiredness, annoyance) can affect the perceptual responses. We developed VideVox (pronounced 'Veedeh-Vox'), a speech visualization facility, in which the visualization activities may be performed by a large number of users in geographically, dialectally, and linguistically diverse locations. Developed in Java, and capable of operating both as an Internet Java applet and a Java application, VideVox is platform independent. Using the client-server architecture paradigm, it allows distributed visualization work. The Internet orientation makes VideVox a promising direction for speech signal visualization in speech labeling activities that require a large number of users in multiple locations. In the paper, we describe our approach, VideVox features, modes of audio data exploration and audio-synchronous animation for speech visualization, operations related to identification of perceptual events, and the human factors issues related to perception-oriented visualizations of speech.

  7. Moving to the Speed of Sound: Context Modulation of the Effect of Acoustic Properties of Speech

    ERIC Educational Resources Information Center

    Shintel, Hadas; Nusbaum, Howard C.

    2008-01-01

    Suprasegmental acoustic patterns in speech can convey meaningful information and affect listeners' interpretation in various ways, including through systematic analog mapping of message-relevant information onto prosody. We examined whether the effect of analog acoustic variation is governed by the acoustic properties themselves. For example, fast…

  8. Fluid-acoustic interactions and their impact on pathological voiced speech

    NASA Astrophysics Data System (ADS)

    Erath, Byron D.; Zanartu, Matias; Peterson, Sean D.; Plesniak, Michael W.

    2011-11-01

    Voiced speech is produced by vibration of the vocal fold structures. Vocal fold dynamics arise from aerodynamic pressure loadings, tissue properties, and acoustic modulation of the driving pressures. Recent speech science advancements have produced a physiologically-realistic fluid flow solver (BLEAP) capable of prescribing asymmetric intraglottal flow attachment that can be easily assimilated into reduced order models of speech. The BLEAP flow solver is extended to incorporate acoustic loading and sound propagation in the vocal tract by implementing a wave reflection analog approach for sound propagation based on the governing BLEAP equations. This enhanced physiological description of the physics of voiced speech is implemented into a two-mass model of speech. The impact of fluid-acoustic interactions on vocal fold dynamics is elucidated for both normal and pathological speech through linear and nonlinear analysis techniques. Supported by NSF Grant CBET-1036280.

  9. Acoustical Characteristics of Mastication Sounds: Application of Speech Analysis Techniques

    NASA Astrophysics Data System (ADS)

    Brochetti, Denise

    Food scientists have used acoustical methods to study characteristics of mastication sounds in relation to food texture. However, a model for analysis of the sounds has not been identified, and reliability of the methods has not been reported. Therefore, speech analysis techniques were applied to mastication sounds, and variation in measures of the sounds was examined. To meet these objectives, two experiments were conducted. In the first experiment, a digital sound spectrograph generated waveforms and wideband spectrograms of sounds by 3 adult subjects (1 male, 2 females) for initial chews of food samples differing in hardness and fracturability. Acoustical characteristics were described and compared. For all sounds, formants appeared in the spectrograms, and energy occurred across a 0 to 8000-Hz range of frequencies. Bursts characterized waveforms for peanut, almond, raw carrot, ginger snap, and hard candy. Duration and amplitude of the sounds varied with the subjects. In the second experiment, the spectrograph was used to measure the duration, amplitude, and formants of sounds for the initial 2 chews of cylindrical food samples (raw carrot, teething toast) differing in diameter (1.27, 1.90, 2.54 cm). Six adult subjects (3 males, 3 females) having normal occlusions and temporomandibular joints chewed the samples between the molar teeth and with the mouth open. Ten repetitions per subject were examined for each food sample. Analysis of estimates of variation indicated an inconsistent intrasubject variation in the acoustical measures. Food type and sample diameter also affected the estimates, indicating the variable nature of mastication. Generally, intrasubject variation was greater than intersubject variation. Analysis of ranks of the data indicated that the effect of sample diameter on the acoustical measures was inconsistent and depended on the subject and type of food. If inferences are to be made concerning food texture from acoustical measures of mastication

  10. How stable are acoustic metrics of contrastive speech rhythm?

    PubMed

    Wiget, Lukas; White, Laurence; Schuppler, Barbara; Grenon, Izabelle; Rauch, Olesya; Mattys, Sven L

    2010-03-01

    Acoustic metrics of contrastive speech rhythm, based on vocalic and intervocalic interval durations, are intended to capture stable typological differences between languages. They should consequently be robust to variation between speakers, sentence materials, and measurers. This paper assesses the impact of these sources of variation on the metrics %V (proportion of utterance comprised of vocalic intervals), VarcoV (rate-normalized standard deviation of vocalic interval duration), and nPVI-V (a measure of the durational variability between successive pairs of vocalic intervals). Five measurers analyzed the same corpus of speech: five sentences read by six speakers of Standard Southern British English. Differences between sentences were responsible for the greatest variation in rhythm scores. Inter-speaker differences were also a source of significant variability. However, there was relatively little variation due to segmentation differences between measurers following an agreed protocol. An automated phone alignment process was also used: Rhythm scores thus derived showed good agreement with the human measurers. A number of recommendations for researchers wishing to exploit contrastive rhythm metrics are offered in conclusion. PMID:20329856
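
    Since the three metrics are defined directly on interval durations, they are easy to state in code. The Python sketch below computes %V, VarcoV, and nPVI-V from a list of vocalic interval durations; the toy durations in the example are invented, and using the sample standard deviation for VarcoV is one common convention.

      import numpy as np

      def rhythm_metrics(vocalic_durations, utterance_duration):
          d = np.asarray(vocalic_durations, dtype=float)
          pct_v = 100.0 * d.sum() / utterance_duration              # %V
          varco_v = 100.0 * d.std(ddof=1) / d.mean()                # rate-normalized variability
          npvi_v = 100.0 * np.mean(np.abs(np.diff(d)) /             # pairwise variability index
                                   ((d[:-1] + d[1:]) / 2.0))
          return pct_v, varco_v, npvi_v

      print(rhythm_metrics([0.08, 0.12, 0.06, 0.15, 0.09], utterance_duration=1.2))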

  11. Discrimination of Speech Stimuli Based on Neuronal Response Phase Patterns Depends on Acoustics But Not Comprehension

    PubMed Central

    Poeppel, David

    2010-01-01

    Speech stimuli give rise to neural activity in the listener that can be observed as waveforms using magnetoencephalography. Although waveforms vary greatly from trial to trial due to activity unrelated to the stimulus, it has been demonstrated that spoken sentences can be discriminated based on theta-band (3–7 Hz) phase patterns in single-trial response waveforms. Furthermore, manipulations of the speech signal envelope and fine structure that reduced intelligibility were found to produce correlated reductions in discrimination performance, suggesting a relationship between theta-band phase patterns and speech comprehension. This study investigates the nature of this relationship, hypothesizing that theta-band phase patterns primarily reflect cortical processing of low-frequency (<40 Hz) modulations present in the acoustic signal and required for intelligibility, rather than processing exclusively related to comprehension (e.g., lexical, syntactic, semantic). Using stimuli that are quite similar to normal spoken sentences in terms of low-frequency modulation characteristics but are unintelligible (i.e., their time-inverted counterparts), we find that discrimination performance based on theta-band phase patterns is equal for both types of stimuli. Consistent with earlier findings, we also observe that whereas theta-band phase patterns differ across stimuli, power patterns do not. We use a simulation model of the single-trial response to spoken sentence stimuli to demonstrate that phase-locked responses to low-frequency modulations of the acoustic signal can account not only for the phase but also for the power results. The simulation offers insight into the interpretation of the empirical results with respect to phase-resetting and power-enhancement models of the evoked response. PMID:20484530

  12. Capacity of voiceband channel with speech signal interference

    NASA Astrophysics Data System (ADS)

    Wulich, D.; Goldfeld, L.

    1994-08-01

    An estimation of the capacity of a voiceband channel with speech signal interference and background Gaussian white noise has been made. The solution is based on the fact that over a time interval of tens of milliseconds the speech signal can be considered as a stationary Gaussian process. In such a model the total interference is nonwhite but Gaussian, a situation for which the capacity can be found according to the formulas given in classical literature. The results are important where the voice signal acts as an interference, for example the crosstalk problem in telephone lines or data over voice (DOV) systems where the speech is transmitted simultaneously with the digitally modulated signal.
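
    For a channel with nonwhite Gaussian interference, the classical result assigns input power by water-filling across frequency and integrates log2(1 + P(f)/N(f)). The Python sketch below discretizes that computation; the noise spectrum in the example is a made-up speech-shaped curve, not data from the paper.

      import numpy as np

      def waterfill_capacity(noise_psd, total_power, df):
          # Discretized water-filling over parallel Gaussian sub-channels.
          n = np.asarray(noise_psd, dtype=float)
          lo, hi = n.min(), n.max() + total_power / df     # bracket the water level
          for _ in range(100):                             # bisection on the water level
              mu = 0.5 * (lo + hi)
              p = np.maximum(mu - n, 0.0)                  # poured power density per bin
              lo, hi = (lo, mu) if p.sum() * df > total_power else (mu, hi)
          p = np.maximum(0.5 * (lo + hi) - n, 0.0)
          return df * np.sum(np.log2(1.0 + p / n))         # capacity estimate in bits/s

      f = np.arange(300.0, 3400.0, 10.0)                   # voiceband frequency grid
      noise = 1e-6 + 1e-3 / (1.0 + (f / 500.0) ** 2)       # hypothetical colored noise PSD
      print(waterfill_capacity(noise, total_power=1.0, df=10.0))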

  13. DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1

    NASA Astrophysics Data System (ADS)

    Garofolo, J. S.; Lamel, L. F.; Fisher, W. M.; Fiscus, J. G.; Pallett, D. S.

    1993-02-01

    The Texas Instruments/Massachusetts Institute of Technology (TIMIT) corpus of read speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems. TIMIT contains speech from 630 speakers representing 8 major dialect divisions of American English, each speaking 10 phonetically-rich sentences. The TIMIT corpus includes time-aligned orthographic, phonetic, and word transcriptions, as well as speech waveform data for each spoken sentence. The release of TIMIT contains several improvements over the Prototype CD-ROM released in December, 1988: (1) full 630-speaker corpus, (2) checked and corrected transcriptions, (3) word-alignment transcriptions, (4) NIST SPHERE-headered waveform files and header manipulation software, (5) phonemic dictionary, (6) new test and training subsets balanced for dialectal and phonetic coverage, and (7) more extensive documentation.

  14. Physical properties of modification of speech signal fragments

    NASA Astrophysics Data System (ADS)

    Gusev, Mikhail N.

    2004-04-01

    This report describes the methods used to modify individual speech signal fragments during speech synthesis from arbitrary text. Three groups of sounds differ in how their frequency characteristics are modified, and two groups of sounds differ in the methods needed to change their durations. To modify samples of a speaker's voice with these methods, the recordings must first be pre-marked, a step known as segmentation. Allophones are taken as the variable speech fragments. The modification methods described allow arbitrary speech sequences to be formed over a wide intonation range from a limited amount of the speaker's voice material.

  15. Speech masking and cancelling and voice obscuration

    DOEpatents

    Holzrichter, John F.

    2013-09-10

    A non-acoustic sensor is used to measure a user's speech and then broadcasts an obscuring acoustic signal diminishing the user's vocal acoustic output intensity and/or distorting the voice sounds making them unintelligible to persons nearby. The non-acoustic sensor is positioned proximate or contacting a user's neck or head skin tissue for sensing speech production information.

  16. Formant Centralization Ratio: A Proposal for a New Acoustic Measure of Dysarthric Speech

    ERIC Educational Resources Information Center

    Sapir, Shimon; Ramig, Lorraine O.; Spielman, Jennifer L.; Fox, Cynthia

    2010-01-01

    Purpose: The vowel space area (VSA) has been used as an acoustic metric of dysarthric speech, but with varying degrees of success. In this study, the authors aimed to test an alternative metric to the VSA--the "formant centralization ratio" (FCR), which is hypothesized to more effectively differentiate dysarthric from healthy speech and register…

  17. The effects of noise on speech and warning signals

    NASA Astrophysics Data System (ADS)

    Suter, Alice H.

    1989-06-01

    To assess the effects of noise on speech communication it is necessary to examine certain characteristics of the speech signal. Speech level can be measured by a variety of methods, none of which has yet been standardized, and it should be kept in mind that vocal effort increases with background noise level and with different types of activity. Noise and filtering commonly degrade the speech signal, especially as it is transmitted through communications systems. Intelligibility is also adversely affected by distance, reverberation, and monaural listening. Communication systems currently in use may cause strain and delays on the part of the listener, but there are many possibilities for improvement. Individuals who need to communicate in noise may be subject to voice disorders. Shouted speech becomes progressively less intelligible at high voice levels, but improvements can be realized when talkers use clear speech. Tolerable listening levels are lower for negative than for positive S/Ns, and comfortable listening levels should be at a S/N of at least 5 dB, and preferably above 10 dB. Popular methods to predict speech intelligibility in noise include the Articulation Index, Speech Interference Level, Speech Transmission Index, and the sound level meter's A-weighting network. This report describes these methods, discussing certain advantages and disadvantages of each, and shows their interrelations.
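
    Of the predictors mentioned, the Speech Interference Level is simple enough to show inline: it is commonly taken as the arithmetic mean of the octave-band noise levels centred at 500, 1000, 2000, and 4000 Hz. The Python sketch below assumes that four-band convention and uses invented band levels.

      def speech_interference_level(band_levels_db):
          # Arithmetic mean of the 500/1000/2000/4000 Hz octave-band levels (dB).
          bands = (500, 1000, 2000, 4000)
          return sum(band_levels_db[f] for f in bands) / len(bands)

      # Hypothetical octave-band measurements of a background noise.
      print(speech_interference_level({500: 68.0, 1000: 64.0, 2000: 60.0, 4000: 55.0}))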

  18. Segmentation and frequency domain ML pitch estimation of speech signals

    NASA Astrophysics Data System (ADS)

    Hanna, Salim A.

    The rate of oscillation of the vocal cords and its inverse value, the pitch period, are important speech features that are useful for speech analysis/synthesis, speech recognition, and speech coding. An automatic approach for the estimation of the pitch period in continuous speech is presented. The proposed approach considers the segmentation of the speech signal into homogeneous regions and the detection of segments that are generated by vocal cord oscillations prior to pitch estimation. The pitch period of voiced segments is estimated in the frequency domain using a maximum likelihood (ML) procedure. The estimated pitch period is chosen to maximize a likelihood function over the range of expected pitch periods. An efficient simplified realization of the generalized likelihood ratio segmentation method is also described.
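
    As a rough stand-in for the frequency-domain ML search described above, the Python sketch below scores each candidate pitch by summing spectral energy at its harmonics and keeps the best-scoring candidate. This harmonic-summation proxy is not the paper's likelihood function; the search range and harmonic count are assumptions.

      import numpy as np

      def pitch_harmonic_sum(frame, fs, f0_min=60.0, f0_max=400.0, n_harm=8):
          # Pick the candidate F0 whose harmonics capture the most spectral energy.
          spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
          freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
          candidates = np.arange(f0_min, f0_max, 1.0)
          scores = [sum(spec[np.argmin(np.abs(freqs - k * f0))]
                        for k in range(1, n_harm + 1)) for f0 in candidates]
          return candidates[int(np.argmax(scores))]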

  19. Speech perception of sine-wave signals by children with cochlear implants

    PubMed Central

    Nittrouer, Susan; Kuess, Jamie; Lowenstein, Joanna H.

    2015-01-01

    Children need to discover linguistically meaningful structures in the acoustic speech signal. Being attentive to recurring, time-varying formant patterns helps in that process. However, that kind of acoustic structure may not be available to children with cochlear implants (CIs), thus hindering development. The major goal of this study was to examine whether children with CIs are as sensitive to time-varying formant structure as children with normal hearing (NH) by asking them to recognize sine-wave speech. The same materials were presented as speech in noise, as well, to evaluate whether any group differences might simply reflect general perceptual deficits on the part of children with CIs. Vocabulary knowledge, phonemic awareness, and “top-down” language effects were all also assessed. Finally, treatment factors were examined as possible predictors of outcomes. Results showed that children with CIs were as accurate as children with NH at recognizing sine-wave speech, but poorer at recognizing speech in noise. Phonemic awareness was related to that recognition. Top-down effects were similar across groups. Having had a period of bimodal stimulation near the time of receiving a first CI facilitated these effects. Results suggest that children with CIs have access to the important time-varying structure of vocal-tract formants. PMID:25994709
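
    To make the stimulus construction concrete, the Python sketch below synthesizes sine-wave speech in the generic sense: one time-varying sinusoid per formant track, amplitude-modulated by the corresponding amplitude track. The steady three-formant example is invented and much simpler than the study's sentence materials.

      import numpy as np

      def sine_wave_speech(formant_tracks, amp_tracks, fs):
          # Sum of sinusoids whose instantaneous frequencies follow the formant tracks.
          out = np.zeros(len(formant_tracks[0]))
          for f, a in zip(formant_tracks, amp_tracks):
              phase = 2.0 * np.pi * np.cumsum(np.asarray(f)) / fs   # integrate frequency
              out += np.asarray(a) * np.sin(phase)
          return out / (np.max(np.abs(out)) + 1e-12)

      fs, n = 16000, 16000                                          # 1 s at 16 kHz
      tracks = [np.full(n, f) for f in (500.0, 1500.0, 2500.0)]     # hypothetical formants
      amps = [np.full(n, a) for a in (1.0, 0.5, 0.25)]
      y = sine_wave_speech(tracks, amps, fs)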

  20. Speech perception of sine-wave signals by children with cochlear implants.

    PubMed

    Nittrouer, Susan; Kuess, Jamie; Lowenstein, Joanna H

    2015-05-01

    Children need to discover linguistically meaningful structures in the acoustic speech signal. Being attentive to recurring, time-varying formant patterns helps in that process. However, that kind of acoustic structure may not be available to children with cochlear implants (CIs), thus hindering development. The major goal of this study was to examine whether children with CIs are as sensitive to time-varying formant structure as children with normal hearing (NH) by asking them to recognize sine-wave speech. The same materials were presented as speech in noise, as well, to evaluate whether any group differences might simply reflect general perceptual deficits on the part of children with CIs. Vocabulary knowledge, phonemic awareness, and "top-down" language effects were all also assessed. Finally, treatment factors were examined as possible predictors of outcomes. Results showed that children with CIs were as accurate as children with NH at recognizing sine-wave speech, but poorer at recognizing speech in noise. Phonemic awareness was related to that recognition. Top-down effects were similar across groups. Having had a period of bimodal stimulation near the time of receiving a first CI facilitated these effects. Results suggest that children with CIs have access to the important time-varying structure of vocal-tract formants. PMID:25994709

  1. Multilevel Analysis in Analyzing Speech Data

    ERIC Educational Resources Information Center

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  2. Analysis of Acoustic Features in Speakers with Cognitive Disorders and Speech Impairments

    NASA Astrophysics Data System (ADS)

    Saz, Oscar; Simón, Javier; Rodríguez, W. Ricardo; Lleida, Eduardo; Vaquero, Carlos

    2009-12-01

    This work presents the results of an analysis of the acoustic features (formants and the three suprasegmental features: tone, intensity and duration) of vowel production in a group of 14 young speakers suffering different kinds of speech impairments due to physical and cognitive disorders. A corpus of unimpaired children's speech is used to determine the reference values for these features in speakers without any kind of speech impairment within the same domain as the impaired speakers, namely 57 isolated words. The signal processing to extract the formant and pitch values is based on a linear prediction coefficient (LPC) analysis of the segments considered as vowels in a Hidden Markov Model (HMM) based Viterbi forced alignment. Intensity and duration measures are also derived from the outcome of the automated segmentation. As the main conclusion of the work, it is shown that intelligibility of the vowel production is lowered in impaired speakers even when the vowel is perceived as correct by human labelers. The decrease in intelligibility is due to a 30% increase in confusability in the formant map, a 50% reduction in the energy-based discrimination between stressed and unstressed vowels, and a 50% increase in the standard deviation of vowel length. On the other hand, impaired speakers keep good control of tone in the production of stressed and unstressed vowels.
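
    The formant extraction step described above follows a standard pattern, sketched below in Python: fit linear prediction coefficients to a voiced frame by the autocorrelation method and read candidate formants from the angles of the prediction polynomial's roots. This is a generic LPC formant estimator, not the study's exact pipeline; the order and root-magnitude threshold are assumptions.

      import numpy as np
      from scipy.linalg import solve_toeplitz

      def lpc_formants(frame, fs, order=12):
          # Autocorrelation-method LPC followed by root solving of A(z).
          x = frame * np.hamming(len(frame))
          r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
          a = solve_toeplitz(r[:order], r[1:order + 1])      # prediction coefficients
          roots = np.roots(np.concatenate(([1.0], -a)))      # A(z) = 1 - sum a_k z^-k
          roots = roots[(roots.imag > 0) & (np.abs(roots) > 0.8)]
          return np.sort(np.angle(roots) * fs / (2.0 * np.pi))   # candidate formants (Hz)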

  3. Speech intelligibility in rooms: Effect of prior listening exposure interacts with room acoustics.

    PubMed

    Zahorik, Pavel; Brandewie, Eugene J

    2016-07-01

    There is now converging evidence that a brief period of prior listening exposure to a reverberant room can influence speech understanding in that environment. Although the effect appears to depend critically on the amplitude modulation characteristic of the speech signal reaching the ear, the extent to which the effect may be influenced by room acoustics has not been thoroughly evaluated. This study seeks to fill this gap in knowledge by testing the effect of prior listening exposure or listening context on speech understanding in five different simulated sound fields, ranging from anechoic space to a room with broadband reverberation time (T60) of approximately 3 s. Although substantial individual variability in the effect was observed and quantified, the context effect was, on average, strongly room dependent. At threshold, the effect was minimal in anechoic space, increased to a maximum of 3 dB on average in moderate reverberation (T60 = 1 s), and returned to minimal levels again in high reverberation. This interaction suggests that the functional effects of prior listening exposure may be limited to sound fields with moderate reverberation (0.4 ≤ T60 ≤ 1 s). PMID:27475133

  4. Sine-wave and noise-vocoded sine-wave speech in a tone language: Acoustic details matter.

    PubMed

    Rosen, Stuart; Hui, Sze Ngar Catherine

    2015-12-01

    Sine-wave speech (SWS) is a highly simplified version of speech consisting only of frequency- and amplitude-modulated sinusoids representing the formants. That listeners can successfully understand SWS has led to claims that speech perception must be based on abstract properties of the stimuli far removed from their specific acoustic form. Here it is shown, in bilingual Cantonese/English listeners, that performance with Cantonese SWS is improved by noise vocoding, with no effect on English SWS utterances. This manipulation preserves the abstract informational structure in the signals but changes its surface form. The differential effects of noise vocoding likely arise from the fact that Cantonese is a tonal language and hence more reliant on fundamental frequency (F0) contours for its intelligibility. SWS does not preserve tonal information from the original speech but does have false tonal information signalled by the lowest frequency sinusoid. Noise vocoding SWS appears to minimise the tonal percept, which thus interferes less in the perception of Cantonese. It has no effect in English, which is minimally reliant on F0 variations for intelligibility. Therefore it is not only the informational structure of a sound that is important but also how its acoustic detail interacts with the phonological structure of a given language. PMID:26723325

  5. The Use of Artificial Neural Networks to Estimate Speech Intelligibility from Acoustic Variables: A Preliminary Analysis.

    ERIC Educational Resources Information Center

    Metz, Dale Evan; And Others

    1992-01-01

    A preliminary scheme for estimating the speech intelligibility of hearing-impaired speakers from acoustic parameters, using a computerized artificial neural network to process mathematically the acoustic input variables, is outlined. Tests with 60 hearing-impaired speakers found the scheme to be highly accurate in identifying speakers separated by…

  6. A maximum likelihood approach to estimating articulator positions from speech acoustics

    SciTech Connect

    Hogden, J.

    1996-09-23

    This proposal presents an algorithm called maximum likelihood continuity mapping (MALCOM) which recovers the positions of the tongue, jaw, lips, and other speech articulators from measurements of the sound-pressure waveform of speech. MALCOM differs from other techniques for recovering articulator positions from speech in three critical respects: it does not require training on measured or modeled articulator positions, it does not rely on any particular model of sound propagation through the vocal tract, and it recovers a mapping from acoustics to articulator positions that is linearly, not topographically, related to the actual mapping from acoustics to articulation. The approach categorizes short-time windows of speech into a finite number of sound types, and assumes the probability of using any articulator position to produce a given sound type can be described by a parameterized probability density function. MALCOM then uses maximum likelihood estimation techniques to: (1) find the most likely smooth articulator path given a speech sample and a set of distribution functions (one distribution function for each sound type), and (2) change the parameters of the distribution functions to better account for the data. Using this technique improves the accuracy of articulator position estimates compared to continuity mapping -- the only other technique that learns the relationship between acoustics and articulation solely from acoustics. The technique has potential application to computer speech recognition, speech synthesis and coding, teaching the hearing impaired to speak, improving foreign language instruction, and teaching dyslexics to read. 34 refs., 7 figs.

  7. Effect of signal to noise ratio on the speech perception ability of older adults

    PubMed Central

    Shojaei, Elahe; Ashayeri, Hassan; Jafari, Zahra; Zarrin Dast, Mohammad Reza; Kamali, Koorosh

    2016-01-01

    Background: Speech perception ability depends on auditory and extra-auditory elements. The signal-to-noise ratio (SNR) is an extra-auditory element that has an effect on the ability to normally follow speech and maintain a conversation. Difficulty perceiving speech in noise is a common complaint of the elderly. In this study, the importance of SNR magnitude as an extra-auditory effect on speech perception in noise was examined in the elderly. Methods: The speech perception in noise (SPIN) test was conducted on 25 elderly participants who had bilateral low–mid frequency normal hearing thresholds at three SNRs in the presence of ipsilateral white noise. These participants were selected by an availability sampling method. Cognitive screening was done using the Persian Mini Mental State Examination (MMSE) test. Results: Independent t-tests, ANOVA, and the Pearson correlation coefficient were used for statistical analysis. There was a significant difference in word discrimination scores at silence and at the three SNRs in both ears (p≤0.047). Moreover, there was a significant difference in word discrimination scores for paired SNRs (0 and +5, 0 and +10, and +5 and +10; p≤0.04). No significant correlation was found between age and word recognition scores at silence and at the three SNRs in both ears (p≥0.386). Conclusion: Our results revealed that decreasing the signal level and increasing the competing noise considerably reduced speech perception ability in elderly listeners with normal low–mid frequency hearing thresholds. These results support the critical role of SNR for speech perception ability in the elderly. Furthermore, our results revealed that normal hearing elderly participants required compensatory strategies to maintain normal speech perception in challenging acoustic situations. PMID:27390712

  8. Language Comprehension in Language-Learning Impaired Children Improved with Acoustically Modified Speech

    NASA Astrophysics Data System (ADS)

    Tallal, Paula; Miller, Steve L.; Bedi, Gail; Byma, Gary; Wang, Xiaoqin; Nagarajan, Srikantan S.; Schreiner, Christoph; Jenkins, William M.; Merzenich, Michael M.

    1996-01-01

    A speech processing algorithm was developed to create more salient versions of the rapidly changing elements in the acoustic waveform of speech that have been shown to be deficiently processed by language-learning impaired (LLI) children. LLI children received extensive daily training, over a 4-week period, with listening exercises in which all speech was translated into this synthetic form. They also received daily training with computer "games" designed to adaptively drive improvements in temporal processing thresholds. Significant improvements in speech discrimination and language comprehension abilities were demonstrated in two independent groups of LLI children.

  9. Acoustic signal processing toolbox for array processing

    NASA Astrophysics Data System (ADS)

    Pham, Tien; Whipps, Gene T.

    2003-08-01

    The US Army Research Laboratory (ARL) has developed an acoustic signal processing toolbox (ASPT) for acoustic sensor array processing. The intent of this document is to describe the toolbox and its uses. The ASPT is a GUI-based software that is developed and runs under MATLAB. The current version, ASPT 3.0, requires MATLAB 6.0 and above. ASPT contains a variety of narrowband (NB) and incoherent and coherent wideband (WB) direction-of-arrival (DOA) estimation and beamforming algorithms that have been researched and developed at ARL. Currently, ASPT contains 16 DOA and beamforming algorithms. It contains several different NB and WB versions of the MVDR, MUSIC and ESPRIT algorithms. In addition, there are a variety of pre-processing, simulation and analysis tools available in the toolbox. The user can perform simulation or real data analysis for all algorithms with user-defined signal model parameters and array geometries.
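
    As a pointer to what one of the simpler beamformers in such a toolbox computes, the Python sketch below forms narrowband MVDR (Capon) weights, w = R⁻¹a / (aᴴR⁻¹a), for a hypothetical uniform linear array. It is a textbook formulation, not code from ASPT (which is MATLAB-based).

      import numpy as np

      def mvdr_weights(R, steering):
          # Minimize output power subject to a distortionless response toward the look direction.
          a = steering.reshape(-1, 1)
          Ri_a = np.linalg.solve(R, a)                     # R^{-1} a without an explicit inverse
          return (Ri_a / (a.conj().T @ Ri_a)).ravel()

      m, theta = 8, np.deg2rad(20.0)                       # 8 sensors, look direction 20 degrees
      a = np.exp(1j * np.pi * np.arange(m) * np.sin(theta))   # half-wavelength ULA manifold
      R = np.eye(m) + 0.1 * np.outer(a, a.conj())          # toy covariance with diagonal loading
      w = mvdr_weights(R, a)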

  10. Ice breakup: Observations of the acoustic signal

    NASA Astrophysics Data System (ADS)

    Waddell, S. R.; Farmer, D. M.

    1988-03-01

    We describe observations of ambient sound beneath landfast ice in the Canadian Arctic Archipelago and interpret its evolution over the period June-August in terms of ice cracking and disintegration. The data were recorded on six bands between 50 and 14,500 Hz for the period April 2 to August 7, 1986, in Dolphin and Union Strait. The frequency dependence of the attenuation of sound in water allows separation of distant and local noise sources. In conjunction with satellite imagery and meteorological data, it is shown that strong signals in the acoustic time series are associated with major breakup events. The acoustic signal can provide predictive information about ice conditions and the approach of breakup.

  11. Researches of the Electrotechnical Laboratory. No. 955: Speech recognition by description of acoustic characteristic variations

    NASA Astrophysics Data System (ADS)

    Hayamizu, Satoru

    1993-09-01

    A new speech recognition technique is proposed that systematically describes acoustic characteristic variations using a large-scale speech database, thereby obtaining high recognition accuracy. Rules representing knowledge about acoustic characteristic variations are extracted by observing the actual speech database. A general framework is proposed based on maps from sets of variation factors to the acoustic feature spaces. Rather than using a single recognition model for each element of the descriptive units regardless of the states of the variation factors, large-scale, systematically different recognition models are used for different states. A technique is also proposed to structure the representation of acoustic characteristic variations by clustering recognition models according to variation factors. To investigate acoustic characteristic variations across phonetic contexts efficiently, word sets for the reading texts of the speech database are selected so that the maximum number of three-phoneme sequences is covered in as few words as possible. A selection algorithm is proposed whose first criterion is to maximize the number of distinct three-phoneme sequences in the word set and whose second criterion is to maximize the entropy of the three-phoneme sequences. Read speech data for the word sets are collected and labelled as acoustic-phonetic segments. Speaker-independent word recognition experiments using this speech database were conducted to show the effectiveness of describing acoustic characteristic variations with networks of acoustic-phonetic segments; the experiments show that recognition errors are reduced. A basic framework for estimating acoustic characteristics in unknown phonetic contexts using decision trees is also proposed.
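
    The word-set selection criterion lends itself to a simple greedy approximation, sketched below in Python: repeatedly pick the word that adds the most uncovered three-phoneme sequences. The entropy-based tie-breaking mentioned in the abstract is omitted, and the toy lexicon is invented.

      def select_words(word_to_triphones, n_words):
          # Greedy set cover over triphones: maximize new coverage at each step.
          covered, chosen = set(), []
          candidates = dict(word_to_triphones)
          for _ in range(n_words):
              best = max(candidates, key=lambda w: len(candidates[w] - covered), default=None)
              if best is None or not (candidates[best] - covered):
                  break                                    # nothing new left to cover
              covered |= candidates.pop(best)
              chosen.append(best)
          return chosen, covered

      # Hypothetical toy lexicon: word -> set of triphones it contains.
      lexicon = {"katana": {"k-a-t", "a-t-a", "t-a-n", "a-n-a"},
                 "tanka":  {"t-a-n", "a-n-k", "n-k-a"},
                 "nakata": {"n-a-k", "a-k-a", "k-a-t", "a-t-a"}}
      print(select_words(lexicon, 2))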

  12. A magnetic resonance imaging study on the articulatory and acoustic speech parameters of Malay vowels

    PubMed Central

    2014-01-01

    The phonetic properties of six Malay vowels are investigated using magnetic resonance imaging (MRI) to visualize the vocal tract in order to obtain dynamic articulatory parameters during speech production. To resolve image blurring due to tongue movement during the scanning process, a method based on active contour extraction is used to track tongue contours. The proposed method efficiently tracks tongue contours despite the partial blurring of the MRI images. Consequently, the articulatory parameters are effectively measured as the tongue movement is observed, and the specific shape and position of the tongue are determined for all six uttered Malay vowels. Speech rehabilitation procedures demand some kind of visually perceivable prototype of speech articulation. To investigate the validity of the measured articulatory parameters against the acoustic theory of speech production, an acoustic analysis of the vowels uttered by the subjects was performed. When the acoustic and articulatory parameters of the uttered speech were examined, a correlation between formant frequencies and articulatory parameters was observed. The experiments reported a positive correlation between the constriction location of the tongue body and the first formant frequency, as well as a negative correlation between the constriction location of the tongue tip and the second formant frequency. The results demonstrate that the proposed method is an effective tool for the dynamic study of speech production. PMID:25060583

  13. The acoustics for speech of eight auditoriums in the city of Sao Paulo

    NASA Astrophysics Data System (ADS)

    Bistafa, Sylvio R.

    2002-11-01

    Eight auditoriums with a proscenium type of stage, which usually operate as dramatic theaters in the city of Sao Paulo, were acoustically surveyed in terms of their adequacy for unassisted speech. Reverberation times, early decay times, and speech levels were measured in different positions, together with objective measures of speech intelligibility. The measurements revealed reverberation time values that were rather uniform throughout the rooms, whereas the values of the other acoustical measures varied significantly with position. The early decay time was found to be better correlated with the objective measures of speech intelligibility than the reverberation time. The results from the objective measurements of speech intelligibility revealed that the speech transmission index STI, and its simplified version RaSTI, are strongly correlated with the early-to-late sound ratio C50 (1 kHz). However, the criterion value of acceptability of the latter was found to be more easily met than that of the former. The results from these measurements make it possible to understand how the characteristics of the architectural design determine the acoustical quality for speech. Measurements of Gade's ST1 were made in an attempt to validate it as an objective measure of "support" for the actor. Preliminary diagnostic results from ray-tracing simulations will also be presented.
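
    For reference, the early-to-late sound ratio C50 cited above is conventionally computed from a room impulse response as 10 log10 of the ratio between the energy arriving within the first 50 ms and the energy arriving afterwards. The minimal sketch below illustrates only that definition; the synthetic impulse response and sampling rate are placeholders, not the measurement chain used in the survey.

```python
import numpy as np

def clarity_c50(impulse_response, fs):
    """Early-to-late energy ratio C50 in dB from a room impulse response.

    C50 = 10 * log10( sum(p^2, 0..50 ms) / sum(p^2, 50 ms..end) ),
    assuming the impulse response starts at the direct sound.
    """
    ir = np.asarray(impulse_response, dtype=float)
    split = int(round(0.050 * fs))
    early = np.sum(ir[:split] ** 2)
    late = np.sum(ir[split:] ** 2)
    return 10.0 * np.log10(early / late)

# Illustrative use with a synthetic exponentially decaying noise "impulse response".
fs = 16000
t = np.arange(int(0.8 * fs)) / fs
rng = np.random.default_rng(1)
ir = rng.standard_normal(t.size) * np.exp(-t / 0.25)
print(f"C50 = {clarity_c50(ir, fs):.1f} dB")
```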

  14. Study of Harmonics-to-Noise Ratio and Critical-Band Energy Spectrum of Speech as Acoustic Indicators of Laryngeal and Voice Pathology

    NASA Astrophysics Data System (ADS)

    Shama, Kumara; Krishna, Anantha; Cholayya, Niranjan U.

    2006-12-01

    Acoustic analysis of speech signals is a noninvasive technique that has proved to be an effective tool for the objective support of vocal and voice disease screening. In the present study, acoustic analysis of sustained vowels is considered. A simple k-means nearest neighbor classifier is designed to test the efficacy of a harmonics-to-noise ratio (HNR) measure and the critical-band energy spectrum of the voiced speech signal as tools for the detection of laryngeal pathologies. It classifies a given voice signal sample as pathologic or normal. The voiced speech signal is decomposed into harmonic and noise components using an iterative signal extrapolation algorithm. The HNRs at four different frequency bands are estimated and used as features. Voiced speech is also filtered with 21 critical-bandpass filters that mimic the human auditory neurons. Normalized energies of these filter outputs are used as another set of features. The results obtained have shown that the HNR and the critical-band energy spectrum can be used to correlate laryngeal pathology and voice alteration, using previously classified voice samples. This method could be an additional acoustic indicator that supplements the clinical diagnostic features for voice evaluation.
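
    The paper derives HNR from an iterative harmonic/noise decomposition in four frequency bands; as a much simpler stand-in that conveys the idea, the sketch below estimates a single broadband HNR from the normalized autocorrelation peak at the pitch lag. The pitch-search range and the synthetic test frame are illustrative assumptions, not the study's settings.

```python
import numpy as np

def autocorr_hnr(frame, fs, f0_min=70.0, f0_max=300.0):
    """Rough harmonics-to-noise ratio (dB) of a voiced frame.

    Uses the normalized autocorrelation maximum r over the plausible pitch-lag
    range: HNR ~ 10 * log10(r / (1 - r)).  A simplified alternative to the
    iterative harmonic/noise decomposition described in the abstract.
    """
    x = frame - np.mean(frame)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    ac = ac / ac[0]                                  # normalize so the lag-0 value is 1
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    r = np.clip(np.max(ac[lo:hi]), 1e-6, 1 - 1e-6)   # guard the log and division
    return 10.0 * np.log10(r / (1.0 - r))

# Illustrative use: a synthetic 150 Hz "voiced" frame with added noise.
fs = 16000
t = np.arange(int(0.04 * fs)) / fs
rng = np.random.default_rng(2)
voiced = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)
print(f"HNR ~ {autocorr_hnr(voiced + 0.1 * rng.standard_normal(t.size), fs):.1f} dB")
```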

  15. Acoustic signal propagation characterization of conduit networks

    NASA Astrophysics Data System (ADS)

    Khan, Muhammad Safeer

    Analysis of acoustic signal propagation in conduit networks has been an important area of research in acoustics. One major aspect of analyzing conduit networks as acoustic channels is that a propagating signal suffers frequency dependent attenuation due to thermo-viscous boundary layer effects and the presence of impedance mismatches such as side branches. The signal attenuation due to side branches is strongly influenced by their numbers and dimensions such as diameter and length. Newly developed applications for condition based monitoring of underground conduit networks involve measurement of acoustic signal attenuation through tests in the field. In many cases the exact installation layout of the field measurement location may not be accessible or actual installation may differ from the documented layout. The lack of exact knowledge of numbers and lengths of side branches, therefore, introduces uncertainty in the measurements of attenuation and contributes to the random variable error between measured results and those predicted from theoretical models. There are other random processes in and around conduit networks in the field that also affect the propagation of an acoustic signal. These random processes include but are not limited to the presence of strong temperature and humidity gradients within the conduits, blockages of variable sizes and types, effects of aging such as cracks, bends, sags and holes, ambient noise variations and presence of variable layer of water. It is reasonable to consider that the random processes contributing to the error in the measured attenuation are independent and arbitrarily distributed. The error, contributed by a large number of independent sources of arbitrary probability distributions, is best described by an approximately normal probability distribution in accordance with the central limit theorem. Using an analytical approach to model the attenuating effect of each of the random variable sources can be very complex and

  16. Precategorical Acoustic Storage and the Perception of Speech

    ERIC Educational Resources Information Center

    Frankish, Clive

    2008-01-01

    Theoretical accounts of both speech perception and of short term memory must consider the extent to which perceptual representations of speech sounds might survive in relatively unprocessed form. This paper describes a novel version of the serial recall task that can be used to explore this area of shared interest. In immediate recall of digit…

  17. Vowel Acoustics in Adults with Apraxia of Speech

    ERIC Educational Resources Information Center

    Jacks, Adam; Mathes, Katey A.; Marquardt, Thomas P.

    2010-01-01

    Purpose: To investigate the hypothesis that vowel production is more variable in adults with acquired apraxia of speech (AOS) relative to healthy individuals with unimpaired speech. Vowel formant frequency measures were selected as the specific target of focus. Method: Seven adults with AOS and aphasia produced 15 repetitions of 6 American English…

  18. Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training

    PubMed Central

    Narayanan, Arun; Wang, DeLiang

    2015-01-01

    Although deep neural network (DNN) acoustic models are known to be inherently noise robust, especially with matched training and testing data, the use of speech separation as a frontend and for deriving alternative feature representations has been shown to improve performance in challenging environments. We first present a supervised speech separation system that significantly improves automatic speech recognition (ASR) performance in realistic noise conditions. The system performs separation via ratio time-frequency masking; the ideal ratio mask (IRM) is estimated using DNNs. We then propose a framework that unifies separation and acoustic modeling via joint adaptive training. Since the modules for acoustic modeling and speech separation are implemented using DNNs, unification is done by introducing additional hidden layers with fixed weights and appropriate network architecture. On the CHiME-2 medium-large vocabulary ASR task, and with log mel spectral features as input to the acoustic model, an independently trained ratio masking frontend improves word error rates by 10.9% (relative) compared to the noisy baseline. In comparison, the jointly trained system improves performance by 14.4%. We also experiment with alternative feature representations to augment the standard log mel features, like the noise and speech estimates obtained from the separation module, and the standard feature set used for IRM estimation. Our best system obtains a word error rate of 15.4% (absolute), an improvement of 4.6 percentage points over the next best result on this corpus. PMID:26973851
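
    For readers unfamiliar with ratio masking, the ideal ratio mask referred to above is commonly defined per time-frequency unit from the speech and noise energies; in the paper it is estimated by a DNN, whereas the sketch below simply computes the oracle mask and applies it to the mixture STFT. Window length and the synthetic signals are illustrative choices, not the CHiME-2 setup.

```python
import numpy as np
from scipy.signal import stft, istft

def ideal_ratio_mask(speech, noise, fs, nperseg=512):
    """Oracle ideal ratio mask and the enhanced mixture it produces.

    IRM(t, f) = sqrt( |S|^2 / (|S|^2 + |N|^2) ), applied to the mixture STFT.
    """
    _, _, S = stft(speech, fs, nperseg=nperseg)
    _, _, N = stft(noise, fs, nperseg=nperseg)
    _, _, Y = stft(speech + noise, fs, nperseg=nperseg)
    irm = np.sqrt(np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + 1e-12))
    _, enhanced = istft(irm * Y, fs, nperseg=nperseg)
    return irm, enhanced

# Illustrative use with a synthetic tonal "speech" signal in white noise.
fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(3)
speech = np.sin(2 * np.pi * 220 * t) * np.hanning(t.size)
noise = 0.3 * rng.standard_normal(t.size)
mask, enhanced = ideal_ratio_mask(speech, noise, fs)
print("mask shape:", mask.shape, "enhanced samples:", enhanced.size)
```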

  19. Acoustic properties of vowels in clear and conversational speech by female non-native English speakers

    NASA Astrophysics Data System (ADS)

    Li, Chi-Nin; So, Connie K.

    2005-04-01

    Studies have shown that talkers can improve the intelligibility of their speech when instructed to speak as if talking to a hearing-impaired person. The improvement of speech intelligibility is associated with specific acoustic-phonetic changes: increases in vowel duration and fundamental frequency (F0), a wider pitch range, and a shift in formant frequencies for F1 and F2. Most previous studies of clear speech production have been conducted with native speakers; research with second language speakers is much less common. The present study examined the acoustic properties of non-native English vowels produced in a clear speaking style. Five female Cantonese speakers and a comparison group of English speakers were recorded producing four vowels (/i u ae a/) in /bVt/ context in conversational and clear speech. Vowel durations, F0, pitch range, and the first two formants for each of the four vowels were measured. Analyses revealed that for both groups of speakers, vowel durations, F0, pitch range, and F1 spoken clearly were greater than those produced conversationally. However, F2 was higher in conversational speech than in clear speech. The findings suggest that female non-native English speakers exhibit acoustic-phonetic patterns similar to those of native speakers when asked to produce English vowels clearly.

  20. Identifying Potential Noise Sources within Acoustic Signals

    NASA Astrophysics Data System (ADS)

    Holcomb, Victoria; Lewalle, Jacques

    2013-11-01

    We test a new algorithm for its ability to detect sources of noise within a random background. The goal of these tests is to better understand how to identify sources within acoustic signals while simultaneously determining the strengths and weaknesses of the algorithm in question. Unlike previously published algorithms, the antenna method does not pinpoint events by looking for the most energetic portions of a signal. Instead, the algorithm searches for the ideal lag combinations between three signals by taking excerpts of possible events, and identifies sources as the excerpts with the lowest calculated minimum distance between possible events. At the minimum distance, the events are close in time and frequency. This method can be compared to cross-correlation and denoising methods to better understand its effectiveness. This work is supported in part by Spectral Energies LLC, under an SBIR grant from AFRL, as well as the Syracuse University MAE department.

  1. Speech privacy and annoyance considerations in the acoustic environment of passenger cars of high-speed trains.

    PubMed

    Jeon, Jin Yong; Hong, Joo Young; Jang, Hyung Suk; Kim, Jae Hyeon

    2015-12-01

    It is necessary to consider not only annoyance of interior noises but also speech privacy to achieve acoustic comfort in a passenger car of a high-speed train because speech from other passengers can be annoying. This study aimed to explore an optimal acoustic environment to satisfy speech privacy and reduce annoyance in a passenger car. Two experiments were conducted using speech sources and compartment noise of a high speed train with varying speech-to-noise ratios (SNRA) and background noise levels (BNL). Speech intelligibility was tested in experiment I, and in experiment II, perceived speech privacy, annoyance, and acoustic comfort of combined sounds with speech and background noise were assessed. The results show that speech privacy and annoyance were significantly influenced by the SNRA. In particular, the acoustic comfort was evaluated as acceptable when the SNRA was less than -6 dB for both speech privacy and noise annoyance. In addition, annoyance increased significantly as the BNL exceeded 63 dBA, whereas the effect of the background-noise level on the speech privacy was not significant. These findings suggest that an optimal level of interior noise in a passenger car might exist between 59 and 63 dBA, taking normal speech levels into account. PMID:26723351

  2. Semantic and acoustic analysis of speech by functional networks with distinct time scales.

    PubMed

    Deng, Siyi; Srinivasan, Ramesh

    2010-07-30

    Speech perception requires the successful interpretation of both phonetic and syllabic information in the auditory signal. It has been suggested by Poeppel (2003) that phonetic processing requires an optimal time scale of 25 ms while the time scale of syllabic processing is much slower (150-250 ms). To better understand the operation of brain networks at these characteristic time scales during speech perception, we studied the spatial and dynamic properties of EEG responses to five different stimuli: (1) amplitude modulated (AM) speech, (2) AM speech with added broadband noise, (3) AM reversed speech, (4) AM broadband noise, and (5) AM pure tone. Amplitude modulation at gamma band frequencies (40 Hz) elicited steady-state auditory evoked responses (SSAERs) bilaterally over primary auditory cortices. Reduced SSAERs were observed over the left auditory cortex only for stimuli containing speech. In addition, we found over the left hemisphere, anterior to primary auditory cortex, a network whose instantaneous frequencies in the theta to alpha band (4-16 Hz) are correlated with the amplitude envelope of the speech signal. This correlation was not observed for reversed speech. The presence of speech in the sound input activates a 4-16 Hz envelope tracking network and suppresses the 40-Hz gamma band network which generates the steady-state responses over the left auditory cortex. We believe these findings to be consistent with the idea that processing of the speech signals involves preferentially processing at syllabic time scales rather than phonetic time scales. PMID:20580635

  3. Semantic and acoustic analysis of speech by functional networks with distinct time scales

    PubMed Central

    Deng, Siyi; Srinivasan, Ramesh

    2014-01-01

    Speech perception requires the successful interpretation of both phonetic and syllabic information in the auditory signal. It has been suggested by Poeppel (2003) that phonetic processing requires an optimal time scale of 25 ms while the time scale of syllabic processing is much slower (150–250ms). To better understand the operation of brain networks at these characteristic time scales during speech perception, we studied the spatial and dynamic properties of EEG responses to five different stimuli: (1) amplitude modulated (AM) speech, (2) AM speech with added broadband noise, (3) AM reversed speech, (4) AM broadband noise, and (5) AM pure tone. Amplitude modulation at gamma band frequencies (40 Hz) elicited steady-state auditory evoked responses (SSAERs) bilaterally over primary auditory cortices. Reduced SSAERs were observed over the left auditory cortex only for stimuli containing speech. In addition, we found over the left hemisphere, anterior to primary auditory cortex, a network whose instantaneous frequencies in the theta to alpha band (4–16 Hz) are correlated with the amplitude envelope of the speech signal. This correlation was not observed for reversed speech. The presence of speech in the sound input activates a 4–16 Hz envelope tracking network and suppresses the 40-Hz gamma band network which generates the steady-state responses over the left auditory cortex. We believe these findings to be consistent with the idea that processing of the speech signals involves preferentially processing at syllabic time scales rather than phonetic time scales. PMID:20580635

  4. Group sparsity based spectrum estimation of harmonic speech signals

    NASA Astrophysics Data System (ADS)

    Zhang, Yimin D.; Wang, Ben

    2015-05-01

    Spectrum analysis of speech signals is important for their detection, recognition, and separation. Speech signals are nonstationary with time-varying frequencies which, when analyzed by Fourier analysis over a short time window, exhibit harmonic spectra, i.e., the fundamental frequencies are accompanied by multiple associated harmonic frequencies. With proper modeling, such harmonic signal components can be cast as group sparse and solved using group sparse signal reconstruction methods. In this case, all harmonic components contribute to effective signal detection and fundamental frequency estimation with improved reliability and spectrum resolution. The estimation of the fundamental frequency signature is implemented using the block sparse Bayesian learning technique, which is known to provide high-resolution spectrum estimations. Simulation results confirm the superiority of the proposed technique when compared to the conventional STFT-based methods.
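
    The block sparse Bayesian learning solver used in the paper is not reproduced here. As a simpler illustration of the underlying idea that all harmonics jointly support fundamental-frequency estimation, the sketch below performs classical harmonic summation over a zero-padded short-time spectrum; the frame length, candidate range, harmonic count, and test signal are invented.

```python
import numpy as np

def harmonic_summation_f0(frame, fs, f0_range=(80.0, 400.0), n_harmonics=5):
    """Estimate the fundamental frequency of a short frame by summing the
    short-time spectrum magnitude at each candidate f0 and its harmonics."""
    win = frame * np.hanning(len(frame))
    n_fft = 4 * len(frame)                       # zero-padding for finer bins
    spec = np.abs(np.fft.rfft(win, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    candidates = np.arange(f0_range[0], f0_range[1], 1.0)
    scores = []
    for f0 in candidates:
        idx = [np.argmin(np.abs(freqs - k * f0)) for k in range(1, n_harmonics + 1)]
        scores.append(np.sum(spec[idx]))
    return candidates[int(np.argmax(scores))]

# Illustrative use: a 160 Hz harmonic frame with three partials plus noise.
fs = 8000
t = np.arange(int(0.05 * fs)) / fs
rng = np.random.default_rng(4)
frame = sum(np.sin(2 * np.pi * 160 * k * t) / k for k in (1, 2, 3))
print("estimated f0:", harmonic_summation_f0(frame + 0.2 * rng.standard_normal(t.size), fs), "Hz")
```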

  5. Bird population density estimated from acoustic signals

    USGS Publications Warehouse

    Dawson, D.K.; Efford, M.G.

    2009-01-01

    Many animal species are detected primarily by sound. Although songs, calls and other sounds are often used for population assessment, as in bird point counts and hydrophone surveys of cetaceans, there are few rigorous methods for estimating population density from acoustic data. 2. The problem has several parts - distinguishing individuals, adjusting for individuals that are missed, and adjusting for the area sampled. Spatially explicit capture-recapture (SECR) is a statistical methodology that addresses jointly the second and third parts of the problem. We have extended SECR to use uncalibrated information from acoustic signals on the distance to each source. 3. We applied this extension of SECR to data from an acoustic survey of ovenbird Seiurus aurocapilla density in an eastern US deciduous forest with multiple four-microphone arrays. We modelled average power from spectrograms of ovenbird songs measured within a window of 0.7 s duration and frequencies between 4200 and 5200 Hz. 4. The resulting estimates of the density of singing males (0.19 ha-1, SE 0.03 ha-1) were consistent with estimates of the adult male population density from mist-netting (0.36 ha-1, SE 0.12 ha-1). The fitted model predicts sound attenuation of 0.11 dB m-1 (SE 0.01 dB m-1) in excess of losses from spherical spreading. 5. Synthesis and applications. Our method for estimating animal population density from acoustic signals fills a gap in the census methods available for visually cryptic but vocal taxa, including many species of bird and cetacean. The necessary equipment is simple and readily available; as few as two microphones may provide adequate estimates, given spatial replication. The method requires that individuals detected at the same place are acoustically distinguishable and all individuals vocalize during the recording interval, or that the per capita rate of vocalization is known. We believe these requirements can be met, with suitable field methods, for a significant
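
    Point 4 above describes a propagation model combining spherical spreading with a constant excess attenuation of roughly 0.11 dB per metre. The short sketch below shows what such a model implies for received level versus distance; the 1 m source level is a placeholder, not a value from the study.

```python
import numpy as np

def received_level(distance_m, source_level_db_at_1m=90.0, excess_db_per_m=0.11):
    """Received level under spherical spreading plus constant excess attenuation:
    L(r) = L1 - 20*log10(r) - a*r.  L1 (dB at 1 m) is a made-up placeholder."""
    r = np.asarray(distance_m, dtype=float)
    return source_level_db_at_1m - 20.0 * np.log10(r) - excess_db_per_m * r

for r in (10, 30, 60, 100):
    print(f"{r:4d} m : {received_level(r):5.1f} dB")
```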

  6. Audio signal recognition for speech, music, and environmental sounds

    NASA Astrophysics Data System (ADS)

    Ellis, Daniel P. W.

    2003-10-01

    Human listeners are very good at all kinds of sound detection and identification tasks, from understanding heavily accented speech to noticing a ringing phone underneath music playing at full blast. Efforts to duplicate these abilities on computer have been particularly intense in the area of speech recognition, and it is instructive to review which approaches have proved most powerful, and which major problems still remain. The features and models developed for speech have found applications in other audio recognition tasks, including musical signal analysis, and the problems of analyzing the general ``ambient'' audio that might be encountered by an auditorily endowed robot. This talk will briefly review statistical pattern recognition for audio signals, giving examples in several of these domains. Particular emphasis will be given to common aspects and lessons learned.

  7. Perceiving speech in context: Compensation for contextual variability during acoustic cue encoding and categorization

    NASA Astrophysics Data System (ADS)

    Toscano, Joseph Christopher

    Several fundamental questions about speech perception concern how listeners understand spoken language despite considerable variability in speech sounds across different contexts (the problem of lack of invariance in speech). This contextual variability is caused by several factors, including differences between individual talkers' voices, variation in speaking rate, and effects of coarticulatory context. A number of models have been proposed to describe how the speech system handles differences across contexts. Critically, these models make different predictions about (1) whether contextual variability is handled at the level of acoustic cue encoding or categorization, (2) whether it is driven by feedback from category-level processes or interactions between cues, and (3) whether listeners discard fine-grained acoustic information to compensate for contextual variability. Separating the effects of cue- and category-level processing has been difficult because behavioral measures tap processes that occur well after initial cue encoding and are influenced by task demands and linguistic information. Recently, we have used the event-related brain potential (ERP) technique to examine cue encoding and online categorization. Specifically, we have looked at differences in the auditory N1 as a measure of acoustic cue encoding and the P3 as a measure of categorization. This allows us to examine multiple levels of processing during speech perception and can provide a useful tool for studying effects of contextual variability. Here, I apply this approach to determine the point in processing at which context has an effect on speech perception and to examine whether acoustic cues are encoded continuously. Several types of contextual variability (talker gender, speaking rate, and coarticulation), as well as several acoustic cues (voice onset time, formant frequencies, and bandwidths), are examined in a series of experiments. The results suggest that (1) at early stages of speech

  8. Acoustic signals generated in inclined granular flows

    NASA Astrophysics Data System (ADS)

    Tan, Danielle S.; Jenkins, James T.; Keast, Stephen C.; Sachse, Wolfgang H.

    2015-10-01

    Spontaneous avalanching in specific deserts produces a low-frequency sound known as "booming." This creates a puzzle, because avalanches down the face of a dune result in collisions between sand grains that occur at much higher frequencies. Reproducing this phenomenon in the laboratory permits a better understanding of the underlying mechanisms for the generation of such lower frequency acoustic emissions, which may also be relevant to other dry granular flows. Here we report measurements of low-frequency acoustical signals, produced by dried "sounding" sand (sand capable of booming in the desert) flowing down an inclined chute. The amplitude of the signal diminishes over time but reappears upon drying of the sand. We show that the presence of this sound in the experiments may provide supporting evidence for a previously published "waveguide" explanation for booming. Also, we propose a model based on kinetic theory for a sheared inclined flow in which the flowing layer exhibits "breathing" modes superimposed on steady shearing. The predicted oscillation frequency is of a similar order of magnitude as the measurements, indicating that small perturbations can sustain oscillations of a low frequency. However, the frequency is underestimated, which indicates that the stiffness has been underestimated. Also, the model predicts a discrete spectrum of frequencies, instead of the broadband spectrum measured experimentally.

  9. Aging Affects Neural Synchronization to Speech-Related Acoustic Modulations

    PubMed Central

    Goossens, Tine; Vercammen, Charlotte; Wouters, Jan; van Wieringen, Astrid

    2016-01-01

    As people age, speech perception problems become highly prevalent, especially in noisy situations. In addition to peripheral hearing and cognition, temporal processing plays a key role in speech perception. Temporal processing of speech features is mediated by synchronized activity of neural oscillations in the central auditory system. Previous studies indicate that both the degree and hemispheric lateralization of synchronized neural activity relate to speech perception performance. Based on these results, we hypothesize that impaired speech perception in older persons may, in part, originate from deviances in neural synchronization. In this study, auditory steady-state responses that reflect synchronized activity of theta, beta, low and high gamma oscillations (i.e., 4, 20, 40, and 80 Hz ASSR, respectively) were recorded in young, middle-aged, and older persons. As all participants had normal audiometric thresholds and were screened for (mild) cognitive impairment, differences in synchronized neural activity across the three age groups were likely to be attributed to age. Our data yield novel findings regarding theta and high gamma oscillations in the aging auditory system. At an older age, synchronized activity of theta oscillations is increased, whereas high gamma synchronization is decreased. In contrast to young persons who exhibit a right hemispheric dominance for processing of high gamma range modulations, older adults show a symmetrical processing pattern. These age-related changes in neural synchronization may very well underlie the speech perception problems in aging persons. PMID:27378906

  10. Study of acoustic emission sources and signals

    NASA Astrophysics Data System (ADS)

    Pumarega, M. I. López; Armeite, M.; Oliveto, M. E.; Piotrkowski, R.; Ruzzante, J. E.

    2002-05-01

    Methods of acoustic emission (AE) signal analysis give information about material conditions, since AE generated in stressed solids can be used to indicate cracks and defect positions as well as their damaging potential. We present a review of results of laboratory AE tests on metallic materials. Rings of seamless steel tubes, with and without oxide layers, were cut and then deformed by opening their ends. Seamless Zry-4 tubes were subjected to hydraulic stress tests until rupture using a purpose-built hydraulic system. For burst-type signals, the parameters Amplitude (A), Duration (D), and Risetime (R) were statistically studied. Amplitudes were found to follow a log-normal distribution. This led to the inference that the detected AE signal is the complex consequence of a great number of random independent sources whose individual effects are interlinked. Using cluster analysis on the A, D, and R mean values with 5 clusters, we could show a coincidence between the clusters and the test types. A slight linear correlation was obtained for the parameters A and D. The arrival times of the AE signals were also studied, which led to a discussion of Poisson and Polya processes. The digitized signals were studied as (1/f)^β noises. The general results are coherent if we consider the AE phenomena within the framework of Self-Organized Criticality theory.
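
    To make the amplitude statistics concrete, the sketch below fits a log-normal distribution to a set of burst amplitudes by the standard route of fitting a normal distribution to their logarithms; the synthetic amplitudes stand in for the laboratory AE data, which are not reproduced here.

```python
import numpy as np

def fit_lognormal(amplitudes):
    """Maximum-likelihood log-normal fit: if A is log-normal, log(A) is normal,
    so (mu, sigma) are the sample mean and standard deviation of log(A)."""
    logs = np.log(np.asarray(amplitudes, dtype=float))
    return logs.mean(), logs.std(ddof=1)

# Illustrative use: synthetic burst amplitudes drawn from a log-normal population.
rng = np.random.default_rng(5)
amplitudes = rng.lognormal(mean=3.5, sigma=0.6, size=500)   # arbitrary units
mu, sigma = fit_lognormal(amplitudes)
print(f"fitted mu = {mu:.2f}, sigma = {sigma:.2f}  (population values: 3.5, 0.6)")
```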

  11. Speech in ALS: Longitudinal Changes in Lips and Jaw Movements and Vowel Acoustics

    PubMed Central

    Yunusova, Yana; Green, Jordan R.; Lindstrom, Mary J.; Pattee, Gary L.; Zinman, Lorne

    2015-01-01

    Purpose The goal of this exploratory study was to investigate longitudinally the changes in facial kinematics, vowel formant frequencies, and speech intelligibility in individuals diagnosed with bulbar amyotrophic lateral sclerosis (ALS). This study was motivated by the need to understand articulatory and acoustic changes with disease progression and their subsequent effect on deterioration of speech in ALS. Method Lip and jaw movements and vowel acoustics were obtained for four individuals with bulbar ALS during four consecutive recording sessions with an average interval of three months between recordings. Participants read target words embedded into sentences at a comfortable speaking rate. Maximum vertical and horizontal mouth opening and maximum jaw displacements were obtained during corner vowels. First and second formant frequencies were measured for each vowel. Speech intelligibility and speaking rate score were obtained for each session as well. Results Transient, non-vowel-specific changes in kinematics of the jaw and lips were observed. Kinematic changes often preceded changes in vowel acoustics and speech intelligibility. Conclusions Nonlinear changes in speech kinematics should be considered in evaluation of the disease effects on jaw and lip musculature. Kinematic measures might be most suitable for early detection of changes associated with bulbar ALS.

  12. Acoustic and Articulatory Features of Diphthong Production: A Speech Clarity Study

    ERIC Educational Resources Information Center

    Tasko, Stephen M.; Greilick, Kristin

    2010-01-01

    Purpose: The purpose of this study was to evaluate how speaking clearly influences selected acoustic and orofacial kinematic measures associated with diphthong production. Method: Forty-nine speakers, drawn from the University of Wisconsin X-Ray Microbeam Speech Production Database (J. R. Westbury, 1994), served as participants. Samples of clear…

  13. Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation.

    PubMed

    Kong, Ying-Yee; Donaldson, Gail; Somarowthu, Ala

    2015-05-01

    Low-frequency acoustic cues have been shown to improve speech perception in cochlear-implant listeners. However, the mechanisms underlying this benefit are still not well understood. This study investigated the extent to which low-frequency cues can facilitate listeners' use of linguistic knowledge in simulated electric-acoustic stimulation (EAS). Experiment 1 examined differences in the magnitude of EAS benefit at the phoneme, word, and sentence levels. Speech materials were processed via noise-channel vocoding and lowpass (LP) filtering. The amount of spectral degradation in the vocoder speech was varied by applying different numbers of vocoder channels. Normal-hearing listeners were tested on vocoder-alone, LP-alone, and vocoder + LP conditions. Experiment 2 further examined factors that underlie the context effect on EAS benefit at the sentence level by limiting the low-frequency cues to temporal envelope and periodicity (AM + FM). Results showed that EAS benefit was greater for higher-context than for lower-context speech materials even when the LP ear received only low-frequency AM + FM cues. Possible explanations for the greater EAS benefit observed with higher-context materials may lie in the interplay between perceptual and expectation-driven processes for EAS speech recognition, and/or the band-importance functions for different types of speech materials. PMID:25994712
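
    A minimal sketch of the kind of processing described, namely noise-channel vocoding of the broadband signal combined with a lowpass-filtered acoustic band. Channel count, filter orders, band edges, and the lowpass cutoff are illustrative assumptions rather than the study's exact settings, and a synthetic amplitude-modulated tone stands in for the speech materials.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def noise_vocode(x, fs, n_channels=8, lo=100.0, hi=7000.0):
    """Noise-channel vocoder: band envelopes of the input modulate band-limited noise."""
    edges = np.geomspace(lo, hi, n_channels + 1)
    rng = np.random.default_rng(6)
    noise = rng.standard_normal(len(x))
    out = np.zeros(len(x))
    for f1, f2 in zip(edges[:-1], edges[1:]):
        b, a = butter(4, [f1, f2], btype="bandpass", fs=fs)
        band = filtfilt(b, a, x)
        envelope = np.abs(hilbert(band))        # channel envelope
        carrier = filtfilt(b, a, noise)         # band-limited noise carrier
        out += envelope * carrier
    return out

def simulate_eas(x, fs, cutoff=500.0):
    """Vocoded 'electric' signal plus a lowpass-filtered 'acoustic' signal."""
    b, a = butter(4, cutoff, btype="lowpass", fs=fs)
    return noise_vocode(x, fs) + filtfilt(b, a, x)

# Illustrative use on a synthetic signal standing in for a speech recording.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 200 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
print("output samples:", simulate_eas(x, fs).size)
```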

  14. Contributions of electric and acoustic hearing to bimodal speech and music perception.

    PubMed

    Crew, Joseph D; Galvin, John J; Landsberger, David M; Fu, Qian-Jie

    2015-01-01

    Cochlear implant (CI) users have difficulty understanding speech in noisy listening conditions and perceiving music. Aided residual acoustic hearing in the contralateral ear can mitigate these limitations. The present study examined contributions of electric and acoustic hearing to speech understanding in noise and melodic pitch perception. Data was collected with the CI only, the hearing aid (HA) only, and both devices together (CI+HA). Speech reception thresholds (SRTs) were adaptively measured for simple sentences in speech babble. Melodic contour identification (MCI) was measured with and without a masker instrument; the fundamental frequency of the masker was varied to be overlapping or non-overlapping with the target contour. Results showed that the CI contributes primarily to bimodal speech perception and that the HA contributes primarily to bimodal melodic pitch perception. In general, CI+HA performance was slightly improved relative to the better ear alone (CI-only) for SRTs but not for MCI, with some subjects experiencing a decrease in bimodal MCI performance relative to the better ear alone (HA-only). Individual performance was highly variable, and the contribution of either device to bimodal perception was both subject- and task-dependent. The results suggest that individualized mapping of CIs and HAs may further improve bimodal speech and music perception. PMID:25790349

  15. Contributions of Electric and Acoustic Hearing to Bimodal Speech and Music Perception

    PubMed Central

    Crew, Joseph D.; Galvin III, John J.; Landsberger, David M.; Fu, Qian-Jie

    2015-01-01

    Cochlear implant (CI) users have difficulty understanding speech in noisy listening conditions and perceiving music. Aided residual acoustic hearing in the contralateral ear can mitigate these limitations. The present study examined contributions of electric and acoustic hearing to speech understanding in noise and melodic pitch perception. Data was collected with the CI only, the hearing aid (HA) only, and both devices together (CI+HA). Speech reception thresholds (SRTs) were adaptively measured for simple sentences in speech babble. Melodic contour identification (MCI) was measured with and without a masker instrument; the fundamental frequency of the masker was varied to be overlapping or non-overlapping with the target contour. Results showed that the CI contributes primarily to bimodal speech perception and that the HA contributes primarily to bimodal melodic pitch perception. In general, CI+HA performance was slightly improved relative to the better ear alone (CI-only) for SRTs but not for MCI, with some subjects experiencing a decrease in bimodal MCI performance relative to the better ear alone (HA-only). Individual performance was highly variable, and the contribution of either device to bimodal perception was both subject- and task-dependent. The results suggest that individualized mapping of CIs and HAs may further improve bimodal speech and music perception. PMID:25790349

  16. Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation

    PubMed Central

    Kong, Ying-Yee; Donaldson, Gail; Somarowthu, Ala

    2015-01-01

    Low-frequency acoustic cues have been shown to improve speech perception in cochlear-implant listeners. However, the mechanisms underlying this benefit are still not well understood. This study investigated the extent to which low-frequency cues can facilitate listeners' use of linguistic knowledge in simulated electric-acoustic stimulation (EAS). Experiment 1 examined differences in the magnitude of EAS benefit at the phoneme, word, and sentence levels. Speech materials were processed via noise-channel vocoding and lowpass (LP) filtering. The amount of spectral degradation in the vocoder speech was varied by applying different numbers of vocoder channels. Normal-hearing listeners were tested on vocoder-alone, LP-alone, and vocoder + LP conditions. Experiment 2 further examined factors that underlie the context effect on EAS benefit at the sentence level by limiting the low-frequency cues to temporal envelope and periodicity (AM + FM). Results showed that EAS benefit was greater for higher-context than for lower-context speech materials even when the LP ear received only low-frequency AM + FM cues. Possible explanations for the greater EAS benefit observed with higher-context materials may lie in the interplay between perceptual and expectation-driven processes for EAS speech recognition, and/or the band-importance functions for different types of speech materials. PMID:25994712

  17. Time-expanded speech and speech recognition in older adults.

    PubMed

    Vaughan, Nancy E; Furukawa, Izumi; Balasingam, Nirmala; Mortz, Margaret; Fausti, Stephen A

    2002-01-01

    Speech understanding deficits are common in older adults. In addition to hearing sensitivity, changes in certain cognitive functions may affect speech recognition. One such change that may impact the ability to follow a rapidly changing speech signal is processing speed. When speakers slow the rate of their speech naturally in order to speak clearly, speech recognition is improved. The acoustic characteristics of naturally slowed speech are of interest in developing time-expansion algorithms to improve speech recognition for older listeners. In this study, we tested younger normally hearing, older normally hearing, and older hearing-impaired listeners on time-expanded speech using increased duration and increased intensity of unvoiced consonants. Although all groups performed best on unprocessed speech, performance with processed speech was better with the consonant gain feature without time expansion in the noise condition and better at the slowest time-expanded rate in the quiet condition. The effects of signal processing on speech recognition are discussed. PMID:17642020

  18. Compression and its effect on the speech signal.

    PubMed

    Verschuure, J; Maas, A J; Stikvoort, E; de Jong, R M; Goedegebure, A; Dreschler, W A

    1996-04-01

    Compression systems are often used in hearing aids to increase wearing comfort. A patient has to frequently readjust the gain of a linear hearing aid because of the limited dynamic hearing range and changing acoustical conditions. A great deal of attention has been given to the static parameters but very little to the dynamic parameters. We present a general method to describe the dynamic behavior of a compression system by comparing modulations at the output with modulations at the input. The use of this method resulted in a single parameter describing the temporal characteristics of a compressor, the cut-off modulation frequency. In this paper, its value is compared with known properties of running speech. A limitation of this method is the use of only small modulation depths, and the consequence of this limitation is tested. The use of this method is described for an experimental digital compressor developed by the authors, and the effects of some temporal parameters such as attack and release time are studied. The method reveals the rather large effects that some of these parameters have on a compressor's effectiveness for speech. The method is also used to analyze two generally accepted compression systems in hearing aids. The theoretical method is then compared with the effects of compression on the distribution of the amplitude envelope of running speech, and it was shown that single-channel compression systems do not reduce the distribution width of speech filtered into frequency bands. This finding calls into question the use of compression systems for fitting the speech banana into the dynamic hearing range of impaired listeners. PMID:8698161
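
    The paper characterizes a compressor by comparing modulations at its output with modulations at its input as a function of modulation frequency. The sketch below applies that idea to a very simple feedforward compressor with one-pole attack/release smoothing, measuring the ratio of output to input modulation depth for a sinusoidally amplitude-modulated tone; every parameter here is invented and the compressor is not the authors' experimental digital compressor.

```python
import numpy as np

def compress(x, fs, threshold_db=-30.0, ratio=3.0, attack_ms=5.0, release_ms=50.0):
    """Very simple sample-by-sample feedforward compressor with level smoothing."""
    a_att = np.exp(-1.0 / (attack_ms * 1e-3 * fs))
    a_rel = np.exp(-1.0 / (release_ms * 1e-3 * fs))
    level_db, out = -120.0, np.empty_like(x)
    for i, s in enumerate(x):
        inst_db = 20.0 * np.log10(abs(s) + 1e-9)
        coef = a_att if inst_db > level_db else a_rel
        level_db = coef * level_db + (1.0 - coef) * inst_db
        over = max(level_db - threshold_db, 0.0)
        out[i] = s * 10.0 ** (-over * (1.0 - 1.0 / ratio) / 20.0)
    return out

def modulation_transfer(fm, fs=16000, depth=0.2, fc=1000.0, dur=2.0, skip=0.25):
    """Output-to-input modulation depth ratio for a sinusoidally AM tone."""
    t = np.arange(int(dur * fs)) / fs
    x = 0.1 * (1.0 + depth * np.sin(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)
    y = compress(x, fs)
    start, n = int(skip * fs), int(fs / fc)   # skip the initial transient; n = one carrier period
    def mod_depth(sig):
        peaks = np.array([np.abs(sig[i:i + n]).max()
                          for i in range(start, len(sig) - n, n)])
        return (peaks.max() - peaks.min()) / (peaks.max() + peaks.min())
    return mod_depth(y) / mod_depth(x)

for fm in (1, 2, 4, 8, 16, 32):
    print(f"fm = {fm:2d} Hz : modulation transfer = {modulation_transfer(fm):.2f}")
```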

  19. Primary Progressive Apraxia of Speech: Clinical Features and Acoustic and Neurologic Correlates

    PubMed Central

    Strand, Edythe A.; Clark, Heather; Machulda, Mary; Whitwell, Jennifer L.; Josephs, Keith A.

    2015-01-01

    Purpose This study summarizes 2 illustrative cases of a neurodegenerative speech disorder, primary progressive apraxia of speech (AOS), as a vehicle for providing an overview of the disorder and an approach to describing and quantifying its perceptual features and some of its temporal acoustic attributes. Method Two individuals with primary progressive AOS underwent speech-language and neurologic evaluations on 2 occasions, ranging from 2.0 to 7.5 years postonset. Performance on several tests, tasks, and rating scales, as well as several acoustic measures, were compared over time within and between cases. Acoustic measures were compared with performance of control speakers. Results Both patients initially presented with AOS as the only or predominant sign of disease and without aphasia or dysarthria. The presenting features and temporal progression were captured in an AOS Rating Scale, an Articulation Error Score, and temporal acoustic measures of utterance duration, syllable rates per second, rates of speechlike alternating motion and sequential motion, and a pairwise variability index measure. Conclusions AOS can be the predominant manifestation of neurodegenerative disease. Clinical ratings of its attributes and acoustic measures of some of its temporal characteristics can support its diagnosis and help quantify its salient characteristics and progression over time. PMID:25654422

  20. Removal of noise from noise-degraded speech signals

    NASA Astrophysics Data System (ADS)

    1989-06-01

    Techniques for the removal of noise from noise-degraded speech signals were reviewed and evaluated, with special emphasis on live radio and telephone communications and the extraction of information from similar noisy recordings. The related area of speech-enhancement devices for hearing-impaired people was also reviewed. Evaluation techniques were reviewed to determine their suitability, particularly for assessing changes in the performance of workers who might use noise-reduction equipment on a daily basis in the applications cited above. The main conclusion was that noise-reduction methods may be useful in improving the performance of human operators who extract information from noisy speech material, despite the lack of improvement found when conventional closed-response intelligibility tests were used to assess those methods.

  1. Experiment in Learning to Discriminate Frequency Transposed Speech.

    ERIC Educational Resources Information Center

    Ahlstrom, K.G.; And Others

    In order to improve speech perception by transposing the speech signals to lower frequencies, to determine which aspects of the information in the acoustic speech signals were influenced by transposition, and to compare two different methods of training speech perception, 44 subjects were trained to discriminate between transposed words or…

  2. Acoustic signal detection of manatee calls

    NASA Astrophysics Data System (ADS)

    Niezrecki, Christopher; Phillips, Richard; Meyer, Michael; Beusse, Diedrich O.

    2003-04-01

    The West Indian manatee (Trichechus manatus latirostris) has become endangered partly because of a growing number of collisions with boats. A system that can signal to boaters that manatees are present in the immediate vicinity could potentially reduce these collisions. Acoustic methods are employed to identify the presence of manatees. In this paper, three different detection algorithms are used to detect the calls of the West Indian manatee. The detection systems are tested in the laboratory using simulated manatee vocalizations from an audio compact disc. The detection method that provides the best overall performance is able to correctly identify approximately 96% of the manatee vocalizations; however, the system also has a false-positive rate of approximately 16%. The results of this work may ultimately lead to the development of a warning system that can alert boaters to the presence of manatees.

  3. Vowel Acoustics in Dysarthria: Speech Disorder Diagnosis and Classification

    ERIC Educational Resources Information Center

    Lansford, Kaitlin L.; Liss, Julie M.

    2014-01-01

    Purpose: The purpose of this study was to determine the extent to which vowel metrics are capable of distinguishing healthy from dysarthric speech and among different forms of dysarthria. Method: A variety of vowel metrics were derived from spectral and temporal measurements of vowel tokens embedded in phrases produced by 45 speakers with…

  4. Subauditory Speech Recognition based on EMG/EPG Signals

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles; Lee, Diana Dee; Agabon, Shane; Lau, Sonie (Technical Monitor)

    2003-01-01

    Sub-vocal electromyogram/electropalatogram (EMG/EPG) signal classification is demonstrated as a method for silent speech recognition. Recorded electrode signals from the larynx and sublingual areas below the jaw are noise-filtered and transformed into features using complex dual quad tree wavelet transforms. Feature sets for six sub-vocally pronounced words are trained using a trust-region scaled conjugate gradient neural network. Real-time signals for previously unseen patterns are classified into categories suitable for primitive control of graphic objects. Feature construction, recognition accuracy, and an approach for extending the technique to a variety of real-world application areas are presented.

  5. A Chimpanzee Recognizes Synthetic Speech With Significantly Reduced Acoustic Cues to Phonetic Content

    PubMed Central

    Heimbauer, Lisa A.; Beran, Michael J.; Owren, Michael J.

    2011-01-01

    Summary A long-standing debate concerns whether humans are specialized for speech perception [1–7], which some researchers argue is demonstrated by the ability to understand synthetic speech with significantly reduced acoustic cues to phonetic content [2–4,7]. We tested a chimpanzee (Pan troglodytes) that recognizes 128 spoken words [8,9], asking whether she could understand such speech. Three experiments presented 48 individual words, with the animal selecting a corresponding visuo-graphic symbol from among four alternatives. Experiment 1 tested spectrally reduced, noise-vocoded (NV) synthesis, originally developed to simulate input received by human cochlear-implant users [10]. Experiment 2 tested “impossibly unspeechlike” [3] sine-wave (SW) synthesis, which reduces speech to just three moving tones [11]. Although receiving only intermittent and non-contingent reward, the chimpanzee performed well above chance level, including when hearing synthetic versions for the first time. Recognition of SW words was least accurate, but improved in Experiment 3 when natural words in the same session were rewarded. The chimpanzee was more accurate with NV than SW versions, as were 32 human participants hearing these items. The chimpanzee's ability to spontaneously recognize acoustically reduced synthetic words suggests that experience rather than specialization is critical for speech-perception capabilities that some have suggested are uniquely human [12–14]. PMID:21723125

  6. Speech signal recognition with the homotopic representation method

    NASA Astrophysics Data System (ADS)

    Bianchi, F.; Pocci, P.; Prina-Ricotti, L.

    1981-02-01

    Speech recognition by computer using a homotopic representation is introduced, including the algorithm and the processing mode for the speech signal, the results of a vowel recognition experiment, and the results of a phonetic transcription experiment with simple words composed of four phonemes. The signal is stored in a delay line of M elements with N = M + 1 outputs. "Homotopic" refers to a pair of outputs symmetric about the output located at the central element of the delay line. Once the products of the homotopic output samples of the first pair-sampling sequence are found, they are separately summed with the products of the subsequent processing. This procedure is repeated continuously, so that at every instant the transform function is the result of the latest processing and the weighted sum of the previous result. In tests, a female /o/ was recognized as /a/. Of 320 test phonemes, 15 were misrecognized and 7 were dubious.

  7. The effects of spectral and temporal parameters on perceived confirmation of an auditory non-speech signal.

    PubMed

    Bodendörfer, Xaver; Kortekaas, Reinier; Weingarten, Markus; Schlittmeier, Sabine

    2015-08-01

    In human-machine interactions, the confirmation of an action or input is very important information for users. A paired comparison experiment explored the effects of four acoustic parameters on the perceived confirmation of auditory non-speech signals. Reducing the frequency ratio and the pulse-to-pulse time between two successive pulses increased perceived confirmation. The effects of the parameters frequency and number of pulses were not clear-cut. The results provide information for designing auditory confirmation signals. It is shown that findings about the effects of certain parameters on the perceived urgency of warning signals cannot simply be inverted to yield perceived confirmation. PMID:26328737

  8. Effects of Lengthening the Speech Signal on Auditory Word Discrimination in Kindergartners with SLI

    ERIC Educational Resources Information Center

    Segers, Eliane; Verhoeven, Ludo

    2005-01-01

    In the present study, it was investigated whether kindergartners with specific language impairment (SLI) and normal language achieving (NLA) kindergartners can benefit from slowing down the entire speech signal or part of the speech signal in a synthetic speech discrimination task. Subjects were 19 kindergartners with SLI and 24 NLA controls.…

  9. Acoustic signalling reflects personality in a social mammal

    PubMed Central

    Friel, Mary; Kunc, Hansjoerg P.; Griffin, Kym; Asher, Lucy; Collins, Lisa M.

    2016-01-01

    Social interactions among individuals are often mediated through acoustic signals. If acoustic signals are consistent and related to an individual's personality, these consistent individual differences in signalling may be an important driver in social interactions. However, few studies in non-human mammals have investigated the relationship between acoustic signalling and personality. Here we show that acoustic signalling rate is repeatable and strongly related to personality in a highly social mammal, the domestic pig (Sus scrofa domestica). Furthermore, acoustic signalling varied between environments of differing quality, with males from a poor-quality environment having a reduced vocalization rate compared with females and males from an enriched environment. Such differences may be mediated by personality with pigs from a poor-quality environment having more reactive and more extreme personality scores compared with pigs from an enriched environment. Our results add to the evidence that acoustic signalling reflects personality in a non-human mammal. Signals reflecting personalities may have far reaching consequences in shaping the evolution of social behaviours as acoustic communication forms an integral part of animal societies. PMID:27429775

  10. Acoustic signalling reflects personality in a social mammal.

    PubMed

    Friel, Mary; Kunc, Hansjoerg P; Griffin, Kym; Asher, Lucy; Collins, Lisa M

    2016-06-01

    Social interactions among individuals are often mediated through acoustic signals. If acoustic signals are consistent and related to an individual's personality, these consistent individual differences in signalling may be an important driver in social interactions. However, few studies in non-human mammals have investigated the relationship between acoustic signalling and personality. Here we show that acoustic signalling rate is repeatable and strongly related to personality in a highly social mammal, the domestic pig (Sus scrofa domestica). Furthermore, acoustic signalling varied between environments of differing quality, with males from a poor-quality environment having a reduced vocalization rate compared with females and males from an enriched environment. Such differences may be mediated by personality with pigs from a poor-quality environment having more reactive and more extreme personality scores compared with pigs from an enriched environment. Our results add to the evidence that acoustic signalling reflects personality in a non-human mammal. Signals reflecting personalities may have far reaching consequences in shaping the evolution of social behaviours as acoustic communication forms an integral part of animal societies. PMID:27429775

  11. From prosodic structure to acoustic saliency: A fMRI investigation of speech rate, clarity, and emphasis

    NASA Astrophysics Data System (ADS)

    Golfinopoulos, Elisa

    Acoustic variability in fluent speech can arise at many stages in speech production planning and execution. For example, at the phonological encoding stage, the grouping of phonemes into syllables determines which segments are coarticulated and, by consequence, segment-level acoustic variation. Likewise phonetic encoding, which determines the spatiotemporal extent of articulatory gestures, will affect the acoustic detail of segments. Functional magnetic resonance imaging (fMRI) was used to measure brain activity of fluent adult speakers in four speaking conditions: fast, normal, clear, and emphatic (or stressed) speech. These speech manner changes typically result in acoustic variations that do not change the lexical or semantic identity of productions but do affect the acoustic saliency of phonemes, syllables and/or words. Acoustic responses recorded inside the scanner were assessed quantitatively using eight acoustic measures and sentence duration was used as a covariate of non-interest in the neuroimaging analysis. Compared to normal speech, emphatic speech was characterized acoustically by a greater difference between stressed and unstressed vowels in intensity, duration, and fundamental frequency, and neurally by increased activity in right middle premotor cortex and supplementary motor area, and bilateral primary sensorimotor cortex. These findings are consistent with right-lateralized motor planning of prosodic variation in emphatic speech. Clear speech involved an increase in average vowel and sentence durations and average vowel spacing, along with increased activity in left middle premotor cortex and bilateral primary sensorimotor cortex. These findings are consistent with an increased reliance on feedforward control, resulting in hyper-articulation, under clear as compared to normal speech. Fast speech was characterized acoustically by reduced sentence duration and average vowel spacing, and neurally by increased activity in left anterior frontal

  12. 4D time-frequency representation for binaural speech signal processing

    NASA Astrophysics Data System (ADS)

    Mikhael, Raed; Szu, Harold H.

    2006-04-01

    Hearing is the ability to detect and process auditory information conveyed from the vibrating hair cells residing in the organ of Corti of the ears to the auditory cortex of the brain via the auditory nerve. The primary and secondary auditory cortices of the brain interact with one another to distinguish and correlate the received information by distinguishing the varying spectrum of arriving frequencies. Binaural hearing is nature's way of employing the power inherent in working in pairs to process information, enhance sound perception, and reduce undesired noise. One ear might play a prominent role in sound recognition, while the other reinforces the perceived mutual information. Developing binaural hearing aid devices can be crucial in emulating the combined workings of two ears and may be a step toward significantly alleviating hearing loss of the inner ear. This can be accomplished by combining current speech research with existing technologies such as RF communication between PDAs and Bluetooth. The Ear Level Instrument (ELI) developed by Micro-tech Hearing Instruments and Starkey Laboratories is a good example of digital bi-directional communication between a PDA/mobile phone and a Bluetooth device. Agreement and disagreement of the auditory information arriving at the Bluetooth device can be classified as sound and noise, respectively. By finding common features of the arriving sound with a four-coordinate system for sound analysis (a four-dimensional time-frequency representation), noise can be greatly reduced and hearing aids can become more efficient. Techniques developed by Szu involving Artificial Neural Networks (ANN), Blind Source Separation (BSS), the Adaptive Wavelet Transform (AWT), and Independent Component Analysis (ICA) hold many possibilities for improving the acoustic segmentation of phonemes, all of which are discussed in this paper. The transmitted and perceived acoustic speech signal will improve, as the binaural hearing aid will emulate two ears in sound

  13. Acoustic-Phonetic Differences between Infant- and Adult-Directed Speech: The Role of Stress and Utterance Position

    ERIC Educational Resources Information Center

    Wang, Yuanyuan; Seidl, Amanda; Cristia, Alejandrina

    2015-01-01

    Previous studies have shown that infant-directed speech (IDS) differs from adult-directed speech (ADS) on a variety of dimensions. The aim of the current study was to investigate whether acoustic differences between IDS and ADS in English are modulated by prosodic structure. We compared vowels across the two registers (IDS, ADS) in both stressed…

  14. Speech after Radial Forearm Free Flap Reconstruction of the Tongue: A Longitudinal Acoustic Study of Vowel and Diphthong Sounds

    ERIC Educational Resources Information Center

    Laaksonen, Juha-Pertti; Rieger, Jana; Happonen, Risto-Pekka; Harris, Jeffrey; Seikaly, Hadi

    2010-01-01

    The purpose of this study was to use acoustic analyses to describe speech outcomes over the course of 1 year after radial forearm free flap (RFFF) reconstruction of the tongue. Eighteen Canadian English-speaking females and males with reconstruction for oral cancer had speech samples recorded (pre-operative, and 1 month, 6 months, and 1 year…

  15. An eighth-scale speech source for subjective assessments in acoustic models

    NASA Astrophysics Data System (ADS)

    Orlowski, R. J.

    1981-08-01

    The design of a source is described which is suitable for making speech recordings in eighth-scale acoustic models of auditoria. An attempt was made to match the directionality of the source with the directionality of the human voice using data reported in the literature. A narrow aperture was required for the design which was provided by mounting an inverted conical horn over the diaphragm of a high frequency loudspeaker. Resonance problems were encountered with the use of a horn and a description is given of the electronic techniques adopted to minimize the effect of these resonances. Subjective and objective assessments on the completed speech source have proved satisfactory. It has been used in a modelling exercise concerned with the acoustic design of a theatre with a thrust-type stage.

  16. Interpretation of acoustic signals from fluidized beds

    SciTech Connect

    Halow, J.S.; Daw, C.S.; Finney, C.E.A.; Nguyen, K.

    1996-12-31

    Rhythmic "whooshing" sounds associated with rising bubbles are a characteristic feature of many fluidized beds. Although clearly distinguishable to the ear, these sounds are rather complicated in detail and seem to contain a large background of apparently irrelevant stochastic noise. While it is clear that these sounds contain some information about bed dynamics, it is not obvious how this information can be interpreted in a meaningful way. In this presentation we describe a technique for processing bed sounds that appears to work well for beds with large particles operating in a slugging or near-slugging mode. We find that our processing algorithm allows us to determine important bubble/slug features from sound measurements alone, including slug location at any point in time, the average bubble frequency and frequency variation, and corresponding dynamic pressure drops at different bed locations. We also have been able to correlate a portion of the acoustic signal with particle impacts on surfaces and particle motions near the grid. We conclude from our observations that relatively simple sound measurements can provide much diagnostic information and could potentially be used for bed control. 5 refs., 4 figs.

  17. Speech perception at positive signal-to-noise ratios using adaptive adjustment of time compression.

    PubMed

    Schlueter, Anne; Brand, Thomas; Lemke, Ulrike; Nitzschner, Stefan; Kollmeier, Birger; Holube, Inga

    2015-11-01

    Positive signal-to-noise ratios (SNRs) characterize listening situations most relevant for hearing-impaired listeners in daily life and should therefore be considered when evaluating hearing aid algorithms. For this, a speech-in-noise test was developed and evaluated, in which the background noise is presented at fixed positive SNRs and the speech rate (i.e., the time compression of the speech material) is adaptively adjusted. In total, 29 younger and 12 older normal-hearing, as well as 24 older hearing-impaired listeners took part in repeated measurements. Younger normal-hearing and older hearing-impaired listeners conducted one of two adaptive methods which differed in adaptive procedure and step size. Analysis of the measurements with regard to list length and estimation strategy for thresholds resulted in a practical method measuring the time compression for 50% recognition. This method uses time-compression adjustment and step sizes according to Versfeld and Dreschler [(2002). J. Acoust. Soc. Am. 111, 401-408], with sentence scoring, lists of 30 sentences, and a maximum likelihood method for threshold estimation. Evaluation of the procedure showed that older participants obtained higher test-retest reliability compared to younger participants. Depending on the group of listeners, one or two lists are required for training prior to data collection. PMID:26627804
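
    As an illustration of the adaptive adjustment described above, the sketch below implements a simple 1-up/1-down staircase over the time-compression factor, converging toward 50% sentence recognition. The step rule, starting value, and the present_sentence callback are hypothetical; the published procedure instead uses the time-compression adjustment and step sizes of Versfeld and Dreschler with sentence scoring and maximum-likelihood threshold estimation.

```python
import random

def adaptive_time_compression(present_sentence, n_trials=30,
                              start_compression=1.5, step=0.2, min_step=0.05):
    """Illustrative 1-up/1-down staircase converging on the time-compression
    factor yielding ~50% sentence recognition.  present_sentence(c) must play
    a sentence at compression factor c and return True if the listener
    repeated it correctly (a hypothetical callback, not part of the test)."""
    compression = start_compression
    track = []
    for trial in range(n_trials):
        correct = present_sentence(compression)
        track.append(compression)
        # harder (more compression) after a correct response, easier otherwise
        compression = max(1.0, compression + (step if correct else -step))
        # halve the step size every 10 trials to refine the estimate
        if (trial + 1) % 10 == 0:
            step = max(min_step, step / 2)
    # crude threshold estimate: mean compression over the last half of trials
    tail = track[len(track) // 2:]
    return sum(tail) / len(tail)

if __name__ == "__main__":
    # toy listener whose recognition drops as compression increases
    def toy_listener(c, true_threshold=2.0):
        p_correct = 1.0 / (1.0 + 10.0 ** (2.0 * (c - true_threshold)))
        return random.random() < p_correct

    print("estimated 50% time-compression threshold:",
          adaptive_time_compression(toy_listener))
```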

  18. Thirty years of underwater acoustic signal processing in China

    NASA Astrophysics Data System (ADS)

    Li, Qihu

    2012-11-01

    Advances in technology and theory in 30 years of underwater acoustic signal processing and its applications in China are presented in this paper. The topics include research work in the field of underwater acoustic signal modeling, acoustic field matching, ocean waveguide and internal wave, the extraction and processing technique for acoustic vector signal information, the space/time correlation characteristics of low frequency acoustic channels, the invariant features of underwater target radiated noise, the transmission technology of underwater voice/image data and its anti-interference technique. Some frontier technologies in sonar design are also discussed, including large aperture towed line array sonar, high resolution synthetic aperture sonar, deep sea siren and deep sea manned subsea vehicle, diver detection sonar and the demonstration project of the national ocean monitoring system in China, etc.

  19. Predicting binaural speech intelligibility using the signal-to-noise ratio in the envelope power spectrum domain.

    PubMed

    Chabot-Leclerc, Alexandre; MacDonald, Ewen N; Dau, Torsten

    2016-07-01

    This study proposes a binaural extension to the multi-resolution speech-based envelope power spectrum model (mr-sEPSM) [Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134, 436-446]. It consists of a combination of better-ear (BE) and binaural unmasking processes, implemented as two monaural realizations of the mr-sEPSM combined with a short-term equalization-cancellation process, and uses the signal-to-noise ratio in the envelope domain (SNRenv) as the decision metric. The model requires only two parameters to be fitted per speech material and does not require an explicit frequency weighting. The model was validated against three data sets from the literature, which covered the following effects: the number of maskers, the masker types [speech-shaped noise (SSN), speech-modulated SSN, babble, and reversed speech], the masker(s) azimuths, reverberation on the target and masker, and the interaural time difference of the target and masker. The Pearson correlation coefficient between the simulated speech reception thresholds and the data across all experiments was 0.91. A model version that considered only BE processing performed similarly (correlation coefficient of 0.86) to the complete model, suggesting that BE processing could be considered sufficient to predict intelligibility in most realistic conditions. PMID:27475146
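
    For readers unfamiliar with the better-ear (BE) stage mentioned above, the minimal sketch below selects, for each time-frequency cell, the ear with the higher SNR before averaging across cells. It is a toy illustration of BE selection only, not of the mr-sEPSM model or its equalization-cancellation stage, and the array shapes and values are invented.

```python
import numpy as np

def better_ear_snr(snr_left_db, snr_right_db):
    """Toy better-ear stage: for each (frame, band) cell keep the ear with the
    higher SNR, then average over cells.  Inputs are per-cell SNRs in dB with
    shape (n_frames, n_bands)."""
    be = np.maximum(np.asarray(snr_left_db, float), np.asarray(snr_right_db, float))
    return be.mean()

# invented example: the right ear has the spatial advantage in the upper bands
left = np.array([[0.0, -3.0, -6.0], [1.0, -2.0, -5.0]])
right = np.array([[-4.0, 2.0, 3.0], [-5.0, 1.0, 4.0]])
print(better_ear_snr(left, right))
```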

  20. Are you a good mimic? Neuro-acoustic signatures for speech imitation ability

    PubMed Central

    Reiterer, Susanne M.; Hu, Xiaochen; Sumathi, T. A.; Singh, Nandini C.

    2013-01-01

    We investigated individual differences in speech imitation ability in late bilinguals using a neuro-acoustic approach. One hundred and thirty-eight German-English bilinguals matched on various behavioral measures were tested for “speech imitation ability” in a foreign language, Hindi, and categorized into “high” and “low ability” groups. Brain activations and speech recordings were obtained from 26 participants from the two extreme groups as they performed a functional neuroimaging experiment which required them to “imitate” sentences in three conditions: (A) German, (B) English, and (C) German with fake English accent. We used a recently developed acoustic analysis, namely the “articulation space,” as a metric to compare the speech imitation abilities of the two groups. Across all three conditions, direct comparisons between the two groups revealed brain activations (FWE corrected, p < 0.05) that were more widespread, with significantly higher peak activity in the left supramarginal gyrus and postcentral areas, for the low ability group. The high ability group, on the other hand, showed significantly larger articulation space in all three conditions. In addition, articulation space also correlated positively with imitation ability (Pearson's r = 0.7, p < 0.01). Our results suggest that an expanded articulation space for high ability individuals allows access to a larger repertoire of sounds, thereby providing skilled imitators greater flexibility in pronunciation and language learning. PMID:24155739

  1. The Effectiveness of Clear Speech as a Masker

    ERIC Educational Resources Information Center

    Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.

    2010-01-01

    Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…

  2. Segment-based acoustic models for continuous speech recognition

    NASA Astrophysics Data System (ADS)

    Ostendorf, Mari; Rohlicek, J. R.

    1993-07-01

    This research aims to develop new and more accurate stochastic models for speaker-independent continuous speech recognition, by extending previous work in segment-based modeling and by introducing a new hierarchical approach to representing intra-utterance statistical dependencies. These techniques, which are more costly than traditional approaches because of the large search space associated with higher order models, are made feasible through rescoring a set of HMM-generated N-best sentence hypotheses. We expect these different modeling techniques to result in improved recognition performance over that achieved by current systems, which handle only frame-based observations and assume that these observations are independent given an underlying state sequence. In the fourth quarter of the project, we have completed the following: (1) ported our recognition system to the Wall Street Journal task, a standard task in the ARPA community; (2) developed an initial dependency-tree model of intra-utterance observation correlation; and (3) implemented baseline language model estimation software. Our initial results on the Wall Street Journal task are quite good and represent significantly improved performance over most HMM systems reporting on the Nov. 1992 5k vocabulary test set.

  3. On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common

    PubMed Central

    Weninger, Felix; Eyben, Florian; Schuller, Björn W.; Mortillaro, Marcello; Scherer, Klaus R.

    2013-01-01

    Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning each of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow’s pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of “the sound that something makes,” in order to evaluate the system’s auditory environment and its own audio output. This article aims at a first step toward a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that, by selection of appropriate descriptors, cross-domain arousal and valence regression is feasible, achieving significant correlations with the observer annotations of up to 0.78 for arousal (training on sound and testing on enacted speech) and 0.60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects. PMID:23750144

  4. Measurement of vocal doses in speech: experimental procedure and signal processing.

    PubMed

    Svec, Jan G; Popolo, Peter S; Titze, Ingo R

    2003-01-01

    An experimental method for quantifying the amount of voicing over time is described in a tutorial manner. A new procedure for obtaining calibrated sound pressure levels (SPL) of speech from a head-mounted microphone is offered. An algorithm for voicing detection (kv) and fundamental frequency (F0) extraction from an electroglottographic signal is described. The extracted values of SPL, F0, and kv are used to derive five vocal doses: the time dose (total voicing time), the cycle dose (total number of vocal fold oscillatory cycles), the distance dose (total distance travelled by the vocal folds in an oscillatory path), the energy dissipation dose (total amount of heat energy dissipated in the vocal folds) and the radiated energy dose (total acoustic energy radiated from the mouth). The doses measure the vocal load and can be used for studying the effects of vocal fold tissue exposure to vibration. PMID:14686546
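
    Two of the doses defined above, the time dose and the cycle dose, follow directly from the frame-wise voicing flag kv and fundamental frequency F0; a minimal sketch is given below. The distance and energy doses additionally require a vocal-fold amplitude and energy model driven by SPL and are not reproduced here; the frame duration and example values are assumptions.

```python
import numpy as np

def vocal_doses(kv, f0_hz, frame_dt=0.03):
    """Toy computation of two vocal doses from frame-wise data:
    kv       -- voicing decision per frame (1 = voiced, 0 = unvoiced)
    f0_hz    -- fundamental frequency per frame in Hz
    frame_dt -- frame duration in seconds (assumed constant here)."""
    kv = np.asarray(kv, dtype=float)
    f0_hz = np.asarray(f0_hz, dtype=float)
    time_dose_s = np.sum(kv) * frame_dt           # total voicing time
    cycle_dose = np.sum(kv * f0_hz) * frame_dt    # total vocal fold oscillatory cycles
    return time_dose_s, cycle_dose

# example: five 30-ms frames, three of them voiced at roughly 200 Hz
kv = [1, 1, 0, 1, 0]
f0 = [200.0, 210.0, 0.0, 190.0, 0.0]
print(vocal_doses(kv, f0))
```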

  5. Compensating Acoustic Mismatch Using Class-Based Histogram Equalization for Robust Speech Recognition

    NASA Astrophysics Data System (ADS)

    Suh, Youngjoo; Kim, Sungtak; Kim, Hoirin

    2007-12-01

    A new class-based histogram equalization method is proposed for robust speech recognition. The proposed method aims at not only compensating for an acoustic mismatch between training and test environments but also reducing the two fundamental limitations of the conventional histogram equalization method, the discrepancy between the phonetic distributions of training and test speech data, and the nonmonotonic transformation caused by the acoustic mismatch. The algorithm employs multiple class-specific reference and test cumulative distribution functions, classifies noisy test features into their corresponding classes, and equalizes the features by using their corresponding class reference and test distributions. The minimum mean-square error log-spectral amplitude (MMSE-LSA)-based speech enhancement is added just prior to the baseline feature extraction to reduce the corruption by additive noise. The experiments on the Aurora2 database proved the effectiveness of the proposed method by reducing relative errors by [InlineEquation not available: see fulltext.] over the mel-cepstral-based features and by [InlineEquation not available: see fulltext.] over the conventional histogram equalization method, respectively.
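
    As background for the class-based method, the sketch below shows plain (single-class) histogram equalization by quantile matching: each test feature dimension is mapped through its empirical CDF onto the training (reference) distribution. The quantile grid and example data are illustrative; the proposed method additionally classifies frames and applies class-specific reference and test distributions.

```python
import numpy as np

def histogram_equalize(test_feats, ref_feats, n_quantiles=100):
    """Plain histogram equalization of features by quantile matching.
    test_feats, ref_feats: arrays of shape (n_frames, n_dims)."""
    test_feats = np.asarray(test_feats, dtype=float)
    ref_feats = np.asarray(ref_feats, dtype=float)
    q = np.linspace(0.0, 1.0, n_quantiles)
    out = np.empty_like(test_feats)
    for d in range(test_feats.shape[1]):
        test_q = np.quantile(test_feats[:, d], q)   # test CDF support points
        ref_q = np.quantile(ref_feats[:, d], q)     # reference CDF support points
        # monotonic mapping: test value -> CDF position -> reference value
        out[:, d] = np.interp(test_feats[:, d], test_q, ref_q)
    return out

# synthetic example: test features shifted and scaled relative to training
rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, size=(5000, 13))
test = 0.5 * rng.normal(0.0, 1.0, size=(1000, 13)) + 2.0
print(histogram_equalize(test, ref).mean(axis=0)[:3])   # close to 0 after equalization
```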

  6. Integration of auditory and somatosensory error signals in the neural control of speech movements

    PubMed Central

    Feng, Yongqiang; Gracco, Vincent L.

    2011-01-01

    We investigated auditory and somatosensory feedback contributions to the neural control of speech. In task I, sensorimotor adaptation was studied by perturbing one of these sensory modalities or both modalities simultaneously. The first formant (F1) frequency in the auditory feedback was shifted up by a real-time processor and/or the extent of jaw opening was increased or decreased with a force field applied by a robotic device. All eight subjects lowered F1 to compensate for the up-shifted F1 in the feedback signal regardless of whether or not the jaw was perturbed. Adaptive changes in subjects' acoustic output resulted from adjustments in articulatory movements of the jaw or tongue. Adaptation in jaw opening extent in response to the mechanical perturbation occurred only when no auditory feedback perturbation was applied or when the direction of adaptation to the force was compatible with the direction of adaptation to a simultaneous acoustic perturbation. In tasks II and III, subjects' auditory and somatosensory precision and accuracy were estimated. Correlation analyses showed that the relationships 1) between F1 adaptation extent and auditory acuity for F1 and 2) between jaw position adaptation extent and somatosensory acuity for jaw position were weak and statistically not significant. Taken together, the combined findings from this work suggest that, in speech production, sensorimotor adaptation updates the underlying control mechanisms in such a way that the planning of vowel-related articulatory movements takes into account a complex integration of error signals from previous trials but likely with a dominant role for the auditory modality. PMID:21562187

  7. Efficient blind dereverberation and echo cancellation based on independent component analysis for actual acoustic signals.

    PubMed

    Takeda, Ryu; Nakadai, Kazuhiro; Takahashi, Toru; Komatani, Kazunori; Ogata, Tetsuya; Okuno, Hiroshi G

    2012-01-01

    This letter presents a new algorithm for blind dereverberation and echo cancellation based on independent component analysis (ICA) for actual acoustic signals. We focus on frequency domain ICA (FD-ICA) because its computational cost and speed of learning convergence are sufficiently reasonable for practical applications such as hands-free speech recognition. In applying conventional FD-ICA as a preprocessing of automatic speech recognition in noisy environments, one of the most critical problems is how to cope with reverberations. To extract a clean signal from the reverberant observation, we model the separation process in the short-time Fourier transform domain and apply the multiple input/output inverse-filtering theorem (MINT) to the FD-ICA separation model. A naive implementation of this method is computationally expensive, because its time complexity is the second order of reverberation time. Therefore, the main issue in dereverberation is to reduce the high computational cost of ICA. In this letter, we reduce the computational complexity to the linear order of the reverberation time by using two techniques: (1) a separation model based on the independence of delayed observed signals with MINT and (2) spatial sphering for preprocessing. Experiments show that the computational cost grows in proportion to the linear order of the reverberation time and that our method improves the word correctness of automatic speech recognition by 10 to 20 points in a RT₂₀= 670 ms reverberant environment. PMID:22023192

  8. Logopenic and nonfluent variants of primary progressive aphasia are differentiated by acoustic measures of speech production.

    PubMed

    Ballard, Kirrie J; Savage, Sharon; Leyton, Cristian E; Vogel, Adam P; Hornberger, Michael; Hodges, John R

    2014-01-01

    Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging since it hinges on expert-based evaluation of speech and language production. In this study acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold-standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r(2) = 0.47) with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, only affected in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy. Findings suggest that a measure of relative vowel duration from a polysyllable word repetition task

  9. Logopenic and Nonfluent Variants of Primary Progressive Aphasia Are Differentiated by Acoustic Measures of Speech Production

    PubMed Central

    Ballard, Kirrie J.; Savage, Sharon; Leyton, Cristian E.; Vogel, Adam P.; Hornberger, Michael; Hodges, John R.

    2014-01-01

    Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging since it hinges on expert-based evaluation of speech and language production. In this study acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold-standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r2 = 0.47) with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, only affected in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy. Findings suggest that a measure of relative vowel duration from a polysyllable word repetition task

  10. Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech

    NASA Astrophysics Data System (ADS)

    Ge, Fengpei; Liu, Changliang; Shao, Jian; Pan, Fuping; Dong, Bin; Yan, Yonghong

    In this paper we present our investigation into improving the performance of our computer-assisted language learning (CALL) system through exploiting the acoustic model and features within the speech recognition framework. First, to alleviate channel distortion, speaker-dependent cepstrum mean normalization (CMN) is adopted and the average correlation coefficient (average CC) between machine and expert scores is improved from 78.00% to 84.14%. Second, heteroscedastic linear discriminant analysis (HLDA) is adopted to enhance the discriminability of the acoustic model, which successfully increases the average CC from 84.14% to 84.62%. Additionally, HLDA causes the scoring accuracy to be more stable at various pronunciation proficiency levels, and thus leads to an increase in the speaker correct-rank rate from 85.59% to 90.99%. Finally, we use maximum a posteriori (MAP) estimation to tune the acoustic model to fit strongly accented test speech. As a result, the average CC is improved from 84.62% to 86.57%. These three novel techniques improve the accuracy of evaluating pronunciation quality.
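
    The first technique above, speaker-dependent cepstral mean normalization, can be sketched in a few lines: estimate one cepstral mean per speaker over all of that speaker's utterances and subtract it from every frame, which suppresses the convolutional channel component. The example data below are synthetic.

```python
import numpy as np

def speaker_cmn(cepstra_by_utt):
    """Speaker-dependent CMN: subtract the speaker-level cepstral mean from
    every frame.  cepstra_by_utt is a list of (n_frames, n_ceps) arrays that
    all belong to the same speaker."""
    speaker_mean = np.vstack(cepstra_by_utt).mean(axis=0)   # one mean per coefficient
    return [utt - speaker_mean for utt in cepstra_by_utt]

# synthetic "cepstra" for two utterances of one speaker, with a channel offset
rng = np.random.default_rng(1)
utts = [rng.normal(size=(120, 13)) + 3.0, rng.normal(size=(80, 13)) + 3.0]
normalized = speaker_cmn(utts)
print(normalized[0].mean(axis=0)[:3])   # approximately zero after normalization
```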

  11. Effect of several acoustic cues on perceiving Mandarin retroflex affricates and fricatives in continuous speech.

    PubMed

    Zhu, Jian; Chen, Yaping

    2016-07-01

    Relatively little attention has been paid to the perception of the three-way contrast between unaspirated affricates, aspirated affricates and fricatives in Mandarin Chinese. This study reports two experiments that explore the acoustic cues relevant to the contrast between the Mandarin retroflex series /tʂ/, /tʂ(h)/ and /ʂ/ in continuous speech. Twenty participants performed two three-alternative forced-choice tasks, in which acoustic cues including closure, frication duration (FD), aspiration, and vocalic contexts (VCs) were systematically manipulated and presented in a carrier phrase. A subsequent classification tree analysis shows that FD distinguishes /tʂ/ from /tʂ(h)/ and /ʂ/, and that closure cues the affricate manner. Interactions between VC and individual cues are also found. The FD threshold for separating /ʂ/ and /tʂ/ is susceptible to the influence of the following vocalic segments, shifting to lower values if frication is followed by the low vowel /a/. On the other hand, while aspiration cues /tʂ(h)/ before /a/ and //, this acoustic cue is obscured by gesture continuation when /tʂ(h)/ precedes its homorganic approximant /ɻ/ in natural speech, which might cause potential confusion between /tʂ(h)/ and /ʂ/. PMID:27475170
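
    The classification-tree analysis referred to above can be illustrated with scikit-learn on synthetic cue data (frication duration, closure, aspiration). The labeling rule and thresholds below are invented for the example and are not the study's listener responses.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# synthetic cues: frication duration in ms plus binary closure/aspiration flags
rng = np.random.default_rng(2)
n = 300
frication_ms = rng.uniform(40, 220, n)
closure = rng.integers(0, 2, n)
aspiration = rng.integers(0, 2, n)

# invented labeling rule: long frication without closure -> fricative;
# otherwise the affricate is aspirated or unaspirated depending on aspiration
labels = np.where((frication_ms > 130) & (closure == 0), "fricative",
                  np.where(aspiration == 1, "asp_affricate", "unasp_affricate"))

X = np.column_stack([frication_ms, closure, aspiration])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, labels)
print(export_text(tree, feature_names=["frication_ms", "closure", "aspiration"]))
```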

  12. The role of metrical information in apraxia of speech. Perceptual and acoustic analyses of word stress.

    PubMed

    Aichert, Ingrid; Späth, Mona; Ziegler, Wolfram

    2016-02-01

    Several factors are known to influence speech accuracy in patients with apraxia of speech (AOS), e.g., syllable structure or word length. However, the impact of word stress has largely been neglected so far. More generally, the role of prosodic information at the phonetic encoding stage of speech production often remains unconsidered in models of speech production. This study aimed to investigate the influence of word stress on error production in AOS. Two-syllabic words with stress on the first (trochees) vs. the second syllable (iambs) were compared in 14 patients with AOS, three of them exhibiting pure AOS, and in a control group of six normal speakers. The patients produced significantly more errors on iambic than on trochaic words. A most prominent metrical effect was obtained for segmental errors. Acoustic analyses of word durations revealed a disproportionate advantage of the trochaic meter in the patients relative to the healthy controls. The results indicate that German apraxic speakers are sensitive to metrical information. It is assumed that metrical patterns function as prosodic frames for articulation planning, and that the regular metrical pattern in German, the trochaic form, has a facilitating effect on word production in patients with AOS. PMID:26792367

  13. Emphasis of short-duration acoustic speech cues for cochlear implant users.

    PubMed

    Vandali, A E

    2001-05-01

    A new speech-coding strategy for cochlear implant users, called the transient emphasis spectral maxima (TESM), was developed to aid perception of short-duration transient cues in speech. Speech-perception scores using the TESM strategy were compared to scores using the spectral maxima sound processor (SMSP) strategy in a group of eight adult users of the Nucleus 22 cochlear implant system. Significant improvements in mean speech-perception scores for the group were obtained on CNC open-set monosyllabic word tests in quiet (SMSP: 53.6% TESM: 61.3%, p<0.001), and on MUSL open-set sentence tests in multitalker noise (SMSP: 64.9% TESM: 70.6%, p<0.001). Significant increases were also shown for consonant scores in the word test (SMSP: 75.1% TESM: 80.6%, p<0.001) and for vowel scores in the word test (SMSP: 83.1% TESM: 85.7%, p<0.05). Analysis of consonant perception results from the CNC word tests showed that perception of nasal, stop, and fricative consonant discrimination was most improved. Information transmission analysis indicated that place of articulation was most improved, although improvements were also evident for manner of articulation. The increases in discrimination were shown to be related to improved coding of short-duration acoustic cues, particularly those of low intensity. PMID:11386557

  14. A Bayesian view on acoustic model-based techniques for robust speech recognition

    NASA Astrophysics Data System (ADS)

    Maas, Roland; Huemmer, Christian; Sehr, Armin; Kellermann, Walter

    2015-12-01

    This article provides a unifying Bayesian view on various approaches for acoustic model adaptation, missing feature, and uncertainty decoding that are well-known in the literature of robust automatic speech recognition. The representatives of these classes can often be deduced from a Bayesian network that extends the conventional hidden Markov models used in speech recognition. These extensions, in turn, can in many cases be motivated from an underlying observation model that relates clean and distorted feature vectors. By identifying and converting the observation models into a Bayesian network representation, we formulate the corresponding compensation rules. We thus summarize the various approaches as approximations or modifications of the same Bayesian decoding rule leading to a unified view on known derivations as well as to new formulations for certain approaches.

  15. Pulse analysis of acoustic emission signals

    NASA Technical Reports Server (NTRS)

    Houghton, J. R.; Packman, P. F.

    1977-01-01

    A method for the signature analysis of pulses in the frequency domain and the time domain is presented. Fourier spectrum, Fourier transfer function, shock spectrum and shock spectrum ratio were examined in the frequency domain analysis, and pulse shape deconvolution was developed for use in the time domain analysis. Comparisons of the relative performance of each analysis technique are made for the characterization of acoustic emission pulses recorded by a measuring system. To demonstrate the relative sensitivity of each of the methods to small changes in the pulse shape, signatures of computer modeled systems with analytical pulses are presented. Optimization techniques are developed and used to indicate the best design parameters values for deconvolution of the pulse shape. Several experiments are presented that test the pulse signature analysis methods on different acoustic emission sources. These include acoustic emissions associated with: (1) crack propagation, (2) ball dropping on a plate, (3) spark discharge and (4) defective and good ball bearings. Deconvolution of the first few micro-seconds of the pulse train are shown to be the region in which the significant signatures of the acoustic emission event are to be found.

  16. Pulse analysis of acoustic emission signals

    NASA Technical Reports Server (NTRS)

    Houghton, J. R.; Packman, P. F.

    1977-01-01

    A method for the signature analysis of pulses in the frequency domain and the time domain is presented. Fourier spectrum, Fourier transfer function, shock spectrum and shock spectrum ratio were examined in the frequency domain analysis and pulse shape deconvolution was developed for use in the time domain analysis. Comparisons of the relative performance of each analysis technique are made for the characterization of acoustic emission pulses recorded by a measuring system. To demonstrate the relative sensitivity of each of the methods to small changes in the pulse shape, signatures of computer modeled systems with analytical pulses are presented. Optimization techniques are developed and used to indicate the best design parameter values for deconvolution of the pulse shape. Several experiments are presented that test the pulse signature analysis methods on different acoustic emission sources. These include acoustic emission associated with (a) crack propagation, (b) ball dropping on a plate, (c) spark discharge, and (d) defective and good ball bearings. Deconvolution of the first few micro-seconds of the pulse train is shown to be the region in which the significant signatures of the acoustic emission event are to be found.
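
    One generic way to carry out the pulse-shape deconvolution mentioned above is a regularized (Wiener-style) inverse filter in the frequency domain, sketched below with a synthetic system impulse response and source pulse; the optimization-based procedure developed in the paper is not reproduced.

```python
import numpy as np

def deconvolve_pulse(recorded, system_ir, reg=1e-3):
    """Estimate the source pulse from a recorded signal and the measuring
    system's impulse response via a regularized inverse filter; reg controls
    the noise floor that prevents blow-up where |H| is small."""
    n = len(recorded) + len(system_ir) - 1
    Y = np.fft.rfft(recorded, n)
    H = np.fft.rfft(system_ir, n)
    X = Y * np.conj(H) / (np.abs(H) ** 2 + reg * np.max(np.abs(H)) ** 2)
    return np.fft.irfft(X, n)[:len(recorded)]

# synthetic example: a decaying-sine "measuring system" excited by a short pulse
fs = 100_000
t = np.arange(0, 0.002, 1 / fs)
system_ir = np.exp(-t * 3000) * np.sin(2 * np.pi * 20_000 * t)
pulse = np.zeros_like(t)
pulse[5:15] = 1.0                                  # rectangular source pulse
recorded = np.convolve(pulse, system_ir)[:len(t)]  # truncated convolution
estimate = deconvolve_pulse(recorded, system_ir)
print(np.argmax(estimate), np.argmax(pulse))       # peak locations should roughly agree
```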

  17. The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene

    PubMed Central

    Rimmele, Johanna M.; Golumbic, Elana Zion; Schröger, Erich; Poeppel, David

    2015-01-01

    Attending to one speaker in multi-speaker situations is challenging. One neural mechanism proposed to underlie the ability to attend to a particular speaker is phase-locking of low-frequency activity in auditory cortex to speech’s temporal envelope (“speech-tracking”), which is more precise for attended speech. However, it is not known what brings about this attentional effect, and specifically if it reflects enhanced processing of the fine structure of attended speech. To investigate this question we compared attentional effects on speech-tracking of natural vs. vocoded speech which preserves the temporal envelope but removes the fine-structure of speech. Pairs of natural and vocoded speech stimuli were presented concurrently and participants attended to one stimulus and performed a detection task while ignoring the other stimulus. We recorded magnetoencephalography (MEG) and compared attentional effects on the speech-tracking response in auditory cortex. Speech-tracking of natural, but not vocoded, speech was enhanced by attention, whereas neural tracking of ignored speech was similar for natural and vocoded speech. These findings suggest that the more precise speech tracking of attended natural speech is related to processing its fine structure, possibly reflecting the application of higher-order linguistic processes. In contrast, when speech is unattended its fine structure is not processed to the same degree and thus elicits less precise speech tracking more similar to vocoded speech. PMID:25650107

  18. Acoustic Aspects of Photoacoustic Signal Generation and Detection in Gases

    NASA Astrophysics Data System (ADS)

    Miklós, A.

    2015-09-01

    In this paper photoacoustic signal generation and detection in gases is investigated and discussed from the standpoint of acoustics. Four topics are considered: the effect of the absorption-desorption process of modulated and pulsed light on the heat power density released in the gas; the generation of the primary sound by the released heat in an unbounded medium; the excitation of an acoustic resonator by the primary sound; and finally, the generation of the measurable PA signal by a microphone. When light is absorbed by a molecule and the excess energy is relaxed by collisions with the surrounding molecules, the average kinetic energy, thus also the temperature of an ensemble of molecules (called "particle" in acoustics) will increase. In other words heat energy is added to the energy of the particle. The rate of the energy transfer is characterized by the heat power density. A simple two-level model of absorption-desorption is applied for describing the heat power generation process for modulated and pulsed illumination. Sound generation by a laser beam in an unbounded medium is discussed by means of the Green's function technique. It is shown that the duration of the generated sound pulse depends mostly on beam geometry. A photoacoustic signal is mostly detected in a photoacoustic cell composed of acoustic resonators, buffers, filters, etc. It is not easy to interpret the measured PA signal in such a complicated acoustic system. The acoustic response of a PA detector to different kinds of excitations (modulated cw, pulsed, periodic pulse train) is discussed. It is shown that acoustic resonators respond very differently to modulated cw excitation and to excitation by a pulse train. The microphone for detecting the PA signal is also a part of the acoustic system; its properties have to be taken into account by the design of a PA detector. The moving membrane of the microphone absorbs acoustic energy; thus, it may influence the resonance frequency and

  19. Wavelet-based ground vehicle recognition using acoustic signals

    NASA Astrophysics Data System (ADS)

    Choe, Howard C.; Karlsen, Robert E.; Gerhart, Grant R.; Meitzler, Thomas J.

    1996-03-01

    We present, in this paper, a wavelet-based acoustic signal analysis to remotely recognize military vehicles using their sound intercepted by acoustic sensors. Since expedited signal recognition is imperative in many military and industrial situations, we developed an algorithm that provides an automated, fast signal recognition once implemented in a real-time hardware system. This algorithm consists of wavelet preprocessing, feature extraction and compact signal representation, and a simple but effective statistical pattern matching. The current status of the algorithm does not require any training. The training is replaced by human selection of reference signals (e.g., squeak or engine exhaust sound) distinctive to each individual vehicle based on human perception. This allows a fast archiving of any new vehicle type in the database once the signal is collected. The wavelet preprocessing provides time-frequency multiresolution analysis using discrete wavelet transform (DWT). Within each resolution level, feature vectors are generated from statistical parameters and energy content of the wavelet coefficients. After applying our algorithm on the intercepted acoustic signals, the resultant feature vectors are compared with the reference vehicle feature vectors in the database using statistical pattern matching to determine the type of vehicle from where the signal originated. Certainly, statistical pattern matching can be replaced by an artificial neural network (ANN); however, the ANN would require training data sets and time to train the net. Unfortunately, this is not always possible for many real world situations, especially collecting data sets from unfriendly ground vehicles to train the ANN. Our methodology using wavelet preprocessing and statistical pattern matching provides robust acoustic signal recognition. We also present an example of vehicle recognition using acoustic signals collected from two different military ground vehicles. In this paper, we will
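
    A minimal sketch of the wavelet preprocessing, feature extraction, and statistical pattern matching pipeline described above is given below, using PyWavelets. The wavelet choice, decomposition depth, per-level statistics, and distance metric are illustrative assumptions rather than the authors' exact design.

```python
import numpy as np
import pywt

def dwt_feature_vector(signal, wavelet="db4", level=5):
    """Compact feature vector: log energy, mean magnitude and spread of the
    DWT coefficients at each resolution level."""
    coeffs = pywt.wavedec(np.asarray(signal, dtype=float), wavelet, level=level)
    feats = []
    for c in coeffs:                                   # approximation + detail levels
        feats += [np.log(np.sum(c ** 2) + 1e-12), np.mean(np.abs(c)), np.std(c)]
    return np.array(feats)

def match_vehicle(signal, references):
    """Simple statistical pattern matching: the nearest reference feature
    vector (Euclidean distance) decides the class; no training needed."""
    f = dwt_feature_vector(signal)
    return min(references, key=lambda name: np.linalg.norm(f - references[name]))

# toy example with two synthetic "vehicle" acoustic signatures
fs = 8000
t = np.arange(0, 1.0, 1 / fs)
veh_a = np.sin(2 * np.pi * 60 * t) + 0.1 * np.random.randn(len(t))
veh_b = np.sin(2 * np.pi * 180 * t) + 0.1 * np.random.randn(len(t))
refs = {"vehicle_A": dwt_feature_vector(veh_a), "vehicle_B": dwt_feature_vector(veh_b)}
probe = np.sin(2 * np.pi * 60 * t) + 0.1 * np.random.randn(len(t))
print(match_vehicle(probe, refs))                      # expected: vehicle_A
```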

  20. Advantages from bilateral hearing in speech perception in noise with simulated cochlear implants and residual acoustic hearing.

    PubMed

    Schoof, Tim; Green, Tim; Faulkner, Andrew; Rosen, Stuart

    2013-02-01

    Acoustic simulations were used to study the contributions of spatial hearing that may arise from combining a cochlear implant with either a second implant or contralateral residual low-frequency acoustic hearing. Speech reception thresholds (SRTs) were measured in twenty-talker babble. Spatial separation of speech and noise was simulated using a spherical head model. While low-frequency acoustic information contralateral to the implant simulation produced substantially better SRTs there was no effect of spatial cues on SRT, even when interaural differences were artificially enhanced. Simulated bilateral implants showed a significant head shadow effect, but no binaural unmasking based on interaural time differences, and weak, inconsistent overall spatial release from masking. There was also a small but significant non-spatial summation effect. It appears that typical cochlear implant speech processing strategies may substantially reduce the utility of spatial cues, even in the absence of degraded neural processing arising from auditory deprivation. PMID:23363118

  1. Acoustic signals of baby black caimans.

    PubMed

    Vergne, Amélie L; Aubin, Thierry; Taylor, Peter; Mathevon, Nicolas

    2011-12-01

    In spite of the importance of crocodilian vocalizations for the understanding of the evolution of sound communication in Archosauria and due to the small number of experimental investigations, information concerning the vocal world of crocodilians is limited. By studying black caimans Melanosuchus niger in their natural habitat, here we supply the experimental evidence that juvenile crocodilians can use a graded sound system in order to elicit adapted behavioral responses from their mother and siblings. By analyzing the acoustic structure of calls emitted in two different situations ('undisturbed context', during which spontaneous calls of juvenile caimans were recorded without perturbing the group, and a simulated 'predator attack', during which calls were recorded while shaking juveniles) and by testing their biological relevance through playback experiments, we reveal the existence of two functionally different types of juvenile calls that produce a different response from the mother and other siblings. Young black caimans can thus modulate the structure of their vocalizations along an acoustic continuum as a function of the emission context. Playback experiments show that both mother and juveniles discriminate between these 'distress' and 'contact' calls. Acoustic communication is thus an important component mediating relationships within family groups in caimans as it is in birds, their archosaurian relatives. Although probably limited, the vocal repertoire of young crocodilians is capable of transmitting the information necessary for allowing siblings and mother to modulate their behavior. PMID:21978842

  2. Acoustic Feature Optimization Based on F-Ratio for Robust Speech Recognition

    NASA Astrophysics Data System (ADS)

    Sun, Yanqing; Zhou, Yu; Zhao, Qingwei; Yan, Yonghong

    This paper focuses on the problem of performance degradation in mismatched speech recognition. The F-Ratio analysis method is utilized to analyze the significance of different frequency bands for speech unit classification, and we find that frequencies around 1 kHz and 3 kHz, which are the upper bounds of the first and the second formants for most of the vowels, should be emphasized in comparison to the Mel-frequency cepstral coefficients (MFCC). The analysis result is further observed to be stable in several typical mismatched situations. Similar to the Mel-frequency scale, another frequency scale called the F-Ratio-scale is thus proposed to optimize the filter bank design for the MFCC features, and make each subband contain equal significance for speech unit classification. Under comparable conditions, with the modified features we obtain a relative decrease in sentence error rate of 43.20% compared with the MFCC for emotion-affected speech recognition, 35.54% and 23.03% for noisy speech recognition at 15 dB and 0 dB SNR (signal-to-noise ratio), respectively, and 64.50% for the three years' 863 test data. The application of the F-Ratio analysis on the clean training set of the Aurora2 database demonstrates its robustness over languages, texts and sampling rates.
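
    The per-band F-Ratio used above can be computed, in a simplified unweighted form, as the variance of the class means divided by the mean within-class variance of a band's frame energies; bands with a high ratio carry more information for speech unit classification. The sketch below uses synthetic two-class data.

```python
import numpy as np

def band_f_ratio(band_energy, labels):
    """Simplified per-band F-ratio: between-class variance of the class means
    over the mean within-class variance.  band_energy: per-frame values for
    one band; labels: per-frame speech-unit class ids."""
    band_energy = np.asarray(band_energy, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    class_means = np.array([band_energy[labels == c].mean() for c in classes])
    class_vars = np.array([band_energy[labels == c].var() for c in classes])
    return class_means.var() / (class_vars.mean() + 1e-12)

# synthetic example: a band whose energy separates two vowel classes well
rng = np.random.default_rng(3)
energy = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])
labels = np.array([0] * 500 + [1] * 500)
print(band_f_ratio(energy, labels))   # clearly above 1 for a discriminative band
```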

  3. Speech motor learning changes the neural response to both auditory and somatosensory signals

    PubMed Central

    Ito, Takayuki; Coppola, Joshua H.; Ostry, David J.

    2016-01-01

    In the present paper, we present evidence for the idea that speech motor learning is accompanied by changes to the neural coding of both auditory and somatosensory stimuli. Participants in our experiments undergo adaptation to altered auditory feedback, an experimental model of speech motor learning which like visuo-motor adaptation in limb movement, requires that participants change their speech movements and associated somatosensory inputs to correct for systematic real-time changes to auditory feedback. We measure the sensory effects of adaptation by examining changes to auditory and somatosensory event-related responses. We find that adaptation results in progressive changes to speech acoustical outputs that serve to correct for the perturbation. We also observe changes in both auditory and somatosensory event-related responses that are correlated with the magnitude of adaptation. These results indicate that sensory change occurs in conjunction with the processes involved in speech motor adaptation. PMID:27181603

  4. A Frame-Based Context-Dependent Acoustic Modeling for Speech Recognition

    NASA Astrophysics Data System (ADS)

    Terashima, Ryuta; Zen, Heiga; Nankaku, Yoshihiko; Tokuda, Keiichi

    We propose a novel acoustic model for speech recognition, named the FCD (Frame-based Context Dependent) model. It can obtain a probability distribution by using a top-down clustering technique to simultaneously consider the local frame position in the phoneme, the phoneme duration, and the phoneme context. The model topology is derived from connecting left-to-right HMM models without self-loop transitions for each phoneme duration. Because the FCD model can change the probability distribution into a sequence corresponding to one phoneme duration, it has the ability to generate a smooth trajectory of speech feature vectors. We also performed an experiment to evaluate the performance of speech recognition for the model. In the experiment, 132 questions for frame position, 66 questions for phoneme duration and 134 questions for phoneme context were used to train the sub-phoneme FCD model. In order to compare the performance, left-to-right HMM and two types of HSMM models with almost the same number of states were also trained. As a result, an 18% relative improvement in tri-phone accuracy was achieved by the FCD model.

  5. Analysis of acoustic signals on welding and cutting

    SciTech Connect

    Morita, Takao; Ogawa, Yoji; Sumitomo, Takashi

    1995-12-31

    The sounds emitted during welding and cutting processes are closely related to the processing phenomena, and sometimes they provide useful information for evaluating the processing conditions. The analyses of acoustic signals from arc welding, plasma arc cutting, oxy-flame cutting, and water jet cutting are carried out in detail in order to develop an effective signal processing algorithm. The sound from TIG arc welding has a typical line spectrum whose principal frequency is almost the same as that of the supplied electricity, and a disturbance of the welding process appears clearly in the acoustic emission. The sound exposure level for CO2 or MIG welding is higher than that for TIG welding, and the relative intensity of the typical line spectrum caused by the supplied electricity becomes low, but a sudden transition of the welding condition produces an apparent change in the sound exposure level. In contrast, the acoustics from cutting processes are much louder than those of arc welding and show more chaotic behavior, because the supplied fluid velocity and the arc temperature for cutting processes are much higher than those for welding processes. Therefore, a special technique is required to extract meaningful signals from the loud acoustic sounds. From a further point of view, the reduction of the acoustic exposure level becomes an important research theme as the application fields of cutting processes grow.

  6. An ALE Meta-Analysis on the Audiovisual Integration of Speech Signals

    PubMed Central

    Erickson, Laura C.; Heeg, Elizabeth; Rauschecker, Josef P.; Turkeltaub, Peter E.

    2014-01-01

    The brain improves speech processing through the integration of audiovisual (AV) signals. Situations involving AV speech integration may be crudely dichotomized into those where auditory and visual inputs contain 1) equivalent, complementary signals (validating AV speech), or 2) inconsistent, different signals (conflicting AV speech). This simple framework may allow for the systematic examination of broad commonalities and differences between AV neural processes engaged by various experimental paradigms frequently used to study AV speech integration. We conducted an activation likelihood estimation (ALE) meta-analysis of 22 functional imaging studies comprising 33 experiments, 311 subjects, and 347 foci examining “conflicting” versus “validating” AV speech. Experimental paradigms included content congruency, timing synchrony, and perceptual measures, such as the McGurk effect or synchrony judgments, across AV speech stimulus types (sub-lexical to sentence). Co-localization of conflicting AV speech experiments revealed consistency across at least two contrast types (e.g., synchrony and congruency) in a network of dorsal-stream regions in the frontal, parietal, and temporal lobes. There was consistency across all contrast types (synchrony, congruency, and percept) in the bilateral posterior superior/middle temporal cortex. Although fewer studies were available, validating AV speech experiments were localized to other regions, such as ventral-stream visual areas in the occipital and inferior temporal cortex. These results suggest that while equivalent, complementary AV speech signals may evoke activity in regions related to the corroboration of sensory input, conflicting AV speech signals recruit widespread dorsal-stream areas likely involved in the resolution of conflicting sensory signals. PMID:24996043

  7. Processing of Speech Signals for Physical and Sensory Disabilities

    NASA Astrophysics Data System (ADS)

    Levitt, Harry

    1995-10-01

    Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities.

  8. An acoustical assessment of pitch-matching accuracy in relation to speech frequency, speech frequency range, age and gender in preschool children

    NASA Astrophysics Data System (ADS)

    Trollinger, Valerie L.

    This study investigated the relationship between acoustical measurement of singing accuracy in relationship to speech fundamental frequency, speech fundamental frequency range, age and gender in preschool-aged children. Seventy subjects from Southeastern Pennsylvania; the San Francisco Bay Area, California; and Terre Haute, Indiana, participated in the study. Speech frequency was measured by having the subjects participate in spontaneous and guided speech activities with the researcher, with 18 diverse samples extracted from each subject's recording for acoustical analysis for fundamental frequency in Hz with the CSpeech computer program. The fundamental frequencies were averaged together to derive a mean speech frequency score for each subject. Speech range was calculated by subtracting the lowest fundamental frequency produced from the highest fundamental frequency produced, resulting in a speech range measured in increments of Hz. Singing accuracy was measured by having the subjects each echo-sing six randomized patterns using the pitches Middle C, D, E, F♯, G and A (440), using the solfege syllables of Do and Re, which were recorded by a 5-year-old female model. For each subject, 18 samples of singing were recorded. All samples were analyzed by the CSpeech for fundamental frequency. For each subject, deviation scores in Hz were derived by calculating the difference between what the model sang in Hz and what the subject sang in response in Hz. Individual scores for each child consisted of an overall mean total deviation frequency, mean frequency deviations for each pattern, and mean frequency deviation for each pitch. Pearson correlations, MANOVA and ANOVA analyses, Multiple Regressions and Discriminant Analysis revealed the following findings: (1) moderate but significant (p < .001) relationships emerged between mean speech frequency and the ability to sing the pitches E, F♯, G and A in the study; (2) mean speech frequency also emerged as the strongest

  9. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians.

    PubMed

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J; Adatia, Ian

    2016-01-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% and 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and that could be used to screen for PH and encourage earlier specialist referral. PMID:27609672
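
    A minimal sketch of the MFCC-plus-Gaussian-mixture pipeline described above is shown below. The settings (13 MFCCs, 8 diagonal-covariance mixture components) and the function names are assumptions made for illustration, not the published configuration.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(audio, sr, n_mfcc=13):
    """Frame-wise MFCCs, shape (n_frames, n_mfcc)."""
    return librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc).T

def train_gmms(ph_recordings, normal_recordings, sr, n_components=8):
    """Fit one Gaussian-mixture model per class (PH vs. normal) on the pooled
    MFCC frames of the training recordings."""
    ph_frames = np.vstack([mfcc_features(x, sr) for x in ph_recordings])
    nm_frames = np.vstack([mfcc_features(x, sr) for x in normal_recordings])
    gmm_ph = GaussianMixture(n_components, covariance_type="diag").fit(ph_frames)
    gmm_nm = GaussianMixture(n_components, covariance_type="diag").fit(nm_frames)
    return gmm_ph, gmm_nm

def classify(recording, sr, gmm_ph, gmm_nm):
    """Average frame log-likelihood under each class model; the higher wins."""
    frames = mfcc_features(recording, sr)
    return "PH" if gmm_ph.score(frames) > gmm_nm.score(frames) else "normal"

# hypothetical usage, assuming lists of heart-sound waveforms sampled at 4 kHz:
# gmm_ph, gmm_nm = train_gmms(ph_waveforms, normal_waveforms, sr=4000)
# print(classify(new_waveform, 4000, gmm_ph, gmm_nm))
```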

  10. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians

    PubMed Central

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

    2016-01-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% and 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and that could be used to screen for PH and encourage earlier specialist referral. PMID:27609672

  11. Atmospheric influence on volcano-acoustic signals

    NASA Astrophysics Data System (ADS)

    Matoza, Robin; de Groot-Hedlin, Catherine; Hedlin, Michael; Fee, David; Garcés, Milton; Le Pichon, Alexis

    2010-05-01

    Volcanoes are natural sources of infrasound, useful for studying infrasonic propagation in the atmosphere. Large, explosive volcanic eruptions typically produce signals that can be recorded at ranges of hundreds of kilometers propagating in atmospheric waveguides. In addition, sustained volcanic eruptions can produce smaller-amplitude repetitive signals recordable at >10 km range. These include repetitive impulsive signals and continuous tremor signals. The source functions of these signals can remain relatively invariant over timescales of weeks to months. Observed signal fluctuations from such persistent sources at an infrasound recording station may therefore be attributed to dynamic atmospheric propagation effects. We present examples of repetitive and sustained volcano infrasound sources at Mount St. Helens, Washington, and Kilauea Volcano, Hawaii, USA. The data recorded at >10 km range show evidence of propagation effects induced by tropospheric variability at the mesoscale and microscale. Ray tracing and finite-difference simulations of the infrasound propagation produce qualitatively consistent results. However, the finite-difference simulations indicate that low-frequency effects such as diffraction and scattering from topography may be important factors for infrasonic propagation at this scale.

  12. Intelligibility Assessment of Ideal Binary-Masked Noisy Speech with Acceptance of Room Acoustic

    NASA Astrophysics Data System (ADS)

    Vladimír, Sedlak; Daniela, Durackova; Roman, Zalusky; Tomas, Kovacik

    2015-01-01

    In this paper the intelligibility of ideal binary-masked noisy signals is evaluated for different signal-to-noise ratios (SNR), mask errors, masker types, distances between source and receiver, reverberation times, and local criteria for forming the binary mask. The ideal binary mask is computed from time-frequency decompositions of the target and masker signals by thresholding the local SNR within time-frequency units. The intelligibility of the separated signal is measured using different objective measures computed in the frequency and perceptual domains. The present study replicates and extends findings that were already presented, but mainly shows the impact of room acoustics on the intelligibility performance of the IBM technique.
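
    A minimal Python sketch of the ideal binary mask construction described above, assuming the target and masker signals are available separately; the STFT parameters and the 0 dB local criterion are illustrative rather than the study's settings.

        import numpy as np
        from scipy.signal import stft, istft

        def ideal_binary_mask(target, masker, fs, lc_db=0.0, nperseg=512):
            # Threshold the local SNR within each time-frequency unit.
            _, _, T = stft(target, fs=fs, nperseg=nperseg)
            _, _, M = stft(masker, fs=fs, nperseg=nperseg)
            local_snr_db = 10 * np.log10((np.abs(T) ** 2 + 1e-12) / (np.abs(M) ** 2 + 1e-12))
            return (local_snr_db > lc_db).astype(float)

        def apply_mask(mixture, mask, fs, nperseg=512):
            # Apply the binary mask to the mixture and resynthesize with the inverse STFT.
            _, _, X = stft(mixture, fs=fs, nperseg=nperseg)
            _, y = istft(X * mask, fs=fs, nperseg=nperseg)
            return y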

  13. Discrimination and Comprehension of Synthetic Speech by Students with Visual Impairments: The Case of Similar Acoustic Patterns

    ERIC Educational Resources Information Center

    Papadopoulos, Konstantinos; Argyropoulos, Vassilios S.; Kouroupetroglou, Georgios

    2008-01-01

    This study examined the perceptions held by sighted students and students with visual impairments of the intelligibility and comprehensibility of similar acoustic patterns produced by synthetic speech. It determined the types of errors the students made and compared the performance of the two groups on auditory discrimination and comprehension.

  14. Transient Auditory Storage of Acoustic Details Is Associated with Release of Speech from Informational Masking in Reverberant Conditions

    ERIC Educational Resources Information Center

    Huang, Ying; Huang, Qiang; Chen, Xun; Wu, Xihong; Li, Liang

    2009-01-01

    Perceptual integration of the sound directly emanating from the source with reflections needs both temporal storage and correlation computation of acoustic details. We examined whether the temporal storage is frequency dependent and associated with speech unmasking. In Experiment 1, a break in correlation (BIC) between interaurally correlated…

  15. [A modified speech enhancement algorithm for electronic cochlear implant and its digital signal processing realization].

    PubMed

    Wang, Yulin; Tian, Xuelong

    2014-08-01

    In order to improve the speech quality and auditory perception of electronic cochlear implants under strong background noise, a speech enhancement system for the electronic cochlear implant front-end was constructed. With a digital signal processor (DSP) as its core, the system combines the DSP's multi-channel buffered serial port (McBSP) data transmission channel with the extended audio interface chip TLV320AIC10, so that high-speed speech signal acquisition and output are realized. Meanwhile, because the traditional speech enhancement method suffers from poor adaptability, slow convergence speed, and large steady-state error, a versiera function and a de-correlation principle were used to improve the existing adaptive filtering algorithm, which effectively enhanced the quality of voice communication. Test results verified the stability of the system and the de-noising performance of the algorithm, and also showed that they could provide clearer speech signals for deaf or tinnitus patients. PMID:25464779

  16. [A modified speech enhancement algorithm for electronic cochlear implant and its digital signal processing realization].

    PubMed

    Wang, Yulin; Tian, Xuelong

    2014-08-01

    In order to improve the speech quality and auditory perception of electronic cochlear implants under strong background noise, a speech enhancement system for the electronic cochlear implant front-end was constructed. With a digital signal processor (DSP) as its core, the system combines the DSP's multi-channel buffered serial port (McBSP) data transmission channel with the extended audio interface chip TLV320AIC10, so that high-speed speech signal acquisition and output are realized. Meanwhile, because the traditional speech enhancement method suffers from poor adaptability, slow convergence speed, and large steady-state error, a versiera function and a de-correlation principle were used to improve the existing adaptive filtering algorithm, which effectively enhanced the quality of voice communication. Test results verified the stability of the system and the de-noising performance of the algorithm, and also showed that they could provide clearer speech signals for deaf or tinnitus patients. PMID:25508410

  17. Comments on "Effects of Noise on Speech Production: Acoustic and Perceptual Analyses" [J. Acoust. Soc. Am. 84, 917-928 (1988)].

    PubMed

    Fitch, H

    1989-11-01

    The effect of background noise on speech production is an important issue, both from the practical standpoint of developing speech recognition algorithms and from the theoretical standpoint of understanding how speech is tuned to the environment in which it is spoken. Summers et al. [J. Acoust. Soc. Am. 84, 917-928 (1988)] address this issue by experimentally manipulating the level of noise delivered through headphones to two talkers and making several kinds of acoustic measurements on the resulting speech. They indicate that they have replicated effects on amplitude, duration, and pitch and have found effects on spectral tilt and first-formant frequency (F1). The authors regard these acoustic changes as effects in themselves rather than as consequences of a change in vocal effort, and thus treat equally the change in spectral tilt and the change in F1. In fact, the change in spectral tilt is a well-documented and understood consequence of the change in the glottal waveform, which is known to occur with increased effort. The situation with F1 is less clear and is made difficult by measurement problems. The bias in linear predictive coding (LPC) techniques related to two of the other changes, fundamental frequency and spectral tilt, is discussed. PMID:2808931

  18. Comparing the effects of reverberation and of noise on speech recognition in simulated electric-acoustic listening

    PubMed Central

    Helms Tillery, Kate; Brown, Christopher A.; Bacon, Sid P.

    2012-01-01

    Cochlear implant users report difficulty understanding speech in both noisy and reverberant environments. Electric-acoustic stimulation (EAS) is known to improve speech intelligibility in noise. However, little is known about the potential benefits of EAS in reverberation, or about how such benefits relate to those observed in noise. The present study used EAS simulations to examine these questions. Sentences were convolved with impulse responses from a model of a room whose estimated reverberation times were varied from 0 to 1 sec. These reverberated stimuli were then vocoded to simulate electric stimulation, or presented as a combination of vocoder plus low-pass filtered speech to simulate EAS. Monaural sentence recognition scores were measured in two conditions: reverberated speech and speech in a reverberated noise. The long-term spectrum and amplitude modulations of the noise were equated to the reverberant energy, allowing a comparison of the effects of the interferer (speech vs noise). Results indicate that, at least in simulation, (1) EAS provides significant benefit in reverberation; (2) the benefits of EAS in reverberation may be underestimated by those in a comparable noise; and (3) the EAS benefit in reverberation likely arises from partially preserved cues in this background accessible via the low-frequency acoustic component. PMID:22280603

  19. A Critical Examination of the Statistic Used for Processing Speech Signals.

    ERIC Educational Resources Information Center

    Knox, Keith

    This paper assesses certain properties of human mental processes by focusing on the tactics utilized in perceiving speech signals. Topics discussed in the paper include the power spectrum approach to fluctuations and noise, with particular reference to biological structures; "1/f-like" fluctuations in speech and music and the functioning of a…

  20. On the Perception of Speech Sounds as Biologically Significant Signals

    PubMed Central

    Pisoni, David B.

    2012-01-01

    This paper reviews some of the major evidence and arguments currently available to support the view that human speech perception may require the use of specialized neural mechanisms for perceptual analysis. Experiments using synthetically produced speech signals with adults are briefly summarized and extensions of these results to infants and other organisms are reviewed with an emphasis towards detailing those aspects of speech perception that may require some need for specialized species-specific processors. Finally, some comments on the role of early experience in perceptual development are provided as an attempt to identify promising areas of new research in speech perception. PMID:399200

  1. Time-Frequency Feature Representation Using Multi-Resolution Texture Analysis and Acoustic Activity Detector for Real-Life Speech Emotion Recognition

    PubMed Central

    Wang, Kun-Ching

    2015-01-01

    The classification of emotional speech is mostly considered in speech-related research on human-computer interaction (HCI). The purpose of this paper is to present a novel feature extraction based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for the characterization and classification of different emotions in a speech signal. The motivation is that emotions have different intensity values in different frequency bands. In terms of human visual perception, the texture properties of the multi-resolution spectrogram of emotional speech should be a good feature set for emotion classification in speech. Furthermore, multi-resolution texture analysis can give a clearer discrimination between emotions than uniform-resolution texture analysis. In order to provide high accuracy of emotional discrimination, especially in real life, an acoustic activity detection (AAD) algorithm must be applied to the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally occurring dialogs recorded in real-life call centers. Compared with traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and state-of-the-art features, the MRTII features can also improve the correct classification rates of the proposed systems across different language databases. Experimental results show that the proposed MRTII-based feature information, inspired by human visual perception of the spectrogram image, can provide significant classification performance for real-life emotion recognition in speech. PMID:25594590

  2. Time-forward speech intelligibility in time-reversed rooms

    PubMed Central

    Longworth-Reed, Laricia; Brandewie, Eugene; Zahorik, Pavel

    2009-01-01

    The effects of time-reversed room acoustics on word recognition abilities were examined using virtual auditory space techniques, which allowed for temporal manipulation of the room acoustics independent of the speech source signals. Two acoustical conditions were tested: one in which room acoustics were simulated in a realistic time-forward fashion and one in which the room acoustics were reversed in time, causing reverberation and acoustic reflections to precede the direct-path energy. Significant decreases in speech intelligibility—from 89% on average to less than 25%—were observed between the time-forward and time-reversed rooms. This result is not predictable using standard methods for estimating speech intelligibility based on the modulation transfer function of the room. It may instead be due to increased degradation of onset information in the speech signals when room acoustics are time-reversed. PMID:19173377

  3. Somatosensory basis of speech production.

    PubMed

    Tremblay, Stéphanie; Shiller, Douglas M; Ostry, David J

    2003-06-19

    The hypothesis that speech goals are defined acoustically and maintained by auditory feedback is a central idea in speech production research. An alternative proposal is that speech production is organized in terms of control signals that subserve movements and associated vocal-tract configurations. Indeed, the capacity for intelligible speech by deaf speakers suggests that somatosensory inputs related to movement play a role in speech production, but studies that might have documented a somatosensory component have been equivocal. For example, mechanical perturbations that have altered somatosensory feedback have simultaneously altered acoustics. Hence, any adaptation observed under these conditions may have been a consequence of acoustic change. Here we show that somatosensory information on its own is fundamental to the achievement of speech movements. This demonstration involves a dissociation of somatosensory and auditory feedback during speech production. Over time, subjects correct for the effects of a complex mechanical load that alters jaw movements (and hence somatosensory feedback), but which has no measurable or perceptible effect on acoustic output. The findings indicate that the positions of speech articulators and associated somatosensory inputs constitute a goal of speech movements that is wholly separate from the sounds produced. PMID:12815431

  4. Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels

    PubMed Central

    Caballero-Morales, Santiago-Omar

    2013-01-01

    An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). Then, estimation of the emotional state from a spoken sentence is performed by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, accuracy of 87–100% was achieved for the recognition of emotional state of Mexican Spanish speech. PMID:23935410

  5. Modulation of Radio Frequency Signals by Nonlinearly Generated Acoustic Fields

    NASA Astrophysics Data System (ADS)

    Johnson, Spencer Joseph

    Acousto-electromagnetic scattering is a process in which an acoustic excitation is utilized to induce modulation on an electromagnetic (EM) wave. This phenomenon can be exploited in remote sensing and detection schemes whereby target objects are mechanically excited by high-powered acoustic waves, resulting in unique object characterizations when interrogated with EM signals. Implementation of acousto-EM sensing schemes, however, is limited by a lack of fundamental understanding of the nonlinear interaction between acoustic and EM waves and inefficient simulation methods in the determination of the radiation patterns of higher order scattered acoustic fields. To address the insufficient simulation issue, a computationally efficient mathematical model describing higher order scattered sound fields, particularly of third-order in which a 40x increase in computation speed is achieved, is derived using a multi-Gaussian beam (MGB) expansion that expresses the sound field of any arbitrary axially symmetric beam as a series of Gaussian base functions. The third-order intermodulation (IM3) frequency components are produced by considering the cascaded nonlinear second-order effects when analyzing the interaction between the first- and second-order frequency components during the nonlinear scattering of sound by sound from two noncollinear ultrasonic baffled piston sources. The theory is extended to the modeling of the sound beams generated by parametric transducer arrays, showing that the MGB model can be efficiently used to calculate both the second- and third-order sound fields of the array. Additionally, a near-to-far-field (NTFF) transformation method is developed to model the far-field characteristics of scattered sound fields, extending Kirchhoff's theorem, typically applied to EM waves, determining the far-field patterns of an acoustic source from amplitude and phase measurements made in the near-field by including the higher order sound fields generated by the

  6. The influence of phonemic awareness development on acoustic cue weighting strategies in children's speech perception.

    PubMed

    Mayo, Catherine; Scobbie, James M; Hewlett, Nigel; Waters, Daphne

    2003-10-01

    In speech perception, children give particular patterns of weight to different acoustic cues (their cue weighting). These patterns appear to change with increased linguistic experience. Previous speech perception research has found a positive correlation between more analytical cue weighting strategies and the ability to consciously think about and manipulate segment-sized units (phonemic awareness). That research did not, however, aim to address whether the relation is in any way causal or, if so, then in which direction possible causality might move. Causality in this relation could move in 1 of 2 ways: Either phonemic awareness development could impact on cue weighting strategies or changes in cue weighting could allow for the later development of phonemic awareness. The aim of this study was to follow the development of these 2 processes longitudinally to determine which of the above 2 possibilities was more likely. Five-year-old children were tested 3 times in 7 months on their cue weighting strategies for a /so/-/ʃo/ contrast, in which the 2 cues manipulated were the frequency of the fricative spectrum and the frequency of vowel-onset formant transitions. The children were also tested at the same time on their phoneme segmentation and phoneme blending skills. Results showed that phonemic awareness skills tended to improve before cue weighting changed and that early phonemic awareness ability predicted later cue weighting strategies. These results suggest that the development of metaphonemic awareness may play some role in changes in cue weighting. PMID:14575351

  7. Acoustic markers of prominence influence infants' and adults' segmentation of speech sequences.

    PubMed

    Bion, Ricardo A H; Benavides-Varela, Silvia; Nespor, Marina

    2011-03-01

    Two experiments investigated the way acoustic markers of prominence influence the grouping of speech sequences by adults and 7-month-old infants. In the first experiment, adults were familiarized with and asked to memorize sequences of adjacent syllables that alternated in either pitch or duration. During the test phase, participants heard pairs of syllables with constant pitch and duration and were asked whether the syllables had appeared adjacently during familiarization. Adults were better at remembering pairs of syllables that during familiarization had short syllables preceding long syllables, or high-pitched syllables preceding low-pitched syllables. In the second experiment, infants were familiarized and tested with similar stimuli as in the first experiment, and their preference for pairs of syllables was assessed using the head-turn preference paradigm. When familiarized with syllables alternating in pitch, infants showed a preference to listen to pairs of syllables that had high pitch in the first syllable. However, no preference was found when the familiarization stream alternated in duration. It is proposed that these perceptual biases help infants and adults find linguistic units in the continuous speech stream. While the bias for grouping based on pitch appears early in development, biases for durational grouping might rely on more extensive linguistic experience. PMID:21524015

  8. Statistical evidence that musical universals derive from the acoustic characteristics of human speech

    NASA Astrophysics Data System (ADS)

    Schwartz, David; Howe, Catharine; Purves, Dale

    2003-04-01

    Listeners of all ages and societies produce a similar consonance ordering of chromatic scale tone combinations. Despite intense interest in this perceptual phenomenon over several millennia, it has no generally accepted explanation in physical, psychological, or physiological terms. Here we show that the musical universal of consonance ordering can be understood in terms of the statistical relationship between a pattern of sound pressure at the ear and the possible generative sources of the acoustic energy pattern. Since human speech is the principal naturally occurring source of tone-evoking (i.e., periodic) sound energy for human listeners, we obtained normalized spectra from more than 100000 recorded speech segments. The probability distribution of amplitude/frequency combinations derived from these spectra predicts both the fundamental frequency ratios that define the chromatic scale intervals and the consonance ordering of chromatic scale tone combinations. We suggest that these observations reveal the statistical character of the perceptual process by which the auditory system guides biologically successful behavior in response to inherently ambiguous sound stimuli.

  9. Very low-frequency signals support perceptual organization of implant-simulated speech for adults and children

    PubMed Central

    Nittrouer, Susan; Tarr, Eric; Bolster, Virginia; Caldwell-Tarr, Amanda; Moberly, Aaron C.; Lowenstein, Joanna H.

    2014-01-01

    Objective: Using signals processed to simulate speech received through cochlear implants and low-frequency extended hearing aids, this study examined the proposal that low-frequency signals facilitate the perceptual organization of broader, spectrally degraded signals. Design: In two experiments, words and sentences were presented in diotic and dichotic configurations as four-channel noise-vocoded signals (VOC-only), and as those signals combined with the acoustic signal below 250 Hz (LOW-plus). Dependent measures were percent correct recognition scores, and the difference between scores for the two processing conditions given as proportions of recognition scores for VOC-only. The influence of linguistic context was also examined. Study Sample: Participants had normal hearing. In all, 40 adults, 40 7-year-olds, and 20 5-year-olds participated. Results: Participants of all ages showed benefits of adding the low-frequency signal. The effect was greater for sentences than words, but no effect of configuration was found. The influence of linguistic context was similar across age groups, and did not contribute to the low-frequency effect. Listeners who scored more poorly with VOC-only stimuli showed greater low-frequency effects. Conclusion: The benefit of adding a very low-frequency signal to a broader, spectrally degraded signal seems to derive from its facilitative influence on perceptual organization of the sensory input. PMID:24456179
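
    A rough Python sketch of the kind of processing described (four-channel noise vocoding, optionally combined with the acoustic signal below 250 Hz); the band edges, filter orders, and envelope extraction are assumptions for illustration, not the authors' exact parameters.

        import numpy as np
        from scipy.signal import butter, sosfiltfilt, hilbert

        def bandpass(x, lo, hi, fs, order=4):
            sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
            return sosfiltfilt(sos, x)

        def noise_vocoder(speech, fs, edges=(300, 750, 1500, 3000, 6000)):
            # Replace the fine structure in each band with envelope-modulated noise.
            out = np.zeros_like(speech)
            for lo, hi in zip(edges[:-1], edges[1:]):
                band = bandpass(speech, lo, hi, fs)
                env = np.abs(hilbert(band))                       # temporal envelope
                carrier = bandpass(np.random.randn(len(speech)), lo, hi, fs)
                out += env * carrier
            return out

        def low_plus(speech, fs, cutoff=250.0):
            # VOC-only signal plus the acoustic signal below the cutoff (LOW-plus condition).
            sos = butter(4, cutoff, btype="low", fs=fs, output="sos")
            return noise_vocoder(speech, fs) + sosfiltfilt(sos, speech)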

  10. Elicitation of the Acoustic Change Complex to Long-Duration Speech Stimuli in Four-Month-Old Infants.

    PubMed

    Chen, Ke Heng; Small, Susan A

    2015-01-01

    The acoustic change complex (ACC) is an auditory-evoked potential elicited to changes within an ongoing stimulus that indicates discrimination at the level of the auditory cortex. Only a few studies to date have attempted to record ACCs in young infants. The purpose of the present study was to investigate the elicitation of ACCs to long-duration speech stimuli in English-learning 4-month-old infants. ACCs were elicited to consonant contrasts made up of two concatenated speech tokens. The stimuli included native dental-dental /dada/ and dental-labial /daba/ contrasts and a nonnative Hindi dental-retroflex /daDa/ contrast. Each consonant-vowel speech token was 410 ms in duration. Slow cortical responses were recorded to the onset of the stimulus and to the acoustic change from /da/ to either /ba/ or /Da/ within the stimulus with significantly prolonged latencies compared with adults. ACCs were reliably elicited for all stimulus conditions with more robust morphology compared with our previous findings using stimuli that were shorter in duration. The P1 amplitudes elicited to the acoustic change in /daba/ and /daDa/ were significantly larger compared to /dada/ supporting that the brain discriminated between the speech tokens. These findings provide further evidence for the use of ACCs as an index of discrimination ability. PMID:26798343

  11. Elicitation of the Acoustic Change Complex to Long-Duration Speech Stimuli in Four-Month-Old Infants

    PubMed Central

    Chen, Ke Heng; Small, Susan A.

    2015-01-01

    The acoustic change complex (ACC) is an auditory-evoked potential elicited to changes within an ongoing stimulus that indicates discrimination at the level of the auditory cortex. Only a few studies to date have attempted to record ACCs in young infants. The purpose of the present study was to investigate the elicitation of ACCs to long-duration speech stimuli in English-learning 4-month-old infants. ACCs were elicited to consonant contrasts made up of two concatenated speech tokens. The stimuli included native dental-dental /dada/ and dental-labial /daba/ contrasts and a nonnative Hindi dental-retroflex /daDa/ contrast. Each consonant-vowel speech token was 410 ms in duration. Slow cortical responses were recorded to the onset of the stimulus and to the acoustic change from /da/ to either /ba/ or /Da/ within the stimulus with significantly prolonged latencies compared with adults. ACCs were reliably elicited for all stimulus conditions with more robust morphology compared with our previous findings using stimuli that were shorter in duration. The P1 amplitudes elicited to the acoustic change in /daba/ and /daDa/ were significantly larger compared to /dada/ supporting that the brain discriminated between the speech tokens. These findings provide further evidence for the use of ACCs as an index of discrimination ability. PMID:26798343

  12. The effect of different open plan and enclosed classroom acoustic conditions on speech perception in Kindergarten children.

    PubMed

    Mealings, Kiri T; Demuth, Katherine; Buchholz, Jörg M; Dillon, Harvey

    2015-10-01

    Open plan classrooms, where several classes are in the same room, have recently re-emerged in Australian primary schools. This paper explores how the acoustics of four Kindergarten classrooms [an enclosed classroom (25 children), double classroom (44 children), fully open plan triple classroom (91 children), and a semi-open plan K-6 "21st century learning space" (205 children)] affect speech perception. Twenty-two to 23 5-6-year-old children in each classroom participated in an online four-picture choice speech perception test while adjacent classes engaged in quiet versus noisy activities. The noise levels recorded during the test were higher the larger the classroom, except in the noisy condition for the K-6 classroom, possibly due to acoustic treatments. Linear mixed effects models revealed children's performance accuracy and speed decreased as noise level increased. Additionally, children's speech perception abilities decreased the further away they were seated from the loudspeaker in noise levels above 50 dBA. These results suggest that fully open plan classrooms are not appropriate learning environments for critical listening activities with young children due to their high intrusive noise levels which negatively affect speech perception. If open plan classrooms are desired, they need to be acoustically designed to be appropriate for critical listening activities. PMID:26520328

  13. Compensation for Coarticulation: Disentangling Auditory and Gestural Theories of Perception of Coarticulatory Effects in Speech

    ERIC Educational Resources Information Center

    Viswanathan, Navin; Magnuson, James S.; Fowler, Carol A.

    2010-01-01

    According to one approach to speech perception, listeners perceive speech by applying general pattern matching mechanisms to the acoustic signal (e.g., Diehl, Lotto, & Holt, 2004). An alternative is that listeners perceive the phonetic gestures that structured the acoustic signal (e.g., Fowler, 1986). The two accounts have offered different…

  14. Can acoustic vowel space predict the habitual speech rate of the speaker?

    PubMed

    Tsao, Y-C; Iqbal, K

    2005-01-01

    This study aims to find whether the acoustic vowel space reflects the habitual speaking rate of the speaker. The vowel space is defined as the area of the quadrilateral formed by the four corner vowels (i.e., /i/, /æ/, /u/, /α/) in the F1/F2 plane. The study compares the acoustic vowel space in the speech of habitually slow and fast talkers and further analyzes it by gender. In addition to the measurement of vowel duration and midpoint frequencies of F1 and F2, the F1/F2 vowel space areas were measured and compared across speakers. The results indicate substantial overlap in vowel space area functions between slow and fast talkers, though the slow speakers were found to have larger vowel spaces. Furthermore, large variability in vowel space area functions was noted among speakers within each group. Both F1 and F2 formant frequencies were found to be gender sensitive, consistent with existing data. No predictive relation between vowel duration and formant frequencies was observed among speakers. PMID:17282413
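
    The quadrilateral vowel space area defined above can be computed with the shoelace formula from the mean midpoint F1/F2 values of the four corner vowels; the sketch below assumes a particular vertex ordering, and the example formant values are illustrative only.

        import numpy as np

        def vowel_space_area(corner_formants):
            # corner_formants: dict mapping vowel label -> (F1, F2) in Hz.
            # Vertices are taken in the order /i/, /ae/, /a/, /u/ so the polygon
            # is traversed around its perimeter (no self-intersection).
            order = ["i", "ae", "a", "u"]
            pts = np.array([corner_formants[v] for v in order], dtype=float)
            x, y = pts[:, 0], pts[:, 1]
            # Shoelace formula for the area of a simple polygon.
            return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

        # Illustrative (not measured) corner-vowel formant values in Hz:
        area_hz2 = vowel_space_area({"i": (270, 2290), "ae": (660, 1720),
                                     "a": (730, 1090), "u": (300, 870)})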

  15. Speaker height estimation from speech: Fusing spectral regression and statistical acoustic models.

    PubMed

    Hansen, John H L; Williams, Keri; Bořil, Hynek

    2015-08-01

    Estimating speaker height can assist in voice forensic analysis and provide additional side knowledge to benefit automatic speaker identification or acoustic model selection for automatic speech recognition. In this study, a statistical approach to height estimation that incorporates acoustic models within a non-uniform height bin width Gaussian mixture model structure as well as a formant analysis approach that employs linear regression on selected phones are presented. The accuracy and trade-offs of these systems are explored by examining the consistency of the results, location, and causes of error, as well as a combined fusion of the two systems, using data from the TIMIT corpus. Open set testing is also presented using the Multi-session Audio Research Project corpus and publicly available YouTube audio to examine the effect of channel mismatch between training and testing data and provide a realistic open domain testing scenario. The proposed algorithms achieve highly competitive performance compared to previously published literature. Although the different data partitioning in the literature and this study may prevent performance comparisons in absolute terms, the mean average error of 4.89 cm for males and 4.55 cm for females provided by the proposed algorithm on TIMIT utterances containing selected phones suggests a considerable decrease in estimation error compared to past efforts. PMID:26328721

  16. Effects of acoustic variability in the perceptual learning of non-native-accented speech sounds.

    PubMed

    Wade, Travis; Jongman, Allard; Sereno, Joan

    2007-01-01

    This study addressed whether acoustic variability and category overlap in non-native speech contribute to difficulty in its recognition, and more generally whether the benefits of exposure to acoustic variability during categorization training are stable across differences in category confusability. Three experiments considered a set of Spanish-accented English productions. The set was seen to pose learning and recognition difficulty (experiment 1) and was more variable and confusable than a parallel set of native productions (experiment 2). A training study (experiment 3) probed the relative contributions of category central tendency and variability to difficulty in vowel identification using derived inventories in which these dimensions were manipulated based on the results of experiments 1 and 2. Training and test difficulty related straightforwardly to category confusability but not to location in the vowel space. Benefits of high-variability exposure also varied across vowel categories, and seemed to be diminished for highly confusable vowels. Overall, variability was implicated in perception and learning difficulty in ways that warrant further investigation. PMID:17914280

  17. Wideband link-budget analysis for undersea acoustic signaling

    NASA Astrophysics Data System (ADS)

    Rice, Joseph A.; Hansen, Joseph T.

    2002-11-01

    Link-budget analysis is commonly applied to satellite and wireless communications for estimating the signal-to-noise ratio (SNR) at the receiver. Link-budget analysis considers transmitter power, transmitter antenna gain, channel losses, channel noise, and receiver antenna gain. For underwater signaling, the terms of the sonar equation readily translate to a formulation of the link budget. However, the strong frequency dependence of underwater acoustic propagation requires special consideration, and is represented as an intermediate result called the channel SNR. The channel SNR includes ambient-noise and transmission-loss components. Several acoustic communication and navigation problems are addressed through wideband link-budget analyses. [Work sponsored by ONR 321.]
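
    A hedged sketch of a sonar-equation style link budget of the kind described: the channel SNR combines source level, frequency-dependent transmission loss, and ambient noise, and receiver array gain is added afterwards. The Thorp absorption formula and the generic spreading law below are standard textbook approximations standing in for the paper's channel model.

        import numpy as np

        def transmission_loss_db(range_m, freq_khz, spreading=15.0):
            # Spreading loss plus Thorp absorption (frequency in kHz, absorption in dB/km).
            f2 = freq_khz ** 2
            alpha = (0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2)
                     + 2.75e-4 * f2 + 0.003)
            return spreading * np.log10(range_m) + alpha * range_m / 1000.0

        def channel_snr_db(source_level_db, range_m, freq_khz, noise_level_db):
            # Channel SNR per the sonar equation: SL - TL - NL.
            return source_level_db - transmission_loss_db(range_m, freq_khz) - noise_level_db

        def received_snr_db(source_level_db, range_m, freq_khz, noise_level_db, array_gain_db=0.0):
            # Receiver array gain added to the channel SNR.
            return channel_snr_db(source_level_db, range_m, freq_khz, noise_level_db) + array_gain_db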

  18. Signal processing methodologies for an acoustic fetal heart rate monitor

    NASA Technical Reports Server (NTRS)

    Pretlow, Robert A., III; Stoughton, John W.

    1992-01-01

    Research and development is presented of real time signal processing methodologies for the detection of fetal heart tones within a noise-contaminated signal from a passive acoustic sensor. A linear predictor algorithm is utilized for detection of the heart tone event and additional processing derives heart rate. The linear predictor is adaptively 'trained' in a least mean square error sense on generic fetal heart tones recorded from patients. A real time monitor system is described which outputs to a strip chart recorder for plotting the time history of the fetal heart rate. The system is validated in the context of the fetal nonstress test. Comparisons are made with ultrasonic nonstress tests on a series of patients. Comparative data provides favorable indications of the feasibility of the acoustic monitor for clinical use.
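
    A simplified Python sketch of an LMS-trained linear predictor used as an event detector, in the spirit of the description above; the filter order, step size, frame length, and prediction-gain threshold are assumptions, and the real-time implementation and heart-rate derivation are omitted.

        import numpy as np

        def lms_train_predictor(x, order=32, mu=1e-3):
            # Adapt linear-predictor coefficients on a reference heart-tone recording.
            w = np.zeros(order)
            for n in range(order, len(x)):
                frame = x[n - order:n][::-1]          # most recent samples first
                err = x[n] - w @ frame
                w += 2 * mu * err * frame
            return w

        def detect_events(x, w, frame_len=256, threshold=0.5):
            # Flag frames where the trained predictor explains most of the signal energy.
            order = len(w)
            hits = []
            for start in range(order, len(x) - frame_len, frame_len):
                seg = x[start:start + frame_len]
                pred = np.array([w @ x[n - order:n][::-1]
                                 for n in range(start, start + frame_len)])
                err_energy = np.sum((seg - pred) ** 2)
                gain = 1.0 - err_energy / (np.sum(seg ** 2) + 1e-12)
                if gain > threshold:
                    hits.append(start)
            return hits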

  19. A matched filter algorithm for acoustic signal detection

    NASA Astrophysics Data System (ADS)

    Jordan, D. W.

    1985-06-01

    This thesis is a presentation of several alternative acoustic filter designs which allow Space Shuttle payload experiment initiation prior to launch. This initiation is accomplished independently of any spacecraft services by means of a matched band-pass filter tuned to the acoustic signal characteristic of the Auxiliary Power Unit (APU), which is brought up to operating RPM approximately five minutes prior to launch. These alternative designs include an analog filter built around operational amplifiers, a digital IIR design implemented with an INTEL 2920 Signal Processor, and an adaptive FIR Wiener design. Working prototypes of the first two filters are developed, and a discussion of the advantages of the 2920 digital design is presented.
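
    A minimal digital sketch of the matched-filter idea (correlating the incoming microphone signal with a stored template of the APU acoustic signature); the normalization and threshold rule are assumptions, and the analog, INTEL 2920, and adaptive Wiener designs discussed in the thesis are not reproduced.

        import numpy as np
        from scipy.signal import correlate

        def matched_filter_detect(signal, template, fs, threshold_factor=5.0):
            # Correlate the signal with the template and flag samples where the
            # normalized output exceeds a multiple of its median absolute level.
            out = correlate(signal, template, mode="valid")
            out /= (np.linalg.norm(template) + 1e-12)
            floor = np.median(np.abs(out)) + 1e-12
            detections = np.flatnonzero(np.abs(out) > threshold_factor * floor)
            return detections / fs                    # approximate detection times in seconds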

  20. Modeling of acoustic emission signal propagation in waveguides.

    PubMed

    Zelenyak, Andreea-Manuela; Hamstad, Marvin A; Sause, Markus G R

    2015-01-01

    Acoustic emission (AE) testing is a widely used nondestructive testing (NDT) method to investigate material failure. When environmental conditions are harmful for the operation of the sensors, waveguides are typically mounted in between the inspected structure and the sensor. Such waveguides can be built from different materials or have different designs in accordance with the experimental needs. All these variations can cause changes in the acoustic emission signals in terms of modal conversion, additional attenuation, or shifts in frequency content. A finite element method (FEM) was used to model acoustic emission signal propagation in an aluminum plate with an attached waveguide and was validated against experimental data. The geometry of the waveguide is systematically changed by varying the radius and height to investigate the influence on the detected signals. Different waveguide materials were implemented, and changes of material properties as a function of temperature were taken into account. The ability to model different waveguide options replaces the time-consuming and expensive trial-and-error alternative of experiments. Thus, the aim of this research has important implications for those who use waveguides for AE testing. PMID:26007731

  1. Modeling of Acoustic Emission Signal Propagation in Waveguides

    PubMed Central

    Zelenyak, Andreea-Manuela; Hamstad, Marvin A.; Sause, Markus G. R.

    2015-01-01

    Acoustic emission (AE) testing is a widely used nondestructive testing (NDT) method to investigate material failure. When environmental conditions are harmful for the operation of the sensors, waveguides are typically mounted in between the inspected structure and the sensor. Such waveguides can be built from different materials or have different designs in accordance with the experimental needs. All these variations can cause changes in the acoustic emission signals in terms of modal conversion, additional attenuation, or shifts in frequency content. A finite element method (FEM) was used to model acoustic emission signal propagation in an aluminum plate with an attached waveguide and was validated against experimental data. The geometry of the waveguide is systematically changed by varying the radius and height to investigate the influence on the detected signals. Different waveguide materials were implemented, and changes of material properties as a function of temperature were taken into account. The ability to model different waveguide options replaces the time-consuming and expensive trial-and-error alternative of experiments. Thus, the aim of this research has important implications for those who use waveguides for AE testing. PMID:26007731

  2. Emotional recognition from the speech signal for a virtual education agent

    NASA Astrophysics Data System (ADS)

    Tickle, A.; Raghu, S.; Elshaw, M.

    2013-06-01

    This paper explores the extraction of features from the speech wave to perform intelligent emotion recognition. A feature extraction tool (openSMILE) was used to obtain a baseline set of 998 acoustic features from a set of emotional speech recordings from a microphone. The initial features were reduced to the most important ones so that recognition of emotions using a supervised neural network could be performed. Given that the future use of virtual education agents lies in making the agents more interactive, developing agents with the capability to recognise and adapt to the emotional state of humans is an important step.

  3. INSTRUMENTATION FOR SURVEYING ACOUSTIC SIGNALS IN NATURAL GAS TRANSMISSION LINES

    SciTech Connect

    John L. Loth; Gary J. Morris; George M. Palmer; Richard Guiler; Deepak Mehra

    2003-09-01

    In the U.S., natural gas is distributed through more than one million miles of high-pressure transmission pipelines. If all leaks and infringements could be detected quickly, it would enhance safety and U.S. energy security. Only low-frequency acoustic waves appear to be detectable over distances up to 60 km where pipeline shut-off valves provide access to the inside of the pipeline. This paper describes a Portable Acoustic Monitoring Package (PAMP) developed to record and identify acoustic signals characteristic of: leaks, pump noise, valve and flow metering noise, third party infringement, manual pipeline water and gas blow-off, etc. This PAMP consists of a stainless steel 1/2 inch NPT plumbing tree rated for use on 1000 psi pipelines. Its instrumentation is designed to measure acoustic waves over the entire frequency range from zero to 16,000 Hz by means of four instruments: (1) microphone, (2) 3-inch water full range differential pressure transducer with 0.1% of range sensitivity, (3) a novel 3 inch to 100 inch water range amplifier, using an accumulator with needle valve and (4) a line-pressure transducer. The weight of the PAMP complete with all accessories is 36 pounds. This includes a remote control battery/switch box assembly on a 25-foot extension cord, a laptop data acquisition computer on a field table and a sun shield.

  4. Speech transmission index from running speech: A neural network approach

    NASA Astrophysics Data System (ADS)

    Li, F. F.; Cox, T. J.

    2003-04-01

    Speech transmission index (STI) is an important objective parameter concerning speech intelligibility for sound transmission channels. It is normally measured with specific test signals to ensure high accuracy and good repeatability. Measurement with running speech was previously proposed, but accuracy is compromised and hence applications limited. A new approach that uses artificial neural networks to accurately extract the STI from received running speech is developed in this paper. Neural networks are trained on a large set of transmitted speech examples with prior knowledge of the transmission channels' STIs. The networks perform complicated nonlinear function mappings and spectral feature memorization to enable accurate objective parameter extraction from transmitted speech. Validations via simulations demonstrate the feasibility of this new method on a one-net-one-speech extract basis. In this case, accuracy is comparable with normal measurement methods. This provides an alternative to standard measurement techniques, and it is intended that the neural network method can facilitate occupied room acoustic measurements.
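
    A hedged sketch of the general idea: band-level features are extracted from received running speech and a regressor is trained on examples whose channel STI is known beforehand. The octave-band power features and the small MLP below are stand-ins; the actual network architecture and features of the paper are not specified in this abstract.

        import numpy as np
        from scipy.signal import welch
        from sklearn.neural_network import MLPRegressor

        # Approximate octave bands centred at 125 Hz ... 8 kHz.
        OCTAVE_BANDS = [(88, 177), (177, 355), (355, 710), (710, 1420),
                        (1420, 2840), (2840, 5680), (5680, 11360)]

        def band_features(x, fs):
            # Relative power of the received running speech in each octave band.
            f, pxx = welch(x, fs=fs, nperseg=2048)
            feats = np.array([pxx[(f >= lo) & (f < hi)].sum() for lo, hi in OCTAVE_BANDS])
            return feats / (feats.sum() + 1e-12)

        def train_sti_regressor(received_examples, known_stis, fs):
            # Train on transmitted-speech examples with a priori channel STIs;
            # model.predict(band_features(new_speech, fs)[None, :]) then estimates STI.
            X = np.vstack([band_features(x, fs) for x in received_examples])
            model = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000)
            return model.fit(X, np.asarray(known_stis))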

  5. Laser-induced thermal acoustics (LITA) signals from finite beams

    NASA Astrophysics Data System (ADS)

    Cummings, E. B.; Leyva, I. A.; Hornung, H. G.

    1995-06-01

    Laser-induced thermal acoustics (LITA) is a four-wave mixing technique that may be employed to measure sound speeds, transport properties, velocities, and susceptibilities of fluids. It is particularly effective in high-pressure gases ( greater than 1 bar). An analytical expression for LITA signals is derived by the use of linearized equations of hydrodynamics and light scattering. This analysis, which includes full finite-beam-size effects and the optoacoustic effects of thermalization and electrostriction, predicts the amplitude and the time history of narrow-band time-resolved LITA and broadband spectrally resolved (mulitplex) LITA signals. The time behavior of the detected LITA signal depends significantly on the detection solid angle, with implications for the measurement of diffusivities by the use of LITA and the proper physical picture of LITA scattering. This and other elements of the physics of LITA that emerge from the analysis are discussed. Theoretical signals are compared with experimental LITA data.

  6. Joint Spatial-Spectral Feature Space Clustering for Speech Activity Detection from ECoG Signals

    PubMed Central

    Kanas, Vasileios G.; Mporas, Iosif; Benz, Heather L.; Sgarbas, Kyriakos N.; Bezerianos, Anastasios; Crone, Nathan E.

    2014-01-01

    Brain machine interfaces for speech restoration have been extensively studied for more than two decades. The success of such a system will depend in part on selecting the best brain recording sites and signal features corresponding to speech production. The purpose of this study was to detect speech activity automatically from electrocorticographic signals based on joint spatial-frequency clustering of the ECoG feature space. For this study, the ECoG signals were recorded while a subject performed two different syllable repetition tasks. We found that the optimal frequency resolution to detect speech activity from ECoG signals was 8 Hz, achieving 98.8% accuracy by employing support vector machines (SVM) as a classifier. We also defined the cortical areas that held the most information about the discrimination of speech and non-speech time intervals. Additionally, the results shed light on the distinct cortical areas associated with the two syllable repetition tasks and may contribute to the development of portable ECoG-based communication. PMID:24658248
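
    A simplified sketch of the detection pipeline suggested by the abstract: spectral power of each ECoG channel in fixed 8 Hz bins forms a joint spatial-spectral feature vector that an SVM classifies as speech or non-speech. The windowing, upper frequency limit, and kernel choice are assumptions.

        import numpy as np
        from scipy.signal import welch
        from sklearn.svm import SVC
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        def ecog_features(window, fs, band_hz=8.0, fmax=200.0):
            # window: (n_samples, n_channels) ECoG segment.
            f, pxx = welch(window, fs=fs, nperseg=int(fs), axis=0)
            edges = np.arange(0.0, fmax, band_hz)
            feats = [np.log(pxx[(f >= lo) & (f < lo + band_hz)].sum(axis=0) + 1e-12)
                     for lo in edges]
            return np.concatenate(feats)              # joint spatial-spectral vector

        def train_detector(windows, labels, fs):
            # labels: 1 for speech intervals, 0 for non-speech intervals.
            X = np.vstack([ecog_features(w, fs) for w in windows])
            clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
            return clf.fit(X, labels)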

  7. A Robust Approach For Acoustic Noise Suppression In Speech Using ANFIS

    NASA Astrophysics Data System (ADS)

    Martinek, Radek; Kelnar, Michal; Vanus, Jan; Bilik, Petr; Zidek, Jan

    2015-11-01

    The authors of this article deal with the implementation of a combination of fuzzy-system and artificial intelligence techniques in the application area of non-linear noise and interference suppression. The structure used is called an Adaptive Neuro-Fuzzy Inference System (ANFIS). This system finds practical use mainly in audio telephone (mobile) communication in noisy environments (transport, production halls, sports matches, etc.). Experimental methods based on the two-input adaptive noise cancellation concept are clearly outlined. Within the experiments carried out, the authors created, based on the ANFIS structure, a comprehensive system for adaptive suppression of unwanted background interference that occurs in audio communication and degrades the audio signal. The system designed has been tested on real voice signals. This article presents an investigation and comparison of three distinct approaches to noise cancellation in speech: LMS (least mean squares) adaptive filtering, RLS (recursive least squares) adaptive filtering, and ANFIS. A careful review of the literature indicated the importance of non-linear adaptive algorithms over linear ones in noise cancellation. It was concluded that the ANFIS approach had the best overall performance, as it efficiently cancelled noise even in highly noise-degraded speech. Results were drawn from the successful experimentation; subjective tests were used to analyse comparative performance, while objective tests were used to validate them. Implementation of the algorithms was carried out experimentally in Matlab to justify the claims and determine their relative performance.
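
    For reference, the classic two-input adaptive noise cancellation concept the study builds on can be sketched with a normalized LMS filter as below (primary input = speech plus noise, reference input = correlated noise); the ANFIS and RLS structures themselves are not reproduced, and the filter order and step size are illustrative.

        import numpy as np

        def lms_noise_canceller(primary, reference, order=32, mu=0.01):
            # The adaptive filter estimates the noise in the primary input from the
            # reference input; the error signal is the enhanced speech.
            w = np.zeros(order)
            enhanced = np.zeros_like(primary, dtype=float)
            for n in range(order, len(primary)):
                ref_vec = reference[n - order:n][::-1]
                noise_est = w @ ref_vec
                e = primary[n] - noise_est            # enhanced speech sample
                norm = ref_vec @ ref_vec + 1e-12
                w += (mu / norm) * e * ref_vec        # normalized LMS update
                enhanced[n] = e
            return enhanced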

  8. Acoustic Source Characteristics, Across-Formant Integration, and Speech Intelligibility Under Competitive Conditions

    PubMed Central

    2015-01-01

    An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics—for example, fundamental frequency (F0) differences between formants promote their segregation. This study explored the role of more radical differences in source characteristics. Three-formant (F1+F2+F3) synthetic speech analogues were derived from natural sentences. In Experiment 1, F1+F3 were generated by passing a harmonic glottal source (F0 = 140 Hz) through second-order resonators (H1+H3); in Experiment 2, F1+F3 were tonal (sine-wave) analogues (T1+T3). F2 could take either form (H2 or T2). In some conditions, the target formants were presented alone, either monaurally or dichotically (left ear = F1+F3; right ear = F2). In others, they were accompanied by a competitor for F2 (F1+F2C+F3; F2), which listeners must reject to optimize recognition. Competitors (H2C or T2C) were created using the time-reversed frequency and amplitude contours of F2. Dichotic presentation of F2 and F2C ensured that the impact of the competitor arose primarily through informational masking. In the absence of F2C, the effect of a source mismatch between F1+F3 and F2 was relatively modest. When F2C was present, intelligibility was lowest when F2 was tonal and F2C was harmonic, irrespective of which type matched F1+F3. This finding suggests that source type and context, rather than similarity, govern the phonetic contribution of a formant. It is proposed that wideband harmonic analogues are more effective informational maskers than narrowband tonal analogues, and so become dominant in across-frequency integration of phonetic information when placed in competition. PMID:25751040

  9. Acoustic source characteristics, across-formant integration, and speech intelligibility under competitive conditions.

    PubMed

    Roberts, Brian; Summers, Robert J; Bailey, Peter J

    2015-06-01

    An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics--for example, fundamental frequency (F0) differences between formants promote their segregation. This study explored the role of more radical differences in source characteristics. Three-formant (F1+F2+F3) synthetic speech analogues were derived from natural sentences. In Experiment 1, F1+F3 were generated by passing a harmonic glottal source (F0 = 140 Hz) through second-order resonators (H1+H3); in Experiment 2, F1+F3 were tonal (sine-wave) analogues (T1+T3). F2 could take either form (H2 or T2). In some conditions, the target formants were presented alone, either monaurally or dichotically (left ear = F1+F3; right ear = F2). In others, they were accompanied by a competitor for F2 (F1+F2C+F3; F2), which listeners must reject to optimize recognition. Competitors (H2C or T2C) were created using the time-reversed frequency and amplitude contours of F2. Dichotic presentation of F2 and F2C ensured that the impact of the competitor arose primarily through informational masking. In the absence of F2C, the effect of a source mismatch between F1+F3 and F2 was relatively modest. When F2C was present, intelligibility was lowest when F2 was tonal and F2C was harmonic, irrespective of which type matched F1+F3. This finding suggests that source type and context, rather than similarity, govern the phonetic contribution of a formant. It is proposed that wideband harmonic analogues are more effective informational maskers than narrowband tonal analogues, and so become dominant in across-frequency integration of phonetic information when placed in competition. PMID:25751040

  10. Visual signal processing, speechreading, and related issues

    NASA Astrophysics Data System (ADS)

    Levitt, Harry

    2003-06-01

    People with hearing loss make use of visual speech cues to supplement the impoverished speech signal. This process, known as speechreading (or lipreading) can be very effective because of the complementary nature of auditory and visual speech cues. Despite the importance of visual speech cues (for both normal-hearing and hearing-impaired people) research on the visual characteristics of speech has lagged behind research on the acoustic characteristics of speech. The field of acoustic phonetics benefited substantially from the availability of powerful techniques for acoustic signal analysis. The substantial, recent advances in optical signal processing have opened up new vistas for visual speech analysis analogous to the way technological innovation revolutionized the field of acoustic phonetics. This paper describes several experiments in the emerging field of optic phonetics.

  11. Effects of a music therapy voice protocol on speech intelligibility, vocal acoustic measures, and mood of individuals with Parkinson's disease.

    PubMed

    Haneishi, E

    2001-01-01

    This study examined the effects of a Music Therapy Voice Protocol (MTVP) on speech intelligibility, vocal intensity, maximum vocal range, maximum duration of sustained vowel phonation, vocal fundamental frequency, vocal fundamental frequency variability, and mood of individuals with Parkinson's disease. Four female patients, who demonstrated voice and speech problems, served as their own controls and participated in baseline assessment (study pretest), a series of MTVP sessions involving vocal and singing exercises, and final evaluation (study posttest). In study pre and posttests, data for speech intelligibility and all acoustic variables were collected. Statistically significant increases were found in speech intelligibility, as rated by caregivers, and in vocal intensity from study pretest to posttest as the results of paired samples t-tests. In addition, before and after each MTVP session (session pre and posttests), self-rated mood scores and selected acoustic variables were collected. No significant differences were found in any of the variables from the session pretests to posttests, across the entire treatment period, or their interactions as the results of two-way ANOVAs with repeated measures. Although not significant, the mean of mood scores in session posttests (M = 8.69) was higher than that in session pretests (M = 7.93). PMID:11796078

  12. Calculation of selective filters of a device for primary analysis of speech signals

    NASA Astrophysics Data System (ADS)

    Chudnovskii, L. S.; Ageev, V. M.

    2014-07-01

    The amplitude-frequency responses of filters for primary analysis of speech signals, which have a low quality factor and a high rolloff factor in the high-frequency range, are calculated using the linear theory of speech production and psychoacoustic measurement data. The frequency resolution of the filter system for a sinusoidal signal is 40-200 Hz. The modulation-frequency resolution of amplitude- and frequency-modulated signals is 3-6 Hz. The aforementioned features of the calculated filters are close to the amplitude-frequency responses of biological auditory systems at the level of the eighth nerve.

  13. Automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Espy-Wilson, Carol

    2005-04-01

    Great strides have been made in the development of automatic speech recognition (ASR) technology over the past thirty years. Most of this effort has been centered around the extension and improvement of Hidden Markov Model (HMM) approaches to ASR. Current commercially-available and industry systems based on HMMs can perform well for certain situational tasks that restrict variability such as phone dialing or limited voice commands. However, the holy grail of ASR systems is performance comparable to humans: in other words, the ability to automatically transcribe unrestricted conversational speech spoken by an infinite number of speakers under varying acoustic environments. This goal is far from being reached. Key to the success of ASR is effective modeling of variability in the speech signal. This tutorial will review the basics of ASR and the various ways in which our current knowledge of speech production, speech perception and prosody can be exploited to improve robustness at every level of the system.

  14. Low-Frequency Acoustic Signals Propagation in Buried Pipelines

    NASA Astrophysics Data System (ADS)

    Ovchinnikov, A. L.; Lapshin, B. M.

    2016-01-01

    The article deals with the propagation of acoustic signals in large-diameter oil pipelines caused by mechanical action on the pipe body. Various mechanisms of signal attenuation are discussed. It is shown that calculating the attenuation caused only by internal energy losses, i.e., viscosity, thermal conductivity, and friction between the liquid and the pipeline wall, leads to values that are too low. The results of experimental studies carried out on an existing pipeline with a diameter of 1200 mm are shown. It is experimentally proved that the main mechanism of signal attenuation is the emission of energy into the environment. The numerical values of the attenuation coefficients, 0.14-0.18 dB/m for the 1200 mm diameter pipeline in the frequency range from 50 Hz to 500 Hz, are determined.
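
    As a worked illustration of the reported attenuation coefficients, the level remaining after propagation is simply the source level minus alpha times distance; the source level used in the example is hypothetical.

        def residual_level_db(source_level_db, distance_m, alpha_db_per_m=0.16):
            # Signal level after propagating a given distance along the pipeline.
            return source_level_db - alpha_db_per_m * distance_m

        # Example: at 0.16 dB/m a signal loses about 80 dB over 500 m.
        print(residual_level_db(120.0, 500.0))        # -> 40.0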

  15. Critique: auditory form and gestural topology in the perception of speech.

    PubMed

    Remez, R E

    1996-03-01

    Some influential accounts of speech perception have asserted that the goal of perception is to recover the articulatory gestures that create the acoustic signal, while others have proposed that speech perception proceeds by a method of acoustic categorization of signal elements. These accounts have been frustrated by difficulties in identifying a set of primitive articulatory constituents underlying speech production, and a set of primitive acoustic-auditory elements underlying speech perception. An argument by Lindblom favors an account of production and perception based on the auditory form of speech and its cognitive elaboration, rejecting the aim of defining a set of articulatory primitives by appealing to theoretical principle, while recognizing the empirical difficulty of identifying a set of acoustic or auditory primitives. An examination of this thesis found opportunities to defend some of its conclusions with independent evidence, but favors a characterization of the constituents of speech perception as linguistic rather than as articulatory or acoustic. PMID:8964930

  16. Identifying fatigue crack geometric features from acoustic emission signals

    NASA Astrophysics Data System (ADS)

    Bao, Jingjing; Poddar, Banibrata; Giurgiutiu, Victor

    2016-04-01

    Acoustic emission (AE) signals caused by the growth of fatigue cracks have been studied extensively. Conventional approaches are predominantly based on statistical analysis. In this study we focus on identifying geometric features of the crack from the AE signals using a physics-based approach. One of the main challenges of this approach is to develop a materials-physics-based understanding of the generation and propagation of acoustic emissions due to the growth of a fatigue crack. As the geometry changes due to crack growth, so do the local vibration modes around the crack. Our aim is to understand these changing local vibration modes and find a possible relation between the AE signal features and the geometric features of the crack. Finite element (FE) analysis was used to model AE events due to fatigue crack growth, using dipole excitation at the crack tips. Harmonic analysis was also performed on these FE models to understand the local vibration modes. An experimental study was carried out to verify these results: piezoelectric wafer active sensors (PWAS) were used to excite cracked specimens, and the local vibration modes were captured using laser Doppler vibrometry. The preliminary results show that the AE signals do carry information related to the crack geometry.

  17. Supervised Single-Channel Speech Separation via Sparse Decomposition Using Periodic Signal Models

    NASA Astrophysics Data System (ADS)

    Nakashizuka, Makoto; Okumura, Hiroyuki; Iiguni, Youji

    In this paper, we propose a method for supervised single-channel speech separation through sparse decomposition using periodic signal models. The proposed separation method employs sparse decomposition, which decomposes a signal into a set of periodic signals under a sparsity penalty. In order to achieve separation through sparse decomposition, the decomposed periodic signals have to be assigned to the corresponding sources. For the assignment of the periodic signals, we introduce clustering using a K-means algorithm to group the decomposed periodic signals into as many clusters as the number of speakers. After the clustering, each cluster is assigned to its corresponding speaker using codebooks learned in advance. Through separation experiments, we compare our method with MaxVQ, which performs separation in the frequency-spectrum domain. The experimental results in terms of signal-to-distortion ratio show that the proposed sparse decomposition method is comparable to the frequency domain approach and has a lower computational cost for the assignment of speech components.
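
    The clustering step described above can be pictured with the sketch below, in which decomposed periodic components, represented here by invented two-dimensional feature vectors, are grouped with K-means into as many clusters as there are speakers; the features and data are assumptions, not the paper's representation.

```python
# K-means grouping of decomposed periodic components into speaker clusters.
# The two-dimensional "features" (e.g., pitch and a spectral-envelope summary)
# and the synthetic data are assumptions made purely for illustration.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
speaker_a = rng.normal(loc=[120.0, 0.3], scale=[5.0, 0.05], size=(30, 2))
speaker_b = rng.normal(loc=[210.0, 0.7], scale=[8.0, 0.05], size=(30, 2))
components = np.vstack([speaker_a, speaker_b])   # one row per periodic component

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(components)
print(kmeans.labels_)   # cluster index assigned to each periodic component
```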

  18. Integrated speech enhancement for functional MRI environment.

    PubMed

    Pathak, Nishank; Milani, Ali A; Panahi, Issa; Briggs, Richard

    2009-01-01

    This paper presents an integrated speech enhancement (SE) method for the noisy MRI environment. We show that the performance of the SE system improves considerably when the speech signal, dominated by MRI acoustic noise at very low SNR, is enhanced in two successive stages: two-channel SE methods followed by a single-channel post-processing SE algorithm. Actual MRI noisy speech data are used in our experiments, showing the improved performance of the proposed SE method. PMID:19964964
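
    A minimal sketch of a single-channel post-processing stage of the kind that could follow a two-channel front end is given below, using plain spectral subtraction; the noise estimate from leading frames, the parameters, and the stand-in signal are all assumptions, not the paper's algorithm.

```python
# Simple spectral-subtraction post-filter (single-channel second stage).
# Noise is estimated from the first few frames; all settings are assumptions.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs, noise_frames=10, floor=0.05):
    f, t, X = stft(noisy, fs=fs, nperseg=512)
    noise_mag = np.mean(np.abs(X[:, :noise_frames]), axis=1, keepdims=True)
    mag = np.maximum(np.abs(X) - noise_mag, floor * np.abs(X))
    _, clean = istft(mag * np.exp(1j * np.angle(X)), fs=fs, nperseg=512)
    return clean

fs = 8000
noisy = np.random.randn(fs)        # stand-in for MRI-noise-dominated speech
enhanced = spectral_subtraction(noisy, fs)
print(enhanced.shape)
```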

  19. Primary acoustic signal structure during free falling drop collision with a water surface

    NASA Astrophysics Data System (ADS)

    Chashechkin, Yu. D.; Prokhorov, V. E.

    2016-04-01

    Consistent optical and acoustic techniques have been used to study the structure of hydrodynamic disturbances and acoustic signals generated as a free falling drop penetrates water. The relationship between the structures of hydrodynamic and acoustic perturbations arising as a result of a falling drop contacting with the water surface and subsequent immersion into water is traced. The primary acoustic signal is characterized, in addition to stably reproduced features (steep leading edge followed by long decay with local pressure maxima), by irregular high-frequency packets, which are studied for the first time. Reproducible experimental data are used to recognize constant and variable components of the primary acoustic signal.

  20. Acoustic changes in the production of lexical stress during Lombard speech.

    PubMed

    Arciuli, Joanne; Simpson, Briony S; Vogel, Adam P; Ballard, Kirrie J

    2014-06-01

    The Lombard effect describes the phenomenon of individuals increasing their vocal intensity when speaking in the presence of background noise. Here, we conducted an investigation of the production of lexical stress during Lombard speech. Participants (N = 27) produced the same sentences in three conditions: one quiet condition and two noise conditions at 70 dB (white noise; multi-talker babble). Manual acoustic analyses (syllable duration, vowel intensity, and vowel fundamental frequency) were completed for repeated productions of two trisyllabic words with opposing patterns of lexical stress (weak-strong; strong-weak) in each of the three conditions. In total, 324 productions were analysed (12 utterances per participant). Results revealed that, rather than increasing vocal intensity equally across syllables, participants alter the degree of stress contrastivity when speaking in noise. This was especially evident in the production of strong-weak lexical stress where there was an increase in contrastivity across syllables in terms of intensity and fundamental frequency. This preliminary study paves the way for further research that is needed to establish these findings using a larger set of multisyllabic stimuli. PMID:25102603
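
    The kind of syllable-level intensity contrast used to index lexical stress can be computed as sketched below; the segmentation into syllables and the stand-in audio are assumptions, not the study's data or analysis pipeline.

```python
# Syllable-level intensity contrast (dB) between a "strong" and a "weak"
# syllable. The synthetic syllables are stand-ins for manually segmented audio.
import numpy as np

def rms_db(x):
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

fs = 16000
strong = 0.30 * np.random.randn(int(0.20 * fs))   # louder, longer syllable
weak = 0.10 * np.random.randn(int(0.15 * fs))     # softer, shorter syllable
print(f"intensity contrast: {rms_db(strong) - rms_db(weak):.1f} dB")
```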

  1. Precursory acoustic signals and ground deformation in volcanic explosions

    NASA Astrophysics Data System (ADS)

    Bowman, D. C.; Kim, K.; Anderson, J.; Lees, J. M.; Taddeucci, J.; Graettinger, A. H.; Sonder, I.; Valentine, G.

    2013-12-01

    We investigate precursory acoustic signals that appear prior to volcanic explosions in real and experimental settings. Acoustic records of a series of experimental blasts designed to mimic maar explosions show precursory energy 0.02 to 0.05 seconds before the high amplitude overpressure arrival. These blasts consisted of 1 to 1/3 lb charges detonated in unconsolidated granular material at depths between 0.5 and 1 m, and were performed during the Buffalo Man Made Maars experiment in Springville, New York, USA. The preliminary acoustic arrival is 1 to 2 orders of magnitude lower in amplitude compared to the main blast wave. The waveforms vary from blast to blast, perhaps reflecting the different explosive yields and burial depths of each shot. Similar arrivals are present in some infrasound records at Santiaguito volcano, Guatemala, where they precede the main blast signal by about 2 seconds and are about 1 order of magnitude weaker. Precursory infrasound has also been described at Sakurajima volcano, Japan (Yokoo et al, 2013; Bull. Volc. Soc. Japan, 58, 163-181) and Suwanosejima volcano, Japan (Yokoo and Iguchi, 2010; JVGR, 196, 287-294), where it is attributed to rapid deformation of the vent region. Vent deformation has not been directly observed at these volcanoes because of the difficulty of visually observing the crater floor. However, particle image velocimetry of video records at Santiaguito has revealed rapid and widespread ground motion just prior to eruptions (Johnson et al, 2008; Nature, 456, 377-381) and may be the cause of much of the infrasound recorded at that volcano (Johnson and Lees, 2010; GRL, 37, L22305). High speed video records of the blasts during the Man Made Maars experiment also show rapid deformation of the ground immediately before the explosion plume breaches the surface. We examine the connection between source yield, burial depths, ground deformation, and the production of initial acoustic phases for each simulated maar explosion. We

  2. Computational principles underlying the recognition of acoustic signals in insects.

    PubMed

    Clemens, Jan; Hennig, R Matthias

    2013-08-01

    Many animals produce pulse-like signals during acoustic communication. These signals exhibit structure on two time scales: they consist of trains of pulses that are often broadcast in packets, so-called chirps. Temporal parameters of the pulse and of the chirp are decisive for female preference. Although these signals are produced by animals from many different taxa (e.g., frogs, grasshoppers, crickets, bushcrickets, flies), a general framework for their evaluation is still lacking. We propose such a framework, based on a simple and physiologically plausible model. The model consists of feature detectors, whose time-varying output is averaged over the signal and then linearly combined to yield the behavioral preference. We fitted this model to large data sets collected in two species of crickets and found that Gabor filters, known from visual and auditory physiology, explain the preference functions in these two species very well. We further explored the properties of Gabor filters and found a systematic relationship between parameters of the filters and the shape of preference functions. Although these Gabor filters were relatively short, they were also able to explain aspects of the preference for signal parameters on the longer time scale due to the integration step in our model. Our framework explains a wide range of phenomena associated with female preference for a widespread class of signals in an intuitive and physiologically plausible fashion. This approach thus constitutes a valuable tool to understand the functioning and evolution of communication systems in many species. PMID:23417450
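
    The model class described above can be sketched as follows: a temporal Gabor filter is convolved with a pulse-train stimulus, the rectified output is time-averaged, and a linear readout yields a preference score. The filter parameters, stimulus, and readout weights are illustrative assumptions, not the fitted values from the study.

```python
# Gabor-filter feature detector -> time-averaged output -> linear readout.
# All parameters and the pulse-train stimulus are invented for illustration.
import numpy as np

def gabor(t, sigma, freq, phase=0.0):
    return np.exp(-0.5 * (t / sigma) ** 2) * np.cos(2 * np.pi * freq * t + phase)

fs = 10000
t = np.arange(-0.05, 0.05, 1 / fs)            # 100 ms filter support
filt = gabor(t, sigma=0.01, freq=30.0)        # prefers ~30 Hz pulse rates

stim = np.zeros(fs)                           # 1 s stimulus
stim[::fs // 25] = 1.0                        # 25 Hz pulse train ("song")

response = np.convolve(stim, filt, mode="same")
feature = np.mean(np.maximum(response, 0.0))  # time-averaged rectified output
preference = 2.0 * feature - 0.1              # toy linear readout
print(preference)
```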

  3. Objective evaluation of speech signal quality by the prediction of multiple foreground diagnostic acceptability measure attributes.

    PubMed

    Sen, Deep; Lu, W

    2012-05-01

    A methodology is described to objectively diagnose the quality of speech signals by predicting the perceptual detectability of a selected set of distortions. The distortions are a statistically selected subset of the broad number of distortions used in diagnostic acceptability measure (DAM) testing. The justification for such a methodology is established from the analysis of a set of speech signals representing a broad set of distortions and their respective DAM scores. At the heart of the ability to isolate and diagnose the perceptibility of the individual distortions is a physiologically motivated cochlear model. The philosophy and methodology is thus distinct from traditional objective measures that are typically designed to predict mean opinion scores (MOS) using well versed functional psychoacoustic models. Even so, a weighted sum of these objectively predicted set of distortions is able to predict accurate and robust MOS scores, even when the reference speech signals have been subject to the Lombard effect. PMID:22559381

  4. Experimental investigation of the effects of the acoustical conditions in a simulated classroom on speech recognition and learning in children a

    PubMed Central

    Valente, Daniel L.; Plevinsky, Hallie M.; Franco, John M.; Heinrichs-Graham, Elizabeth C.; Lewis, Dawna E.

    2012-01-01

    The potential effects of acoustical environment on speech understanding are especially important as children enter school where students’ ability to hear and understand complex verbal information is critical to learning. However, this ability is compromised because of widely varied and unfavorable classroom acoustics. The extent to which unfavorable classroom acoustics affect children’s performance on longer learning tasks is largely unknown as most research has focused on testing children using words, syllables, or sentences as stimuli. In the current study, a simulated classroom environment was used to measure comprehension performance of two classroom learning activities: a discussion and lecture. Comprehension performance was measured for groups of elementary-aged students in one of four environments with varied reverberation times and background noise levels. The reverberation time was either 0.6 or 1.5 s, and the signal-to-noise level was either +10 or +7 dB. Performance is compared to adult subjects as well as to sentence-recognition in the same condition. Significant differences were seen in comprehension scores as a function of age and condition; both increasing background noise and reverberation degraded performance in comprehension tasks compared to minimal differences in measures of sentence-recognition. PMID:22280587

  5. Modern Techniques in Acoustical Signal and Image Processing

    SciTech Connect

    Candy, J V

    2002-04-04

    Acoustical signal processing problems can lead to some complex and intricate techniques to extract the desired information from noisy, sometimes inadequate, measurements. The challenge is to formulate a meaningful strategy that is aimed at performing the processing required even in the face of uncertainties. This strategy can be as simple as a transformation of the measured data to another domain for analysis or as complex as embedding a full-scale propagation model into the processor. The aims of both approaches are the same: to extract the desired information and reject the extraneous, that is, to develop a signal processing scheme that achieves this goal. In this paper, we briefly discuss this underlying philosophy from a "bottom-up" approach, enabling the problem to dictate the solution rather than vice versa.

  6. Language-Specific Developmental Differences in Speech Production: A Cross-Language Acoustic Study

    ERIC Educational Resources Information Center

    Li, Fangfang

    2012-01-01

    Speech productions of 40 English- and 40 Japanese-speaking children (aged 2-5) were examined and compared with the speech produced by 20 adult speakers (10 speakers per language). Participants were recorded while repeating words that began with "s" and "sh" sounds. Clear language-specific patterns in adults' speech were found, with English…

  7. Perceptually-driven signal analysis for acoustic event classification

    NASA Astrophysics Data System (ADS)

    Philips, Scott M.

    In many acoustic signal processing applications human listeners are able to outperform automated processing techniques, particularly in the identification and classification of acoustic events. The research discussed in this paper develops a framework for employing perceptual information from human listening experiments to improve automatic event classification. We focus on the identification of new signal attributes, or features, that are able to predict the human performance observed in formal listening experiments. Using this framework, our newly identified features have the ability to elevate automatic classification performance closer to the level of human listeners. We develop several new methods for learning a perceptual feature transform from human similarity measures. In addition to providing a more fundamental basis for uncovering perceptual features than previous approaches, these methods also lead to a greater insight into how humans perceive sounds in a dataset. We also develop a new approach for learning a perceptual distance metric. This metric is shown to be applicable to modern kernel-based techniques used in machine learning and provides a connection between the fields of psychoacoustics and machine learning. Our research demonstrates these new methods in the area of active sonar signal processing. There is anecdotal evidence within the sonar community that human operators are adept in the task of discriminating between active sonar target and clutter echoes. We confirm this ability in a series of formal listening experiments. With the results of these experiments, we then identify perceptual features and distance metrics using our novel methods. The results show better agreement with human performance than previous approaches. While this work demonstrates these methods using perceptual similarity measures from active sonar data, they are applicable to any similarity measure between signals.

  8. Applications of sub-audible speech recognition based upon electromyographic signals

    NASA Technical Reports Server (NTRS)

    Jorgensen, C. Charles (Inventor); Betts, Bradley J. (Inventor)

    2009-01-01

    Method and system for generating electromyographic or sub-audible signals ("SAWPs") and for transmitting and recognizing the SAWPs that represent the original words and/or phrases. The SAWPs may be generated in an environment that interferes excessively with normal speech or that requires stealth communications, and may be transmitted using encoded, enciphered or otherwise transformed signals that are less subject to signal distortion or degradation in the ambient environment.

  9. Interactions between distal speech rate, linguistic knowledge, and speech environment.

    PubMed

    Morrill, Tuuli; Baese-Berk, Melissa; Heffner, Christopher; Dilley, Laura

    2015-10-01

    During lexical access, listeners use both signal-based and knowledge-based cues, and information from the linguistic context can affect the perception of acoustic speech information. Recent findings suggest that the various cues used in lexical access are implemented with flexibility and may be affected by information from the larger speech context. We conducted 2 experiments to examine effects of a signal-based cue (distal speech rate) and a knowledge-based cue (linguistic structure) on lexical perception. In Experiment 1, we manipulated distal speech rate in utterances where an acoustically ambiguous critical word was either obligatory for the utterance to be syntactically well formed (e.g., Conner knew that bread and butter (are) both in the pantry) or optional (e.g., Don must see the harbor (or) boats). In Experiment 2, we examined identical target utterances as in Experiment 1 but changed the distribution of linguistic structures in the fillers. The results of the 2 experiments demonstrate that speech rate and linguistic knowledge about critical word obligatoriness can both influence speech perception. In addition, it is possible to alter the strength of a signal-based cue by changing information in the speech environment. These results provide support for models of word segmentation that include flexible weighting of signal-based and knowledge-based cues. PMID:25794478

  10. Signal Restoration of Non-stationary Acoustic Signals in the Time Domain

    NASA Technical Reports Server (NTRS)

    Babkin, Alexander S.

    1988-01-01

    Signal restoration is a method of transforming a nonstationary signal acquired by a ground-based microphone into an equivalent stationary signal. The benefit of signal restoration is a simplification of the flight test requirements, because it can dispense with the need to acquire acoustic data with another aircraft flying in concert with the rotorcraft. Data quality is also generally improved because contamination of the signal by propeller and wind noise is not present. The restoration methodology can also be combined with other data acquisition methods, such as a multiple linear microphone array, for further improvement of the test results. The methodology and software for performing signal restoration in the time domain are presented. The method has no restrictions on flight path geometry or flight regime. The only requirement is that the aircraft's spatial position be known relative to the microphone location and synchronized with the acoustic data. The restoration process assumes that the moving source radiates a stationary signal, which is then transformed into a nonstationary signal by various modulation processes. The restoration accounts only for the modulation due to the source motion.
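
    The core idea, mapping received samples back to emission time using the known source position and resampling onto a uniform grid, can be sketched as below; the flyover trajectory, speed of sound, and toy source signal are assumptions, not the report's implementation.

```python
# De-modulating source motion: received samples are indexed by their emission
# time (reception time minus propagation delay) and re-interpolated onto a
# uniform grid. Trajectory and source signal are illustrative assumptions.
import numpy as np

c = 343.0                                  # speed of sound, m/s (assumed)
fs = 4000
t_rx = np.arange(0, 2.0, 1 / fs)           # reception-time grid, 2 s

# Assumed straight, level flyover: 100 m altitude, 150 m/s ground speed
r = np.sqrt(100.0 ** 2 + (150.0 * (t_rx - 1.0)) ** 2)
t_emit = t_rx - r / c                      # emission time of each received sample

received = np.sin(2 * np.pi * 100.0 * t_emit)   # toy stationary 100 Hz source

# Restoration: interpolate received samples onto a uniform emission-time grid
t_uniform = np.linspace(t_emit[0], t_emit[-1], len(t_emit))
restored = np.interp(t_uniform, t_emit, received)
print(restored[:5])
```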

  11. Articulatory Mediation of Speech Perception: A Causal Analysis of Multi-Modal Imaging Data

    ERIC Educational Resources Information Center

    Gow, David W., Jr.; Segawa, Jennifer A.

    2009-01-01

    The inherent confound between the organization of articulation and the acoustic-phonetic structure of the speech signal makes it exceptionally difficult to evaluate the competing claims of motor and acoustic-phonetic accounts of how listeners recognize coarticulated speech. Here we use Granger causation analysis of high spatiotemporal resolution…

  12. Study on demodulated signal distribution and acoustic pressure phase sensitivity of a self-interfered distributed acoustic sensing system

    NASA Astrophysics Data System (ADS)

    Shang, Ying; Yang, Yuan-Hong; Wang, Chen; Liu, Xiao-Hui; Wang, Chang; Peng, Gang-Ding

    2016-06-01

    We propose a demodulated signal distribution theory for a self-interfered distributed acoustic sensing system. The distribution region of Rayleigh backscattering that contains the acoustic sensing signal in the sensing fiber is investigated theoretically under different combinations of the path difference and pulse width. Additionally, we determine the optimal combination of path difference and pulse width to obtain the maximum phase change per unit length. We experimentally test this theory and achieve a good acoustic pressure phase sensitivity of -150 dB re rad/(μPa·m) of fiber in the frequency range from 200 Hz to 1 kHz.

  13. Floc Growth and Changes in ADV Acoustic Backscatter Signal

    NASA Astrophysics Data System (ADS)

    Rouhnia, M.; Keyvani, A.; Strom, K.

    2013-12-01

    A series of experiments were conducted to examine the effect of mud floc growth on the acoustic back-scatter signal recorded by a Nortek Vector acoustic Doppler velocimeter (ADV). Several studies have shown that calibration equations can be developed to link the backscatter strength with average suspended sediment concentration (SSC) when the sediment particle size distribution remains constant. However, when mud is present, the process of flocculation can alter the suspended particle size distribution. Past studies have shown that it is still unclear as to the degree of dependence of the calibration equation on changes in floc size. Part of the ambiguity lies in the fact that flocs can be porous and rather loosely packed and therefore might not scatter to the same extent as a grain of sand. In addition, direct, detailed measurements of floc size have not accompanied experiments examining the dependence of ADV backscatter and suspended sediment concentration. In this research, a set of laboratory experiments is used to test how floc growth affects the backscatter strength. The laboratory data is examined in light of an analytic model that was developed based on scatter theory to account for changes in both SSC and the floc properties of size and density. For the experiments, a turbulent suspension was created in a tank with a rotating paddle. Fixed concentrations of a mixture of kaolinite and montmorillonite were added to the tank in a step-wise manner. For each step, the flocs were allowed to grow to their equilibrium size before breaking the flocs with high turbulent mixing, adding more sediment, and then returning the mixing rate to a range suitable for the re-growth of flocs. During each floc growth phase, data was simultaneously collected at the same elevation in the tank using a floc camera to capture the changes in floc size, a Nortek Vector ADV for the acoustic backscatter, and a Campbell Scientific OBS 3+ for optical backscatter. Physical samples of the

  14. Adaptive Plasticity in Wild Field Cricket’s Acoustic Signaling

    PubMed Central

    Bertram, Susan M.; Harrison, Sarah J.; Thomson, Ian R.; Fitzsimmons, Lauren P.

    2013-01-01

    Phenotypic plasticity can be adaptive when phenotypes are closely matched to changes in the environment. In crickets, rhythmic fluctuations in the biotic and abiotic environment regularly result in diel rhythms in density of sexually active individuals. Given that density strongly influences the intensity of sexual selection, we asked whether crickets exhibit plasticity in signaling behavior that aligns with these rhythmic fluctuations in the socio-sexual environment. We quantified the acoustic mate signaling behavior of wild-caught males of two cricket species, Gryllus veletis and G. pennsylvanicus. Crickets exhibited phenotypically plastic mate signaling behavior, with most males signaling more often and more attractively during the times of day when mating activity is highest in the wild. Most male G. pennsylvanicus chirped more often and louder, with shorter interpulse durations, pulse periods, chirp durations, and interchirp durations, and at slightly higher carrier frequencies during the time of the day that mating activity is highest in the wild. Similarly, most male G. veletis chirped more often, with more pulses per chirp, longer interpulse durations, pulse periods, and chirp durations, shorter interchirp durations, and at lower carrier frequencies during the time of peak mating activity in the wild. Among-male variation in signaling plasticity was high, with some males signaling in an apparently maladaptive manner. Body size explained some of the among-male variation in G. pennsylvanicus plasticity but not G. veletis plasticity. Overall, our findings suggest that crickets exhibit phenotypically plastic mate attraction signals that closely match the fluctuating socio-sexual context they experience. PMID:23935965

  15. Extended amplification of acoustic signals by amphibian burrows.

    PubMed

    Muñoz, Matías I; Penna, Mario

    2016-07-01

    Animals relying on acoustic signals for communication must cope with the constraints imposed by the environment for sound propagation. A resource to improve signal broadcast is the use of structures that favor the emission or the reception of sounds. We conducted playback experiments to assess the effect of the burrows occupied by the frogs Eupsophus emiliopugini and E. calcaratus on the amplitude of outgoing vocalizations. In addition, we evaluated the influence of these cavities on the reception of externally generated sounds potentially interfering with conspecific communication, namely, the vocalizations emitted by four syntopic species of anurans (E. emiliopugini, E. calcaratus, Batrachyla antartandica, and Pleurodema thaul) and the nocturnal owls Strix rufipes and Glaucidium nanum. Eupsophus advertisement calls emitted from within the burrows experienced average amplitude gains of 3-6 dB at 100 cm from the burrow openings. Likewise, the incoming vocalizations of amphibians and birds were amplified on average above 6 dB inside the cavities. The amplification of internally broadcast Eupsophus vocalizations favors signal detection by nearby conspecifics. Reciprocally, the amplification of incoming conspecific and heterospecific signals facilitates the detection of neighboring males and the monitoring of the levels of potentially interfering biotic noise by resident frogs, respectively. PMID:27209276

  16. Acoustic and perceptual correlates of faster-than-habitual speech produced by speakers with Parkinson's disease and Multiple Sclerosis

    PubMed Central

    Kuo, Christina; Tjaden, Kris; Sussman, Joan E.

    2014-01-01

    Acoustic-perceptual characteristics of a faster-than-habitual rate (Fast condition) were examined for speakers with Parkinson's disease (PD) and Multiple Sclerosis (MS). Judgments of intelligibility for sentences produced at a habitual rate (Habitual condition) and at a faster-than-habitual rate (Fast condition) by 46 speakers with PD or MS as well as a group of 32 healthy speakers revealed that the Fast condition was, on average, associated with decreased intelligibility. However, some speakers' intelligibility did not decline. To further understand the acoustic characteristics of varied intelligibility in the Fast condition for speakers with dysarthria, a subgroup of speakers with PD or MS whose intelligibility did not decline in the Fast condition (No Decline group, n = 8) and a subgroup of speakers with significantly declined intelligibility (Decline group, n = 8) were compared. Acoustic measures of global speech timing, suprasegmental characteristics, and utterance-level segmental characteristics for vocalics were examined for the two subgroups. Results suggest acoustic contributions to intelligibility under rate modulation are complex. Potential clinical relevance and implications for the acoustic bases of intelligibility are discussed. PMID:25287378

  17. Audiovisual Speech Perception in Children with Developmental Language Disorder in Degraded Listening Conditions

    ERIC Educational Resources Information Center

    Meronen, Auli; Tiippana, Kaisa; Westerholm, Jari; Ahonen, Timo

    2013-01-01

    Purpose: The effect of the signal-to-noise ratio (SNR) on the perception of audiovisual speech in children with and without developmental language disorder (DLD) was investigated by varying the noise level and the sound intensity of acoustic speech. The main hypotheses were that the McGurk effect (in which incongruent visual speech alters the…

  18. Acoustic and perceptual evaluation of category goodness of /t/ and /k/ in typical and misarticulated children's speech.

    PubMed

    Strömbergsson, Sofia; Salvi, Giampiero; House, David

    2015-06-01

    This investigation explores perceptual and acoustic characteristics of children's successful and unsuccessful productions of /t/ and /k/, with a specific aim of exploring perceptual sensitivity to phonetic detail, and the extent to which this sensitivity is reflected in the acoustic domain. Recordings were collected from 4- to 8-year-old children with a speech sound disorder (SSD) who misarticulated one of the target plosives, and compared to productions recorded from peers with typical speech development (TD). Perceptual responses were registered on a visual-analog scale ranging from "clear [t]" to "clear [k]." Statistical models of prototypical productions were built, based on spectral moments and discrete cosine transform features, and used in the scoring of SSD productions. In the perceptual evaluation, "clear substitutions" were rated as less prototypical than correct productions. Moreover, target-appropriate productions of /t/ and /k/ produced by children with SSD were rated as less prototypical than those produced by TD peers. The acoustical modeling could, to a large extent, discriminate between the gross categories /t/ and /k/, and scored the SSD utterances on a continuous scale that was largely consistent with the category of production. However, none of the methods exhibited the same sensitivity to phonetic detail as the human listeners. PMID:26093431
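
    Spectral moments of the kind named above can be computed as in the sketch below; the windowing, FFT length, and the stand-in burst signal are assumptions rather than the study's exact acoustic analysis.

```python
# First four spectral moments (centroid, spread, skewness, kurtosis) of a
# plosive-burst spectrum. The random "burst" is a stand-in for real audio.
import numpy as np

def spectral_moments(x, fs):
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    p = spec / spec.sum()                        # treat spectrum as a distribution
    centroid = np.sum(freqs * p)
    var = np.sum(((freqs - centroid) ** 2) * p)
    skew = np.sum(((freqs - centroid) ** 3) * p) / var ** 1.5
    kurt = np.sum(((freqs - centroid) ** 4) * p) / var ** 2
    return centroid, np.sqrt(var), skew, kurt

fs = 44100
burst = np.random.randn(1024)                    # stand-in for a /t/ or /k/ burst
print(spectral_moments(burst, fs))
```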

  19. Disordered speech disrupts conversational entrainment: a study of acoustic-prosodic entrainment and communicative success in populations with communication challenges.

    PubMed

    Borrie, Stephanie A; Lubold, Nichola; Pon-Barry, Heather

    2015-01-01

    Conversational entrainment, a pervasive communication phenomenon in which dialogue partners adapt their behaviors to align more closely with one another, is considered essential for successful spoken interaction. While well-established in other disciplines, this phenomenon has received limited attention in the field of speech pathology and the study of communication breakdowns in clinical populations. The current study examined acoustic-prosodic entrainment, as well as a measure of communicative success, in three distinctly different dialogue groups: (i) healthy native vs. healthy native speakers (Control), (ii) healthy native vs. foreign-accented speakers (Accented), and (iii) healthy native vs. dysarthric speakers (Disordered). Dialogue group comparisons revealed significant differences in how the groups entrain on particular acoustic-prosodic features, including pitch, intensity, and jitter. Most notably, the Disordered dialogues were characterized by significantly less acoustic-prosodic entrainment than the Control dialogues. Further, a positive relationship between entrainment indices and communicative success was identified. These results suggest that the study of conversational entrainment in speech pathology will have essential implications for both scientific theory and clinical application in this domain. PMID:26321996

  20. Acoustic Emission Signals in Thin Plates Produced by Impact Damage

    NASA Technical Reports Server (NTRS)

    Prosser, William H.; Gorman, Michael R.; Humes, Donald H.

    1999-01-01

    Acoustic emission (AE) signals created by impact sources in thin aluminum and graphite/epoxy composite plates were analyzed. Two different impact velocity regimes were studied. Low-velocity (less than 0.21 km/s) impacts were created with an airgun firing spherical steel projectiles (4.5 mm diameter). High-velocity (1.8 to 7 km/s) impacts were generated with a two-stage light-gas gun firing small cylindrical nylon projectiles (1.5 mm diameter). Both the impact velocity and impact angle were varied. The impacts did not penetrate the aluminum plates at either low or high velocities. For high-velocity impacts in composites, there were both impacts that fully penetrated the plate as well as impacts that did not. All impacts generated very large amplitude AE signals (1-5 V at the sensor), which propagated as plate (extensional and/or flexural) modes. In the low-velocity impact studies, the signal was dominated by a large flexural mode with only a small extensional mode component detected. As the impact velocity was increased within the low velocity regime, the overall amplitudes of both the extensional and flexural modes increased. In addition, a relative increase in the amplitude of high-frequency components of the flexural mode was also observed. Signals caused by high-velocity impacts that did not penetrate the plate contained both a large extensional and flexural mode component of comparable amplitudes. The signals also contained components of much higher frequency and were easily differentiated from those caused by low-velocity impacts. An interesting phenomenon was observed in that the large flexural mode component, seen in every other case, was absent from the signal when the impact particle fully penetrated through the composite plates.

  1. Speech-clarity judgments of hearing-aid-processed speech in noise: differing polar patterns and acoustic environments.

    PubMed

    Amlani, Amyn M; Rakerd, Brad; Punch, Jerry L

    2006-06-01

    This investigation assessed the extent to which listeners' preferences for hearing aid microphone polar patterns vary across listening environments, and whether normal-hearing and inexperienced and experienced hearing-impaired listeners differ in such preferences. Paired-comparison judgments of speech clarity (i.e. subjective speech intelligibility) were made monaurally for recordings of speech in noise processed by a commercially available hearing aid programmed with an omnidirectional and two directional polar patterns (cardioid and hypercardioid). Testing environments included a sound-treated room, a living room, and a classroom. Polar-pattern preferences were highly reliable and agreed closely across all three groups of listeners. All groups preferred listening in the sound-treated room over listening in the living room, and preferred listening in the living room over listening in the classroom. Each group preferred the directional patterns to the omnidirectional pattern in all room conditions. We observed no differences in preference judgments between the two directional patterns or between hearing-impaired listeners' extent of amplification experience. Overall, findings indicate that listeners perceived qualitative benefits from microphones having directional polar patterns. PMID:16777778

  2. The Effect of Asymmetrical Signal Degradation on Binaural Speech Recognition in Children and Adults.

    ERIC Educational Resources Information Center

    Rothpletz, Ann M.; Tharpe, Anne Marie; Grantham, D. Wesley

    2004-01-01

    To determine the effect of asymmetrical signal degradation on binaural speech recognition, 28 children and 14 adults were administered a sentence recognition task amidst multitalker babble. There were 3 listening conditions: (a) monaural, with mild degradation in 1 ear; (b) binaural, with mild degradation in both ears (symmetric degradation); and…

  3. Acceptance Noise Level: Effects of the Speech Signal, Babble, and Listener Language

    ERIC Educational Resources Information Center

    Shi, Lu-Feng; Azcona, Gabrielly; Buten, Lupe

    2015-01-01

    Purpose: The acceptable noise level (ANL) measure has gained much research/clinical interest in recent years. The present study examined how the characteristics of the speech signal and the babble used in the measure may affect the ANL in listeners with different native languages. Method: Fifteen English monolingual, 16 Russian-English bilingual,…

  4. Frontal top-down signals increase coupling of auditory low-frequency oscillations to continuous speech in human listeners.

    PubMed

    Park, Hyojin; Ince, Robin A A; Schyns, Philippe G; Thut, Gregor; Gross, Joachim

    2015-06-15

    Humans show a remarkable ability to understand continuous speech even under adverse listening conditions. This ability critically relies on dynamically updated predictions of incoming sensory information, but exactly how top-down predictions improve speech processing is still unclear. Brain oscillations are a likely mechanism for these top-down predictions [1, 2]. Quasi-rhythmic components in speech are known to entrain low-frequency oscillations in auditory areas [3, 4], and this entrainment increases with intelligibility [5]. We hypothesize that top-down signals from frontal brain areas causally modulate the phase of brain oscillations in auditory cortex. We use magnetoencephalography (MEG) to monitor brain oscillations in 22 participants during continuous speech perception. We characterize prominent spectral components of speech-brain coupling in auditory cortex and use causal connectivity analysis (transfer entropy) to identify the top-down signals driving this coupling more strongly during intelligible speech than during unintelligible speech. We report three main findings. First, frontal and motor cortices significantly modulate the phase of speech-coupled low-frequency oscillations in auditory cortex, and this effect depends on intelligibility of speech. Second, top-down signals are significantly stronger for left auditory cortex than for right auditory cortex. Third, speech-auditory cortex coupling is enhanced as a function of stronger top-down signals. Together, our results suggest that low-frequency brain oscillations play a role in implementing predictive top-down control during continuous speech perception and that top-down control is largely directed at left auditory cortex. This suggests a close relationship between (left-lateralized) speech production areas and the implementation of top-down control in continuous speech perception. PMID:26028433

  5. Speech perception as categorization

    PubMed Central

    Holt, Lori L.; Lotto, Andrew J.

    2010-01-01

    Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has resulted from these challenges. We focus here on issues and experiments that define open research questions relevant to phoneme categorization, arguing that SP is best understood as perceptual categorization, a position that places SP in direct contact with research from other areas of perception and cognition. PMID:20601702

  6. An acoustic feature-based similarity scoring system for speech rehabilitation assistance.

    PubMed

    Syauqy, Dahnial; Wu, Chao-Min; Setyawati, Onny

    2016-08-01

    The purpose of this study is to develop a tool to assist speech therapy and rehabilitation, focusing on automatic scoring based on the comparison of the patient's speech with normal speech on several aspects, including pitch, vowels, voiced-unvoiced segments, strident fricatives and sound intensity. The pitch estimation employed a cepstrum-based algorithm for its robustness; the vowel classification used a multilayer perceptron (MLP) to classify vowels from pitch and formants; and the strident fricative detection was based on the intensity and location of the major spectral peak and the presence of pitch in the segment. In order to evaluate the performance of the system, this study analyzed eight patients' speech recordings (four males, four females; 4-58 years old), which had been recorded in a previous study in cooperation with Taipei Veterans General Hospital and Taoyuan General Hospital. The pitch experiment showed that the cepstrum method had a gross pitch error of 5.3% over a total of 2086 frames. For vowel classification, the MLP method provided 93% accuracy for men, 87% for women and 84% for children. In total, 156 of the tool's grading results (81%) were consistent with the 192 audio and visual observations made by four experienced respondents. Implication for Rehabilitation Difficulties in communication may limit the ability of a person to transfer and exchange information. The fact that speech is one of the primary means of communication has encouraged the need for speech diagnosis and rehabilitation. Advances in technology in computer-assisted speech therapy (CAST) improve the quality and time efficiency of the diagnosis and treatment of these disorders. The present study attempted to develop a tool to assist speech therapy and rehabilitation that provides a simple interface allowing the assessment to be done even by the patient himself, without particular knowledge of speech processing, while at the
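
    A minimal sketch of the cepstrum-based pitch estimation mentioned above is given below; the frame length, search range, and the synthetic voiced frame are assumptions, not the tool's actual implementation.

```python
# Cepstrum-based pitch estimation: peak of the real cepstrum within a plausible
# quefrency range. Frame length, range, and test signal are assumptions.
import numpy as np

def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    cepstrum = np.fft.irfft(np.log(np.abs(spectrum) + 1e-12))
    qmin, qmax = int(fs / fmax), int(fs / fmin)   # quefrency search range
    peak = np.argmax(cepstrum[qmin:qmax]) + qmin
    return fs / peak

fs = 16000
t = np.arange(int(0.04 * fs)) / fs                # 40 ms frame
frame = np.sign(np.sin(2 * np.pi * 150 * t))      # crude 150 Hz voiced stand-in
print(cepstral_pitch(frame, fs))                  # expect roughly 150 Hz
```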

  7. Prosodic Contrasts in Ironic Speech

    ERIC Educational Resources Information Center

    Bryant, Gregory A.

    2010-01-01

    Prosodic features in spontaneous speech help disambiguate implied meaning not explicit in linguistic surface structure, but little research has examined how these signals manifest themselves in real conversations. Spontaneously produced verbal irony utterances generated between familiar speakers in conversational dyads were acoustically analyzed…

  8. Acoustic Analysis of Clear Versus Conversational Speech in Individuals with Parkinson Disease

    ERIC Educational Resources Information Center

    Goberman, A.M.; Elmer, L.W.

    2005-01-01

    A number of studies have been devoted to the examination of clear versus conversational speech in non-impaired speakers. The purpose of these previous studies has been primarily to help increase speech intelligibility for the benefit of hearing-impaired listeners. The goal of the present study was to examine differences between conversational and…

  9. A Computational Model of Word Segmentation from Continuous Speech Using Transitional Probabilities of Atomic Acoustic Events

    ERIC Educational Resources Information Center

    Rasanen, Okko

    2011-01-01

    Word segmentation from continuous speech is a difficult task that is faced by human infants when they start to learn their native language. Several studies indicate that infants might use several different cues to solve this problem, including intonation, linguistic stress, and transitional probabilities between subsequent speech sounds. In this…
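
    The transitional-probability statistic referred to above can be illustrated with the toy sketch below; the three-symbol "utterances" are an assumption made purely for illustration, not the model or corpus from the study.

```python
# Transitional probabilities P(next unit | current unit) from toy sequences.
# The symbol sequences below are invented stand-ins for acoustic-event streams.
from collections import Counter

utterances = ["abcabcabc", "bcabca", "cabcab"]

pair_counts, unit_counts = Counter(), Counter()
for u in utterances:
    for a, b in zip(u, u[1:]):
        pair_counts[(a, b)] += 1
        unit_counts[a] += 1

trans_prob = {pair: n / unit_counts[pair[0]] for pair, n in pair_counts.items()}
for (a, b), p in sorted(trans_prob.items()):
    print(f"P({b}|{a}) = {p:.2f}")
```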

  10. Acoustic Analysis of the Speech of Children with Cochlear Implants: A Longitudinal Study

    ERIC Educational Resources Information Center

    Liker, Marko; Mildner, Vesna; Sindija, Branka

    2007-01-01

    The aim of the study was to analyse the speech of the children with cochlear implants, and compare it with the speech of hearing controls. We focused on three categories of Croatian sounds: vowels (F1 and F2 frequencies), fricatives (noise frequencies of /s/ and /[esh]/ ), and affricates (total duration and the pattern of stop-fricative components…

  11. Designing acoustics for linguistically diverse classrooms: Effects of background noise, reverberation and talker foreign accent on speech comprehension by native and non-native English-speaking listeners

    NASA Astrophysics Data System (ADS)

    Peng, Zhao Ellen

    The current classroom acoustics standard (ANSI S12.60-2010) recommends core learning spaces not to exceed background noise level (BNL) of 35 dBA and reverberation time (RT) of 0.6 second, based on speech intelligibility performance mainly by the native English-speaking population. Existing literature has not correlated these recommended values well with student learning outcomes. With a growing population of non-native English speakers in American classrooms, the special needs for perceiving degraded speech among non-native listeners, either due to realistic room acoustics or talker foreign accent, have not been addressed in the current standard. This research seeks to investigate the effects of BNL and RT on the comprehension of English speech from native English and native Mandarin Chinese talkers as perceived by native and non-native English listeners, and to provide acoustic design guidelines to supplement the existing standard. This dissertation presents two studies on the effects of RT and BNL on more realistic classroom learning experiences. How do native and non-native English-speaking listeners perform on speech comprehension tasks under adverse acoustic conditions, if the English speech is produced by talkers of native English (Study 1) versus native Mandarin Chinese (Study 2)? Speech comprehension materials were played back in a listening chamber to individual listeners: native and non-native English-speaking in Study 1; native English, native Mandarin Chinese, and other non-native English-speaking in Study 2. Each listener was screened for baseline English proficiency level, and completed dual tasks simultaneously involving speech comprehension and adaptive dot-tracing under 15 acoustic conditions, comprised of three BNL conditions (RC-30, 40, and 50) and five RT scenarios (0.4 to 1.2 seconds). The results show that BNL and RT negatively affect both objective performance and subjective perception of speech comprehension, more severely for non

  12. Ocean acoustic signal processing: A model-based approach

    SciTech Connect

    Candy, J.V. ); Sullivan, E.J. )

    1992-12-01

    A model-based approach is proposed to solve the ocean acoustic signal processing problem that is based on a state-space representation of the normal-mode propagation model. It is shown that this representation can be utilized to spatially propagate both modal (depth) and range functions given the basic parameters (wave numbers, etc.) developed from the solution of the associated boundary value problem. This model is then generalized to the stochastic case where an approximate Gauss-Markov model evolves. The Gauss-Markov representation, in principle, allows the inclusion of stochastic phenomena such as noise and modeling errors in a consistent manner. Based on this framework, investigations are made of model-based solutions to the signal enhancement, detection and related parameter estimation problems. In particular, a modal/pressure field processor is designed that allows in situ recursive estimation of the sound velocity profile. Finally, it is shown that the associated residual or so-called innovation sequence that ensues from the recursive nature of this formulation can be employed to monitor the model's fit to the data and also form the basis of a sequential detector.
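
    A minimal sketch of the recursive (Kalman-type) estimation and innovation monitoring that such a Gauss-Markov formulation enables is shown below; the scalar random-walk state model and noise levels are illustrative stand-ins, not the normal-mode processor of the paper.

```python
# Scalar Kalman filter with normalized-innovation monitoring. A well-matched
# model yields roughly zero-mean, unit-variance innovations. All parameters
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
F, H, Q, R = 1.0, 1.0, 1e-4, 1e-2          # assumed state/measurement model

x_true, x_est, P = 0.5, 0.0, 1.0
innovations = []
for _ in range(200):
    x_true = F * x_true + rng.normal(0.0, np.sqrt(Q))    # state evolution
    z = H * x_true + rng.normal(0.0, np.sqrt(R))         # measurement
    x_pred, P_pred = F * x_est, F * P * F + Q            # predict
    nu = z - H * x_pred                                  # innovation (residual)
    S = H * P_pred * H + R
    K = P_pred * H / S
    x_est, P = x_pred + K * nu, (1.0 - K * H) * P_pred   # update
    innovations.append(nu / np.sqrt(S))

print(np.mean(innovations), np.var(innovations))
```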

  13. Acoustic emission source localization based on distance domain signal representation

    NASA Astrophysics Data System (ADS)

    Gawronski, M.; Grabowski, K.; Russek, P.; Staszewski, W. J.; Uhl, T.; Packo, P.

    2016-04-01

    Acoustic emission (AE) is a vital non-destructive testing technique and is widely used in industry for damage detection, localisation and characterization. The latter two aspects are particularly challenging, as AE data are typically noisy. What is more, elastic waves generated by an AE event propagate through a structural path and are significantly distorted. This effect is particularly prominent for thin elastic plates. In these media the dispersion phenomenon results in severe localisation and characterization issues. Traditional Time Difference of Arrival localisation methods typically fail when signals are highly dispersive. Hence, algorithms capable of dispersion compensation are sought. This paper presents a method based on the Time-Distance Domain Transform for accurate AE event localisation. The source location is found by solving a minimization problem. The proposed technique focuses on transforming the time signal into the distance-domain response that would be recorded at the source. Only basic elastic material properties and the plate thickness are used in the approach, avoiding arbitrary parameter tuning.

  14. Signal processing for passive detection and classification of underwater acoustic signals

    NASA Astrophysics Data System (ADS)

    Chung, Kil Woo

    2011-12-01

    This dissertation examines signal processing for passive detection, classification and tracking of underwater acoustic signals for improving port security and the security of coastal and offshore operations. First, we consider the problem of passive acoustic detection of a diver in a shallow water environment. A frequency-domain multi-band matched-filter approach to swimmer detection is presented. The idea is to break the frequency contents of the hydrophone signals into multiple narrow frequency bands, followed by time averaged (about half of a second) energy calculation over each band. Then, spectra composed of such energy samples over the chosen frequency bands are correlated to form a decision variable. The frequency bands with highest Signal/Noise ratio are used for detection. The performance of the proposed approach is demonstrated for experimental data collected for a diver in the Hudson River. We also propose a new referenceless frequency-domain multi-band detector which, unlike other reference-based detectors, does not require a diver specific signature. Instead, our detector matches to a general feature of the diver spectrum in the high frequency range: the spectrum is roughly periodic in time and approximately flat when the diver exhales. The performance of the proposed approach is demonstrated by using experimental data collected from the Hudson River. Moreover, we present detection, classification and tracking of small vessel signals. Hydroacoustic sensors can be applied for the detection of noise generated by vessels, and this noise can be used for vessel detection, classification and tracking. This dissertation presents recent improvements aimed at the measurement and separation of ship DEMON (Detection of Envelope Modulation on Noise) acoustic signatures in busy harbor conditions. Ship signature measurements were conducted in the Hudson River and NY Harbor. The DEMON spectra demonstrated much better temporal stability compared with the full ship
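
    The multi-band energy idea can be sketched as below: band energies averaged over roughly half a second are correlated against a reference spectral template to form a decision variable. The band edges, template, threshold, and stand-in hydrophone data are assumptions, not the dissertation's settings.

```python
# Frequency-domain multi-band energy detector: ~0.5 s averaged band energies
# correlated with a reference template. All settings are illustrative.
import numpy as np
from scipy.signal import stft

def band_energies(x, fs, bands, seg_s=0.5, nperseg=1024):
    f, _, X = stft(x, fs=fs, nperseg=nperseg)      # default hop = nperseg // 2
    power = np.abs(X) ** 2
    frames_per_seg = max(1, int(seg_s * fs / (nperseg // 2)))
    out = []
    for lo, hi in bands:
        band = power[(f >= lo) & (f < hi)].sum(axis=0)
        n = len(band) // frames_per_seg * frames_per_seg
        out.append(band[:n].reshape(-1, frames_per_seg).mean(axis=1))
    return np.array(out)                           # shape: (bands, segments)

fs = 50000
bands = [(10000, 15000), (15000, 20000), (20000, 25000)]   # assumed bands
template = np.array([1.0, 0.8, 0.6])                       # assumed signature

x = np.random.randn(10 * fs)                               # stand-in hydrophone data
E = band_energies(x, fs, bands)
scores = [np.corrcoef(template, E[:, k])[0, 1] for k in range(E.shape[1])]
print([k for k, s in enumerate(scores) if s > 0.9])        # assumed threshold
```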

  15. On-Line Acoustic and Semantic Interpretation of Talker Information

    ERIC Educational Resources Information Center

    Creel, Sarah C.; Tumlin, Melanie A.

    2011-01-01

    Recent work demonstrates that listeners utilize talker-specific information in the speech signal to inform real-time language processing. However, there are multiple representational levels at which this may take place. Listeners might use acoustic cues in the speech signal to access the talker's identity and information about what they tend to…

  16. A Fibre Bragg Grating Sensor as a Receiver for Acoustic Communications Signals

    PubMed Central

    Wild, Graham; Hinckley, Steven

    2011-01-01

    A Fibre Bragg Grating (FBG) acoustic sensor is used as a receiver for acoustic communications signals. Acoustic transmissions were generated in aluminium and Carbon Fibre Composite (CFC) panels. The FBG receiver was coupled to the bottom surface opposite a piezoelectric transmitter. For the CFC, a second FBG was embedded within the layup for comparison. We show the transfer function, frequency response, and transient response of the acoustic communications channels. In addition, the FBG receiver was used to detect Phase Shift Keying (PSK) communications signals, which was shown to be the most robust method in a highly resonant communications channel. PMID:22346585

  17. Disordered speech disrupts conversational entrainment: a study of acoustic-prosodic entrainment and communicative success in populations with communication challenges

    PubMed Central

    Borrie, Stephanie A.; Lubold, Nichola; Pon-Barry, Heather

    2015-01-01

    Conversational entrainment, a pervasive communication phenomenon in which dialogue partners adapt their behaviors to align more closely with one another, is considered essential for successful spoken interaction. While well-established in other disciplines, this phenomenon has received limited attention in the field of speech pathology and the study of communication breakdowns in clinical populations. The current study examined acoustic-prosodic entrainment, as well as a measure of communicative success, in three distinctly different dialogue groups: (i) healthy native vs. healthy native speakers (Control), (ii) healthy native vs. foreign-accented speakers (Accented), and (iii) healthy native vs. dysarthric speakers (Disordered). Dialogue group comparisons revealed significant differences in how the groups entrain on particular acoustic–prosodic features, including pitch, intensity, and jitter. Most notably, the Disordered dialogues were characterized by significantly less acoustic-prosodic entrainment than the Control dialogues. Further, a positive relationship between entrainment indices and communicative success was identified. These results suggest that the study of conversational entrainment in speech pathology will have essential implications for both scientific theory and clinical application in this domain. PMID:26321996

  18. Integrated Spacesuit Audio System Enhances Speech Quality and Reduces Noise

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng Arden; Chen, Jingdong; Chen, Shaoyan Sharyl

    2009-01-01

    A new approach has been proposed for increasing astronaut comfort and speech capture. Currently, the special design of a spacesuit forms an extreme acoustic environment making it difficult to capture clear speech without compromising comfort. The proposed Integrated Spacesuit Audio (ISA) system is to incorporate the microphones into the helmet and use software to extract voice signals from background noise.

  19. Method of detection, classification, and identification of objects employing acoustic signal analysis

    NASA Astrophysics Data System (ADS)

    Orzanowski, Tomasz; Madura, Henryk; Sosnowski, Tomasz; Chmielewski, Krzysztof

    2008-10-01

    The methods of detection and identification of objects based on acoustic signal analysis are used in many applications, e.g., alarm systems, military battlefield reconnaissance systems, intelligent ammunition, and others. The construction of technical objects such as vehicles or helicopters makes it possible to identify them on the basis of the acoustic signals they generate. In this paper a method of automatic detection, classification and identification of military vehicles and helicopters using digital analysis of acoustic signals is presented. The method offers a relatively high probability of object detection in the presence of other disturbing acoustic signals. Moreover, it provides a low probability of false classification and identification of the object. The application of this method to an acoustic sensor for an anti-helicopter mine is also presented.

  20. Effect of Digital Frequency Compression (DFC) on Speech Recognition in Candidates for Combined Electric and Acoustic Stimulation (EAS)

    PubMed Central

    Gifford, René H.; Dorman, Michael F.; Spahr, Anthony J.; McKarns, Sharon A.

    2008-01-01

    Purpose To compare the effects of conventional amplification (CA) and digital frequency compression (DFC) amplification on the speech recognition abilities of candidates for a partial-insertion cochlear implant, that is, candidates for combined electric and acoustic stimulation (EAS). Method The participants were 6 patients whose audiometric thresholds at 500 Hz and below were ≤60 dB HL and whose thresholds at 2000 Hz and above were ≥80 dB HL. Six tests of speech understanding were administered with CA and DFC. The Abbreviated Profile of Hearing Aid Benefit (APHAB) was also administered following use of CA and DFC. Results Group mean scores were not statistically different in the CA and DFC conditions. However, 2 patients received substantial benefit in DFC conditions. APHAB scores suggested increased ease of communication, but also increased aversive sound quality. Conclusion Results suggest that a relatively small proportion of individuals who meet EAS candidacy will receive substantial benefit from a DFC hearing aid and that a larger proportion will receive at least a small benefit when speech is presented against a background of noise. This benefit, however, comes at a cost—aversive sound quality. PMID:17905905

  1. Analysis of speech signals' characteristics based on MF-DFA with moving overlapping windows

    NASA Astrophysics Data System (ADS)

    Zhao, Huan; He, Shaofang

    2016-01-01

    In this paper, multi-fractal characteristics of speech signals are analyzed based on MF-DFA. It is found that the multi-fractal features are strongly influenced by frame length and noise and, moreover, that they differ slightly between speech frames. Secondly, motivated by framing and the use of frame shift to ensure the continuity and smooth transition of speech in speech signal processing, an advanced MF-DFA (MF-DFA with forward-moving overlapping windows) is proposed. The length of the moving overlapping windows is determined by a parameter θ. Given the value of the time scale s, we obtain MF-DFA with maximally overlapping windows when θ = 1/s and MF-DFA with half-overlapping windows when θ = 1/2; when θ = 1 we recover the standard MF-DFA exactly. Numerical experiments and analysis illustrate that the multi-fractal characteristics based on the advanced MF-DFA (AMF-DFA) outperform those based on MF-DFA and MF-DMA in stability, noise immunity and discrimination.
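
    A minimal sketch of the windowing idea follows, assuming the step between overlapping windows is θ·s (so θ = 1 recovers ordinary non-overlapping MF-DFA and θ = 1/s gives the maximally overlapping case); the function name mfdfa_overlap and the parameter defaults are ours, not the authors' exact formulation.

```python
import numpy as np

def mfdfa_overlap(x, scales, q_list, theta=0.5, order=1):
    """Multifractal DFA with forward-moving overlapping windows (sketch)."""
    profile = np.cumsum(x - np.mean(x))                # integrated, demeaned series
    Fq = np.zeros((len(q_list), len(scales)))
    for j, s in enumerate(scales):
        step = max(1, int(round(theta * s)))           # overlap controlled by theta
        f2 = []
        for i in range(0, len(profile) - s + 1, step):
            seg = profile[i:i + s]
            t = np.arange(s)
            trend = np.polyval(np.polyfit(t, seg, order), t)
            f2.append(np.mean((seg - trend) ** 2))     # local detrended variance
        f2 = np.asarray(f2)
        for k, q in enumerate(q_list):
            if q == 0:
                Fq[k, j] = np.exp(0.5 * np.mean(np.log(f2)))
            else:
                Fq[k, j] = np.mean(f2 ** (q / 2)) ** (1.0 / q)
    return Fq

# Toy usage: white noise should give generalized Hurst exponents h(q) near 0.5,
# obtained from the slopes of log Fq versus log s.
x = np.random.default_rng(0).normal(size=4096)
F = mfdfa_overlap(x, scales=[16, 32, 64, 128, 256], q_list=[-2, 0, 2], theta=0.5)
```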

  2. Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language

    PubMed Central

    Narayanan, Shrikanth; Georgiou, Panayiotis G.

    2013-01-01

    The expression and experience of human behavior are complex and multimodal and characterized by individual and contextual heterogeneity and variability. Speech and spoken language communication cues offer an important means for measuring and modeling human behavior. Observational research and practice across a variety of domains from commerce to healthcare rely on speech- and language-based informatics for crucial assessment and diagnostic information and for planning and tracking response to an intervention. In this paper, we describe some of the opportunities as well as emerging methodologies and applications of human behavioral signal processing (BSP) technology and algorithms for quantitatively understanding and modeling typical, atypical, and distressed human behavior with a specific focus on speech- and language-based communicative, affective, and social behavior. We describe the three important BSP components of acquiring behavioral data in an ecologically valid manner across laboratory to real-world settings, extracting and analyzing behavioral cues from measured data, and developing models offering predictive and decision-making support. We highlight both the foundational speech and language processing building blocks as well as the novel processing and modeling opportunities. Using examples drawn from specific real-world applications ranging from literacy assessment and autism diagnostics to psychotherapy for addiction and marital well being, we illustrate behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion. PMID:24039277

  3. Sub-Audible Speech Recognition Based upon Electromyographic Signals

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles C. (Inventor); Lee, Diana D. (Inventor); Agabon, Shane T. (Inventor)

    2012-01-01

    Method and system for processing and identifying a sub-audible signal formed by a source of sub-audible sounds. Sequences of samples of sub-audible sound patterns ("SASPs") for known words/phrases in a selected database are received for overlapping time intervals, and Signal Processing Transforms ("SPTs") are formed for each sample, as part of a matrix of entry values. The matrix is decomposed into contiguous, non-overlapping two-dimensional cells of entries, and neural net analysis is applied to estimate reference sets of weight coefficients that provide sums with optimal matches to reference sets of values. The reference sets of weight coefficients are used to determine a correspondence between a new (unknown) word/phrase and a word/phrase in the database.

  4. Optimization of speech in noise with three signal processing algorithms for normal-hearing and hearing-impaired subjects

    NASA Astrophysics Data System (ADS)

    Franck, Bas A. M.; Dreschler, Wouter A.; Lyzenga, Johannes

    2002-05-01

    In this study a three-dimensional Simplex procedure was applied to optimize speech in noise by a combination of signal processing algorithms for different acoustic conditions and hearing losses. The algorithms used to span the three dimensions are noise reduction, spectral tilting, and spectral enhancement, respectively. Additionally, we studied the algorithms for their main effects and interaction effects within the optimization process. The subjects were asked to evaluate two consecutive, differently processed sentences on listening comfort. Three different noise types and two signal-to-noise ratios (S/N) were used. Three groups of subjects participated: normal hearing, normal hearing with simulated impaired auditory filtering (by spectral smearing), and sensorineurally hearing-impaired subjects. For the normal-hearing group we applied S/N=0 dB. For the hearing-impaired and the simulated hearing-impaired subjects we applied S/N=5 dB. We will discuss the similarities and differences in the response patterns of the three groups. Also, the individual preferences will be related to the hearing capacity, and to the type of interfering noise. Finally, we will discuss differences in the perceptual features that are used to judge listening comfort of the fragments by normal-hearing and hearing-impaired subjects.

  5. Recognition of information-bearing elements in speech

    NASA Astrophysics Data System (ADS)

    Hermansky, Hynek

    2003-10-01

    An acoustic speech signal carries many different kinds of information: the basic linguistic message, many characteristics of the speaker of the message, details of the environment in which the message was produced and transmitted, etc. The human auditory/cognitive system is able to detect, decode, and separate all these information sources. Understanding this ability and emulating it on a machine has been an important but elusive scientific and engineering goal for a long time. This talk critically surveys the situation in the speech recognition field. It puts automatic recognition of speech in perspective with other acoustic signal detection and classification tasks, reviews some historical, contemporary, and evolving techniques for machine recognition of speech, critically compares competing techniques, and gives some examples of applications in speech, speaker, and language recognition and identification. The talk is intended for an audience interested but not directly involved in the processing of speech.

  6. Filtering of Acoustic Signals within the Hearing Organ

    PubMed Central

    Ramamoorthy, Sripriya; Chen, Fangyi; Jacques, Steven L.; Wang, Ruikang; Choudhury, Niloy; Fridberger, Anders

    2014-01-01

    The detection of sound by the mammalian hearing organ involves a complex mechanical interplay among different cell types. The inner hair cells, which are the primary sensory receptors, are stimulated by the structural vibrations of the entire organ of Corti. The outer hair cells are thought to modulate these sound-evoked vibrations to enhance hearing sensitivity and frequency resolution, but it remains unclear whether other structures also contribute to frequency tuning. In the current study, sound-evoked vibrations were measured at the stereociliary side of inner and outer hair cells and their surrounding supporting cells, using optical coherence tomography interferometry in living anesthetized guinea pigs. Our measurements demonstrate the presence of multiple vibration modes as well as significant differences in frequency tuning and response phase among different cell types. In particular, the frequency tuning at the inner hair cells differs from other cell types, causing the locus of maximum inner hair cell activation to be shifted toward the apex of the cochlea compared with the outer hair cells. These observations show that additional processing and filtering of acoustic signals occur within the organ of Corti before inner hair cell excitation, representing a departure from established theories. PMID:24990925

  7. MASS-DEPENDENT BARYON ACOUSTIC OSCILLATION SIGNAL AND HALO BIAS

    SciTech Connect

    Wang Qiao; Zhan Hu

    2013-05-10

    We characterize the baryon acoustic oscillations (BAO) feature in halo two-point statistics using N-body simulations. We find that nonlinear damping of the BAO signal is less severe for halos in the mass range we investigate than for dark matter. The amount of damping depends weakly on the halo mass. The correlation functions show a mass-dependent drop of the halo clustering bias below roughly 90 h^-1 Mpc, which coincides with the scale of the BAO trough. The drop of bias is 4% for halos with mass M > 10^14 h^-1 M_Sun and reduces to roughly 2% for halos with mass M > 10^13 h^-1 M_Sun. In contrast, halo biases in simulations without BAO change more smoothly around 90 h^-1 Mpc. In Fourier space, the bias of M > 10^14 h^-1 M_Sun halos decreases smoothly by 11% from wavenumber k = 0.012 h Mpc^-1 to 0.2 h Mpc^-1, whereas that of M > 10^13 h^-1 M_Sun halos decreases by less than 4% over the same range. By comparing the halo biases in pairs of otherwise identical simulations, one with and the other without BAO, we also observe a modulation of the halo bias. These results suggest that precise calibrations of the mass-dependent BAO signal and scale-dependent bias on large scales would be needed for interpreting precise measurements of the two-point statistics of clusters or massive galaxies in the future.

  8. Acoustic Variations in Adductor Spasmodic Dysphonia as a Function of Speech Task.

    ERIC Educational Resources Information Center

    Sapienza, Christine M.; Walton, Suzanne; Murry, Thomas

    1999-01-01

    Acoustic phonatory events were identified in 14 women diagnosed with adductor spasmodic dysphonia (ADSD), a focal laryngeal dystonia that disturbs phonatory function, and compared with those of 14 age-matched women with no vocal dysfunction. Findings indicated ADSD subjects produced more aberrant acoustic events than controls during tasks of…

  9. The effects of hearing protectors on speech communication and the perception of warning signals

    NASA Astrophysics Data System (ADS)

    Suter, Alice H.

    1989-06-01

    Because hearing protectors attenuate the noise and signal by equal amounts within a given frequency range, reducing both to a level where there is less likelihood of distortion, they often provide improved listening conditions. The crossover level from disadvantage to advantage usually occurs between 80 and 90 dB. However, hearing protectors may adversely affect speech recognition under a variety of conditions. For hearing-impaired listeners, whose average hearing levels at 2000, 3000, and 4000 Hz exceed 30 dB, certain speech sounds will fall below the level of audibility. Visual cues may decrease the disadvantage imposed by hearing protectors. However, the Occlusion Effect, which decreases vocal output when the talker wears protection, adversely affects the listener's speech recognition. The poorest performance occurs when both talkers and listeners wear protectors. Hearing protectors affect warning signal perception in a similar manner. Again the crossover level seems to be between 80 and 90 dB, and there is greater degradation for individuals with impaired hearing. Earmuffs appear to pose greater problems than plugs, and this is especially true of difficulties in signal localization. Earplugs produce mainly front-back localization errors, while earmuffs produce left-right localization errors as well. Earmuffs also drastically impede localization in the vertical plane.

  10. Direct classification of all American English phonemes using signals from functional speech motor cortex

    NASA Astrophysics Data System (ADS)

    Mugler, Emily M.; Patton, James L.; Flint, Robert D.; Wright, Zachary A.; Schuele, Stephan U.; Rosenow, Joshua; Shih, Jerry J.; Krusienski, Dean J.; Slutzky, Marc W.

    2014-06-01

    Objective. Although brain-computer interfaces (BCIs) can be used in several different ways to restore communication, communicative BCI has not approached the rate or efficiency of natural human speech. Electrocorticography (ECoG) has precise spatiotemporal resolution that enables recording of brain activity distributed over a wide area of cortex, such as during speech production. In this study, we sought to decode elements of speech production using ECoG. Approach. We investigated words that contain the entire set of phonemes in the general American accent using ECoG with four subjects. Using a linear classifier, we evaluated the degree to which individual phonemes within each word could be correctly identified from cortical signal. Main results. We classified phonemes with up to 36% accuracy when classifying all phonemes and up to 63% accuracy for a single phoneme. Further, misclassified phonemes follow articulation organization described in phonology literature, aiding classification of whole words. Precise temporal alignment to phoneme onset was crucial for classification success. Significance. We identified specific spatiotemporal features that aid classification, which could guide future applications. Word identification was equivalent to information transfer rates as high as 3.0 bits s^-1 (33.6 words min^-1), supporting pursuit of speech articulation for BCI control.
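
    The classification step can be sketched as follows with a generic linear classifier (linear discriminant analysis standing in for whichever linear model the authors used); the feature matrix here is random placeholder data, since the ECoG feature extraction itself is not reproduced.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Stand-in data: feature vectors would be phoneme-onset-aligned ECoG activity
# in a real analysis; here they are random, so accuracy should sit near chance.
rng = np.random.default_rng(0)
n_trials, n_features, n_phonemes = 400, 60, 8
X = rng.normal(size=(n_trials, n_features))      # placeholder feature matrix
y = rng.integers(0, n_phonemes, size=n_trials)   # placeholder phoneme labels

clf = LinearDiscriminantAnalysis()
scores = cross_val_score(clf, X, y, cv=5)        # chance level is 1 / n_phonemes
print(f"mean cross-validated accuracy: {scores.mean():.2f}")
```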

  11. Intensity Accents in French 2 Year Olds' Speech.

    ERIC Educational Resources Information Center

    Allen, George D.

    The acoustic features and functions of accentuation in French are discussed, and features of accentuation in the speech of French 2-year-olds are explored. The four major acoustic features used to signal accentual distinctions are fundamental frequency of voicing, duration of segments and syllables, intensity of segments and syllables, and…

  12. A human vocal utterance corpus for perceptual and acoustic analysis of speech, singing, and intermediate vocalizations

    NASA Astrophysics Data System (ADS)

    Gerhard, David

    2002-11-01

    In this paper we present the collection and annotation process of a corpus of human utterance vocalizations used for speech and song research. The corpus was collected to fill a void in current research tools, since no corpus currently exists which is useful for the classification of intermediate utterances between speech and monophonic singing. Much work has been done in the domain of speech versus music discrimination, and several corpora exist which can be used for this research. A specific example is the work done by Eric Scheirer and Malcolm Slaney [IEEE ICASSP, 1997, pp. 1331-1334]. The collection of the corpus is described including questionnaire design and intended and actual response characteristics, as well as the collection and annotation of pre-existing samples. The annotation of the corpus consisted of a survey tool for a subset of the corpus samples, including ratings of the clips based on a speech-song continuum, and questions on the perceptual qualities of speech and song, both generally and corresponding to particular clips in the corpus.

  13. Study of acoustic emission signals during fracture shear deformation

    NASA Astrophysics Data System (ADS)

    Ostapchuk, A. A.; Pavlov, D. V.; Markov, V. K.; Krasheninnikov, A. V.

    2016-07-01

    We study acoustic manifestations of different regimes of shear deformation of a fracture filled with a thin layer of granular material. It is established that the observed acoustic portrait is determined by the structure of the fracture at the mesolevel. Joint analysis of the activity of acoustic pulses and their spectral characteristics makes it possible to construct the pattern of internal evolutionary processes occurring in the thin layer of the interblock contact and consider the fracture deformation process as the evolution of a self-organizing system.

  14. Synergy of seismic, acoustic, and video signals in blast analysis

    SciTech Connect

    Anderson, D.P.; Stump, B.W.; Weigand, J.

    1997-09-01

    The range of mining applications from hard rock quarrying to coal exposure to mineral recovery leads to a great variety of blasting practices. A common characteristic of many of the sources is that they are detonated at or near the earth's surface and thus can be recorded by camera or video. Although the primary interest is in the seismic waveforms that these blasts generate, the visual observations of the blasts provide important constraints that can be applied to the physical interpretation of the seismic source function. In particular, high speed images can provide information on detonation times of individual charges, the timing and amount of mass movement during the blasting process and, in some instances, evidence of wave propagation away from the source. All of these characteristics can be valuable in interpreting the equivalent seismic source function for a set of mine explosions and quantifying the relative importance of the different processes. This paper documents work done at the Los Alamos National Laboratory and Southern Methodist University to take standard Hi-8 video of mine blasts, recover digital images from them, and combine them with ground motion records for interpretation. The steps in the data acquisition, processing, display, and interpretation are outlined. The authors conclude that the combination of video with seismic and acoustic signals can be a powerful diagnostic tool for the study of blasting techniques and seismology. A low cost system for generating similar diagnostics using consumer-grade video camera and direct-to-disk video hardware is proposed. Application is to verification of the Comprehensive Test Ban Treaty.

  15. Estimation of the Tool Condition by Applying the Wavelet Transform to Acoustic Emission Signals

    SciTech Connect

    Gomez, M. P.; Piotrkowski, R.; Ruzzante, J. E.; D'Attellis, C. E.

    2007-03-21

    This work continues the search for parameters to evaluate the tool condition in machining processes. The selected sensing technique is acoustic emission, applied to a turning process on steel samples. The obtained signals are studied using the wavelet transform. The tool wear level is quantified as a percentage of the final wear specified by the Standard ISO 3685. The amplitude and the relevant scale obtained from the acoustic emission signals could be related to the wear level.

  16. Masking Property Based Residual Acoustic Echo Cancellation for Hands-Free Communication in Automobile Environment

    NASA Astrophysics Data System (ADS)

    Lee, Yoonjae; Jeong, Seokyeong; Ko, Hanseok

    A residual acoustic echo cancellation method that employs the masking property is proposed to enhance the speech quality of hands-free communication devices in an automobile environment. The conventional masking property is employed for speech enhancement using the masking threshold of the desired clean speech signal. In this Letter, either the near-end speech or residual noise is selected as the desired signal according to the double-talk detector. Then, the residual echo signal is masked by the desired signal (masker). Experiments confirm the effectiveness of the proposed method by deriving the echo return loss enhancement and by examining speech waveforms and spectrograms.

  17. Investigation of Interference Phenomena of Broadband Acoustic Vector Signals in Shallow Water

    NASA Astrophysics Data System (ADS)

    Piao, Shengchun; Ren, Qunyan

    2010-09-01

    Although the ocean environment in shallow water is very complex, a stable interference pattern still exists for broadband low-frequency sound propagation. The waveguide invariant concept was introduced to describe the broadband interference structure of the acoustic pressure field in a waveguide, and it is now widely used in underwater acoustic signal processing. An acoustic vector sensor can measure the particle velocity in the ocean and thus provides additional information about the underwater sound field. In this paper, the interference phenomena of broadband vector acoustic signals in shallow water are investigated by numerical simulation. Energy spatial-frequency distributions are shown for the energy flux density vector obtained by combining pressure and particle velocity signals, and they are analyzed according to normal mode theory. Comparisons of the interference structure between the scalar acoustic field and the vector acoustic field are also made. The waveguide invariant concept is extended to describe the interference structure of the vector acoustic field in shallow water. A method for extracting the waveguide invariant from interference patterns in vector acoustic field spectrograms is presented, which can be used in matched-field processing and geoacoustic inversion. It is shown that this method may have advantages over traditional methods that calculate the waveguide invariant from measured sound pressure in the ocean.

  18. Improving robustness of speech recognition systems

    NASA Astrophysics Data System (ADS)

    Mitra, Vikramjit

    2010-11-01

    Current Automatic Speech Recognition (ASR) systems fail to perform nearly as well as human speech recognition due to their lack of robustness against speech variability and noise contamination. The goal of this dissertation is to investigate these critical robustness issues, put forth different ways to address them, and finally present an ASR architecture based upon these robustness criteria. Acoustic variations adversely affect the performance of current phone-based ASR systems, in which speech is modeled as 'beads-on-a-string', where the beads are the individual phone units. While phone units are distinctive in the cognitive domain, they vary in the physical domain, and their variation arises from a combination of factors including speech style, speaking rate, etc.; a phenomenon commonly known as 'coarticulation'. Traditional ASR systems address such coarticulatory variations by using contextualized phone units such as triphones. Articulatory phonology accounts for coarticulatory variations by modeling speech as a constellation of constricting actions known as articulatory gestures. In such a framework, speech variations such as coarticulation and lenition are accounted for by gestural overlap in time and gestural reduction in space. To realize a gesture-based ASR system, articulatory gestures have to be inferred from the acoustic signal. At the initial stage of this research, a study was performed using synthetically generated speech to obtain a proof of concept that articulatory gestures can indeed be recognized from the speech signal. It was observed that having vocal tract constriction trajectories (TVs) as an intermediate representation facilitated the gesture recognition task from the speech signal. Presently no natural speech database contains articulatory gesture annotation; hence an automated iterative time-warping architecture is proposed that can annotate any natural speech database with articulatory gestures and TVs. Two natural

  19. Acoustic emission signal classification for gearbox failure detection

    NASA Astrophysics Data System (ADS)

    Shishino, Jun

    The purpose of this research is to develop a methodology and technique to determine the optimal number of clusters in acoustic emission (AE) data obtained from a ground test stand of a rotating H-60 helicopter tail gearbox by using mathematical algorithms and visual inspection. Signs of fatigue crack growth were observed in the AE signals once each data set was clustered with its optimal number of clusters. Previous research determined the number of clusters by visually inspecting AE plots over a number of iterations. This research focuses on finding the optimal number of clusters in the data set by using mathematical algorithms and then confirming it by visual verification. The AE data were acquired from the ground test stand that simulates the tail end of an H-60 Seahawk at Naval Air Station Patuxent River, Maryland. The acquired data were filtered to eliminate hits with durations greater than 100,000 μs and hits with zero energy, in order to investigate the failure mechanisms occurring on the output bevel gear. From the filtered data, different AE signal parameters were chosen for clustering iterations to see which clustering algorithm and number of outputs is best. The clustering algorithms utilized are the Kohonen Self-Organizing Map (SOM), k-means and the Gaussian Mixture Model (GMM). From the clustering iterations, three cluster criterion algorithms were applied to observe the optimal number of clusters suggested by each criterion, as sketched below. The three criterion algorithms utilized are the Davies-Bouldin, Silhouette and Tou criteria. After the criteria had suggested the optimal number of clusters for each data set, visual verification by observing the AE plots and statistical analysis of each cluster were performed. By observing the AE plots and the statistical analysis, the optimal number of clusters in the data set and the most effective clustering algorithm were determined. Along with the optimal number of clusters and effective clustering algorithm, the mechanisms
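
    The criterion-based selection of a cluster count can be sketched as below with k-means plus the Davies-Bouldin and Silhouette indices; the SOM, GMM, and Tou criterion used in the thesis are omitted, and the synthetic features are placeholders for real AE hit parameters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score

def suggest_cluster_count(features, k_range=range(2, 9)):
    """Score candidate cluster counts for AE hit features (sketch)."""
    results = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
        results[k] = {
            "davies_bouldin": davies_bouldin_score(features, labels),  # lower is better
            "silhouette": silhouette_score(features, labels),          # higher is better
        }
    return results

# Example with synthetic stand-in AE features (e.g. amplitude, duration, energy).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 3)) for c in (0.0, 2.0, 4.0)])
for k, scores in suggest_cluster_count(X).items():
    print(k, scores)
```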

  20. Graph-based sensor fusion for classification of transient acoustic signals.

    PubMed

    Srinivas, Umamahesh; Nasrabadi, Nasser M; Monga, Vishal

    2015-03-01

    Advances in acoustic sensing have enabled the simultaneous acquisition of multiple measurements of the same physical event via co-located acoustic sensors. We exploit the inherent correlation among such multiple measurements for acoustic signal classification, to identify the launch/impact of munition (i.e., rockets, mortars). Specifically, we propose a probabilistic graphical model framework that can explicitly learn the class conditional correlations between the cepstral features extracted from these different measurements. Additionally, we employ symbolic dynamic filtering-based features, which offer improvements over the traditional cepstral features in terms of robustness to signal distortions. Experiments on real acoustic data sets show that our proposed algorithm outperforms conventional classifiers as well as the recently proposed joint sparsity models for multisensor acoustic classification. Additionally our proposed algorithm is less sensitive to insufficiency in training samples compared to competing approaches. PMID:25014986
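
    For readers unfamiliar with cepstral features, the sketch below computes a short real cepstrum of one signal frame with NumPy; it is a generic illustration, not the paper's exact feature set or its symbolic-dynamic-filtering alternative, and the frame parameters are arbitrary.

```python
import numpy as np

def real_cepstrum(frame, n_coeff=13):
    """Return the first few real-cepstrum coefficients of one signal frame."""
    windowed = frame * np.hamming(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    log_spectrum = np.log(spectrum + 1e-12)   # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_spectrum)
    return cepstrum[:n_coeff]

# Toy usage on a synthetic decaying transient.
fs = 8000
t = np.arange(0, 0.064, 1 / fs)
transient = np.exp(-40 * t) * np.sin(2 * np.pi * 300 * t)
print(real_cepstrum(transient))
```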

  1. Revisiting Neil Armstrong's Moon-Landing Quote: Implications for Speech Perception, Function Word Reduction, and Acoustic Ambiguity.

    PubMed

    Baese-Berk, Melissa M; Dilley, Laura C; Schmidt, Stephanie; Morrill, Tuuli H; Pitt, Mark A

    2016-01-01

    Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said "one small step for a man," rather than "one small step for man." What he said is unclear in part because function words like "a" can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of "for" and "for a" in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of "for" and "for a". The results suggest that the distributions of "for" and "for a" overlap substantially, both in terms of temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word "a" varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate. PMID:27603209

  2. Comparison of Acoustic and Kinematic Approaches to Measuring Utterance-Level Speech Variability

    ERIC Educational Resources Information Center

    Howell, Peter; Anderson, Andrew J.; Bartrip, Jon; Bailey, Eleanor

    2009-01-01

    Purpose: The spatiotemporal index (STI) is one measure of variability. As currently implemented, kinematic data are used, requiring equipment that cannot be used with some patient groups or in scanners. An experiment is reported that addressed whether STI can be extended to an audio measure of sound pressure of the speech envelope over time that…

  3. The Exploitation of Subphonemic Acoustic Detail in L2 Speech Segmentation

    ERIC Educational Resources Information Center

    Shoemaker, Ellenor

    2014-01-01

    The current study addresses an aspect of second language (L2) phonological acquisition that has received little attention to date--namely, the acquisition of allophonic variation as a word boundary cue. The role of subphonemic variation in the segmentation of speech by native speakers has been indisputably demonstrated; however, the acquisition of…

  4. Impact of Aberrant Acoustic Properties on the Perception of Sound Quality in Electrolarynx Speech

    ERIC Educational Resources Information Center

    Meltzner, Geoffrey S.; Hillman, Robert E.

    2005-01-01

    A large percentage of patients who have undergone laryngectomy to treat advanced laryngeal cancer rely on an electrolarynx (EL) to communicate verbally. Although serviceable, EL speech is plagued by shortcomings in both sound quality and intelligibility. This study sought to better quantify the relative contributions of previously identified…

  5. Speech Signal and Facial Image Processing for Obstructive Sleep Apnea Assessment

    PubMed Central

    Espinoza-Cuadros, Fernando; Fernández-Pozo, Rubén; Toledano, Doroteo T.; Alcázar-Ramírez, José D.; López-Gonzalo, Eduardo; Hernández-Gómez, Luis A.

    2015-01-01

    Obstructive sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). OSA is generally diagnosed through a costly procedure requiring an overnight stay of the patient at the hospital. This has led to proposing less costly procedures based on the analysis of patients' facial images and voice recordings to help in OSA detection and severity assessment. In this paper we investigate the use of both image and speech processing to estimate the apnea-hypopnea index, AHI (which describes the severity of the condition), over a population of 285 male Spanish subjects suspected to suffer from OSA and referred to a Sleep Disorders Unit. Photographs and voice recordings were collected in a supervised but not highly controlled way trying to test a scenario close to an OSA assessment application running on a mobile device (i.e., smartphones or tablets). Spectral information in speech utterances is modeled by a state-of-the-art low-dimensional acoustic representation, called i-vector. A set of local craniofacial features related to OSA are extracted from images after detecting facial landmarks using Active Appearance Models (AAMs). Support vector regression (SVR) is applied on facial features and i-vectors to estimate the AHI. PMID:26664493
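
    The regression stage can be sketched as below, assuming precomputed facial features and i-vectors (random placeholders here, since neither feature extraction is reproduced); the fusion scheme and SVR hyperparameters are illustrative, not those of the study.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in feature matrices; real inputs would be craniofacial measurements
# from AAM landmarks and i-vectors extracted from the voice recordings.
rng = np.random.default_rng(0)
n_subjects = 285
facial = rng.normal(size=(n_subjects, 20))      # placeholder craniofacial features
ivectors = rng.normal(size=(n_subjects, 100))   # placeholder i-vectors
ahi = rng.uniform(0, 60, size=n_subjects)       # placeholder apnea-hypopnea indices

X = np.hstack([facial, ivectors])               # simple feature-level fusion
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=1.0))
model.fit(X, ahi)
print(model.predict(X[:5]))                     # predicted AHI for the first subjects
```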

  6. Speech Signal and Facial Image Processing for Obstructive Sleep Apnea Assessment.

    PubMed

    Espinoza-Cuadros, Fernando; Fernández-Pozo, Rubén; Toledano, Doroteo T; Alcázar-Ramírez, José D; López-Gonzalo, Eduardo; Hernández-Gómez, Luis A

    2015-01-01

    Obstructive sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). OSA is generally diagnosed through a costly procedure requiring an overnight stay of the patient at the hospital. This has led to proposing less costly procedures based on the analysis of patients' facial images and voice recordings to help in OSA detection and severity assessment. In this paper we investigate the use of both image and speech processing to estimate the apnea-hypopnea index, AHI (which describes the severity of the condition), over a population of 285 male Spanish subjects suspected to suffer from OSA and referred to a Sleep Disorders Unit. Photographs and voice recordings were collected in a supervised but not highly controlled way trying to test a scenario close to an OSA assessment application running on a mobile device (i.e., smartphones or tablets). Spectral information in speech utterances is modeled by a state-of-the-art low-dimensional acoustic representation, called i-vector. A set of local craniofacial features related to OSA are extracted from images after detecting facial landmarks using Active Appearance Models (AAMs). Support vector regression (SVR) is applied on facial features and i-vectors to estimate the AHI. PMID:26664493

  7. Limited condition dependence of male acoustic signals in the grasshopper Chorthippus biguttulus

    PubMed Central

    Franzke, Alexandra; Reinhold, Klaus

    2012-01-01

    In many animal species, male acoustic signals serve to attract a mate and therefore often play a major role for male mating success. Male body condition is likely to be correlated with male acoustic signal traits, which signal male quality and provide choosy females indirect benefits. Environmental factors such as food quantity or quality can influence male body condition and therefore possibly lead to condition-dependent changes in the attractiveness of acoustic signals. Here, we test whether stressing food plants influences acoustic signal traits of males via condition-dependent expression of these traits. We examined four male song characteristics, which are vital for mate choice in females of the grasshopper Chorthippus biguttulus. Only one of the examined acoustic traits, loudness, was significantly altered by changing body condition because of drought- and moisture-related stress of food plants. No condition dependence could be observed for syllable to pause ratio, gap duration within syllables, and onset accentuation. We suggest that food plant stress and therefore food plant quality led to shifts in loudness of male grasshopper songs via body condition changes. The other three examined acoustic traits of males do not reflect male body condition induced by food plant quality. PMID:22957192

  8. Speech Enhancement Using Microphone Arrays.

    NASA Astrophysics Data System (ADS)

    Adugna, Eneyew

    Arrays of sensors have been employed effectively in communication systems for the directional transmission and reception of electromagnetic waves. Among the numerous benefits, this helps improve the signal-to-interference ratio (SIR) of the signal at the receiver. Arrays have since been used in related areas that employ propagating waves for the transmission of information. Several investigators have successfully adopted array principles to acoustics, sonar, seismic, and medical imaging. In speech applications the microphone is used as the sensor for acoustic data acquisition. The performance of subsequent speech processing algorithms--such as speech recognition or speaker recognition--relies heavily on the level of interference within the transduced or recorded speech signal. The normal practice is to use a single, hand-held or head-mounted, microphone. Under most environmental conditions, i.e., environments where other acoustic sources are also active, the speech signal from a single microphone is a superposition of acoustic signals present in the environment. Such cases represent a lower SIR value. To alleviate this problem an array of microphones--linear array, planar array, and 3-dimensional arrays--have been suggested and implemented. This work focuses on microphone arrays in room environments where reverberation is the main source of interference. The acoustic wave incident on the array from a point source is sampled and recorded by a linear array of sensors along with reflected waves. Array signal processing algorithms are developed and used to remove reverberations from the signal received by the array. Signals from other positions are considered as interference. Unlike most studies that deal with plane waves, we base our algorithm on spherical waves originating at a source point. This is especially true for room environments. The algorithm consists of two stages--a first stage to locate the source and a second stage to focus on the source. The first part

  9. Diagnostics of DC and Induction Motors Based on the Analysis of Acoustic Signals

    NASA Astrophysics Data System (ADS)

    Glowacz, A.

    2014-10-01

    In this paper, a non-invasive method for early fault diagnostics of electric motors is proposed. The method uses acoustic signals generated by electric motors, from which essential features are extracted. A plan for the study of acoustic signals of electric motors was proposed. Studies were carried out for a faultless induction motor, an induction motor with one faulty rotor bar, an induction motor with two faulty rotor bars, a faultless Direct Current motor, and a Direct Current motor with shorted rotor coils. The following signal processing methods were investigated: log area ratio coefficients, Multiple Signal Classification, the Nearest Neighbor classifier and the Bayes classifier. A pattern creation process was carried out using 40 samples of sound. In the identification process 130 five-second test samples were used. The proposed approach will also reduce the costs of maintenance and the number of faulty motors in industry.

  10. Wavelet packet transform for detection of single events in acoustic emission signals

    NASA Astrophysics Data System (ADS)

    Bianchi, Davide; Mayrhofer, Erwin; Gröschl, Martin; Betz, Gerhard; Vernes, András

    2015-12-01

    Acoustic emission signals in tribology can be used for monitoring the state of bodies in contact and relative motion. The recorded signal includes information which can be associated with different events, such as the formation and propagation of cracks, appearance of scratches and so on. One of the major challenges in analyzing these acoustic emission signals is to identify parts of the signal which belong to such an event and discern it from noise. In this contribution, a wavelet packet decomposition within the framework of multiresolution analysis theory is considered to analyze acoustic emission signals to investigate the failure of tribological systems. By applying the wavelet packet transform a method for the extraction of single events in rail contact fatigue test is proposed. The extraction of such events at several stages of the test permits a classification and the analysis of the evolution of cracks in the rail.
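
    A simplified sketch of using a wavelet packet decomposition to flag candidate events follows: one reconstructed frequency band is compared against a noise-based threshold. The thresholding rule, band choice, and parameter names are assumptions for illustration, not the authors' extraction procedure.

```python
import numpy as np
import pywt

def detect_events(signal, wavelet="db4", level=4, band_index=2, factor=5.0):
    """Flag samples where one wavelet-packet band exceeds a noise-based threshold."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")
    # Reconstruct the selected frequency band alone.
    band_wp = pywt.WaveletPacket(data=None, wavelet=wavelet)
    band_wp[nodes[band_index].path] = nodes[band_index].data
    band = band_wp.reconstruct(update=False)[: len(signal)]
    threshold = factor * np.median(np.abs(band))   # crude noise-floor estimate
    return np.abs(band) > threshold

# Toy usage: a short burst riding on noise, placed in the chosen band.
rng = np.random.default_rng(0)
sig = rng.normal(scale=0.1, size=4096)
sig[2000:2100] += np.sin(np.linspace(0, 15 * np.pi, 100))
print(np.flatnonzero(detect_events(sig))[:5])
```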

  11. Call Transmission Efficiency in Native and Invasive Anurans: Competing Hypotheses of Divergence in Acoustic Signals

    PubMed Central

    Llusia, Diego; Gómez, Miguel; Penna, Mario; Márquez, Rafael

    2013-01-01

    Invasive species are a leading cause of the current biodiversity decline, and hence examining the major traits favouring invasion is a key and long-standing goal of invasion biology. Despite the prominent role of the advertisement calls in sexual selection and reproduction, very little attention has been paid to the features of acoustic communication of invasive species in nonindigenous habitats and their potential impacts on native species. Here we compare for the first time the transmission efficiency of the advertisement calls of native and invasive species, searching for competitive advantages for acoustic communication and reproduction of introduced taxa, and providing insights into competing hypotheses in evolutionary divergence of acoustic signals: acoustic adaptation vs. morphological constraints. Using sound propagation experiments, we measured the attenuation rates of pure tones (0.2–5 kHz) and playback calls (Lithobates catesbeianus and Pelophylax perezi) across four distances (1, 2, 4, and 8 m) and over two substrates (water and soil) in seven Iberian localities. All factors considered (signal type, distance, substrate, and locality) affected transmission efficiency of acoustic signals, which was maximized with lower frequency sounds, shorter distances, and over water surface. Despite being broadcast in nonindigenous habitats, the advertisement calls of invasive L. catesbeianus were propagated more efficiently than those of the native species, in both aquatic and terrestrial substrates, and in most of the study sites. This implies absence of optimal relationship between native environments and propagation of acoustic signals in anurans, in contrast to what predicted by the acoustic adaptation hypothesis, and it might render these vertebrates particularly vulnerable to intrusion of invasive species producing low frequency signals, such as L. catesbeianus. Our findings suggest that mechanisms optimizing sound transmission in native habitat can play a

  12. Call transmission efficiency in native and invasive anurans: competing hypotheses of divergence in acoustic signals.

    PubMed

    Llusia, Diego; Gómez, Miguel; Penna, Mario; Márquez, Rafael

    2013-01-01

    Invasive species are a leading cause of the current biodiversity decline, and hence examining the major traits favouring invasion is a key and long-standing goal of invasion biology. Despite the prominent role of the advertisement calls in sexual selection and reproduction, very little attention has been paid to the features of acoustic communication of invasive species in nonindigenous habitats and their potential impacts on native species. Here we compare for the first time the transmission efficiency of the advertisement calls of native and invasive species, searching for competitive advantages for acoustic communication and reproduction of introduced taxa, and providing insights into competing hypotheses in evolutionary divergence of acoustic signals: acoustic adaptation vs. morphological constraints. Using sound propagation experiments, we measured the attenuation rates of pure tones (0.2-5 kHz) and playback calls (Lithobates catesbeianus and Pelophylax perezi) across four distances (1, 2, 4, and 8 m) and over two substrates (water and soil) in seven Iberian localities. All factors considered (signal type, distance, substrate, and locality) affected transmission efficiency of acoustic signals, which was maximized with lower frequency sounds, shorter distances, and over water surface. Despite being broadcast in nonindigenous habitats, the advertisement calls of invasive L. catesbeianus were propagated more efficiently than those of the native species, in both aquatic and terrestrial substrates, and in most of the study sites. This implies absence of optimal relationship between native environments and propagation of acoustic signals in anurans, in contrast to what predicted by the acoustic adaptation hypothesis, and it might render these vertebrates particularly vulnerable to intrusion of invasive species producing low frequency signals, such as L. catesbeianus. Our findings suggest that mechanisms optimizing sound transmission in native habitat can play a less

  13. A unique method to study acoustic transmission through ducts using signal synthesis and averaging of acoustic pulses

    NASA Technical Reports Server (NTRS)

    Salikuddin, M.; Ramakrishnan, R.; Ahuja, K. K.; Brown, W. H.

    1981-01-01

    An acoustic impulse technique using a loudspeaker driver is developed to measure the acoustic properties of a duct/nozzle system. A signal synthesis method is used to generate a desired single pulse with a flat spectrum. The convolution of the desired signal with the inverse Fourier transform of the reciprocal of the driver's response is then fed to the driver. A signal averaging process eliminates the jet mixing noise from the mixture of jet noise and internal noise, thereby allowing very low intensity signals to be measured accurately, even for high velocity jets. A theoretical analysis is carried out to predict the incident sound field; this is used to help determine the number and locations of the in-duct measurement points to account for the contributions due to higher order modes present in the incident field. The impulse technique is validated by comparing experimentally determined acoustic characteristics of a duct-nozzle system with similar results obtained by the impedance tube method. Absolute agreement in the comparisons was poor, but the overall shapes of the time histories and spectral distributions were much alike.
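
    The pre-compensation and averaging steps might be sketched as follows; the frequency-domain division is equivalent to convolving with the inverse-transformed reciprocal of the driver response, and the regularization constant eps is our addition to keep that division well behaved, not part of the original method.

```python
import numpy as np

def driver_input_for_pulse(desired_pulse, driver_impulse_response, eps=1e-3):
    """Pre-compensate a desired pulse for a loudspeaker driver's response (sketch)."""
    n = len(desired_pulse) + len(driver_impulse_response) - 1
    D = np.fft.rfft(desired_pulse, n)
    H = np.fft.rfft(driver_impulse_response, n)
    X = D * np.conj(H) / (np.abs(H) ** 2 + eps)   # regularized 1/H
    return np.fft.irfft(X, n)

def synchronous_average(recordings):
    """Average repeated, time-aligned recordings to suppress uncorrelated jet noise."""
    return np.mean(np.asarray(recordings), axis=0)
```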

  14. The Auditory-Brainstem Response to Continuous, Non-repetitive Speech Is Modulated by the Speech Envelope and Reflects Speech Processing

    PubMed Central

    Reichenbach, Chagit S.; Braiman, Chananel; Schiff, Nicholas D.; Hudspeth, A. J.; Reichenbach, Tobias

    2016-01-01

    The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the ABR is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function. PMID:27303286
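
    A conceptual sketch of such a measure is given below, assuming a fixed fundamental frequency and a single candidate latency, whereas the published measure follows the time-varying fundamental of voiced speech; the function and parameter names are illustrative.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def envelope_modulated_f0_response(eeg, speech, fs, f0=100.0, latency_ms=9.0, bw=20.0):
    """Correlate the envelope of the EEG component near f0 with the speech envelope.

    Both inputs are assumed to be sampled at the same rate fs and roughly
    time-aligned; f0, the bandwidth bw, and the latency are fixed here.
    """
    n = min(len(eeg), len(speech))
    eeg, speech = np.asarray(eeg)[:n], np.asarray(speech)[:n]
    # Narrow-band EEG component around the fundamental frequency.
    b, a = butter(4, [(f0 - bw) / (fs / 2), (f0 + bw) / (fs / 2)], btype="band")
    eeg_f0_env = np.abs(hilbert(filtfilt(b, a, eeg)))
    # Broadband envelope of the speech signal itself.
    speech_env = np.abs(hilbert(speech))
    # Shift the speech envelope by the assumed brainstem latency, then correlate.
    shift = int(round(latency_ms * 1e-3 * fs))
    return np.corrcoef(speech_env[: n - shift], eeg_f0_env[shift:])[0, 1]
```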

  15. Signal recovery technique based on a physical method of underwater acoustics

    NASA Astrophysics Data System (ADS)

    Guo, Xinyi; Wu, Guoqing; Ma, Li

    2010-09-01

    In an underwater sound channel, an array is often used to receive signals from distant sources. The received signals are often mixed with environmental interference. In a complex acoustic environment, received signals are greatly distorted and elongated in time. In many practical applications, such as sound communications, sound remote sensing and active sonar, we hope to recover the original signal's waveform. In general, the received signals are the convolution of the emitted signals with the Green's function of the environment. When the Green's function of the environment is unknown, simply relying on the array recordings to determine the source signal waveform and the propagation features of the environment is not enough. In certain circumstances, however, signal recovery is possible based on a physical method of underwater acoustics.

  16. Two-microphone spatial filtering provides speech reception benefits for cochlear implant users in difficult acoustic environments.

    PubMed

    Goldsworthy, Raymond L; Delhorne, Lorraine A; Desloge, Joseph G; Braida, Louis D

    2014-08-01

    This article introduces and provides an assessment of a spatial-filtering algorithm based on two closely-spaced (∼1 cm) microphones in a behind-the-ear shell. The evaluated spatial-filtering algorithm used fast (∼10 ms) temporal-spectral analysis to determine the location of incoming sounds and to enhance sounds arriving from straight ahead of the listener. Speech reception thresholds (SRTs) were measured for eight cochlear implant (CI) users using consonant and vowel materials under three processing conditions: An omni-directional response, a dipole-directional response, and the spatial-filtering algorithm. The background noise condition used three simultaneous time-reversed speech signals as interferers located at 90°, 180°, and 270°. Results indicated that the spatial-filtering algorithm can provide speech reception benefits of 5.8 to 10.7 dB SRT compared to an omni-directional response in a reverberant room with multiple noise sources. Given the observed SRT benefits, coupled with an efficient design, the proposed algorithm is promising as a CI noise-reduction solution. PMID:25096120
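
    A crude sketch of phase-difference spatial filtering with two closely spaced microphones follows: each time-frequency bin is kept only if its inter-microphone delay is consistent with a frontal source. The binary masking rule, tolerance, and parameter names are assumptions for illustration, not the published algorithm.

```python
import numpy as np
from scipy.signal import stft, istft

def front_enhance(front_mic, rear_mic, fs, mic_distance=0.01, c=343.0, tol=0.5):
    """Keep time-frequency bins whose inter-mic delay matches a frontal source."""
    f, _, F = stft(front_mic, fs=fs, nperseg=256)
    _, _, R = stft(rear_mic, fs=fs, nperseg=256)
    d_front = mic_distance / c                     # expected delay for a source straight ahead
    phase_diff = np.angle(F * np.conj(R))          # inter-microphone phase per bin
    freqs = np.maximum(f[:, None], 50.0)           # avoid dividing by ~0 Hz
    delay_est = phase_diff / (2 * np.pi * freqs)   # per-bin delay estimate in seconds
    mask = (np.abs(delay_est - d_front) < tol * d_front).astype(float)
    _, enhanced = istft(F * mask, fs=fs, nperseg=256)
    return enhanced
```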

  17. Measuring glottal activity during voiced speech using a tuned electromagnetic resonating collar sensor

    NASA Astrophysics Data System (ADS)

    Brown, D. R., III; Keenaghan, K.; Desimini, S.

    2005-11-01

    Non-acoustic speech sensors can be employed to obtain measurements of one or more aspects of the speech production process, such as glottal activity, even in the presence of background noise. These sensors have a long history of clinical applications and have also recently been applied to the problem of denoising speech signals recorded in acoustically noisy environments (Ng et al 2000 Proc. Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP) (Istanbul, Turkey) vol 1, pp 229-32). Recently, researchers developed a new non-acoustic speech sensor based primarily on a tuned electromagnetic resonator collar (TERC) (Brown et al 2004 Meas. Sci. Technol. 15 1291). The TERC sensor measures glottal activity by sensing small changes in the dielectric properties of the glottis that result from voiced speech. This paper builds on the seminal work in Brown et al (2004). The primary contributions of this paper are (i) a description of a new single-mode TERC sensor design addressing the comfort and complexity issues of the original sensor, (ii) a complete description of new external interface systems used to obtain long-duration recordings from the TERC sensor and (iii) more extensive experimental results and analysis for the single-mode TERC sensor including spectrograms of speech containing both voiced and unvoiced speech segments in quiet and acoustically noisy environments. The experimental results demonstrate that the single-mode TERC sensor is able to detect glottal activity up to the fourth harmonic and is also insensitive to acoustic background noise.

  18. Dolphin's echolocation signals in a complicated acoustic environment

    NASA Astrophysics Data System (ADS)

    Ivanov, M. P.

    2004-07-01

    Echolocation abilities of a dolphin ( Tursiops truncatus ponticus) were investigated in laboratory conditions. The experiment was carried out in an open cage using an acoustic control over the behavior of the animal detecting underwater objects in a complicated acoustic environment. Targets of different strength were used as test objects. The dolphin was found to be able to detect objects at distances exceeding 650 m. For the target location, the dolphin used both single-pulse and multipulse echolocation modes. Time characteristics of echolocation pulses and time sequences of pulses as functions of the distance to the target were obtained.

  19. Acoustic Analyses of Speech Sounds and Rhythms in Japanese- and English-Learning Infants

    PubMed Central

    Yamashita, Yuko; Nakajima, Yoshitaka; Ueda, Kazuo; Shimada, Yohko; Hirsh, David; Seno, Takeharu; Smith, Benjamin Alexander

    2013-01-01

    The purpose of this study was to explore developmental changes, in terms of spectral fluctuations and temporal periodicity, in Japanese- and English-learning infants. Three age groups (15, 20, and 24 months) were selected, because infants diversify their phonetic inventories with age. Natural speech of the infants was recorded. We utilized a critical-band-filter bank, which simulated the frequency resolution of the adult auditory periphery. First, the correlations between the power fluctuations of the critical-band outputs, represented by factor analysis, were observed in order to see how the critical bands should be connected to each other if a listener is to differentiate sounds in infants' speech. In the following analysis, we analyzed the temporal fluctuations of factor scores by calculating autocorrelations. The present analysis identified three factors, as had been observed in adult speech, at 24 months of age in both linguistic environments. These three factors were shifted to a higher frequency range, corresponding to the smaller vocal tract size of the infants. The results suggest that the vocal tract structures of the infants had developed to an adult-like configuration by 24 months of age in both language environments. The proportion of utterances with short-term periodicity increased with age in both environments. This trend was clearer in the Japanese environment. PMID:23450824

  20. MOOD STATE PREDICTION FROM SPEECH OF VARYING ACOUSTIC QUALITY FOR INDIVIDUALS WITH BIPOLAR DISORDER

    PubMed Central

    Gideon, John; Provost, Emily Mower; McInnis, Melvin

    2016-01-01

    Speech contains patterns that can be altered by the mood of an individual. There is an increasing focus on automated and distributed methods to collect and monitor speech from large groups of patients suffering from mental health disorders. However, as the scope of these collections increases, the variability in the data also increases. This variability is due in part to the range in the quality of the devices, which in turn affects the quality of the recorded data, negatively impacting the accuracy of automatic assessment. It is necessary to mitigate variability effects in order to expand the impact of these technologies. This paper explores speech collected from phone recordings for analysis of mood in individuals with bipolar disorder. Two different phones with varying amounts of clipping, loudness, and noise are employed. We describe methodologies for use during preprocessing, feature extraction, and data modeling to correct these differences and make the devices more comparable. The results demonstrate that these pipeline modifications result in statistically significantly higher performance, which highlights the potential of distributed mental health systems. PMID:27570493

  1. Speech signal denoising with wavelet-transforms and the mean opinion score characterizing the filtering quality

    NASA Astrophysics Data System (ADS)

    Yaseen, Alauldeen S.; Pavlov, Alexey N.; Hramov, Alexander E.

    2016-03-01

    Speech signal processing is widely used to reduce the impact of noise in acquired data. Over the last decades, wavelet-based filtering techniques have often been applied in communication systems because of their advantages in signal denoising compared with Fourier-based methods. In this study we consider applications of a 1-D double-density complex wavelet transform (1D-DDCWT) and compare the results with the standard 1-D discrete wavelet transform (1D-DWT). The performances of the considered techniques are compared using the mean opinion score (MOS), the primary metric for the quality of the processed signals. A two-dimensional extension of this approach can be used for effective image denoising.
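
    A minimal sketch of the baseline pipeline discussed above, wavelet-threshold denoising of a 1-D signal with the standard discrete wavelet transform, is given below. It assumes the PyWavelets package; the wavelet family, decomposition level, and universal-threshold rule are illustrative choices, and the double-density complex transform itself is not reproduced here. MOS scoring is a listening-test metric and is not computed in code.

      # Sketch: 1-D DWT soft-threshold denoising of a speech-like signal (PyWavelets).
      import numpy as np
      import pywt

      def dwt_denoise(x, wavelet="db8", level=4):
          """Soft-threshold the detail coefficients and reconstruct the signal."""
          coeffs = pywt.wavedec(x, wavelet, level=level)
          # Noise estimate from the finest-scale details, then a universal threshold.
          sigma = np.median(np.abs(coeffs[-1])) / 0.6745
          thr = sigma * np.sqrt(2.0 * np.log(len(x)))
          coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
          return pywt.waverec(coeffs, wavelet)[: len(x)]

      # Example: a noisy 8 kHz tone burst standing in for a speech segment.
      fs = 8000
      t = np.arange(fs) / fs
      clean = np.sin(2 * np.pi * 200 * t) * np.exp(-3 * t)
      noisy = clean + 0.3 * np.random.randn(len(t))
      denoised = dwt_denoise(noisy)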

  2. Nonsensory factors in speech perception

    NASA Astrophysics Data System (ADS)

    Holt, Rachael F.; Carney, Arlene E.

    2001-05-01

    The nature of developmental differences was examined in a speech discrimination task, the change/no-change procedure, in which a varying number of speech stimuli are presented during a trial. Standard stimuli are followed by comparison stimuli that are identical to or acoustically different from the standard. Fourteen adults and 30 4- and 5-year-old children were tested with three speech contrast pairs at a variety of signal-to-noise ratios using various numbers of standard and comparison stimulus presentations. Adult speech discrimination performance followed the predictions of the multiple looks hypothesis [N. F. Viemeister and G. H. Wakefield, J. Acoust. Soc. Am. 90, 858-865 (1991)]: there was an increase in d′ by a factor of 1.4 for a doubling in the number of standard and comparison stimulus presentations near d′ values of 1.0. For children, increasing the number of standard stimuli improved discrimination performance, whereas increasing the number of comparisons did not. The multiple looks hypothesis did not explain the children's data. They are explained more parsimoniously by the developmental weighting shift [Nittrouer et al., J. Acoust. Soc. Am. 101, 2253-2266 (1993)], which proposes that children attend to different aspects of speech stimuli from adults. [Work supported by NIDCD and ASHF.]
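
    For reference, the multiple-looks prediction invoked here is that sensitivity grows with the square root of the number of statistically independent looks, so doubling the number of looks multiplies d′ by roughly 1.4. A compact statement (not taken from the abstract itself) is:

      % Multiple-looks prediction for N statistically independent observations:
      d'_N = \sqrt{N}\, d'_1, \qquad \frac{d'_{2N}}{d'_{N}} = \sqrt{2} \approx 1.4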

  3. An information processing method for acoustic emission signal inspired from musical staff

    NASA Astrophysics Data System (ADS)

    Zheng, Wei; Wu, Chunxian

    2016-01-01

    This study proposes a musical-staff-inspired signal processing method for standard description expressions for discrete signals and describing the integrated characteristics of acoustic emission (AE) signals. The method maps various AE signals with complex environments into the normalized musical space. Four new indexes are proposed to comprehensively describe the signal. Several key features, such as contour, amplitude, and signal changing rate, are quantitatively expressed in a normalized musical space. The processed information requires only a small storage space to maintain high fidelity. The method is illustrated by using experiments on sandstones and computed tomography (CT) scanning to determine its validity for AE signal processing.

  4. System and method for investigating sub-surface features of a rock formation with acoustic sources generating coded signals

    SciTech Connect

    Vu, Cung Khac; Nihei, Kurt; Johnson, Paul A; Guyer, Robert; Ten Cate, James A; Le Bas, Pierre-Yves; Larmat, Carene S

    2014-12-30

    A system and a method for investigating rock formations include generating, by a first acoustic source, a first acoustic signal comprising a first plurality of pulses, each pulse including a first modulated signal at a central frequency; and generating, by a second acoustic source, a second acoustic signal comprising a second plurality of pulses. A receiver arranged within the borehole receives a detected signal including a signal generated by a non-linear mixing process from the first and second acoustic signals in a non-linear mixing zone within the intersection volume. The method also includes processing the received signal to extract the signal generated by the non-linear mixing process over noise or over signals generated by a linear interaction process, or both.

  5. Changes in Speech Production in a Child with a Cochlear Implant: Acoustic and Kinematic Evidence.

    ERIC Educational Resources Information Center

    Goffman, Lisa; Ertmer, David J.; Erdle, Christa

    2002-01-01

    A method is presented for examining change in motor patterns used to produce linguistic contrasts. In this case study, the method is applied to a child who experienced hearing loss at age 3 and received a multi-channel cochlear implant at 7. Post-implant, acoustic durations showed a maturational change. (Contains references.) (Author/CR)

  6. Intelligibility of Telephone Speech for the Hearing Impaired When Various Microphones Are Used for Acoustic Coupling.

    ERIC Educational Resources Information Center

    Janota, Claus P.; Janota, Jeanette Olach

    1991-01-01

    Various candidate microphones were evaluated for acoustic coupling of hearing aids to a telephone receiver. Results from testing by 9 hearing-impaired adults found comparable listening performance with a pressure gradient microphone at a 10 decibel higher level of interfering noise than with a normal pressure-sensitive microphone. (Author/PB)

  7. Exploration of Acoustic Features for Automatic Vowel Discrimination in Spontaneous Speech

    ERIC Educational Resources Information Center

    Tyson, Na'im R.

    2012-01-01

    In an attempt to understand what acoustic/auditory feature sets motivated transcribers towards certain labeling decisions, I built machine learning models that were capable of discriminating between canonical and non-canonical vowels excised from the Buckeye Corpus. Specifically, I wanted to model when the dictionary form and the transcribed-form…

  8. Depression Diagnoses and Fundamental Frequency-Based Acoustic Cues in Maternal Infant-Directed Speech

    ERIC Educational Resources Information Center

    Porritt, Laura L.; Zinser, Michael C.; Bachorowski, Jo-Anne; Kaplan, Peter S.

    2014-01-01

    F0-based acoustic measures were extracted from a brief, sentence-final target word spoken during structured play interactions between mothers and their 3- to 14-month-old infants and were analyzed based on demographic variables and DSM-IV Axis-I clinical diagnoses and their common modifiers. F0 range (ΔF0) was…

  9. Prediction and constraint in audiovisual speech perception.

    PubMed

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration

  10. Corruption of ant acoustical signals by mimetic social parasites

    PubMed Central

    Schönrogge, Karsten; Bonelli, Simona; Barbero, Francesca; Balletto, Emilio

    2010-01-01

    Recent recordings of the stridulations of Myrmica ants revealed that their queens make sounds distinct from those of their workers, although the acoustics of queens and of workers, respectively, were the same across different species of Myrmica. Queen recordings induced enhanced protective behavior when played to workers in the one species tested. Larvae and pupae of the butterfly genus Maculinea inhabit Myrmica colonies as social parasites, and both stages generate sounds that mimic those of a Myrmica queen, inducing from workers the same superior treatment given to their model. We discuss how initial penetration and acceptance as a colony member is achieved by Maculinea through mimicking the species-specific semio-chemicals of their hosts, and how acoustical mimicry is then employed to elevate the parasite's membership of that society towards the highest attainable level in their host's hierarchy. We postulate that, if acoustics is as well developed a means of communication in certain ants as these studies suggest, then others among an estimated 10,000 species of ant social parasite may supplement their well-known use of chemical and tactile mimicry to trick host ants with mimicry of host acoustical systems. PMID:20585513

  11. Copula filtration of spoken language signals on the background of acoustic noise

    NASA Astrophysics Data System (ADS)

    Kolchenko, Lilia V.; Sinitsyn, Rustem B.

    2010-09-01

    This paper is devoted to the filtering of spoken-language signals against a background of acoustic noise. Filtering relies on a copula, a nonlinear analogue of the correlation function, which is estimated with the help of kernel estimates of the cumulative distribution function. At the second stage we suggest a new procedure of adaptive filtering. Silence and sound intervals are detected before filtering with the help of a nonparametric algorithm. The results are confirmed by experimental processing of spoken-language signals.
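
    As a rough illustration of the dependence structure such a method builds on, the sketch below estimates a bivariate empirical copula from two signal segments. It substitutes plain empirical ranks for the paper's kernel CDF estimates; the grid size and example data are arbitrary assumptions.

      # Sketch: bivariate empirical copula C(u, v) from rank-transformed samples.
      import numpy as np

      def empirical_copula(x, y, grid=32):
          """Return a grid x grid empirical copula for paired samples x, y."""
          n = len(x)
          # Probability-integral transform via ranks: u_i = rank(x_i) / n.
          u = np.argsort(np.argsort(x)) / float(n)
          v = np.argsort(np.argsort(y)) / float(n)
          levels = np.linspace(0.0, 1.0, grid)
          c = np.empty((grid, grid))
          for i, a in enumerate(levels):
              for j, b in enumerate(levels):
                  c[i, j] = np.mean((u <= a) & (v <= b))
          return c

      # Example: copula of a signal and its one-sample-delayed copy.
      rng = np.random.default_rng(0)
      s = rng.standard_normal(4096)
      C = empirical_copula(s[:-1], s[1:])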

  12. Applications for Subvocal Speech

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles; Betts, Bradley

    2007-01-01

    A research and development effort now underway is directed toward the use of subvocal speech for communication in settings in which (1) acoustic noise could interfere excessively with ordinary vocal communication and/or (2) acoustic silence or secrecy of communication is required. By "subvocal speech" is meant sub-audible electromyographic (EMG) signals, associated with speech, that are acquired from the surface of the larynx and lingual areas of the throat. Topics addressed in this effort include recognition of the sub-vocal EMG signals that represent specific original words or phrases; transformation (including encoding and/or enciphering) of the signals into forms that are less vulnerable to distortion, degradation, and/or interception; and reconstruction of the original words or phrases at the receiving end of a communication link. Potential applications include ordinary verbal communications among hazardous- material-cleanup workers in protective suits, workers in noisy environments, divers, and firefighters, and secret communications among law-enforcement officers and military personnel in combat and other confrontational situations.

  13. Forward model of thermally-induced acoustic signal specific to intralumenal detection geometry

    NASA Astrophysics Data System (ADS)

    Mukherjee, Sovanlal; Bunting, Charles F.; Piao, Daqing

    2011-03-01

    This work investigates a forward model associated with intra-lumenal detection of the acoustic signal originating from transient thermal expansion of tissue. The work is specific to intra-lumenal thermo-acoustic tomography (TAT), which detects the contrast of tissue dielectric properties with ultrasonic resolution, but it is also extendable to intra-lumenal photo-acoustic tomography (PAT), which detects the contrast of the light absorption properties of tissue with ultrasound resolution. Exact closed-form frequency-domain and time-domain forward models of the thermally-induced acoustic signal have been studied rigorously for the planar geometry and for two other geometries, cylindrical and spherical, both of which are specific to external imaging, i.e., breast or brain imaging using an externally-deployed applicator. This work extends the existing studies to the specific geometry of internal or intra-lumenal imaging, i.e., prostate imaging by an endo-rectally deployed applicator. In this intra-lumenal imaging geometry, both the source that excites the transient thermal expansion of the tissue and the acoustic transducer that acquires the thermally-induced acoustic signal are assumed to be enclosed by the tissue and located on the surface of a long cylindrical applicator. The Green's function of the frequency-domain thermo-acoustic equation in spherical coordinates is expanded into cylindrical coordinates associated with the intra-lumenal geometry. An inverse Fourier transform is then applied to obtain a time-domain solution of the thermo-acoustic pressure wave for the intra-lumenal geometry. Further employment of the boundary condition at the "convex" applicator-tissue interface would render an exact forward solution toward accurate reconstruction for intra-lumenal thermally-induced acoustic imaging.
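
    For orientation, the governing relation usually meant by the "thermo-acoustic equation" in this literature is sketched below. The symbols (sound speed c, thermal expansion coefficient β, specific heat C_p, heating function H) follow common TAT/PAT usage rather than the abstract itself, and sign factors depend on the chosen Fourier convention.

      % Time-domain thermoacoustic wave equation for the pressure p(r, t):
      \nabla^{2} p(\mathbf{r},t) - \frac{1}{c^{2}}\,\frac{\partial^{2} p(\mathbf{r},t)}{\partial t^{2}}
        = -\,\frac{\beta}{C_{p}}\,\frac{\partial H(\mathbf{r},t)}{\partial t}
      % Frequency-domain (Helmholtz) form, whose Green's function is the one
      % expanded in cylindrical coordinates for the intra-lumenal geometry:
      \left(\nabla^{2} + k^{2}\right)\tilde{p}(\mathbf{r},\omega)
        = \frac{i\omega\beta}{C_{p}}\,\tilde{H}(\mathbf{r},\omega),
      \qquad k = \frac{\omega}{c}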

  14. Characterizing, synthesizing, and/or canceling out acoustic signals from sound sources

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    2007-03-13

    A system for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate and animate sound sources. Electromagnetic sensors monitor excitation sources in sound producing systems, such as animate sound sources such as the human voice, or from machines, musical instruments, and various other structures. Acoustical output from these sound producing systems is also monitored. From such information, a transfer function characterizing the sound producing system is generated. From the transfer function, acoustical output from the sound producing system may be synthesized or canceled. The systems disclosed enable accurate calculation of transfer functions relating specific excitations to specific acoustical outputs. Knowledge of such signals and functions can be used to effect various sound replication, sound source identification, and sound cancellation applications.
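
    The transfer-function step described here can be sketched with standard spectral estimation. The snippet below forms the common H1 estimate from Welch cross- and auto-spectra using SciPy; the excitation/output names, sampling rate, and segment length are illustrative assumptions rather than details from the patent.

      # Sketch: H1 transfer-function estimate between an excitation and an acoustic output.
      import numpy as np
      from scipy.signal import csd, welch

      def h1_transfer_function(excitation, output, fs, nperseg=1024):
          """H1(f) = S_xy(f) / S_xx(f): cross-spectrum over input auto-spectrum."""
          f, s_xy = csd(excitation, output, fs=fs, nperseg=nperseg)
          _, s_xx = welch(excitation, fs=fs, nperseg=nperseg)
          return f, s_xy / s_xx

      # Example: the output is a filtered copy of the excitation plus a little noise.
      fs = 16000
      rng = np.random.default_rng(1)
      x = rng.standard_normal(fs)
      y = np.convolve(x, [0.5, 0.3, 0.2], mode="full")[: len(x)]
      y = y + 0.01 * rng.standard_normal(len(x))
      f, H = h1_transfer_function(x, y, fs)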

  15. Sources and Radiation Patterns of Volcano-Acoustic Signals Investigated with Field-Scale Chemical Explosions

    NASA Astrophysics Data System (ADS)

    Bowman, D. C.; Lees, J. M.; Taddeucci, J.; Graettinger, A. H.; Sonder, I.; Valentine, G.

    2014-12-01

    We investigate the processes that give rise to complex acoustic signals during volcanic blasts by monitoring buried chemical explosions with infrasound and audio range microphones, strong motion sensors, and high speed imagery. Acoustic waveforms vary with scaled depth of burial (SDOB, in meters per cube root of joules), ranging from high amplitude, impulsive, gas-expansion-dominated signals at low SDOB to low amplitude, longer duration, ground-motion-dominated signals at high SDOB. Typically, the sudden upward acceleration of the substrate above the blast produces the first acoustic arrival, followed by a second pulse due to the eruption of pressurized gas at the surface. Occasionally, a third overpressure occurs when displaced material decelerates upon impact with the ground. The transition between gas-release-dominated and ground-motion-dominated acoustics occurs between SDOB values of approximately 0.0018 and 0.0038. For example, one explosion registering SDOB=0.0031 produced two overpressure pulses of approximately equal amplitude, one due to ground motion, the other to gas release. Recorded volcano infrasound has also identified distinct ground motion and gas release components during explosions at Sakurajima, Santiaguito, and Karymsky volcanoes. Our results indicate that infrasound records may provide a proxy for the depth and energy of these explosions. Furthermore, while magma fragmentation models indicate the possibility of several explosions during a single vulcanian eruption (Alidibirov, Bull Volc., 1994), our results suggest that a single explosion can also produce complex acoustic signals. Thus acoustic records alone cannot be used to distinguish between single explosions and multiple closely-spaced blasts at volcanoes. Results from a series of lateral blasts during the 2014 field experiment further indicate whether vent geometry can produce directional acoustic radiation patterns like those observed at Tungurahua volcano (Kim et al., GJI, 2012). Beside
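
    The scaled depth of burial used throughout is simply the burial depth normalized by the cube root of the explosive energy, consistent with the units stated in the abstract:

      % Scaled depth of burial (depth d in meters, explosion energy E in joules):
      \mathrm{SDOB} = \frac{d}{E^{1/3}} \qquad \left[\,\mathrm{m\,J^{-1/3}}\,\right]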

  16. Synchrony capture filterbank: auditory-inspired signal processing for tracking individual frequency components in speech.

    PubMed

    Kumaresan, Ramdas; Peddinti, Vijay Kumar; Cariani, Peter

    2013-06-01

    A processing scheme for speech signals is proposed that emulates synchrony capture in the auditory nerve. The role of stimulus-locked spike timing is important for representation of stimulus periodicity, low frequency spectrum, and spatial location. In synchrony capture, dominant single frequency components in each frequency region impress their time structures on temporal firing patterns of auditory nerve fibers with nearby characteristic frequencies (CFs). At low frequencies, for voiced sounds, synchrony capture divides the nerve into discrete CF territories associated with individual harmonics. An adaptive, synchrony capture filterbank (SCFB) consisting of a fixed array of traditional, passive linear (gammatone) filters cascaded with a bank of adaptively tunable, bandpass filter triplets is proposed. Differences in triplet output envelopes steer triplet center frequencies via voltage controlled oscillators (VCOs). The SCFB exhibits some cochlea-like responses, such as two-tone suppression and distortion products, and possesses many desirable properties for processing speech, music, and natural sounds. Strong signal components dominate relatively greater numbers of filter channels, thereby yielding robust encodings of relative component intensities. The VCOs precisely lock onto harmonics most important for formant tracking, pitch perception, and sound separation. PMID:23742379
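
    As a rough sketch of the fixed, passive front end named here, the code below builds a small gammatone filterbank directly from the fourth-order gammatone impulse response with ERB-scaled bandwidths. The channel count, spacing, and normalization are illustrative assumptions, and the adaptive, VCO-steered triplet stage of the SCFB is not reproduced.

      # Sketch: fixed gammatone filterbank front end (passive stage only).
      import numpy as np

      def erb(fc):
          """Equivalent rectangular bandwidth (Glasberg & Moore approximation)."""
          return 24.7 + 0.108 * fc

      def gammatone_ir(fc, fs, duration=0.05, order=4):
          """Fourth-order gammatone impulse response at center frequency fc."""
          t = np.arange(int(duration * fs)) / fs
          b = 1.019 * erb(fc)
          g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
          return g / np.max(np.abs(g))

      def gammatone_filterbank(x, fs, centers):
          """Filter x with one gammatone channel per center frequency."""
          return np.stack([np.convolve(x, gammatone_ir(fc, fs), mode="same")
                           for fc in centers])

      # Example: 16 channels spanning 100 Hz to 4 kHz on a log scale.
      fs = 16000
      centers = np.geomspace(100, 4000, 16)
      channels = gammatone_filterbank(np.random.randn(fs), fs, centers)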

  17. Pulse analysis of acoustic emission signals. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Houghton, J. R.

    1976-01-01

    A method for the signature analysis of pulses in the frequency domain and the time domain is presented. The Fourier spectrum, Fourier transfer function, shock spectrum and shock spectrum ratio are examined in the frequency domain analysis, and pulse shape deconvolution is developed for use in the time domain analysis. To demonstrate the relative sensitivity of each of the methods to small changes in the pulse shape, signatures of computer modeled systems with analytical pulses are presented. Optimization techniques are developed and used to indicate the best design parameter values for deconvolution of the pulse shape. Several experiments are presented that test the pulse signature analysis methods on different acoustic emission sources. These include acoustic emissions associated with: (1) crack propagation, (2) ball dropping on a plate, (3) spark discharge and (4) defective and good ball bearings.

  18. Acoustic and ultrasonic signals as diagnostic tools for check valves

    SciTech Connect

    Auyang, M.K.

    1993-05-01

    A typical nuclear plant has between 60 and 115 safety-related check valves ranging from 2 to 30 in. The majority of these valves control water flow. Recent studies done by the Institute of Nuclear Power Operations (INPO), the Electric Power Research Institute (EPRI) and the US Nuclear Regulatory Commission (NRC) found that many of these safety-related valves were not functioning properly. Typical problems found in these valves included disk flutter, backstop tapping, flow leakage, disk pin and hinge pin wear, or even missing disks. These findings led to INPO's Significant Operating Experience Report (SOER, 1986), and finally, NRC generic letter 8904, which requires that all safety-related check valves in a nuclear plant be regularly monitored. In response to this need, the industry has developed various diagnostic equipment to monitor and test check valves, using technologies ranging from acoustics and ultrasonics to magnetics; even radiography has been considered. Of these, systems that depend on a combination of acoustic and ultrasonic techniques are among the most promising, for two reasons: the two technologies supplement each other, making diagnosis of the check valves much more certain than with any single technology, and this approach can be made nonintrusive. The nonintrusive feature allows the check valves to be monitored and diagnosed without being disassembled or removed from the piping system. This paper shows that by carefully studying the acoustic and ultrasonic signatures acquired from a check valve, either individually or in combination, an individual with the proper training and experience in acoustic and ultrasonic signature analyses can deduce the structural integrity of the check valve with good confidence. Most of the conclusions are derived from controlled experiments in the laboratory where the diagnosis can be verified. Other conclusions were based on test data obtained in the field.

  19. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    ERIC Educational Resources Information Center

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  20. Improving the speech intelligibility in classrooms

    NASA Astrophysics Data System (ADS)

    Lam, Choi Ling Coriolanus

    of the reverberation time, the indoor ambient noise (or background noise level), the signal-to-noise ratio, and the speech transmission index, it aims to establish a guideline for improving the speech intelligibility in classrooms for any country and any environmental conditions. The study showed that the acoustical conditions of most of the measured classrooms in Hong Kong are unsatisfactory. The selection of materials inside a classroom is important for improving speech intelligibility at the design stage, especially the acoustic ceiling, in order to shorten the reverberation time inside the classroom. The signal-to-noise ratio should be higher than 11 dB(A) to achieve over 70% speech perception, for either tonal or non-tonal languages, without the use of a public address system. These unexpected results call for a revision of the standard design and for acceptable acoustic standards for classrooms in Hong Kong. A method is also demonstrated for assessing classrooms in other cities with similar environmental conditions.

  1. Surface Roughness Evaluation Based on Acoustic Emission Signals in Robot Assisted Polishing

    PubMed Central

    de Agustina, Beatriz; Marín, Marta María; Teti, Roberto; Rubio, Eva María

    2014-01-01

    The polishing process is the most common technology used in applications where a high level of surface quality is demanded. The automation of polishing processes is especially difficult due to the high level of skill and dexterity that is required. Much of this difficulty arises because of the lack of reliable data on the effect of the polishing parameters on the resulting surface roughness. An experimental study was developed to evaluate the surface roughness obtained during Robot Assisted Polishing processes by the analysis of acoustic emission signals in the frequency domain. The aim was to identify a trend in one or more features calculated from the acoustic emission signals detected during the process. This evaluation was made with the objective of collecting information valuable for establishing end-point detection in the polishing process. As a main conclusion, it can be affirmed that acoustic emission (AE) signals can be considered useful for monitoring the state of the polishing process. PMID:25405509

  2. Speech communications in noise

    NASA Technical Reports Server (NTRS)

    1984-01-01

    The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.

  3. Speech recognition and understanding

    SciTech Connect

    Vintsyuk, T.K.

    1983-05-01

    This article discusses the automatic processing of speech signals with the aim of finding a sequence of words (speech recognition) or a concept (speech understanding) being transmitted by the speech signal. The goal of the research is to develop an automatic typewriter that will automatically edit and type text under voice control. A dynamic programming method is proposed in which all possible class signals are stored, after which the presented signal is compared to all the stored signals during the recognition phase. Topics considered include element-by-element recognition of words of speech, learning speech recognition, phoneme-by-phoneme speech recognition, the recognition of connected speech, understanding connected speech, and prospects for designing speech recognition and understanding systems. An application of the composition dynamic programming method to the solution of basic problems in the recognition and understanding of speech is presented.
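
    The stored-template comparison described here is the classic dynamic-programming (dynamic time warping) matching used in early word recognition. The sketch below is a minimal illustration; the feature vectors and Euclidean local distance are assumptions rather than details taken from the article.

      # Sketch: dynamic time warping (DTW) template matching for word recognition.
      import numpy as np

      def dtw_distance(a, b):
          """Accumulated DTW cost between two sequences of feature vectors."""
          n, m = len(a), len(b)
          cost = np.full((n + 1, m + 1), np.inf)
          cost[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  d = np.linalg.norm(a[i - 1] - b[j - 1])
                  cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                       cost[i, j - 1],      # deletion
                                       cost[i - 1, j - 1])  # match
          return cost[n, m]

      def recognize(utterance, templates):
          """Return the label of the stored template with the lowest DTW cost."""
          return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))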

  4. The Effect of Habitat Acoustics on Common Marmoset Vocal Signal Transmission

    PubMed Central

    MORRILL, RYAN J.; THOMAS, A. WREN; SCHIEL, NICOLA; SOUTO, ANTONIO; MILLER, CORY T.

    2013-01-01

    Noisy acoustic environments present several challenges for the evolution of acoustic communication systems. Among the most significant is the need to limit degradation of spectro-temporal signal structure in order to maintain communicative efficacy. This can be achieved by selecting for several potentially complementary processes. Selection can act on behavioral mechanisms permitting signalers to control the timing and occurrence of signal production to avoid acoustic interference. Likewise, the signal itself may be the target of selection, biasing the evolution of its structure to comprise acoustic features that avoid interference from ambient noise or degrade minimally in the habitat. Here, we address the latter topic for common marmoset (Callithrix jacchus) long-distance contact vocalizations, known as phee calls. Our aim was to test whether this vocalization is specifically adapted for transmission in a species-typical forest habitat, the Atlantic forests of northeastern Brazil. We combined seasonal analyses of ambient habitat acoustics with experiments in which pure tones, clicks, and vocalizations were broadcast and rerecorded at different distances to characterize signal degradation in the habitat. Ambient sound was analyzed from intervals throughout the day and over rainy and dry seasons, showing temporal regularities across varied timescales. Broadcast experiment results indicated that the tone and click stimuli showed the typically inverse relationship between frequency and signaling efficacy. Although marmoset phee calls degraded over distance with marked predictability compared with artificial sounds, they did not otherwise appear to be specially designed for increased transmission efficacy or minimal interference in this habitat. We discuss these data in the context of other similar studies and evidence of potential behavioral mechanisms for avoiding acoustic interference in order to maintain effective vocal communication in common marmosets. PMID

  5. System and method for investigating sub-surface features of a rock formation with acoustic sources generating conical broadcast signals

    DOEpatents

    Vu, Cung Khac; Skelt, Christopher; Nihei, Kurt; Johnson, Paul A.; Guyer, Robert; Ten Cate, James A.; Le Bas, Pierre -Yves; Larmat, Carene S.

    2015-08-18

    A method of interrogating a formation includes generating a first conical acoustic signal at a first frequency and a second conical acoustic signal at a second frequency, each between approximately 500 Hz and 500 kHz, such that the signals intersect in a desired intersection volume outside the borehole. The method further includes receiving a difference signal returning to the borehole, resulting from a non-linear mixing of the signals in a mixing zone within the intersection volume.

  6. Evaluating the role of spectral and envelope characteristics in the intelligibility advantage of clear speech

    PubMed Central

    Krause, Jean C.; Braida, Louis D.

    2009-01-01

    In adverse listening conditions, talkers can increase their intelligibility by speaking clearly [Picheny, M.A., et al. (1985). J. Speech Hear. Res. 28, 96–103; Payton, K. L., et al. (1994). J. Acoust. Soc. Am. 95, 1581–1592]. This modified speaking style, known as clear speech, is typically spoken more slowly than conversational speech [Picheny, M. A., et al. (1986). J. Speech Hear. Res. 29, 434–446; Uchanski, R. M., et al. (1996). J. Speech Hear. Res. 39, 494–509]. However, talkers can produce clear speech at normal rates (clear∕normal speech) with training [Krause, J. C., and Braida, L. D. (2002). J. Acoust. Soc. Am. 112, 2165–2172] suggesting that clear speech has some inherent acoustic properties, independent of rate, that contribute to its improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. Two global-level properties of clear∕normal speech that appear likely to be associated with improved intelligibility are increased energy in the 1000–3000-Hz range of long-term spectra and increased modulation depth of low-frequency modulations of the intensity envelope [Krause, J. C., and Braida, L. D. (2004). J. Acoust. Soc. Am. 115, 362–378]. In an attempt to isolate the contributions of these two properties to intelligibility, signal processing transformations were developed to manipulate each of these aspects of conversational speech independently. Results of intelligibility testing with hearing-impaired listeners and normal-hearing listeners in noise suggest that (1) increasing energy between 1000 and 3000 Hz does not fully account for the intelligibility benefit of clear∕normal speech, and (2) simple filtering of the intensity envelope is generally detrimental to intelligibility. While other manipulations of the intensity envelope are required to determine conclusively the role of this factor in intelligibility, it is also likely that additional properties important for

  7. Beeping and piping: characterization of two mechano-acoustic signals used by honey bees in swarming.

    PubMed

    Schlegel, Thomas; Visscher, P Kirk; Seeley, Thomas D

    2012-12-01

    Of the many signals used by honey bees during the process of swarming, two of them--the stop signal and the worker piping signal--are not easily distinguished for both are mechano-acoustic signals produced by scout bees who press their bodies against other bees while vibrating their wing muscles. To clarify the acoustic differences between these two signals, we recorded both signals from the same swarm and at the same time, and compared them in terms of signal duration, fundamental frequency, and frequency modulation. Stop signals and worker piping signals differ in all three variables: duration, 174 ± 64 vs. 602 ± 377 ms; fundamental frequency, 407 vs. 451 Hz; and frequency modulation, absent vs. present. While it remains unclear which differences the bees use to distinguish the two signals, it is clear that they do so for the signals have opposite effects. Stop signals cause inhibition of actively dancing scout bees whereas piping signals cause excitation of quietly resting non-scout bees. PMID:23149930

  8. Frequency Characteristics of Acoustic Emission Signals from Cementitious Waste-forms with Encapsulated Al

    SciTech Connect

    Spasova, Lyubka M.; Ojovan, Michael I.

    2007-07-01

    Acoustic emission (AE) signals were continuously recorded and their intrinsic frequency characteristics examined in order to evaluate the mechanical performance of cementitious wasteform samples with encapsulated Al waste. The primary frequency in the power spectrum and its range of intensity for the detected acoustic waves were potentially related to the appearance of different micro-mechanical events caused by Al corrosion within the encapsulating cement system. In addition, the process of cement matrix hardening was shown to be a source of AE signals characterized by an essentially higher primary frequency (above 2 MHz) compared with those due to Al corrosion development (below 40 kHz) and cement cracking (above 100 kHz). (authors)

  9. Search for acoustic signals from high energy cascades

    NASA Technical Reports Server (NTRS)

    Bell, R.; Bowen, T.

    1985-01-01

    High energy cosmic ray secondaries can be detected by means of the cascades they produce when they pass through matter. When the charged particles of these cascades ionize the matter they are traveling through, the heat produced and resulting thermal expansion causes a thermoacoustic wave. These sound waves travel at about one hundred-thousandth the speed of light, and should allow an array of acoustic transducers to resolve structure in the cascade to about 1 cm without high speed electronics or segmentation of the detector.

  10. Research on power-law acoustic transient signal detection based on wavelet transform

    NASA Astrophysics Data System (ADS)

    Han, Jian-hui; Yang, Ri-jie; Wang, Wei

    2007-11-01

    To address the characteristics of the acoustic transient signals emitted by antisubmarine weapons entering the water (torpedoes, aerial sonobuoys, rocket-assisted depth charges, etc.), such as short duration, low SNR, abruptness, and instability, a new detection method based on the traditional power-law detector is proposed. First, a wavelet transform is used to denoise the signal, removing random spectral components and improving the SNR. A power-law detector is then applied to detect the transient signal. Simulation results show that the method can effectively extract the envelope characteristics of transient signals at low SNR, and that the performance of the WT-power-law method markedly exceeds that of the traditional power-law detector.
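
    A minimal sketch of the two-stage idea, wavelet denoising followed by a Nuttall-style power-law statistic on the FFT bins, is given below. The wavelet family, threshold rule, exponent, and detection threshold are illustrative assumptions rather than the paper's tuned values.

      # Sketch: wavelet denoising followed by a power-law transient detector.
      import numpy as np
      import pywt

      def wavelet_denoise(x, wavelet="db4", level=4):
          """Soft-threshold detail coefficients to suppress broadband noise."""
          coeffs = pywt.wavedec(x, wavelet, level=level)
          sigma = np.median(np.abs(coeffs[-1])) / 0.6745
          thr = sigma * np.sqrt(2.0 * np.log(len(x)))
          coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
          return pywt.waverec(coeffs, wavelet)[: len(x)]

      def power_law_statistic(x, nu=2.5):
          """Sum of FFT-bin magnitudes raised to 2*nu, after crude per-bin normalization."""
          spec = np.abs(np.fft.rfft(x))
          spec = spec / (np.mean(spec) + 1e-12)
          return np.sum(spec ** (2 * nu))

      def detect_transient(x, threshold):
          """Declare a transient present when the statistic exceeds the threshold."""
          return power_law_statistic(wavelet_denoise(x)) > threshold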

  11. Use of the Autocorrelation Function as an Indicator of Signal Shape in Acoustic Emission Testing of Intricate Castings

    NASA Astrophysics Data System (ADS)

    Popkov, Artem

    2016-01-01

    The article describes the analysis of acoustic emission signals using the autocorrelation function. Operational factors such as the signal shape, onset time, and carrier frequency were analyzed. The purpose of the work is to estimate the validity of correlation methods for analyzing such signals. An acoustic emission signal consists of different types of waves that propagate along different trajectories in the object under test; it is an amplitude-, phase-, and frequency-modulated signal and can be described by its carrier frequency at a given point in time. For the analyzed signal, the period is 12.5 microseconds and the carrier frequency is 80 kHz. Using the autocorrelation function as an indicator of the onset time of an acoustic emission signal improves the validity of source localization.
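
    The sketch below illustrates the basic computation: the normalized autocorrelation of a simulated 80 kHz burst (12.5 μs period, as stated in the abstract). The sampling rate, onset time, and peak-search window are illustrative assumptions.

      # Sketch: normalized autocorrelation of a simulated acoustic emission burst.
      import numpy as np

      def autocorrelation(x):
          """Biased, normalized autocorrelation for non-negative lags."""
          x = x - np.mean(x)
          r = np.correlate(x, x, mode="full")[len(x) - 1:]
          return r / r[0]

      # Example: an 80 kHz burst sampled at 1 MHz, starting 2 ms into the record.
      fs = 1_000_000
      t = np.arange(int(0.005 * fs)) / fs
      burst = np.where(t >= 0.002, np.sin(2 * np.pi * 80e3 * t), 0.0)
      noisy = burst + 0.2 * np.random.randn(len(t))
      r = autocorrelation(noisy)
      # The first strong peak should sit near 12-13 samples (12.5 us at 1 MHz).
      carrier_lag = np.argmax(r[5:50]) + 5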

  12. Effects of speech style, room acoustics, and vocal fatigue on vocal effort.

    PubMed

    Bottalico, Pasquale; Graetzer, Simone; Hunter, Eric J

    2016-05-01

    Vocal effort is a physiological measure that accounts for changes in voice production as vocal loading increases. It has been quantified in terms of sound pressure level (SPL). This study investigates how vocal effort is affected by speaking style, room acoustics, and short-term vocal fatigue. Twenty subjects were recorded while reading a text at normal and loud volumes in anechoic, semi-reverberant, and reverberant rooms in the presence of classroom babble noise. The acoustics in each environment were modified by creating a strong first reflection in the talker position. After each task, the subjects answered questions addressing their perception of the vocal effort, comfort, control, and clarity of their own voice. Variation in SPL for each subject was measured per task. It was found that SPL and self-reported effort increased in the loud style and decreased when the reflective panels were present and when reverberation time increased. Self-reported comfort and control decreased in the loud style, while self-reported clarity increased when panels were present. The lowest magnitude of vocal fatigue was experienced in the semi-reverberant room. The results indicate that early reflections may be used to reduce vocal effort without modifying reverberation time. PMID:27250179

  13. Speech intelligibility and recall of first and second language words heard at different signal-to-noise ratios.

    PubMed

    Hygge, Staffan; Kjellberg, Anders; Nöstl, Anatole

    2015-01-01

    Free recall of spoken words in Swedish (native tongue) and English was assessed in two signal-to-noise ratio (SNR) conditions (+3 and +12 dB), with and without half of the heard words being repeated back orally directly after presentation [shadowing, speech intelligibility (SI)]. A total of 24 word lists with 12 words each were presented in English and in Swedish to Swedish-speaking college students. Pre-experimental measures of working memory capacity (operation span, OSPAN) were taken. A basic hypothesis was that the recall of the words would be impaired when the encoding of the words required more processing resources, thereby depleting working memory resources. This would be the case when the SNR was low or when the language was English. A low SNR was also expected to impair SI, but we wanted to compare the sizes of the SNR-effects on SI and recall. A low score on working memory capacity was expected to further add to the negative effects of SNR and language on both SI and recall. The results indicated that SNR had strong effects on both SI and recall, but also that the effect size was larger for recall than for SI. Language had a main effect on recall, but not on SI. The shadowing procedure had different effects on recall of the early and late parts of the word lists. Working memory capacity was unimportant for the effect on SI and recall. Thus, recall appears to be a more sensitive indicator than SI for the acoustics of learning, which has implications for building codes and recommendations concerning classrooms and other workplaces, where both hearing and learning are important. PMID:26441765

  14. Speech intelligibility and recall of first and second language words heard at different signal-to-noise ratios

    PubMed Central

    Hygge, Staffan; Kjellberg, Anders; Nöstl, Anatole

    2015-01-01

    Free recall of spoken words in Swedish (native tongue) and English was assessed in two signal-to-noise ratio (SNR) conditions (+3 and +12 dB), with and without half of the heard words being repeated back orally directly after presentation [shadowing, speech intelligibility (SI)]. A total of 24 word lists with 12 words each were presented in English and in Swedish to Swedish-speaking college students. Pre-experimental measures of working memory capacity (operation span, OSPAN) were taken. A basic hypothesis was that the recall of the words would be impaired when the encoding of the words required more processing resources, thereby depleting working memory resources. This would be the case when the SNR was low or when the language was English. A low SNR was also expected to impair SI, but we wanted to compare the sizes of the SNR-effects on SI and recall. A low score on working memory capacity was expected to further add to the negative effects of SNR and language on both SI and recall. The results indicated that SNR had strong effects on both SI and recall, but also that the effect size was larger for recall than for SI. Language had a main effect on recall, but not on SI. The shadowing procedure had different effects on recall of the early and late parts of the word lists. Working memory capacity was unimportant for the effect on SI and recall. Thus, recall appears to be a more sensitive indicator than SI for the acoustics of learning, which has implications for building codes and recommendations concerning classrooms and other workplaces, where both hearing and learning are important. PMID:26441765

  15. The Role of the Listener's State in Speech Perception

    ERIC Educational Resources Information Center

    Viswanathan, Navin

    2009-01-01

    Accounts of speech perception disagree on whether listeners perceive the acoustic signal (Diehl, Lotto, & Holt, 2004) or the vocal tract gestures that produce the signal (e.g., Fowler, 1986). In this dissertation, I outline a research program using a phenomenon called "perceptual compensation for coarticulation" (Mann, 1980) to examine this…

  16. The Modulation Transfer Function for Speech Intelligibility

    PubMed Central

    Elliott, Taffeta M.; Theunissen, Frédéric E.

    2009-01-01

    We systematically determined which spectrotemporal modulations in speech are necessary for comprehension by human listeners. Speech comprehension has been shown to be robust to spectral and temporal degradations, but the specific relevance of particular degradations is arguable due to the complexity of the joint spectral and temporal information in the speech signal. We applied a novel modulation filtering technique to recorded sentences to restrict acoustic information quantitatively and to obtain a joint spectrotemporal modulation transfer function for speech comprehension, the speech MTF. For American English, the speech MTF showed the criticality of low modulation frequencies in both time and frequency. Comprehension was significantly impaired when temporal modulations <12 Hz or spectral modulations <4 cycles/kHz were removed. More specifically, the MTF was bandpass in temporal modulations and low-pass in spectral modulations: temporal modulations from 1 to 7 Hz and spectral modulations <1 cycles/kHz were the most important. We evaluated the importance of spectrotemporal modulations for vocal gender identification and found a different region of interest: removing spectral modulations between 3 and 7 cycles/kHz significantly increases gender misidentifications of female speakers. The determination of the speech MTF furnishes an additional method for producing speech signals with reduced bandwidth but high intelligibility. Such compression could be used for audio applications such as file compression or noise removal and for clinical applications such as signal processing for cochlear implants. PMID:19266016
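
    A rough sketch of how a joint spectrotemporal modulation representation can be computed, via the 2-D Fourier transform of a log spectrogram, is given below. The window length, overlap, and the use of plain noise as input are illustrative assumptions, and the paper's modulation filtering and resynthesis step is not reproduced.

      # Sketch: joint spectrotemporal modulation spectrum from a log spectrogram.
      import numpy as np
      from scipy.signal import spectrogram

      def modulation_spectrum(x, fs, nperseg=512, noverlap=384):
          f, t, sxx = spectrogram(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
          log_spec = np.log(sxx + 1e-12)
          # 2-D FFT of the mean-removed log spectrogram, centered with fftshift.
          mod = np.abs(np.fft.fftshift(np.fft.fft2(log_spec - log_spec.mean())))
          dt = t[1] - t[0]
          df = f[1] - f[0]
          wt = np.fft.fftshift(np.fft.fftfreq(log_spec.shape[1], d=dt))  # temporal mod., Hz
          wf = np.fft.fftshift(np.fft.fftfreq(log_spec.shape[0], d=df))  # spectral mod., cycles/Hz
          return wf, wt, mod

      # Example on one second of noise standing in for a speech signal.
      fs = 16000
      wf, wt, M = modulation_spectrum(np.random.randn(fs), fs)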

  17. Acoustic evidence for the development of gestural coordination in the speech of 2-year-olds: a longitudinal study.

    PubMed

    Goodell, E W; Studdert-Kennedy, M

    1993-08-01

    Studies of child phonology have often assumed that young children first master a repertoire of phonemes and then build their lexicon by forming combinations of these abstract, contrastive units. However, evidence from children's systematic errors suggests that children first build a repertoire of words as integral sequences of gestures and then gradually differentiate these sequences into their gestural and segmental components. Recently, experimental support for this position has been found in the acoustic records of the speech of 3-, 5-, and 7-year-old children, suggesting that even in older children some phonemes have not yet fully segregated as units of gestural organization and control. The present longitudinal study extends this work to younger children (22- and 32-month-olds). Results demonstrate clear differences in the duration and coordination of gestures between children and adults, and a clear shift toward the patterns of adult speakers during roughly the third year of life. Details of the child-adult differences and developmental changes vary from one aspect of an utterance to another. PMID:8377484

  18. A user's guide for the signal processing software for image and speech compression developed in the Communications and Signal Processing Laboratory (CSPL), version 1

    NASA Technical Reports Server (NTRS)

    Kumar, P.; Lin, F. Y.; Vaishampayan, V.; Farvardin, N.

    1986-01-01

    A complete documentation of the software developed in the Communication and Signal Processing Laboratory (CSPL) during the period of July 1985 to March 1986 is provided. Utility programs and subroutines that were developed for a user-friendly image and speech processing environment are described. Additional programs for data compression of image and speech type signals are included. Also, programs for the zero-memory and block transform quantization in the presence of channel noise are described. Finally, several routines for simulating the performance of image compression algorithms are included.
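
    As a toy illustration of the zero-memory quantization the package documents, the sketch below implements a plain midrise uniform scalar quantizer. The bit depth, overload level, and variable names are assumptions, and the package's channel-noise-optimized designs are not reproduced.

      # Sketch: zero-memory (scalar) midrise uniform quantizer for speech samples.
      import numpy as np

      def uniform_quantize(x, n_bits, x_max):
          """Return (indices, reconstructed values) for a midrise uniform quantizer."""
          levels = 2 ** n_bits
          step = 2.0 * x_max / levels
          idx = np.clip(np.floor(x / step) + levels // 2, 0, levels - 1).astype(int)
          recon = (idx - levels // 2 + 0.5) * step
          return idx, recon

      # Example: quantize a sinusoidal test waveform to 4 bits.
      x = 0.9 * np.sin(2 * np.pi * np.linspace(0, 10, 8000))
      idx, xq = uniform_quantize(x, n_bits=4, x_max=1.0)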

  19. Beeping and piping: characterization of two mechano-acoustic signals used by honey bees in swarming

    NASA Astrophysics Data System (ADS)

    Schlegel, Thomas; Visscher, P. Kirk; Seeley, Thomas D.

    2012-12-01

    Of the many signals used by honey bees during the process of swarming, two of them—the stop signal and the worker piping signal—are not easily distinguished for both are mechano-acoustic signals produced by scout bees who press their bodies against other bees while vibrating their wing muscles. To clarify the acoustic differences between these two signals, we recorded both signals from the same swarm and at the same time, and compared them in terms of signal duration, fundamental frequency, and frequency modulation. Stop signals and worker piping signals differ in all three variables: duration, 174 ± 64 vs. 602 ± 377 ms; fundamental frequency, 407 vs. 451 Hz; and frequency modulation, absent vs. present. While it remains unclear which differences the bees use to distinguish the two signals, it is clear that they do so for the signals have opposite effects. Stop signals cause inhibition of actively dancing scout bees whereas piping signals cause excitation of quietly resting non-scout bees.

  20. Zipf's Law in Short-Time Timbral Codings of Speech, Music, and Environmental Sound Signals

    PubMed Central

    Haro, Martín; Serrà, Joan; Herrera, Perfecto; Corral, Álvaro

    2012-01-01

    Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis on the intrinsic characteristics of most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that speech and music databases have specific, distinctive code-words while, in the case of the environmental sounds, these database-specific code-words are not present. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common simple generative mechanism for all considered sound sources. PMID:22479497
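
    A minimal sketch of the rank-frequency analysis described above is given below: count occurrences of discrete code-words, sort by frequency, and estimate the Zipf exponent with a least-squares fit in log-log coordinates. The input here is an arbitrary symbol sequence; deriving timbral code-words from audio (psychoacoustic frequency scales, quantized spectra) is not reproduced.

      # Sketch: rank-frequency distribution and Zipf-exponent fit for a symbol stream.
      import numpy as np
      from collections import Counter

      def zipf_exponent(symbols):
          """Return (ranks, counts, alpha) assuming counts ~ rank**(-alpha)."""
          counts = np.array(sorted(Counter(symbols).values(), reverse=True), dtype=float)
          ranks = np.arange(1, len(counts) + 1, dtype=float)
          slope, _ = np.polyfit(np.log(ranks), np.log(counts), 1)
          return ranks, counts, -slope

      # Example: fit the exponent for a synthetic heavy-tailed symbol sequence.
      rng = np.random.default_rng(2)
      symbols = rng.zipf(a=2.0, size=20000)
      ranks, counts, alpha = zipf_exponent(symbols)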

  1. Wayside acoustic diagnosis of defective train bearings based on signal resampling and information enhancement

    NASA Astrophysics Data System (ADS)

    He, Qingbo; Wang, Jun; Hu, Fei; Kong, Fanrang

    2013-10-01

    The diagnosis of train bearing defects plays a significant role in maintaining the safety of railway transport. Among various defect detection techniques, acoustic diagnosis is capable of detecting incipient defects of a train bearing and is suitable for wayside monitoring. However, the wayside acoustic signal will be corrupted by the Doppler effect and heavy surrounding noise. This paper proposes a solution to overcome these two difficulties in wayside acoustic diagnosis. In the solution, a dynamic resampling method is first presented to reduce the Doppler effect, and then an adaptive stochastic resonance (ASR) method is proposed to enhance the defective characteristic frequency automatically with the aid of noise. The resampling method is based on a frequency variation curve extracted from the time-frequency distribution (TFD) of an acoustic signal by dynamically minimizing the local cost functions. For the ASR method, the genetic algorithm is introduced to adaptively select the optimal parameter of the multiscale noise tuning (MST)-based stochastic resonance (SR) method. The proposed wayside acoustic diagnostic scheme combines signal resampling and information enhancement, and is thus expected to be effective in wayside defective bearing detection. The experimental study verifies the effectiveness of the proposed solution.

  2. Subjective evaluation of speech and noise in learning environments in the realm of classroom acoustics: Results from laboratory and field experiments

    NASA Astrophysics Data System (ADS)

    Meis, Markus; Nocke, Christian; Hofmann, Simone; Becker, Bernhard

    2005-04-01

    The impact of different acoustical conditions in learning environments on noise annoyance and on the evaluation of speech quality was tested in a series of three experiments. In Experiment 1 (n=79), the auralization of seven classrooms with reverberation times from 0.55 to 3.21 s [averaged between 250 Hz and 2 kHz] served to develop a semantic differential for evaluating a simulated teacher's voice. Four factors were found: acoustical comfort, roughness, sharpness, and loudness. In Experiment 2, the effects of two classroom renovations were examined from a holistic perspective. The rooms were treated acoustically with acoustic ceilings (RT=0.5 s [250 Hz-2 kHz]) and muffling floor materials, as well as non-acoustically with a new lighting system and color design. The results indicate that pupils (n=61) in renovated classrooms judged the simulated voice more positively, were less annoyed by the noise in classrooms, and were more motivated to participate in the lessons. In Experiment 3, the sound environments of six different lecture rooms (RT=0.8 to 1.39 s [250 Hz-2 kHz]) at two universities in Oldenburg were evaluated by 321 students during the lectures. The evidence supports the assumption that acoustical comfort is frequency dependent for rooms with higher reverberation times.

  3. Development of an Acoustic Signal Analysis Tool “Auto-F” Based on the Temperament Scale

    NASA Astrophysics Data System (ADS)

    Modegi, Toshio

    The MIDI interface was originally designed for electronic musical instruments, but we consider that this music-note-based coding concept can be extended to general acoustic signal description. We proposed applying MIDI technology to the coding of bio-medical auscultation sound signals such as heart sounds for retrieving medical records and performing telemedicine. We have since tried to extend our encoding targets to include vocal sounds, natural sounds, and electronic bio-signals such as ECG, using the Generalized Harmonic Analysis method. Currently, we are trying to separate the vocal sounds included in popular songs and to encode the vocal sounds and the background instrumental sounds into separate MIDI channels. We are also trying to extract articulation parameters such as MIDI pitch-bend parameters in order to reproduce natural acoustic sounds using a GM-standard MIDI tone generator. In this paper, we present the overall algorithm of our acoustic signal analysis tool, based on those research works, which can analyze given time-based signals on the musical temperament scale. The prominent feature of this tool is that it produces high-precision MIDI codes, which reproduce signals similar to the given source signal on a GM-standard MIDI tone generator, and that it provides the analysis results as text in XML format.
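
    The mapping underlying any such temperament-scale coding is the conversion of a measured frequency to the nearest equal-tempered MIDI note plus a pitch-bend offset in cents; the sketch below illustrates only that scale, not the tool's full analysis (GHA, channel separation), and the function name is an assumption.

      # Sketch: frequency (Hz) to equal-tempered MIDI note plus cents offset.
      import numpy as np

      def freq_to_midi(freq_hz, a4_hz=440.0):
          """Return (midi_note, cents_offset) for a frequency in Hz."""
          midi_float = 69.0 + 12.0 * np.log2(freq_hz / a4_hz)
          note = int(round(midi_float))
          cents = 100.0 * (midi_float - note)
          return note, cents

      # Example: 445 Hz maps to MIDI note 69 (A4), about +20 cents sharp.
      note, cents = freq_to_midi(445.0)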

  4. Differentiating speech and nonspeech sounds via amplitude envelope cues

    NASA Astrophysics Data System (ADS)

    Lehnhoff, Robert J.; Strange, Winifred; Long, Glenis

    2001-05-01

    Recent evidence from neuroscience and behavioral speech science suggests that the temporal modulation pattern of the speech signal plays a distinctive role in speech perception. As a first step in exploring the nature of the perceptually relevant information in the temporal pattern of speech, this experiment examined whether speech versus nonspeech environmental sounds could be differentiated on the basis of their amplitude envelopes. Conversational speech was recorded from native speakers of six different languages (French, German, Hebrew, Hindi, Japanese, and Russian) along with samples of their English. Nonspeech sounds included animal vocalizations, water sounds, and other environmental sounds (e.g., thunder). The stimulus set included 30 2-s speech segments and 30 2-s nonspeech events. Frequency information was removed from all stimuli using a technique described by Dorman et al. [J. Acoust. Soc. Am. 102 (1997)]. Nine normal-hearing adult listeners participated in the experiment. Subjects decided whether each sound was (originally) speech or nonspeech and rated their confidence (7-point Likert scale). Overall, subjects differentiated speech from nonspeech very accurately (84% correct). Only 12 stimuli were not correctly categorized at greater than chance levels. Acoustical analysis is underway to determine what parameters of the amplitude envelope differentiate speech from nonspeech sounds.
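
    One common way to extract the amplitude envelope that such stimuli preserve while spectral detail is removed is the magnitude of the analytic signal, lightly smoothed. The sketch below stands in for the Dorman et al. procedure cited in the abstract rather than reproducing it; the cutoff frequency and test signal are illustrative assumptions.

      # Sketch: amplitude envelope via the Hilbert transform plus low-pass smoothing.
      import numpy as np
      from scipy.signal import hilbert, butter, filtfilt

      def amplitude_envelope(x, fs, cutoff_hz=50.0):
          """Magnitude of the analytic signal, smoothed with a 4th-order low-pass filter."""
          env = np.abs(hilbert(x))
          b, a = butter(4, cutoff_hz / (fs / 2.0), btype="low")
          return filtfilt(b, a, env)

      # Example: envelope of a 2-s amplitude-modulated tone at 16 kHz.
      fs = 16000
      t = np.arange(2 * fs) / fs
      x = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)
      env = amplitude_envelope(x, fs)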

  5. Speech input and output

    NASA Astrophysics Data System (ADS)

    Class, F.; Mangold, H.; Stall, D.; Zelinski, R.

    1981-12-01

    Possibilities for acoustical dialogs with electronic data processing equipment were investigated. Speech recognition is posed as recognizing word groups. An economical, multistage classifier for word string segmentation is presented and its reliability in dealing with continuous speech (problems of temporal normalization and context) is discussed. Speech synthesis is considered in terms of German linguistics and phonetics. Preprocessing algorithms for total synthesis of written texts were developed. A macrolanguage, MUSTER, is used to implement this processing in an acoustic data information system (ADES).

  6. Mode tomography using signals from the Long Range Ocean Acoustic Propagation EXperiment (LOAPEX)

    NASA Astrophysics Data System (ADS)

    Chandrayadula, Tarun K.

    Ocean acoustic tomography uses acoustic signals to infer the environmental properties of the ocean. The procedure consists of low-frequency acoustic transmissions at mid-water depths to receivers located at ranges of hundreds of kilometers. The arrival times of the signal at the receiver are then inverted for the sound speed of the background environment. Using this principle, experiments such as the 2004 Long Range Ocean Acoustic Propagation EXperiment have used acoustic signals recorded across Vertical Line Arrays (VLAs) to infer the Sound Speed Profile (SSP) across depth. The acoustic signals across the VLAs can be represented in terms of orthonormal basis functions called modes. The lower modes of the basis set, concentrated around mid-water, propagate longer distances and can be inverted for mesoscale effects such as currents and eddies. In spite of these advantages, mode tomography has received less attention. One of the important reasons for this is that internal waves in the ocean cause significant amplitude and travel time fluctuations in the modes. The amplitude and travel time fluctuations cause errors in travel time estimates. The absence of a statistical model and the lack of signal processing techniques for internal wave effects have precluded the modes from being used in tomographic inversions. This thesis estimates a statistical model for modes affected by internal waves and then uses the estimated model to design appropriate signal processing methods to obtain tomographic observables for the low modes. In order to estimate a statistical model, this thesis uses both the LOAPEX signals and numerical simulations. The statistical model describes the amplitude and phase coherence across different frequencies for modes at different ranges. The model suggests that Matched Subspace Detectors (MSDs) based on the amplitude statistics of the modes are the optimum detectors to make travel time estimates for modes up to 250 km. The mean of the

  7. Prediction and constraint in audiovisual speech perception

    PubMed Central

    Peelle, Jonathan E.; Sommers, Mitchell S.

    2015-01-01

    During face-to-face conversational speech, listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners by increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported

  8. Acoustic Signal Processing for Pipe Condition Assessment (WaterRF Report 4360)

    EPA Science Inventory

    Unique to prestressed concrete cylinder pipe (PCCP), individual wire breaks create an excitation in the pipe wall that may vary in response to the remaining compression of the pipe core. This project was designed to improve acoustic signal processing for pipe condition assessment...

  9. Some Interactions of Speech Rate, Signal Distortion, and Certain Linguistic Factors in Listening Comprehension. Professional Paper No. 39-68.

    ERIC Educational Resources Information Center

    Sticht, Thomas G.

    This experiment was designed to determine the relative effects of speech rate and signal distortion due to the time-compression process on listening comprehension. In addition, linguistic factors--including sequencing of random words into story form, and inflection and phraseology--were qualitatively considered for their effects on listening…

  10. Acoustic emission from single point machining: Part 2, Signal changes with tool wear. Revised

    SciTech Connect

    Heiple, C.R.; Carpenter, S.H.; Armentrout, D.L.; McManigle, A.P.

    1989-12-31

    Changes in acoustic emission signal characteristics with tool wear were monitored during single point machining of 4340 steel and Ti-6Al-4V heat treated to several strength levels, 6061-T6 aluminum, 304 stainless steel, 17-4PH stainless steel, 410 stainless steel, lead, and Teflon. No signal characteristic changed in the same way with tool wear for all materials tested. A single change in a particular AE signal characteristic with tool wear valid for all materials probably does not exist. Nevertheless, changes in various signal characteristics with wear for a given material may be sufficient to be used to monitor tool wear.

  11. Acoustic emission from single point machining: Part 2, Signal changes with tool wear

    SciTech Connect

    Heiple, C.R.; Carpenter, S.H.; Armentrout, D.L.; McManigle, A.P.

    1989-01-01

    Changes in acoustic emission signal characteristics with tool wear were monitored during single point machining of 4340 steel and Ti-6Al-4V heat treated to several strength levels, 6061-T6 aluminum, 304 stainless steel, 17-4PH stainless steel, 410 stainless steel, lead, and Teflon. No signal characteristic changed in the same way with tool wear for all materials tested. A single change in a particular AE signal characteristic with tool wear valid for all materials probably does not exist. Nevertheless, changes in various signal characteristics with wear for a given material may be sufficient to be used to monitor tool wear.

  12. A method for reducing the level of spurious signals in surface acoustic wave filters

    NASA Astrophysics Data System (ADS)

    Borodii, Iu. N.; Grankin, I. M.; Zapunnyi, A. P.; Kolomeiko, A. V.

    1986-03-01

    A method for reducing spurious signals in surface acoustic wave (SAW) filters is proposed whereby both bulk and reflected wave signals are attenuated by electrodes of special configuration providing in-phase addition of the useful signal and out-of-phase addition of the spurious signal components. The electrodes of the input and output transducers are made with a common focus point and equal angular apertures. The shape of the electrodes of the focusing transducers on anisotropic crystal surfaces is determined by the corresponding SAW group-velocity curve. An implementation of the proposed method is examined together with some test results.

  13. Cortical asymmetries in speech perception: what’s wrong, what’s right, and what’s left?

    PubMed Central

    McGettigan, Carolyn; Scott, Sophie K.

    2014-01-01

    Over the last 30 years hemispheric asymmetries in speech perception have been construed within a domain general framework, where preferential processing of speech is due to left lateralized, non-linguistic acoustic sensitivities. A prominent version of this argument holds that the left temporal lobe selectively processes rapid/temporal information in sound. Acoustically, this is a poor characterization of speech and there has been little empirical support for a left-hemisphere selectivity for these cues. In sharp contrast, the right temporal lobe is demonstrably sensitive to specific acoustic properties. We suggest that acoustic accounts of speech sensitivities need to be informed by the nature of the speech signal, and that a simple domain general/specific dichotomy may be incorrect. PMID:22521208

  14. Infrasonic and seismic signals from earthquakes and explosions observed with Plostina seismo-acoustic array

    NASA Astrophysics Data System (ADS)

    Ghica, D.; Ionescu, C.

    2012-04-01

    Plostina seismo-acoustic array has been recently deployed by the National Institute for Earth Physics in the central part of Romania, near the Vrancea epicentral area. The array has a 2.5 km aperture and consists of 7 seismic sites (PLOR) and 7 collocated infrasound instruments (IPLOR). The array is being used to assess the importance of collocated seismic and acoustic sensors for the purposes of (1) seismic monitoring of the local and regional events, and (2) acoustic measurement, consisting of detection of the infrasound events (explosions, mine and quarry blasts, earthquakes, aircraft etc.). This paper focuses on characterization of infrasonic and seismic signals from the earthquakes and explosions (accidental and mining type). Two Vrancea earthquakes with magnitude above 5.0 were selected for this study: one occurred on 1st of May 2011 (MD = 5.3, h = 146 km), and the other one, on 4th October 2011 (MD = 5.2, h = 142 km). The infrasonic signals from the earthquakes resemble the vertical component of the seismic signals. Because the mechanism of the infrasonic wave formation is the coupling of seismic waves with the atmosphere, trace velocity values for such signals are compatible with the characteristics of the various seismic phases observed with PLOR array. The study also evaluates and characterizes infrasound and seismic data recorded from the explosion caused by the military accident at Evangelos Florakis Naval Base, in Cyprus, on 11th July 2011. Additionally, seismo-acoustic signals presumed to be related to strong mine and quarry blasts were investigated. Ground truth of mine observations provides validation of this interpretation. The combined seismo-acoustic analysis uses two types of detectors for signal identification: one is the automatic detector DFX-PMCC, applied for infrasound detection and characterization, while the other one, which is used for seismic data, is based on array processing techniques (beamforming and frequency

  15. Antifade sonar employs acoustic field diversity to recover signals from multipath fading

    SciTech Connect

    Lubman, D.

    1996-04-01

    Co-located pressure and particle motion (PM) hydrophones together with four-channel diversity combiners may be used to recover signals from multipath fading. Multipath fading is important in both shallow and deep water propagation and can be an important source of signal loss. The acoustic field diversity concept arises from the notion of conservation of signal energy and the observation that in rooms at least, the total acoustic energy density is the sum of potential energy (scalar field-sound pressure) and kinetic energy (vector field-sound PM) portions. One pressure hydrophone determines acoustic potential energy density at a point. In principle, three PM sensors (displacement, velocity, or acceleration) directed along orthogonal axes describe the kinetic energy density at a point. For a single plane wave, the time-averaged potential and kinetic field energies are identical everywhere. In multipath interference, however, potential and kinetic field energies at a point are partitioned unequally, depending mainly on relative signal phases. Thus, when pressure signals are in deep fade, abundant kinetic field signal energy may be available at that location. Performance benefits require a degree of uncorrelated fading between channels. The expectation of nearly uncorrelated fading is motivated from room theory. Performance benefits for sonar limited by independent Rayleigh fading are suggested by analogy to antifade radio. Average SNR can be improved by several decibels, holding time on target is multiplied manifold, and the bit error rate for data communication is reduced substantially. © 1996 American Institute of Physics.

  16. Antifade sonar employs acoustic field diversity to recover signals from multipath fading

    NASA Astrophysics Data System (ADS)

    Lubman, David

    1996-04-01

    Co-located pressure and particle motion (PM) hydrophones together with four-channel diversity combiners may be used to recover signals from multipath fading. Multipath fading is important in both shallow and deep water propagation and can be an important source of signal loss. The acoustic field diversity concept arises from the notion of conservation of signal energy and the observation that in rooms at least, the total acoustic energy density is the sum of potential energy (scalar field-sound pressure) and kinetic energy (vector field-sound PM) portions. One pressure hydrophone determines acoustic potential energy density at a point. In principle, three PM sensors (displacement, velocity, or acceleration) directed along orthogonal axes describe the kinetic energy density at a point. For a single plane wave, the time-averaged potential and kinetic field energies are identical everywhere. In multipath interference, however, potential and kinetic field energies at a point are partitioned unequally, depending mainly on relative signal phases. Thus, when pressure signals are in deep fade, abundant kinetic field signal energy may be available at that location. Performance benefits require a degree of uncorrelated fading between channels. The expectation of nearly uncorrelated fading is motivated from room theory. Performance benefits for sonar limited by independent Rayleigh fading are suggested by analogy to antifade radio. Average SNR can be improved by several decibels, holding time on target is multiplied manifold, and the bit error rate for data communication is reduced substantially.

  17. Robust Speech Rate Estimation for Spontaneous Speech

    PubMed Central

    Wang, Dagen; Narayanan, Shrikanth S.

    2010-01-01

    In this paper, we propose a direct method for speech rate estimation from acoustic features without requiring any automatic speech transcription. We compare various spectral and temporal signal analysis and smoothing strategies to better characterize the underlying syllable structure to derive speech rate. The proposed algorithm extends the methods of spectral subband correlation by including temporal correlation and the use of prominent spectral subbands for improving the signal correlation essential for syllable detection. Furthermore, to address some of the practical robustness issues in previously proposed methods, we introduce some novel components into the algorithm such as the use of pitch confidence for filtering spurious syllable envelope peaks, a magnifying window for tackling neighboring syllable smearing, and relative peak measure thresholds for pseudo-peak rejection. We also describe an automated approach for learning algorithm parameters from data, and find the optimal settings through Monte Carlo simulations and parameter sensitivity analysis. Final experimental evaluations are conducted based on a portion of the Switchboard corpus for which manual phonetic segmentation information and published results for direct comparison are available. The results show a correlation coefficient of 0.745 with respect to the ground truth based on manual segmentation. This result is about a 17% improvement compared to the current best single estimator and an 11% improvement over the multiestimator evaluated on the same Switchboard database. PMID:20428476
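
    The core intuition behind such estimators, counting syllable-scale peaks in a smoothed envelope, can be sketched as below. This is only the general idea under assumed settings (file name, 8-Hz cutoff, 100-ms peak spacing, relative height threshold); the paper's subband correlation, pitch-confidence filtering, and peak-rejection heuristics are not reproduced.

        # Illustrative envelope-peak estimate of speech rate; not the paper's
        # full algorithm. Cutoff, spacing, and height threshold are assumptions.
        import numpy as np
        from scipy.io import wavfile
        from scipy.signal import butter, filtfilt, find_peaks, hilbert

        fs, x = wavfile.read("utterance.wav")
        x = x.astype(float)

        env = np.abs(hilbert(x))
        b, a = butter(2, 8.0 / (fs / 2))                   # syllabic modulations < ~8 Hz
        env = filtfilt(b, a, env)

        peaks, _ = find_peaks(env, distance=int(0.1 * fs), height=0.3 * env.max())
        rate = len(peaks) / (len(x) / fs)                  # rough syllables per second
        print(f"estimated speech rate: {rate:.2f} syll/s")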

  18. Seismic and Acoustic Array Monitoring of Signal from Tungurahua Volcano, Ecuador

    NASA Astrophysics Data System (ADS)

    Terbush, B. R.; Anthony, R. E.; Johnson, J. B.; Ruiz, M. C.

    2012-12-01

    Tungurahua Volcano is an active stratovolcano located in Ecuador's eastern Cordillera. Since its most recent cycle of eruptive activity began in 1999, it has produced both strombolian-to-vulcanian eruptions and regular vapor emissions. Tungurahua is located above the city of Baños, so volcanic activity is well monitored by Ecuador's Instituto Geofisico Nacional with a seismic and infrasound network and other surveillance tools. To better understand the complex seismic and acoustic signals associated with low-level Tungurahua activity, which often have low signal-to-noise ratios, we deployed temporary seismo-acoustic arrays between June 9th and 20th in 2012. This deployment was part of a Field Volcano Geophysics class, a collaboration between New Mexico Institute of Mining and Technology and the Escuela Politecnica Nacional's Instituto Geofísico in Ecuador. Two six-element arrays were deployed on the flank of the volcano. A seismo-acoustic array, consisting of combined broadband seismic and infrasound sensors with 100-meter spacing, was deployed five kilometers north of the vent in an open field at 2700 m. The second array had only acoustic sensors with 30-meter spacing, and was deployed approximately six kilometers northwest of the vent, on an old pyroclastic flow deposit. The arrays picked up signals from four distinct explosion events, a number of diverse tremor signals, local volcano-tectonic and long-period earthquakes, and a regional tectonic event of magnitude 4.9. Coherency of both seismic and acoustic array data was quantified using Fisher statistics, which were effective for identifying the myriad signals. For most signals, Fisher statistics were particularly high in low-frequency bands, between 0.5 and 2 Hz. Array analyses helped to filter out noise induced by cultural sources and livestock signals, which were particularly pronounced at the deployment site. Volcan Tungurahua sources were considered plane wave signals and could
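
    For readers unfamiliar with the Fisher statistic used here, the sketch below shows a common time-domain form for an N-element array segment: the ratio of beam (channel-mean) power to the residual power left after subtracting the beam, which stays near 1 for incoherent noise and grows for coherent arrivals. The delay-and-sum alignment step, zero-mean assumption, and all numbers are illustrative, not details of this deployment's processing.

        # Time-domain Fisher statistic for an aligned array segment (assumes
        # zero-mean, already delay-and-sum aligned channels). Illustrative only.
        import numpy as np

        def fisher_statistic(x):
            """x: (n_channels, n_samples) array of time-aligned traces."""
            n_ch, n_samp = x.shape
            beam = x.mean(axis=0)
            beam_power = n_ch * np.sum(beam ** 2) / n_samp
            residual_power = np.sum((x - beam) ** 2) / (n_samp * (n_ch - 1))
            return beam_power / residual_power

        segment = np.random.randn(6, 2000)                 # placeholder 6-channel noise
        print(f"F = {fisher_statistic(segment):.2f}")      # ~1 for incoherent noise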

  19. A multimodal spectral approach to characterize rhythm in natural speech.

    PubMed

    Alexandrou, Anna Maria; Saarinen, Timo; Kujala, Jan; Salmelin, Riitta

    2016-01-01

    Human utterances demonstrate temporal patterning, also referred to as rhythm. While simple oromotor behaviors (e.g., chewing) feature a salient periodical structure, conversational speech displays a time-varying quasi-rhythmic pattern. Quantification of periodicity in speech is challenging. Unimodal spectral approaches have highlighted rhythmic aspects of speech. However, speech is a complex multimodal phenomenon that arises from the interplay of articulatory, respiratory, and vocal systems. The present study addressed the question of whether a multimodal spectral approach, in the form of coherence analysis between electromyographic (EMG) and acoustic signals, would allow one to characterize rhythm in natural speech more efficiently than a unimodal analysis. The main experimental task consisted of speech production at three speaking rates; a simple oromotor task served as control. The EMG-acoustic coherence emerged as a sensitive means of tracking speech rhythm, whereas spectral analysis of either EMG or acoustic amplitude envelope alone was less informative. Coherence metrics seem to distinguish and highlight rhythmic structure in natural speech. PMID:26827019
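
    A minimal version of the EMG-acoustic coherence computation might look like the sketch below, using magnitude-squared coherence between an EMG channel and the acoustic amplitude envelope. The file names, common 1-kHz sampling rate, and 4-s analysis windows are assumptions for illustration; the published analysis pipeline is more elaborate.

        # Minimal EMG-acoustic coherence sketch; signal files, sampling rate,
        # and window length are assumptions.
        import numpy as np
        from scipy.signal import coherence, hilbert

        fs = 1000                                          # both signals resampled to 1 kHz
        emg = np.load("emg.npy")                           # rectified articulator EMG
        audio_env = np.abs(hilbert(np.load("audio.npy")))  # acoustic amplitude envelope

        f, Cxy = coherence(emg, audio_env, fs=fs, nperseg=4 * fs)
        band = (f >= 0.5) & (f <= 10.0)                    # speech-rhythm frequencies
        print("peak coherence %.2f at %.2f Hz"
              % (Cxy[band].max(), f[band][Cxy[band].argmax()]))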

  20. Automatic Speech Recognition Based on Electromyographic Biosignals

    NASA Astrophysics Data System (ADS)

    Jou, Szu-Chen Stan; Schultz, Tanja

    This paper presents our studies of automatic speech recognition based on electromyographic biosignals captured from the articulatory muscles in the face using surface electrodes. We develop a phone-based speech recognizer and describe how the performance of this recognizer improves by carefully designing and tailoring the extraction of relevant speech features toward electromyographic signals. Our experimental design includes the collection of audibly spoken speech simultaneously recorded as acoustic data using a close-speaking microphone and as electromyographic signals using electrodes. Our experiments indicate that electromyographic signals precede the acoustic signal by about 0.05-0.06 seconds. Furthermore, we introduce articulatory feature classifiers, which have recently been shown to improve classical speech recognition significantly. We show that the classification accuracy of articulatory features clearly benefits from the tailored feature extraction. Finally, these classifiers are integrated into the overall decoding framework using a stream architecture. Our final system achieves a word error rate of 29.9% on a 100-word recognition task.

  1. Acoustic tweezers for studying intracellular calcium signaling in SKBR-3 human breast cancer cells

    PubMed Central

    Hwang, Jae Youn; Yoon, Chi Woo; Lim, Hae Gyun; Park, Jin Man; Yoon, Sangpil; Lee, Jungwoo; Shung, K. Kirk

    2016-01-01

    Extracellular matrix proteins such as fibronectin (FNT) play crucial roles in cell proliferation, adhesion, and migration. For better understanding of these associated cellular activities, various microscopic manipulation tools have been used to study their intracellular signaling pathways. Recently, it has become apparent that acoustic tweezers may possess similar capabilities for such studies. Here we therefore demonstrate that our newly developed acoustic tweezers, built around a high-frequency lithium niobate ultrasonic transducer, have the potential to study intracellular calcium signaling elicited by FNT binding in human breast cancer cells (SKBR-3). It is found that intracellular calcium elevations in SKBR-3 cells, initially occurring on the microbead-contacted spot and then eventually spreading over the entire cell, are elicited by attaching an acoustically trapped FNT-coated microbead. Interestingly, they are suppressed by either extracellular calcium elimination or phospholipase C (PLC) inhibition. Hence, this suggests that our acoustic tweezers may serve as an alternative tool in the study of intracellular signaling triggered by FNT-binding activities. PMID:26150401

  2. Leak detection in gas pipeline by acoustic and signal processing - A review

    NASA Astrophysics Data System (ADS)

    Adnan, N. F.; Ghazali, M. F.; Amin, M. M.; Hamat, A. M. A.

    2015-12-01

    The pipeline system is the most important part of media transport, delivering fluid from one station to another. Weak maintenance and poor safety contribute to financial losses in terms of wasted fluid and environmental impacts. Leak-detection techniques can be classified in several ways according to their specific methods and applications. This paper discusses gas leak detection in pipeline systems using the acoustic method. Wave propagation in the pipeline is a key parameter of the acoustic method: when a leak occurs, the loss of pressure balance in the pipe generates noise through friction at the pipe wall. Signal processing is used to decompose the raw signal and present it in the time-frequency domain. The findings on the acoustic method can be used for comparative studies in the future. The combination of acoustic signals with the Hilbert-Huang transform (HHT) appears to be the most effective method for detecting leaks in gas pipelines. More experiments and simulations need to be carried out to obtain fast leak detection and estimation of leak location.

  3. Data quality enhancement and knowledge discovery from relevant signals in acoustic emission

    NASA Astrophysics Data System (ADS)

    Mejia, Felipe; Shyu, Mei-Ling; Nanni, Antonio

    2015-10-01

    The increasing popularity of structural health monitoring has brought with it a growing need for automated data management and data analysis tools. Of great importance are filters that can systematically detect unwanted signals in acoustic emission datasets. This study presents a semi-supervised data mining scheme that detects data belonging to unfamiliar distributions. This type of outlier detection scheme is useful for detecting the presence of new acoustic emission sources, given a training dataset of unwanted signals. In addition to classifying new observations (herein referred to as "outliers") within a dataset, the scheme generates a decision tree that classifies sub-clusters within the outlier context set. The obtained tree can be interpreted as a series of characterization rules for newly-observed data, and these rules can potentially describe the basic structure of different modes within the outlier distribution. The data mining scheme is first validated on a synthetic dataset, and an attempt is made to confirm the algorithms' ability to discriminate outlier acoustic emission sources from a controlled pencil-lead-break experiment. Finally, the scheme is applied to data from two fatigue crack-growth steel specimens, where it is shown that the extracted rules can adequately describe crack-growth related acoustic emission sources while filtering out background "noise." Results show promising performance in filter generation, thereby allowing analysts to extract, characterize, and focus only on meaningful signals.

  4. Elevated stress hormone diminishes the strength of female preferences for acoustic signals in the green treefrog.

    PubMed

    Davis, A Gabriell; Leary, Christopher J

    2015-03-01

    Mate selection can be stressful; time spent searching for mates can increase predation risk and/or decrease food consumption, resulting in elevated stress hormone levels. Both high predation risk and low food availability are often associated with increased variation in mate choice by females, but it is not clear whether stress hormone levels contribute to such variation in female behavior. We examined how the stress hormone corticosterone (CORT) affects female preferences for acoustic signals in the green treefrog, Hyla cinerea. Specifically, we assessed whether CORT administration affects female preferences for call rate - an acoustic feature that is typically under directional selection via mate choice by females in most anurans and other species that communicate using acoustic signals. Using a dual speaker playback paradigm, we show that females that were administered higher doses of CORT were less likely to choose male advertisement calls broadcast at high rates. Neither CORT dose nor level was related to the latency of female phonotactic responses, suggesting that elevated CORT does not influence the motivation to mate. Results were also not related to circulating sex steroids (i.e., progesterone, androgens or estradiol) that have traditionally been the focus of studies examining the hormonal basis for variation in female mate choice. Our results thus indicate that elevated CORT levels decrease the strength of female preferences for acoustic signals. PMID:25644312

  5. Neural Oscillations Carry Speech Rhythm through to Comprehension

    PubMed Central

    Peelle, Jonathan E.; Davis, Matthew H.

    2012-01-01

    A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners’ processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging – particularly electroencephalography (EEG) and magnetoencephalography (MEG) – point to phase locking by ongoing cortical oscillations to low-frequency information (~4–8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain. PMID:22973251

  6. Multi-scale morphology analysis of acoustic emission signal and quantitative diagnosis for bearing fault

    NASA Astrophysics Data System (ADS)

    Wang, Wen-Jing; Cui, Ling-Li; Chen, Dao-Yun

    2016-04-01

    Monitoring of potential bearing faults in operation is of critical importance to safe operation of high speed trains. One of the major challenges is how to differentiate signals relevant to the operational condition of the bearings from noise emitted by the surrounding environment. In this work, we report a procedure for analyzing acoustic emission signals collected from rolling bearings for diagnosis of bearing health conditions by examining their morphological pattern spectrum (MPS) through a multi-scale morphology analysis procedure. The results show that acoustic emission signals resulting from a given type of bearing fault share rather similar MPS curves. Further examinations in terms of sample entropy and Lempel-Ziv complexity of MPS curves suggest that these two parameters can be utilized to determine damage modes.
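
    A one-dimensional morphological pattern spectrum of the kind referred to above can be sketched as follows: grey openings with flat structuring elements of increasing length, with the normalized area removed at each scale forming the spectrum. The scales and the placeholder input are assumptions; the paper's exact settings may differ.

        # 1-D morphological pattern spectrum: normalized area removed by grey
        # openings of increasing size. Scales and input are illustrative.
        import numpy as np
        from scipy.ndimage import grey_opening

        def pattern_spectrum(x, max_scale=40):
            x = np.asarray(x, dtype=float)
            areas = [x.sum()]
            for n in range(1, max_scale + 1):
                areas.append(grey_opening(x, size=2 * n + 1).sum())
            areas = np.asarray(areas)
            return -np.diff(areas) / areas[0]          # area lost per scale

        ae_envelope = np.abs(np.random.randn(4096))    # placeholder AE envelope
        mps = pattern_spectrum(ae_envelope)
        print("MPS (first 5 scales):", np.round(mps[:5], 4))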

  7. Temperature and Pressure Dependence of Signal Amplitudes for Electrostriction Laser-Induced Thermal Acoustics

    NASA Technical Reports Server (NTRS)

    Herring, Gregory C.

    2015-01-01

    The relative signal strength of electrostriction-only (no thermal grating) laser-induced thermal acoustics (LITA) in gas-phase air is reported as a function of temperature T and pressure P. Measurements were made in the free stream of a variable Mach number supersonic wind tunnel, where T and P are varied simultaneously as Mach number is varied. Using optical heterodyning, the measured signal amplitude (related to the optical reflectivity of the acoustic grating) was averaged for each of 11 flow conditions and compared to the expected theoretical dependence of a pure-electrostriction LITA process, where the signal is proportional to the square root of P^2/T^3, i.e., to P/T^1.5.
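
    As a quick worked check of that dependence, the snippet below compares the predicted relative amplitude at two (P, T) conditions using signal proportional to P/T^1.5; the numerical values are illustrative, not the tunnel's actual flow conditions.

        # Pure-electrostriction LITA scaling: amplitude proportional to
        # sqrt(P**2 / T**3) = P / T**1.5. Example values are illustrative.
        def lita_relative_amplitude(P_kPa, T_K):
            return P_kPa / T_K ** 1.5

        ref = lita_relative_amplitude(100.0, 300.0)    # reference condition
        low = lita_relative_amplitude(20.0, 150.0)     # colder, lower-pressure condition
        print(f"expected amplitude ratio: {low / ref:.3f}")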

  8. A study of the connection between tidal velocities, soliton packets and acoustic signal losses

    NASA Astrophysics Data System (ADS)

    Chin-Bing, Stanley A.; Warn-Varnas, Alex C.; King, David B.; Lamb, Kevin G.; Hawkins, James A.; Teixeira, Marvi

    2002-11-01

    Coupled ocean model and acoustic model simulations of soliton packets in the Yellow Sea have indicated that the environmental conditions necessary for anomalous signal losses can occur several times in a 24 h period. These conditions and the subsequent signal losses were observed in simulations made over an 80 h space-time evolution of soliton packets that were generated by a 0.7 m/s tidal velocity [Chin-Bing et al., J. Acoust. Soc. Am. 111, 2459 (2002)]. This particular tidal velocity was used to initiate the Lamb soliton model because the soliton packets that were generated compared favorably with SAR measurements of soliton packets in the Yellow Sea. The tidal velocities in this region can range from 0.3 m/s to 1.2 m/s. In this work we extend our simulations and analyses to include soliton packets generated by other tidal velocities in the 0.3-1.2 m/s band. Anomalous signal losses are again observed. Examples will be shown that illustrate the connections between the tidal velocities, the soliton packets that are generated by these tidal velocities, and the signal losses that can occur when acoustic signals are propagated through these soliton packets. [Work supported by ONR/NRL and by a High Performance Computing DoD grant.]

  9. Speech coding

    SciTech Connect

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link. From a transmission point of view, digital transmission has therefore been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques and that is often used interchangeably with speech coding is the term voice coding. This term is more generic in the sense that the
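
    To make the waveform-coding side of that distinction concrete, the sketch below implements mu-law companding, the quantization law used in classical 8-bit digital telephony (G.711). It is only an illustration of coding speech directly as a waveform; the parametric (analysis-based) coders mentioned above work very differently.

        # Mu-law companding (mu = 255), the classic 8-bit waveform-coding step.
        import numpy as np

        MU = 255.0

        def mulaw_encode(x):
            """x in [-1, 1] -> 8-bit code 0..255."""
            y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
            return np.round((y + 1) * 127.5).astype(np.uint8)

        def mulaw_decode(code):
            y = code.astype(float) / 127.5 - 1
            return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

        t = np.arange(8000) / 8000
        x = 0.5 * np.sin(2 * np.pi * 200 * t)              # stand-in for a speech sample
        err = np.max(np.abs(mulaw_decode(mulaw_encode(x)) - x))
        print(f"max reconstruction error: {err:.4f}")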

  10. Speech research: Studies on the nature of speech, instrumentation for its investigation, and practical applications

    NASA Astrophysics Data System (ADS)

    Liberman, A. M.

    1982-03-01

    This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation and practical applications. Manuscripts cover the following topics: Speech perception and memory coding in relation to reading ability; The use of orthographic structure by deaf adults: Recognition of finger-spelled letters; Exploring the information support for speech; The stream of speech; Using the acoustic signal to make inferences about place and duration of tongue-palate contact; Patterns of human interlimb coordination emerge from the properties of nonlinear limit cycle oscillatory processes: Theory and data; Motor control: Which themes do we orchestrate? Exploring the nature of motor control in Down's syndrome; Periodicity and auditory memory: A pilot study; Reading skill and language skill: On the role of sign order and morphological structure in memory for American Sign Language sentences; Perception of nasal consonants with special reference to Catalan; and Speech production characteristics of the hearing impaired.

  11. Listening to the Deep: live monitoring of ocean noise and cetacean acoustic signals.

    PubMed

    André, M; van der Schaar, M; Zaugg, S; Houégnigan, L; Sánchez, A M; Castell, J V

    2011-01-01

    The development and broad use of passive acoustic monitoring techniques have the potential to help assess the large-scale influence of artificial noise on marine organisms and ecosystems. Deep-sea observatories have the potential to play a key role in understanding these recent acoustic changes. LIDO (Listening to the Deep Ocean Environment) is an international project that allows real-time, long-term monitoring of marine ambient noise as well as marine mammal sounds at cabled and standalone observatories. Here, we present the overall development of the project and the use of passive acoustic monitoring (PAM) techniques to provide the scientific community with real-time data at large spatial and temporal scales. Special attention is given to the extraction and identification of high frequency cetacean echolocation signals given the relevance of detecting target species, e.g. beaked whales, in mitigation processes, e.g. during military exercises. PMID:21665016

  12. Ductile Deformation of Dehydrating Serpentinite Evidenced by Acoustic Signal Monitoring

    NASA Astrophysics Data System (ADS)

    Gasc, J.; Hilairet, N.; Wang, Y.; Schubnel, A. J.

    2012-12-01

    Serpentinite dehydration is believed to be responsible for triggering earthquakes at intermediate depths (i.e., 60-300 km) in subduction zones. Based on experimental results, some authors have proposed mechanisms that explain how brittle deformation can occur despite high pressure and temperature conditions [1]. However, reproducing microseismicity in the laboratory associated with the deformation of dehydrating serpentinite remains challenging. A recent study showed that, even for fast dehydration kinetics, ductile deformation could take place rather than brittle faulting in the sample [2]. This latter study was conducted in a multi-anvil apparatus without the ability to control differential stress during dehydration. We have since conducted controlled deformation experiments in the deformation-DIA (D-DIA) on natural serpentinite samples at sector 13 (GSECARS) of the APS. Monochromatic radiation was used with both a 2D MAR-CCD detector and a CCD camera to determine the stress and the strain of the sample during the deformation process [3]. In addition, an Acoustic Emission (AE) recording setup was used to monitor the microseismicity from the sample, using piezo-ceramic transducers glued on the basal truncation of the anvils. The use of six independent transducers allows locating the AEs and calculating the corresponding focal mechanisms. The samples were deformed at strain rates of 10-5-10-4 s-1 under confining pressures of 3-5 GPa. Dehydration was triggered during the deformation by heating the samples at rates ranging from 5 to 60 K/min. Before the onset of the dehydration, X-ray diffraction data showed that the serpentinite sustained ~1 GPa of stress which plummeted when dehydration occurred. Although AEs were recorded during the compression and decompression stages, no AEs ever accompanied this stress drop, suggesting ductile deformation of the samples. Hence, unlike many previous studies, no evidence for fluid embrittlement and anticrack generation was found

  13. Particle Swarm Optimization Based Feature Enhancement and Feature Selection for Improved Emotion Recognition in Speech and Glottal Signals

    PubMed Central

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were utilized to gauge the proposed method. Extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature. PMID:25799141
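
    As a pointer to what one of the listed feature families looks like in practice, the sketch below computes 13 MFCCs per frame with librosa and pools them into an utterance-level vector. The file name, sampling rate, and pooling are assumptions; the PSO-based clustering/selection and the ELM classifier of the paper are not shown.

        # One feature family from the list above: frame-wise MFCCs pooled to an
        # utterance-level vector. Illustrative only; not the paper's pipeline.
        import numpy as np
        import librosa

        y, sr = librosa.load("emotional_utterance.wav", sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)     # (13, n_frames)
        features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
        print("utterance-level feature vector length:", features.size)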

  14. Particle swarm optimization based feature enhancement and feature selection for improved emotion recognition in speech and glottal signals.

    PubMed

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were utilized to gauge the proposed method. Extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature. PMID:25799141

  15. Acoustics

    NASA Astrophysics Data System (ADS)

    The acoustics research activities of the DLR fluid-mechanics department (Forschungsbereich Stroemungsmechanik) during 1988 are surveyed and illustrated with extensive diagrams, drawings, graphs, and photographs. Particular attention is given to studies of helicopter rotor noise (high-speed impulsive noise, blade/vortex interaction noise, and main/tail-rotor interaction noise), propeller noise (temperature, angle-of-attack, and nonuniform-flow effects), noise certification, and industrial acoustics (road-vehicle flow noise and airport noise-control installations).

  16. Tracking the Speech Signal--Time-Locked MEG Signals during Perception of Ultra-Fast and Moderately Fast Speech in Blind and in Sighted Listeners

    ERIC Educational Resources Information Center

    Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

    2013-01-01

    Blind people can learn to understand speech at ultra-high syllable rates (ca. 20 syllables/s), a capability associated with hemodynamic activation of the central-visual system. To further elucidate the neural mechanisms underlying this skill, magnetoencephalographic (MEG) measurements during listening to sentence utterances were cross-correlated…

  17. Acoustic cardiac signals analysis: a Kalman filter–based approach

    PubMed Central

    Salleh, Sheik Hussain; Hussain, Hadrina Sheik; Swee, Tan Tian; Ting, Chee-Ming; Noor, Alias Mohd; Pipatsart, Surasak; Ali, Jalil; Yupapin, Preecha P

    2012-01-01

    Auscultation of the heart is accompanied by both electrical activity and sound. Heart auscultation provides clues to diagnose many cardiac abnormalities. Unfortunately, detection of relevant symptoms and diagnosis based on heart sound through a stethoscope is difficult. The reason GPs find this difficult is that the heart sounds are of short duration and separated from one another by less than 30 ms. In addition, the cost of false positives constitutes wasted time and emotional anxiety for both patient and GP. Many heart diseases cause changes in heart sound, waveform, and additional murmurs before other signs and symptoms appear. Heart-sound auscultation is the primary test conducted by GPs. These sounds are generated primarily by turbulent flow of blood in the heart. Analysis of heart sounds requires a quiet environment with minimum ambient noise. In order to address such issues, the technique of denoising and estimating the biomedical heart signal is proposed in this investigation. The performance of the filter naturally depends on prior information related to the statistical properties of the signal and the background noise. This paper proposes Kalman filtering for statistical denoising of heart sounds. The cycles of heart sounds are modeled as a first-order Gauss–Markov process. These cycles are observed with additive noise in the given measurement. The model is formulated into state-space form to enable use of a Kalman filter to estimate the clean cycles of heart sounds. The estimates obtained by Kalman filtering are optimal in the mean-squared sense. PMID:22745550
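
    The state-space idea described above can be sketched with a scalar Kalman filter for a first-order Gauss–Markov signal observed in additive white noise, as below. The coefficient and noise variances are illustrative assumptions, not values from the paper.

        # Scalar Kalman filter for s[k] = a*s[k-1] + w[k], z[k] = s[k] + v[k].
        # a, q, r are assumed values for illustration.
        import numpy as np

        def kalman_denoise(z, a=0.98, q=1e-4, r=1e-2):
            s_hat, p = 0.0, 1.0                    # state estimate and variance
            out = np.empty(len(z))
            for k, zk in enumerate(z):
                s_pred = a * s_hat                 # predict
                p_pred = a * a * p + q
                gain = p_pred / (p_pred + r)       # update
                s_hat = s_pred + gain * (zk - s_pred)
                p = (1 - gain) * p_pred
                out[k] = s_hat
            return out

        noisy_cycle = np.load("heart_sound_cycle.npy")   # placeholder noisy cycle
        clean_cycle = kalman_denoise(noisy_cycle)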

  18. Early recognition of speech

    PubMed Central

    Remez, Robert E; Thomas, Emily F

    2013-01-01

    Classic research on the perception of speech sought to identify minimal acoustic correlates of each consonant and vowel. In explaining perception, this view designated momentary components of an acoustic spectrum as cues to the recognition of elementary phonemes. This conceptualization of speech perception is untenable given the findings of phonetic sensitivity to modulation independent of the acoustic and auditory form of the carrier. The empirical key is provided by studies of the perceptual organization of speech, a low-level integrative function that finds and follows the sensory effects of speech amid concurrent events. These projects have shown that the perceptual organization of speech is keyed to modulation; is fast, unlearned, and nonsymbolic; is indifferent to short-term auditory properties; and requires attention. The ineluctably multisensory nature of speech perception also imposes conditions that distinguish language among cognitive systems. WIREs Cogn Sci 2013, 4:213–223. doi: 10.1002/wcs.1213 PMID:23926454

  19. Formant-Frequency Variation and Informational Masking of Speech by Extraneous Formants: Evidence Against Dynamic and Speech-Specific Acoustical Constraints

    PubMed Central

    2014-01-01

    How speech is separated perceptually from other speech remains poorly understood. Recent research indicates that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This study explored the effects of manipulating the depth and pattern of that variation. Three formants (F1+F2+F3) constituting synthetic analogues of natural sentences were distributed across the 2 ears, together with a competitor for F2 (F2C) that listeners must reject to optimize recognition (left = F1+F2C; right = F2+F3). The frequency contours of F1 − F3 were each scaled to 50% of their natural depth, with little effect on intelligibility. Competitors were created either by inverting the frequency contour of F2 about its geometric mean (a plausibly speech-like pattern) or using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Adding a competitor typically reduced intelligibility; this reduction depended on the depth of F2C variation, being greatest for 100%-depth, intermediate for 50%-depth, and least for 0%-depth (constant) F2Cs. This suggests that competitor impact depends on overall depth of frequency variation, not depth relative to that for the target formants. The absence of tuning (i.e., no minimum in intelligibility for the 50% case) suggests that the ability to reject an extraneous formant does not depend on similarity in the depth of formant-frequency variation. Furthermore, triangle-wave competitors were as effective as their more speech-like counterparts, suggesting that the selection of formants from the ensemble also does not depend on speech-specific constraints. PMID:24842068

  20. Formant-frequency variation and informational masking of speech by extraneous formants: evidence against dynamic and speech-specific acoustical constraints.

    PubMed

    Roberts, Brian; Summers, Robert J; Bailey, Peter J

    2014-08-01

    How speech is separated perceptually from other speech remains poorly understood. Recent research indicates that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This study explored the effects of manipulating the depth and pattern of that variation. Three formants (F1+F2+F3) constituting synthetic analogues of natural sentences were distributed across the 2 ears, together with a competitor for F2 (F2C) that listeners must reject to optimize recognition (left = F1+F2C; right = F2+F3). The frequency contours of F1 - F3 were each scaled to 50% of their natural depth, with little effect on intelligibility. Competitors were created either by inverting the frequency contour of F2 about its geometric mean (a plausibly speech-like pattern) or using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Adding a competitor typically reduced intelligibility; this reduction depended on the depth of F2C variation, being greatest for 100%-depth, intermediate for 50%-depth, and least for 0%-depth (constant) F2Cs. This suggests that competitor impact depends on overall depth of frequency variation, not depth relative to that for the target formants. The absence of tuning (i.e., no minimum in intelligibility for the 50% case) suggests that the ability to reject an extraneous formant does not depend on similarity in the depth of formant-frequency variation. Furthermore, triangle-wave competitors were as effective as their more speech-like counterparts, suggesting that the selection of formants from the ensemble also does not depend on speech-specific constraints. PMID:24842068

  1. Speech perception as an active cognitive process

    PubMed Central

    Heald, Shannon L. M.; Nusbaum, Howard C.

    2014-01-01

    One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process, which implies rigidity of processing with few demands on cognitive resources. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing, such as masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine cognitive resources recruited during perception including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augmentation or therapy. PMID

  2. Extraction of fault component from abnormal sound in diesel engines using acoustic signals

    NASA Astrophysics Data System (ADS)

    Dayong, Ning; Changle, Sun; Yongjun, Gong; Zengmeng, Zhang; Jiaoyi, Hou

    2016-06-01

    In this paper a method for extracting fault components from abnormal acoustic signals and automatically diagnosing diesel engine faults is presented. The method, named the dislocation superimposed method (DSM), is based on the improved random decrement technique (IRDT), a differential function (DF) and correlation analysis (CA). The aim of DSM is to linearly superpose multiple segments of the abnormal acoustic signal, exploiting the waveform similarity of the fault components. The method uses the sample points at which the abnormal sound first appears as the starting position of each segment. In this study, the abnormal sound was of the shock-fault type; thus, a starting-position search method based on gradient variance was adopted. A similarity coefficient between two equal-length signals is presented. By comparison against this similarity coefficient, the extracted fault component can be judged automatically. The results show that this method is capable of accurately extracting the fault component from abnormal acoustic signals induced by shock-type faults, and the extracted component can be used to identify the fault type.
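
    The superposition-and-similarity core of such a method can be sketched as below: segments are aligned at their detected onsets, averaged to reinforce the repeated fault component, and compared with a known fault template via a normalized correlation coefficient. The onset indices, segment length, and file names are placeholders, and the gradient-variance onset search itself is not reproduced.

        # Align abnormal-sound segments at their onsets, superpose them, and
        # score similarity to a known fault template. Placeholders throughout.
        import numpy as np

        def superpose(signal, onsets, length):
            segments = np.stack([signal[i:i + length] for i in onsets])
            return segments.mean(axis=0)                  # fault-component estimate

        def similarity(x, y):
            x = x - x.mean()
            y = y - y.mean()
            return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

        acoustic = np.load("diesel_acoustic.npy")
        onsets = [12000, 48500, 85200]                    # placeholder onset samples
        length = 2048
        component = superpose(acoustic, onsets, length)
        template = np.load("known_shock_fault.npy")[:length]
        print(f"similarity to known fault: {similarity(component, template):.2f}")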

  3. Signal characteristics of an underwater explosive acoustic telemetry system

    SciTech Connect

    Calloway, T.M.

    1984-07-01

    Pressure pulses from small (<1 gm) explosive charges detonated between depths of 150 and 1200 m and detected by hydrophones submerged at a depth of 45 m are analyzed. Experimental data on peak pressures, time constants, and shock-wave/bubble-pulse intervals are summarized. The mass of each explosive is converted to its TNT energy-equivalent mass, which is used in fitting semiempirical scaling laws to the data. Equations are obtained for predicting the characteristics of the signal, given the range and depth of the explosive together with its TNT energy-equivalent mass. The parameter values that provide the best fit of the scaling laws to the experimental data are compared with those values applicable to larger explosives (>50 gm) detonated within 150 m of the surface. 20 references, 7 figures, 8 tables.

  4. Ultrasonic speech translator and communications system

    DOEpatents

    Akerman, M. Alfred; Ayers, Curtis W.; Haynes, Howard D.

    1996-01-01

    A wireless communication system undetectable by radio frequency methods for converting audio signals, including human voice, to electronic signals in the ultrasonic frequency range, transmitting the ultrasonic signal by way of acoustical pressure waves across a carrier medium, including gases, liquids, or solids, and reconverting the ultrasonic acoustical pressure waves back to the original audio signal. The ultrasonic speech translator and communication system (20) includes an ultrasonic transmitting device (100) and an ultrasonic receiving device (200). The ultrasonic transmitting device (100) accepts as input (115) an audio signal such as human voice input from a microphone (114) or tape deck. The ultrasonic transmitting device (100) frequency modulates an ultrasonic carrier signal with the audio signal producing a frequency modulated ultrasonic carrier signal, which is transmitted via acoustical pressure waves across a carrier medium such as gases, liquids or solids. The ultrasonic receiving device (200) converts the frequency modulated ultrasonic acoustical pressure waves to a frequency modulated electronic signal, demodulates the audio signal from the ultrasonic carrier signal, and conditions the demodulated audio signal to reproduce the original audio signal at its output (250).
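
    The modulation/demodulation chain described here can be illustrated in software, as in the hedged numpy sketch below: audio frequency-modulates an ultrasonic carrier, and the receiver recovers it from the instantaneous frequency of the analytic signal. The carrier frequency, deviation, and test tone are assumptions, and the sketch says nothing about the patented hardware implementation.

        # Software illustration of FM onto an ultrasonic carrier and recovery
        # via instantaneous frequency. Carrier, deviation, and tone are assumed.
        import numpy as np
        from scipy.signal import butter, filtfilt, hilbert

        fs = 192_000
        t = np.arange(fs) / fs                              # 1 s
        audio = 0.5 * np.sin(2 * np.pi * 440 * t)           # stand-in for voice

        fc, dev = 40_000, 4_000                              # 40-kHz carrier, 4-kHz deviation
        phase = 2 * np.pi * (fc * t + dev * np.cumsum(audio) / fs)
        tx = np.cos(phase)                                   # transmitted ultrasonic signal

        inst_phase = np.unwrap(np.angle(hilbert(tx)))
        inst_freq = np.diff(inst_phase) * fs / (2 * np.pi)
        recovered = (inst_freq - fc) / dev                   # undo the modulation law
        b, a = butter(4, 4000 / (fs / 2))
        recovered = filtfilt(b, a, recovered)                # keep the audio band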

  5. Ultrasonic speech translator and communications system

    DOEpatents

    Akerman, M.A.; Ayers, C.W.; Haynes, H.D.

    1996-07-23

    A wireless communication system undetectable by radio frequency methods for converting audio signals, including human voice, to electronic signals in the ultrasonic frequency range, transmitting the ultrasonic signal by way of acoustical pressure waves across a carrier medium, including gases, liquids, or solids, and reconverting the ultrasonic acoustical pressure waves back to the original audio signal. The ultrasonic speech translator and communication system includes an ultrasonic transmitting device and an ultrasonic receiving device. The ultrasonic transmitting device accepts as input an audio signal such as human voice input from a microphone or tape deck. The ultrasonic transmitting device frequency modulates an ultrasonic carrier signal with the audio signal producing a frequency modulated ultrasonic carrier signal, which is transmitted via acoustical pressure waves across a carrier medium such as gases, liquids or solids. The ultrasonic receiving device converts the frequency modulated ultrasonic acoustical pressure waves to a frequency modulated electronic signal, demodulates the audio signal from the ultrasonic carrier signal, and conditions the demodulated audio signal to reproduce the original audio signal at its output. 7 figs.

  6. Near- Source, Seismo-Acoustic Signals Accompanying a NASCAR Race at the Texas Motor Speedway

    NASA Astrophysics Data System (ADS)

    Stump, B. W.; Hayward, C.; Underwood, R.; Howard, J. E.; MacPhail, M. D.; Golden, P.; Endress, A.

    2014-12-01

    Near-source, seismo-acoustic observations provide a unique opportunity to characterize urban sources, remotely sense human activities including vehicular traffic and monitor large engineering structures. Energy separately coupled into the solid earth and atmosphere provides constraints on not only the location of these sources but also the physics of the generating process. Conditions and distances at which these observations can be made are dependent upon not only local geological conditions but also atmospheric conditions at the time of the observations. In order to address this range of topics, an empirical, seismo-acoustic study was undertaken in and around the Texas Motor Speedway in the Dallas-Ft. Worth area during the first week of April 2014 at which time a range of activities associated with a series of NASCAR races occurred. Nine seismic sensors were deployed around the 1.5-mile track for purposes of documenting the direct-coupled seismic energy from the passage of the cars and other vehicles on the track. Six infrasound sensors were deployed on a rooftop in a rectangular array configuration designed to provide high frequency beam forming for acoustic signals. Finally, a five-element infrasound array was deployed outside the track in order to characterize how the signals propagate away from the sources in the near-source region. Signals recovered from within the track made it possible to track and characterize the motion of a variety of vehicles during the race weekend, including individual racecars. Seismic data sampled at 1000 sps documented strong Doppler effects as the cars approached and moved away from individual sensors. There were faint seismic signals that arrived at seismic velocities, but local acoustic-to-seismic coupling, as supported by the acoustic observations, generated the majority of the seismic signals. Actual seismic ground motions were small as demonstrated by the dominance of regional seismic signals from a magnitude 4.0 earthquake that arrived at

  7. Seismo-acoustic signals associated with degassing explosions recorded at Shishaldin Volcano, Alaska, 2003-2004

    NASA Astrophysics Data System (ADS)

    Petersen, Tanja; McNutt, Stephen R.

    2007-03-01

    In summer 2003, a Chaparral Model 2 microphone was deployed at Shishaldin Volcano, Aleutian Islands, Alaska. The pressure sensor was co-located with a short-period seismometer on the volcano’s north flank at a distance of 6.62 km from the active summit vent. The seismo-acoustic data exhibit a correlation between impulsive acoustic signals (1-2 Pa) and long-period (LP, 1-2 Hz) earthquakes. Since it last erupted in 1999, Shishaldin has been characterized by sustained seismicity consisting of many hundreds to two thousand LP events per day. The activity is accompanied by up to ~200 m high discrete gas puffs exiting the small summit vent, but no significant eruptive activity has been confirmed. The acoustic waveforms possess similarity throughout the data set (July 2003-November 2004) indicating a repetitive source mechanism. The simplicity of the acoustic waveforms, the impulsive onsets with relatively short (~10-20 s) gradually decaying codas and the waveform similarities suggest that the acoustic pulses are generated at the fluid-air interface within an open-vent system. SO2 measurements have revealed a low SO2 flux, suggesting a hydrothermal system with magmatic gases leaking through. This hypothesis is supported by the steady-state nature of Shishaldin’s volcanic system since 1999. Time delays between the seismic LP and infrasound onsets were acquired from a representative day of seismo-acoustic data. A simple model was used to estimate source depths. The short seismo-acoustic delay times have revealed that the seismic and acoustic sources are co-located at a depth of 240±200 m below the crater rim. This shallow depth is confirmed by resonance of the upper portion of the open conduit, which produces standing waves with f=0.3 Hz in the acoustic waveform codas. The infrasound data have allowed us to relate Shishaldin’s LP earthquakes to degassing explosions, created by gas volume ruptures from a fluid-air interface.

  8. Design Foundations for Content-Rich Acoustic Interfaces: Investigating Audemes as Referential Non-Speech Audio Cues

    ERIC Educational Resources Information Center

    Ferati, Mexhid Adem

    2012-01-01

    To access interactive systems, blind and visually impaired users can leverage their auditory senses by using non-speech sounds. The current structure of non-speech sounds, however, is geared toward conveying user interface operations (e.g., opening a file) rather than large theme-based information (e.g., a history passage) and, thus, is ill-suited…

  9. Punch stretching process monitoring using acoustic emission signal analysis. II - Application of frequency domain deconvolution

    NASA Technical Reports Server (NTRS)

    Liang, Steven Y.; Dornfeld, David A.; Nickerson, Jackson A.

    1987-01-01

    The coloring effect on the acoustic emission signal due to the frequency response of the data acquisition/processing instrumentation may bias the interpretation of AE signal characteristics. In this paper, a frequency domain deconvolution technique, which involves the identification of the instrumentation transfer functions and multiplication of the AE signal spectrum by the inverse of these system functions, has been carried out. In this way, the change in AE signal characteristics can be better interpreted as the result of changes in the states of the process alone. The punch stretching process was used as an example to demonstrate the application of the technique. Results showed that, through the deconvolution, the frequency characteristics of AE signals generated during stretching became more distinctive and can be more effectively used as tools for process monitoring.
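
    The deconvolution step described above can be sketched in a few lines: divide the AE signal spectrum by the measured instrumentation transfer function, with a small regularization term to avoid amplifying bands where the instrument response is near zero. The transfer-function array and the regularization constant here are placeholders, not values from the paper.

        import numpy as np

        def deconvolve_ae(ae_signal, instrument_tf, eps=1e-3):
            """Remove the instrumentation coloring from an AE record in the frequency domain.

            ae_signal     : recorded AE waveform (1-D array)
            instrument_tf : complex frequency response of the acquisition chain, sampled on
                            the same rFFT grid as ae_signal (assumed known from calibration)
            eps           : regularization level relative to the peak |H(f)| (assumed value)
            """
            spectrum = np.fft.rfft(ae_signal)
            H = np.asarray(instrument_tf, dtype=complex)
            floor = eps * np.max(np.abs(H))
            # multiply by the (regularized) inverse of the system response
            corrected = spectrum * np.conj(H) / (np.abs(H) ** 2 + floor ** 2)
            return np.fft.irfft(corrected, n=len(ae_signal))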

  10. Problems Associated with Statistical Pattern Recognition of Acoustic Emission Signals in a Compact Tension Fatigue Specimen

    NASA Technical Reports Server (NTRS)

    Hinton, Yolanda L.

    1999-01-01

    Acoustic emission (AE) data were acquired during fatigue testing of an aluminum 2024-T4 compact tension specimen using a commercially available AE system. AE signals from crack extension were identified and separated from noise spikes, signals that reflected from the specimen edges, and signals that saturated the instrumentation. A commercially available software package was used to train a statistical pattern recognition system to classify the signals. The software trained a network to recognize signals with a 91-percent accuracy when compared with the researcher's interpretation of the data. Reasons for the discrepancies are examined and it is postulated that additional preprocessing of the AE data to focus on the extensional wave mode and eliminate other effects before training the pattern recognition system will result in increased accuracy.

  11. Non-invasive estimation of static and pulsatile intracranial pressure from transcranial acoustic signals.

    PubMed

    Levinsky, Alexandra; Papyan, Surik; Weinberg, Guy; Stadheim, Trond; Eide, Per Kristian

    2016-05-01

    The aim of the present study was to examine whether a method for estimation of non-invasive ICP (nICP) from transcranial acoustic (TCA) signals mixed with head-generated sounds estimates the static and pulsatile invasive ICP (iICP). For that purpose, simultaneous iICP and mixed TCA signals were obtained from patients undergoing continuous iICP monitoring as part of clinical management. The ear probe placed in the right outer ear channel sent a TCA signal with fixed frequency (621 Hz) that was picked up by the left ear probe along with acoustic signals generated by the intracranial compartment. Based on a mathematical model of the association between mixed TCA and iICP, the static and pulsatile nICP values were determined. In total, 39 patients were included in the study; the total numbers of observations for prediction of static and pulsatile iICP were 5789 and 6791, respectively. The results demonstrated good agreement between iICP/nICP observations, with mean differences of 0.39 mmHg and 0.53 mmHg for static and pulsatile ICP, respectively. In summary, in this cohort of patients, mixed TCA signals estimated the static and pulsatile iICP with rather good accuracy. Further studies are required to validate whether mixed TCA signals may become useful for measurement of nICP. PMID:26997563

  12. Dual fiber Bragg gratings configuration-based fiber acoustic sensor for low-frequency signal detection

    NASA Astrophysics Data System (ADS)

    Yang, Dong; Wang, Shun; Lu, Ping; Liu, Deming

    2014-11-01

    We propose and fabricate a new type of fiber acoustic sensor based on a dual fiber Bragg grating (FBG) configuration. The acoustic sensor head is constructed by enclosing the sensing cells in an aluminum cylinder space built by two C-band FBGs and a titanium diaphragm of 50 μm thickness. One end of each FBG is longitudinally adhered to the diaphragm by UV glue. Both of the FBGs are employed for reflecting light. The dual FBGs play roles not only as a signal transmission system but also as a sensing component, and they demodulate each other's optical signal mutually during the measurement. Both FBGs are pre-strained, and the output optical power fluctuates in a linear relationship with variations in axial strain and surrounding acoustic interference. Thus a precise approach to measuring the frequency and sound pressure of the acoustic disturbance is achieved. Experiments are performed and results show that a relatively flat frequency response in a range from 200 Hz to 1 kHz with an average signal-to-noise ratio (SNR) above 21 dB is obtained. A maximum sound pressure sensitivity of 11.35 mV/Pa is achieved with an R-squared value of 0.99131 when the sound pressure is in the range of 87.7-106.6 dB. It has potential applications in low-frequency signal detection. Owing to its direct self-demodulation method, the sensing system offers the advantages of easy demodulation, good temperature stability and measurement reliability. Besides, the performance of the proposed sensor could be improved by optimizing the parameters of the sensor, especially the diaphragm.

  13. Circuit for echo and noise suppression of acoustic signals transmitted through a drill string

    DOEpatents

    Drumheller, D.S.; Scott, D.D.

    1993-12-28

    An electronic circuit for digitally processing analog electrical signals produced by at least one acoustic transducer is presented. In a preferred embodiment of the present invention, a novel digital time delay circuit is utilized which employs an array of first-in-first-out (FIFO) microchips. Also, a bandpass filter is used at the input to this circuit for isolating drill string noise and eliminating high frequency output. 20 figures.

  14. Signal Processing Methods for Removing the Effects of Whole Body Vibration upon Speech

    NASA Technical Reports Server (NTRS)

    Bitner, Rachel M.; Begault, Durand R.

    2014-01-01

    Humans may be exposed to whole-body vibration in environments where clear speech communications are crucial, particularly during the launch phases of space flight and in high-performance aircraft. Prior research has shown that high levels of vibration cause a decrease in speech intelligibility. However, the effects of whole-body vibration upon speech are not well understood, and no attempt has been made to restore speech distorted by whole-body vibration. In this paper, a model for speech under whole-body vibration is proposed and a method to remove its effect is described. The method described reduces the perceptual effects of vibration, yields higher automatic speech recognition (ASR) accuracy scores, and may significantly improve intelligibility. Possible applications include incorporation within communication systems to improve radio-communication systems in environments such as spaceflight, aviation, or off-road vehicle operations.

  15. The Acoustic Structure and Information Content of Female Koala Vocal Signals

    PubMed Central

    Charlton, Benjamin D.

    2015-01-01

    Determining the information content of animal vocalisations can give valuable insights into the potential functions of vocal signals. The source-filter theory of vocal production allows researchers to examine the information content of mammal vocalisations by linking variation in acoustic features with variation in relevant physical characteristics of the caller. Here I used a source-filter theory approach to classify female koala vocalisations into different call-types, and determine which acoustic features have the potential to convey important information about the caller to other conspecifics. A two-step cluster analysis classified female calls into bellows, snarls and tonal rejection calls. Additional results revealed that female koala vocalisations differed in their potential to provide information about a given caller’s phenotype that may be of importance to receivers. Female snarls did not contain reliable acoustic cues to the caller’s identity and age. In contrast, female bellows and tonal rejection calls were individually distinctive, and the tonal rejection calls of older female koalas had consistently lower mean, minimum and maximum fundamental frequency. In addition, female bellows were significantly shorter in duration and had higher fundamental frequency, formant frequencies, and formant frequency spacing than male bellows. These results indicate that female koala vocalisations have the potential to signal the caller’s identity, age and sex. I go on to discuss the anatomical basis for these findings, and consider the possible functional relevance of signalling this type of information in the koala’s natural habitat. PMID:26465340

  16. Gearbox fault diagnosis based on deep random forest fusion of acoustic and vibratory signals

    NASA Astrophysics Data System (ADS)

    Li, Chuan; Sanchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego; Vásquez, Rafael E.

    2016-08-01

    Fault diagnosis is an effective tool to guarantee safe operations in gearboxes. Acoustic and vibratory measurements in such mechanical devices are both sensitive to the existence of faults. This work addresses the use of a deep random forest fusion (DRFF) technique to improve fault diagnosis performance for gearboxes by using measurements from an acoustic emission (AE) sensor and an accelerometer that monitor the gearbox condition simultaneously. The statistical parameters of the wavelet packet transform (WPT) are first produced from the AE signal and the vibratory signal, respectively. Two deep Boltzmann machines (DBMs) are then developed for deep representations of the WPT statistical parameters. A random forest is finally suggested to fuse the outputs of the two DBMs as the integrated DRFF model. The proposed DRFF technique is evaluated using gearbox fault diagnosis experiments under different operational conditions, and achieves a classification rate of 97.68% for 11 different condition patterns. Compared to other peer algorithms, the proposed method exhibits the best performance. The results indicate that the deep learning fusion of acoustic and vibratory signals may improve fault diagnosis capabilities for gearboxes.

  17. The Acoustic Structure and Information Content of Female Koala Vocal Signals.

    PubMed

    Charlton, Benjamin D

    2015-01-01

    Determining the information content of animal vocalisations can give valuable insights into the potential functions of vocal signals. The source-filter theory of vocal production allows researchers to examine the information content of mammal vocalisations by linking variation in acoustic features with variation in relevant physical characteristics of the caller. Here I used a source-filter theory approach to classify female koala vocalisations into different call-types, and determine which acoustic features have the potential to convey important information about the caller to other conspecifics. A two-step cluster analysis classified female calls into bellows, snarls and tonal rejection calls. Additional results revealed that female koala vocalisations differed in their potential to provide information about a given caller's phenotype that may be of importance to receivers. Female snarls did not contain reliable acoustic cues to the caller's identity and age. In contrast, female bellows and tonal rejection calls were individually distinctive, and the tonal rejection calls of older female koalas had consistently lower mean, minimum and maximum fundamental frequency. In addition, female bellows were significantly shorter in duration and had higher fundamental frequency, formant frequencies, and formant frequency spacing than male bellows. These results indicate that female koala vocalisations have the potential to signal the caller's identity, age and sex. I go on to discuss the anatomical basis for these findings, and consider the possible functional relevance of signalling this type of information in the koala's natural habitat. PMID:26465340

  18. Moisture estimation in power transformer oil using acoustic signals and spectral kurtosis

    NASA Astrophysics Data System (ADS)

    Leite, Valéria C. M. N.; Veloso, Giscard F. C.; Borges da Silva, Luiz Eduardo; Lambert-Torres, Germano; Borges da Silva, Jonas G.; Onofre Pereira Pinto, João

    2016-03-01

    The aim of this paper is to present a new technique for estimating the contamination by moisture in power transformer insulating oil based on the spectral kurtosis analysis of the acoustic signals of partial discharges (PDs). Basically, in this approach, the spectral kurtosis of the PD acoustic signal is calculated and the correlation between its maximum value and the moisture percentage is explored to find a function that calculates the moisture percentage. The function can be easily implemented in a DSP, an FPGA, or any other type of embedded system for online moisture monitoring. To evaluate the proposed approach, an experiment is assembled with a piezoelectric sensor attached to a tank, which is filled with insulating oil samples contaminated by different levels of moisture. A device generating electrical discharges is submerged into the oil to simulate the occurrence of PDs. Detected acoustic signals are processed using the fast kurtogram algorithm to extract spectral kurtosis values. The obtained data are used to find the fitting function that relates the water contamination to the maximum value of the spectral kurtosis. Experimental results show that the proposed method is suitable for an online monitoring system for power transformers.
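
    A hedged sketch of the core signal-processing step: estimate the spectral kurtosis of a PD acoustic record from its short-time Fourier transform and take the maximum value, which is then mapped to a moisture percentage through a fitted calibration function. The STFT parameters and the linear placeholder calibration below are assumptions for illustration, not the authors' fast-kurtogram implementation.

        import numpy as np
        from scipy.signal import stft

        def max_spectral_kurtosis(x, fs, nperseg=256):
            """Maximum spectral kurtosis of a partial-discharge acoustic record."""
            f, t, Z = stft(x, fs=fs, nperseg=nperseg)
            mag2 = np.abs(Z) ** 2
            # spectral kurtosis per frequency bin: <|X|^4> / <|X|^2>^2 - 2
            sk = np.mean(mag2 ** 2, axis=1) / (np.mean(mag2, axis=1) ** 2 + 1e-30) - 2.0
            return f[np.argmax(sk)], np.max(sk)

        def moisture_from_sk(sk_max, coeffs=(0.0, 1.0)):
            """Map the maximum spectral kurtosis to a moisture percentage.

            `coeffs` stand in for the calibration coefficients fitted to the experimental
            data; the linear placeholder here is purely illustrative."""
            return np.polyval(coeffs[::-1], sk_max)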

  19. Acoustic effects of the ATOC signal (75 Hz, 195 dB) on dolphins and whales

    SciTech Connect

    Au, W.W.; Nachtigall, P.E.; Pawloski, J.L.

    1997-05-01

    The Acoustic Thermometry of Ocean Climate (ATOC) program of Scripps Institution of Oceanography and the Applied Physics Laboratory, University of Washington, will broadcast a low-frequency 75-Hz phase modulated acoustic signal over ocean basins in order to study ocean temperatures on a global scale and examine the effects of global warming. One of the major concerns is the possible effect of the ATOC signal on marine life, especially on dolphins and whales. In order to address this issue, the hearing sensitivity of a false killer whale (Pseudorca crassidens) and a Risso's dolphin (Grampus griseus) to the ATOC sound was measured behaviorally. A staircase procedure with the signal levels being changed in 1-dB steps was used to measure the animals' threshold to the actual ATOC coded signal. The results indicate that small odontocetes such as the Pseudorca and Grampus swimming directly above the ATOC source will not hear the signal unless they dive to a depth of approximately 400 m. A sound propagation analysis suggests that the sound-pressure level at ranges greater than 0.5 km will be less than 130 dB for depths down to about 500 m. Several species of baleen whales produce sounds much greater than 170-180 dB. With the ATOC source on the axis of the deep sound channel (greater than 800 m), the ATOC signal will probably have minimal physical and physiological effects on cetaceans. © 1997 Acoustical Society of America.

  20. "Perception of the speech code" revisited: Speech is alphabetic after all.

    PubMed

    Fowler, Carol A; Shankweiler, Donald; Studdert-Kennedy, Michael

    2016-03-01

    We revisit an article, "Perception of the Speech Code" (PSC), published in this journal 50 years ago (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967) and address one of its legacies concerning the status of phonetic segments, which persists in theories of speech today. In the perspective of PSC, segments both exist (in language as known) and do not exist (in articulation or the acoustic speech signal). Findings interpreted as showing that speech is not a sound alphabet, but, rather, phonemes are encoded in the signal, coupled with findings that listeners perceive articulation, led to the motor theory of speech perception, a highly controversial legacy of PSC. However, a second legacy, the paradoxical perspective on segments has been mostly unquestioned. We remove the paradox by offering an alternative supported by converging evidence that segments exist in language both as known and as used. We support the existence of segments in both language knowledge and in production by showing that phonetic segments are articulatory and dynamic and that coarticulation does not eliminate them. We show that segments leave an acoustic signature that listeners can track. This suggests that speech is well-adapted to public communication in facilitating, not creating a barrier to, exchange of language forms. PMID:26301536

  1. Estimation of the Fundamental Frequency of the Speech Signal Compressed by G.723.1 Algorithm Applying PCC Interpolation

    NASA Astrophysics Data System (ADS)

    Milivojević, Zoran N.; Brodić, Darko

    2011-07-01

    In this paper the results of the estimation of the fundamental frequency of the speech signal modeled by the G.723.1 method are analyzed. The estimation of the fundamental frequency was performed by the Picking Peaks algorithm with the implemented Parametric Cubic Convolution (PCC) interpolation. The efficiency of PCC was tested for the Keys, Greville and Greville two-parametric kernels. Based on the MSE, a window that gives optimal results was chosen.
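
    The abstract above concerns refining a coarse spectral-peak F0 estimate with sub-bin interpolation. The sketch below uses simple parabolic interpolation around the strongest spectral peak as a stand-in for the paper's PCC kernels; the window, search band, and frame handling are assumptions.

        import numpy as np

        def estimate_f0(frame, fs, fmin=60.0, fmax=400.0):
            """Fundamental-frequency estimate from a speech frame by spectral peak picking
            with parabolic interpolation (a simpler stand-in for PCC interpolation)."""
            windowed = frame * np.hanning(len(frame))
            spec = np.abs(np.fft.rfft(windowed))
            freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)

            band = (freqs >= fmin) & (freqs <= fmax)      # restrict search to a plausible F0 range
            k = np.argmax(np.where(band, spec, 0.0))      # coarse peak bin

            # parabolic interpolation over the peak and its two neighbours
            a, b, c = spec[k - 1], spec[k], spec[k + 1]
            delta = 0.5 * (a - c) / (a - 2 * b + c)       # sub-bin offset in [-0.5, 0.5]
            return (k + delta) * fs / len(frame)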

  2. Response of acoustic signals generated in water by energetic xenon ions

    NASA Astrophysics Data System (ADS)

    Miyachi, T.; Nakamura, Y.; Kuraza, G.; Fujii, M.; Nagashima, A.; Hasebe, N.; Kobayashi, M. N.; Kobayashi, S.; Miyajima, M.; Okudaira, O.; Yamashita, N.; Shibata, H.; Murakami, T.; Uchihori, Y.; Okada, N.; Tou, T.

    2006-05-01

    The acoustic signals generated by bombarding water with 400 MeV/n xenon ions were studied using an array of piezoelectric lead-zirconate-titanate elements. The observed signal was reduced to a bipolar form through Fourier analysis. The output voltage corresponded to the amount of energy deposited in water, and it tailed off beyond the range of 400 MeV/n xenon in water. This magnitude was explained qualitatively as a cumulative process. Its behavior was consistent with calculations based on the Bethe-Bloch formula. Possible applications of this detector to radiology and heavily doped radiation detectors are described.

  3. Quadratic Time-Frequency Analysis of Hydroacoustic Signals as Applied to Acoustic Emissions of Large Whales

    NASA Astrophysics Data System (ADS)

    Le Bras, Ronan; Victor, Sucic; Damir, Malnar; Götz, Bokelmann

    2014-05-01

    In order to enrich the set of attributes in setting up a large database of whale signals, as envisioned in the Baleakanta project, we investigate methods of time-frequency analysis. The purpose of establishing the database is to increase and refine knowledge of the emitted signal and of its propagation characteristics, leading to a better understanding of the animal migrations in a non-invasive manner and to characterize acoustic propagation in oceanic media. The higher resolution for signal extraction and a better separation from other signals and noise will be used for various purposes, including improved signal detection and individual animal identification. The quadratic class of time-frequency distributions (TFDs) is the most popular set of time-frequency tools for analysis and processing of non-stationary signals. The two best known and most studied members of this class are the spectrogram and the Wigner-Ville distribution. However, to be used efficiently, i.e. to have highly concentrated signal components while significantly suppressing interference and noise simultaneously, TFDs need to be optimized first. The optimization method used in this paper is based on the Cross-Wigner-Ville distribution, and unlike similar approaches it does not require prior information on the analysed signal. The method is applied to whale signals, which, just like the majority of other real-life signals, can generally be classified as multicomponent non-stationary signals, and hence time-frequency techniques are a natural choice for their representation, analysis, and processing. We present processed data from a set containing hundreds of individual calls. The TFD optimization method results in a high-resolution time-frequency representation of the signals. It allows for a simple extraction of signal components from the TFD's dominant ridges. The local peaks of those ridges can then be used for the signal components' instantaneous frequency estimation, which in turn can be used as
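
    For reference, a compact discrete Wigner-Ville distribution (one of the two quadratic TFDs named above) can be computed from the analytic signal as sketched below. This plain WVD is the unoptimized starting point only, not the Cross-Wigner-Ville optimization procedure the authors apply; the frequency-axis convention noted in the comments is an assumption of this sketch.

        import numpy as np
        from scipy.signal import hilbert

        def wigner_ville(x):
            """Discrete Wigner-Ville distribution of a real 1-D signal.

            Returns an (N, N) array: rows index frequency bins spanning roughly 0 to fs/2,
            columns index time samples."""
            z = hilbert(np.asarray(x, dtype=float))   # analytic signal reduces cross-terms with negative frequencies
            N = len(z)
            W = np.zeros((N, N))
            for n in range(N):
                tau_max = min(n, N - 1 - n)
                tau = np.arange(-tau_max, tau_max + 1)
                kernel = np.zeros(N, dtype=complex)
                # instantaneous autocorrelation at time n, wrapped into an FFT buffer
                kernel[tau % N] = z[n + tau] * np.conj(z[n - tau])
                W[:, n] = np.fft.fft(kernel).real
            return W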

  4. Acoustics

    NASA Technical Reports Server (NTRS)

    Goodman, Jerry R.; Grosveld, Ferdinand

    2007-01-01

    The acoustics environment in space operations is important to maintain at manageable levels so that the crewperson can remain safe, functional, effective, and reasonably comfortable. High acoustic levels can produce temporary or permanent hearing loss, or cause other physiological symptoms such as auditory pain, headaches, discomfort, strain in the vocal cords, or fatigue. Noise is defined as undesirable sound. Excessive noise may result in psychological effects such as irritability, inability to concentrate, decrease in productivity, annoyance, errors in judgment, and distraction. A noisy environment can also result in the inability to sleep, or sleep well. Elevated noise levels can affect the ability to communicate, understand what is being said, hear what is going on in the environment, degrade crew performance and operations, and create habitability concerns. Superfluous noise emissions can also create the inability to hear alarms or other important auditory cues such as an equipment malfunction. Recent space flight experience, evaluations of the requirements in crew habitable areas, and lessons learned (Goodman 2003; Allen and Goodman 2003; Pilkinton 2003; Grosveld et al. 2003) show the importance of maintaining an acceptable acoustics environment. This is best accomplished by having a high-quality set of limits/requirements early in the program, the "designing in" of acoustics in the development of hardware and systems, and by monitoring, testing and verifying the levels to ensure that they are acceptable.

  5. Behavioral assessment of acoustic parameters relevant to signal recognition and preference in a vocal fish.

    PubMed

    McKibben, J R; Bass, A H

    1998-12-01

    Acoustic signal recognition depends on the receiver's processing of the physical attributes of a sound. This study takes advantage of the simple communication sounds produced by plainfin midshipman fish to examine effects of signal variation on call recognition and preference. Nesting male midshipman generate both long duration (> 1 min) sinusoidal-like "hums" and short duration "grunts." The hums of neighboring males often overlap, creating beat waveforms. Presentation of humlike, single tone stimuli, but not grunts or noise, elicited robust attraction (phonotaxis) by gravid females. In two-choice tests, females differentiated and chose between acoustic signals that differed in duration, frequency, amplitude, and fine temporal content. Frequency preferences were temperature dependent, in accord with the known temperature dependence of hum fundamental frequency. Concurrent hums were simulated with two-tone beat stimuli, either presented from a single speaker or produced more naturally by interference between adjacent sources. Whereas certain single-source beats reduced stimulus attractiveness, beats which resolved into unmodulated tones at their sources did not affect preference. These results demonstrate that phonotactic assessment of stimulus relevance can be applied in a teleost fish, and that multiple signal parameters can affect receiver response in a vertebrate with relatively simple communication signals. PMID:9857511

  6. Brain-Computer Interfaces for Speech Communication

    PubMed Central

    Brumberg, Jonathan S.; Nieto-Castanon, Alfonso; Kennedy, Philip R.; Guenther, Frank H.

    2010-01-01

    This paper briefly reviews current silent speech methodologies for normal and disabled individuals. Current techniques utilizing electromyographic (EMG) recordings of vocal tract movements are useful for physically healthy individuals but fail for tetraplegic individuals who do not have accurate voluntary control over the speech articulators. Alternative methods utilizing EMG from other body parts (e.g., hand, arm, or facial muscles) or electroencephalography (EEG) can provide capable silent communication to severely paralyzed users, though current interfaces are extremely slow relative to normal conversation rates and require constant attention to a computer screen that provides visual feedback and/or cueing. We present a novel approach to the problem of silent speech via an intracortical microelectrode brain computer interface (BCI) to predict intended speech information directly from the activity of neurons involved in speech production. The predicted speech is synthesized and acoustically fed back to the user with a delay under 50 ms. We demonstrate that the Neurotrophic Electrode used in the BCI is capable of providing useful neural recordings for over 4 years, a necessary property for BCIs that need to remain viable over the lifespan of the user. Other design considerations include neural decoding techniques based on previous research involving BCIs for computer cursor or robotic arm control via prediction of intended movement kinematics from motor cortical signals in monkeys and humans. Initial results from a study of continuous speech production with instantaneous acoustic feedback show the BCI user was able to improve his control over an artificial speech synthesizer both within and across recording sessions. The success of this initial trial validates the potential of the intracortical microelectrode-based approach for providing a speech prosthesis that can allow much more rapid communication rates. PMID:20204164

  7. Seismo-acoustic Signals Recorded at KSIAR, the Infrasound Array Installed at PS31

    NASA Astrophysics Data System (ADS)

    Kim, T. S.; Che, I. Y.; Jeon, J. S.; Chi, H. C.; Kang, I. B.

    2014-12-01

    One of the International Monitoring System (IMS)'s primary seismic stations, PS31, called the Korea Seismic Research Station (KSRS), was installed around Wonju, Korea in the 1970s. It has been operated by the US Air Force Technical Applications Center (AFTAC) for more than 40 years. KSRS is composed of 26 seismic sensors including 19 short-period, 6 long-period and 1 broadband seismometer. The 19 short-period sensors were used to build an array with a 10-km aperture, while the 6 long-period sensors were used for a relatively long-period array with a 40-km aperture. After KSRS was certified as an IMS station in 2006 by the Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO), the Korea Institute of Geoscience and Mineral Resources (KIGAM), which is the Korea National Data Center, started to take over responsibility for the operation and maintenance of KSRS from AFTAC. In April of 2014, KIGAM installed an infrasound array, KSIAR, at the existing four short-period seismic stations of KSRS, the sites KS05, KS06, KS07 and KS16. The collocated KSIAR changed KSRS from a seismic array into a seismo-acoustic array. The aperture of KSIAR is 3.3 km. KSIAR also has a 100-m small-aperture infrasound array at KS07. The infrasound data from KSIAR, except that from the site KS06, are being transmitted in real time to KIGAM over a VPN and internet line. An initial analysis of seismo-acoustic signals originating from local and regional distance ranges has been performed since May 2014. The analysis, using an array process called Progressive Multi-Channel Correlation (PMCC), detected seismo-acoustic signals caused by various sources including small explosions related to the construction of local tunnels and roads. Some of them were not found in the automatic bulletin of KIGAM. The seismo-acoustic signals recorded by KSIAR are supplying useful information for discriminating local and regional man-made events from natural events.

  8. Achieving Electric-Acoustic Benefit with a Modulated Tone

    PubMed Central

    Brown, Christopher A.; Bacon, Sid P.

    2013-01-01

    Objective When either real or simulated electric stimulation from a cochlear implant (CI) is combined with low-frequency acoustic stimulation (electric-acoustic stimulation [EAS]), speech intelligibility in noise can improve dramatically. We recently showed that a similar benefit to intelligibility can be observed in simulation when the low-frequency acoustic stimulation (low-pass target speech) is replaced with a tone that is modulated both in frequency with the fundamental frequency (F0) of the target talker and in amplitude with the amplitude envelope of the low-pass target speech (Brown & Bacon 2009). The goal of the current experiment was to examine the benefit of the modulated tone to intelligibility in CI patients. Design Eight CI users who had some residual acoustic hearing either in the implanted ear, the unimplanted ear, or both ears participated in this study. Target speech was combined with either multitalker babble or a single competing talker and presented to the implant. Stimulation to the acoustic region consisted of no signal, target speech, or a tone that was modulated in frequency to track the changes in the target talker’s F0 and in amplitude to track the amplitude envelope of target speech low-pass filtered at 500 Hz. Results All patients showed improvements in intelligibility over electric-only stimulation when either the tone or target speech was presented acoustically. The average improvement in intelligibility was 46 percentage points due to the tone and 55 percentage points due to target speech. Conclusions The results demonstrate that a tone carrying F0 and amplitude envelope cues of target speech can provide significant benefit to CI users and may lead to new technologies that could offer EAS benefit to many patients who would not benefit from current EAS approaches. PMID:19546806
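
    A minimal numpy sketch of the acoustic stimulus described above: a tone whose instantaneous frequency follows the target talker's F0 contour and whose amplitude follows the envelope of the low-pass-filtered target speech. The frame rate and interpolation are assumptions of this sketch; the F0 and envelope tracks would come from an external pitch tracker and envelope extractor.

        import numpy as np

        def modulated_tone(f0_track, env_track, frame_rate, fs):
            """Synthesize a tone frequency-modulated by an F0 contour and amplitude-modulated
            by a speech envelope, both sampled at `frame_rate` frames per second."""
            n_samples = int(len(f0_track) * fs / frame_rate)
            t_frames = np.arange(len(f0_track)) / frame_rate
            t = np.arange(n_samples) / fs

            f0 = np.interp(t, t_frames, f0_track)     # upsample F0 contour to the audio rate
            env = np.interp(t, t_frames, env_track)   # upsample amplitude envelope

            phase = 2 * np.pi * np.cumsum(f0) / fs    # integrate instantaneous frequency
            return env * np.sin(phase)

        # usage with a flat 120 Hz contour and a slow ramp envelope (illustrative values)
        tone = modulated_tone(np.full(100, 120.0), np.linspace(0, 1, 100), frame_rate=100, fs=16_000)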

  9. The correlation dimension: A robust chaotic feature for classifying acoustic emission signals generated in construction materials

    NASA Astrophysics Data System (ADS)

    Kacimi, S.; Laurens, S.

    2009-07-01

    In the field of acoustic emission (AE) source recognition, this paper presents a classification feature based on the paradigm of nonlinear dynamical systems, often referred to as chaos theory. The approach considers signals as time series expressing an underlying dynamical phenomenon and enclosing all the information regarding the dynamics. The scientific knowledge on nonlinear dynamical systems has considerably improved over the past 40 years. The dynamical behavior is analyzed in the phase space, which is the space generated by the state variables of the system. The time evolution of a system is expressed in the phase space by trajectories, and the asymptotic behavior of trajectories defines a space area which is referred to as a system attractor. Dynamical systems may be characterized by the topological properties of attractors, such as the correlation dimension, which is a fractal dimension. According to Takens' theorem, even if the system is not clearly defined, it is possible to infer topological information about the attractor from experimental observations. Such a method, which is called phase space reconstruction, was successfully applied for the classification of acoustic emission waveforms propagating in more or less complex materials such as granite and concrete. Laboratory tests were carried out in order to collect numerous AE waveforms from various controlled acoustic sources. Then, each signal was processed to extract a reconstructed attractor from which the correlation dimension was computed. The first results of this research show that the correlation dimension assessed after phase space reconstruction is very relevant and robust for classifying AE signals. These promising results may be explained by the fact that the totality of the signal is used to obtain classifying information. Moreover, due to the self-similar nature of attractors, the correlation dimension, and thus a correlation dimension-based classification approach, is theoretically
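
    The feature described above reduces to two steps that can be sketched directly: delay-embed the AE waveform to reconstruct an attractor (Takens' theorem) and estimate its correlation dimension from the slope of the Grassberger-Procaccia correlation sum. The embedding dimension, delay, and radius range below are illustrative assumptions, not the paper's settings.

        import numpy as np
        from scipy.spatial.distance import pdist

        def delay_embed(x, dim=5, tau=4):
            """Phase-space reconstruction of a scalar time series by delay embedding."""
            n = len(x) - (dim - 1) * tau
            return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

        def correlation_dimension(x, dim=5, tau=4, radii=None):
            """Grassberger-Procaccia estimate of the correlation dimension of the attractor."""
            pts = delay_embed(np.asarray(x, dtype=float), dim, tau)
            dists = pdist(pts)                       # pairwise distances between embedded points
            if radii is None:
                radii = np.logspace(np.log10(np.percentile(dists, 1)),
                                    np.log10(np.percentile(dists, 50)), 20)
            # correlation sum C(r): fraction of point pairs closer than r
            C = np.array([np.mean(dists < r) for r in radii])
            # dimension estimate = slope of log C(r) versus log r over the scaling region
            slope, _ = np.polyfit(np.log(radii), np.log(C + 1e-12), 1)
            return slope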

  10. Acoustic alarm signalling facilitates predator protection of treehoppers by mutualist ant bodyguards

    PubMed Central

    Morales, Manuel A; Barone, Jennifer L; Henry, Charles S

    2008-01-01

    Mutualism is a net positive interaction that includes varying degrees of both costs and benefits. Because tension between the costs and benefits of mutualism can lead to evolutionary instability, identifying mechanisms that regulate investment between partners is critical to understanding the evolution and maintenance of mutualism. Recently, studies have highlighted the importance of interspecific signalling as one mechanism for regulating investment between mutualist partners. Here, we provide evidence for interspecific alarm signalling in an insect protection mutualism and we demonstrate a functional link between this acoustic signalling and efficacy of protection. The treehopper Publilia concava Say (Hemiptera: Membracidae) is an insect that provides ants with a carbohydrate-rich excretion called honeydew in return for protection from predators. Adults of this species produce distinct vibrational signals in the context of predator encounters. In laboratory trials, putative alarm signal production significantly increased following initial contact with ladybeetle predators (primarily Harmonia axyridis Pallas, Coleoptera: Coccinellidae), but not following initial contact with ants. In field trials, playback of a recorded treehopper alarm signal resulted in a significant increase in both ant activity and the probability of ladybeetle discovery by ants relative to both silence and treehopper courtship signal controls. Our results show that P. concava treehoppers produce alarm signals in response to predator threat and that this signalling can increase effectiveness of predator protection by ants. PMID:18480015

  11. Speech analyzer

    NASA Technical Reports Server (NTRS)

    Lokerson, D. C. (Inventor)

    1977-01-01

    A speech signal is analyzed by applying the signal to formant filters which derive first, second and third signals respectively representing the frequency of the speech waveform in the first, second and third formants. A first pulse train having approximately a pulse rate representing the average frequency of the first formant is derived; second and third pulse trains having pulse rates respectively representing zero crossings of the second and third formants are derived. The first formant pulse train is derived by establishing N signal level bands, where N is an integer at least equal to two. Adjacent ones of the signal bands have common boundaries, each of which is a predetermined percentage of the peak level of a complete cycle of the speech waveform.

  12. Multisensor multipulse Linear Predictive Coding (LPC) analysis in noise for medium rate speech transmission

    NASA Astrophysics Data System (ADS)

    Preuss, R. D.

    1985-12-01

    The theory of multipulse linear predictive coding (LPC) analysis is extended to include the possible presence of acoustic noise, as for a telephone near a busy road. Models are developed assuming two signals are provided: the primary signal is the output of a microphone which samples the combined acoustic fields of the noise and the speech, while the secondary signal is the output of a microphone which samples the acoustic field of the noise alone. Analysis techniques to extract the multipulse LPC parameters from these two signals are developed; these techniques are developed as approximations to maximum likelihood analysis for the given model.

  13. Optimal Combination of Neural Temporal Envelope and Fine Structure Cues to Explain Speech Identification in Background Noise

    PubMed Central

    Moon, Il Joon; Won, Jong Ho; Ives, D. Timothy; Nie, Kaibao; Heinz, Michael G.; Lorenzi, Christian; Rubinstein, Jay T.

    2014-01-01

    The dichotomy between acoustic temporal envelope (ENV) and fine structure (TFS) cues has stimulated numerous studies over the past decade to understand the relative role of acoustic ENV and TFS in human speech perception. Such acoustic temporal speech cues produce distinct neural discharge patterns at the level of the auditory nerve, yet little is known about the central neural mechanisms underlying the dichotomy in speech perception between neural ENV and TFS cues. We explored the question of how the peripheral auditory system encodes neural ENV and TFS cues in steady or fluctuating background noise, and how the central auditory system combines these forms of neural information for speech identification. We sought to address this question by (1) measuring sentence identification in background noise for human subjects as a function of the degree of available acoustic TFS information and (2) examining the optimal combination of neural ENV and TFS cues to explain human speech perception performance using computational models of the peripheral auditory system and central neural observers. Speech-identification performance by human subjects decreased as the acoustic TFS information was degraded in the speech signals. The model predictions best matched human performance when a greater emphasis was placed on neural ENV coding rather than neural TFS. However, neural TFS cues were necessary to account for the full effect of background-noise modulations on human speech-identification performance. PMID:25186758
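
    The acoustic ENV/TFS dichotomy discussed above is commonly operationalized per frequency band with the Hilbert transform: the envelope is the magnitude of the analytic signal and the temporal fine structure is the cosine of its phase. The sketch below shows that decomposition for a single band; the band limits and filter order are illustrative assumptions, not the vocoder settings used in the study.

        import numpy as np
        from scipy.signal import butter, filtfilt, hilbert

        def env_tfs(x, fs, band=(1000.0, 1400.0)):
            """Split one frequency band of a signal into its Hilbert envelope (ENV) and
            temporal fine structure (TFS)."""
            b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
            band_signal = filtfilt(b, a, x)

            analytic = hilbert(band_signal)
            env = np.abs(analytic)             # slowly varying amplitude envelope
            tfs = np.cos(np.angle(analytic))   # rapid fine-structure carrier
            return env, tfs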

  14. A Novel Method for Speech Acquisition and Enhancement by 94 GHz Millimeter-Wave Sensor

    PubMed Central

    Chen, Fuming; Li, Sheng; Li, Chuantao; Liu, Miao; Li, Zhao; Xue, Huijun; Jing, Xijing; Wang, Jianqi

    2015-01-01

    In order to improve the speech acquisition ability of a non-contact method, a 94 GHz millimeter wave (MMW) radar sensor was employed to detect speech signals. This novel non-contact speech acquisition method was shown to have high directional sensitivity, and to be immune to strong acoustical disturbance. However, MMW radar speech is often degraded by combined sources of noise, which mainly include harmonic, electrical circuit and channel noise. In this paper, an algorithm combining empirical mode decomposition (EMD) and mutual information entropy (MIE) was proposed for enhancing the perceptibility and intelligibility of radar speech. Firstly, the radar speech signal was adaptively decomposed into oscillatory components called intrinsic mode functions (IMFs) by EMD. Secondly, MIE was used to determine the number of reconstructive components, and then an adaptive threshold was employed to remove the noise from the radar speech. The experimental results show that human speech can be effectively acquired by a 94 GHz MMW radar sensor when the detection distance is 20 m. Moreover, the noise of the radar speech is greatly suppressed and the speech sounds become more pleasant to human listeners after being enhanced by the proposed algorithm, suggesting that this novel speech acquisition and enhancement method will provide a promising alternative for various applications associated with speech detection. PMID:26729126
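
    A hedged sketch of the enhancement pipeline's structure: decompose the radar speech into IMFs with EMD, drop the noisiest high-frequency components, and soft-threshold the rest before reconstruction. It assumes the third-party PyEMD package ("EMD-signal") for the decomposition and replaces the paper's mutual-information-entropy criterion with a simple fixed cut-off, so it illustrates the shape of the method rather than the authors' algorithm.

        import numpy as np
        from PyEMD import EMD   # third-party package "EMD-signal" (assumed available)

        def enhance_radar_speech(x, drop_first=2, threshold_scale=0.5):
            """Simplified EMD-based denoising of a radar speech signal.

            drop_first      : number of high-frequency IMFs discarded outright as noise
                              (a stand-in for the MIE-based selection in the paper)
            threshold_scale : soft-threshold level relative to each IMF's robust noise estimate
            """
            imfs = EMD()(np.asarray(x, dtype=float))   # intrinsic mode functions, highest frequency first
            kept = []
            for i, imf in enumerate(imfs):
                if i < drop_first:
                    continue
                sigma = np.median(np.abs(imf)) / 0.6745                  # robust noise estimate
                thr = threshold_scale * sigma * np.sqrt(2 * np.log(len(imf)))
                kept.append(np.sign(imf) * np.maximum(np.abs(imf) - thr, 0.0))  # soft threshold
            return np.sum(kept, axis=0)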

  15. Temporal patterns in the acoustic signals of beaked whales at Cross Seamount.

    PubMed

    Johnston, D W; McDonald, M; Polovina, J; Domokos, R; Wiggins, S; Hildebrand, J

    2008-04-23

    Seamounts may influence the distribution of marine mammals through a combination of increased ocean mixing, enhanced local productivity and greater prey availability. To study the effects of seamounts on the presence and acoustic behaviour of cetaceans, we deployed a high-frequency acoustic recording package on the summit of Cross Seamount during April through October 2005. The most frequently detected cetacean vocalizations were echolocation sounds similar to those produced by ziphiid and mesoplodont beaked whales together with buzz-type signals consistent with prey-capture attempts. Beaked whale signals occurred almost entirely at night throughout the six-month deployment. Measurements of prey presence with a Simrad EK-60 fisheries acoustics echo sounder indicate that Cross Seamount may enhance local productivity in near-surface waters. Concentrations of micronekton were aggregated over the seamount in near-surface waters at night, and dense concentrations of nekton were detected across the surface of the summit. Our results suggest that seamounts may provide enhanced foraging opportunities for beaked whales during the night through a combination of increased productivity, vertical migrations by micronekton and local retention of prey. Furthermore, the summit of the seamount may act as a barrier against which whales concentrate prey. PMID:18252660

  16. Extruded Bread Classification on the Basis of Acoustic Emission Signal With Application of Artificial Neural Networks

    NASA Astrophysics Data System (ADS)

    Świetlicka, Izabela; Muszyński, Siemowit; Marzec, Agata

    2015-04-01

    The presented work covers the problem of developing a method of extruded bread classification with the application of artificial neural networks. Extruded flat graham, corn, and rye breads differening in water activity were used. The breads were subjected to the compression test with simultaneous registration of acoustic signal. The amplitude-time records were analyzed both in time and frequency domains. Acoustic emission signal parameters: single energy, counts, amplitude, and duration acoustic emission were determined for the breads in four water activities: initial (0.362 for rye, 0.377 for corn, and 0.371 for graham bread), 0.432, 0.529, and 0.648. For classification and the clustering process, radial basis function, and self-organizing maps (Kohonen network) were used. Artificial neural networks were examined with respect to their ability to classify or to cluster samples according to the bread type, water activity value, and both of them. The best examination results were achieved by the radial basis function network in classification according to water activity (88%), while the self-organizing maps network yielded 81% during bread type clustering.

  17. Incident signal power comparison for localization of concurrent multiple acoustic sources.

    PubMed

    Salvati, Daniele; Canazza, Sergio

    2014-01-01

    In this paper, a method to solve the localization of concurrent multiple acoustic sources in large open spaces is presented. The problem of multisource localization in far-field conditions is to correctly associate the direction of arrival (DOA) estimated by a network array system to the same source. The use of systems implementing a Bayesian filter is a traditional approach to address the problem of localization in a multisource acoustic scenario. However, in a real noisy open space the acoustic sources are often discontinuous with numerous short-duration events, and thus the filtering methods may have difficulty tracking the multiple sources. Incident signal power comparison (ISPC) is proposed to compute the DOA association. ISPC is based on identifying the incident signal power (ISP) of the sources on a microphone array using beamforming methods and comparing the ISP between different arrays using spectral distance (SD) measurement techniques. This method solves the ambiguities, due to the presence of simultaneous sources, by identifying sounds through a minimization of an error criterion on SD measures of DOA combinations. The experimental results were conducted in an outdoor real noisy environment and the ISPC performance is reported using different beamforming techniques and SD functions. PMID:24701179
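
    The core quantities of the ISPC method can be sketched as follows: the incident signal power at each array is taken from the output of a delay-and-sum beamformer steered toward an estimated DOA, and candidate DOA pairings between arrays are compared with a spectral-distance measure. This is a simplified far-field, linear-array sketch with assumed geometry and a log-spectral distance chosen for illustration, not the authors' implementation.

        import numpy as np

        C = 343.0  # speed of sound, m/s

        def delay_and_sum(frames, mic_positions, doa_deg, fs):
            """Steer an array toward a far-field DOA and return the beamformed signal.

            frames        : (n_mics, n_samples) array of synchronized microphone signals
            mic_positions : (n_mics,) microphone coordinates along the array axis, in metres
            """
            doa = np.deg2rad(doa_deg)
            delays = mic_positions * np.cos(doa) / C                   # relative propagation delays
            n = frames.shape[1]
            freqs = np.fft.rfftfreq(n, 1.0 / fs)
            spectra = np.fft.rfft(frames, axis=1)
            steering = np.exp(2j * np.pi * np.outer(delays, freqs))    # per-mic phase alignment
            return np.fft.irfft(np.mean(spectra * steering, axis=0), n=n)

        def incident_signal_power_spectrum(frames, mic_positions, doa_deg, fs):
            """Power spectrum of the beamformer output: the ISP used for inter-array comparison."""
            y = delay_and_sum(frames, mic_positions, doa_deg, fs)
            return np.abs(np.fft.rfft(y)) ** 2

        def log_spectral_distance(P1, P2, eps=1e-12):
            """One possible spectral-distance measure between two ISP spectra."""
            return np.sqrt(np.mean((10 * np.log10((P1 + eps) / (P2 + eps))) ** 2))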

  18. Incident Signal Power Comparison for Localization of Concurrent Multiple Acoustic Sources

    PubMed Central

    2014-01-01

    In this paper, a method to solve the localization of concurrent multiple acoustic sources in large open spaces is presented. The problem of multisource localization in far-field conditions is to correctly associate the direction of arrival (DOA) estimated by a network array system to the same source. The use of systems implementing a Bayesian filter is a traditional approach to address the problem of localization in a multisource acoustic scenario. However, in a real noisy open space the acoustic sources are often discontinuous with numerous short-duration events, and thus the filtering methods may have difficulty tracking the multiple sources. Incident signal power comparison (ISPC) is proposed to compute the DOA association. ISPC is based on identifying the incident signal power (ISP) of the sources on a microphone array using beamforming methods and comparing the ISP between different arrays using spectral distance (SD) measurement techniques. This method solves the ambiguities, due to the presence of simultaneous sources, by identifying sounds through a minimization of an error criterion on SD measures of DOA combinations. The experimental results were conducted in an outdoor real noisy environment and the ISPC performance is reported using different beamforming techniques and SD functions. PMID:24701179

  19. Speaker specificity in speech perception: the importance of what is and is not in the signal

    NASA Astrophysics Data System (ADS)

    Dahan, Delphine; Scarborough, Rebecca A.

    2005-09-01

    In some American English dialects, /ae/ before /g/ (but not before /k/) raises to a vowel approaching [E], in effect reducing phonetic overlap between (e.g.) "bag" and "back." Here, participants saw four written words on a computer screen (e.g., "bag," "back," "dog," "dock") and heard a spoken word. Their task was to indicate which word they heard. Participants' eye movements to the written words were recorded. Participants in the "ae-raising" group heard identity-spliced "bag"-like words containing the raised vowel [E]; participants in the "control" group heard cross-spliced "bag"-like words containing standard [ae]. Acoustically identical "back"-like words were subsequently presented to both groups. The ae-raising-group participants identified "back"-like words faster and more accurately, and made fewer fixations to the competitor "bag," than control-group participants did. Thus, exposure to ae-raised realizations of "bag" facilitated the identification of "back" because of the reduced fit between the input and the altered representation of the competing hypothesis "bag." This demonstrates that listeners evaluate the spoken input with respect to what is, but also what is not, in the signal, and that this evaluation involves speaker-specific representations. [Work supported by NSF Human and Social Dynamics 0433567.]

  20. Effects of temporal envelope modulation on acoustic signal recognition in a vocal fish, the plainfin midshipman.

    PubMed

    McKibben, J R; Bass, A H

    2001-06-01

    Amplitude modulation is an important parameter defining vertebrate acoustic communication signals. Nesting male plainfin midshipman fish, Porichthys notatus, emit simple, long duration hums in which modulation is strikingly absent. Envelope modulation is, however, introduced when the hums of adjacent males overlap to produce acoustic beats. Hums attract gravid females and can be mimicked with continuous tones at the fundamental frequency. While individual hums have flat envelopes, other midshipman signals are amplitude modulated. This study used one-choice playback tests with gravid females to examine the role of envelope modulation in hum recognition. Various pulse train and two-tone beat stimuli resembling natural communication signals were presented individually, and the responses compared to those for continuous pure tones. The effectiveness of pulse trains was graded and depended upon both pulse duration and the ratio of pulse to gap length. Midshipman were sensitive to beat modulations from 0.5 to 10 Hz, with fewer fish approaching the beat than the pure tone. Reducing the degree of modulation increased the effectiveness of beat stimuli. Hence, the lack of modulation in the midshipman's advertisement call corresponds to the importance of envelope modulation for the categorization of communication signals even in this relatively simple system. PMID:11425135

  1. A hardware model of the auditory periphery to transduce acoustic signals into neural activity

    PubMed Central

    Tateno, Takashi; Nishikawa, Jun; Tsuchioka, Nobuyoshi; Shintaku, Hirofumi; Kawano, Satoyuki

    2013-01-01

    To improve the performance of cochlear implants, we have integrated a microdevice into a model of the auditory periphery with the goal of creating a microprocessor. We constructed an artificial peripheral auditory system using a hybrid model in which polyvinylidene difluoride was used as a piezoelectric sensor to convert mechanical stimuli into electric signals. To produce frequency selectivity, the slit on a stainless steel base plate was designed such that the local resonance frequency of the membrane over the slit reflected the transfer function. In the acoustic sensor, electric signals were generated based on the piezoelectric effect from local stress in the membrane. The electrodes on the resonating plate produced relatively large electric output signals. The signals were fed into a computer model that mimicked some functions of inner hair cells, inner hair cell–auditory nerve synapses, and auditory nerve fibers. In general, the responses of the model to pure-tone burst and complex stimuli accurately represented the discharge rates of high-spontaneous-rate auditory nerve fibers across a range of frequencies greater than 1 kHz and middle to high sound pressure levels. Thus, the model provides a tool to understand information processing in the peripheral auditory system and a basic design for connecting artificial acoustic sensors to the peripheral auditory nervous system. Finally, we discuss the need for stimulus control with an appropriate model of the auditory periphery based on auditory brainstem responses that were electrically evoked by different temporal pulse patterns with the same pulse number. PMID:24324432

  2. The effect of artificial rain on backscattered acoustic signal: first measurements

    NASA Astrophysics Data System (ADS)

    Titchenko, Yuriy; Karaev, Vladimir; Meshkov, Evgeny; Goldblat, Vladimir

    The problem of rain influence on the characteristics of ultrasonic and microwave signals backscattered by the water surface is considered. The influence of rain on the backscattering of electromagnetic waves has been investigated in laboratory and field experiments, for example [1-3]. Raindrops have a significant impact on microwave backscattering and affect the accuracy of wave spectrum measurements by a string wave gauge. This occurs due to the presence of raindrops in the atmosphere and modification of the water surface. For measurements of water surface characteristics during precipitation we propose to use an acoustic system. This allows us to obtain the water surface parameters independently of precipitation in the atmosphere. Measurements of the significant wave height of the water surface using underwater acoustical systems are well known [4, 5]. Moreover, the variance of the orbital velocity can be measured using these systems. However, these methods cannot be used for measurements of slope variance and the other second statistical moments of the water surface that are required for analyzing the radar backscattered signal. An original-design Doppler underwater acoustic wave gauge allows direct measurement of the surface roughness characteristics that affect the backscattering of electromagnetic waves of the same wavelength [6]. The acoustic wave gauge is a Doppler ultrasonic sonar which is fixed near the bottom on a floating disk. Measurements are carried out with the sonar antennas oriented vertically towards the water surface. The first experiments were conducted with the first model of the acoustic wave gauge. The acoustic wave gauge (8 mm wavelength) is equipped with a transceiving antenna with a wide symmetrical antenna pattern. The gauge allows us to measure the Doppler spectrum and cross section of the backscattered signal. The variance of the vertical component of the orbital velocity can be retrieved from the Doppler spectrum with high accuracy. The results of laboratory and field experiments with artificial rain are presented.

  3. Wavelet Transform Of Acoustic Signal From A Ranque-Hilsch Vortex Tube

    NASA Astrophysics Data System (ADS)

    Istihat, Y.; Wisnoe, W.

    2015-09-01

    This paper presents the frequency analysis of flow in a Ranque-Hilsch Vortex Tube (RHVT) obtained from acoustic signals recorded using microphones in an isolated formation setup. A Data Acquisition System (DAS) that incorporates an Analog-to-Digital Converter (ADC) with a laptop computer has been used to acquire the wave data. Different inlet pressures (20, 30, 40, 50 and 60 psi) are supplied and temperature differences are recorded. Frequencies produced by the RHVT are experimentally measured and analyzed by means of the Wavelet Transform (WT). The Morlet wavelet is used and the relation between pressure variation, temperature and frequency is studied. The acoustic data have been analyzed using Matlab® and a time-frequency analysis (scalogram) is presented. Results show that the pressure is proportional to the frequency inside the RHVT, with two distinct working frequencies pronounced between 4 and 8 kHz.
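
    A minimal sketch of the analysis step described above: compute the continuous wavelet transform of the recorded acoustic signal with a Morlet wavelet and display the scalogram. It uses the PyWavelets package rather than the Matlab tooling mentioned in the abstract; the scale range and sample rate are assumptions.

        import numpy as np
        import pywt
        import matplotlib.pyplot as plt

        def morlet_scalogram(x, fs, scales=None):
            """Time-frequency scalogram of an acoustic record using the Morlet wavelet."""
            if scales is None:
                scales = np.arange(1, 128)
            coeffs, freqs = pywt.cwt(x, scales, "morl", sampling_period=1.0 / fs)
            return np.abs(coeffs), freqs

        # usage on a synthetic 6 kHz tone standing in for the RHVT microphone record
        fs = 44_100
        t = np.arange(0, 0.05, 1 / fs)
        power, freqs = morlet_scalogram(np.sin(2 * np.pi * 6_000 * t), fs)

        plt.pcolormesh(t, freqs, power, shading="auto")
        plt.xlabel("time, s")
        plt.ylabel("frequency, Hz")
        plt.show()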

  4. Detection of geodesic acoustic mode oscillations, using multiple signal classification analysis of Doppler backscattering signal on Tore Supra

    NASA Astrophysics Data System (ADS)

    Vermare, L.; Hennequin, P.; Gürcan, Ö. D.; the Tore Supra Team

    2012-06-01

    This paper presents the first observation of geodesic acoustic modes (GAMs) on Tore Supra plasmas. Using the Doppler backscattering system, oscillations of the plasma flow velocity, localized between r/a = 0.85 and r/a = 0.95 and with a frequency typically around 10 kHz, have been observed at the plasma edge in numerous discharges. When the additional heating power is varied, the frequency is found to scale with Cs/R. The MUltiple SIgnal Classification (MUSIC) algorithm is employed to access the temporal evolution of the perpendicular velocity of density fluctuations. The method is presented in some detail, and is validated and compared against standard methods, such as the conventional fast Fourier transform method, using a synthetic signal. It stands out as a powerful data analysis method for following the Doppler frequency with high temporal resolution, which is important in order to extract the dynamics of GAMs.
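
    To make the MUSIC step concrete, here is a minimal, generic sketch of the MUSIC frequency estimator applied to a synthetic complex backscattering signal. The snapshot length, model order, frequency grid and test signal are assumptions for illustration; this does not reproduce the authors' implementation.

      import numpy as np

      def music_spectrum(x, fs, n_sources=1, m=32, n_grid=2048):
          """MUSIC pseudospectrum for estimating sinusoidal (Doppler) frequencies
          in a complex signal x (illustrative sketch)."""
          # Snapshot matrix (sliding windows of length m) and sample covariance
          snaps = np.lib.stride_tricks.sliding_window_view(x, m)   # (n_snap, m)
          r = snaps.T @ snaps.conj() / snaps.shape[0]              # m x m covariance
          # Noise subspace = eigenvectors belonging to the smallest eigenvalues
          w, v = np.linalg.eigh(r)
          noise = v[:, :m - n_sources]
          # Scan a frequency grid with steering vectors a(f)
          freqs = np.linspace(-fs / 2, fs / 2, n_grid, endpoint=False)
          k = np.arange(m)
          spectrum = np.empty(n_grid)
          for i, f in enumerate(freqs):
              a = np.exp(2j * np.pi * f / fs * k)
              spectrum[i] = 1.0 / np.linalg.norm(noise.conj().T @ a) ** 2
          return freqs, spectrum

      # Hypothetical short snapshot: a 10 kHz Doppler line in noise, tracked frame by frame
      fs = 100_000.0
      t = np.arange(512) / fs
      x = np.exp(2j * np.pi * 10_000 * t) + 0.3 * (np.random.randn(512) + 1j * np.random.randn(512))
      f, p = music_spectrum(x, fs)
      print(f[np.argmax(p)])   # dominant (Doppler) frequency estimate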

  5. Comparison of a short-time speech-based intelligibility metric to the speech transmission index and intelligibility data

    PubMed Central

    Payton, Karen L.; Shrestha, Mona

    2013-01-01

    Several algorithms have been shown to generate a metric corresponding to the Speech Transmission Index (STI) using speech as a probe stimulus [e.g., Goldsworthy and Greenberg, J. Acoust. Soc. Am. 116, 3679–3689 (2004)]. The time-domain approaches work well on long speech segments and have the added potential to be used for short-time analysis. This study investigates the performance of the Envelope Regression (ER) time-domain STI method as a function of window length, in acoustically degraded environments with multiple talkers and speaking styles. The ER method is compared with a short-time Theoretical STI, derived from octave-band signal-to-noise ratios and reverberation times. For windows as short as 0.3 s, the ER method tracks short-time Theoretical STI changes in stationary speech-shaped noise, fluctuating restaurant babble and stationary noise plus reverberation. The metric is also compared to intelligibility scores on conversational speech and speech articulated clearly but at normal speaking rates (Clear/Norm) in stationary noise. Correlation between the metric and intelligibility scores is high and, consistent with the subject scores, the metrics are higher for Clear/Norm speech than for conversational speech and higher for the first word in a sentence than for the last word. PMID:24180791

  6. Comparison of a short-time speech-based intelligibility metric to the speech transmission index and intelligibility data.

    PubMed

    Payton, Karen L; Shrestha, Mona

    2013-11-01

    Several algorithms have been shown to generate a metric corresponding to the Speech Transmission Index (STI) using speech as a probe stimulus [e.g., Goldsworthy and Greenberg, J. Acoust. Soc. Am. 116, 3679-3689 (2004)]. The time-domain approaches work well on long speech segments and have the added potential to be used for short-time analysis. This study investigates the performance of the Envelope Regression (ER) time-domain STI method as a function of window length, in acoustically degraded environments with multiple talkers and speaking styles. The ER method is compared with a short-time Theoretical STI, derived from octave-band signal-to-noise ratios and reverberation times. For windows as short as 0.3 s, the ER method tracks short-time Theoretical STI changes in stationary speech-shaped noise, fluctuating restaurant babble and stationary noise plus reverberation. The metric is also compared to intelligibility scores on conversational speech and speech articulated clearly but at normal speaking rates (Clear/Norm) in stationary noise. Correlation between the metric and intelligibility scores is high and, consistent with the subject scores, the metrics are higher for Clear/Norm speech than for conversational speech and higher for the first word in a sentence than for the last word. PMID:24180791
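
    The envelope-regression approach described in the two records above can be sketched in outline: octave-band intensity envelopes of the probe and degraded speech, a regression-based apparent modulation transfer per band, and the usual conversion to a transmission index. The sketch is illustrative only; the equal band weighting and the exact normalization used here are simplifications and may differ from the published ER formulation, and the window length and assumed sampling rate are illustrative.

      import numpy as np
      from scipy.signal import butter, sosfilt, hilbert

      OCTAVE_CENTERS = [125, 250, 500, 1000, 2000, 4000, 8000]  # Hz

      def band_envelope(x, fs, fc):
          """Octave-band intensity envelope (band-pass + Hilbert magnitude squared).
          Assumes fs is high enough to cover the requested band."""
          lo, hi = fc / np.sqrt(2), min(fc * np.sqrt(2), 0.45 * fs)
          sos = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
          env = np.abs(hilbert(sosfilt(sos, x))) ** 2
          # Keep only speech modulation frequencies (low-pass at ~50 Hz)
          sos_lp = butter(2, 50, btype='lowpass', fs=fs, output='sos')
          return np.maximum(sosfilt(sos_lp, env), 0.0)

      def short_time_sti(probe, degraded, fs, win_s=0.3):
          """Speech-based STI-like metric per analysis window (illustrative sketch)."""
          n = int(win_s * fs)
          n_win = min(len(probe), len(degraded)) // n
          sti = []
          for w in range(n_win):
              sl = slice(w * n, (w + 1) * n)
              tis = []
              for fc in OCTAVE_CENTERS:
                  x = band_envelope(probe[sl], fs, fc)
                  y = band_envelope(degraded[sl], fs, fc)
                  # Regression-based apparent modulation transfer (simplified)
                  m = (np.mean(x) / max(np.mean(y), 1e-12)) * (np.dot(x, y) / max(np.dot(x, x), 1e-12))
                  m = np.clip(m, 1e-4, 1 - 1e-4)
                  snr = np.clip(10 * np.log10(m / (1 - m)), -15, 15)   # apparent SNR (dB)
                  tis.append((snr + 15) / 30)                          # transmission index
              sti.append(np.mean(tis))   # equal band weights: a simplification
          return np.array(sti)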

  7. Classroom acoustics: Three pilot studies

    NASA Astrophysics Data System (ADS)

    Smaldino, Joseph J.

    2005-04-01

    This paper summarizes three related pilot projects designed to focus on the possible effects of classroom acoustics on fine auditory discrimination as it relates to language acquisition, especially English as a second language. The first study investigated the influence of improving the signal-to-noise ratio on the differentiation of English phonemes. The results showed better differentiation with better signal-to-noise ratio. The second studied speech perception in noise by young adults for whom English was a second language. The outcome indicated that the second language learners required a better signal-to-noise ratio to perform equally to the native language participants. The last study surveyed the acoustic conditions of preschool and day care classrooms, wherein first and second language learning occurs. The survey suggested an unfavorable acoustic environment for language learning.

  8. Cued speech for enhancing speech perception and first language development of children with cochlear implants.

    PubMed

    Leybaert, Jacqueline; LaSasso, Carol J

    2010-06-01

    Nearly 300 million people worldwide have moderate to profound hearing loss. Hearing impairment, if not adequately managed, has strong socioeconomic and affective impact on individuals. Cochlear implants have become the most effective vehicle for helping profoundly deaf children and adults to understand spoken language, to be sensitive to environmental sounds, and, to some extent, to listen to music. The auditory information delivered by the cochlear implant remains non-optimal for speech perception because it delivers a spectrally degraded signal and lacks some of the fine temporal acoustic structure. In this article, we discuss research revealing the multimodal nature of speech perception in normally-hearing individuals, with important inter-subject variability in the weighting of auditory or visual information. We also discuss how audio-visual training, via Cued Speech, can improve speech perception in cochlear implantees, particularly in noisy contexts. Cued Speech is a system that makes use of visual information from speechreading combined with hand shapes positioned in different places around the face in order to deliver completely unambiguous information about the syllables and the phonemes of spoken language. We support our view that exposure to Cued Speech before or after the implantation could be important in the aural rehabilitation process of cochlear implantees. We describe five lines of research that are converging to support the view that Cued Speech can enhance speech perception in individuals with cochlear implants. PMID:20724357

  9. Cued Speech for Enhancing Speech Perception and First Language Development of Children With Cochlear Implants

    PubMed Central

    Leybaert, Jacqueline; LaSasso, Carol J.

    2010-01-01

    Nearly 300 million people worldwide have moderate to profound hearing loss. Hearing impairment, if not adequately managed, has strong socioeconomic and affective impact on individuals. Cochlear implants have become the most effective vehicle for helping profoundly deaf children and adults to understand spoken language, to be sensitive to environmental sounds, and, to some extent, to listen to music. The auditory information delivered by the cochlear implant remains non-optimal for speech perception because it delivers a spectrally degraded signal and lacks some of the fine temporal acoustic structure. In this article, we discuss research revealing the multimodal nature of speech perception in normally-hearing individuals, with important inter-subject variability in the weighting of auditory or visual information. We also discuss how audio-visual training, via Cued Speech, can improve speech perception in cochlear implantees, particularly in noisy contexts. Cued Speech is a system that makes use of visual information from speechreading combined with hand shapes positioned in different places around the face in order to deliver completely unambiguous information about the syllables and the phonemes of spoken language. We support our view that exposure to Cued Speech before or after the implantation could be important in the aural rehabilitation process of cochlear implantees. We describe five lines of research that are converging to support the view that Cued Speech can enhance speech perception in individuals with cochlear implants. PMID:20724357

  10. Phylogenetic signal in the acoustic parameters of the advertisement calls of four clades of anurans

    PubMed Central

    2013-01-01

    Background Anuran vocalizations, especially their advertisement calls, are largely species-specific and can be used to identify taxonomic affiliations. Because anurans are not vocal learners, their vocalizations are generally assumed to have a strong genetic component. This suggests that the degree of similarity between advertisement calls may be related to large-scale phylogenetic relationships. To test this hypothesis, advertisement calls from 90 species belonging to four large clades (Bufo, Hylinae, Leptodactylus, and Rana) were analyzed. Phylogenetic distances were estimated based on the DNA sequences of the 12S mitochondrial ribosomal RNA gene, and, for a subset of 49 species, on the rhodopsin gene. Mean values for five acoustic parameters (coefficient of variation of root-mean-square amplitude, dominant frequency, spectral flux, spectral irregularity, and spectral flatness) were computed for each species. We then tested for phylogenetic signal on the body-size-corrected residuals of these five parameters, using three statistical tests (Moran’s I, Mantel, and Blomberg’s K) and three models of genetic distance (pairwise distances, Abouheif’s proximities, and the variance-covariance matrix derived from the phylogenetic tree). Results A significant phylogenetic signal was detected for most acoustic parameters on the 12S dataset, across statistical tests and genetic distance models, both for the entire sample of 90 species and within clades in several cases. A further analysis on a subset of 49 species using genetic distances derived from rhodopsin and from 12S broadly confirmed the results obtained on the larger sample, indicating that the phylogenetic signals observed in these acoustic parameters can be detected using a variety of genetic distance models derived either from a variable mitochondrial sequence or from a conserved nuclear gene. Conclusions We found a robust relationship, in a large number of species, between anuran phylogenetic relatedness and
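
    One of the statistical tests named in this record, the Mantel test, compares a genetic distance matrix with an acoustic distance matrix by permutation. The following is a minimal generic sketch; the matrix names, permutation count and two-tailed criterion are illustrative assumptions, not the authors' exact procedure.

      import numpy as np

      def mantel_test(d_gen, d_acou, n_perm=9999, seed=None):
          """Mantel correlation between two distance matrices with a permutation
          p-value (illustrative sketch; assumes square, symmetric matrices)."""
          rng = np.random.default_rng(seed)
          n = d_gen.shape[0]
          iu = np.triu_indices(n, k=1)                 # off-diagonal upper triangle
          x = d_gen[iu]
          r_obs = np.corrcoef(x, d_acou[iu])[0, 1]
          count = 0
          for _ in range(n_perm):
              p = rng.permutation(n)                   # permute rows and columns together
              r_perm = np.corrcoef(x, d_acou[np.ix_(p, p)][iu])[0, 1]
              if abs(r_perm) >= abs(r_obs):
                  count += 1
          return r_obs, (count + 1) / (n_perm + 1)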

  11. Influence of attenuation on acoustic emission signals in carbon fiber reinforced polymer panels.

    PubMed

    Asamene, Kassahun; Hudson, Larry; Sundaresan, Mannur

    2015-05-01

    Influence of attenuation on acoustic emission (AE) signals in Carbon Fiber Reinforced Polymer (CFRP) crossply and quasi-isotropic panels is examined in this paper. Attenuation coefficients of the fundamental antisymmetric (A0) and symmetric (S0) wave modes were determined experimentally along different directions for the two types of CFRP panels. In the frequency range from 100 kHz to 500 kHz, the A0 mode undergoes significantly greater changes due to material related attenuation compared to the S0 mode. Moderate to strong changes in the attenuation levels were noted with propagation directions. Such mode and frequency dependent attenuation introduces major changes in the characteristics of AE signals depending on the position of the AE sensor relative to the source. Results from finite element simulations of a microscopic damage event in the composite laminates are used to illustrate attenuation related changes in modal and frequency components of AE signals. PMID:25682294
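
    An attenuation coefficient of the kind reported here can be obtained from a log-linear fit of peak amplitude against propagation distance after correcting for geometric spreading. The sketch below is illustrative only; the 1/sqrt(r) spreading assumption, the distances and the amplitudes are made up and do not come from the record.

      import numpy as np

      # Hypothetical peak amplitudes of one wave mode measured at several distances (m)
      distance = np.array([0.05, 0.10, 0.15, 0.20, 0.25])        # m
      amplitude = np.array([1.00, 0.62, 0.40, 0.26, 0.17])        # arbitrary units

      # Geometric spreading correction for a point source on a plate (~1/sqrt(r)),
      # leaving only the material-related attenuation
      corrected_db = 20 * np.log10(amplitude * np.sqrt(distance / distance[0]))

      # Linear fit: level(dB) = level0 - alpha * distance  ->  alpha in dB/m
      slope, level0 = np.polyfit(distance, corrected_db, 1)
      print(f"attenuation coefficient ~ {-slope:.1f} dB/m")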

  12. Auditory-tactile echo-reverberating stuttering speech corrector

    NASA Astrophysics Data System (ADS)

    Kuniszyk-Jozkowiak, Wieslawa; Adamczyk, Bogdan

    1997-02-01

    The work presents the construction of a device that transforms speech sounds into acoustic and tactile echo and reverberation signals. Research has been done on the influence of the echo and reverberation, transmitted as acoustic and tactile stimuli, on speech fluency. Introducing the echo or reverberation into the auditory feedback circuit results in a reduction of stuttering. Somewhat smaller, but still significant, corrective effects are observed when the tactile channel is used to transmit the signals. The use of joint auditory and tactile channels increases their corrective influence on the stutterers' speech. The results of the experiment justify the use of the tactile channel in stutterers' therapy.
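
    The core transformation in such a device, a delayed (echo) copy of the speaker's own speech fed back over the auditory or tactile channel, can be sketched very simply. The delay, gain and normalization below are illustrative assumptions rather than the parameters of the described device; a reverberation variant could replace the single delayed copy with a convolution against a short, exponentially decaying impulse response.

      import numpy as np

      def add_echo(x, fs, delay_s=0.1, gain=0.6):
          """Mix the input speech with one delayed copy (simple echo feedback).
          Illustrative sketch; delay and gain are arbitrary example values."""
          d = int(delay_s * fs)
          y = np.zeros(len(x) + d)
          y[:len(x)] += x            # direct speech
          y[d:] += gain * x          # delayed copy sent to the auditory/tactile channel
          return y / max(np.max(np.abs(y)), 1e-12)   # normalise to avoid clipping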

  13. Estimates of the prevalence of anomalous signal losses in the Yellow Sea derived from acoustic and oceanographic computer model simulations

    NASA Astrophysics Data System (ADS)

    Chin-Bing, Stanley A.; King, David B.; Warn-Varnas, Alex C.; Lamb, Kevin G.; Hawkins, James A.; Teixeira, Marvi

    2002-05-01

    The results from collocated oceanographic and acoustic simulations in a region of the Yellow Sea near the Shandong peninsula have been presented [Chin-Bing et al., J. Acoust. Soc. Am. 108, 2577 (2000)]. In that work, the tidal flow near the peninsula was used to initialize a 2.5-dimensional ocean model [K. G. Lamb, J. Geophys. Res. 99, 843-864 (1994)] that subsequently generated internal solitary waves (solitons). The validity of these soliton simulations was established by matching satellite imagery taken over the region. Acoustic propagation simulations through this soliton field produced results similar to the anomalous signal loss measured by Zhou, Zhang, and Rogers [J. Acoust. Soc. Am. 90, 2042-2054 (1991)]. Analysis of the acoustic interactions with the solitons also confirmed the hypothesis that the loss mechanism involved acoustic mode coupling. Recently we have attempted to estimate the prevalence of these anomalous signal losses in this region. These estimates were made from simulating acoustic effects over an 80 hour space-time evolution of soliton packets. Examples will be presented that suggest the conditions necessary for anomalous signal loss may be more prevalent than previously thought. [Work supported by ONR/NRL and by a High Performance Computing DoD grant.]

  14. Differential Diagnosis of Severe Speech Disorders Using Speech Gestures

    ERIC Educational Resources Information Center

    Bahr, Ruth Huntley

    2005-01-01

    The differentiation of childhood apraxia of speech from severe phonological disorder is a common clinical problem. This article reports on an attempt to describe speech errors in children with childhood apraxia of speech on the basis of gesture use and acoustic analyses of articulatory gestures. The focus was on the movement of articulators and…

  15. Noise affects the shape of female preference functions for acoustic signals.

    PubMed

    Reichert, Michael S; Ronacher, Bernhard

    2015-02-01

    The shape of female mate preference functions influences the speed and direction of sexual signal evolution. However, the expression of female preferences is modulated by interactions between environmental conditions and the female's sensory processing system. Noise is an especially relevant environmental condition because it interferes directly with the neural processing of signals. Although noise is therefore likely a significant force in the evolution of communication systems, little is known about its effects on preference function shape. In the grasshopper Chorthippus biguttulus, female preferences for male calling song characteristics are likely to be affected by noise because its auditory system is sensitive to fine temporal details of songs. We measured female preference functions for variation in male song characteristics in several levels of masking noise and found strong effects of noise on preference function shape. The overall responsiveness to signals in noise generally decreased. Preference strength increased for some signal characteristics and decreased for others, largely corresponding to expectations based on neurophysiological studies of acoustic signal processing. These results suggest that different signal characteristics will be favored under different noise conditions, and thus that signal evolution may proceed differently depending on the extent and temporal patterning of environmental noise. PMID:25546134

  16. Neural mechanisms underlying auditory feedback control of speech

    PubMed Central

    Reilly, Kevin J.; Guenther, Frank H.

    2013-01-01

    The neural substrates underlying auditory feedback control of speech were investigated using a combination of functional magnetic resonance imaging (fMRI) and computational modeling. Neural responses were measured while subjects spoke monosyllabic words under two conditions: (i) normal auditory feedback of their speech, and (ii) auditory feedback in which the first formant frequency of their speech was unexpectedly shifted in real time. Acoustic measurements showed compensation to the shift within approximately 135 ms of onset. Neuroimaging revealed increased activity in bilateral superior temporal cortex during shifted feedback, indicative of neurons coding mismatches between expected and actual auditory signals, as well as right prefrontal and Rolandic cortical activity. Structural equation modeling revealed increased influence of bilateral auditory cortical areas on right frontal areas during shifted speech, indicating that projections from auditory error cells in posterior superior temporal cortex to motor correction cells in right frontal cortex mediate auditory feedback control of speech. PMID:18035557

  17. Predictions of acoustic signals from explosions above and below the ocean surface: source region calculations

    SciTech Connect

    Clarke, D.B.; Piacsek, A.; White, J.W.

    1996-12-01

    In support of the Comprehensive Test Ban, research is underway on the long range propagation of signals from nuclear explosions in the deep underwater sound (SOFAR) channel. This first phase of our work at LLNL on signals in the source regions considered explosions in or above the deep (5000 m) ocean. We studied the variation of wave properties and source region energy coupling as a function of height or depth of burst. Initial calculations on CALE, a two-dimensional hydrodynamics code developed at LLNL by Robert Tipton, were linked at a few hundred milliseconds to a version of NRL's weak shock code, NPE, which solves the nonlinear progressive wave equation. The wave propagation simulation was performed down to 5000 m depth and out to 10,000 m range. We have developed a procedure to convert the acoustic signals at 10 km range into "starter fields" for calculations on a linear acoustics code which will extend the propagation to ocean basin distances. Recently we have completed calculations to evaluate environmental effects (shallow water, bottom interactions) on signal propagation. We compared results at 25 km range from three calculations of the same 1 kiloton burst (50 m height-of-burst) in three different environments, namely, deep water, shallow water, and a case with shallow water sloping to deep water. Several results from this last "sloping bottom" case will be discussed below. In this shallow water study, we found that propagation through shallow water complicates and attenuates the signal; the changes made to the signal may impact detection and discrimination for bursts in some locations.

  18. The contribution of auditory temporal processing to the separation of competing speech signals in listeners with normal hearing

    NASA Astrophysics Data System (ADS)

    Adam, Trudy J.; Pichora-Fuller, Kathy

    2002-05-01

    The hallmark of auditory function in aging adults is difficulty listening in a background of competing talkers, even when hearing sensitivity in quiet is good. Age-related physiological changes may contribute by introducing small timing errors (jitter) to the neural representation of sound, compromising the fidelity of the signal's fine temporal structure. This may preclude the association of spectral features to form an accurate percept of one complex stimulus, distinct from competing sounds. For simple voiced speech (vowels), the separation of two competing stimuli can be achieved on the basis of their respective harmonic (temporal) structures. Fundamental frequency (F0) differences in competing stimuli facilitate their segregation. This benefit was hypothesized to rely on the adequate temporal representation of the speech signal(s). Auditory aging was simulated via the desynchronization (~0.25-ms jitter) of the spectral bands of synthesized vowels. The perceptual benefit of F0 difference for the identification of concurrent vowel pairs was examined for intact and jittered vowels in young adults with normal hearing thresholds. Results suggest a role for reduced signal fidelity in the perceptual difficulties encountered in noisy everyday environments by aging listeners. [Work generously supported by the Michael Smith Foundation for Health Research.]

  19. Signal diversification in Oecanthus tree crickets is shaped by energetic, morphometric, and acoustic trade-offs.

    PubMed

    Symes, L B; Ayres, M P; Cowdery, C P; Costello, R A

    2015-06-01

    Physiology, physics, and ecological interactions can generate trade-offs within species, but may also shape divergence among species. We tested whether signal divergence in Oecanthus tree crickets is shaped by acoustic, energetic, and behavioral trade-offs. We found that species with faster pulse rates, produced by opening and closing wings up to twice as many times per second, did not have higher metabolic costs of calling. The relatively constant energetic cost across species is explained by trade-offs between the duration and repetition rate of acoustic signals-species with fewer stridulatory teeth closed their wings more frequently such that the number of teeth struck per second of calling and the resulting duty cycle were relatively constant across species. Further trade-offs were evident in relationships between signals and body size. Calling was relatively inexpensive for small males, permitting them to call for much of the night, but at low amplitude. Large males produced much louder calls, reaching up to four times more area, but the energetic costs increased substantially with increasing size and the time spent calling dropped to only 20% of the night. These trade-offs indicate that the trait combinations that arise in these species represent a limited subset of conceivable trait combinations. PMID:25903317

  20. The potential influence of morphology on the evolutionary divergence of an acoustic signal

    PubMed Central

    Pitchers, W. R.; Klingenberg, C.P.; Tregenza, Tom; Hunt, J.; Dworkin, I.

    2014-01-01

    The evolution of acoustic behaviour and that of the morphological traits mediating its production are often coupled. Lack of variation in the underlying morphology of signalling traits has the potential to constrain signal evolution. This relationship is particularly likely in field crickets, where males produce acoustic advertisement signals to attract females by stridulating with specialized structures on their forewings. In this study, we characterise the size and geometric shape of the forewings of males from six allopatric populations of the black field cricket (Teleogryllus commodus) known to have divergent advertisement calls. We sample from each of these populations using both wild-caught and common-garden reared cohorts, allowing us to test for multivariate relationships between wing morphology and call structure. We show that the allometry of shape has diverged across populations. However, there was a surprisingly small amount of covariation between wing shape and call structure within populations. Given the importance of male size for sexual selection in crickets, the divergence we observe among populations has the potential to influence the evolution of advertisement calls in this species. PMID:25223712

  1. The potential influence of morphology on the evolutionary divergence of an acoustic signal.

    PubMed

    Pitchers, W R; Klingenberg, C P; Tregenza, T; Hunt, J; Dworkin, I

    2014-10-01

    The evolution of acoustic behaviour and that of the morphological traits mediating its production are often coupled. Lack of variation in the underlying morphology of signalling traits has the potential to constrain signal evolution. This relationship is particularly likely in field crickets, where males produce acoustic advertisement signals to attract females by stridulating with specialized structures on their forewings. In this study, we characterize the size and geometric shape of the forewings of males from six allopatric populations of the black field cricket (Teleogryllus commodus) known to have divergent advertisement calls. We sample from each of these populations using both wild-caught and common-garden-reared cohorts, allowing us to test for multivariate relationships between wing morphology and call structure. We show that the allometry of shape has diverged across populations. However, there was a surprisingly small amount of covariation between wing shape and call structure within populations. Given the importance of male size for sexual selection in crickets, the divergence we observe among populations has the potential to influence the evolution of advertisement calls in this species. PMID:25223712

  2. Long Recording Sequences: How to Track the Intra-Individual Variability of Acoustic Signals

    PubMed Central

    Lengagne, Thierry; Gomez, Doris; Josserand, Rémy; Voituron, Yann

    2015-01-01

    Recently developed acoustic technologies - like automatic recording units - allow the recording of long sequences in natural environments. These devices are used for biodiversity surveys, but they could also help researchers to estimate global signal variability at various (individual, population, species) scales. While sexually selected signals are expected to show low intra-individual variability at relatively short time scales, this variability has never been estimated. Yet, measuring signal variability in controlled conditions should prove useful for understanding sexual selection processes and should help to design acoustic sampling schedules and to analyse long call recordings. We here use the overall call production of 36 male treefrogs (Hyla arborea) during one night to evaluate within-individual variability in call dominant frequency and to test the efficiency of different sampling methods at capturing such variability. Our results confirm that using a low number of calls underestimates call dominant frequency variation by about 35% in the tree frog, and suggest that this variability is better assessed with 2 or 3 short, well-distributed records than with samples made of consecutive calls. Hence, 3 well-distributed 2-minute records (beginning, middle and end of the calling period) are sufficient to capture on average all the nightly variability, whereas a sample of 10 000 consecutive calls captures only 86% of it. From a biological point of view, the call dominant frequency variability observed in H. arborea (116 Hz on average, but up to 470 Hz over the course of the night for one male) calls into question its reliability for mate quality assessment. Automatic acoustic recording units will provide long call sequences in the near future, and it will then be possible to confirm these results on large samples recorded in more complex field conditions. PMID:25970183
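
    The comparison of sampling schemes in this record can be sketched with a simple dominant-frequency estimator applied to segmented calls. The call segmentation, array names and the way samples are drawn below are illustrative assumptions, not the authors' analysis.

      import numpy as np

      def dominant_frequency(call, fs):
          """Frequency of the largest spectral peak of one call (illustrative sketch)."""
          spec = np.abs(np.fft.rfft(call * np.hanning(len(call))))
          freqs = np.fft.rfftfreq(len(call), 1 / fs)
          return freqs[np.argmax(spec)]

      def captured_range(dom_freqs, idx):
          """Fraction of the full nightly dominant-frequency range captured by a sample."""
          full = dom_freqs.max() - dom_freqs.min()
          sub = dom_freqs[idx]
          return (sub.max() - sub.min()) / full

      # Hypothetical usage on already-segmented call waveforms for one male:
      # dom = np.array([dominant_frequency(c, fs) for c in calls])
      # consecutive = np.arange(min(10_000, len(dom)))                    # consecutive calls
      # distributed = np.linspace(0, len(dom) - 1, 3 * 120).astype(int)   # 3 spread-out records
      # print(captured_range(dom, consecutive), captured_range(dom, distributed))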

  3. Auditory perception bias in speech imitation

    PubMed Central

    Postma-Nilsenová, Marie; Postma, Eric

    2013-01-01

    In an experimental study, we explored the role of auditory perception bias in vocal pitch imitation. Psychoacoustic tasks involving a missing fundamental indicate that some listeners are attuned to the relationship between all the higher harmonics present in the signal, which supports their perception of the fundamental frequency (the primary acoustic correlate of pitch). Other listeners focus on the lowest harmonic constituents of the complex sound signal which may hamper the perception of the fundamental. These two listener types are referred to as fundamental and spectral listeners, respectively. We hypothesized that the individual differences in speakers' capacity to imitate F0 found in earlier studies, may at least partly be due to the capacity to extract information about F0 from the speech signal. Participants' auditory perception bias was determined with a standard missing fundamental perceptual test. Subsequently, speech data were collected in a shadowing task with two conditions, one with a full speech signal and one with high-pass filtered speech above 300 Hz. The results showed that perception bias toward fundamental frequency was related to the degree of F0 imitation. The effect was stronger in the condition with high-pass filtered speech. The experimental outcomes suggest advantages for fundamental listeners in communicative situations where F0 imitation is used as a behavioral cue. Future research needs to determine to what extent auditory perception bias may be related to other individual properties known to improve imitation, such as phonetic talent. PMID:24204361

  4. Heterodyne signal-to-noise ratios in acoustic mode scattering experiments

    NASA Technical Reports Server (NTRS)

    Cochran, W. R.

    1980-01-01

    The relation between the signal to noise ratio (SNR) obtained in heterodyne detection of radiation scattered from acoustic modes in crystalline solids and the scattered spectral density function is studied. It is shown that in addition to the information provided by the measured frequency shifts and line widths, measurement of the SNR provides a determination of the absolute elasto-optical (Pockel's) constants. Examples are given for cubic crystals, and acceptable SNR values are obtained for scattering from thermally excited phonons at 10.6 microns, with no external perturbation of the sample necessary. The results indicate the special advantages of the method for the study of semiconductors.

  5. Mate preference in the painted goby: the influence of visual and acoustic courtship signals.

    PubMed

    Amorim, M Clara P; da Ponte, Ana Nunes; Caiano, Manuel; Pedroso, Silvia S; Pereira, Ricardo; Fonseca, Paulo J

    2013-11-01

    We tested the hypothesis that females of a small vocal marine fish with exclusive paternal care, the painted goby, prefer high parental-quality mates such as large or high-condition males. We tested the effect of male body size and male visual and acoustic courtship behaviour (playback experiments) on female mating preferences by measuring time spent near one of a two-choice stimuli. Females did not show preference for male size but preferred males that showed higher levels of courtship, a trait known to advertise condition (fat reserves). Also, time spent near the preferred male depended on male courtship effort. Playback experiments showed that when sound was combined with visual stimuli (a male confined in a small aquarium placed near each speaker), females spent more time near the male associated with courtship sound than with the control male (associated with white noise or silence). Although male visual courtship effort also affected female preference in the pre-playback period, this effect decreased during playback and disappeared in the post-playback period. Courtship sound stimuli alone did not elicit female preference in relation to a control. Taken together, the results suggest that visual and mainly acoustic courtship displays are subject to mate preference and may advertise parental quality in this species. Our results indicate that visual and acoustic signals interplay in a complex fashion and highlight the need to examine how different sensory modalities affect mating preferences in fish and other vertebrates. PMID:23948469

  6. Frequency band-importance functions for auditory and auditory-visual speech recognition

    NASA Astrophysics Data System (ADS)

    Grant, Ken W.

    2005-04-01

    In many everyday listening environments, speech communication involves the integration of both acoustic and visual speech cues. This is especially true in noisy and reverberant environments where the speech signal is highly degraded, or when the listener has a hearing impairment. Understanding the mechanisms involved in auditory-visual integration is a primary interest of this work. Of particular interest is whether listeners are able to allocate their attention to various frequency regions of the speech signal differently under auditory-visual conditions and auditory-alone conditions. For auditory speech recognition, the most important frequency regions tend to be around 1500-3000 Hz, corresponding roughly to important acoustic cues for place of articulation. The purpose of this study is to determine the most important frequency region under auditory-visual speech conditions. Frequency band-importance functions for auditory and auditory-visual conditions were obtained by having subjects identify speech tokens under conditions where the speech-to-noise ratio of different parts of the speech spectrum is independently and randomly varied on every trial. Point biserial correlations were computed for each separate spectral region and the normalized correlations are interpreted as weights indicating the importance of each region. Relations among frequency-importance functions for auditory and auditory-visual conditions will be discussed.
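
    The weighting procedure described here (per-trial random variation of band SNRs, point-biserial correlation of each band's SNR with response correctness, and normalization into importance weights) can be sketched on simulated data as follows. The number of bands, the number of trials and the toy response model are assumptions for illustration, not the study's data.

      import numpy as np

      rng = np.random.default_rng(0)
      n_trials, n_bands = 2000, 6

      # Per-trial, per-band speech-to-noise ratios, varied independently (dB)
      band_snr = rng.uniform(-12, 12, size=(n_trials, n_bands))

      # Simulated correct/incorrect responses: bands 2-3 (a stand-in for the
      # 1500-3000 Hz region) drive intelligibility in this toy example
      score = band_snr @ np.array([0.2, 0.4, 1.0, 1.0, 0.4, 0.2])
      correct = (score + rng.normal(0, 8, n_trials)) > 0          # boolean responses

      # Point-biserial correlation of each band's SNR with correctness
      weights = np.array([np.corrcoef(band_snr[:, b], correct)[0, 1] for b in range(n_bands)])
      weights = np.clip(weights, 0, None)
      weights /= weights.sum()                                     # normalised importance
      print(np.round(weights, 3))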

  7. Neural Mechanisms for Acoustic Signal Detection under Strong Masking in an Insect.

    PubMed

    Kostarakos, Konstantinos; Römer, Heiner

    2015-07-22

    Communication is fundamental for our understanding of behavior. In the acoustic modality, natural scenes for communication in humans and animals are often very noisy, decreasing the chances for signal detection and discrimination. We investigated the mechanisms enabling selective hearing under natural noisy conditions for auditory receptors and interneurons of an insect. In the studied katydid Mecopoda elongata, species-specific calling songs (chirps) are strongly masked by signals of another species, both communicating in sympatry. The spectral properties of the two signals are similar and differ only in a small frequency band at 2 kHz present in the chirping species. Receptors sharply tuned to 2 kHz are completely unaffected by the masking signal of the other species, whereas receptors tuned to higher audio and ultrasonic frequencies show complete masking. Intracellular recordings of identified interneurons revealed two mechanisms providing response selectivity to the chirp. (1) Response selectivity: several identified interneurons exhibit remarkably selective responses to the chirps, even at signal-to-noise ratios of -21 dB, since they are sharply tuned to 2 kHz. Their dendritic arborizations indicate selective connectivity with low-frequency receptors tuned to 2 kHz. (2) Novelty detection: a second group of interneurons is broadly tuned but, because of strong stimulus-specific adaptation to the masker spectrum and "novelty detection" to the 2 kHz band present only in the conspecific signal, these interneurons start to respond selectively to the chirp shortly after the onset of the continuous masker. Both mechanisms provide the sensory basis for hearing at unfavorable signal-to-noise ratios. Significance statement: Animal and human acoustic communication may suffer from the same "cocktail party problem," when communication happens in noisy social groups. We address solutions for this problem in a model system of two katydids, where one species

  8. Neural Mechanisms for Acoustic Signal Detection under Strong Masking in an Insect

    PubMed Central

    Römer, Heiner

    2015-01-01

    Communication is fundamental for our understanding of behavior. In the acoustic modality, natural scenes for communication in humans and animals are often very noisy, decreasing the chances for signal detection and discrimination. We investigated the mechanisms enabling selective hearing under natural noisy conditions for auditory receptors and interneurons of an insect. In the studied katydid Mecopoda elongata, species-specific calling songs (chirps) are strongly masked by signals of another species, both communicating in sympatry. The spectral properties of the two signals are similar and differ only in a small frequency band at 2 kHz present in the chirping species. Receptors sharply tuned to 2 kHz are completely unaffected by the masking signal of the other species, whereas receptors tuned to higher audio and ultrasonic frequencies show complete masking. Intracellular recordings of identified interneurons revealed two mechanisms providing response selectivity to the chirp. (1) Response selectivity: several identified interneurons exhibit remarkably selective responses to the chirps, even at signal-to-noise ratios of −21 dB, since they are sharply tuned to 2 kHz. Their dendritic arborizations indicate selective connectivity with low-frequency receptors tuned to 2 kHz. (2) Novelty detection: a second group of interneurons is broadly tuned but, because of strong stimulus-specific adaptation to the masker spectrum and “novelty detection” to the 2 kHz band present only in the conspecific signal, these interneurons start to respond selectively to the chirp shortly after the onset of the continuous masker. Both mechanisms provide the sensory basis for hearing at unfavorable signal-to-noise ratios. SIGNIFICANCE STATEMENT Animal and human acoustic communication may suffer from the same “cocktail party problem,” when communication happens in noisy social groups. We address solutions for this problem in a model system of two katydids, where one

  9. Talker variability in audio-visual speech perception

    PubMed Central

    Heald, Shannon L. M.; Nusbaum, Howard C.

    2014-01-01

    A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker’s face, speech recognition is improved under adverse listening (e.g., noise or distortion) conditions that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker’s face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to the audio-only condition. These results suggest that seeing a talker’s face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred. PMID:25076919

  10. Perception and the temporal properties of speech

    NASA Astrophysics Data System (ADS)

    Gordon, Peter C.

    1991-11-01

    Four experiments addressing the role of attention in phonetic perception are reported. The first experiment shows that the relative importance of two cues to the voicing distinction changes when subjects must perform an arithmetic distractor task at the same time as identifying a speech stimulus. The voice onset time cue loses phonetic significance when subjects are distracted, while the F0 onset frequency cue does not. The second experiment shows a similar pattern for two cues to the distinction between the vowels /i/ (as in 'beat') and /I/ (as in 'bit'). Together these experiments indicate that careful attention to speech perception is necessary for strong acoustic cues to achieve their full phonetic impact, while weaker acoustic cues achieve their full phonetic impact without close attention. Experiment 3 shows that this pattern is obtained when the distractor task places little demand on verbal short term memory. Experiment 4 provides a large data set for testing formal models of the role of attention in speech perception. Attention is shown to influence the signal to noise ratio in phonetic encoding. This principle is instantiated in a network model in which the role of attention is to reduce noise in the phonetic encoding of acoustic cues. Implications of this work for understanding speech perception and general theories of the role of attention in perception are discussed.

  11. Clustering reveals cavitation-related acoustic emission signals from dehydrating branches.

    PubMed

    Vergeynst, Lidewei L; Sause, Markus G R; De Baerdemaeker, Niels J F; De Roo, Linus; Steppe, Kathy

    2016-06-01

    The formation of air emboli in the xylem during drought is one of the key processes leading to plant mortality due to loss in hydraulic conductivity, and strongly fuels the interest in quantifying vulnerability to cavitation. The acoustic emission (AE) technique can be used to measure hydraulic conductivity losses and construct vulnerability curves. For years, it has been believed that all the AE signals are produced by the formation of gas emboli in the xylem sap under tension. More recent experiments, however, demonstrate that gas emboli formation cannot explain all the signals detected during drought, suggesting that different sources of AE exist. This complicates the use of the AE technique to measure emboli formation in plants. We therefore analysed AE waveforms measured on branches of grapevine (Vitis vinifera L. 'Chardonnay') during bench dehydration with broadband sensors, and applied an automated clustering algorithm in order to find natural clusters of AE signals. We used AE features and AE activity patterns during consecutive dehydration phases to identify the different AE sources. Based on the frequency spectrum of the signals, we distinguished three different types of AE signals, of which the frequency cluster with high 100-200 kHz frequency content was strongly correlated with cavitation. Our results indicate that cavitation-related AE signals can be filtered from other AE sources, which presents a promising avenue into quantifying xylem embolism in plants in laboratory and field conditions. PMID:27095256
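
    A generic version of the feature-plus-clustering step described in this record could look like the sketch below. The frequency-band edges, the number of clusters and the choice of k-means (a stand-in for the unsupervised clustering used in the study) are assumptions for illustration.

      import numpy as np
      from sklearn.cluster import KMeans

      def band_energy_features(waveforms, fs, edges=(0, 50e3, 100e3, 200e3, 400e3)):
          """Fraction of spectral energy in a few frequency bands for each AE waveform."""
          feats = []
          for w in waveforms:
              spec = np.abs(np.fft.rfft(w)) ** 2
              freqs = np.fft.rfftfreq(len(w), 1 / fs)
              total = spec.sum()
              feats.append([spec[(freqs >= lo) & (freqs < hi)].sum() / total
                            for lo, hi in zip(edges[:-1], edges[1:])])
          return np.asarray(feats)

      # Hypothetical usage on recorded AE hits sampled at, e.g., 1 MHz:
      # X = band_energy_features(waveforms, fs=1e6)
      # labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
      # A cluster dominated by 100-200 kHz energy would be the cavitation-related candidate.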

  12. Optical observations of meteors generating infrasound-I: Acoustic signal identification and phenomenology

    NASA Astrophysics Data System (ADS)

    Silber, Elizabeth A.; Brown, Peter G.

    2014-11-01

    We analyse infrasound signals from 71 bright meteors/fireballs simultaneously detected by video to investigate the phenomenology and characteristics of meteor-generated near-field infrasound (<300 km) and shock production. A taxonomy for meteor generated infrasound signal classification has been developed using the time-pressure signal of the infrasound arrivals. Based on the location along the meteor trail where the infrasound signal originates, we find most signals are associated with cylindrical shocks, with about a quarter of events evidencing spherical shocks associated with fragmentation episodes and optical flares. The video data indicate that all events with ray launch angles >117° from the trajectory heading are most likely generated by a spherical shock, while infrasound produced by the meteors with ray launch angles ≤117° can be attributed to both a cylindrical line source and a spherical shock. We find that meteors preferentially produce infrasound toward the end of their trails with a smaller number showing a preference for mid-trail production. Meteors producing multiple infrasound arrivals show a strong infrasound source height skewness to the end of trails and are much more likely to be associated with optical flares. We find that about 1% of all our optically-recorded meteors have associated detected infrasound and estimate that regional meteor infrasound events should occur on the order of once per week and dominate in numbers over infrasound associated with more energetic (but rarer) bolides. While a significant fraction of our meteors generating infrasound (~1/4 of single arrivals) are produced by fragmentation events, we find no instances where acoustic radiation is detectable more than about 60° beyond the ballistic regime at our meteoroid sizes (grams to tens of kilograms) emphasizing the strong anisotropy in acoustic radiation for meteors which are dominated by cylindrical line source geometry, even in the presence of fragmentation.

  13. Multiple target tracking and classification improvement using data fusion at node level using acoustic signals

    NASA Astrophysics Data System (ADS)

    Damarla, T. R.; Whipps, Gene

    2005-05-01

    Target tracking and classification using passive acoustic signals is difficult at best as the signals are contaminated by wind noise, multi-path effects, road conditions, and are generally not deterministic. In addition, microphone characteristics, such as sensitivity, vary with the weather conditions. The problem is further compounded if there are multiple targets, especially if some are measured with higher signal-to-noise ratios (SNRs) than the others and they share spectral information. At the U. S. Army Research Laboratory we have conducted several field experiments with a convoy of two, three, four and five vehicles traveling on different road surfaces, namely gravel, asphalt, and dirt roads. The largest convoy is comprised of two tracked vehicles and three wheeled vehicles. Two of the wheeled vehicles are heavy trucks and one is a light vehicle. We used a super-resolution direction-of-arrival estimator, specifically the minimum variance distortionless response, to compute the bearings of the targets. In order to classify the targets, we modeled the acoustic signals emanated from the targets as a set of coupled harmonics, which are related to the engine-firing rate, and subsequently used a multivariate Gaussian classifier. Independent of the classifier, we find tracking of wheeled vehicles to be intermittent as the signals from vehicles with high SNR dominate the much quieter wheeled vehicles. We used several fusion techniques to combine tracking and classification results to improve final tracking and classification estimates. We will present the improvements (or losses) made in tracking and classification of all targets. Although improvements in the estimates for tracked vehicles are not noteworthy, significant improvements are seen in the case of wheeled vehicles. We will present the fusion algorithm used.
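
    The bearing-estimation step named in this record, a minimum variance distortionless response (MVDR) estimator, can be sketched for a uniform linear microphone array as follows. The array geometry, narrowband assumption, diagonal loading and variable names are illustrative assumptions rather than the authors' implementation.

      import numpy as np

      def mvdr_bearings(snapshots, mic_spacing, freq, c=343.0, n_grid=361):
          """MVDR spatial spectrum over bearing for a uniform linear microphone array.
          snapshots: (n_snapshots, n_mics) complex narrowband samples (illustrative sketch)."""
          n_mics = snapshots.shape[1]
          r = snapshots.T @ snapshots.conj() / snapshots.shape[0]   # spatial covariance
          r += 1e-6 * np.trace(r).real / n_mics * np.eye(n_mics)    # diagonal loading
          r_inv = np.linalg.inv(r)
          angles = np.linspace(-90, 90, n_grid)                     # bearing re broadside (deg)
          spectrum = np.empty(n_grid)
          for i, th in enumerate(np.deg2rad(angles)):
              # Steering vector for a plane wave arriving from bearing th
              delays = np.arange(n_mics) * mic_spacing * np.sin(th) / c
              a = np.exp(-2j * np.pi * freq * delays)
              spectrum[i] = 1.0 / np.real(a.conj() @ r_inv @ a)
          return angles, spectrum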

  14. Degradation of Coherence of Acoustic Signals Resulting from Inhomogeneities in the Sea.

    NASA Astrophysics Data System (ADS)

    Dobbins, Peter F.

    Available from UMI in association with The British Library. This thesis attempts to answer the question 'Is there an ultimate limit to the resolution of a sonar transducer due to sea water inhomogeneity?' The problem has been separated into three components: the sea water medium, acoustic propagation through this random medium, and the effects of the resulting phase and amplitude fluctuations on the performance of transducer arrays. The model for the medium is based on a spatial spectrum of the fluctuations in temperature and refractive index, and is divided into wavenumber ranges where phenomena such as turbulence or viscous dissipation dominate. The model for acoustic propagation uses the Rytov method, assuming that multiple scattering is not significant. The effects of fluctuations on array directivity were investigated using both numerical simulations and theory based on the plane wave spectrum. Laboratory experiments were conducted to confirm aspects of the propagation theory. Experiments at sea were carried out to validate the model of the medium; profiles of temperature, salinity, sound speed and current velocity were measured, and good agreement with the model was obtained in all cases. These data were used with the propagation theory to determine spatial correlation functions of phase and amplitude fluctuations in acoustic signals at a number of frequencies. Finally, these results were used to estimate the changes in beampattern for arrays of various sizes. It was found that there is a limit to the angular resolution of a linear array, determined by the width of the plane wave spectrum, and this limit is reached when the length of the array approaches the correlation scale of the acoustic fluctuations.

  15. System and method for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources

    DOEpatents

    Holzrichter, John F.; Burnett, Greg C.; Ng, Lawrence C.

    2007-10-16

    A system and method for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources is disclosed. Propagating wave electromagnetic sensors monitor excitation sources in sound producing systems, such as machines, musical instruments, and various other structures. Acoustical output from these sound producing systems is also monitored. From such information, a transfer function characterizing the sound producing system is generated. From the transfer function, acoustical output from the sound producing system may be synthesized or canceled. The methods disclosed enable accurate calculation of matched transfer functions relating specific excitations to specific acoustical outputs. Knowledge of such signals and functions can be used to effect various sound replication, sound source identification, and sound cancellation applications.

  16. System and method for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources

    DOEpatents

    Holzrichter, John F.; Burnett, Greg C.; Ng, Lawrence C.

    2003-01-01

    A system and method for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources is disclosed. Propagating wave electromagnetic sensors monitor excitation sources in sound producing systems, such as machines, musical instruments, and various other structures. Acoustical output from these sound producing systems is also monitored. From such information, a transfer function characterizing the sound producing system is generated. From the transfer function, acoustical output from the sound producing system may be synthesized or canceled. The methods disclosed enable accurate calculation of matched transfer functions relating specific excitations to specific acoustical outputs. Knowledge of such signals and functions can be used to effect various sound replication, sound source identification, and sound cancellation applications.

  17. System and method for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources

    DOEpatents

    Holzrichter, John F; Burnett, Greg C; Ng, Lawrence C

    2013-05-21

    A system and method for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources is disclosed. Propagating wave electromagnetic sensors monitor excitation sources in sound producing systems, such as machines, musical instruments, and various other structures. Acoustical output from these sound producing systems is also monitored. From such information, a transfer function characterizing the sound producing system is generated. From the transfer function, acoustical output from the sound producing system may be synthesized or canceled. The methods disclosed enable accurate calculation of matched transfer functions relating specific excitations to specific acoustical outputs. Knowledge of such signals and functions can be used to effect various sound replication, sound source identification, and sound cancellation applications.

  18. Signal-to-noise ratio for acoustic detection in the deep ocean

    NASA Technical Reports Server (NTRS)

    Bowen, T.

    1979-01-01

    A simple method is presented for studying the thermoacoustic wave generated by a heat pulse. The signal-to-noise ratio (S/N) is then calculated for a typical hadronic-electromagnetic cascade in the deep ocean, where low frequencies are masked by surface noise. A maximum useful range of about 16 km is found for typical conditions at 5 km depth. It is shown that in order to obtain useful signals with S/N greater than 100 at distances of 1 to 16 km, the cascade energy must be 10^16 to 10^18 eV. Finally, attention is given to further refinements of the theory of acoustic detection which remain to be investigated.

  19. Neuronal Spoken Word Recognition: The Time Course of Processing Variation in the Speech Signal

    ERIC Educational Resources Information Center

    Schild, Ulrike; Roder, Brigitte; Friedrich, Claudia K.

    2012-01-01

    Recent neurobiological studies revealed evidence for lexical representations that are not specified for the coronal place of articulation (PLACE; Friedrich, Eulitz, & Lahiri, 2006; Friedrich, Lahiri, & Eulitz, 2008). Here we tested when these types of underspecified representations influence neuronal speech recognition. In a unimodal…

  20. Detection of Delamination in Concrete Bridge Decks Using Mfcc of Acoustic Impact Signals

    NASA Astrophysics Data System (ADS)

    Zhang, G.; Harichandran, R. S.; Ramuhalli, P.

    2010-02-01

    Delamination of the concrete cover is a commonly observed damage in concrete bridge decks. The delamination is typically initiated by corrosion of the upper reinforcing bars and promoted by freeze-thaw cycling and traffic loading. The detection of delamination is important for bridge maintenance and acoustic non-destructive evaluation (NDE) is widely used due to its low cost, speed, and easy implementation. In traditional acoustic approaches, the inspector sounds the surface of the deck by impacting it with a hammer or bar, or by dragging a chain, and assesses delamination by the "hollowness" of the sound. The detection of the delamination is subjective and requires extensive training. To improve performance, this paper proposes an objective method for delamination detection. In this method, mel-frequency cepstral coefficients (MFCC) of the signal are extracted. Some MFCC are then selected as features for detection purposes using a mutual information criterion. Finally, the selected features are used to train a classifier which is subsequently used for detection. In this work, a simple quadratic Bayesian classifier is used. Different numbers of features are used to compare the performance of the detection method. The results show that the performance first increases with the number of features, but then decreases after an optimal value. The optimal number of features based on the recorded signals is four, and the mean error rate is only 3.3% when four features are used. Therefore, the proposed algorithm has sufficient accuracy to be used in field detection.
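
    The pipeline described here (MFCC extraction from the impact response, selection of a few coefficients by a mutual-information criterion, and a quadratic Bayesian classifier) can be sketched with common library building blocks. The data arrays, sampling rate and the retained feature count of four (taken from the record) are illustrative assumptions about the interface, not the authors' code.

      import numpy as np
      import librosa
      from sklearn.feature_selection import SelectKBest, mutual_info_classif
      from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
      from sklearn.pipeline import make_pipeline

      def impact_mfcc(signal, fs, n_mfcc=13):
          """Mean MFCC vector of one acoustic impact recording (illustrative sketch)."""
          m = librosa.feature.mfcc(y=signal.astype(float), sr=fs, n_mfcc=n_mfcc)
          return m.mean(axis=1)                     # average over the short impact duration

      # Hypothetical usage with impact recordings and labels (1 = delaminated, 0 = sound):
      # X = np.vstack([impact_mfcc(s, fs=44_100) for s in signals])
      # clf = make_pipeline(SelectKBest(mutual_info_classif, k=4),   # keep 4 MFCC features
      #                     QuadraticDiscriminantAnalysis())          # quadratic Bayesian classifier
      # clf.fit(X, labels)
      # print(1.0 - clf.score(X, labels))                             # training error rate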