Science.gov

Sample records for acoustic speech signal

  1. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-02-14

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.
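
    As a loose illustration of the frequency-domain processing this abstract describes (not the patent's actual algorithm), the sketch below estimates a per-frame vocal-tract transfer function from an EM-sensor-derived excitation estimate and the time-aligned acoustic frame by regularized spectral division, then reuses it for resynthesis. The signal names, frame length, and regularization constant are illustrative assumptions.

      import numpy as np
      from scipy.signal import lfilter

      def estimate_transfer_function(excitation, acoustic, eps=1e-3):
          """Regularized frequency-domain estimate H = Y * conj(E) / (|E|^2 + eps).

          excitation : EM-sensor-derived excitation frame (1-D array)
          acoustic   : time-aligned acoustic (microphone) frame (1-D array)
          """
          E = np.fft.rfft(excitation)
          Y = np.fft.rfft(acoustic)
          return Y * np.conj(E) / (np.abs(E) ** 2 + eps)   # Wiener-style regularization

      def resynthesize(excitation, H):
          """Apply a stored transfer function to an excitation frame."""
          E = np.fft.rfft(excitation)
          return np.fft.irfft(E * H, n=len(excitation))

      if __name__ == "__main__":
          fs = 8000
          n = np.arange(512)
          excitation = np.sign(np.sin(2 * np.pi * 120 * n / fs))   # crude glottal-like pulse train
          # Toy "vocal tract": a single resonance applied to the excitation.
          acoustic = lfilter([1.0], [1.0, -1.6, 0.81], excitation)
          H = estimate_transfer_function(excitation, acoustic)
          recon = resynthesize(excitation, H)
          print("frame reconstruction error:", np.max(np.abs(recon - acoustic)))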

  2. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2004-03-23

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  3. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-08-08

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  4. System And Method For Characterizing Voiced Excitations Of Speech And Acoustic Signals, Removing Acoustic Noise From Speech, And Synthesizing Speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-04-25

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  5. Multiexpert automatic speech recognition using acoustic and myoelectric signals.

    PubMed

    Chan, Adrian D C; Englehart, Kevin B; Hudgins, Bernard; Lovely, Dennis F

    2006-04-01

    Classification accuracy of conventional automatic speech recognition (ASR) systems can decrease dramatically under acoustically noisy conditions. To improve classification accuracy and increase system robustness, a multiexpert ASR system is implemented. In this system, acoustic speech information is supplemented with information from facial myoelectric signals (MES). A new method of combining experts, known as the plausibility method, is employed to combine an acoustic ASR expert and an MES ASR expert. The plausibility method of combining multiple experts, which is based on the mathematical framework of evidence theory, is compared to the Borda count and score-based methods of combination. Acoustic and facial MES data were collected from 5 subjects, using a 10-word vocabulary across an 18-dB range of acoustic noise. As expected, the performance of an acoustic expert decreases with increasing acoustic noise; classification accuracies of the acoustic ASR expert are as low as 11.5%. The effect of noise is significantly reduced with the addition of the MES ASR expert. Classification accuracies remain above 78.8% across the 18-dB range of acoustic noise when the plausibility method is used to combine the opinions of multiple experts. In addition, the plausibility method produced classification accuracies higher than any individual expert at all noise levels, and the highest classification accuracies overall except at the 9-dB noise level. Using the Borda count and score-based multiexpert systems, classification accuracies are improved relative to the acoustic ASR expert but are as low as 51.5% and 59.5%, respectively.
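
    A rough sketch of the two baseline combination rules named above (Borda count and score-based fusion) for an acoustic expert and an MES expert. The plausibility method itself, based on evidence theory, is more involved and is not reproduced here; the class scores below are made up for illustration.

      import numpy as np

      def borda_count(score_lists):
          """Combine experts by Borda count: each expert ranks the classes and awards
          points equal to the rank position (0 = worst, n_classes - 1 = best)."""
          n_classes = len(score_lists[0])
          points = np.zeros(n_classes)
          for scores in score_lists:
              order = np.argsort(scores)           # ascending: worst class first
              for rank, cls in enumerate(order):
                  points[cls] += rank
          return int(np.argmax(points))

      def score_fusion(score_lists, weights=None):
          """Combine experts by a (weighted) sum of their per-class scores."""
          scores = np.asarray(score_lists, dtype=float)
          if weights is None:
              weights = np.ones(len(score_lists))
          return int(np.argmax(weights @ scores))

      if __name__ == "__main__":
          # Hypothetical per-class scores for a 4-word vocabulary.
          acoustic_scores = [0.10, 0.55, 0.20, 0.15]   # acoustic expert (degraded by noise)
          mes_scores      = [0.05, 0.15, 0.70, 0.10]   # myoelectric expert
          print("Borda count decision:", borda_count([acoustic_scores, mes_scores]))
          print("Score fusion decision:", score_fusion([acoustic_scores, mes_scores]))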

  6. Denoising of human speech using combined acoustic and em sensor signal processing

    SciTech Connect

    Ng, L C; Burnett, G C; Holzrichter, J F; Gable, T J

    1999-11-29

    Low-power EM radar-like sensors have made it possible to measure properties of the human speech production system in real time, without acoustic interference. This greatly enhances the quality and quantity of information for many speech-related applications. See Holzrichter, Burnett, Ng, and Lea, J. Acoust. Soc. Am. 103(1), 622 (1998). By using combined glottal-EM-sensor and acoustic signals, segments of voiced speech, unvoiced speech, and non-speech can be reliably defined. Real-time denoising filters can be constructed to remove noise from the user's corresponding speech signal.
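
    A minimal sketch of the kind of frame-level segmentation described above, assuming a time-aligned EM (glottal) sensor channel and an acoustic channel: frames with strong EM-sensor energy are marked voiced, frames with acoustic energy but little EM energy unvoiced, and the rest non-speech. The thresholds and frame size are arbitrary illustrative values, not the paper's.

      import numpy as np

      def segment_frames(gems, audio, frame_len=160, gems_thresh=0.01, audio_thresh=0.001):
          """Label each frame 'voiced', 'unvoiced', or 'none' from frame energies."""
          n_frames = min(len(gems), len(audio)) // frame_len
          labels = []
          for i in range(n_frames):
              sl = slice(i * frame_len, (i + 1) * frame_len)
              e_gems = np.mean(gems[sl] ** 2)      # EM-sensor (glottal) energy
              e_audio = np.mean(audio[sl] ** 2)    # acoustic energy
              if e_gems > gems_thresh:
                  labels.append("voiced")
              elif e_audio > audio_thresh:
                  labels.append("unvoiced")
              else:
                  labels.append("none")
          return labels

      if __name__ == "__main__":
          fs = 16000
          t = np.arange(fs) / fs
          gems = np.where(t < 0.5, np.sin(2 * np.pi * 120 * t), 0.0)   # voicing only in first half
          audio = 0.2 * np.sin(2 * np.pi * 500 * t) + 0.05 * np.random.randn(fs)
          print(segment_frames(gems, audio)[:10])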

  7. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2002-01-01

    Low power EM waves are used to detect motions of vocal tract tissues of the human speech system before, during, and after voiced speech. A voiced excitation function is derived. The excitation function provides speech production information to enhance speech characterization and to enable noise removal from human speech.

  8. Estimation of glottal source features from the spectral envelope of the acoustic speech signal

    NASA Astrophysics Data System (ADS)

    Torres, Juan Felix

    Speech communication encompasses diverse types of information, including phonetics, affective state, voice quality, and speaker identity. From a speech production standpoint, the acoustic speech signal can be mainly divided into glottal source and vocal tract components, which play distinct roles in rendering the various types of information it contains. Most deployed speech analysis systems, however, do not explicitly represent these two components as distinct entities, as their joint estimation from the acoustic speech signal becomes an ill-defined blind deconvolution problem. Nevertheless, because of the desire to understand glottal behavior and how it relates to perceived voice quality, there has been continued interest in explicitly estimating the glottal component of the speech signal. To this end, several inverse filtering (IF) algorithms have been proposed, but they are unreliable in practice because of the blind formulation of the separation problem. In an effort to develop a method that can bypass the challenging IF process, this thesis proposes a new glottal source information extraction method that relies on supervised machine learning to transform smoothed spectral representations of speech, which are already used in some of the most widely deployed and successful speech analysis applications, into a set of glottal source features. A transformation method based on Gaussian mixture regression (GMR) is presented and compared to current IF methods in terms of feature similarity, reliability, and speaker discrimination capability on a large speech corpus, and potential representations of the spectral envelope of speech are investigated for their ability to represent glottal source variation in a predictable manner. The proposed system was found to produce glottal source features that reasonably matched their IF counterparts in many cases, while being less susceptible to spurious errors. The development of the proposed method entailed a study into the aspects
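
    A compact sketch of Gaussian mixture regression of the kind the thesis describes: a joint GMM is fit over stacked [spectral-envelope features, glottal-source features], and the conditional expectation of the glottal features given the envelope features serves as the prediction. Feature dimensions and data here are synthetic placeholders, not the thesis's corpus or feature set.

      import numpy as np
      from scipy.stats import multivariate_normal
      from sklearn.mixture import GaussianMixture

      def fit_joint_gmm(X, Y, n_components=4, seed=0):
          """Fit a joint GMM over stacked [envelope features X, glottal features Y]."""
          return GaussianMixture(n_components=n_components, covariance_type="full",
                                 random_state=seed).fit(np.hstack([X, Y]))

      def gmr_predict(gmm, X, dx):
          """Gaussian mixture regression: E[Y | X] under the joint GMM."""
          n, dy = len(X), gmm.means_.shape[1] - dx
          num, den = np.zeros((n, dy)), np.zeros(n)
          for k in range(gmm.n_components):
              mu_x, mu_y = gmm.means_[k, :dx], gmm.means_[k, dx:]
              S = gmm.covariances_[k]
              Sxx, Sxy = S[:dx, :dx], S[:dx, dx:]
              resp = gmm.weights_[k] * multivariate_normal(mu_x, Sxx).pdf(X)
              cond_mean = mu_y + (X - mu_x) @ np.linalg.solve(Sxx, Sxy)
              num += resp[:, None] * cond_mean
              den += resp
          return num / den[:, None]

      if __name__ == "__main__":
          rng = np.random.default_rng(0)
          X = rng.normal(size=(500, 3))                              # stand-in envelope features
          Y = np.tanh(X[:, :2]) + 0.1 * rng.normal(size=(500, 2))    # stand-in glottal features
          gmm = fit_joint_gmm(X, Y)
          print(gmr_predict(gmm, X[:3], dx=3))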

  9. Classifying acoustic signals into phoneme categories: average and dyslexic readers make use of complex dynamical patterns and multifractal scaling properties of the speech signal

    PubMed Central

    2015-01-01

    Several competing aetiologies of developmental dyslexia suggest that the problems with acquiring literacy skills are causally entailed by low-level auditory and/or speech perception processes. The purpose of this study is to evaluate the diverging claims about the specific deficient perceptual processes under conditions of strong inference. Theoretically relevant acoustic features were extracted from a set of artificial speech stimuli that lie on a /bAk/-/dAk/ continuum. The features were tested on their ability to enable a simple classifier (Quadratic Discriminant Analysis) to reproduce the observed classification performance of average and dyslexic readers in a speech perception experiment. The ‘classical’ features examined were based on component process accounts of developmental dyslexia such as the supposed deficit in Envelope Rise Time detection and the deficit in the detection of rapid changes in the distribution of energy in the frequency spectrum (formant transitions). Studies examining these temporal processing deficit hypotheses do not employ measures that quantify the temporal dynamics of stimuli. It is shown that measures based on quantification of the dynamics of complex, interaction-dominant systems (Recurrence Quantification Analysis and the multifractal spectrum) enable QDA to classify the stimuli almost identically as observed in dyslexic and average reading participants. It seems unlikely that participants used any of the features that are traditionally associated with accounts of (impaired) speech perception. The nature of the variables quantifying the temporal dynamics of the speech stimuli imply that the classification of speech stimuli cannot be regarded as a linear aggregate of component processes that each parse the acoustic signal independent of one another, as is assumed by the ‘classical’ aetiologies of developmental dyslexia. It is suggested that the results imply that the differences in speech perception performance between
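
    A small illustration of the classification step described above, with synthetic features and labels standing in for the study's data: Quadratic Discriminant Analysis is fit on per-stimulus acoustic features and its agreement with listener categories is measured.

      import numpy as np
      from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

      rng = np.random.default_rng(1)

      # Synthetic stand-ins: each row holds features of one stimulus on a /bAk/-/dAk/
      # continuum (e.g., envelope rise time, an RQA measure, multifractal spectrum width).
      features_b = rng.normal(loc=[0.8, 0.3, 0.5], scale=0.1, size=(40, 3))
      features_d = rng.normal(loc=[0.5, 0.6, 0.7], scale=0.1, size=(40, 3))
      X = np.vstack([features_b, features_d])
      y = np.array([0] * 40 + [1] * 40)        # 0 = /bAk/ response, 1 = /dAk/ response

      qda = QuadraticDiscriminantAnalysis().fit(X, y)
      print("training agreement with listener labels:", qda.score(X, y))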

  10. Speech acoustics: How much science?

    PubMed

    Tiwari, Manjul

    2012-01-01

    Human vocalizations are sounds made exclusively by a human vocal tract. Among other vocalizations, for example, laughs or screams, speech is the most important. Speech is the primary medium of that supremely human symbolic communication system called language. One of the functions of a voice, perhaps the main one, is to realize language, by conveying some of the speaker's thoughts in linguistic form. Speech is language made audible. Moreover, when phoneticians compare and describe voices, they usually do so with respect to linguistic units, especially speech sounds, like vowels or consonants. It is therefore necessary to understand the structure as well as the nature of speech sounds and how they are described. In order to understand and evaluate speech, it is important to have at least a basic understanding of the science of speech acoustics: how the acoustics of speech are produced, how they are described, and how differences, both between speakers and within speakers, arise in an acoustic output. One of the aims of this article is to try to facilitate this understanding.

  11. Perceptual centres in speech - an acoustic analysis

    NASA Astrophysics Data System (ADS)

    Scott, Sophie Kerttu

    Perceptual centres, or P-centres, represent the perceptual moments of occurrence of acoustic signals - the 'beat' of a sound. P-centres underlie the perception and production of rhythm in perceptually regular speech sequences. P-centres have been modelled both in speech and non-speech (music) domains. The three aims of this thesis were (a) to test current P-centre models to determine which best accounted for the experimental data, (b) to identify a candidate parameter onto which P-centres could be mapped (a local approach), as opposed to the previous global models which rely upon the whole signal to determine the P-centre, and (c) to develop a model of P-centre location which could be applied to speech and non-speech signals. The first aim was investigated by a series of experiments which examined (a) whether different models could account for variation between speakers, using speech from different speakers, (b) whether rendering the amplitude-time plot of a speech signal affects the P-centre of the signal, and (c) whether increasing the amplitude at the offset of a speech signal alters P-centres in the production and perception of speech. The second aim was carried out by (a) manipulating the rise time of different speech signals to determine whether the P-centre was affected, and whether the type of speech sound ramped affected the P-centre shift, (b) manipulating the rise time and decay time of a synthetic vowel to determine whether the onset alteration had more effect on the P-centre than the offset manipulation, and (c) testing whether the duration of a vowel affected the P-centre when other attributes (amplitude, spectral content) were held constant. The third aim - modelling P-centres - was based on these results. The Frequency-dependent Amplitude Increase Model of P-centre location (FAIM) was developed using a modelling protocol, the APU GammaTone Filterbank and the speech from different speakers. The P-centres of the stimulus corpus were highly predicted by attributes of

  12. Acoustic Signal Processing

    NASA Astrophysics Data System (ADS)

    Hartmann, William M.; Candy, James V.

    Signal processing refers to the acquisition, storage, display, and generation of signals - also to the extraction of information from signals and the re-encoding of information. As such, signal processing in some form is an essential element in the practice of all aspects of acoustics. Signal processing algorithms enable acousticians to separate signals from noise, to perform automatic speech recognition, or to compress information for more efficient storage or transmission. Signal processing concepts are the building blocks used to construct models of speech and hearing. Now, in the 21st century, all signal processing is effectively digital signal processing. Widespread access to high-speed processing, massive memory, and inexpensive software make signal processing procedures of enormous sophistication and power available to anyone who wants to use them. Because advanced signal processing is now accessible to everybody, there is a need for primers that introduce basic mathematical concepts that underlie the digital algorithms. The present handbook chapter is intended to serve such a purpose.

  13. Mapping acoustics to kinematics in speech

    NASA Astrophysics Data System (ADS)

    Bali, Rohan

    An accurate mapping from speech acoustics to speech articulator movements has many practical applications, as well as theoretical implications for speech planning and perception science. This work can be divided into two parts. In the first part, we show that a simple codebook can be used to map acoustics to speech articulator movements in natural, conversational speech. In the second part, we incorporate cost optimization principles that have been shown to be relevant in motor control tasks into the codebook approach. These cost optimizations are defined as minimization of the integrals of the magnitudes of velocity, acceleration, and jerk of the speech articulators, and are implemented using a dynamic programming technique. Results show that incorporating cost minimization of speech articulator movements can significantly improve mapping acoustics to speech articulator movements. This suggests underlying physiological or neural planning principles used by speech articulators during speech production.
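
    A toy dynamic-programming sketch in the spirit of the second part: at each acoustic frame several codebook candidates for an articulator position are available, and the path minimizing a squared-velocity (smoothness) cost is selected. A real system would also penalize acoustic mismatch and higher derivatives (acceleration, jerk); the candidate values below are synthetic.

      import numpy as np

      def smoothest_path(candidates):
          """Pick one candidate articulator value per frame, minimizing the summed
          squared frame-to-frame differences (a velocity-magnitude cost)."""
          n_frames, n_cand = candidates.shape
          cost = np.zeros((n_frames, n_cand))
          back = np.zeros((n_frames, n_cand), dtype=int)
          for t in range(1, n_frames):
              # Transition cost from every candidate at t-1 to every candidate at t.
              trans = (candidates[t][None, :] - candidates[t - 1][:, None]) ** 2
              total = cost[t - 1][:, None] + trans
              back[t] = np.argmin(total, axis=0)
              cost[t] = np.min(total, axis=0)
          path = [int(np.argmin(cost[-1]))]          # backtrack the optimal path
          for t in range(n_frames - 1, 0, -1):
              path.append(int(back[t, path[-1]]))
          return path[::-1]

      if __name__ == "__main__":
          rng = np.random.default_rng(0)
          true_traj = np.sin(np.linspace(0, np.pi, 20))              # smooth articulator movement
          candidates = true_traj[:, None] + rng.normal(0, 0.3, (20, 5))
          idx = smoothest_path(candidates)
          print(np.round(candidates[np.arange(20), idx], 2))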

  14. Acoustic Markers of Prosodic Boundaries in Spanish Spontaneous Alaryngeal Speech

    ERIC Educational Resources Information Center

    Cuenca, M. H.; Barrio, M. M.

    2010-01-01

    Prosodic information aids segmentation of the continuous speech signal and thereby facilitates auditory speech processing. Durational and pitch variations are prosodic cues especially necessary to convey prosodic boundaries, but alaryngeal speakers have inconsistent control over acoustic parameters such as F0 and duration, being as a result noisy…

  15. Detecting suspicious behaviour using speech: acoustic correlates of deceptive speech -- an exploratory investigation.

    PubMed

    Kirchhübel, Christin; Howard, David M

    2013-09-01

    The current work intended to enhance our knowledge of changes or lack of changes in the speech signal when people were being deceptive. In particular, the study attempted to investigate the appropriateness of using speech cues in detecting deception. Truthful, deceptive and control speech were elicited from ten speakers in an interview setting. The data were subjected to acoustic analysis and results are presented on a range of speech parameters including fundamental frequency (f0), overall amplitude and mean vowel formants F1, F2 and F3. A significant correlation could not be established between deceptiveness/truthfulness and any of the acoustic features examined. Directions for future work are highlighted.

  16. Acoustic markers of prosodic boundaries in Spanish spontaneous alaryngeal speech.

    PubMed

    Cuenca, M H; Barrio, M M

    2010-11-01

    Prosodic information aids segmentation of the continuous speech signal and thereby facilitates auditory speech processing. Durational and pitch variations are prosodic cues especially necessary to convey prosodic boundaries, but alaryngeal speakers have inconsistent control over acoustic parameters such as F0 and duration, being as a result noisy and less intelligible than normal speech. This case study has investigated whether one Spanish alaryngeal speaker proficient in both oesophageal and tracheoesophageal speech modes used the same acoustic cues for prosodic boundaries in both types of voicing. Pre-boundary lengthening, F0-excursions and pausing (number of pauses and position) were measured in spontaneous speech samples, using Praat. The acoustic analysis has revealed that the subject has relied on a different combination of cues in each type of voicing to convey the presence of prosodic boundaries.

  17. Acoustics of Clear Speech: Effect of Instruction

    ERIC Educational Resources Information Center

    Lam, Jennifer; Tjaden, Kris; Wilding, Greg

    2012-01-01

    Purpose: This study investigated how different instructions for eliciting clear speech affected selected acoustic measures of speech. Method: Twelve speakers were audio-recorded reading 18 different sentences from the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1984). Sentences were produced in habitual, clear,…

  18. Acoustic richness modulates the neural networks supporting intelligible speech processing.

    PubMed

    Lee, Yune-Sang; Min, Nam Eun; Wingfield, Arthur; Grossman, Murray; Peelle, Jonathan E

    2016-03-01

    The information contained in a sensory signal plays a critical role in determining what neural processes are engaged. Here we used interleaved silent steady-state (ISSS) functional magnetic resonance imaging (fMRI) to explore how human listeners cope with different degrees of acoustic richness during auditory sentence comprehension. Twenty-six healthy young adults underwent scanning while hearing sentences that varied in acoustic richness (high vs. low spectral detail) and syntactic complexity (subject-relative vs. object-relative center-embedded clause structures). We manipulated acoustic richness by presenting the stimuli as unprocessed full-spectrum speech, or noise-vocoded with 24 channels. Importantly, although the vocoded sentences were spectrally impoverished, all sentences were highly intelligible. These manipulations allowed us to test how intelligible speech processing was affected by orthogonal linguistic and acoustic demands. Acoustically rich speech showed stronger activation than acoustically less-detailed speech in a bilateral temporoparietal network with more pronounced activity in the right hemisphere. By contrast, listening to sentences with greater syntactic complexity resulted in increased activation of a left-lateralized network including left posterior lateral temporal cortex, left inferior frontal gyrus, and left dorsolateral prefrontal cortex. Significant interactions between acoustic richness and syntactic complexity occurred in left supramarginal gyrus, right superior temporal gyrus, and right inferior frontal gyrus, indicating that the regions recruited for syntactic challenge differed as a function of acoustic properties of the speech. Our findings suggest that the neural systems involved in speech perception are finely tuned to the type of information available, and that reducing the richness of the acoustic signal dramatically alters the brain's response to spoken language, even when intelligibility is high.

  19. Evaluation of disfluent speech by means of automatic acoustic measurements.

    PubMed

    Lustyk, Tomas; Bergl, Petr; Cmejla, Roman

    2014-03-01

    An experiment was carried out to determine whether the level of the speech fluency disorder can be estimated by means of automatic acoustic measurements. These measures analyze, for example, the amount of silence in a recording or the number of abrupt spectral changes in a speech signal. All the measures were designed to take into account symptoms of stuttering. In the experiment, 118 audio recordings of read speech by Czech native speakers were employed. The results indicate that the human-made rating of the speech fluency disorder in read speech can be predicted on the basis of automatic measurements. The number of abrupt spectral changes in the speech segments turns out to be the most appropriate measure to describe the overall speech performance. The results also imply that there are measures with good results describing partial symptoms (especially fixed postures without audible airflow).
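
    A rough sketch of two measures of the sort listed above: the proportion of silence in a recording (from frame energy) and a count of abrupt spectral changes (frames whose spectral flux exceeds a threshold). Frame sizes and thresholds are illustrative choices, not the paper's.

      import numpy as np

      def frame_signal(x, frame_len=400, hop=200):
          n = 1 + (len(x) - frame_len) // hop
          return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

      def silence_fraction(x, frame_len=400, hop=200, thresh=1e-4):
          """Fraction of frames whose energy falls below a fixed threshold."""
          energy = np.mean(frame_signal(x, frame_len, hop) ** 2, axis=1)
          return float(np.mean(energy < thresh))

      def abrupt_spectral_changes(x, frame_len=400, hop=200, flux_thresh=5.0):
          """Count frame transitions whose spectral flux exceeds a threshold."""
          frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
          mags = np.abs(np.fft.rfft(frames, axis=1))
          flux = np.sum(np.diff(mags, axis=0) ** 2, axis=1)   # spectral flux between frames
          return int(np.sum(flux > flux_thresh))

      if __name__ == "__main__":
          fs = 16000
          t = np.arange(2 * fs) / fs
          x = np.where(t < 1.0, 0.3 * np.sin(2 * np.pi * 300 * t), 0.0)   # tone, then silence
          print("silence fraction:", silence_fraction(x))
          print("abrupt spectral changes:", abrupt_spectral_changes(x))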

  20. Study of acoustic correlates associate with emotional speech

    NASA Astrophysics Data System (ADS)

    Yildirim, Serdar; Lee, Sungbok; Lee, Chul Min; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Ebrahim; Narayanan, Shrikanth

    2004-10-01

    This study investigates the acoustic characteristics of four different emotions expressed in speech. The aim is to obtain detailed acoustic knowledge on how a speech signal is modulated by changes from neutral to a certain emotional state. Such knowledge is necessary for automatic emotion recognition and classification and emotional speech synthesis. Speech data obtained from two semi-professional actresses are analyzed and compared. Each subject produces 211 sentences with four different emotions: neutral, sad, angry, and happy. We analyze changes in temporal and acoustic parameters such as magnitude and variability of segmental duration, fundamental frequency, and the first three formant frequencies as a function of emotion. Acoustic differences among the emotions are also explored with mutual information computation, multidimensional scaling, and acoustic likelihood comparison with normal speech. Results indicate that speech associated with anger and happiness is characterized by longer duration, shorter interword silence, and higher pitch and rms energy with wider ranges. Sadness is distinguished from other emotions by lower rms energy and longer interword silence. Interestingly, the differences in formant pattern between happiness/anger and neutral/sadness are better reflected in back vowels such as /a/ (as in 'father') than in front vowels. Detailed results on intra- and interspeaker variability will be reported.

  1. Speech Intelligibility Advantages using an Acoustic Beamformer Display

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Sunder, Kaushik; Godfroy, Martine; Otto, Peter

    2015-01-01

    A speech intelligibility test conforming to the Modified Rhyme Test of ANSI S3.2 "Method for Measuring the Intelligibility of Speech Over Communication Systems" was conducted using a prototype 12-channel acoustic beamformer system. The target speech material (signal) was identified against speech babble (noise), with calculated signal-noise ratios of 0, 5 and 10 dB. The signal was delivered at a fixed beam orientation of 135 deg (re 90 deg as the frontal direction of the array) and the noise at 135 deg (co-located) and 0 deg (separated). A significant improvement in intelligibility from 57% to 73% was found for spatial separation for the same signal-noise ratio (0 dB). Significant effects for improved intelligibility due to spatial separation were also found for higher signal-noise ratios (5 and 10 dB).

  2. Analog Acoustic Expression in Speech Communication

    ERIC Educational Resources Information Center

    Shintel, Hadas; Nusbaum, Howard C.; Okrent, Arika

    2006-01-01

    We present the first experimental evidence of a phenomenon in speech communication we call "analog acoustic expression." Speech is generally thought of as conveying information in two distinct ways: discrete linguistic-symbolic units such as words and sentences represent linguistic meaning, and continuous prosodic forms convey information about…

  3. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced speech, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.

  4. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced speech, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

  5. The acoustic-modeling problem in automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Brown, Peter F.

    1987-12-01

    This thesis examines the acoustic-modeling problem in automatic speech recognition from an information-theoretic point of view. This problem is to design a speech-recognition system which can extract from the speech waveform as much information as possible about the corresponding word sequence. The information extraction process is broken down into two steps: a signal processing step which converts a speech waveform into a sequence of information bearing acoustic feature vectors, and a step which models such a sequence. This thesis is primarily concerned with the use of hidden Markov models to model sequences of feature vectors which lie in a continuous space such as R^N. It explores the trade-off between packing a lot of information into such sequences and being able to model them accurately. The difficulty of developing accurate models of continuous parameter sequences is addressed by investigating a method of parameter estimation which is specifically designed to cope with inaccurate modeling assumptions.
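
    A minimal sketch of fitting a continuous-density (Gaussian-emission) hidden Markov model to sequences of acoustic feature vectors, here using the third-party hmmlearn package (an assumption; the thesis predates such tooling) and random data in place of real feature vectors.

      import numpy as np
      from hmmlearn import hmm   # third-party package: pip install hmmlearn

      rng = np.random.default_rng(0)

      # Stand-in acoustic feature vectors (e.g., 13-dimensional cepstra) for two "utterances".
      utt1 = rng.normal(size=(120, 13))
      utt2 = rng.normal(size=(90, 13))
      X = np.vstack([utt1, utt2])
      lengths = [len(utt1), len(utt2)]

      # A 5-state HMM with diagonal-covariance Gaussian emissions over R^13.
      model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=20)
      model.fit(X, lengths)
      print("log-likelihood of the training data:", model.score(X, lengths))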

  6. Does Signal Degradation Affect Top-Down Processing of Speech?

    PubMed

    Wagner, Anita; Pals, Carina; de Blecourt, Charlotte M; Sarampalis, Anastasios; Başkent, Deniz

    2016-01-01

    Speech perception is formed based on both the acoustic signal and listeners' knowledge of the world and semantic context. Access to semantic information can facilitate interpretation of degraded speech, such as speech in background noise or the speech signal transmitted via cochlear implants (CIs). This paper focuses on the latter, and investigates the time course of understanding words, and how sentential context reduces listeners' dependency on the acoustic signal for natural and degraded speech via an acoustic CI simulation. In an eye-tracking experiment, we combined recordings of listeners' gaze fixations with pupillometry, to capture effects of semantic information on both the time course and effort of speech processing. Normal-hearing listeners were presented with sentences with or without a semantically constraining verb (e.g., crawl) preceding the target (baby), and their ocular responses to four pictures were recorded, including the target, a phonological competitor (bay), a semantic competitor (worm), and an unrelated distractor. The results show that in natural speech, listeners' gazes reflect their uptake of acoustic information, and integration of preceding semantic context. Degradation of the signal leads to a later disambiguation of phonologically similar words, and to a delay in integration of semantic information. Complementary to this, the pupil dilation data show that early semantic integration reduces the effort in disambiguating phonologically similar words. Processing degraded speech comes with increased effort due to the impoverished nature of the signal. Delayed integration of semantic information further constrains listeners' ability to compensate for inaudible signals.

  7. Frequency overlap between electric and acoustic stimulation and speech-perception benefit in patients with combined electric and acoustic stimulation

    PubMed Central

    Zhang, Ting; Spahr, Anthony J.; Dorman, Michael F.

    2010-01-01

    Objectives Our aim was to assess, for patients with a cochlear implant in one ear and low-frequency acoustic hearing in the contralateral ear, whether reducing the overlap in frequencies conveyed in the acoustic signal and those analyzed by the cochlear implant speech processor would improve speech recognition. Design The recognition of monosyllabic words in quiet and sentences in noise was evaluated in three listening configurations: electric stimulation alone, acoustic stimulation alone, and combined electric and acoustic stimulation. The acoustic stimuli were either unfiltered or low-pass (LP) filtered at 250 Hz, 500 Hz, or 750 Hz. The electric stimuli were either unfiltered or high-pass (HP) filtered at 250 Hz, 500 Hz, or 750 Hz. In the combined condition, the unfiltered acoustic signal was paired with the unfiltered electric signal, the 250 Hz LP acoustic signal was paired with the 250 Hz HP electric signal, the 500 Hz LP acoustic signal was paired with the 500 Hz HP electric signal, and the 750 Hz LP acoustic signal was paired with the 750 Hz HP electric signal. Results For both acoustic and electric signals, performance increased as the bandwidth increased. The highest level of performance in the combined condition was observed in the unfiltered acoustic plus unfiltered electric condition. Conclusions Reducing the overlap in frequency representation between acoustic and electric stimulation does not increase speech understanding scores for patients who have residual hearing in the ear contralateral to the implant. We find that acoustic information below 250 Hz significantly improves performance for patients who combine electric and acoustic stimulation and accounts for the majority of the speech-perception benefit when acoustic stimulation is combined with electric stimulation. PMID:19915474
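
    A small sketch of the kind of filtering used to build such test conditions: a low-pass acoustic signal and a complementary high-pass signal sharing one of the studied cutoffs (here 500 Hz), built with Butterworth filters from SciPy. The filter order is an arbitrary choice.

      import numpy as np
      from scipy.signal import butter, filtfilt

      def split_at_cutoff(x, fs, cutoff_hz=500.0, order=4):
          """Return (low-pass part for acoustic hearing, high-pass part for the CI side)."""
          b_lp, a_lp = butter(order, cutoff_hz, btype="low", fs=fs)
          b_hp, a_hp = butter(order, cutoff_hz, btype="high", fs=fs)
          return filtfilt(b_lp, a_lp, x), filtfilt(b_hp, a_hp, x)

      if __name__ == "__main__":
          fs = 16000
          t = np.arange(fs) / fs
          x = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 2000 * t)
          low, high = split_at_cutoff(x, fs, cutoff_hz=500.0)
          print("low-band rms:", np.sqrt(np.mean(low ** 2)),
                "high-band rms:", np.sqrt(np.mean(high ** 2)))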

  8. Automatic speech segmentation using throat-acoustic correlation coefficients

    NASA Astrophysics Data System (ADS)

    Mussabayev, Rustam Rafikovich; Kalimoldayev, Maksat N.; Amirgaliyev, Yedilkhan N.; Mussabayev, Timur R.

    2016-11-01

    This work considers one of the approaches to the solution of the task of discrete speech signal automatic segmentation. The aim of this work is to construct such an algorithm which should meet the following requirements: segmentation of a signal into acoustically homogeneous segments, high accuracy and segmentation speed, unambiguity and reproducibility of segmentation results, lack of necessity of preliminary training with the use of a special set consisting of manually segmented signals. Development of the algorithm which corresponds to the given requirements was conditioned by the necessity of formation of automatically segmented speech databases that have a large volume. One of the new approaches to the solution of this task is viewed in this article. For this purpose we use the new type of informative features named TAC-coefficients (Throat-Acoustic Correlation coefficients) which provide sufficient segmentation accuracy and efficiency.
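
    The TAC coefficients themselves are not specified in this abstract; as a loose illustration of the underlying idea only, the sketch below computes a frame-by-frame normalized cross-correlation between a throat-microphone channel and an acoustic channel, a quantity whose abrupt changes could serve as a segment-boundary cue. Everything here (frame size, the correlation measure itself) is an assumption.

      import numpy as np

      def framewise_correlation(throat, acoustic, frame_len=320, hop=160):
          """Normalized correlation between two time-aligned channels, per frame."""
          n = 1 + (min(len(throat), len(acoustic)) - frame_len) // hop
          coeffs = []
          for i in range(n):
              a = throat[i * hop:i * hop + frame_len].astype(float)
              b = acoustic[i * hop:i * hop + frame_len].astype(float)
              a -= a.mean()
              b -= b.mean()
              denom = np.sqrt((a ** 2).sum() * (b ** 2).sum()) + 1e-12
              coeffs.append(float((a * b).sum() / denom))
          return np.array(coeffs)

      if __name__ == "__main__":
          fs = 16000
          t = np.arange(fs) / fs
          throat = np.sin(2 * np.pi * 150 * t)
          acoustic = 0.8 * throat + 0.2 * np.random.default_rng(0).normal(size=fs)
          print(framewise_correlation(throat, acoustic)[:5])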

  9. Acoustic Evidence for Phonologically Mismatched Speech Errors

    ERIC Educational Resources Information Center

    Gormley, Andrea

    2015-01-01

    Speech errors are generally said to accommodate to their new phonological context. This accommodation has been validated by several transcription studies. The transcription methodology is not the best choice for detecting errors at this level, however, as this type of error can be difficult to perceive. This paper presents an acoustic analysis of…

  10. Influences of noise-interruption and information-bearing acoustic changes on understanding simulated electric-acoustic speech.

    PubMed

    Stilp, Christian; Donaldson, Gail; Oh, Soohee; Kong, Ying-Yee

    2016-11-01

    In simulations of electrical-acoustic stimulation (EAS), vocoded speech intelligibility is aided by preservation of low-frequency acoustic cues. However, the speech signal is often interrupted in everyday listening conditions, and effects of interruption on hybrid speech intelligibility are poorly understood. Additionally, listeners rely on information-bearing acoustic changes to understand full-spectrum speech (as measured by cochlea-scaled entropy [CSE]) and vocoded speech (CSECI), but how listeners utilize these informational changes to understand EAS speech is unclear. Here, normal-hearing participants heard noise-vocoded sentences with three to six spectral channels in two conditions: vocoder-only (80-8000 Hz) and simulated hybrid EAS (vocoded above 500 Hz; original acoustic signal below 500 Hz). In each sentence, four 80-ms intervals containing high-CSECI or low-CSECI acoustic changes were replaced with speech-shaped noise. As expected, performance improved with the preservation of low-frequency fine-structure cues (EAS). This improvement decreased for continuous EAS sentences as more spectral channels were added, but increased as more channels were added to noise-interrupted EAS sentences. Performance was impaired more when high-CSECI intervals were replaced by noise than when low-CSECI intervals were replaced, but this pattern did not differ across listening modes. Utilizing information-bearing acoustic changes to understand speech is predicted to generalize to cochlear implant users who receive EAS inputs.
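
    A bare-bones noise vocoder of the kind used to create such stimuli: the signal is split into a few bandpass channels, each channel's envelope is extracted and used to modulate bandlimited noise, and the channels are summed. The channel count, band edges, and envelope cutoff below are illustrative; the study's hybrid (EAS) condition would additionally keep the unprocessed signal below 500 Hz.

      import numpy as np
      from scipy.signal import butter, filtfilt, hilbert

      def noise_vocode(x, fs, band_edges_hz=(80, 500, 1500, 4000, 7000), env_cut_hz=50.0):
          """Noise-excited channel vocoder: per-band envelope applied to bandpass noise."""
          rng = np.random.default_rng(0)
          b_env, a_env = butter(2, env_cut_hz, btype="low", fs=fs)
          out = np.zeros(len(x))
          for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
              b, a = butter(3, [lo, hi], btype="band", fs=fs)
              band = filtfilt(b, a, x)
              env = filtfilt(b_env, a_env, np.abs(hilbert(band)))   # smoothed Hilbert envelope
              carrier = filtfilt(b, a, rng.normal(size=len(x)))     # bandlimited noise carrier
              out += np.clip(env, 0, None) * carrier
          return out

      if __name__ == "__main__":
          fs = 16000
          t = np.arange(fs) / fs
          speech_like = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
          vocoded = noise_vocode(speech_like, fs)
          print("output rms:", np.sqrt(np.mean(vocoded ** 2)))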

  11. Start/End Delays of Voiced and Unvoiced Speech Signals

    SciTech Connect

    Herrnstein, A

    1999-09-24

    Recent experiments using low-power EM radar-like sensors (e.g., GEMS) have demonstrated a new method for measuring vocal fold activity and the onset times of voiced speech, as vocal fold contact begins to take place. Similarly, the end time of a voiced speech segment can be measured. Secondly, it appears that in most normal uses of American English speech, unvoiced-speech segments directly precede or directly follow voiced-speech segments. For many applications, it is useful to know typical duration times of these unvoiced speech segments. A corpus of spoken TIMIT words, phrases, and sentences, assembled earlier and recorded using simultaneously measured acoustic and EM-sensor glottal signals from 16 male speakers, was used for this study. By inspecting the onset (or end) of unvoiced speech using the acoustic signal, and the onset (or end) of voiced speech using the EM sensor signal, the average duration times for unvoiced segments preceding onset of vocalization were found to be 300 ms, and for following segments, 500 ms. An unvoiced speech period is then defined in time, first by using the onset of the EM-sensed glottal signal as the onset-time marker for the voiced speech segment and end marker for the unvoiced segment. Then, by subtracting 300 ms from the onset time mark of voicing, the unvoiced speech segment start time is found. Similarly, the times for a following unvoiced speech segment can be found. While data of this nature have proven to be useful for work in our laboratory, a great deal of additional work remains to validate such data for use with general populations of users. These procedures have been useful for applying optimal processing algorithms over time segments of unvoiced, voiced, and non-speech acoustic signals. For example, these data appear to be of use in speaker validation, in vocoding, and in denoising algorithms.

  12. Acoustic Detail But Not Predictability of Task-Irrelevant Speech Disrupts Working Memory

    PubMed Central

    Wöstmann, Malte; Obleser, Jonas

    2016-01-01

    Attended speech is comprehended better not only if more acoustic detail is available, but also if it is semantically highly predictable. But can more acoustic detail or higher predictability turn into disadvantages and distract a listener if the speech signal is to be ignored? Also, does the degree of distraction increase for older listeners who typically show a decline in attentional control ability? Adopting the irrelevant-speech paradigm, we tested whether younger (age 23–33 years) and older (60–78 years) listeners’ working memory for the serial order of spoken digits would be disrupted by the presentation of task-irrelevant speech varying in its acoustic detail (using noise-vocoding) and its semantic predictability (of sentence endings). More acoustic detail, but not higher predictability, of task-irrelevant speech aggravated memory interference. This pattern of results did not differ between younger and older listeners, despite generally lower performance in older listeners. Our findings suggest that the focus of attention determines how acoustics and predictability affect the processing of speech: first, as more acoustic detail is known to enhance speech comprehension and memory for speech, we here demonstrate that more acoustic detail of ignored speech enhances the degree of distraction. Second, while higher predictability of attended speech is known to also enhance speech comprehension under acoustically adverse conditions, higher predictability of ignored speech is unable to exert any distracting effect upon working memory performance in younger or older listeners. These findings suggest that features that make attended speech easier to comprehend do not necessarily enhance distraction by ignored speech. PMID:27826235

  13. Millimeter Wave Radar for detecting the speech signal applications

    NASA Astrophysics Data System (ADS)

    Li, Zong-Wen

    1996-12-01

    A millimeter-wave (MMW) Doppler radar with grating structures for detecting speech signals has been developed in our laboratory. The operating principle of detecting acoustic wave signals, based on wave propagation theory and the wave equations governing electromagnetic wave (EMW) and acoustic wave (AW) propagation, scattering, reflection, and interaction, has been investigated. Experimental and observational results are provided to verify that a 40 GHz continuous-wave dielectric integrated MMW radar can detect and identify speech signals in free space from a person speaking. The received sound signals have been reproduced by DSP and a reproducer.

  14. Acoustic Analysis of PD Speech

    PubMed Central

    Chenausky, Karen; MacAuslan, Joel; Goldhor, Richard

    2011-01-01

    According to the U.S. National Institutes of Health, approximately 500,000 Americans have Parkinson's disease (PD), with roughly another 50,000 receiving new diagnoses each year. 70%–90% of these people also have the hypokinetic dysarthria associated with PD. Deep brain stimulation (DBS) substantially relieves motor symptoms in advanced-stage patients for whom medication produces disabling dyskinesias. This study investigated speech changes as a result of DBS settings chosen to maximize motor performance. The speech of 10 PD patients and 12 normal controls was analyzed for syllable rate and variability, syllable length patterning, vowel fraction, voice-onset time variability, and spirantization. These were normalized by the controls' standard deviation to represent distance from normal and combined into a composite measure. Results show that DBS settings relieving motor symptoms can improve speech, making it up to three standard deviations closer to normal. However, the clinically motivated settings evaluated here show greater capacity to impair, rather than improve, speech. A feedback device developed from these findings could be useful to clinicians adjusting DBS parameters, as a means for ensuring they do not unwittingly choose DBS settings which impair patients' communication. PMID:21977333

  15. Applications of Hilbert Spectral Analysis for Speech and Sound Signals

    NASA Technical Reports Server (NTRS)

    Huang, Norden E.

    2003-01-01

    A new method for analyzing nonlinear and nonstationary data has been developed, and the natural applications are to speech and sound signals. The key part of the method is the Empirical Mode Decomposition method, with which any complicated data set can be decomposed into a finite and often small number of Intrinsic Mode Functions (IMF). An IMF is defined as any function having the same number of zero crossings and extrema, and also having symmetric envelopes defined by the local maxima and minima, respectively. The IMF also admits a well-behaved Hilbert transform. This decomposition method is adaptive and, therefore, highly efficient. Since the decomposition is based on the local characteristic time scale of the data, it is applicable to nonlinear and nonstationary processes. With the Hilbert transform, the Intrinsic Mode Functions yield instantaneous frequencies as functions of time, which give sharp identifications of embedded structures. This method can be used to process all acoustic signals. Specifically, it can process speech signals for speech synthesis, speaker identification and verification, speech recognition, and sound signal enhancement and filtering. Additionally, the acoustical signals from machinery are essentially the way the machines talk to us: acoustical signals from machines, whether transmitted as sound through the air or as vibration on the machines, can tell us the operating conditions of the machines. Thus, we can use the acoustic signal to diagnose the problems of machines.
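
    A short sketch of the Hilbert spectral idea described above, using the third-party PyEMD package (an assumption, not the author's implementation) for Empirical Mode Decomposition and SciPy's Hilbert transform to obtain an instantaneous frequency for each IMF.

      import numpy as np
      from scipy.signal import hilbert
      from PyEMD import EMD   # third-party package: pip install EMD-signal

      fs = 1000.0
      t = np.arange(0, 1, 1 / fs)
      # Two-component test signal: a 25 Hz tone plus a weaker 100 Hz tone.
      x = np.sin(2 * np.pi * 25 * t) + 0.5 * np.sin(2 * np.pi * 100 * t)

      imfs = EMD().emd(x)                      # Empirical Mode Decomposition into IMFs
      for i, imf in enumerate(imfs):
          analytic = hilbert(imf)              # analytic signal of the IMF
          phase = np.unwrap(np.angle(analytic))
          inst_freq = np.diff(phase) * fs / (2 * np.pi)   # instantaneous frequency in Hz
          print(f"IMF {i}: median instantaneous frequency = {np.median(inst_freq):.1f} Hz")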

  16. Signals voice biofeedback for speech fluency disorders

    NASA Astrophysics Data System (ADS)

    Martin, Jose Francisco; Fernandez-Ramos, Raquel; Romero-Sanchez, Jorge; Rios, Francisco

    2003-04-01

    Knowledge about the mechanisms of voice production, as well as the parameters obtained from them, allows us to present solutions for coding, transmission, and the establishment of properties to distinguish between the responsible physiological mechanisms. In this work we are interested in the evaluation of syllabic sequences in continuous speech. This evaluation is very useful for phoniatric and logopaedic applications focused on the measurement and control of speech fluency. Moreover, we are interested in studying and evaluating sequential programming and muscular coordination. The main objective of our work is therefore the study of production mechanisms, models, evaluation methods, and the introduction of a reliable algorithm to catalogue and classify the phenomena of rhythm and speech fluency. In this paper, we present an algorithm for syllabic analysis based on the short-time energy concept. The algorithm first extracts the temporal syllabic intervals of speech and silence, which are then compared with normality intervals. It then feeds back to the patient, in real time, luminous and acoustic signals indicating the degree of mismatch with the normality model. This methodology is useful for improving fluency disorders. We present an ASIC microelectronic solution for the syllabic analyser and a portable prototype to be used both at the clinical level and as an individualized tool for the patient.
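
    A minimal short-time-energy sketch of the kind of syllabic interval extraction described above: frame energies are thresholded to yield alternating speech/silence intervals whose durations could then be compared against normality intervals. The threshold and frame size are arbitrary illustrative values.

      import numpy as np

      def speech_silence_intervals(x, fs, frame_ms=20, thresh_ratio=0.1):
          """Return a list of (label, start_s, end_s) intervals from short-time energy."""
          frame_len = int(fs * frame_ms / 1000)
          n_frames = len(x) // frame_len
          energy = np.array([np.mean(x[i * frame_len:(i + 1) * frame_len] ** 2)
                             for i in range(n_frames)])
          speech = energy > thresh_ratio * energy.max()
          intervals, start = [], 0
          for i in range(1, n_frames + 1):
              if i == n_frames or speech[i] != speech[start]:
                  label = "speech" if speech[start] else "silence"
                  intervals.append((label, start * frame_ms / 1000, i * frame_ms / 1000))
                  start = i
          return intervals

      if __name__ == "__main__":
          fs = 8000
          t = np.arange(2 * fs) / fs
          # Crude "syllables": a tone gated on and off at 4 Hz.
          x = np.sin(2 * np.pi * 200 * t) * (np.sin(2 * np.pi * 4 * t) > 0)
          for iv in speech_silence_intervals(x, fs):
              print(iv)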

  17. Wavelet Preprocessing of Acoustic Signals

    DTIC Science & Technology

    1991-12-01

    This work applies the wavelet transform to preprocess acoustic broadband signals in a system that discriminates between different classes of acoustic bursts. This is motivated by the similarity between the proportional-bandwidth filters provided by the wavelet transform and those found in biological hearing systems. The experiment involves comparing the effects of wavelet and FFT preprocessing of acoustic signals on a statistical pattern classifier. The data used were from the DARPA Phase I database, which consists of artificially generated signals with real ocean background.

  18. Gender difference in speech intelligibility using speech intelligibility tests and acoustic analyses

    PubMed Central

    2010-01-01

    PURPOSE The purpose of this study was to compare men with women in terms of speech intelligibility, to investigate the validity of objective acoustic parameters related to speech intelligibility, and to try to set up standard data for future studies in various fields of prosthodontics. MATERIALS AND METHODS Twenty men and women served as subjects in the present study. After recording of sample sounds, speech intelligibility tests by three speech pathologists and acoustic analyses were performed. Comparison of the speech intelligibility test scores and acoustic parameters such as fundamental frequency, fundamental frequency range, formant frequency, formant ranges, vowel working space area, and vowel dispersion was made between men and women. In addition, the correlations between the speech intelligibility values and acoustic variables were analyzed. RESULTS Women showed significantly higher speech intelligibility scores than men, and there were significant differences between men and women in most of the acoustic parameters used in the present study. However, the correlations between the speech intelligibility scores and acoustic parameters were low. CONCLUSION The speech intelligibility test and acoustic parameters used in the present study were effective in differentiating male voices from female voices, and their values might be used in future studies of patients involved with maxillofacial prosthodontics. However, further studies are needed on the correlation between speech intelligibility tests and objective acoustic parameters. PMID:21165272
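
    One of the acoustic parameters mentioned above, the vowel working space area, is commonly computed as the area of the polygon whose vertices are the (F1, F2) values of the corner vowels. A small sketch with made-up formant values, using the shoelace formula:

      import numpy as np

      def polygon_area(points):
          """Shoelace formula for the area of a polygon given ordered (x, y) vertices."""
          pts = np.asarray(points, dtype=float)
          x, y = pts[:, 0], pts[:, 1]
          return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

      # Hypothetical (F1, F2) values in Hz for the corner vowels /i/, /a/, /u/.
      corner_vowels = [(300, 2300), (750, 1200), (350, 800)]
      print("vowel space area (Hz^2):", polygon_area(corner_vowels))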

  19. Infant Perception of Atypical Speech Signals

    ERIC Educational Resources Information Center

    Vouloumanos, Athena; Gelfand, Hanna M.

    2013-01-01

    The ability to decode atypical and degraded speech signals as intelligible is a hallmark of speech perception. Human adults can perceive sounds as speech even when they are generated by a variety of nonhuman sources including computers and parrots. We examined how infants perceive the speech-like vocalizations of a parrot. Further, we examined how…

  20. Acoustic assessment of erygmophonic speech of Moroccan laryngectomized patients

    PubMed Central

    Ouattassi, Naouar; Benmansour, Najib; Ridal, Mohammed; Zaki, Zouheir; Bendahhou, Karima; Nejjari, Chakib; Cherkaoui, Abdeljabbar; El Alami, Mohammed Nouredine El Amine

    2015-01-01

    Introduction Acoustic evaluation of alaryngeal voices is among the most prominent issues in the speech analysis field. In fact, many methods have been developed to date to substitute for classic perceptual evaluation. The aim of this study is to present our experience with objective assessment of erygmophonic speech and to discuss the most widely used methods of acoustic speech appraisal. Through a prospective case-control study, we measured acoustic parameters of speech quality during one year of erygmophonic rehabilitation therapy of Moroccan laryngectomized patients. Methods We assessed acoustic parameters of erygmophonic speech samples of eleven laryngectomized patients throughout the speech rehabilitation therapy. Acoustic parameters were obtained by the perturbation analysis method and linear predictive coding algorithms, as well as through the broadband spectrogram. Results Using perturbation analysis methods, we found erygmophonic voice to be significantly poorer than normal speech, and it exhibits higher formant frequency values. However, erygmophonic voice also shows higher and extremely variable error values, greater than the acceptable level, which casts doubt on the reliability of the results of those analytic methods. Conclusion Acoustic parameters for objective evaluation of alaryngeal voices should allow a reliable representation of the perceptual evaluation of the quality of speech. This requirement has not been fulfilled by the common methods used so far. Therefore, acoustical assessment of erygmophonic speech needs more investigation. PMID:26587121

  1. Acoustic Study of Acted Emotions in Speech

    NASA Astrophysics Data System (ADS)

    Wang, Rong

    An extensive set of carefully recorded utterances provided a speech database for investigating acoustic correlates among eight emotional states. Four professional actors and four professional actresses simulated the emotional states of joy, conversation, nervousness, anger, sadness, hate, fear, and depression. The values of 14 acoustic parameters were extracted from analyses of the simulated portrayals. Normalization of the parameters was made to reduce the talker-dependence. Correlates of emotion were investigated by means of principal components analysis. Sadness and depression were found to be "ambiguous" with respect to each other, but "unique" with respect to joy and anger in the principal components space. Joy, conversation, nervousness, anger, hate, and fear did not separate well in the space and so exhibited ambiguity with respect to one another. The different talkers expressed joy, anger, sadness, and depression more consistently than the other four emotions. The analysis results were compared with the results of a subjective study using the same speech database and considerable consistency between the two was found.

  2. Acoustics in Halls for Speech and Music

    NASA Astrophysics Data System (ADS)

    Gade, Anders C.

    This chapter deals specifically with concepts, tools, and architectural variables of importance when designing auditoria for speech and music. The focus will be on cultivating the useful components of the sound in the room rather than on avoiding noise from outside or from installations, which is dealt with in Chap. 11. The chapter starts by presenting the subjective aspects of the room acoustic experience according to consensus at the time of writing. Then follows a description of their objective counterparts, the objective room acoustic parameters, among which the classical reverberation time measure is only one of many, but still of fundamental value. After explanations on how these parameters can be measured and predicted during the design phase, the remainder of the chapter deals with how the acoustic properties can be controlled by the architectural design of auditoria. This is done by presenting the influence of individual design elements as well as brief descriptions of halls designed for specific purposes, such as drama, opera, and symphonic concerts. Finally, some important aspects of loudspeaker installations in auditoria are briefly touched upon.
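
    As a concrete counterpart to the "classical reverberation time measure" mentioned above, a quick sketch of Sabine's formula, T60 = 0.161 V / A, where V is the room volume and A the total absorption area. The example room dimensions and absorption coefficients are invented.

      def sabine_rt60(volume_m3, surfaces):
          """Sabine reverberation time: T60 = 0.161 * V / sum(area_i * alpha_i)."""
          absorption = sum(area * alpha for area, alpha in surfaces)
          return 0.161 * volume_m3 / absorption

      # Hypothetical small auditorium: 2000 m^3, with surface areas (m^2) and
      # absorption coefficients at 1 kHz.
      surfaces = [(400, 0.3),    # audience seating
                  (600, 0.05),   # walls
                  (400, 0.02)]   # ceiling
      print("estimated T60 (s):", round(sabine_rt60(2000, surfaces), 2))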

  3. Low Bandwidth Vocoding using EM Sensor and Acoustic Signal Processing

    SciTech Connect

    Ng, L C; Holzrichter, J F; Larson, P E

    2001-10-25

    Low-power EM radar-like sensors have made it possible to measure properties of the human speech production system in real-time, without acoustic interference [1]. By combining these data with the corresponding acoustic signal, we've demonstrated an almost 10-fold bandwidth reduction in speech compression, compared to a standard 2.4 kbps LPC10 protocol used in the STU-III (Secure Terminal Unit, third generation) telephone. This paper describes a potential EM sensor/acoustic based vocoder implementation.

  4. Speaker verification using combined acoustic and EM sensor signal processing

    SciTech Connect

    Ng, L C; Gable, T J; Holzrichter, J F

    2000-11-10

    Low-power EM radar-like sensors have made it possible to measure properties of the human speech production system in real time, without acoustic interference. This greatly enhances the quality and quantity of information for many speech-related applications. See Holzrichter, Burnett, Ng, and Lea, J. Acoust. Soc. Am. 103(1), 622 (1998). By combining the Glottal EM Sensor (GEMS) with the acoustic signals, we've demonstrated an almost 10-fold reduction in error rates in a speaker verification system experiment under a moderately noisy environment (-10 dB).

  5. Acoustic properties of naturally produced clear speech at normal speaking rates

    NASA Astrophysics Data System (ADS)

    Krause, Jean C.; Braida, Louis D.

    2004-01-01

    Sentences spoken ``clearly'' are significantly more intelligible than those spoken ``conversationally'' for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.

  6. School cafeteria noise-The impact of room acoustics and speech intelligibility on children's voice levels

    NASA Astrophysics Data System (ADS)

    Bridger, Joseph F.

    2002-05-01

    The impact of room acoustics and speech intelligibility conditions of different school cafeterias on the voice levels of children is examined. Methods of evaluating cafeteria designs and predicting noise levels are discussed. Children are shown to modify their voice levels with changes in speech intelligibility, as adults do. Reverberation and signal-to-noise ratio are the important acoustical factors affecting speech intelligibility. Children have much more difficulty than adults in conditions where noise and reverberation are present. To evaluate the relationship of voice level and speech intelligibility, a database of real sound levels and room acoustics data was generated from measurements and data recorded during visits to a variety of existing cafeterias under different occupancy conditions. The effects of speech intelligibility and room acoustics on children's voice levels are demonstrated. A new method is presented for predicting speech intelligibility conditions and resulting noise levels for the design of new cafeterias and renovation of existing facilities. Measurements are provided for an existing school cafeteria before and after new room acoustics treatments were added. This will be helpful for acousticians, architects, school systems, regulatory agencies, and Parent Teacher Associations seeking to create less noisy cafeteria environments.

  7. Detection and Classification of Whale Acoustic Signals

    NASA Astrophysics Data System (ADS)

    Xian, Yin

    vocalization data set. The word error rate of the DCTNet feature is similar to the MFSC in speech recognition tasks, suggesting that the convolutional network is able to reveal acoustic content of speech signals.

  8. Age-Related Changes in Acoustic Characteristics of Adult Speech

    ERIC Educational Resources Information Center

    Torre, Peter, III; Barlow, Jessica A.

    2009-01-01

    This paper addresses effects of age and sex on certain acoustic properties of speech, given conflicting findings on such effects reported in prior research. The speech of 27 younger adults (15 women, 12 men; mean age 25.5 years) and 59 older adults (32 women, 27 men; mean age 75.2 years) was evaluated for identification of differences for sex and…

  9. Clear Speech Variants: An Acoustic Study in Parkinson's Disease

    ERIC Educational Resources Information Center

    Lam, Jennifer; Tjaden, Kris

    2016-01-01

    Purpose: The authors investigated how different variants of clear speech affect segmental and suprasegmental acoustic measures of speech in speakers with Parkinson's disease and a healthy control group. Method: A total of 14 participants with Parkinson's disease and 14 control participants served as speakers. Each speaker produced 18 different…

  10. Mathematical model of acoustic speech production with mobile walls of the vocal tract

    NASA Astrophysics Data System (ADS)

    Lyubimov, N. A.; Zakharov, E. V.

    2016-03-01

    A mathematical speech production model is considered that describes acoustic oscillation propagation in a vocal tract with mobile walls. The wave field function satisfies the Helmholtz equation with boundary conditions of the third kind (impedance type). The impedance mode corresponds to a three-parameter pendulum oscillation model. The experimental research demonstrates the nonlinear character of the influence of vocal tract wall mobility on the spectral envelope of a speech signal.

  11. Wavelet preprocessing of acoustic signals

    NASA Astrophysics Data System (ADS)

    Huang, W. Y.; Solorzano, M. R.

    1991-12-01

    This paper describes results using the wavelet transform to preprocess acoustic broadband signals in a system that discriminates between different classes of acoustic bursts. This is motivated by the similarity between the proportional bandwidth filters provided by the wavelet transform and those found in biological hearing systems. The experiment compares the performance of a statistical pattern classifier on wavelet- and FFT-preprocessed acoustic signals. The data used were from the DARPA Phase 1 database, which consists of artificially generated signals with real ocean background. The results show that the wavelet transform did provide improved performance when classifying on a frame-by-frame basis. The DARPA Phase 1 database is well matched to proportional bandwidth filtering; i.e., signal classes that contain high frequencies do tend to have shorter duration in this database. It is also noted that the decreasing background levels at high frequencies compensate for the poor match of the wavelet transform to long duration (high frequency) signals.
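
    The following sketch contrasts the two front ends compared in the study: proportional-bandwidth subband energies from a wavelet decomposition versus uniform-bandwidth energies from an FFT. The frame data and parameter choices are placeholders.

```python
import numpy as np
import pywt

def wavelet_band_energies(frame, wavelet="db4", levels=5):
    """Proportional-bandwidth (octave-like) band energies from a wavelet decomposition."""
    coeffs = pywt.wavedec(frame, wavelet, level=levels)
    return np.array([np.sum(c ** 2) for c in coeffs])

def fft_band_energies(frame, n_bands=6):
    """Uniform-bandwidth band energies from an FFT power spectrum."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    return np.array([b.sum() for b in np.array_split(spectrum, n_bands)])

frame = np.random.randn(1024)    # placeholder for one broadband signal frame
print(wavelet_band_energies(frame))
print(fft_band_energies(frame))
```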

  12. Acoustic assessment of speech privacy curtains in two nursing units.

    PubMed

    Pope, Diana S; Miller-Klein, Erik T

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s' standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption and more compact, fragmented nursing unit floor plates should be considered.

  13. Acoustic assessment of speech privacy curtains in two nursing units

    PubMed Central

    Pope, Diana S.; Miller-Klein, Erik T.

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s’ standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption and more compact, fragmented nursing unit floor plates should be considered. PMID:26780959

  14. Speech and melody recognition in binaurally combined acoustic and electric hearing

    NASA Astrophysics Data System (ADS)

    Kong, Ying-Yee; Stickney, Ginger S.; Zeng, Fan-Gang

    2005-03-01

    Speech recognition in noise and music perception are especially challenging for current cochlear implant users. The present study utilizes the residual acoustic hearing in the nonimplanted ear in five cochlear implant users to elucidate the role of temporal fine structure at low frequencies in auditory perception and to test the hypothesis that combined acoustic and electric hearing produces better performance than either mode alone. The first experiment measured speech recognition in the presence of competing noise. It was found that, although the residual low-frequency (<1000 Hz) acoustic hearing produced essentially no speech recognition in noise, it significantly enhanced performance when combined with the electric hearing. The second experiment measured melody recognition in the same group of subjects and found that, contrary to the speech recognition result, the low-frequency acoustic hearing produced significantly better performance than the electric hearing. It is hypothesized that listeners with combined acoustic and electric hearing might use the correlation between the salient pitch in low-frequency acoustic hearing and the weak pitch in the envelope to enhance segregation between signal and noise. The present study suggests the importance and urgency of accurately encoding the fine-structure cue in cochlear implants.

  15. Optimizing acoustical conditions for speech intelligibility in classrooms

    NASA Astrophysics Data System (ADS)

    Yang, Wonyoung

    High speech intelligibility is imperative in classrooms where verbal communication is critical. However, the optimal acoustical conditions to achieve a high degree of speech intelligibility have previously been investigated with inconsistent results, and practical room-acoustical solutions to optimize the acoustical conditions for speech intelligibility have not been developed. This experimental study validated auralization for speech-intelligibility testing, investigated the optimal reverberation for speech intelligibility for both normal and hearing-impaired listeners using more realistic room-acoustical models, and proposed an optimal sound-control design for speech intelligibility based on the findings. The auralization technique was used to perform subjective speech-intelligibility tests. The validation study, comparing auralization results with those of real classroom speech-intelligibility tests, found that if the room to be auralized is not very absorptive or noisy, speech-intelligibility tests using auralization are valid. The speech-intelligibility tests were done in two different auralized sound fields---approximately diffuse and non-diffuse---using the Modified Rhyme Test and both normal and hearing-impaired listeners. A hybrid room-acoustical prediction program was used throughout the work, and it and a 1/8 scale-model classroom were used to evaluate the effects of ceiling barriers and reflectors. For both subject groups, in approximately diffuse sound fields, when the speech source was closer to the listener than the noise source, the optimal reverberation time was zero. When the noise source was closer to the listener than the speech source, the optimal reverberation time was 0.4 s (with another peak at 0.0 s) with relative output power levels of the speech and noise sources SNS = 5 dB, and 0.8 s with SNS = 0 dB. In non-diffuse sound fields, when the noise source was between the speaker and the listener, the optimal reverberation time was 0.6 s with

  16. Dynamic Encoding of Acoustic Features in Neural Responses to Continuous Speech.

    PubMed

    Khalighinejad, Bahar; Cruzatto da Silva, Guilherme; Mesgarani, Nima

    2017-02-22

    Humans are unique in their ability to communicate using spoken language. However, it remains unclear how the speech signal is transformed and represented in the brain at different stages of the auditory pathway. In this study, we characterized electroencephalography responses to continuous speech by obtaining the time-locked responses to phoneme instances (phoneme-related potential). We showed that responses to different phoneme categories are organized by phonetic features. We found that each instance of a phoneme in continuous speech produces multiple distinguishable neural responses occurring as early as 50 ms and as late as 400 ms after the phoneme onset. Comparing the patterns of phoneme similarity in the neural responses and the acoustic signals confirms a repetitive appearance of acoustic distinctions of phonemes in the neural data. Analysis of the phonetic and speaker information in neural activations revealed that different time intervals jointly encode the acoustic similarity of both phonetic and speaker categories. These findings provide evidence for a dynamic neural transformation of low-level speech features as they propagate along the auditory pathway, and form an empirical framework to study the representational changes in learning, attention, and speech disorders. SIGNIFICANCE STATEMENT We characterized the properties of evoked neural responses to phoneme instances in continuous speech. We show that each instance of a phoneme in continuous speech produces several observable neural responses at different times occurring as early as 50 ms and as late as 400 ms after the phoneme onset. Each temporal event explicitly encodes the acoustic similarity of phonemes, and linguistic and nonlinguistic information are best represented at different time intervals. Finally, we show a joint encoding of phonetic and speaker information, where the neural representation of speakers is dependent on phoneme category. These findings provide compelling new evidence for

  17. Dynamic Encoding of Acoustic Features in Neural Responses to Continuous Speech

    PubMed Central

    Khalighinejad, Bahar; Cruzatto da Silva, Guilherme

    2017-01-01

    Humans are unique in their ability to communicate using spoken language. However, it remains unclear how the speech signal is transformed and represented in the brain at different stages of the auditory pathway. In this study, we characterized electroencephalography responses to continuous speech by obtaining the time-locked responses to phoneme instances (phoneme-related potential). We showed that responses to different phoneme categories are organized by phonetic features. We found that each instance of a phoneme in continuous speech produces multiple distinguishable neural responses occurring as early as 50 ms and as late as 400 ms after the phoneme onset. Comparing the patterns of phoneme similarity in the neural responses and the acoustic signals confirms a repetitive appearance of acoustic distinctions of phonemes in the neural data. Analysis of the phonetic and speaker information in neural activations revealed that different time intervals jointly encode the acoustic similarity of both phonetic and speaker categories. These findings provide evidence for a dynamic neural transformation of low-level speech features as they propagate along the auditory pathway, and form an empirical framework to study the representational changes in learning, attention, and speech disorders. SIGNIFICANCE STATEMENT We characterized the properties of evoked neural responses to phoneme instances in continuous speech. We show that each instance of a phoneme in continuous speech produces several observable neural responses at different times occurring as early as 50 ms and as late as 400 ms after the phoneme onset. Each temporal event explicitly encodes the acoustic similarity of phonemes, and linguistic and nonlinguistic information are best represented at different time intervals. Finally, we show a joint encoding of phonetic and speaker information, where the neural representation of speakers is dependent on phoneme category. These findings provide compelling new evidence for

  18. Acoustic Localization with Infrasonic Signals

    NASA Astrophysics Data System (ADS)

    Threatt, Arnesha; Elbing, Brian

    2015-11-01

    Numerous geophysical and anthropogenic events emit infrasonic frequencies (<20 Hz), including volcanoes, hurricanes, wind turbines and tornadoes. These sounds, which cannot be heard by the human ear, can be detected from large distances (in excess of 100 miles) due to low frequency acoustic signals having a very low decay rate in the atmosphere. Thus infrasound could be used for long-range, passive monitoring and detection of these events. An array of microphones separated by known distances can be used to locate a given source, which is known as acoustic localization. However, acoustic localization with infrasound is particularly challenging due to contamination from other signals, sensitivity to wind noise and producing a trusted source for system development. The objective of the current work is to create an infrasonic source using a propane torch wand or a subwoofer and locate the source using multiple infrasonic microphones. This presentation reports preliminary results from various microphone configurations used to locate the source.

  19. Empirical mode decomposition for analyzing acoustical signals

    NASA Technical Reports Server (NTRS)

    Huang, Norden E. (Inventor)

    2005-01-01

    The present invention discloses a computer implemented signal analysis method through the Hilbert-Huang Transformation (HHT) for analyzing acoustical signals, which are assumed to be nonlinear and nonstationary. The Empirical Mode Decomposition (EMD) and the Hilbert Spectral Analysis (HSA) are used to obtain the HHT. Essentially, the acoustical signal will be decomposed into the Intrinsic Mode Function Components (IMFs). Once the invention decomposes the acoustic signal into its constituent components, all operations such as analyzing, identifying, and removing unwanted signals can be performed on these components. Upon transforming the IMFs into the Hilbert spectrum, the acoustical signal may be compared with other acoustical signals.
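
    A rough illustration of the EMD-plus-Hilbert pipeline follows, using the third-party PyEMD package and SciPy rather than the patented implementation; the test signal is synthetic.

```python
import numpy as np
from PyEMD import EMD                 # third-party package (pip install EMD-signal)
from scipy.signal import hilbert

fs = 1000.0
t = np.arange(0, 1.0, 1.0 / fs)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)   # synthetic signal

imfs = EMD().emd(x)                   # intrinsic mode functions (IMFs)
for i, imf in enumerate(imfs):
    analytic = hilbert(imf)
    inst_freq = np.diff(np.unwrap(np.angle(analytic))) * fs / (2 * np.pi)
    print(f"IMF {i}: mean instantaneous frequency {inst_freq.mean():.1f} Hz")
```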

  20. Methods and apparatus for non-acoustic speech characterization and recognition

    DOEpatents

    Holzrichter, John F.

    1999-01-01

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  1. Methods and apparatus for non-acoustic speech characterization and recognition

    SciTech Connect

    Holzrichter, J.F.

    1999-12-21

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  2. Feature analysis of pathological speech signals using local discriminant bases technique.

    PubMed

    Umapathy, K; Krishnan, S

    2005-07-01

    Speech is an integral part of the human communication system. Various pathological conditions affect the vocal functions, inducing speech disorders. Acoustic parameters of speech are commonly used for the assessment of speech disorders and for monitoring the progress of the patient over the course of therapy. In the last two decades, signal-processing techniques have been successfully applied in screening speech disorders. In the paper, a novel approach is proposed to classify pathological speech signals using a local discriminant bases (LDB) algorithm and wavelet packet decompositions. The focus of the paper was to demonstrate the significance of identifying the signal subspaces that contribute to the discriminatory characteristics of normal and pathological speech signals in a computationally efficient way. Features were extracted from target subspaces for classification, and time-frequency decomposition was used to eliminate the need for segmentation of the speech signals. The technique was tested with a database of 212 speech signals (51 normal and 161 pathological) using the Daubechies wavelet (db4). Classification accuracies up to 96% were achieved for a two-group classification as normal and pathological speech signals, and 74% was achieved for a four-group classification as male normal, female normal, male pathological and female pathological signals.
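
    The sketch below illustrates the general idea of wavelet-packet subband features feeding a classifier; the actual local discriminant bases search is replaced by a fixed decomposition level, and the signals and labels are random placeholders.

```python
import numpy as np
import pywt
from sklearn.svm import SVC

def packet_energies(signal, wavelet="db4", level=4):
    """Energies of the terminal wavelet-packet subbands (fixed basis, not LDB-selected)."""
    wp = pywt.WaveletPacket(signal, wavelet=wavelet, maxlevel=level)
    return np.array([np.sum(np.asarray(node.data) ** 2)
                     for node in wp.get_level(level, order="freq")])

# Placeholder data: 20 signals, alternating normal (0) / pathological (1) labels.
X_signals = [np.random.randn(4096) for _ in range(20)]
y = np.array([0, 1] * 10)

X = np.vstack([packet_energies(s) for s in X_signals])
clf = SVC().fit(X, y)
print("Training accuracy:", clf.score(X, y))
```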

  3. Relationship between acoustic measures and speech naturalness ratings in Parkinson's disease: A within-speaker approach.

    PubMed

    Klopfenstein, Marie

    2015-01-01

    This study investigated the acoustic basis of across-utterance, within-speaker variation in speech naturalness for four speakers with dysarthria secondary to Parkinson's disease (PD). Speakers read sentences and produced spontaneous speech. Acoustic measures of fundamental frequency, phrase-final syllable lengthening, intensity and speech rate were obtained. A group of listeners judged speech naturalness using a nine-point Likert scale. Relationships between judgements of speech naturalness and acoustic measures were determined for individual speakers with PD. Relationships among acoustic measures also were quantified. Despite variability between speakers, measures of mean F0, intensity range, articulation rate, average syllable duration, duration of final syllables, vocalic nucleus length of final unstressed syllables and pitch accent of final syllables emerged as possible acoustic variables contributing to within-speaker variations in speech naturalness. Results suggest that acoustic measures correlate with speech naturalness, but in dysarthric speech they depend on the speaker due to the within-speaker variation in speech impairment.

  4. Role of the middle ear muscle apparatus in mechanisms of speech signal discrimination

    NASA Technical Reports Server (NTRS)

    Moroz, B. S.; Bazarov, V. G.; Sachenko, S. V.

    1980-01-01

    A method of impedance reflexometry was used to examine 101 students with hearing impairment in order to clarify the interrelation between speech discrimination and the state of the middle ear muscles. The ability to discriminate speech signals depends to some extent on the functional state of the intraaural muscles. Speech discrimination was greatly impaired when the stapedial muscle acoustic reflex (AR) was absent, when stimulation thresholds were low, and when the increase in reflex amplitude was very small. Discrimination was not impaired when the AR was present, relative thresholds were high, and reflex amplitude increased normally in response to speech signals of increasing intensity.

  5. Clear Speech Variants: An Acoustic Study in Parkinson's Disease

    PubMed Central

    Tjaden, Kris

    2016-01-01

    Purpose The authors investigated how different variants of clear speech affect segmental and suprasegmental acoustic measures of speech in speakers with Parkinson's disease and a healthy control group. Method A total of 14 participants with Parkinson's disease and 14 control participants served as speakers. Each speaker produced 18 different sentences selected from the Sentence Intelligibility Test (Yorkston & Beukelman, 1996). All speakers produced stimuli in 4 speaking conditions (habitual, clear, overenunciate, and hearing impaired). Segmental acoustic measures included vowel space area and first moment (M1) coefficient difference measures for consonant pairs. Second formant slope of diphthongs and measures of vowel and fricative durations were also obtained. Suprasegmental measures included fundamental frequency, sound pressure level, and articulation rate. Results For the majority of adjustments, all variants of clear speech instruction differed from the habitual condition. The overenunciate condition elicited the greatest magnitude of change for segmental measures (vowel space area, vowel durations) and the slowest articulation rates. The hearing impaired condition elicited the greatest fricative durations and suprasegmental adjustments (fundamental frequency, sound pressure level). Conclusions Findings have implications for a model of speech production for healthy speakers as well as for speakers with dysarthria. Findings also suggest that particular clear speech instructions may target distinct speech subsystems. PMID:27355431

  6. An Acoustic Measure for Word Prominence in Spontaneous Speech

    PubMed Central

    Wang, Dagen; Narayanan, Shrikanth

    2010-01-01

    An algorithm for automatic speech prominence detection is reported in this paper. We describe a comparative analysis on various acoustic features for word prominence detection and report results using a spoken dialog corpus with manually assigned prominence labels. The focus is on features such as spectral intensity and speech rate that are directly extracted from speech based on a correlation-based approach without requiring explicit linguistic or phonetic knowledge. Additionally, various pitch-based measures are studied with respect to their discriminating ability for prominence detection. A parametric scheme for modeling pitch plateau is proposed and this feature alone is found to outperform the traditional local pitch statistics. Two sets of experiments are used to explore the usefulness of the acoustic score generated using these features. The first set focuses on a more traditional way of word prominence detection based on a manually-tagged corpus. A 76.8% classification accuracy was achieved on a corpus of role-playing spoken dialogs. Due to difficulties in manually tagging speech prominence into discrete levels (categories), the second set of experiments focuses on evaluating the score indirectly. Specifically, through experiments on the Switchboard corpus, it is shown that the proposed acoustic score can discriminate between content words and function words in a statistically significant way. The relation between speech prominence and content/function words is also explored. Since prominent words tend to be predominantly content words, and since content words can be automatically marked from text-derived part of speech (POS) information, it is shown that the proposed acoustic score can be indirectly cross-validated through POS information. PMID:20454538

  7. Acoustic sleepiness detection: framework and validation of a speech-adapted pattern recognition approach.

    PubMed

    Krajewski, Jarek; Batliner, Anton; Golz, Martin

    2009-08-01

    This article describes a general framework for detecting sleepiness states on the basis of prosody, articulation, and speech-quality-related speech characteristics. The advantages of this automatic real-time approach are that obtaining speech data is nonobtrusive and free from sensor application and calibration efforts. Different types of acoustic features derived from speech, speaker, and emotion recognition were employed (frame-level-based speech features). Combining these features with high-level contour descriptors, which capture the temporal information of frame-level descriptor contours, results in 45,088 features per speech sample. In general, the measurement process follows the speech-adapted steps of pattern recognition: (1) recording speech, (2) preprocessing, (3) feature computation (using perceptual and signal-processing-related features such as, e.g., fundamental frequency, intensity, pause patterns, formants, and cepstral coefficients), (4) dimensionality reduction, (5) classification, and (6) evaluation. After a correlation-filter-based feature subset selection was employed on the feature space in order to find the most relevant features, different classification models were trained. The best model, the support-vector machine, achieved 86.1% classification accuracy in predicting sleepiness in a sleep deprivation study (two-class problem, N=12; 01.00-08.00 a.m.).
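
    The following is a heavily simplified sketch of the described chain (frame-level features, correlation-based selection, SVM); the feature matrix and labels are random placeholders, and unlike a faithful evaluation the feature selection here is not nested inside the cross-validation.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 500))      # 120 speech samples x 500 candidate features
y = rng.integers(0, 2, size=120)     # 0 = alert, 1 = sleepy (placeholder labels)

# Correlation filter: keep the 50 features most correlated with the label.
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
keep = np.argsort(corr)[-50:]

scores = cross_val_score(SVC(kernel="rbf"), X[:, keep], y, cv=5)
print("Cross-validated accuracy:", scores.mean())
```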

  8. Amplitude Modulations of Acoustic Communication Signals

    NASA Astrophysics Data System (ADS)

    Turesson, Hjalmar K.

    2011-12-01

    In human speech, amplitude modulations at 3 -- 8 Hz are important for discrimination and detection. Two different neurophysiological theories have been proposed to explain this effect. The first theory proposes that, as a consequence of neocortical synaptic dynamics, signals that are amplitude modulated at 3 -- 8 Hz are propagated better than un-modulated signals, or signals modulated above 8 Hz. This suggests that neural activity elicited by vocalizations modulated at 3 -- 8 Hz is optimally transmitted, and the vocalizations better discriminated and detected. The second theory proposes that 3 -- 8 Hz amplitude modulations interact with spontaneous neocortical oscillations. Specifically, vocalizations modulated at 3 -- 8 Hz entrain local populations of neurons, which, in turn, modulate the amplitude of high frequency gamma oscillations. This suggests that vocalizations modulated at 3 -- 8 Hz should induce stronger cross-frequency coupling. Similar to human speech, we found that macaque monkey vocalizations also are amplitude modulated between 3 and 8 Hz. Humans and macaque monkeys share similarities in vocal production, implying that the auditory systems subserving perception of acoustic communication signals also share similarities. Based on the similarities between human speech and macaque monkey vocalizations, we addressed how amplitude-modulated vocalizations are processed in the auditory cortex of macaque monkeys, and what behavioral relevance modulations may have. Recording single-neuron activity, as well as the activity of local populations of neurons, allowed us to test both of the neurophysiological theories presented above. We found that single-neuron responses to vocalizations amplitude modulated at 3 -- 8 Hz resulted in better stimulus discrimination than vocalizations lacking 3 -- 8 Hz modulations, and that the effect most likely was mediated by synaptic dynamics. In contrast, we failed to find support for the oscillation-based model proposing a
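
    As an illustration of quantifying 3 -- 8 Hz amplitude modulation in a vocalization, the sketch below computes the fraction of envelope-spectrum energy in that band; the band edges and function name are illustrative choices.

```python
import numpy as np
from scipy.signal import hilbert

def am_band_fraction(x, fs, lo=3.0, hi=8.0):
    """Fraction of amplitude-envelope spectral energy in the lo-hi Hz band."""
    env = np.abs(hilbert(x))                 # amplitude envelope via the analytic signal
    env = env - env.mean()
    spec = np.abs(np.fft.rfft(env)) ** 2
    freqs = np.fft.rfftfreq(len(env), 1.0 / fs)
    return spec[(freqs >= lo) & (freqs <= hi)].sum() / spec.sum()
```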

  9. Effects of human fatigue on speech signals

    NASA Astrophysics Data System (ADS)

    Stamoulis, Catherine

    2004-05-01

    Cognitive performance may be significantly affected by fatigue. In the case of critical personnel, such as pilots, monitoring human fatigue is essential to ensure safety and success of a given operation. One of the modalities that may be used for this purpose is speech, which is sensitive to respiratory changes and increased muscle tension of vocal cords, induced by fatigue. Age, gender, vocal tract length, physical and emotional state may significantly alter speech intensity, duration, rhythm, and spectral characteristics. In addition to changes in speech rhythm, fatigue may also affect the quality of speech, such as articulation. In a noisy environment, detecting fatigue-related changes in speech signals, particularly subtle changes at the onset of fatigue, may be difficult. Therefore, in a performance-monitoring system, speech parameters which are significantly affected by fatigue need to be identified and extracted from input signals. For this purpose, a series of experiments was performed under slowly varying cognitive load conditions and at different times of the day. The results of the data analysis are presented here.

  10. Acoustic Characteristics of Ataxic Speech in Japanese Patients with Spinocerebellar Degeneration (SCD)

    ERIC Educational Resources Information Center

    Ikui, Yukiko; Tsukuda, Mamoru; Kuroiwa, Yoshiyuki; Koyano, Shigeru; Hirose, Hajime; Taguchi, Takahide

    2012-01-01

    Background: In English- and German-speaking countries, ataxic speech is often described as showing scanning based on acoustic impressions. Although the term "scanning" is generally considered to represent abnormal speech features including prosodic excess or insufficiency, any precise acoustic analysis of ataxic speech has not been…

  11. Accuracy of perceptual and acoustic methods for the detection of inspiratory loci in spontaneous speech.

    PubMed

    Wang, Yu-Tsai; Nip, Ignatius S B; Green, Jordan R; Kent, Ray D; Kent, Jane Finley; Ullman, Cara

    2012-12-01

    The present study investigates the accuracy of perceptually and acoustically determined inspiratory loci in spontaneous speech for the purpose of identifying breath groups. Sixteen participants were asked to talk about simple topics in daily life at a comfortable speaking rate and loudness while connected to a pneumotach and audio microphone. The locations of inspiratory loci were determined on the basis of the aerodynamic signal, which served as a reference for loci identified perceptually and acoustically. Signal detection theory was used to evaluate the accuracy of the methods. The results showed that the greatest accuracy in pause detection was achieved (1) perceptually, on the basis of agreement between at least two of three judges, and (2) acoustically, using a pause duration threshold of 300 ms. In general, the perceptually based method was more accurate than was the acoustically based method. Inconsistencies among perceptually determined, acoustically determined, and aerodynamically determined inspiratory loci for spontaneous speech should be weighed in selecting a method of breath group determination.
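
    A minimal sketch of an acoustically based pause detector follows, using an RMS-energy criterion together with the 300 ms duration threshold reported above; the energy threshold and frame length are placeholders.

```python
import numpy as np

def detect_pauses(x, fs, frame_ms=10, energy_thresh=0.01, min_pause_ms=300):
    """Return (onset_s, offset_s) of low-energy stretches lasting at least min_pause_ms."""
    frame_len = int(fs * frame_ms / 1000)
    n_frames = len(x) // frame_len
    rms = np.array([np.sqrt(np.mean(x[i * frame_len:(i + 1) * frame_len] ** 2))
                    for i in range(n_frames)])
    silent = rms < energy_thresh
    pauses, start = [], None
    for i, s in enumerate(silent):
        if s and start is None:
            start = i
        elif not s and start is not None:
            if (i - start) * frame_ms >= min_pause_ms:
                pauses.append((start * frame_ms / 1000, i * frame_ms / 1000))
            start = None
    if start is not None and (n_frames - start) * frame_ms >= min_pause_ms:
        pauses.append((start * frame_ms / 1000, n_frames * frame_ms / 1000))
    return pauses
```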

  12. Speech intelligibility in complex acoustic environments in young children

    NASA Astrophysics Data System (ADS)

    Litovsky, Ruth

    2003-04-01

    While the auditory system undergoes tremendous maturation during the first few years of life, it has become clear that in complex scenarios when multiple sounds occur and when echoes are present, children's performance is significantly worse than that of their adult counterparts. The ability of children (3-7 years of age) to understand speech in a simulated multi-talker environment and to benefit from spatial separation of the target and competing sounds was investigated. In these studies, competing sources varied in number, location, and content (speech, modulated or unmodulated speech-shaped noise, and time-reversed speech). The acoustic spaces also varied in size and amount of reverberation. Finally, children with chronic otitis media who received binaural training were tested pre- and post-training on a subset of conditions. Results indicated the following. (1) Children experienced significantly more masking than adults, even in the simplest conditions tested. (2) When the target and competing sounds were spatially separated, speech intelligibility improved, but the amount varied with age, type of competing sound, and number of competitors. (3) In a large reverberant classroom there was no benefit of spatial separation. (4) Binaural training improved speech intelligibility performance in children with otitis media. Future work includes similar studies in children with unilateral and bilateral cochlear implants. [Work supported by NIDCD, DRF, and NOHR.]

  13. Spatial acoustic signal processing for immersive communication

    NASA Astrophysics Data System (ADS)

    Atkins, Joshua

    Computing is rapidly becoming ubiquitous as users expect devices that can augment and interact naturally with the world around them. In these systems it is necessary to have an acoustic front-end that is able to capture and reproduce natural human communication. Whether the end point is a speech recognizer or another human listener, the reduction of noise, reverberation, and acoustic echoes are all necessary and complex challenges. The focus of this dissertation is to provide a general method for approaching these problems using spherical microphone and loudspeaker arrays. In this work, a theory of capturing and reproducing three-dimensional acoustic fields is introduced from a signal processing perspective. In particular, the decomposition of the spatial part of the acoustic field into an orthogonal basis of spherical harmonics provides not only a general framework for analysis, but also many processing advantages. The spatial sampling error limits the upper frequency range with which a sound field can be accurately captured or reproduced. In broadband arrays, the cost and complexity of using multiple transducers is an issue. This work provides a flexible optimization method for determining the location of array elements to minimize the spatial aliasing error. The low frequency array processing ability is also limited by the SNR, mismatch, and placement error of transducers. To address this, a robust processing method is introduced and used to design a reproduction system for rendering over arbitrary loudspeaker arrays or binaurally over headphones. In addition to the beamforming problem, the multichannel acoustic echo cancellation (MCAEC) issue is also addressed. An MCAEC must adaptively estimate and track the constantly changing loudspeaker-room-microphone response to remove the sound field presented over the loudspeakers from that captured by the microphones. In the multichannel case, the system is overdetermined and many adaptive schemes fail to converge to
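
    The sketch below illustrates one common way to encode spherical-microphone-array pressures into spherical-harmonic coefficients by least squares; the microphone positions and pressures are placeholders, and the radial (mode-strength) equalization a practical rigid-sphere array needs is omitted.

```python
import numpy as np
from scipy.special import sph_harm

def sh_matrix(order, azimuth, colatitude):
    """Rows: microphones; columns: spherical harmonics Y_n^m up to the given order."""
    cols = [sph_harm(m, n, azimuth, colatitude)
            for n in range(order + 1) for m in range(-n, n + 1)]
    return np.column_stack(cols)

order, n_mics = 3, 32
rng = np.random.default_rng(1)
azimuth = rng.uniform(0, 2 * np.pi, n_mics)          # placeholder microphone azimuths
colatitude = np.arccos(rng.uniform(-1, 1, n_mics))   # placeholder microphone colatitudes
pressures = rng.normal(size=n_mics)                  # placeholder pressures at one frequency

Y = sh_matrix(order, azimuth, colatitude)            # 32 x 16 for order 3
coeffs, *_ = np.linalg.lstsq(Y, pressures, rcond=None)   # least-squares SH coefficients
```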

  14. Tongue-Palate Contact Pressure, Oral Air Pressure, and Acoustics of Clear Speech

    ERIC Educational Resources Information Center

    Searl, Jeff; Evitts, Paul M.

    2013-01-01

    Purpose: The authors compared articulatory contact pressure (ACP), oral air pressure (Po), and speech acoustics for conversational versus clear speech. They also assessed the relationship of these measures to listener perception. Method: Twelve adults with normal speech produced monosyllables in a phrase using conversational and clear speech.…

  15. Effects of age, acoustic challenge, and verbal working memory on recall of narrative speech

    PubMed Central

    Ward, Caitlin M.; Rogers, Chad S.; Van Engen, Kristin J.; Peelle, Jonathan E.

    2016-01-01

    Background A common goal during speech comprehension is to remember what we have heard. Encoding speech into long-term memory frequently requires processes such as verbal working memory that may also be involved in processing degraded speech. Here we tested whether young and older adult listeners’ memory for short stories was worse when the stories were acoustically degraded, or whether the additional contextual support provided by a narrative would protect against these effects. Methods We tested 30 young adults (aged 18–28 years) and 30 older adults (aged 65–79 years) with good self-reported hearing. Participants heard short stories that were presented as normal (unprocessed) speech, or acoustically degraded using a noise vocoding algorithm with 24 or 16 channels. The degraded stories were still fully intelligible. Following each story, participants were asked to repeat the story in as much detail as possible. Recall was scored using a modified idea unit scoring approach, which included separately scoring hierarchical levels of narrative detail. Results Memory for acoustically degraded stories was significantly worse than for normal stories at some levels of narrative detail. Older adults’ memory for the stories was significantly worse overall, but there was no interaction between age and acoustic clarity or level of narrative detail. Verbal working memory (assessed by reading span) significantly correlated with recall accuracy for both young and older adults, whereas hearing ability (better ear pure-tone average) did not. Conclusion Our findings are consistent with a framework in which the additional cognitive demands caused by a degraded acoustic signal use resources that would otherwise be available for memory encoding for both young and older adults. Verbal working memory is a likely candidate for supporting both of these processes. PMID:26683044
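
    For readers unfamiliar with noise vocoding, the sketch below implements a generic channel vocoder of the kind used to degrade the stories: the speech is split into bands, each band's envelope modulates band-limited noise, and the bands are summed. The filterbank layout is illustrative, not the authors' exact algorithm; the channel count (e.g., 16 or 24) sets the degradation level.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels=16, f_lo=80.0, f_hi=6000.0):
    """Replace each band's fine structure with band-limited noise, keeping its envelope.
    Requires f_hi < fs / 2."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    noise = np.random.randn(len(x))
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band_env = np.abs(hilbert(sosfiltfilt(sos, x)))    # speech-band envelope
        carrier = sosfiltfilt(sos, noise)                  # band-limited noise carrier
        out += band_env * carrier
    return out
```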

  16. Language identification from visual-only speech signals

    PubMed Central

    Ronquest, Rebecca E.; Levi, Susannah V.; Pisoni, David B.

    2010-01-01

    Our goal in the present study was to examine how observers identify English and Spanish from visual-only displays of speech. First, we replicated the recent findings of Soto-Faraco et al. (2007) with Spanish and English bilingual and monolingual observers using different languages and a different experimental paradigm (identification). We found that prior linguistic experience affected response bias but not sensitivity (Experiment 1). In two additional experiments, we investigated the visual cues that observers use to complete the language-identification task. The results of Experiment 2 indicate that some lexical information is available in the visual signal but that it is limited. Acoustic analyses confirmed that our Spanish and English stimuli differed acoustically with respect to linguistic rhythmic categories. In Experiment 3, we tested whether this rhythmic difference could be used by observers to identify the language when the visual stimuli are temporally reversed, thereby eliminating lexical information but retaining rhythmic differences. The participants performed above chance even in the backward condition, suggesting that the rhythmic differences between the two languages may aid language identification in visual-only speech signals. The results of Experiments 3A and 3B also confirm previous findings that increased stimulus length facilitates language identification. Taken together, the results of these three experiments replicate earlier findings and also show that prior linguistic experience, lexical information, rhythmic structure, and utterance length influence visual-only language identification. PMID:20675804

  17. Adding articulatory features to acoustic features for automatic speech recognition

    SciTech Connect

    Zlokarnik, I.

    1995-05-01

    A hidden-Markov-model (HMM) based speech recognition system was evaluated that makes use of simultaneously recorded acoustic and articulatory data. The articulatory measurements were gathered by means of electromagnetic articulography and describe the movement of small coils fixed to the speakers' tongue and jaw during the production of German V1CV2 sequences [P. Hoole and S. Gfoerer, J. Acoust. Soc. Am. Suppl. 1 87, S123 (1990)]. Using the coordinates of the coil positions as an articulatory representation, acoustic and articulatory features were combined to make up an acoustic-articulatory feature vector. The discriminant power of this combined representation was evaluated for two subjects on a speaker-dependent isolated word recognition task. When the articulatory measurements were used both for training and testing the HMMs, the articulatory representation was capable of reducing the error rate of comparable acoustic-based HMMs by a relative percentage of more than 60%. In a separate experiment, the articulatory movements during the testing phase were estimated using a multilayer perceptron that performed an acoustic-to-articulatory mapping. Under these more realistic conditions, when articulatory measurements are only available during the training, the error rate could be reduced by a relative percentage of 18% to 25%.
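
    The sketch below illustrates the feature fusion described above: per-frame acoustic features concatenated with articulatory coil coordinates and modeled with a Gaussian HMM. MFCCs, hmmlearn, and the file name are illustrative stand-ins, and the articulatory stream is a random placeholder rather than articulography data.

```python
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM    # illustrative toolkit choice

# "vcv_utterance.wav" is a placeholder recording of one token.
y, sr = librosa.load("vcv_utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T        # (n_frames, 13) acoustic stream

n_frames = mfcc.shape[0]
articulatory = np.random.randn(n_frames, 8)                 # placeholder: x/y of four coils

combined = np.hstack([mfcc, articulatory])                  # acoustic-articulatory vectors
model = GaussianHMM(n_components=5, covariance_type="diag").fit(combined)
print("Per-frame log-likelihood:", model.score(combined) / n_frames)
```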

  18. Effects and modeling of phonetic and acoustic confusions in accented speech

    NASA Astrophysics Data System (ADS)

    Fung, Pascale; Liu, Yi

    2005-11-01

    Accented speech recognition is more challenging than standard speech recognition due to the effects of phonetic and acoustic confusions. Phonetic confusion in accented speech occurs when an expected phone is pronounced as a different one, which leads to erroneous recognition. Acoustic confusion occurs when the pronounced phone is found to lie acoustically between two baseform models and can be equally recognized as either one. We propose that it is necessary to analyze and model these confusions separately in order to improve accented speech recognition without degrading standard speech recognition. Since low phonetic confusion units in accented speech do not give rise to automatic speech recognition errors, we focus on analyzing and reducing phonetic and acoustic confusability under high phonetic confusion conditions. We propose using likelihood ratio test to measure phonetic confusion, and asymmetric acoustic distance to measure acoustic confusion. Only accent-specific phonetic units with low acoustic confusion are used in an augmented pronunciation dictionary, while phonetic units with high acoustic confusion are reconstructed using decision tree merging. Experimental results show that our approach is effective and superior to methods modeling phonetic confusion or acoustic confusion alone in accented speech, with a significant 5.7% absolute WER reduction, without degrading standard speech recognition.

  19. A 94-GHz millimeter-wave sensor for speech signal acquisition.

    PubMed

    Li, Sheng; Tian, Ying; Lu, Guohua; Zhang, Yang; Lv, Hao; Yu, Xiao; Xue, Huijun; Zhang, Hua; Wang, Jianqi; Jing, Xijing

    2013-10-24

    High frequency millimeter-wave (MMW) radar-like sensors enable the detection of speech signals. This novel non-acoustic speech detection method has some special advantages not offered by traditional microphones, such as preventing strong-acoustic interference, high directional sensitivity with penetration, and long detection distance. A 94-GHz MMW radar sensor was employed in this study to test its speech acquisition ability. A 34-GHz zero intermediate frequency radar, a 34-GHz superheterodyne radar, and a microphone were also used for comparison purposes. A short-time phase-spectrum-compensation algorithm was used to enhance the detected speech. The results reveal that the 94-GHz radar sensor showed the highest sensitivity and obtained the highest speech quality subjective measurement score. This result suggests that the MMW radar sensor has better performance than a traditional microphone in terms of speech detection for detection distances longer than 1 m. As a substitute for the traditional speech acquisition method, this novel speech acquisition method demonstrates a large potential for many speech related applications.

  20. A 94-GHz Millimeter-Wave Sensor for Speech Signal Acquisition

    PubMed Central

    Li, Sheng; Tian, Ying; Lu, Guohua; Zhang, Yang; Lv, Hao; Yu, Xiao; Xue, Huijun; Zhang, Hua; Wang, Jianqi; Jing, Xijing

    2013-01-01

    High frequency millimeter-wave (MMW) radar-like sensors enable the detection of speech signals. This novel non-acoustic speech detection method has some special advantages not offered by traditional microphones, such as preventing strong-acoustic interference, high directional sensitivity with penetration, and long detection distance. A 94-GHz MMW radar sensor was employed in this study to test its speech acquisition ability. A 34-GHz zero intermediate frequency radar, a 34-GHz superheterodyne radar, and a microphone were also used for comparison purposes. A short-time phase-spectrum-compensation algorithm was used to enhance the detected speech. The results reveal that the 94-GHz radar sensor showed the highest sensitivity and obtained the highest speech quality subjective measurement score. This result suggests that the MMW radar sensor has better performance than a traditional microphone in terms of speech detection for detection distances longer than 1 m. As a substitute for the traditional speech acquisition method, this novel speech acquisition method demonstrates a large potential for many speech related applications. PMID:24284764

  1. Speech Recognition and Acoustic Features in Combined Electric and Acoustic Stimulation

    ERIC Educational Resources Information Center

    Yoon, Yang-soo; Li, Yongxin; Fu, Qian-Jie

    2012-01-01

    Purpose: In this study, the authors aimed to identify speech information processed by a hearing aid (HA) that is additive to information processed by a cochlear implant (CI) as a function of signal-to-noise ratio (SNR). Method: Speech recognition was measured with CI alone, HA alone, and CI + HA. Ten participants were separated into 2 groups; good…

  2. Massively parallel network architectures for automatic recognition of visual speech signals. Final technical report

    SciTech Connect

    Sejnowski, T.J.; Goldstein, M.

    1990-01-01

    This research sought to produce a massively-parallel network architecture that could interpret speech signals from video recordings of human talkers. This report summarizes the project's results: (1) A corpus of video recordings from two human speakers was analyzed with image processing techniques and used as the data for this study; (2) We demonstrated that a feedforward network could be trained to categorize vowels from these talkers. The performance was comparable to that of nearest-neighbor techniques and to trained humans on the same data; (3) We developed a novel approach to sensory fusion by training a network to transform from facial images to short-time spectral amplitude envelopes. This information can be used to increase the signal-to-noise ratio and hence the performance of acoustic speech recognition systems in noisy environments; (4) We explored the use of recurrent networks to perform the same mapping for continuous speech. Results of this project demonstrate the feasibility of adding a visual speech recognition component to enhance existing speech recognition systems. Such a combined system could be used in noisy environments, such as cockpits, where improved communication is needed. This demonstration of presymbolic fusion of visual and acoustic speech signals is consistent with our current understanding of human speech perception.

  3. Array Signal Processing Under Model Errors With Application to Speech Separation

    DTIC Science & Technology

    1992-10-31

  4. Massively-Parallel Architectures for Automatic Recognition of Visual Speech Signals

    DTIC Science & Technology

    1988-10-12

    sometimes the tongue), others are not. The visible articulators' contribution to the acoustic signal results in speech sounds that are much more…

  5. Speech Intelligibility with Acoustic and Contact Microphones

    DTIC Science & Technology

    2005-04-01

    each category containing 16 word pairs that differ only in the initial consonant. The six consonant categories are voicing, nasality, sustention, … In the nasality category, for example, nasals are paired with their voiced bilabial stop counterparts, e.g., meat (nasal) vs. beat (voiced, bilabial stop). … The current results clearly demonstrate that while the throat microphone enhances the signal-to-noise ratio, the…

  6. Quantifying the effect of compression hearing aid release time on speech acoustics and intelligibility.

    PubMed

    Jenstad, Lorienne M; Souza, Pamela E

    2005-06-01

    Compression hearing aids have the inherent, and often adjustable, feature of release time from compression. Research to date does not provide a consensus on how to choose or set release time. The current study had 2 purposes: (a) a comprehensive evaluation of the acoustic effects of release time for a single-channel compression system in quiet and (b) an evaluation of the relation between the acoustic changes and speech recognition. The release times under study were 12, 100, and 800 ms. All of the stimuli were VC syllables from the Nonsense Syllable Task spoken by a female talker. The stimuli were processed through a hearing aid simulator at 3 input levels. Two acoustic measures were made on individual syllables: the envelope-difference index and CV ratio. These measurements allowed for quantification of the short-term amplitude characteristics of the speech signal and the changes to these amplitude characteristics caused by compression. The acoustic analyses revealed statistically significant effects among the 3 release times. The size of the effect was dependent on characteristics of the phoneme. Twelve listeners with moderate sensorineural hearing loss were tested for their speech recognition for the same stimuli. Although release time for this single-channel, 3:1 compression ratio system did not directly predict overall intelligibility for these nonsense syllables in quiet, the acoustic measurements reflecting the changes due to release time were significant predictors of phoneme recognition. Increased temporal-envelope distortion was predictive of reduced recognition for some individual phonemes, which is consistent with previous research on the importance of relative amplitude as a cue to syllable recognition for some phonemes.
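
    The sketch below computes a generic envelope-difference measure between an unprocessed and a compression-processed syllable. The published envelope-difference index uses a specific scaling, so the normalization shown here is illustrative only; the envelope cutoff is also an assumed value.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def envelope(x, fs, cutoff_hz=50.0):
    """Intensity envelope: rectify, then low-pass filter."""
    sos = butter(2, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, np.abs(x))

def envelope_difference(x_ref, x_proc, fs):
    """Mean absolute difference of level-normalized envelopes (0 = identical)."""
    e1, e2 = envelope(x_ref, fs), envelope(x_proc, fs)
    e1, e2 = e1 / e1.mean(), e2 / e2.mean()      # remove overall level differences
    return np.mean(np.abs(e1 - e2)) / 2.0
```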

  7. Suppressed Alpha Oscillations Predict Intelligibility of Speech and its Acoustic Details

    PubMed Central

    Weisz, Nathan

    2012-01-01

    Modulations of human alpha oscillations (8–13 Hz) accompany many cognitive processes, but their functional role in auditory perception has proven elusive: Do oscillatory dynamics of alpha reflect acoustic details of the speech signal and are they indicative of comprehension success? Acoustically presented words were degraded in acoustic envelope and spectrum in an orthogonal design, and electroencephalogram responses in the frequency domain were analyzed in 24 participants, who rated word comprehensibility after each trial. First, the alpha power suppression during and after a degraded word depended monotonically on spectral and, to a lesser extent, envelope detail. The magnitude of this alpha suppression exhibited an additional and independent influence on later comprehension ratings. Second, source localization of alpha suppression yielded superior parietal, prefrontal, as well as anterior temporal brain areas. Third, multivariate classification of the time–frequency pattern across participants showed that patterns of late posterior alpha power allowed best for above-chance classification of word intelligibility. Results suggest that both magnitude and topography of late alpha suppression in response to single words can indicate a listener's sensitivity to acoustic features and the ability to comprehend speech under adverse listening conditions. PMID:22100354

  8. Talker Differences in Clear and Conversational Speech: Acoustic Characteristics of Vowels

    ERIC Educational Resources Information Center

    Ferguson, Sarah Hargus; Kewley-Port, Diane

    2007-01-01

    Purpose: To determine the specific acoustic changes that underlie improved vowel intelligibility in clear speech. Method: Seven acoustic metrics were measured for conversational and clear vowels produced by 12 talkers--6 who previously were found (S. H. Ferguson, 2004) to produce a large clear speech vowel intelligibility effect for listeners with…

  9. Automatic Speech Recognition from Neural Signals: A Focused Review

    PubMed Central

    Herff, Christian; Schultz, Tanja

    2016-01-01

    Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to loud environments, the need not to disturb bystanders, or an inability to produce speech (i.e., patients suffering from locked-in syndrome). For these reasons it would be highly desirable not to speak but simply to envision oneself saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques used from neural signals, we discuss the Brain-to-text system. PMID:27729844

  10. Automatic Speech Recognition from Neural Signals: A Focused Review.

    PubMed

    Herff, Christian; Schultz, Tanja

    2016-01-01

    Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to loud environments, the need not to disturb bystanders, or an inability to produce speech (i.e., patients suffering from locked-in syndrome). For these reasons it would be highly desirable not to speak but simply to envision oneself saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques used from neural signals, we discuss the Brain-to-text system.

  11. Acoustic Analysis of the Voiced-Voiceless Distinction in Dutch Tracheoesophageal Speech

    ERIC Educational Resources Information Center

    Jongmans, Petra; Wempe, Ton G.; van Tinteren, Harm; Hilgers, Frans J. M.; Pols, Louis C. W.; van As-Brooks, Corina J.

    2010-01-01

    Purpose: Confusions between voiced and voiceless plosives and voiced and voiceless fricatives are common in Dutch tracheoesophageal (TE) speech. This study investigates (a) which acoustic measures are found to convey a correct voicing contrast in TE speech and (b) whether different measures are found in TE speech than in normal laryngeal (NL)…

  12. Perceptual and Acoustic Reliability Estimates for the Speech Disorders Classification System (SDCS)

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.

    2010-01-01

    A companion paper describes three extensions to a classification system for paediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). The SDCS uses perceptual and acoustic data reduction methods to obtain information on a speaker's speech, prosody, and voice. The present paper provides reliability estimates for…

  13. Acoustic Predictors of Intelligibility for Segmentally Interrupted Speech: Temporal Envelope, Voicing, and Duration

    ERIC Educational Resources Information Center

    Fogerty, Daniel

    2013-01-01

    Purpose: Temporal interruption limits the perception of speech to isolated temporal glimpses. An analysis was conducted to determine the acoustic parameter that best predicts speech recognition from temporal fragments that preserve different types of speech information--namely, consonants and vowels. Method: Young listeners with normal hearing…

  14. Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech

    PubMed Central

    Leong, Victoria; Goswami, Usha

    2015-01-01

    When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72–82% (freely-read CDS) and 90–98% (rhythmically-regular CDS) stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across
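
    To make the three modulation timescales concrete, the sketch below band-pass filters the broadband amplitude envelope of a child-directed speech recording around stress-, syllable- and phoneme-rate modulation bands. This is only an illustration of the idea; it is not the published S-AMPH algorithm (which uses a spectral filterbank and Principal Components Analysis), and the band edges, envelope rate, and integer sampling rates are assumptions.

    ```python
    import numpy as np
    from scipy.signal import hilbert, butter, filtfilt, resample_poly

    def modulation_bands(x, fs, env_fs=100,
                         bands=((0.9, 2.5), (2.5, 12.0), (12.0, 40.0))):
        """Split the amplitude envelope into stress-, syllable- and phoneme-rate AM bands.

        Band edges and the 100 Hz envelope rate are illustrative choices, not the
        published S-AMPH parameters. Assumes integer sampling rates.
        """
        env = np.abs(hilbert(x))                      # broadband amplitude envelope
        env = resample_poly(env, env_fs, fs)          # downsample envelope for stable filtering
        names = ("stress_~2Hz", "syllable_~5Hz", "phoneme_~20Hz")
        out = {}
        for name, (lo, hi) in zip(names, bands):
            b, a = butter(2, [lo / (env_fs / 2), hi / (env_fs / 2)], btype="band")
            out[name] = filtfilt(b, a, env)           # zero-phase band-pass of the envelope
        return out
    ```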

  15. Method and apparatus for obtaining complete speech signals for speech recognition applications

    NASA Technical Reports Server (NTRS)

    Abrash, Victor (Inventor); Cesari, Federico (Inventor); Franco, Horacio (Inventor); George, Christopher (Inventor); Zheng, Jing (Inventor)

    2009-01-01

    The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
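
    A minimal sketch of the circular-buffer idea described in the abstract: frames are continuously written to a fixed-size ring buffer so that, when the user command arrives, audio from before the command can be prepended to the captured signal. Frame size, buffer depth, and the method names are illustrative assumptions, not the patented implementation.

    ```python
    from collections import deque

    class RingRecorder:
        """Continuously keeps the most recent `pre_frames` audio frames."""

        def __init__(self, pre_frames=50):            # e.g. 50 x 20 ms = 1 s of look-back
            self.buffer = deque(maxlen=pre_frames)    # old frames are discarded automatically
            self.capture = None

        def push(self, frame):
            """Called for every incoming frame of the audio stream."""
            self.buffer.append(frame)
            if self.capture is not None:
                self.capture.append(frame)

        def start_command(self):
            """User pressed 'start': seed the capture with the buffered pre-command audio."""
            self.capture = list(self.buffer)

        def stop_command(self):
            """User pressed 'stop': return the augmented signal for endpointing/recognition."""
            audio, self.capture = self.capture, None
            return audio
    ```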

  16. Speech as a breakthrough signaling resource in the cognitive evolution of biological complex adaptive systems.

    PubMed

    Mattei, Tobias A

    2014-12-01

    In self-adapting dynamical systems, a significant improvement in the signaling flow among agents constitutes one of the most powerful triggering events for the emergence of new complex behaviors. Ackermann and colleagues' comprehensive phylogenetic analysis of the brain structures involved in acoustic communication provides further evidence of the essential role which speech, as a breakthrough signaling resource, has played in the evolutionary development of human cognition viewed from the standpoint of complex adaptive system analysis.

  17. Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing.

    PubMed

    Doelling, Keith B; Arnal, Luc H; Ghitza, Oded; Poeppel, David

    2014-01-15

    A growing body of research suggests that intrinsic neuronal slow (<10 Hz) oscillations in auditory cortex appear to track incoming speech and other spectro-temporally complex auditory signals. Within this framework, several recent studies have identified critical-band temporal envelopes as the specific acoustic feature being reflected by the phase of these oscillations. However, how this alignment between speech acoustics and neural oscillations might underpin intelligibility is unclear. Here we test the hypothesis that the 'sharpness' of temporal fluctuations in the critical band envelope acts as a temporal cue to speech syllabic rate, driving delta-theta rhythms to track the stimulus and facilitate intelligibility. We interpret our findings as evidence that sharp events in the stimulus cause cortical rhythms to re-align and parse the stimulus into syllable-sized chunks for further decoding. Using magnetoencephalographic recordings, we show that by removing temporal fluctuations that occur at the syllabic rate, envelope-tracking activity is reduced. By artificially reinstating these temporal fluctuations, envelope-tracking activity is regained. These changes in tracking correlate with intelligibility of the stimulus. Together, the results suggest that the sharpness of fluctuations in the stimulus, as reflected in the cochlear output, drive oscillatory activity to track and entrain to the stimulus, at its syllabic rate. This process likely facilitates parsing of the stimulus into meaningful chunks appropriate for subsequent decoding, enhancing perception and intelligibility.

  18. Location and acoustic scale cues in concurrent speech recognition

    PubMed Central

    Ives, D. Timothy; Vestergaard, Martin D.; Kistler, Doris J.; Patterson, Roy D.

    2010-01-01

    Location and acoustic scale cues have both been shown to have an effect on the recognition of speech in multi-speaker environments. This study examines the interaction of these variables. Subjects were presented with concurrent triplets of syllables from a target voice and a distracting voice, and asked to recognize a specific target syllable. The task was made more or less difficult by changing (a) the location of the distracting speaker, (b) the scale difference between the two speakers, and/or (c) the relative level of the two speakers. Scale differences were produced by changing the vocal tract length and glottal pulse rate during syllable synthesis: 32 acoustic scale differences were used. Location cues were produced by convolving head-related transfer functions with the stimulus. The angle between the target speaker and the distracter was 0°, 4°, 8°, 16°, or 32° on the 0° horizontal plane. The relative level of the target to the distracter was 0 or −6 dB. The results show that location and scale difference interact, and the interaction is greatest when one of these cues is small. Increasing either the acoustic scale or the angle between target and distracter speakers quickly elevates performance to ceiling levels. PMID:20550271

  19. Mandarin Speech Perception in Combined Electric and Acoustic Stimulation

    PubMed Central

    Li, Yongxin; Zhang, Guoping; Galvin, John J.; Fu, Qian-Jie

    2014-01-01

    For deaf individuals with residual low-frequency acoustic hearing, combined use of a cochlear implant (CI) and hearing aid (HA) typically provides better speech understanding than with either device alone. Because of coarse spectral resolution, CIs do not provide fundamental frequency (F0) information that contributes to understanding of tonal languages such as Mandarin Chinese. The HA can provide good representation of F0 and, depending on the range of aided acoustic hearing, first and second formant (F1 and F2) information. In this study, Mandarin tone, vowel, and consonant recognition in quiet and noise was measured in 12 adult Mandarin-speaking bimodal listeners with the CI-only and with the CI+HA. Tone recognition was significantly better with the CI+HA in noise, but not in quiet. Vowel recognition was significantly better with the CI+HA in quiet, but not in noise. There was no significant difference in consonant recognition between the CI-only and the CI+HA in quiet or in noise. There was a wide range in bimodal benefit, with improvements often greater than 20 percentage points in some tests and conditions. The bimodal benefit was compared to CI subjects’ HA-aided pure-tone average (PTA) thresholds between 250 and 2000 Hz; subjects were divided into two groups: “better” PTA (<50 dB HL) or “poorer” PTA (>50 dB HL). The bimodal benefit differed significantly between groups only for consonant recognition. The bimodal benefit for tone recognition in quiet was significantly correlated with CI experience, suggesting that bimodal CI users learn to better combine low-frequency spectro-temporal information from acoustic hearing with temporal envelope information from electric hearing. Given the small number of subjects in this study (n = 12), further research with Chinese bimodal listeners may provide more information regarding the contribution of acoustic and electric hearing to tonal language perception. PMID:25386962

  20. Acoustical Characteristics of Mastication Sounds: Application of Speech Analysis Techniques

    NASA Astrophysics Data System (ADS)

    Brochetti, Denise

    Food scientists have used acoustical methods to study characteristics of mastication sounds in relation to food texture. However, a model for analysis of the sounds has not been identified, and reliability of the methods has not been reported. Therefore, speech analysis techniques were applied to mastication sounds, and variation in measures of the sounds was examined. To meet these objectives, two experiments were conducted. In the first experiment, a digital sound spectrograph generated waveforms and wideband spectrograms of sounds by 3 adult subjects (1 male, 2 females) for initial chews of food samples differing in hardness and fracturability. Acoustical characteristics were described and compared. For all sounds, formants appeared in the spectrograms, and energy occurred across a 0 to 8000-Hz range of frequencies. Bursts characterized waveforms for peanut, almond, raw carrot, ginger snap, and hard candy. Duration and amplitude of the sounds varied with the subjects. In the second experiment, the spectrograph was used to measure the duration, amplitude, and formants of sounds for the initial 2 chews of cylindrical food samples (raw carrot, teething toast) differing in diameter (1.27, 1.90, 2.54 cm). Six adult subjects (3 males, 3 females) having normal occlusions and temporomandibular joints chewed the samples between the molar teeth and with the mouth open. Ten repetitions per subject were examined for each food sample. Analysis of estimates of variation indicated an inconsistent intrasubject variation in the acoustical measures. Food type and sample diameter also affected the estimates, indicating the variable nature of mastication. Generally, intrasubject variation was greater than intersubject variation. Analysis of ranks of the data indicated that the effect of sample diameter on the acoustical measures was inconsistent and depended on the subject and type of food. If inferences are to be made concerning food texture from acoustical measures of mastication

  1. Automatic speech recognition using articulatory features from subject-independent acoustic-to-articulatory inversion.

    PubMed

    Ghosh, Prasanta Kumar; Narayanan, Shrikanth

    2011-10-01

    An automatic speech recognition approach is presented which uses articulatory features estimated by a subject-independent acoustic-to-articulatory inversion. The inversion allows estimation of articulatory features from any talker's speech acoustics using only an exemplary subject's articulatory-to-acoustic map. Results are reported on a broad class phonetic classification experiment on speech from English talkers using data from three distinct English talkers as exemplars for inversion. Results indicate that the inclusion of the articulatory information improves classification accuracy but the improvement is more significant when the speaking style of the exemplar and the talker are matched compared to when they are mismatched.

  2. Discrimination of speech stimuli based on neuronal response phase patterns depends on acoustics but not comprehension.

    PubMed

    Howard, Mary F; Poeppel, David

    2010-11-01

    Speech stimuli give rise to neural activity in the listener that can be observed as waveforms using magnetoencephalography. Although waveforms vary greatly from trial to trial due to activity unrelated to the stimulus, it has been demonstrated that spoken sentences can be discriminated based on theta-band (3-7 Hz) phase patterns in single-trial response waveforms. Furthermore, manipulations of the speech signal envelope and fine structure that reduced intelligibility were found to produce correlated reductions in discrimination performance, suggesting a relationship between theta-band phase patterns and speech comprehension. This study investigates the nature of this relationship, hypothesizing that theta-band phase patterns primarily reflect cortical processing of low-frequency (<40 Hz) modulations present in the acoustic signal and required for intelligibility, rather than processing exclusively related to comprehension (e.g., lexical, syntactic, semantic). Using stimuli that are quite similar to normal spoken sentences in terms of low-frequency modulation characteristics but are unintelligible (i.e., their time-inverted counterparts), we find that discrimination performance based on theta-band phase patterns is equal for both types of stimuli. Consistent with earlier findings, we also observe that whereas theta-band phase patterns differ across stimuli, power patterns do not. We use a simulation model of the single-trial response to spoken sentence stimuli to demonstrate that phase-locked responses to low-frequency modulations of the acoustic signal can account not only for the phase but also for the power results. The simulation offers insight into the interpretation of the empirical results with respect to phase-resetting and power-enhancement models of the evoked response.

  3. Discrimination of Speech Stimuli Based on Neuronal Response Phase Patterns Depends on Acoustics But Not Comprehension

    PubMed Central

    Howard, Mary F; Poeppel, David

    2010-01-01

    Speech stimuli give rise to neural activity in the listener that can be observed as waveforms using magnetoencephalography. Although waveforms vary greatly from trial to trial due to activity unrelated to the stimulus, it has been demonstrated that spoken sentences can be discriminated based on theta-band (3–7 Hz) phase patterns in single-trial response waveforms. Furthermore, manipulations of the speech signal envelope and fine structure that reduced intelligibility were found to produce correlated reductions in discrimination performance, suggesting a relationship between theta-band phase patterns and speech comprehension. This study investigates the nature of this relationship, hypothesizing that theta-band phase patterns primarily reflect cortical processing of low-frequency (<40 Hz) modulations present in the acoustic signal and required for intelligibility, rather than processing exclusively related to comprehension (e.g., lexical, syntactic, semantic). Using stimuli that are quite similar to normal spoken sentences in terms of low-frequency modulation characteristics but are unintelligible (i.e., their time-inverted counterparts), we find that discrimination performance based on theta-band phase patterns is equal for both types of stimuli. Consistent with earlier findings, we also observe that whereas theta-band phase patterns differ across stimuli, power patterns do not. We use a simulation model of the single-trial response to spoken sentence stimuli to demonstrate that phase-locked responses to low-frequency modulations of the acoustic signal can account not only for the phase but also for the power results. The simulation offers insight into the interpretation of the empirical results with respect to phase-resetting and power-enhancement models of the evoked response. PMID:20484530

  4. Negative blood oxygen level dependent signals during speech comprehension.

    PubMed

    Rodriguez Moreno, Diana; Schiff, Nicholas D; Hirsch, Joy

    2015-05-01

    Speech comprehension studies have generally focused on the isolation and function of regions with positive blood oxygen level dependent (BOLD) signals with respect to a resting baseline. Although regions with negative BOLD signals in comparison to a resting baseline have been reported in language-related tasks, their relationship to regions of positive signals is not fully appreciated. Based on the emerging notion that the negative signals may represent an active function in language tasks, the authors test the hypothesis that negative BOLD signals during receptive language are more associated with comprehension than content-free versions of the same stimuli. Regions associated with comprehension of speech were isolated by comparing responses to passive listening to natural speech to two incomprehensible versions of the same speech: one that was digitally time reversed and one that was muffled by removal of high frequencies. The signal polarity was determined by comparing the BOLD signal during each speech condition to the BOLD signal during a resting baseline. As expected, stimulation-induced positive signals relative to resting baseline were observed in the canonical language areas with varying signal amplitudes for each condition. Negative BOLD responses relative to resting baseline were observed primarily in frontoparietal regions and were specific to the natural speech condition. However, the BOLD signal remained indistinguishable from baseline for the unintelligible speech conditions. Variations in connectivity between brain regions with positive and negative signals were also specifically related to the comprehension of natural speech. These observations of anticorrelated signals related to speech comprehension are consistent with emerging models of cooperative roles represented by BOLD signals of opposite polarity.

  5. Frequency Spreading in Underwater Acoustic Signal Transmission.

    DTIC Science & Technology

    1980-04-15

    Spreading of the transmitted frequency can be expected in the received signal [1]-[18]. This frequency spreading behavior is the result of amplitude and phase modulation of the transmitted sinusoid by the moving surface, and the separation between the spectral lines at the receiving point is

  6. Speech masking and cancelling and voice obscuration

    DOEpatents

    Holzrichter, John F.

    2013-09-10

    A non-acoustic sensor is used to measure a user's speech; an obscuring acoustic signal is then broadcast that diminishes the user's vocal acoustic output intensity and/or distorts the voice sounds, making them unintelligible to persons nearby. The non-acoustic sensor is positioned proximate to or contacting a user's neck or head skin tissue for sensing speech production information.

  7. Speech Recognition and Acoustic Features in Combined Electric and Acoustic Stimulation

    PubMed Central

    Yoon, Yang-soo; Li, Yongxin; Fu, Qian-Jie

    2011-01-01

    Purpose This study aimed to identify speech information processed by a hearing aid (HA) that is additive to information processed by a cochlear implant (CI) as a function of signal-to-noise ratio (SNR). Method Speech recognition was measured with CI alone, HA alone, and CI+HA. Ten participants were separated into two groups: good (aided pure-tone average (PTA) < 55 dB) and poor (aided PTA ≥ 55 dB) at audiometric frequencies ≤ 1 kHz in HA. Results Results showed that the good aided PTA group derived a clear bimodal benefit (performance difference between CI+HA and CI alone) for vowel and sentence recognition in noise, while the poor aided PTA group received little benefit across speech tests and SNRs. Results also showed that a better aided PTA helped in processing cues embedded in both low and high frequencies; none of these cues were significantly perceived by the poor aided PTA group. Conclusions The aided PTA is an important indicator for bimodal advantage in speech perception. The lack of bimodal benefits in the poor group may be attributed to the non-optimal HA fitting. Bimodal listening provides a synergistic effect for cues in both low and high frequency components in speech. PMID:22199183

  8. Effect of Reflective Practice on Student Recall of Acoustics for Speech Science

    ERIC Educational Resources Information Center

    Walden, Patrick R.; Bell-Berti, Fredericka

    2013-01-01

    Researchers have developed models of learning through experience; however, these models are rarely named as a conceptual frame for educational research in the sciences. This study examined the effect of reflective learning responses on student recall of speech acoustics concepts. Two groups of undergraduate students enrolled in a speech science…

  9. Formant Centralization Ratio: A Proposal for a New Acoustic Measure of Dysarthric Speech

    ERIC Educational Resources Information Center

    Sapir, Shimon; Ramig, Lorraine O.; Spielman, Jennifer L.; Fox, Cynthia

    2010-01-01

    Purpose: The vowel space area (VSA) has been used as an acoustic metric of dysarthric speech, but with varying degrees of success. In this study, the authors aimed to test an alternative metric to the VSA--the "formant centralization ratio" (FCR), which is hypothesized to more effectively differentiate dysarthric from healthy speech and register…
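
    The abstract names the formant centralization ratio (FCR) but is truncated before defining it. A commonly cited formulation combines the first and second formants of the corner vowels /i/, /u/, and /a/; the sketch below assumes that formulation, so the original article should be consulted for the exact definition.

    ```python
    def formant_centralization_ratio(f1_i, f2_i, f1_u, f2_u, f1_a, f2_a):
        """Assumed FCR formulation: (F2u + F2a + F1i + F1u) / (F2i + F1a), formants in Hz.

        Larger values are taken to indicate more centralized (less distinct) vowels.
        """
        return (f2_u + f2_a + f1_i + f1_u) / (f2_i + f1_a)

    # Example with made-up formant values for a healthy talker:
    # /i/: F1=300, F2=2300;  /u/: F1=350, F2=900;  /a/: F1=750, F2=1200
    print(formant_centralization_ratio(300, 2300, 350, 900, 750, 1200))  # ~0.90
    ```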

  10. Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics

    PubMed Central

    Bradlow, Ann R.; Torretta, Gina M.; Pisoni, David B.

    2011-01-01

    This study used a multi-talker database containing intelligibility scores for 2000 sentences (20 talkers, 100 sentences), to identify talker-related correlates of speech intelligibility. We first investigated “global” talker characteristics (e.g., gender, F0 and speaking rate). Findings showed female talkers to be more intelligible as a group than male talkers. Additionally, we found a tendency for F0 range to correlate positively with higher speech intelligibility scores. However, F0 mean and speaking rate did not correlate with intelligibility. We then examined several fine-grained acoustic-phonetic talker-characteristics as correlates of overall intelligibility. We found that talkers with larger vowel spaces were generally more intelligible than talkers with reduced spaces. In investigating two cases of consistent listener errors (segment deletion and syllable affiliation), we found that these perceptual errors could be traced directly to detailed timing characteristics in the speech signal. Results suggest that a substantial portion of variability in normal speech intelligibility is traceable to specific acoustic-phonetic characteristics of the talker. Knowledge about these factors may be valuable for improving speech synthesis and recognition strategies, and for special populations (e.g., the hearing-impaired and second-language learners) who are particularly sensitive to intelligibility differences among talkers. PMID:21461127

  11. Multilevel Analysis in Analyzing Speech Data

    ERIC Educational Resources Information Center

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  12. Speech perception of sine-wave signals by children with cochlear implants

    PubMed Central

    Nittrouer, Susan; Kuess, Jamie; Lowenstein, Joanna H.

    2015-01-01

    Children need to discover linguistically meaningful structures in the acoustic speech signal. Being attentive to recurring, time-varying formant patterns helps in that process. However, that kind of acoustic structure may not be available to children with cochlear implants (CIs), thus hindering development. The major goal of this study was to examine whether children with CIs are as sensitive to time-varying formant structure as children with normal hearing (NH) by asking them to recognize sine-wave speech. The same materials were presented as speech in noise, as well, to evaluate whether any group differences might simply reflect general perceptual deficits on the part of children with CIs. Vocabulary knowledge, phonemic awareness, and “top-down” language effects were all also assessed. Finally, treatment factors were examined as possible predictors of outcomes. Results showed that children with CIs were as accurate as children with NH at recognizing sine-wave speech, but poorer at recognizing speech in noise. Phonemic awareness was related to that recognition. Top-down effects were similar across groups. Having had a period of bimodal stimulation near the time of receiving a first CI facilitated these effects. Results suggest that children with CIs have access to the important time-varying structure of vocal-tract formants. PMID:25994709

  13. Speech production knowledge in automatic speech recognition.

    PubMed

    King, Simon; Frankel, Joe; Livescu, Karen; McDermott, Erik; Richmond, Korin; Wester, Mirjam

    2007-02-01

    Although much is known about how speech is produced, and research into speech production has resulted in measured articulatory data, feature systems of different kinds, and numerous models, speech production knowledge is almost totally ignored in current mainstream approaches to automatic speech recognition. Representations of speech production allow simple explanations for many phenomena observed in speech which cannot be easily analyzed from either acoustic signal or phonetic transcription alone. In this article, a survey of a growing body of work in which such representations are used to improve automatic speech recognition is provided.

  14. The Use of Artificial Neural Networks to Estimate Speech Intelligibility from Acoustic Variables: A Preliminary Analysis.

    ERIC Educational Resources Information Center

    Metz, Dale Evan; And Others

    1992-01-01

    A preliminary scheme for estimating the speech intelligibility of hearing-impaired speakers from acoustic parameters, using a computerized artificial neural network to process mathematically the acoustic input variables, is outlined. Tests with 60 hearing-impaired speakers found the scheme to be highly accurate in identifying speakers separated by…

  15. The effects of noise on speech and warning signals

    NASA Astrophysics Data System (ADS)

    Suter, Alice H.

    1989-06-01

    To assess the effects of noise on speech communication it is necessary to examine certain characteristics of the speech signal. Speech level can be measured by a variety of methods, none of which has yet been standardized, and it should be kept in mind that vocal effort increases with background noise level and with different types of activity. Noise and filtering commonly degrade the speech signal, especially as it is transmitted through communications systems. Intelligibility is also adversely affected by distance, reverberation, and monaural listening. Communication systems currently in use may cause strain and delays on the part of the listener, but there are many possibilities for improvement. Individuals who need to communicate in noise may be subject to voice disorders. Shouted speech becomes progressively less intelligible at high voice levels, but improvements can be realized when talkers use clear speech. Tolerable listening levels are lower for negative than for positive S/Ns, and comfortable listening levels should be at a S/N of at least 5 dB, and preferably above 10 dB. Popular methods to predict speech intelligibility in noise include the Articulation Index, Speech Interference Level, Speech Transmission Index, and the sound level meter's A-weighting network. This report describes these methods, discussing certain advantages and disadvantages of each, and shows their interrelations.
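
    Of the prediction methods listed above, the Speech Interference Level is the simplest to state: in its preferred-octave form it is usually taken as the arithmetic mean of the noise levels in the 500, 1000, 2000, and 4000 Hz octave bands. A minimal sketch under that assumption:

    ```python
    def speech_interference_level(band_levels_db):
        """SIL: arithmetic mean of octave-band noise levels (dB) at 500, 1000, 2000, 4000 Hz."""
        assert len(band_levels_db) == 4, "expects levels for the 500/1000/2000/4000 Hz bands"
        return sum(band_levels_db) / 4.0

    # Example: a noise with 72, 70, 65 and 60 dB in the four bands -> SIL = 66.75 dB
    print(speech_interference_level([72, 70, 65, 60]))
    ```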

  16. A maximum likelihood approach to estimating articulator positions from speech acoustics

    SciTech Connect

    Hogden, J.

    1996-09-23

    This proposal presents an algorithm called maximum likelihood continuity mapping (MALCOM) which recovers the positions of the tongue, jaw, lips, and other speech articulators from measurements of the sound-pressure waveform of speech. MALCOM differs from other techniques for recovering articulator positions from speech in three critical respects: it does not require training on measured or modeled articulator positions, it does not rely on any particular model of sound propagation through the vocal tract, and it recovers a mapping from acoustics to articulator positions that is linearly, not topographically, related to the actual mapping from acoustics to articulation. The approach categorizes short-time windows of speech into a finite number of sound types, and assumes the probability of using any articulator position to produce a given sound type can be described by a parameterized probability density function. MALCOM then uses maximum likelihood estimation techniques to: (1) find the most likely smooth articulator path given a speech sample and a set of distribution functions (one distribution function for each sound type), and (2) change the parameters of the distribution functions to better account for the data. Using this technique improves the accuracy of articulator position estimates compared to continuity mapping -- the only other technique that learns the relationship between acoustics and articulation solely from acoustics. The technique has potential application to computer speech recognition, speech synthesis and coding, teaching the hearing impaired to speak, improving foreign language instruction, and teaching dyslexics to read. 34 refs., 7 figs.

  17. Hot topics: Signal processing in acoustics

    NASA Astrophysics Data System (ADS)

    Candy, James

    2002-05-01

    Signal processing represents a technology that provides the mechanism to extract the desired information from noisy acoustical measurement data. The desired result can range from extracting a single number like sound intensity level in the case of marine mammals to the seemingly impossible task of imaging the complex bottom in a hostile ocean environment. Some of the latest approaches to solving acoustical processing problems including sophisticated Bayesian processors in architectural acoustics, iterative flaw removal processing for non-destructive evaluation, time-reversal imaging for buried objects and time-reversal receivers in communications as well as some of the exciting breakthroughs using so-called blind processing techniques for deconvolution are discussed. Processors discussed range from the simple to the sophisticated as dictated by the particular application. It is shown how processing techniques are crucial to extracting the required information for success in the underlying application.

  18. Language-specific developmental differences in speech production: A cross-language acoustic study

    PubMed Central

    Li, Fangfang

    2013-01-01

    Speech productions of 40 English- and 40 Japanese-speaking children (aged 2 to 5) were examined and compared with the speech produced by 20 adult speakers (10 speakers per language). Participants were recorded while repeating words that began with “s” and “sh” sounds. Clear language-specific patterns in adults’ speech were found, with English speakers differentiating “s” and “sh” in one acoustic dimension (i.e., spectral mean) and Japanese speakers differentiating the two categories in three acoustic dimensions (i.e., spectral mean, standard deviation, and onset F2 frequency). For both language groups, children’s speech exhibited a gradual change from an early undifferentiated form to later differentiated categories. The separation processes, however, only occur in those acoustic dimensions used by adults in the corresponding languages. PMID:22540834
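
    The spectral mean (first spectral moment) used above to separate "s" from "sh" can be computed directly from a windowed fricative segment; the window and FFT handling below are illustrative choices.

    ```python
    import numpy as np

    def spectral_mean_hz(fricative_segment, fs):
        """First spectral moment (spectral centroid, Hz) of a fricative noise segment."""
        x = np.asarray(fricative_segment, dtype=float)
        spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2   # power spectrum
        freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
        return float(np.sum(freqs * spectrum) / np.sum(spectrum))
    ```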

  19. Intelligibility and acoustic characteristics of clear and conversational speech in Telugu (a South Indian Dravidian language).

    PubMed

    Durisala, Naresh; Prakash, S G R; Nambi, Arivudai; Batra, Ridhima

    2011-04-01

    The overall goal of this study is to examine the intelligibility differences of clear and conversational speech and also to objectively analyze the acoustic properties contributing to these differences. Seventeen post-lingual stable sensory-neural hearing impaired listeners with an age range of 17-40 years were recruited for the study. Forty Telugu sentences spoken by a female Telugu speaker in both clear and conversational speech styles were used as stimuli for the subjects. Results revealed that mean scores of clear speech were higher (mean = 84.5) when compared to conversational speech (mean = 61.4) with an advantage of 23.1% points. Acoustic properties revealed greater fundamental frequency (f0) and intensity, longer duration, higher consonant-vowel ratio (CVR) and greater temporal energy in clear speech.

  20. Random Deep Belief Networks for Recognizing Emotions from Speech Signals

    PubMed Central

    Li, Huihui; Huang, Jubing; Li, Danyang; Xun, Eryang

    2017-01-01

    Human emotions can now be recognized from speech signals using machine learning methods; however, these methods are challenged by low recognition accuracies in real applications because they lack rich representation ability. Deep belief networks (DBN) can automatically discover multiple levels of representation in speech signals. To make full use of its advantages, this paper presents an ensemble of random deep belief networks (RDBN) method for speech emotion recognition. It first extracts the low-level features of the input speech signal and then uses them to construct many random subspaces. Each random subspace is then provided to a DBN to yield higher-level features, which serve as the input of a classifier that outputs an emotion label. All output emotion labels are then fused through majority voting to decide the final emotion label for the input speech signal. The experimental results on benchmark speech emotion databases show that RDBN has better accuracy than the compared methods for speech emotion recognition. PMID:28356908
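
    A minimal sketch of the random-subspace-plus-majority-vote scheme described above, with a generic scikit-learn multilayer perceptron standing in for the DBN feature learner; the subspace size, classifier settings, and integer emotion labels are illustrative assumptions.

    ```python
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def train_random_subspace_ensemble(X, y, n_models=10, subspace_frac=0.5, seed=0):
        """Train one classifier per random feature subspace (MLP stands in for a DBN)."""
        rng = np.random.default_rng(seed)
        n_features = X.shape[1]
        k = max(1, int(subspace_frac * n_features))
        ensemble = []
        for _ in range(n_models):
            idx = rng.choice(n_features, size=k, replace=False)
            clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X[:, idx], y)
            ensemble.append((idx, clf))
        return ensemble

    def predict_majority_vote(ensemble, X):
        """Fuse the ensemble's labels by majority voting (assumes non-negative integer labels)."""
        votes = np.stack([clf.predict(X[:, idx]) for idx, clf in ensemble])
        return np.array([np.bincount(col).argmax() for col in votes.T])
    ```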

  1. A Hybrid Acoustic and Pronunciation Model Adaptation Approach for Non-native Speech Recognition

    NASA Astrophysics Data System (ADS)

    Oh, Yoo Rhee; Kim, Hong Kook

    In this paper, we propose a hybrid model adaptation approach in which pronunciation and acoustic models are adapted by incorporating the pronunciation and acoustic variabilities of non-native speech in order to improve the performance of non-native automatic speech recognition (ASR). Specifically, the proposed hybrid model adaptation can be performed at either the state-tying or triphone-modeling level, depending on the level at which acoustic model adaptation is performed. In both methods, we first analyze the pronunciation variant rules of non-native speakers and then classify each rule as either a pronunciation variant or an acoustic variant. The state-tying level hybrid method then adapts pronunciation models and acoustic models by accommodating the pronunciation variants in the pronunciation dictionary and by clustering the states of triphone acoustic models using the acoustic variants, respectively. On the other hand, the triphone-modeling level hybrid method initially adapts pronunciation models in the same way as in the state-tying level hybrid method; however, for the acoustic model adaptation, the triphone acoustic models are then re-estimated based on the adapted pronunciation models and the states of the re-estimated triphone acoustic models are clustered using the acoustic variants. Korean-spoken English speech recognition experiments show that ASR systems employing the state-tying and triphone-modeling level adaptation methods reduce the average word error rate (WER) for non-native speech by a relative 17.1% and 22.1%, respectively, when compared to a baseline ASR system.

  2. Acoustic signal processing toolbox for array processing

    NASA Astrophysics Data System (ADS)

    Pham, Tien; Whipps, Gene T.

    2003-08-01

    The US Army Research Laboratory (ARL) has developed an acoustic signal processing toolbox (ASPT) for acoustic sensor array processing. The intent of this document is to describe the toolbox and its uses. The ASPT is a GUI-based software that is developed and runs under MATLAB. The current version, ASPT 3.0, requires MATLAB 6.0 and above. ASPT contains a variety of narrowband (NB) and incoherent and coherent wideband (WB) direction-of-arrival (DOA) estimation and beamforming algorithms that have been researched and developed at ARL. Currently, ASPT contains 16 DOA and beamforming algorithms. It contains several different NB and WB versions of the MVDR, MUSIC and ESPRIT algorithms. In addition, there are a variety of pre-processing, simulation and analysis tools available in the toolbox. The user can perform simulation or real data analysis for all algorithms with user-defined signal model parameters and array geometries.
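
    ASPT itself is a MATLAB toolbox; purely as an illustration of one algorithm family it contains, the sketch below implements a narrowband MUSIC direction-of-arrival scan for a uniform linear array in NumPy. The half-wavelength spacing, complex-baseband snapshot format, and scan grid are assumptions, not ASPT code.

    ```python
    import numpy as np

    def music_doa(X, n_sources, d_over_lambda=0.5, angles=np.linspace(-90, 90, 361)):
        """Narrowband MUSIC pseudospectrum for a uniform linear array.

        X: (n_sensors, n_snapshots) complex baseband snapshots.
        Returns the scan angles (degrees) and the MUSIC pseudospectrum.
        """
        n_sensors = X.shape[0]
        R = X @ X.conj().T / X.shape[1]                 # sample covariance matrix
        eigvals, eigvecs = np.linalg.eigh(R)            # eigenvalues in ascending order
        En = eigvecs[:, : n_sensors - n_sources]        # noise-subspace eigenvectors
        spectrum = np.empty(len(angles))
        for i, theta in enumerate(np.deg2rad(angles)):
            a = np.exp(-2j * np.pi * d_over_lambda * np.arange(n_sensors) * np.sin(theta))
            spectrum[i] = 1.0 / np.real(a.conj() @ En @ En.conj().T @ a)
        return angles, spectrum
    ```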

  3. Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing.

    PubMed

    Jørgensen, Søren; Dau, Torsten

    2011-09-01

    A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a similar structure as the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNR(env), at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise. The model was further tested in conditions with noisy speech subjected to reverberation and spectral subtraction. Good agreement between predictions and data was found in all cases. For spectral subtraction, an analysis of the model's internal representation of the stimuli revealed that the predicted decrease of intelligibility was caused by the estimated noise envelope power exceeding that of the speech. The classical concept of the speech transmission index fails in this condition. The results strongly suggest that the signal-to-noise ratio at the output of a modulation frequency selective process provides a key measure of speech intelligibility.
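
    A much-simplified sketch of the SNR(env) idea: envelope power is measured in a few modulation bands for the noisy speech and for the noise alone, and an envelope-domain SNR is formed per band. The published speech-based envelope power spectrum model includes peripheral filtering, an ideal-observer back end, and band combination that are omitted here; the band edges and envelope rate are assumptions.

    ```python
    import numpy as np
    from scipy.signal import hilbert, butter, filtfilt, resample_poly

    MOD_BANDS = ((1, 2), (2, 4), (4, 8), (8, 16), (16, 32))   # Hz, illustrative

    def envelope(x, fs, env_fs=200):
        env = np.abs(hilbert(x))
        return resample_poly(env, env_fs, fs), env_fs         # assumes integer sampling rates

    def snr_env_db(noisy_speech, noise, fs):
        """Envelope-power SNR (dB) per modulation band; crude sEPSM-style metric."""
        e_mix, env_fs = envelope(noisy_speech, fs)
        e_noise, _ = envelope(noise, fs)
        out = []
        for lo, hi in MOD_BANDS:
            b, a = butter(2, [lo / (env_fs / 2), hi / (env_fs / 2)], btype="band")
            p_mix = np.mean(filtfilt(b, a, e_mix - e_mix.mean()) ** 2)
            p_noise = np.mean(filtfilt(b, a, e_noise - e_noise.mean()) ** 2)
            p_speech = max(p_mix - p_noise, 1e-12)            # envelope power attributed to speech
            out.append(10 * np.log10(p_speech / p_noise))
        return out
    ```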

  4. The acoustics for speech of eight auditoriums in the city of Sao Paulo

    NASA Astrophysics Data System (ADS)

    Bistafa, Sylvio R.

    2002-11-01

    Eight auditoriums with a proscenium type of stage, which usually operate as dramatic theaters in the city of Sao Paulo, were acoustically surveyed in terms of their adequacy for unassisted speech. Reverberation times, early decay times, and speech levels were measured in different positions, together with objective measures of speech intelligibility. The measurements revealed reverberation time values rather uniform throughout the rooms, whereas significant variations were found in the values of the other acoustical measures with position. The early decay time was found to be better correlated with the objective measures of speech intelligibility than the reverberation time. The results from the objective measurements of speech intelligibility revealed that the speech transmission index STI, and its simplified version RaSTI, are strongly correlated with the early-to-late sound ratio C50 (1 kHz). However, it was found that the criterion value of acceptability of the latter is more easily met than that of the former. The results from these measurements make it possible to understand how the characteristics of the architectural design determine the acoustical quality for speech. Measurements of ST1-Gade were made as an attempt to validate it as an objective measure of "support" for the actor. Preliminary diagnostic results from ray-tracing simulations will also be presented.
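
    The early-to-late sound ratio C50 referred to above has a standard definition: the ratio, in dB, of impulse-response energy arriving within the first 50 ms to the energy arriving later. A minimal sketch, assuming a (band-filtered) room impulse response is already available:

    ```python
    import numpy as np

    def clarity_c50(impulse_response, fs):
        """Early-to-late ratio C50 in dB from a (band-filtered) room impulse response."""
        h2 = np.asarray(impulse_response, dtype=float) ** 2
        n50 = int(0.050 * fs)                        # 50 ms boundary
        early, late = h2[:n50].sum(), h2[n50:].sum()
        return 10 * np.log10(early / late)
    ```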

  5. A magnetic resonance imaging study on the articulatory and acoustic speech parameters of Malay vowels.

    PubMed

    Zourmand, Alireza; Mirhassani, Seyed Mostafa; Ting, Hua-Nong; Bux, Shaik Ismail; Ng, Kwan Hoong; Bilgen, Mehmet; Jalaludin, Mohd Amin

    2014-07-25

    The phonetic properties of six Malay vowels are investigated using magnetic resonance imaging (MRI) to visualize the vocal tract in order to obtain dynamic articulatory parameters during speech production. To resolve image blurring due to the tongue movement during the scanning process, a method based on active contour extraction is used to track tongue contours. The proposed method efficiently tracks tongue contours despite the partial blurring of MRI images. Consequently, the articulatory parameters are effectively measured as tongue movement is observed, and the specific shape of the tongue and its position for all six uttered Malay vowels are determined. Speech rehabilitation procedures demand some kind of visually perceivable prototype of speech articulation. To investigate the validity of the measured articulatory parameters based on acoustic theory of speech production, an acoustic analysis based on the uttered vowels by subjects has been performed. As the acoustic speech and articulatory parameters of uttered speech were examined, a correlation between formant frequencies and articulatory parameters was observed. The experiments reported a positive correlation between the constriction location of the tongue body and the first formant frequency, as well as a negative correlation between the constriction location of the tongue tip and the second formant frequency. The results demonstrate that the proposed method is an effective tool for the dynamic study of speech production.

  6. Nonlinear dynamics approach to speech detection in noisy signals

    NASA Astrophysics Data System (ADS)

    Bronakowski, Lukasz J.

    2009-06-01

    The presented paper describes a novel approach to the detection of speech corrupted by noise. The proposed procedure is based on fractal dimension, which is evaluated directly from speech signal samples using two different methods: box-counting and the approach proposed by Katz. The recordings, taken from the TIMIT database, were corrupted by five different types of noise (white, pink, hf-channel, babble and factory) at four noise amplitudes (5, 10, 15, 20 dB). The resulting noisy speech was the subject of the analysis. Otsu's method was used to determine a threshold value for differentiating between noise-only and noisy-speech segments. It has been shown that the fractal dimension-based approach provides a good basis for detecting speech in the presence of noise.
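
    A minimal sketch of the Katz fractal-dimension estimate named above, computed per signal frame; the framing itself, and applying Otsu's method to the resulting per-frame values to separate noise-only from noisy-speech segments, are assumptions left outside this snippet.

    ```python
    import numpy as np

    def katz_fd(frame):
        """Katz fractal dimension of a 1-D signal frame (waveform treated as a planar curve).

        Assumes a non-constant frame so that the curve length and excursion are non-zero.
        """
        x = np.asarray(frame, dtype=float)
        n = len(x)
        idx = np.arange(n)
        L = np.sum(np.sqrt(1.0 + np.diff(x) ** 2))                  # total curve length
        d = np.max(np.sqrt((idx - idx[0]) ** 2 + (x - x[0]) ** 2))  # max distance from first point
        return np.log10(n - 1) / (np.log10(n - 1) + np.log10(d / L))

    # Per-frame FD values could then be thresholded (e.g. with Otsu's method)
    # to flag frames that contain speech rather than noise alone.
    ```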

  7. Acoustic signal propagation characterization of conduit networks

    NASA Astrophysics Data System (ADS)

    Khan, Muhammad Safeer

    Analysis of acoustic signal propagation in conduit networks has been an important area of research in acoustics. One major aspect of analyzing conduit networks as acoustic channels is that a propagating signal suffers frequency dependent attenuation due to thermo-viscous boundary layer effects and the presence of impedance mismatches such as side branches. The signal attenuation due to side branches is strongly influenced by their numbers and dimensions such as diameter and length. Newly developed applications for condition based monitoring of underground conduit networks involve measurement of acoustic signal attenuation through tests in the field. In many cases the exact installation layout of the field measurement location may not be accessible or actual installation may differ from the documented layout. The lack of exact knowledge of numbers and lengths of side branches, therefore, introduces uncertainty in the measurements of attenuation and contributes to the random variable error between measured results and those predicted from theoretical models. There are other random processes in and around conduit networks in the field that also affect the propagation of an acoustic signal. These random processes include but are not limited to the presence of strong temperature and humidity gradients within the conduits, blockages of variable sizes and types, effects of aging such as cracks, bends, sags and holes, ambient noise variations and presence of variable layer of water. It is reasonable to consider that the random processes contributing to the error in the measured attenuation are independent and arbitrarily distributed. The error, contributed by a large number of independent sources of arbitrary probability distributions, is best described by an approximately normal probability distribution in accordance with the central limit theorem. Using an analytical approach to model the attenuating effect of each of the random variable sources can be very complex and

  8. Changes in speech production in a child with a cochlear implant: acoustic and kinematic evidence.

    PubMed

    Goffman, Lisa; Ertmer, David J; Erdle, Christa

    2002-10-01

    A method is presented for examining change in motor patterns used to produce linguistic contrasts. In this case study, the method is applied to a child receiving new auditory input following cochlear implantation. This child experienced hearing loss at age 3 years and received a multichannel cochlear implant at age 7 years. Data collection points occurred both pre- and postimplant and included acoustic and kinematic analyses. Overall, this child's speech output was transcribed as accurate across the pre- and postimplant periods. Postimplant, with the onset of new auditory experience, acoustic durations showed a predictable maturational change, usually decreasing in duration. Conversely, the spatiotemporal stability of speech movements initially became more variable postimplantation. The auditory perturbations experienced by this child during development led to changes in the physiological underpinnings of speech production, even when speech output was perceived as accurate.

  9. Correlation of orofacial speeds with voice acoustic measures in the fluent speech of persons who stutter.

    PubMed

    McClean, Michael D; Tasko, Stephen M

    2004-12-01

    Stuttering is often viewed as a problem in coordinating the movements of different muscle systems involved in speech production. From this perspective, it is logical that efforts be made to quantify and compare the strength of neural coupling between muscle systems in persons who stutter (PS) and those who do not stutter (NS). This problem was addressed by correlating the speeds of different orofacial structures with vowel fundamental frequency (F0) and intensity as subjects produced fluent repetitions of a simple nonsense phrase at habitual, high, and low intensity levels. It is assumed that resulting correlations indirectly reflect the strength of neural coupling between particular orofacial structures and the respiratory-laryngeal system. An electromagnetic system was employed to record movements of the upper lip, lower lip, tongue, and jaw in 43 NS and 39 PS. The acoustic speech signal was recorded and used to obtain measures of vowel F0 and intensity. For each subject, correlation measures were obtained relating peak orofacial speeds to F0 and intensity. Correlations were significantly reduced in PS compared to NS for the lower lip and tongue, although the magnitude of these group differences covaried with the correlation levels relating F0 and intensity. It is suggested that the group difference in correlation pattern reflects a reduced strength of neural coupling of the lower lip and tongue systems to the respiratory-laryngeal system in PS. Consideration is given to how this may contribute to temporal discoordination and stuttering.

  10. Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training

    PubMed Central

    Narayanan, Arun; Wang, DeLiang

    2015-01-01

    Although deep neural network (DNN) acoustic models are known to be inherently noise robust, especially with matched training and testing data, the use of speech separation as a frontend and for deriving alternative feature representations has been shown to improve performance in challenging environments. We first present a supervised speech separation system that significantly improves automatic speech recognition (ASR) performance in realistic noise conditions. The system performs separation via ratio time-frequency masking; the ideal ratio mask (IRM) is estimated using DNNs. We then propose a framework that unifies separation and acoustic modeling via joint adaptive training. Since the modules for acoustic modeling and speech separation are implemented using DNNs, unification is done by introducing additional hidden layers with fixed weights and appropriate network architecture. On the CHiME-2 medium-large vocabulary ASR task, and with log mel spectral features as input to the acoustic model, an independently trained ratio masking frontend improves word error rates by 10.9% (relative) compared to the noisy baseline. In comparison, the jointly trained system improves performance by 14.4%. We also experiment with alternative feature representations to augment the standard log mel features, like the noise and speech estimates obtained from the separation module, and the standard feature set used for IRM estimation. Our best system obtains a word error rate of 15.4% (absolute), an improvement of 4.6 percentage points over the next best result on this corpus. PMID:26973851
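
    The ideal ratio mask (IRM) mentioned above has a standard form in the separation literature; the sketch below computes it from parallel clean-speech and noise signals (in the paper the mask is estimated by DNNs rather than computed from oracle signals, and the STFT settings here are illustrative).

    ```python
    import numpy as np
    from scipy.signal import stft

    def ideal_ratio_mask(speech, noise, fs, nperseg=512):
        """IRM(t, f) = sqrt(|S|^2 / (|S|^2 + |N|^2)) from oracle speech and noise signals."""
        _, _, S = stft(speech, fs, nperseg=nperseg)
        _, _, N = stft(noise, fs, nperseg=nperseg)
        ps, pn = np.abs(S) ** 2, np.abs(N) ** 2
        return np.sqrt(ps / (ps + pn + 1e-12))
    ```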

  11. Seismic and acoustic signal identification algorithms

    SciTech Connect

    LADD,MARK D.; ALAM,M. KATHLEEN; SLEEFE,GERARD E.; GALLEGOS,DANIEL E.

    2000-04-03

    This paper will describe an algorithm for detecting and classifying seismic and acoustic signals for unattended ground sensors. The algorithm must be computationally efficient and continuously process a data stream in order to establish whether or not a desired signal has changed state (turned-on or off). The paper will focus on describing a Fourier based technique that compares the running power spectral density estimate of the data to a predetermined signature in order to determine if the desired signal has changed state. How to establish the signature and the detection thresholds will be discussed as well as the theoretical statistics of the algorithm for the Gaussian noise case with results from simulated data. Actual seismic data results will also be discussed along with techniques used to reduce false alarms due to the inherent nonstationary noise environments found with actual data.
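
    A minimal sketch of the kind of Fourier-based comparison described above: a running Welch PSD estimate of each incoming data block is compared against a stored signature, and a state change is declared when the match statistic crosses a threshold. The normalized-correlation statistic, block length, and threshold below are placeholders, not the statistic or thresholds derived in the paper.

      import numpy as np
      from scipy.signal import welch

      def match_score(block, signature_psd, fs=1000.0, nperseg=256):
          # Running PSD estimate of the current data block, compared to the
          # predetermined signature by normalized correlation (illustrative only).
          _, pxx = welch(block, fs=fs, nperseg=nperseg)
          return np.dot(pxx, signature_psd) / (
              np.linalg.norm(pxx) * np.linalg.norm(signature_psd) + 1e-20)

      def signal_on(scores, threshold=0.8, run_length=5):
          # Require several consecutive blocks above threshold before declaring
          # a state change, to limit false alarms in nonstationary noise.
          hits = np.asarray(scores) > threshold
          return any(hits[i:i + run_length].all() for i in range(len(hits) - run_length + 1))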

  12. Speech privacy and annoyance considerations in the acoustic environment of passenger cars of high-speed trains.

    PubMed

    Jeon, Jin Yong; Hong, Joo Young; Jang, Hyung Suk; Kim, Jae Hyeon

    2015-12-01

    It is necessary to consider not only annoyance of interior noises but also speech privacy to achieve acoustic comfort in a passenger car of a high-speed train because speech from other passengers can be annoying. This study aimed to explore an optimal acoustic environment to satisfy speech privacy and reduce annoyance in a passenger car. Two experiments were conducted using speech sources and compartment noise of a high speed train with varying speech-to-noise ratios (SNRA) and background noise levels (BNL). Speech intelligibility was tested in experiment I, and in experiment II, perceived speech privacy, annoyance, and acoustic comfort of combined sounds with speech and background noise were assessed. The results show that speech privacy and annoyance were significantly influenced by the SNRA. In particular, the acoustic comfort was evaluated as acceptable when the SNRA was less than -6 dB for both speech privacy and noise annoyance. In addition, annoyance increased significantly as the BNL exceeded 63 dBA, whereas the effect of the background-noise level on the speech privacy was not significant. These findings suggest that an optimal level of interior noise in a passenger car might exist between 59 and 63 dBA, taking normal speech levels into account.

  13. Prosodic influences on speech production in children with specific language impairment and speech deficits: kinematic, acoustic, and transcription evidence.

    PubMed

    Goffman, L

    1999-12-01

It is often hypothesized that young children's difficulties with producing weak-strong (iambic) prosodic forms arise from perceptual or linguistically based production factors. A third possible contributor to errors in the iambic form may be biological constraints, or biases, of the motor system. In the present study, 7 children with specific language impairment (SLI) and speech deficits were matched to same-age peers. Multiple levels of analysis, including kinematic (modulation and stability of movement), acoustic, and transcription, were applied to children's productions of iambic (weak-strong) and trochaic (strong-weak) prosodic forms. Findings suggest that a motor bias toward producing unmodulated rhythmic articulatory movements, similar to that observed in canonical babbling, contributes to children's acquisition of metrical forms. Children with SLI and speech deficits show less mature segmental and speech motor systems, as well as decreased modulation of movement in later-developing iambic forms. Further, components of prosodic and segmental acquisition develop independently and at different rates.

  14. Precategorical Acoustic Storage and the Perception of Speech

    ERIC Educational Resources Information Center

    Frankish, Clive

    2008-01-01

    Theoretical accounts of both speech perception and of short term memory must consider the extent to which perceptual representations of speech sounds might survive in relatively unprocessed form. This paper describes a novel version of the serial recall task that can be used to explore this area of shared interest. In immediate recall of digit…

  15. Vowel Acoustics in Adults with Apraxia of Speech

    ERIC Educational Resources Information Center

    Jacks, Adam; Mathes, Katey A.; Marquardt, Thomas P.

    2010-01-01

    Purpose: To investigate the hypothesis that vowel production is more variable in adults with acquired apraxia of speech (AOS) relative to healthy individuals with unimpaired speech. Vowel formant frequency measures were selected as the specific target of focus. Method: Seven adults with AOS and aphasia produced 15 repetitions of 6 American English…

  16. Bird population density estimated from acoustic signals

    USGS Publications Warehouse

    Dawson, D.K.; Efford, M.G.

    2009-01-01

Many animal species are detected primarily by sound. Although songs, calls and other sounds are often used for population assessment, as in bird point counts and hydrophone surveys of cetaceans, there are few rigorous methods for estimating population density from acoustic data. 2. The problem has several parts - distinguishing individuals, adjusting for individuals that are missed, and adjusting for the area sampled. Spatially explicit capture-recapture (SECR) is a statistical methodology that addresses jointly the second and third parts of the problem. We have extended SECR to use uncalibrated information from acoustic signals on the distance to each source. 3. We applied this extension of SECR to data from an acoustic survey of ovenbird Seiurus aurocapilla density in an eastern US deciduous forest with multiple four-microphone arrays. We modelled average power from spectrograms of ovenbird songs measured within a window of 0.7 s duration and frequencies between 4200 and 5200 Hz. 4. The resulting estimates of the density of singing males (0.19 ha-1, SE 0.03 ha-1) were consistent with estimates of the adult male population density from mist-netting (0.36 ha-1, SE 0.12 ha-1). The fitted model predicts sound attenuation of 0.11 dB m-1 (SE 0.01 dB m-1) in excess of losses from spherical spreading. 5. Synthesis and applications. Our method for estimating animal population density from acoustic signals fills a gap in the census methods available for visually cryptic but vocal taxa, including many species of bird and cetacean. The necessary equipment is simple and readily available; as few as two microphones may provide adequate estimates, given spatial replication. The method requires that individuals detected at the same place are acoustically distinguishable and all individuals vocalize during the recording interval, or that the per capita rate of vocalization is known. We believe these requirements can be met, with suitable field methods, for a significant
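
    The acoustic measurement feeding the SECR model can be sketched as the average spectrogram power in the 4200-5200 Hz song band over a 0.7 s window, computed per microphone. The spectrogram parameters and function name below are assumptions, and the SECR density fitting itself (performed with dedicated capture-recapture software) is not reproduced here.

      import numpy as np
      from scipy.signal import spectrogram

      def song_band_power_db(x, fs, t0, f_lo=4200.0, f_hi=5200.0, win=0.7):
          # Average power (dB) in the ovenbird song band within a 0.7 s window
          # starting at t0 seconds, for one microphone channel.
          f, t, sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=256)
          band = (f >= f_lo) & (f <= f_hi)
          frames = (t >= t0) & (t < t0 + win)
          return 10.0 * np.log10(np.mean(sxx[np.ix_(band, frames)]) + 1e-20)

    Relative differences in this quantity across the microphones of an array carry the uncalibrated distance information that the extended SECR model uses.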

  17. The effects of temporal modification of second speech signals on stuttering inhibition at two speech rates in adults.

    PubMed

    Guntupalli, Vijaya K; Kalinowski, Joseph; Saltuklaroglu, Tim; Nanjundeswaran, Chayadevie

    2005-09-02

The recovery of 'gestural' speech information via the engagement of mirror neurons has been suggested to be the key agent in stuttering inhibition during the presentation of exogenous second speech signals. Based on this hypothesis, we expect the amount of stuttering inhibition to depend on the ease of recovery of exogenous speech gestures. To examine this possibility, linguistically non-congruent second speech signals were temporally compressed and expanded in two experiments. In Experiment 1, 12 participants who stutter read passages aloud at normal and fast speech rates while listening to second speech signals that were 0, 40, and 80% compressed, and 40 and 80% expanded. Except for the 80% compressed speech signal, all other stimuli induced significant stuttering inhibition relative to the control condition. The 80% compressed speech signal was the first exogenously presented speech signal that failed to produce the 60-70% reduction in stuttering frequency that has been typical in our research over the years. It was hypothesized that at a compression ratio of 80%, exogenous speech signals generated too many gestures per unit time to allow for adequate gestural recovery via mirror neurons. However, considering that the 80% compressed signal was also highly unintelligible, a second experiment was conducted to further examine whether the effects of temporal compression on stuttering inhibition are mediated by speech intelligibility. In Experiment 2, 10 participants who stutter read passages at a normal rate while listening to linguistically non-congruent second speech signals that were compressed by 0, 20, 40, 60, and 80%. Results revealed that 0 and 20% compressed speech signals induced approximately 52% stuttering inhibition. In contrast, compression ratios of 40% and beyond induced only 27% stuttering inhibition, although 40 and 60% compressed signals were perceptually intelligible. Our findings suggest that recovery of gestural information is affected by temporal
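
    For readers who want to reproduce this kind of stimulus manipulation, the sketch below applies pitch-preserving time-scale modification to a second speech signal. The reading of "N% compressed/expanded" as a proportional change in duration, and the use of a phase-vocoder routine, are assumptions for illustration; the paper's exact processing tool is not specified here.

      import librosa

      def change_duration(y, percent):
          # percent > 0: compression (duration shortened by that percentage)
          # percent < 0: expansion (duration lengthened by that percentage)
          factor = 1.0 - percent / 100.0          # new duration relative to original
          return librosa.effects.time_stretch(y, rate=1.0 / factor)

      # e.g., an 80% compressed and a 40% expanded version of the same passage:
      # y, sr = librosa.load("second_speech.wav", sr=None)
      # y_80c, y_40e = change_duration(y, 80), change_duration(y, -40)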

  18. Perceiving speech in context: Compensation for contextual variability during acoustic cue encoding and categorization

    NASA Astrophysics Data System (ADS)

    Toscano, Joseph Christopher

    Several fundamental questions about speech perception concern how listeners understand spoken language despite considerable variability in speech sounds across different contexts (the problem of lack of invariance in speech). This contextual variability is caused by several factors, including differences between individual talkers' voices, variation in speaking rate, and effects of coarticulatory context. A number of models have been proposed to describe how the speech system handles differences across contexts. Critically, these models make different predictions about (1) whether contextual variability is handled at the level of acoustic cue encoding or categorization, (2) whether it is driven by feedback from category-level processes or interactions between cues, and (3) whether listeners discard fine-grained acoustic information to compensate for contextual variability. Separating the effects of cue- and category-level processing has been difficult because behavioral measures tap processes that occur well after initial cue encoding and are influenced by task demands and linguistic information. Recently, we have used the event-related brain potential (ERP) technique to examine cue encoding and online categorization. Specifically, we have looked at differences in the auditory N1 as a measure of acoustic cue encoding and the P3 as a measure of categorization. This allows us to examine multiple levels of processing during speech perception and can provide a useful tool for studying effects of contextual variability. Here, I apply this approach to determine the point in processing at which context has an effect on speech perception and to examine whether acoustic cues are encoded continuously. Several types of contextual variability (talker gender, speaking rate, and coarticulation), as well as several acoustic cues (voice onset time, formant frequencies, and bandwidths), are examined in a series of experiments. The results suggest that (1) at early stages of speech

  19. Signal Processing for Robust Speech Recognition

    DTIC Science & Technology

    1994-01-01

identity. Since this phoneme-based approach relies on information from the acoustic-phonetic and language models to determine the compensation vectors, it...

  20. Aging Affects Neural Synchronization to Speech-Related Acoustic Modulations

    PubMed Central

    Goossens, Tine; Vercammen, Charlotte; Wouters, Jan; van Wieringen, Astrid

    2016-01-01

    As people age, speech perception problems become highly prevalent, especially in noisy situations. In addition to peripheral hearing and cognition, temporal processing plays a key role in speech perception. Temporal processing of speech features is mediated by synchronized activity of neural oscillations in the central auditory system. Previous studies indicate that both the degree and hemispheric lateralization of synchronized neural activity relate to speech perception performance. Based on these results, we hypothesize that impaired speech perception in older persons may, in part, originate from deviances in neural synchronization. In this study, auditory steady-state responses that reflect synchronized activity of theta, beta, low and high gamma oscillations (i.e., 4, 20, 40, and 80 Hz ASSR, respectively) were recorded in young, middle-aged, and older persons. As all participants had normal audiometric thresholds and were screened for (mild) cognitive impairment, differences in synchronized neural activity across the three age groups were likely to be attributed to age. Our data yield novel findings regarding theta and high gamma oscillations in the aging auditory system. At an older age, synchronized activity of theta oscillations is increased, whereas high gamma synchronization is decreased. In contrast to young persons who exhibit a right hemispheric dominance for processing of high gamma range modulations, older adults show a symmetrical processing pattern. These age-related changes in neural synchronization may very well underlie the speech perception problems in aging persons. PMID:27378906

  1. Acoustic Markers of Prominence Influence Infants' and Adults' Segmentation of Speech Sequences

    ERIC Educational Resources Information Center

    Bion, Ricardo A. H.; Benavides-Varela, Silvia; Nespor, Marina

    2011-01-01

    Two experiments investigated the way acoustic markers of prominence influence the grouping of speech sequences by adults and 7-month-old infants. In the first experiment, adults were familiarized with and asked to memorize sequences of adjacent syllables that alternated in either pitch or duration. During the test phase, participants heard pairs…

  2. Acoustic and Articulatory Features of Diphthong Production: A Speech Clarity Study

    ERIC Educational Resources Information Center

    Tasko, Stephen M.; Greilick, Kristin

    2010-01-01

    Purpose: The purpose of this study was to evaluate how speaking clearly influences selected acoustic and orofacial kinematic measures associated with diphthong production. Method: Forty-nine speakers, drawn from the University of Wisconsin X-Ray Microbeam Speech Production Database (J. R. Westbury, 1994), served as participants. Samples of clear…

  3. Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation

    PubMed Central

    Kong, Ying-Yee; Donaldson, Gail; Somarowthu, Ala

    2015-01-01

Low-frequency acoustic cues have been shown to improve speech perception in cochlear-implant listeners. However, the mechanisms underlying this benefit are still not well understood. This study investigated the extent to which low-frequency cues can facilitate listeners' use of linguistic knowledge in simulated electric-acoustic stimulation (EAS). Experiment 1 examined differences in the magnitude of EAS benefit at the phoneme, word, and sentence levels. Speech materials were processed via noise-channel vocoding and lowpass (LP) filtering. The amount of spectral degradation in the vocoder speech was varied by applying different numbers of vocoder channels. Normal-hearing listeners were tested on vocoder-alone, LP-alone, and vocoder + LP conditions. Experiment 2 further examined factors that underlie the context effect on EAS benefit at the sentence level by limiting the low-frequency cues to temporal envelope and periodicity (AM + FM). Results showed that EAS benefit was greater for higher-context than for lower-context speech materials even when the LP ear received only low-frequency AM + FM cues. Possible explanations for the greater EAS benefit observed with higher-context materials may lie in the interplay between perceptual and expectation-driven processes for EAS speech recognition, and/or the band-importance functions for different types of speech materials. PMID:25994712

  4. Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation.

    PubMed

    Kong, Ying-Yee; Donaldson, Gail; Somarowthu, Ala

    2015-05-01

Low-frequency acoustic cues have been shown to improve speech perception in cochlear-implant listeners. However, the mechanisms underlying this benefit are still not well understood. This study investigated the extent to which low-frequency cues can facilitate listeners' use of linguistic knowledge in simulated electric-acoustic stimulation (EAS). Experiment 1 examined differences in the magnitude of EAS benefit at the phoneme, word, and sentence levels. Speech materials were processed via noise-channel vocoding and lowpass (LP) filtering. The amount of spectral degradation in the vocoder speech was varied by applying different numbers of vocoder channels. Normal-hearing listeners were tested on vocoder-alone, LP-alone, and vocoder + LP conditions. Experiment 2 further examined factors that underlie the context effect on EAS benefit at the sentence level by limiting the low-frequency cues to temporal envelope and periodicity (AM + FM). Results showed that EAS benefit was greater for higher-context than for lower-context speech materials even when the LP ear received only low-frequency AM + FM cues. Possible explanations for the greater EAS benefit observed with higher-context materials may lie in the interplay between perceptual and expectation-driven processes for EAS speech recognition, and/or the band-importance functions for different types of speech materials.
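
    A rough sketch of the two simulated ears described above: a noise-channel vocoder for the "electric" ear and a lowpass-filtered copy for the "acoustic" ear. The number of channels, band edges, envelope cutoff, and the 500 Hz lowpass cutoff are illustrative assumptions rather than the study's parameters.

      import numpy as np
      from scipy.signal import butter, sosfiltfilt

      def _bp(lo, hi, fs):
          return butter(4, (lo, hi), btype='bandpass', fs=fs, output='sos')

      def noise_vocoder(x, fs, n_channels=8, f_lo=200.0, f_hi=7000.0):
          # Noise-channel vocoder: split the input into log-spaced bands, extract
          # each band's envelope, and use it to modulate band-limited noise.
          # Assumes fs of at least 16 kHz so f_hi stays below the Nyquist rate.
          edges = np.geomspace(f_lo, f_hi, n_channels + 1)
          env_lp = butter(4, 160.0, btype='lowpass', fs=fs, output='sos')
          rng = np.random.default_rng(0)
          out = np.zeros(len(x))
          for lo, hi in zip(edges[:-1], edges[1:]):
              band = sosfiltfilt(_bp(lo, hi, fs), x)
              env = np.clip(sosfiltfilt(env_lp, np.abs(band)), 0.0, None)
              carrier = sosfiltfilt(_bp(lo, hi, fs), rng.standard_normal(len(x)))
              out += env * carrier
          return out

      def lowpass_speech(x, fs, cutoff=500.0):
          # Low-frequency acoustic portion for the simulated EAS ear.
          return sosfiltfilt(butter(6, cutoff, btype='lowpass', fs=fs, output='sos'), x)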

  5. Contributions of electric and acoustic hearing to bimodal speech and music perception.

    PubMed

    Crew, Joseph D; Galvin, John J; Landsberger, David M; Fu, Qian-Jie

    2015-01-01

    Cochlear implant (CI) users have difficulty understanding speech in noisy listening conditions and perceiving music. Aided residual acoustic hearing in the contralateral ear can mitigate these limitations. The present study examined contributions of electric and acoustic hearing to speech understanding in noise and melodic pitch perception. Data was collected with the CI only, the hearing aid (HA) only, and both devices together (CI+HA). Speech reception thresholds (SRTs) were adaptively measured for simple sentences in speech babble. Melodic contour identification (MCI) was measured with and without a masker instrument; the fundamental frequency of the masker was varied to be overlapping or non-overlapping with the target contour. Results showed that the CI contributes primarily to bimodal speech perception and that the HA contributes primarily to bimodal melodic pitch perception. In general, CI+HA performance was slightly improved relative to the better ear alone (CI-only) for SRTs but not for MCI, with some subjects experiencing a decrease in bimodal MCI performance relative to the better ear alone (HA-only). Individual performance was highly variable, and the contribution of either device to bimodal perception was both subject- and task-dependent. The results suggest that individualized mapping of CIs and HAs may further improve bimodal speech and music perception.

  6. Contributions of Electric and Acoustic Hearing to Bimodal Speech and Music Perception

    PubMed Central

    Crew, Joseph D.; Galvin III, John J.; Landsberger, David M.; Fu, Qian-Jie

    2015-01-01

    Cochlear implant (CI) users have difficulty understanding speech in noisy listening conditions and perceiving music. Aided residual acoustic hearing in the contralateral ear can mitigate these limitations. The present study examined contributions of electric and acoustic hearing to speech understanding in noise and melodic pitch perception. Data was collected with the CI only, the hearing aid (HA) only, and both devices together (CI+HA). Speech reception thresholds (SRTs) were adaptively measured for simple sentences in speech babble. Melodic contour identification (MCI) was measured with and without a masker instrument; the fundamental frequency of the masker was varied to be overlapping or non-overlapping with the target contour. Results showed that the CI contributes primarily to bimodal speech perception and that the HA contributes primarily to bimodal melodic pitch perception. In general, CI+HA performance was slightly improved relative to the better ear alone (CI-only) for SRTs but not for MCI, with some subjects experiencing a decrease in bimodal MCI performance relative to the better ear alone (HA-only). Individual performance was highly variable, and the contribution of either device to bimodal perception was both subject- and task-dependent. The results suggest that individualized mapping of CIs and HAs may further improve bimodal speech and music perception. PMID:25790349

  7. Primary Progressive Apraxia of Speech: Clinical Features and Acoustic and Neurologic Correlates

    PubMed Central

    Strand, Edythe A.; Clark, Heather; Machulda, Mary; Whitwell, Jennifer L.; Josephs, Keith A.

    2015-01-01

    Purpose This study summarizes 2 illustrative cases of a neurodegenerative speech disorder, primary progressive apraxia of speech (AOS), as a vehicle for providing an overview of the disorder and an approach to describing and quantifying its perceptual features and some of its temporal acoustic attributes. Method Two individuals with primary progressive AOS underwent speech-language and neurologic evaluations on 2 occasions, ranging from 2.0 to 7.5 years postonset. Performance on several tests, tasks, and rating scales, as well as several acoustic measures, were compared over time within and between cases. Acoustic measures were compared with performance of control speakers. Results Both patients initially presented with AOS as the only or predominant sign of disease and without aphasia or dysarthria. The presenting features and temporal progression were captured in an AOS Rating Scale, an Articulation Error Score, and temporal acoustic measures of utterance duration, syllable rates per second, rates of speechlike alternating motion and sequential motion, and a pairwise variability index measure. Conclusions AOS can be the predominant manifestation of neurodegenerative disease. Clinical ratings of its attributes and acoustic measures of some of its temporal characteristics can support its diagnosis and help quantify its salient characteristics and progression over time. PMID:25654422

  8. Effect of continuous speech and non-speech signals on stuttering frequency in adults who stutter.

    PubMed

    Dayalu, Vikram N; Guntupalli, Vijaya K; Kalinowski, Joseph; Stuart, Andrew; Saltuklaroglu, Tim; Rastatter, Michael P

    2011-10-01

    The inhibitory effects of continuously presented audio signals (/a/, /s/, 1,000 Hz pure-tone) on stuttering were examined. Eleven adults who stutter participated. Participants read four 300-syllable passages (i.e. in the presence and absence of the audio signals). All of the audio signals induced a significant reduction in stuttering frequency relative to the control condition (P = 0.005). A significantly greater reduction in stuttering occurred in the /a/ condition (P < 0.05), while there was no significant difference between the /s/ and 1,000 Hz pure-tone conditions (P > 0.05). These findings are consistent with the notion that the percept of a second signal as speech or non-speech can respectively augment or attenuate the potency for reducing stuttering frequency.

  9. Cross-modal interactions during perception of audiovisual speech and nonspeech signals: an fMRI study.

    PubMed

    Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

    2011-01-01

During speech communication, visual information may interact with the auditory system at various processing stages. Most noteworthy, recent magnetoencephalography (MEG) data provided the first evidence for early and preattentive phonetic/phonological encoding of the visual data stream, prior to its fusion with auditory phonological features [Hertrich, I., Mathiak, K., Lutzenberger, W., & Ackermann, H. Time course of early audiovisual interactions during speech and non-speech central-auditory processing: An MEG study. Journal of Cognitive Neuroscience, 21, 259-274, 2009]. Using functional magnetic resonance imaging, the present follow-up study aims to further elucidate the topographic distribution of visual-phonological operations and audiovisual (AV) interactions during speech perception. Ambiguous acoustic syllables, disambiguated to /pa/ or /ta/ by the visual channel (speaking face), served as test materials, concomitant with various control conditions (nonspeech AV signals, visual-only and acoustic-only speech, and nonspeech stimuli). (i) Visual speech yielded an AV-subadditive activation of primary auditory cortex and the anterior superior temporal gyrus (STG), whereas the posterior STG responded both to speech and nonspeech motion. (ii) The inferior frontal and the fusiform gyrus of the right hemisphere showed a strong phonetic/phonological impact (differential effects of visual /pa/ vs. /ta/) upon hemodynamic activation during presentation of speaking faces. Taken together with the previous MEG data, these results point at a dual-pathway model of visual speech information processing: On the one hand, access to the auditory system via the anterior supratemporal "what" path may give rise to direct activation of "auditory objects." On the other hand, visual speech information seems to be represented in a right-hemisphere visual working memory, providing a potential basis for later interactions with auditory information such as the McGurk effect.

  10. Experiment in Learning to Discriminate Frequency Transposed Speech.

    ERIC Educational Resources Information Center

    Ahlstrom, K.G.; And Others

    In order to improve speech perception by transposing the speech signals to lower frequencies, to determine which aspects of the information in the acoustic speech signals were influenced by transposition, and to compare two different methods of training speech perception, 44 subjects were trained to discriminate between transposed words or…

  11. Measurements of speech intelligibility in common rooms for older adults as a first step towards acoustical guidelines.

    PubMed

    Reinten, Jikke; van Hout, Nicole; Hak, Constant; Kort, Helianthe

    2015-01-01

Adapting the built environment to the needs of nursing- or care-home residents has become common practice. Even though hearing loss due to ageing is a normally occurring biological process, little research has been performed on the effects of room acoustic parameters on speech intelligibility for older adults. This article presents the results of room acoustic measurements in common rooms for older adults and their effect on speech intelligibility. Perceived speech intelligibility amongst the users of the rooms was also investigated. The results have led to ongoing research at Utrecht University of Applied Sciences and Eindhoven University of Technology, aimed at the development of acoustical guidelines for elderly care facilities.

  12. Acoustic signal detection of manatee calls

    NASA Astrophysics Data System (ADS)

    Niezrecki, Christopher; Phillips, Richard; Meyer, Michael; Beusse, Diedrich O.

    2003-04-01

The West Indian manatee (Trichechus manatus latirostris) has become endangered partly because of a growing number of collisions with boats. A system that can warn boaters that manatees are present in the immediate vicinity could potentially reduce these collisions. In order to identify the presence of manatees, acoustic methods are employed. Within this paper, three different detection algorithms are used to detect the calls of the West Indian manatee. The detection systems are tested in the laboratory using simulated manatee vocalizations from an audio compact disc. The detection method that provides the best overall performance is able to correctly identify approximately 96% of the manatee vocalizations. However, the system also results in a false positive rate of approximately 16%. The results of this work may ultimately lead to the development of a manatee warning system that can warn boaters of the presence of manatees.
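
    The abstract does not specify the three detection algorithms, so the sketch below shows one generic possibility only: normalized cross-correlation of a band-limited input against a recorded call template. The band edges, threshold, and the choice of template matching itself are assumptions, not the methods evaluated in the paper.

      import numpy as np
      from scipy.signal import butter, sosfiltfilt, correlate

      def detect_call(x, template, fs, band=(2000.0, 6000.0), threshold=0.6):
          # Band-limit input and template, then slide the template over the input
          # with normalized cross-correlation; a peak above threshold is reported
          # as a detection together with its time offset in seconds.
          sos = butter(4, band, btype='bandpass', fs=fs, output='sos')
          xf, tf = sosfiltfilt(sos, x), sosfiltfilt(sos, template)
          r = correlate(xf, tf, mode='valid')
          energy = np.sqrt(np.convolve(xf ** 2, np.ones(len(tf)), mode='valid'))
          score = r / (np.linalg.norm(tf) * energy + 1e-20)
          return score.max() > threshold, int(np.argmax(score)) / fs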

  13. The advantages of sound localization and speech perception of bilateral electric acoustic stimulation

    PubMed Central

    Moteki, Hideaki; Kitoh, Ryosuke; Tsukada, Keita; Iwasaki, Satoshi; Nishio, Shin-Ya

    2015-01-01

    Conclusion: Bilateral electric acoustic stimulation (EAS) effectively improved speech perception in noise and sound localization in patients with high-frequency hearing loss. Objective: To evaluate bilateral EAS efficacy of sound localization detection and speech perception in noise in two cases of high-frequency hearing loss. Methods: Two female patients, aged 38 and 45 years, respectively, received bilateral EAS sequentially. Pure-tone audiometry was performed preoperatively and postoperatively to evaluate the hearing preservation in the lower frequencies. Speech perception outcomes in quiet and noise and sound localization were assessed with unilateral and bilateral EAS. Results: Residual hearing in the lower frequencies was well preserved after insertion of a FLEX24 electrode (24 mm) using the round window approach. After bilateral EAS, speech perception improved in quiet and even more so in noise. In addition, the sound localization ability of both cases with bilateral EAS improved remarkably. PMID:25423260

  14. Vowel Acoustics in Dysarthria: Speech Disorder Diagnosis and Classification

    ERIC Educational Resources Information Center

    Lansford, Kaitlin L.; Liss, Julie M.

    2014-01-01

    Purpose: The purpose of this study was to determine the extent to which vowel metrics are capable of distinguishing healthy from dysarthric speech and among different forms of dysarthria. Method: A variety of vowel metrics were derived from spectral and temporal measurements of vowel tokens embedded in phrases produced by 45 speakers with…

  15. Acoustical properties of speech as indicators of depression and suicidal risk.

    PubMed

    France, D J; Shiavi, R G; Silverman, S; Silverman, M; Wilkes, D M

    2000-07-01

    Acoustic properties of speech have previously been identified as possible cues to depression, and there is evidence that certain vocal parameters may be used further to objectively discriminate between depressed and suicidal speech. Studies were performed to analyze and compare the speech acoustics of separate male and female samples comprised of normal individuals and individuals carrying diagnoses of depression and high-risk, near-term suicidality. The female sample consisted of ten control subjects, 17 dysthymic patients, and 21 major depressed patients. The male sample contained 24 control subjects, 21 major depressed patients, and 22 high-risk suicidal patients. Acoustic analyses of voice fundamental frequency (Fo), amplitude modulation (AM), formants, and power distribution were performed on speech samples extracted from audio recordings collected from the sample members. Multivariate feature and discriminant analyses were performed on feature vectors representing the members of the control and disordered classes. Features derived from the formant and power spectral density measurements were found to be the best discriminators of class membership in both the male and female studies. AM features emerged as strong class discriminators of the male classes. Features describing Fo were generally ineffective discriminators in both studies. The results support theories that identify psychomotor disturbances as central elements in depression and suicidality.
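
    The shape of this analysis can be sketched as per-speaker feature vectors (e.g., band power distribution, F0 statistics, formants) passed to a discriminant classifier. The band edges and the use of scikit-learn's linear discriminant analysis below are simplifications for illustration; the paper's exact feature definitions and multivariate procedures are not reproduced.

      import numpy as np
      from scipy.signal import welch
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

      def power_distribution(x, fs, edges=(0, 500, 1000, 1500, 2000, 2500)):
          # Fraction of spectral power in successive bands, a simplified stand-in
          # for the power spectral density features used in the paper.
          f, pxx = welch(x, fs=fs, nperseg=1024)
          total = np.sum(pxx) + 1e-20
          return np.array([np.sum(pxx[(f >= lo) & (f < hi)]) / total
                           for lo, hi in zip(edges[:-1], edges[1:])])

      # X: one feature vector per speaker (band powers, F0 stats, formants, ...)
      # y: class labels, e.g. 'control', 'depressed', 'suicidal'
      # lda = LinearDiscriminantAnalysis().fit(X, y); lda.predict(new_speakers)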

  16. A novel radar sensor for the non-contact detection of speech signals.

    PubMed

    Jiao, Mingke; Lu, Guohua; Jing, Xijing; Li, Sheng; Li, Yanfeng; Wang, Jianqi

    2010-01-01

    Different speech detection sensors have been developed over the years but they are limited by the loss of high frequency speech energy, and have restricted non-contact detection due to the lack of penetrability. This paper proposes a novel millimeter microwave radar sensor to detect speech signals. The utilization of a high operating frequency and a superheterodyne receiver contributes to the high sensitivity of the radar sensor for small sound vibrations. In addition, the penetrability of microwaves allows the novel sensor to detect speech signals through nonmetal barriers. Results show that the novel sensor can detect high frequency speech energies and that the speech quality is comparable to traditional microphone speech. Moreover, the novel sensor can detect speech signals through a nonmetal material of a certain thickness between the sensor and the subject. Thus, the novel speech sensor expands traditional speech detection techniques and provides an exciting alternative for broader application prospects.

  17. A Novel Radar Sensor for the Non-Contact Detection of Speech Signals

    PubMed Central

    Jiao, Mingke; Lu, Guohua; Jing, Xijing; Li, Sheng; Li, Yanfeng; Wang, Jianqi

    2010-01-01

    Different speech detection sensors have been developed over the years but they are limited by the loss of high frequency speech energy, and have restricted non-contact detection due to the lack of penetrability. This paper proposes a novel millimeter microwave radar sensor to detect speech signals. The utilization of a high operating frequency and a superheterodyne receiver contributes to the high sensitivity of the radar sensor for small sound vibrations. In addition, the penetrability of microwaves allows the novel sensor to detect speech signals through nonmetal barriers. Results show that the novel sensor can detect high frequency speech energies and that the speech quality is comparable to traditional microphone speech. Moreover, the novel sensor can detect speech signals through a nonmetal material of a certain thickness between the sensor and the subject. Thus, the novel speech sensor expands traditional speech detection techniques and provides an exciting alternative for broader application prospects. PMID:22399895

  18. A Statistical Model-Based Speech Enhancement Using Acoustic Noise Classification for Robust Speech Communication

    NASA Astrophysics Data System (ADS)

    Choi, Jae-Hun; Chang, Joon-Hyuk

In this paper, we present a speech enhancement technique based on ambient noise classification that incorporates a Gaussian mixture model (GMM). The principal parameters of the statistical model-based speech enhancement algorithm, such as the weighting parameter in the decision-directed (DD) method and the long-term smoothing parameter of the noise estimation, are set according to the classified context to ensure the best performance under each noise type. For real-time context awareness, the noise classification is performed on a frame-by-frame basis using the GMM within a soft-decision framework. The speech absence probability (SAP) is used to detect speech absence periods and to update the likelihood of the GMM.
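
    A minimal sketch of how a classified noise context could steer the two parameters named above, the decision-directed (DD) weight and the long-term noise-smoothing constant. The per-class values, function names, and the soft-decision noise update below are illustrative assumptions; the GMM classifier itself (a frame-level Gaussian mixture over spectral features) is not shown.

      import numpy as np

      # Illustrative per-class parameters: (DD weight alpha, noise smoothing beta).
      CLASS_PARAMS = {'babble': (0.96, 0.98), 'car': (0.99, 0.995), 'white': (0.98, 0.99)}

      def dd_a_priori_snr(Y2, prev_A2, noise_psd, noise_class):
          # Decision-directed a priori SNR for one frame.
          #   Y2        : |Y(k)|^2, noisy periodogram of the current frame
          #   prev_A2   : |A(k)|^2 of the previous enhanced frame
          #   noise_psd : current noise PSD estimate per bin
          alpha, _ = CLASS_PARAMS[noise_class]
          gamma = Y2 / (noise_psd + 1e-12)                      # a posteriori SNR
          return np.maximum(alpha * prev_A2 / (noise_psd + 1e-12)
                            + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0), 1e-3)

      def update_noise_psd(noise_psd, Y2, sap, noise_class):
          # Soft-decision noise update: the speech absence probability (sap) per
          # bin controls how much the new periodogram enters the estimate.
          _, beta = CLASS_PARAMS[noise_class]
          eff = beta + (1.0 - beta) * (1.0 - sap)   # smooth more when speech is likely
          return eff * noise_psd + (1.0 - eff) * Y2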

  19. Effects of cognitive workload on speech production: acoustic analyses and perceptual consequences.

    PubMed

    Lively, S E; Pisoni, D B; Van Summers, W; Bernacki, R H

    1993-05-01

The present investigation examined the effects of cognitive workload on speech production. Workload was manipulated by having talkers perform a compensatory visual tracking task while speaking test sentences of the form "Say hVd again." Acoustic measurements were made to compare utterances produced under workload with the same utterances produced in a control condition. In the workload condition, some talkers produced utterances with increased amplitude and amplitude variability, decreased spectral tilt and F0 variability, and increased speaking rate. No changes in F1, F2, or F3 were observed across conditions for any of the talkers. These findings indicate both laryngeal and sublaryngeal adjustments in articulation, as well as modifications in the absolute timing of articulatory gestures. The results of a perceptual identification experiment paralleled the acoustic measurements. Small but significant advantages in intelligibility were observed for utterances produced under workload for talkers who showed robust changes in speech production. Changes in amplitude and amplitude variability for utterances produced under workload appeared to be the major factor controlling intelligibility. The results of the present investigation support the assumptions of Lindblom's ["Explaining phonetic variation: A sketch of the H&H theory," in Speech Production and Speech Modeling (Kluwer Academic, The Netherlands, 1990)] H&H model: Talkers adapt their speech to suit the demands of the environment and these modifications are designed to maximize intelligibility.

  20. Acoustic signalling reflects personality in a social mammal

    PubMed Central

    Friel, Mary; Kunc, Hansjoerg P.; Griffin, Kym; Asher, Lucy; Collins, Lisa M.

    2016-01-01

    Social interactions among individuals are often mediated through acoustic signals. If acoustic signals are consistent and related to an individual's personality, these consistent individual differences in signalling may be an important driver in social interactions. However, few studies in non-human mammals have investigated the relationship between acoustic signalling and personality. Here we show that acoustic signalling rate is repeatable and strongly related to personality in a highly social mammal, the domestic pig (Sus scrofa domestica). Furthermore, acoustic signalling varied between environments of differing quality, with males from a poor-quality environment having a reduced vocalization rate compared with females and males from an enriched environment. Such differences may be mediated by personality with pigs from a poor-quality environment having more reactive and more extreme personality scores compared with pigs from an enriched environment. Our results add to the evidence that acoustic signalling reflects personality in a non-human mammal. Signals reflecting personalities may have far reaching consequences in shaping the evolution of social behaviours as acoustic communication forms an integral part of animal societies. PMID:27429775

  1. Acoustic and auditory phonetics: the adaptive design of speech sound systems.

    PubMed

    Diehl, Randy L

    2008-03-12

    Speech perception is remarkably robust. This paper examines how acoustic and auditory properties of vowels and consonants help to ensure intelligibility. First, the source-filter theory of speech production is briefly described, and the relationship between vocal-tract properties and formant patterns is demonstrated for some commonly occurring vowels. Next, two accounts of the structure of preferred sound inventories, quantal theory and dispersion theory, are described and some of their limitations are noted. Finally, it is suggested that certain aspects of quantal and dispersion theories can be unified in a principled way so as to achieve reasonable predictive accuracy.

  2. Subphonetic Acoustic Modeling for Speaker-Independent Continuous Speech Recognition

    DTIC Science & Technology

    1993-12-17

concept of senone sharing across all hidden Markov models, such as triphones, multi-phones, words, or even phrase models... For instance, training the 50 phone HMMs for English usually requires only 1-2 hours of training data, while to sufficiently train syllable models may...require 50 hours of speech. Faced with a limited amount of training data, the advantage of the improved structure of the stochastic model may not be

  3. From prosodic structure to acoustic saliency: A fMRI investigation of speech rate, clarity, and emphasis

    NASA Astrophysics Data System (ADS)

    Golfinopoulos, Elisa

    Acoustic variability in fluent speech can arise at many stages in speech production planning and execution. For example, at the phonological encoding stage, the grouping of phonemes into syllables determines which segments are coarticulated and, by consequence, segment-level acoustic variation. Likewise phonetic encoding, which determines the spatiotemporal extent of articulatory gestures, will affect the acoustic detail of segments. Functional magnetic resonance imaging (fMRI) was used to measure brain activity of fluent adult speakers in four speaking conditions: fast, normal, clear, and emphatic (or stressed) speech. These speech manner changes typically result in acoustic variations that do not change the lexical or semantic identity of productions but do affect the acoustic saliency of phonemes, syllables and/or words. Acoustic responses recorded inside the scanner were assessed quantitatively using eight acoustic measures and sentence duration was used as a covariate of non-interest in the neuroimaging analysis. Compared to normal speech, emphatic speech was characterized acoustically by a greater difference between stressed and unstressed vowels in intensity, duration, and fundamental frequency, and neurally by increased activity in right middle premotor cortex and supplementary motor area, and bilateral primary sensorimotor cortex. These findings are consistent with right-lateralized motor planning of prosodic variation in emphatic speech. Clear speech involved an increase in average vowel and sentence durations and average vowel spacing, along with increased activity in left middle premotor cortex and bilateral primary sensorimotor cortex. These findings are consistent with an increased reliance on feedforward control, resulting in hyper-articulation, under clear as compared to normal speech. Fast speech was characterized acoustically by reduced sentence duration and average vowel spacing, and neurally by increased activity in left anterior frontal

  4. Subauditory Speech Recognition based on EMG/EPG Signals

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles; Lee, Diana Dee; Agabon, Shane; Lau, Sonie (Technical Monitor)

    2003-01-01

Sub-vocal electromyogram/electropalatogram (EMG/EPG) signal classification is demonstrated as a method for silent speech recognition. Recorded electrode signals from the larynx and sublingual areas below the jaw are noise-filtered and transformed into features using complex dual quad tree wavelet transforms. Feature sets for six sub-vocally pronounced words are trained using a trust-region scaled conjugate gradient neural network. Real-time signals for previously unseen patterns are classified into categories suitable for primitive control of graphic objects. Feature construction, recognition accuracy, and an approach for extension of the technique to a variety of real-world application areas are presented.

  5. Speech after Radial Forearm Free Flap Reconstruction of the Tongue: A Longitudinal Acoustic Study of Vowel and Diphthong Sounds

    ERIC Educational Resources Information Center

    Laaksonen, Juha-Pertti; Rieger, Jana; Happonen, Risto-Pekka; Harris, Jeffrey; Seikaly, Hadi

    2010-01-01

    The purpose of this study was to use acoustic analyses to describe speech outcomes over the course of 1 year after radial forearm free flap (RFFF) reconstruction of the tongue. Eighteen Canadian English-speaking females and males with reconstruction for oral cancer had speech samples recorded (pre-operative, and 1 month, 6 months, and 1 year…

  6. Acoustic-Phonetic Differences between Infant- and Adult-Directed Speech: The Role of Stress and Utterance Position

    ERIC Educational Resources Information Center

    Wang, Yuanyuan; Seidl, Amanda; Cristia, Alejandrina

    2015-01-01

    Previous studies have shown that infant-directed speech (IDS) differs from adult-directed speech (ADS) on a variety of dimensions. The aim of the current study was to investigate whether acoustic differences between IDS and ADS in English are modulated by prosodic structure. We compared vowels across the two registers (IDS, ADS) in both stressed…

  7. [The acoustic aspect of the speech development in children during the third year of life].

    PubMed

    Liakso, E E; Gromova, A D; Frolova, O V; Romanova, O D

    2004-01-01

The current part of a longitudinal study of Russian language acquisition, based on auditory, phonetic, and instrumental analysis, is devoted to the third year of the child's life. We examined the development of the supplementary acoustic and phonetic features of children's speech that make it recognizable. Instrumental analysis and statistical processing of vowel formant dynamics, as well as of stress, palatalization, and VOT development, were performed for the first time in Russian children. We showed that the high probability of listeners recognizing children's words was due to the establishment of a system of acoustically stable features which, in combination with each other, provide the informative sufficiency of a message.

  8. An eighth-scale speech source for subjective assessments in acoustic models

    NASA Astrophysics Data System (ADS)

    Orlowski, R. J.

    1981-08-01

    The design of a source is described which is suitable for making speech recordings in eighth-scale acoustic models of auditoria. An attempt was made to match the directionality of the source with the directionality of the human voice using data reported in the literature. A narrow aperture was required for the design which was provided by mounting an inverted conical horn over the diaphragm of a high frequency loudspeaker. Resonance problems were encountered with the use of a horn and a description is given of the electronic techniques adopted to minimize the effect of these resonances. Subjective and objective assessments on the completed speech source have proved satisfactory. It has been used in a modelling exercise concerned with the acoustic design of a theatre with a thrust-type stage.

  9. The Acoustic-Modeling Problem in Automatic Speech Recognition.

    DTIC Science & Technology

    1987-12-01

systems that use an artificial grammar do so in order to set this uncertainty by fiat, thereby ensuring that their task will not be too difficult...an artificial grammar, the Pr(W = w)'s are known and Hm(W) can, in fact, achieve its lower bound if the system simply uses these probabilities. In a...finite-state grammar represented by that chain. As Jim Baker points out, the modeling of speech by a hidden Markov model should not be regarded as a

  10. Are you a good mimic? Neuro-acoustic signatures for speech imitation ability.

    PubMed

    Reiterer, Susanne M; Hu, Xiaochen; Sumathi, T A; Singh, Nandini C

    2013-01-01

We investigated individual differences in speech imitation ability in late bilinguals using a neuro-acoustic approach. One hundred and thirty-eight German-English bilinguals matched on various behavioral measures were tested for "speech imitation ability" in a foreign language, Hindi, and categorized into "high" and "low ability" groups. Brain activations and speech recordings were obtained from 26 participants from the two extreme groups as they performed a functional neuroimaging experiment which required them to "imitate" sentences in three conditions: (A) German, (B) English, and (C) German with fake English accent. We used a recently developed novel acoustic analysis, namely the "articulation space," as a metric to compare speech imitation abilities of the two groups. Across all three conditions, direct comparisons between the two groups revealed brain activations (FWE corrected, p < 0.05) that were more widespread with significantly higher peak activity in the left supramarginal gyrus and postcentral areas for the low ability group. The high ability group, on the other hand, showed significantly larger articulation space in all three conditions. In addition, articulation space also correlated positively with imitation ability (Pearson's r = 0.7, p < 0.01). Our results suggest that an expanded articulation space for high ability individuals allows access to a larger repertoire of sounds, thereby providing skilled imitators greater flexibility in pronunciation and language learning.

  11. Are you a good mimic? Neuro-acoustic signatures for speech imitation ability

    PubMed Central

    Reiterer, Susanne M.; Hu, Xiaochen; Sumathi, T. A.; Singh, Nandini C.

    2013-01-01

We investigated individual differences in speech imitation ability in late bilinguals using a neuro-acoustic approach. One hundred and thirty-eight German-English bilinguals matched on various behavioral measures were tested for “speech imitation ability” in a foreign language, Hindi, and categorized into “high” and “low ability” groups. Brain activations and speech recordings were obtained from 26 participants from the two extreme groups as they performed a functional neuroimaging experiment which required them to “imitate” sentences in three conditions: (A) German, (B) English, and (C) German with fake English accent. We used a recently developed novel acoustic analysis, namely the “articulation space,” as a metric to compare speech imitation abilities of the two groups. Across all three conditions, direct comparisons between the two groups revealed brain activations (FWE corrected, p < 0.05) that were more widespread with significantly higher peak activity in the left supramarginal gyrus and postcentral areas for the low ability group. The high ability group, on the other hand, showed significantly larger articulation space in all three conditions. In addition, articulation space also correlated positively with imitation ability (Pearson's r = 0.7, p < 0.01). Our results suggest that an expanded articulation space for high ability individuals allows access to a larger repertoire of sounds, thereby providing skilled imitators greater flexibility in pronunciation and language learning. PMID:24155739

  12. Evolution of acoustic and visual signals in Asian barbets.

    PubMed

    Gonzalez-Voyer, A; den Tex, R-J; Castelló, A; Leonard, J A

    2013-03-01

    The study of animal communication systems is an important step towards gaining greater understanding of the processes influencing diversification because signals often play an important role in mate choice and can lead to reproductive isolation. Signal evolution can be influenced by a diversity of factors such as biophysical constraints on the emitter, the signalling environment, or selection to avoid heterospecific matings. Furthermore, because signals can be costly to produce, trade-offs may exist between different types of signals. Here, we apply phylogenetic comparative analyses to study the evolution of acoustic and visual signals in Asian barbets, a clade of non-Passerine, forest-dependent birds. Our results suggest that evolution of acoustic and visual signals in barbets is influenced by diverse factors, such as morphology and signalling environment, suggesting a potential effect of sensory drive. We found no trade-offs between visual and acoustic signals. Quite to the contrary, more colourful species sing significantly longer songs. Song characteristics presented distinct patterns of evolution. Song frequency diverged early on and the rate of evolution of this trait appears to be constrained by body size. On the other hand, characteristics associated with length of the song presented evidence for more recent divergence. Finally, our results indicate that there is a spatial component to the evolution of visual signals, and that visual signals are more divergent between closely related taxa than acoustic signals. Hence, visual signals in these species could play a role in speciation or reinforcement of reproductive isolation following secondary contacts.

  13. Speech perception at positive signal-to-noise ratios using adaptive adjustment of time compression.

    PubMed

    Schlueter, Anne; Brand, Thomas; Lemke, Ulrike; Nitzschner, Stefan; Kollmeier, Birger; Holube, Inga

    2015-11-01

    Positive signal-to-noise ratios (SNRs) characterize listening situations most relevant for hearing-impaired listeners in daily life and should therefore be considered when evaluating hearing aid algorithms. For this, a speech-in-noise test was developed and evaluated, in which the background noise is presented at fixed positive SNRs and the speech rate (i.e., the time compression of the speech material) is adaptively adjusted. In total, 29 younger and 12 older normal-hearing, as well as 24 older hearing-impaired listeners took part in repeated measurements. Younger normal-hearing and older hearing-impaired listeners conducted one of two adaptive methods which differed in adaptive procedure and step size. Analysis of the measurements with regard to list length and estimation strategy for thresholds resulted in a practical method measuring the time compression for 50% recognition. This method uses time-compression adjustment and step sizes according to Versfeld and Dreschler [(2002). J. Acoust. Soc. Am. 111, 401-408], with sentence scoring, lists of 30 sentences, and a maximum likelihood method for threshold estimation. Evaluation of the procedure showed that older participants obtained higher test-retest reliability compared to younger participants. Depending on the group of listeners, one or two lists are required for training prior to data collection.
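
    The maximum-likelihood threshold step can be sketched as fitting a fixed-slope logistic psychometric function to per-sentence scores as a function of the time-compression factor and taking the 50% point. The slope value, grid search, and variable names below are assumptions; the adaptive track that selects which compression values are presented is not shown.

      import numpy as np

      def ml_threshold(compression, correct, slope=10.0):
          # compression: time-compression factor presented on each trial
          # correct    : 1 if the sentence was recognized, else 0
          compression = np.asarray(compression, float)
          correct = np.asarray(correct, float)
          grid = np.linspace(compression.min(), compression.max(), 200)
          best, best_ll = grid[0], -np.inf
          for c50 in grid:
              # Recognition probability falls as speech is compressed further.
              p = np.clip(1.0 / (1.0 + np.exp(slope * (compression - c50))), 1e-6, 1 - 1e-6)
              ll = np.sum(correct * np.log(p) + (1.0 - correct) * np.log(1.0 - p))
              if ll > best_ll:
                  best, best_ll = c50, ll
          return best   # compression factor at 50% sentence recognition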

  14. Department of Cybernetic Acoustics

    NASA Astrophysics Data System (ADS)

The development of the theory, instrumentation, and applications of methods and systems for the measurement, analysis, processing, and synthesis of acoustic signals within the audio frequency range is discussed, particularly the speech signal and the vibro-acoustic signals emitted by technical and industrial equipment treated as noise and vibration sources. The research work, both theoretical and experimental, aims at applications in various branches of science and medicine, such as: acoustical diagnostics and phoniatric rehabilitation of pathological and postoperative states of the speech organ; bilateral "man-machine" speech communication based on the analysis, recognition, and synthesis of the speech signal; and vibro-acoustical diagnostics and continuous monitoring of the state of machines, technical equipment, and technological processes.

  15. Combining acoustic and electric stimulation in the service of speech recognition

    PubMed Central

    Dorman, Michael F.; Gifford, Rene H.

    2010-01-01

The majority of recently implanted cochlear implant patients can potentially benefit from a hearing aid in the ear contralateral to the implant. When patients combine electric and acoustic stimulation, word recognition in quiet and sentence recognition in noise increase significantly. Several studies suggest that the acoustic information that leads to the increased level of performance resides mostly in the frequency region of the voice fundamental, e.g., 125 Hz for a male voice. Recent studies suggest that this information aids speech recognition in noise by improving the recognition of lexical boundaries or word onsets. In some noise environments, patients with bilateral implants can achieve similar levels of performance as patients who combine electric and acoustic stimulation. Patients who have undergone hearing preservation surgery, who have electric stimulation from a cochlear implant, and who have low-frequency hearing in both the implanted and non-implanted ears achieve the best performance in a high noise environment. PMID:20874053

  16. The Effectiveness of Clear Speech as a Masker

    ERIC Educational Resources Information Center

    Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.

    2010-01-01

    Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…

  17. Acoustic signals of Chinese alligators (Alligator sinensis): social communication.

    PubMed

    Wang, Xianyan; Wang, Ding; Wu, Xiaobing; Wang, Renping; Wang, Chaolin

    2007-05-01

    This paper reports the first systematic study of acoustic signals during social interactions of the Chinese alligator (Alligator sinensis). Sound pressure level (SPL) measurements revealed that Chinese alligators have an elaborate acoustic communication system with both a long-distance signal (bellowing) and short-distance signals that include tooting, bubble blowing, hissing, mooing, head slapping and whining. Bellows have high SPL and appear to play an important role in the alligator's long-range communication. Sounds characterized by low SPL are short-distance signals used when alligators are in close spatial proximity to one another. Spectrographic analysis showed that the acoustic signals of Chinese alligators have a very low dominant frequency, less than 500 Hz. These frequencies are consistent with adaptation to a habitat with high-density vegetation. Sound with a low dominant frequency attenuates less and could therefore cover a larger spatial range by diffraction in a densely vegetated environment than sound with a higher dominant frequency.
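
    As a simple illustration of the kind of spectral measurement reported above, the sketch below estimates a call's dominant frequency from its averaged power spectrum; the windowing parameters and function names are assumptions, not those of the study.

      import numpy as np
      from scipy.signal import welch

      def dominant_frequency(call, fs):
          # Welch power spectral density; the dominant frequency is the peak bin.
          freqs, psd = welch(call, fs=fs, nperseg=2048)
          return freqs[np.argmax(psd)]

      # e.g., dominant_frequency(bellow_waveform, fs=22050) would be expected to
      # fall below 500 Hz for the calls described above.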

  18. Experimental investigation of the effects of the acoustical conditions in a simulated classroom on speech recognition and learning in children.

    PubMed

    Valente, Daniel L; Plevinsky, Hallie M; Franco, John M; Heinrichs-Graham, Elizabeth C; Lewis, Dawna E

    2012-01-01

    The potential effects of acoustical environment on speech understanding are especially important as children enter school, where students' ability to hear and understand complex verbal information is critical to learning. However, this ability is compromised because of widely varied and unfavorable classroom acoustics. The extent to which unfavorable classroom acoustics affect children's performance on longer learning tasks is largely unknown, as most research has focused on testing children using words, syllables, or sentences as stimuli. In the current study, a simulated classroom environment was used to measure comprehension performance on two classroom learning activities: a discussion and a lecture. Comprehension performance was measured for groups of elementary-aged students in one of four environments with varied reverberation times and background noise levels. The reverberation time was either 0.6 or 1.5 s, and the signal-to-noise ratio was either +10 or +7 dB. Performance was compared to that of adult subjects, as well as to sentence recognition in the same conditions. Significant differences were seen in comprehension scores as a function of age and condition; increasing background noise and reverberation both degraded performance on the comprehension tasks, whereas differences in the sentence-recognition measures were minimal.

  19. A Method of Speech Periodicity Enhancement Using Transform-domain Signal Decomposition.

    PubMed

    Huang, Huang; Lee, Tan; Kleijn, W Bastiaan; Kong, Ying-Yee

    2015-03-01

    Periodicity is an important property of speech signals. It is the basis of the signal's fundamental frequency and the pitch of the voice, which is crucial to speech communication. This paper presents a novel framework of periodicity enhancement for noisy speech. The enhancement is applied to the linear prediction residual of speech. The residual signal goes through a constant-pitch time warping process and two sequential lapped-frequency transforms, by which the periodic component is concentrated in certain transform coefficients. By emphasizing the respective transform coefficients, periodicity enhancement of the noisy residual signal is achieved. The enhanced residual signal and estimated linear prediction filter parameters are used to synthesize the output speech. An adaptive algorithm is proposed for adjusting the weights for the periodic and aperiodic components. Effectiveness of the proposed approach is demonstrated via experimental evaluation. It is observed that the harmonic structure of the original speech could be properly restored to improve the perceptual quality of the enhanced speech.
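
    The first stage of the framework described above operates on the linear prediction residual. The fragment below is a generic sketch of obtaining such a residual, not the authors' implementation; the LPC order and the use of librosa.lpc are assumptions.

      import numpy as np
      import librosa
      from scipy.signal import lfilter

      def lp_residual(frame, order=16):
          # Estimate LPC coefficients (a[0] == 1) and inverse-filter the frame
          # with the prediction-error filter A(z) to obtain the residual.
          a = librosa.lpc(np.asarray(frame, dtype=float), order=order)
          residual = lfilter(a, [1.0], frame)
          return a, residual

      # Synthesis would later pass the enhanced residual back through 1/A(z).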

  20. On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common

    PubMed Central

    Weninger, Felix; Eyben, Florian; Schuller, Björn W.; Mortillaro, Marcello; Scherer, Klaus R.

    2013-01-01

    Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning each of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow’s pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of “the sound that something makes,” in order to evaluate the system’s auditory environment and its own audio output. This article aims at a first step toward a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selection of appropriate descriptors, cross-domain arousal and valence regression is feasible, achieving significant correlations with the observer annotations of up to 0.78 for arousal (training on sound and testing on enacted speech) and 0.60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects. PMID:23750144

  1. Segment-based acoustic models for continuous speech recognition

    NASA Astrophysics Data System (ADS)

    Ostendorf, Mari; Rohlicek, J. R.

    1993-07-01

    This research aims to develop new and more accurate stochastic models for speaker-independent continuous speech recognition, by extending previous work in segment-based modeling and by introducing a new hierarchical approach to representing intra-utterance statistical dependencies. These techniques, which are more costly than traditional approaches because of the large search space associated with higher order models, are made feasible through rescoring a set of HMM-generated N-best sentence hypotheses. We expect these different modeling techniques to result in improved recognition performance over that achieved by current systems, which handle only frame-based observations and assume that these observations are independent given an underlying state sequence. In the fourth quarter of the project, we have completed the following: (1) ported our recognition system to the Wall Street Journal task, a standard task in the ARPA community; (2) developed an initial dependency-tree model of intra-utterance observation correlation; and (3) implemented baseline language model estimation software. Our initial results on the Wall Street Journal task are quite good and represent significantly improved performance over most HMM systems reporting on the Nov. 1992 5k vocabulary test set.

  2. Control of Spoken Vowel Acoustics and the Influence of Phonetic Context in Human Speech Sensorimotor Cortex

    PubMed Central

    Bouchard, Kristofer E.

    2014-01-01

    Speech production requires the precise control of vocal tract movements to generate individual speech sounds (phonemes) which, in turn, are rapidly organized into complex sequences. Multiple productions of the same phoneme can exhibit substantial variability, some of which is inherent to control of the vocal tract and its biomechanics, and some of which reflects the contextual effects of surrounding phonemes (“coarticulation”). The role of the CNS in these aspects of speech motor control is not well understood. To address these issues, we recorded multielectrode cortical activity directly from human ventral sensory-motor cortex (vSMC) during the production of consonant-vowel syllables. We analyzed the relationship between the acoustic parameters of vowels (pitch and formants) and cortical activity on a single-trial level. We found that vSMC activity robustly predicted acoustic parameters across vowel categories (up to 80% of variance), as well as different renditions of the same vowel (up to 25% of variance). Furthermore, we observed significant contextual effects on vSMC representations of produced phonemes that suggest active control of coarticulation: vSMC representations for vowels were biased toward the representations of the preceding consonant, and conversely, representations for consonants were biased toward upcoming vowels. These results reveal that vSMC activity for phonemes is not invariant and provide insight into the cortical mechanisms of coarticulation. PMID:25232105

  3. Teachers and Teaching: Speech Production Accommodations Due to Changes in the Acoustic Environment

    PubMed Central

    Hunter, Eric J.; Bottalico, Pasquale; Graetzer, Simone; Leishman, Timothy W.; Berardi, Mark L.; Eyring, Nathan G.; Jensen, Zachary R.; Rolins, Michael K.; Whiting, Jennifer K.

    2016-01-01

    School teachers have an elevated risk of voice problems due to the vocal demands in the workplace. This manuscript presents the results of three studies investigating teachers’ voice use at work. In the first study, 57 teachers were observed for 2 weeks (waking hours) to compare how they used their voice in the school environment and in non-school environments. In a second study, 45 participants performed a short vocal task in two different rooms: a variable acoustic room and an anechoic chamber. Subjects were taken back and forth between the two rooms. Each time they entered the variable acoustic room, the reverberation time and/or the background noise condition had been modified. In this latter study, subjects responded to questions about their vocal comfort and their perception of changes in the acoustic environment. In a third study, 20 untrained vocalists performed a simple vocal task in the following conditions: with and without background babble and with and without transparent plexiglass shields to increase the first reflection. Relationships were examined between [1] the results for the room acoustic parameters; [2] the subjects’ perception of the room; and [3] the recorded speech acoustics. Several differences between male and female subjects were found; some of those differences held for each room condition (at school vs. not at school, reverberation level, noise level, and early reflection). PMID:26949426

  4. Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech

    NASA Astrophysics Data System (ADS)

    Ge, Fengpei; Liu, Changliang; Shao, Jian; Pan, Fuping; Dong, Bin; Yan, Yonghong

    In this paper we present our investigation into improving the performance of our computer-assisted language learning (CALL) system through exploiting the acoustic model and features within the speech recognition framework. First, to alleviate channel distortion, speaker-dependent cepstrum mean normalization (CMN) is adopted and the average correlation coefficient (average CC) between machine and expert scores is improved from 78.00% to 84.14%. Second, heteroscedastic linear discriminant analysis (HLDA) is adopted to enhance the discriminability of the acoustic model, which successfully increases the average CC from 84.14% to 84.62%. Additionally, HLDA causes the scoring accuracy to be more stable at various pronunciation proficiency levels, and thus leads to an increase in the speaker correct-rank rate from 85.59% to 90.99%. Finally, we use maximum a posteriori (MAP) estimation to tune the acoustic model to fit strongly accented test speech. As a result, the average CC is improved from 84.62% to 86.57%. These three novel techniques improve the accuracy of evaluating pronunciation quality.
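
    Of the three techniques listed above, cepstral mean normalization is the simplest to illustrate. The sketch below shows a generic per-speaker CMN step as an assumption-level illustration, not the authors' system; the feature array shape is assumed.

      import numpy as np

      def cepstral_mean_normalization(features):
          # features: (num_frames, num_cepstra) matrix of cepstral coefficients
          # pooled over one speaker's utterances; subtracting the mean vector
          # removes a stationary channel/convolutional component.
          return features - features.mean(axis=0, keepdims=True)

      # Usage: normalized = cepstral_mean_normalization(speaker_mfcc_frames)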

  5. Logopenic and Nonfluent Variants of Primary Progressive Aphasia Are Differentiated by Acoustic Measures of Speech Production

    PubMed Central

    Ballard, Kirrie J.; Savage, Sharon; Leyton, Cristian E.; Vogel, Adam P.; Hornberger, Michael; Hodges, John R.

    2014-01-01

    Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging since it hinges on expert-based evaluation of speech and language production. In this study, acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold-standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r2 = 0.47) with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, only affected in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy. Findings suggest that a measure of relative vowel duration from a polysyllable word repetition task

  6. Logopenic and nonfluent variants of primary progressive aphasia are differentiated by acoustic measures of speech production.

    PubMed

    Ballard, Kirrie J; Savage, Sharon; Leyton, Cristian E; Vogel, Adam P; Hornberger, Michael; Hodges, John R

    2014-01-01

    Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging since it hinges on expert-based evaluation of speech and language production. In this study, acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold-standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r(2) = 0.47) with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, only affected in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy. Findings suggest that a measure of relative vowel duration from a polysyllable word repetition task

  7. Integration of auditory and somatosensory error signals in the neural control of speech movements.

    PubMed

    Feng, Yongqiang; Gracco, Vincent L; Max, Ludo

    2011-08-01

    We investigated auditory and somatosensory feedback contributions to the neural control of speech. In task I, sensorimotor adaptation was studied by perturbing one of these sensory modalities or both modalities simultaneously. The first formant (F1) frequency in the auditory feedback was shifted up by a real-time processor and/or the extent of jaw opening was increased or decreased with a force field applied by a robotic device. All eight subjects lowered F1 to compensate for the up-shifted F1 in the feedback signal regardless of whether or not the jaw was perturbed. Adaptive changes in subjects' acoustic output resulted from adjustments in articulatory movements of the jaw or tongue. Adaptation in jaw opening extent in response to the mechanical perturbation occurred only when no auditory feedback perturbation was applied or when the direction of adaptation to the force was compatible with the direction of adaptation to a simultaneous acoustic perturbation. In tasks II and III, subjects' auditory and somatosensory precision and accuracy were estimated. Correlation analyses showed that the relationships 1) between F1 adaptation extent and auditory acuity for F1 and 2) between jaw position adaptation extent and somatosensory acuity for jaw position were weak and statistically not significant. Taken together, the combined findings from this work suggest that, in speech production, sensorimotor adaptation updates the underlying control mechanisms in such a way that the planning of vowel-related articulatory movements takes into account a complex integration of error signals from previous trials but likely with a dominant role for the auditory modality.

  8. Acoustic temporal modulation detection and speech perception in cochlear implant listeners.

    PubMed

    Won, Jong Ho; Drennan, Ward R; Nie, Kaibao; Jameyson, Elyse M; Rubinstein, Jay T

    2011-07-01

    The goals of the present study were to measure acoustic temporal modulation transfer functions (TMTFs) in cochlear implant listeners and examine the relationship between modulation detection and speech recognition abilities. The effects of automatic gain control, presentation level and number of channels on modulation detection thresholds (MDTs) were examined using the listeners' clinical sound processor. The general form of the TMTF was low-pass, consistent with previous studies. The operation of automatic gain control had no effect on MDTs when the stimuli were presented at 65 dBA. MDTs were not dependent on the presentation levels (ranging from 50 to 75 dBA) nor on the number of channels. Significant correlations were found between MDTs and speech recognition scores. The rates of decay of the TMTFs were predictive of speech recognition abilities. Spectral-ripple discrimination was evaluated to examine the relationship between temporal and spectral envelope sensitivities. No correlations were found between the two measures, and 56% of the variance in speech recognition was predicted jointly by the two tasks. The present study suggests that temporal modulation detection measured with the sound processor can serve as a useful measure of the ability of clinical sound processing strategies to deliver clinically pertinent temporal information.

  9. The role of metrical information in apraxia of speech. Perceptual and acoustic analyses of word stress.

    PubMed

    Aichert, Ingrid; Späth, Mona; Ziegler, Wolfram

    2016-02-01

    Several factors are known to influence speech accuracy in patients with apraxia of speech (AOS), e.g., syllable structure or word length. However, the impact of word stress has largely been neglected so far. More generally, the role of prosodic information at the phonetic encoding stage of speech production often remains unconsidered in models of speech production. This study aimed to investigate the influence of word stress on error production in AOS. Two-syllabic words with stress on the first (trochees) vs. the second syllable (iambs) were compared in 14 patients with AOS, three of them exhibiting pure AOS, and in a control group of six normal speakers. The patients produced significantly more errors on iambic than on trochaic words. A most prominent metrical effect was obtained for segmental errors. Acoustic analyses of word durations revealed a disproportionate advantage of the trochaic meter in the patients relative to the healthy controls. The results indicate that German apraxic speakers are sensitive to metrical information. It is assumed that metrical patterns function as prosodic frames for articulation planning, and that the regular metrical pattern in German, the trochaic form, has a facilitating effect on word production in patients with AOS.

  10. Pulse analysis of acoustic emission signals

    NASA Technical Reports Server (NTRS)

    Houghton, J. R.; Packman, P. F.

    1977-01-01

    A method for the signature analysis of pulses in the frequency domain and the time domain is presented. Fourier spectrum, Fourier transfer function, shock spectrum and shock spectrum ratio were examined in the frequency domain analysis and pulse shape deconvolution was developed for use in the time domain analysis. Comparisons of the relative performance of each analysis technique are made for the characterization of acoustic emission pulses recorded by a measuring system. To demonstrate the relative sensitivity of each of the methods to small changes in the pulse shape, signatures of computer modeled systems with analytical pulses are presented. Optimization techniques are developed and used to indicate the best design parameter values for deconvolution of the pulse shape. Several experiments are presented that test the pulse signature analysis methods on different acoustic emission sources. These include acoustic emission associated with (a) crack propagation, (b) ball dropping on a plate, (c) spark discharge, and (d) defective and good ball bearings. Deconvolution of the first few micro-seconds of the pulse train is shown to be the region in which the significant signatures of the acoustic emission event are to be found.

  11. Pulse analysis of acoustic emission signals

    NASA Technical Reports Server (NTRS)

    Houghton, J. R.; Packman, P. F.

    1977-01-01

    A method for the signature analysis of pulses in the frequency domain and the time domain is presented. Fourier spectrum, Fourier transfer function, shock spectrum and shock spectrum ratio were examined in the frequency domain analysis, and pulse shape deconvolution was developed for use in the time domain analysis. Comparisons of the relative performance of each analysis technique are made for the characterization of acoustic emission pulses recorded by a measuring system. To demonstrate the relative sensitivity of each of the methods to small changes in the pulse shape, signatures of computer modeled systems with analytical pulses are presented. Optimization techniques are developed and used to indicate the best design parameter values for deconvolution of the pulse shape. Several experiments are presented that test the pulse signature analysis methods on different acoustic emission sources. These include acoustic emissions associated with: (1) crack propagation, (2) ball dropping on a plate, (3) spark discharge and (4) defective and good ball bearings. Deconvolution of the first few microseconds of the pulse train is shown to be the region in which the significant signatures of the acoustic emission event are to be found.

  12. Signal Processing Aspects of Nonlinear Acoustics.

    DTIC Science & Technology

    1980-07-07

  13. Emotion in speech: the acoustic attributes of fear, anger, sadness, and joy.

    PubMed

    Sobin, C; Alpert, M

    1999-07-01

    Decoders can detect emotion in voice with much greater accuracy than can be achieved by objective acoustic analysis. Studies that have established this advantage, however, used methods that may have favored decoders and disadvantaged acoustic analysis. In this study, we applied several methodologic modifications for the analysis of the acoustic differentiation of fear, anger, sadness, and joy. Thirty-one female subjects between the ages of 18 and 35 (encoders) were audio-recorded during an emotion-induction procedure and produced a total of 620 emotion-laden sentences. Twelve female judges (decoders), three for each of the four emotions, were assigned to rate the intensity of one emotion each. Their combined ratings were used to select 38 prototype samples per emotion. Past acoustic findings were replicated, and increased acoustic differentiation among the emotions was achieved. Multiple regression analysis suggested that some, although not all, of the acoustic variables were associated with decoders' ratings. Signal detection analysis gave some insight into this disparity. However, the analysis of the classic constellation of acoustic variables may not completely capture the acoustic features that influence decoders' ratings. Future analyses would likely benefit from the parallel assessment of respiration, phonation, and articulation.

  14. A Bayesian view on acoustic model-based techniques for robust speech recognition

    NASA Astrophysics Data System (ADS)

    Maas, Roland; Huemmer, Christian; Sehr, Armin; Kellermann, Walter

    2015-12-01

    This article provides a unifying Bayesian view on various approaches for acoustic model adaptation, missing feature, and uncertainty decoding that are well-known in the literature of robust automatic speech recognition. The representatives of these classes can often be deduced from a Bayesian network that extends the conventional hidden Markov models used in speech recognition. These extensions, in turn, can in many cases be motivated from an underlying observation model that relates clean and distorted feature vectors. By identifying and converting the observation models into a Bayesian network representation, we formulate the corresponding compensation rules. We thus summarize the various approaches as approximations or modifications of the same Bayesian decoding rule leading to a unified view on known derivations as well as to new formulations for certain approaches.

  15. Environmental Acoustic Transfer Functions and the Filtering of Acoustic Signals

    DTIC Science & Technology

    2005-03-21

    ... function by the Sturm-Liouville theorem (7). Then the left-hand side of the inner product equation is Σ_{l,m,n} K_{l,m,n} F_l(z;H) F_m(y;L) F_n(x;W) ... The results of this thesis enable us to determine under which conditions a filtering operation can successfully be performed on a set of received signals ... a signal being propagated at a location x0, and so the use of the Dirac delta function is appropriate in the use of a forcing function. A time-dependent ...

  16. Fluctuations of Broadband Acoustic Signals in Shallow Water

    DTIC Science & Technology

    2015-09-30

    Mohsen Badiey, College of Earth, Ocean, and Environment, University of Delaware, Newark, DE. ... refraction, and scattering in shallow water and coastal regions in the presence of temporal and spatial ocean variability. OBJECTIVES: The scientific ... of water column and dynamic sea surface variability, as well as source/receiver motion, on acoustic wave propagation for underwater acoustic ...

  17. Acoustic Aspects of Photoacoustic Signal Generation and Detection in Gases

    NASA Astrophysics Data System (ADS)

    Miklós, A.

    2015-09-01

    In this paper photoacoustic signal generation and detection in gases is investigated and discussed from the standpoint of acoustics. Four topics are considered: the effect of the absorption-desorption process of modulated and pulsed light on the heat power density released in the gas; the generation of the primary sound by the released heat in an unbounded medium; the excitation of an acoustic resonator by the primary sound; and finally, the generation of the measurable PA signal by a microphone. When light is absorbed by a molecule and the excess energy is relaxed by collisions with the surrounding molecules, the average kinetic energy, thus also the temperature of an ensemble of molecules (called "particle" in acoustics) will increase. In other words heat energy is added to the energy of the particle. The rate of the energy transfer is characterized by the heat power density. A simple two-level model of absorption-desorption is applied for describing the heat power generation process for modulated and pulsed illumination. Sound generation by a laser beam in an unbounded medium is discussed by means of the Green's function technique. It is shown that the duration of the generated sound pulse depends mostly on beam geometry. A photoacoustic signal is mostly detected in a photoacoustic cell composed of acoustic resonators, buffers, filters, etc. It is not easy to interpret the measured PA signal in such a complicated acoustic system. The acoustic response of a PA detector to different kinds of excitations (modulated cw, pulsed, periodic pulse train) is discussed. It is shown that acoustic resonators respond very differently to modulated cw excitation and to excitation by a pulse train. The microphone for detecting the PA signal is also a part of the acoustic system; its properties have to be taken into account by the design of a PA detector. The moving membrane of the microphone absorbs acoustic energy; thus, it may influence the resonance frequency and

  18. Finding Acoustic Regularities in Speech: Applications to Phonetic Recognition

    DTIC Science & Technology

    1988-12-01

    ... three words 'seven,' 'less,' and 'ten.' The second /E/ is influenced by the adjacent /l/, such that the second formant shows articulatory undershoot ... while the third /E/ is heavily nasalized, as evidenced by the smearing of the first formant. Other examples of coarticulation may be found in the ... signal. Often, for example, the phoneme /t/ in a word such as 'butter' is realized as a flap, which is a quick, continuous gesture of the tongue tip

  19. The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene.

    PubMed

    Rimmele, Johanna M; Zion Golumbic, Elana; Schröger, Erich; Poeppel, David

    2015-07-01

    Attending to one speaker in multi-speaker situations is challenging. One neural mechanism proposed to underlie the ability to attend to a particular speaker is phase-locking of low-frequency activity in auditory cortex to speech's temporal envelope ("speech-tracking"), which is more precise for attended speech. However, it is not known what brings about this attentional effect, and specifically if it reflects enhanced processing of the fine structure of attended speech. To investigate this question we compared attentional effects on speech-tracking of natural versus vocoded speech which preserves the temporal envelope but removes the fine structure of speech. Pairs of natural and vocoded speech stimuli were presented concurrently and participants attended to one stimulus and performed a detection task while ignoring the other stimulus. We recorded magnetoencephalography (MEG) and compared attentional effects on the speech-tracking response in auditory cortex. Speech-tracking of natural, but not vocoded, speech was enhanced by attention, whereas neural tracking of ignored speech was similar for natural and vocoded speech. These findings suggest that the more precise speech-tracking of attended natural speech is related to processing its fine structure, possibly reflecting the application of higher-order linguistic processes. In contrast, when speech is unattended its fine structure is not processed to the same degree and thus elicits less precise speech-tracking more similar to vocoded speech.

  20. Acoustic measurements through analysis of binaural recordings of speech and music

    NASA Astrophysics Data System (ADS)

    Griesinger, David

    2004-10-01

    This paper will present and demonstrate some recent work on the measurement of acoustic properties from binaural recordings of live performances. It is found that models of the process of stream formation can be used to measure intelligibility, and, when combined with band-limited running cross-correlation, can be used to measure spaciousness and envelopment. Analysis of the running cross correlation during sound onsets can be used to measure the accuracy of azimuth perception. It is additionally found that the ease of detecting fundamental pitch from the upper partials of speech and music can be used as a measure of sound quality, particularly for solo instruments and singers.
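
    As a rough sketch of the running cross-correlation idea mentioned above (the band-limiting applied in the paper is omitted here), the fragment below computes a windowed, normalized interaural cross-correlation peak over a binaural recording; the window length and lag range are illustrative assumptions.

      import numpy as np

      def running_iacc(left, right, fs, win_s=0.05, max_lag_ms=1.0):
          win = int(win_s * fs)
          max_lag = int(max_lag_ms * 1e-3 * fs)
          iacc = []
          for start in range(0, len(left) - win, win):
              l = left[start:start + win]
              r = right[start:start + win]
              norm = np.sqrt(np.sum(l**2) * np.sum(r**2)) + 1e-12
              cc = [np.sum(l[max(0, -lag):win - max(0, lag)] *
                           r[max(0, lag):win - max(0, -lag)]) / norm
                    for lag in range(-max_lag, max_lag + 1)]
              iacc.append(max(cc))  # peak of the normalized cross-correlation
          return np.array(iacc)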

  1. Temporal acoustic measures distinguish primary progressive apraxia of speech from primary progressive aphasia.

    PubMed

    Duffy, Joseph R; Hanley, Holly; Utianski, Rene; Clark, Heather; Strand, Edythe; Josephs, Keith A; Whitwell, Jennifer L

    2017-02-07

    The purpose of this study was to determine if acoustic measures of duration and syllable rate during word and sentence repetition, and a measure of within-word lexical stress, distinguish speakers with primary progressive apraxia of speech (PPAOS) from nonapraxic speakers with the agrammatic or logopenic variants of primary progressive aphasia (PPA), and control speakers. Results revealed that the PPAOS group had longer durations and reduced rate of syllable production for most words and sentences, and the measure of lexical stress. Sensitivity and specificity indices for the PPAOS versus the other groups were highest for longer multisyllabic words and sentences. For the PPAOS group, correlations between acoustic measures and perceptual ratings of AOS were moderately high to high. Several temporal measures used in this study may aid differential diagnosis and help quantify features of PPAOS that are distinct from those associated with PPA in which AOS is not present.

  2. The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene

    PubMed Central

    Rimmele, Johanna M.; Golumbic, Elana Zion; Schröger, Erich; Poeppel, David

    2015-01-01

    Attending to one speaker in multi-speaker situations is challenging. One neural mechanism proposed to underlie the ability to attend to a particular speaker is phase-locking of low-frequency activity in auditory cortex to speech’s temporal envelope (“speech-tracking”), which is more precise for attended speech. However, it is not known what brings about this attentional effect, and specifically if it reflects enhanced processing of the fine structure of attended speech. To investigate this question we compared attentional effects on speech-tracking of natural vs. vocoded speech which preserves the temporal envelope but removes the fine-structure of speech. Pairs of natural and vocoded speech stimuli were presented concurrently and participants attended to one stimulus and performed a detection task while ignoring the other stimulus. We recorded magnetoencephalography (MEG) and compared attentional effects on the speech-tracking response in auditory cortex. Speech-tracking of natural, but not vocoded, speech was enhanced by attention, whereas neural tracking of ignored speech was similar for natural and vocoded speech. These findings suggest that the more precise speech tracking of attended natural speech is related to processing its fine structure, possibly reflecting the application of higher-order linguistic processes. In contrast, when speech is unattended its fine structure is not processed to the same degree and thus elicits less precise speech tracking more similar to vocoded speech. PMID:25650107

  3. A physiologically-inspired model reproducing the speech intelligibility benefit in cochlear implant listeners with residual acoustic hearing.

    PubMed

    Zamaninezhad, Ladan; Hohmann, Volker; Büchner, Andreas; Schädler, Marc René; Jürgens, Tim

    2017-02-01

    This study introduces a speech intelligibility model for cochlear implant users with ipsilateral preserved acoustic hearing that aims at simulating the observed speech-in-noise intelligibility benefit when receiving simultaneous electric and acoustic stimulation (EA-benefit). The model simulates the auditory nerve spiking in response to electric and/or acoustic stimulation. The temporally and spatially integrated spiking patterns were used as the final internal representation of noisy speech. Speech reception thresholds (SRTs) in stationary noise were predicted for a sentence test using an automatic speech recognition framework. The model was employed to systematically investigate the effect of three physiologically relevant model factors on simulated SRTs: (1) the spatial spread of the electric field, which co-varies with the number of electrically stimulated auditory nerves, (2) the "internal" noise simulating the deprivation of the auditory system, and (3) the upper bound frequency limit of acoustic hearing. The model results show that the simulated SRTs increase monotonically with increasing spatial spread for fixed internal noise, and also increase with increasing internal noise strength for a fixed spatial spread. The predicted EA-benefit does not follow such a systematic trend and depends on the specific combination of the model parameters. Beyond 300 Hz, the upper bound limit for preserved acoustic hearing is less influential on the speech intelligibility of EA listeners in stationary noise. The proposed model-predicted EA-benefits are within the range of EA-benefits shown by 18 out of 21 actual cochlear implant listeners with preserved acoustic hearing.

  4. Acoustic Context Alters Vowel Categorization in Perception of Noise-Vocoded Speech.

    PubMed

    Stilp, Christian E

    2017-03-09

    Normal-hearing listeners' speech perception is widely influenced by spectral contrast effects (SCEs), where perception of a given sound is biased away from stable spectral properties of preceding sounds. Despite this influence, it is not clear how these contrast effects affect speech perception for cochlear implant (CI) users whose spectral resolution is notoriously poor. This knowledge is important for understanding how CIs might better encode key spectral properties of the listening environment. Here, SCEs were measured in normal-hearing listeners using noise-vocoded speech to simulate poor spectral resolution. Listeners heard a noise-vocoded sentence where low-F1 (100-400 Hz) or high-F1 (550-850 Hz) frequency regions were amplified to encourage "eh" (/ɛ/) or "ih" (/ɪ/) responses to the following target vowel, respectively. This was done by filtering with +20 dB (experiment 1a) or +5 dB gain (experiment 1b) or filtering using 100 % of the difference between spectral envelopes of /ɛ/ and /ɪ/ endpoint vowels (experiment 2a) or only 25 % of this difference (experiment 2b). SCEs influenced identification of noise-vocoded vowels in each experiment at every level of spectral resolution. In every case but one, SCE magnitudes exceeded those reported for full-spectrum speech, particularly when spectral peaks in the preceding sentence were large (+20 dB gain, 100 % of the spectral envelope difference). Even when spectral resolution was insufficient for accurate vowel recognition, SCEs were still evident. Results are suggestive of SCEs influencing CI users' speech perception as well, encouraging further investigation of CI users' sensitivity to acoustic context.
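
    For readers unfamiliar with noise vocoding, the sketch below is a generic and deliberately simplified noise vocoder of the kind used to simulate reduced spectral resolution: the speech in each analysis band is replaced by noise carrying that band's temporal envelope. The band edges, filter order, and envelope method are assumptions and do not reproduce the study's exact processing or its targeted F1-region gains.

      import numpy as np
      from scipy.signal import butter, filtfilt, hilbert

      def noise_vocode(speech, fs, band_edges=(100, 400, 850, 1500, 3400)):
          # fs must exceed twice the highest band edge for the filters to be valid.
          speech = np.asarray(speech, dtype=float)
          rng = np.random.default_rng(0)
          out = np.zeros_like(speech)
          for lo, hi in zip(band_edges[:-1], band_edges[1:]):
              b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
              band = filtfilt(b, a, speech)
              envelope = np.abs(hilbert(band))              # temporal envelope
              carrier = filtfilt(b, a, rng.standard_normal(len(speech)))
              out += envelope * carrier                     # envelope-modulated noise
          return out / np.max(np.abs(out))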

  5. Model-based speech enhancement using a bone-conducted signal.

    PubMed

    Kechichian, Patrick; Srinivasan, Sriram

    2012-03-01

    Codebook-based single-microphone noise suppressors, which exploit prior knowledge about speech and noise statistics, provide better performance in nonstationary noise. However, as the enhancement involves a joint optimization over speech and noise codebooks, this results in high computational complexity. A codebook-based method is proposed that uses a reference signal observed by a bone-conduction microphone, and a mapping between air- and bone-conduction codebook entries generated during an offline training phase. A smaller subset of air-conducted speech codebook entries that accurately models the clean speech signal is selected using this reference signal. Experiments support the expected improvement in performance at low computational complexity.
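
    A conceptual sketch of the pre-selection step described above is given below: the bone-conducted reference and an offline-trained mapping pick out a small subset of air-conducted codebook entries, shrinking the joint speech/noise search. All names, shapes, and the nearest-neighbour rule are assumptions rather than the paper's algorithm.

      import numpy as np

      def preselect_air_entries(bone_feature, bone_codebook, air_map, k=8):
          # bone_codebook: (N, D) array of bone-conduction codebook entries;
          # air_map[i] lists the air-conduction codebook indices associated with
          # bone entry i during offline training (hypothetical mapping).
          dists = np.linalg.norm(bone_codebook - bone_feature, axis=1)
          nearest = np.argsort(dists)[:k]
          subset = sorted({idx for i in nearest for idx in air_map[i]})
          return subset  # candidate clean-speech spectral shapes for this frame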

  6. Prosodic Influences on Speech Production in Children with Specific Language Impairment and Speech Deficits: Kinematic, Acoustic, and Transcription Evidence.

    ERIC Educational Resources Information Center

    Goffman, Lisa

    1999-01-01

    In this study, seven children with specific language impairment (SLI) and speech deficits were matched with same age peers and evaluated for iambic (weak-strong) and trochaic (strong-weak) prosodic speech forms. Findings indicated that children with SLI and speech deficits show less mature segmental and speech motor systems, as well as decreased…

  7. Acoustic evaluation of short-term effects of repetitive transcranial magnetic stimulation on motor aspects of speech in Parkinson's disease.

    PubMed

    Eliasova, I; Mekyska, J; Kostalova, M; Marecek, R; Smekal, Z; Rektorova, I

    2013-04-01

    Hypokinetic dysarthria in Parkinson's disease (PD) can be characterized by monotony of pitch and loudness, reduced stress, variable rate, imprecise consonants, and a breathy and harsh voice. Using acoustic analysis, we studied the effects of high-frequency repetitive transcranial magnetic stimulation (rTMS) applied over the primary orofacial sensorimotor area (SM1) and the left dorsolateral prefrontal cortex (DLPFC) on motor aspects of voiced speech in PD. Twelve non-depressed and non-demented men with PD (mean age 64.58 ± 8.04 years, mean PD duration 10.75 ± 7.48 years) and 21 healthy age-matched men (a control group, mean age 64 ± 8.55 years) participated in the speech study. The PD patients underwent two sessions of 10 Hz rTMS over the dominant hemisphere with 2,250 stimuli/day in a random order: (1) over the SM1; (2) over the left DLPFC in the "on" motor state. Speech examination comprised the perceptual rating of global speech performance and an acoustic analysis based upon a standardized speech task. The Mann-Whitney U test was used to compare acoustic speech variables between controls and PD patients. The Wilcoxon test was used to compare data prior to and after each stimulation in the PD group. rTMS applied over the left SM1 was associated with a significant increase in harmonic-to-noise ratio and net speech rate in the sentence tasks. With respect to the vowel task results, increased median values and range of Teager-Kaiser energy operator, increased vowel space area, and significant jitter decrease were observed after the left SM1 stimulation. rTMS over the left DLPFC did not induce any significant effects. The positive results of acoustic analysis were not reflected in a subjective rating of speech performance quality as assessed by a speech therapist. Our pilot results indicate that one session of rTMS applied over the SM1 may lead to measurable improvement in voice quality and intensity and an increase in speech rate and tongue movements

  8. PREDICTIVE MODELING OF ACOUSTIC SIGNALS FROM THERMOACOUSTIC POWER SENSORS (TAPS)

    SciTech Connect

    Dumm, Christopher M.; Vipperman, Jeffrey S.

    2016-06-30

    Thermoacoustic Power Sensor (TAPS) technology offers the potential for self-powered, wireless measurement of nuclear reactor core operating conditions. TAPS are based on thermoacoustic engines, which harness thermal energy from fission reactions to generate acoustic waves by virtue of gas motion through a porous stack of thermally nonconductive material. TAPS can be placed in the core, where they generate acoustic waves whose frequency and amplitude are proportional to the local temperature and radiation flux, respectively. TAPS acoustic signals are not measured directly at the TAPS; rather, they propagate wirelessly from an individual TAPS through the reactor, and ultimately to a low-power receiver network on the vessel’s exterior. In order to rely on TAPS as primary instrumentation, reactor-specific models that account for geometric/acoustic complexities in the signal propagation environment must be used to predict the amplitude and frequency of TAPS signals at receiver locations. The reactor state may then be derived by comparing receiver signals to the reference levels established by predictive modeling. In this paper, we develop and experimentally benchmark a methodology for predictive modeling of the signals generated by a TAPS system, with the intent of subsequently extending these efforts to modeling of TAPS in a liquid sodium environment.

  9. A Frame-Based Context-Dependent Acoustic Modeling for Speech Recognition

    NASA Astrophysics Data System (ADS)

    Terashima, Ryuta; Zen, Heiga; Nankaku, Yoshihiko; Tokuda, Keiichi

    We propose a novel acoustic model for speech recognition, named the FCD (Frame-based Context-Dependent) model. It can obtain a probability distribution by using a top-down clustering technique to simultaneously consider the local frame position within a phoneme, the phoneme duration, and the phoneme context. The model topology is derived from connecting left-to-right HMM models without self-loop transitions for each phoneme duration. Because the FCD model can change the probability distribution into a sequence corresponding to one phoneme duration, it has the ability to generate a smooth trajectory of speech feature vectors. We also performed an experiment to evaluate the performance of speech recognition for the model. In the experiment, 132 questions for frame position, 66 questions for phoneme duration and 134 questions for phoneme context were used to train the sub-phoneme FCD model. In order to compare the performance, left-to-right HMM and two types of HSMM models with almost the same number of states were also trained. As a result, an 18% relative improvement in tri-phone accuracy was achieved by the FCD model.

  10. The effect of intertalker speech rate variation on acoustic vowel space.

    PubMed

    Tsao, Ying-Chiao; Weismer, Gary; Iqbal, Kamran

    2006-02-01

    The present study aimed to examine the size of the acoustic vowel space in talkers who had previously been identified as having slow and fast habitual speaking rates [Tsao, Y.-C. and Weismer, G. (1997) J. Speech Lang. Hear. Res. 40, 858-866]. Within talkers, it is fairly well known that faster speaking rates result in a compression of the vowel space relative to that measured for slower rates, so the current study was completed to determine if the same differences in the size of the vowel space occur across talkers who differ significantly in their habitual speaking rates. Results indicated that there was no difference in the average size of the vowel space for slow vs fast talkers, and no relationship across talkers between vowel duration and formant frequencies. One difference between the slow and fast talkers was in intertalker variability of the vowel spaces, which was clearly greater for the slow talkers, for both speaker sexes. Results are discussed relative to theories of speech production and vowel normalization in speech perception.
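
    One common way to quantify the size of the acoustic vowel space discussed above is the area of the polygon spanned by corner vowels in the F1-F2 plane. The sketch below applies the shoelace formula; the formant values shown are illustrative and are not taken from the study.

      import numpy as np

      def vowel_space_area(f1, f2):
          # f1, f2: corner-vowel formant frequencies (Hz) listed in order around
          # the polygon, e.g. /i/, /ae/, /a/, /u/. Returns the enclosed area
          # (Hz^2) via the shoelace formula.
          f1, f2 = np.asarray(f1, float), np.asarray(f2, float)
          return 0.5 * abs(np.dot(f1, np.roll(f2, -1)) - np.dot(f2, np.roll(f1, -1)))

      # Illustrative corner vowels (F1, F2 in Hz): /i/, /ae/, /a/, /u/
      area = vowel_space_area([300, 700, 750, 350], [2300, 1800, 1100, 900])
      print(f"Vowel space area: {area:.0f} Hz^2")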

  11. Fluctuations of Broadband Acoustic Signals in Shallow Water

    DTIC Science & Technology

    2010-09-30

  12. The effect of different cochlear implant microphones on acoustic hearing individuals’ binaural benefits for speech perception in noise

    PubMed Central

    Aronoff, Justin M.; Freed, Daniel J.; Fisher, Laurel M.; Pal, Ivan; Soli, Sigfrid D.

    2011-01-01

    Objectives Cochlear implant microphones differ in placement, frequency response, and other characteristics such as whether they are directional. Although normal hearing individuals are often used as controls in studies examining cochlear implant users’ binaural benefits, the considerable differences across cochlear implant microphones make such comparisons potentially misleading. The goal of this study was to examine binaural benefits for speech perception in noise for normal hearing individuals using stimuli processed by head-related transfer functions (HRTFs) based on the different cochlear implant microphones. Design HRTFs were created for different cochlear implant microphones and used to test participants on the Hearing in Noise Test. Experiment 1 tested cochlear implant users and normal hearing individuals with HRTF-processed stimuli and with sound field testing to determine whether the HRTFs adequately simulated sound field testing. Experiment 2 determined the measurement error and performance-intensity function for the Hearing in Noise Test with normal hearing individuals listening to stimuli processed with the various HRTFs. Experiment 3 compared normal hearing listeners’ performance across HRTFs to determine how the HRTFs affected performance. Experiment 4 evaluated binaural benefits for normal hearing listeners using the various HRTFs, including ones that were modified to investigate the contributions of interaural time and level cues. Results The results indicated that the HRTFs adequately simulated sound field testing for the Hearing in Noise Test. They also demonstrated that the test-retest reliability and performance-intensity function were consistent across HRTFs, and that the measurement error for the test was 1.3 dB, with a change in signal-to-noise ratio of 1 dB reflecting a 10% change in intelligibility. There were significant differences in performance when using the various HRTFs, with particularly good thresholds for the HRTF based on the

  13. Acoustic and electric signals from lightning

    NASA Technical Reports Server (NTRS)

    Balachandran, N. K.

    1983-01-01

    Observations of infrasound, apparently generated by the collapse of the electrostatic field in the thundercloud, are presented along with electric field measurements and high-frequency thunder signals. The frequency of the infrasound pulse is about 1 Hz, and its amplitude is a few microbars. The observations seem to confirm some of the theoretical predictions of Wilson (1920) and Dessler (1973). The signal is dominated by a compressional phase and seems to be beamed vertically. Calculation of the parameters of the charged region using the infrasound signal gives reasonable values.

  14. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians

    PubMed Central

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

    2016-01-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% and 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and that could be used to screen for PH and encourage earlier specialist referral. PMID:27609672
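
    The classification pipeline described above (MFCC features, one Gaussian-mixture model per class) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the file names, sampling rate, and model sizes are assumptions.

      import numpy as np
      import librosa
      from sklearn.mixture import GaussianMixture

      def mfcc_frames(path, sr=4000, n_mfcc=13):
          # Heart sounds are low-frequency, so a low sampling rate is assumed.
          y, sr = librosa.load(path, sr=sr)
          return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, coeffs)

      def train_gmm(paths, n_components=8):
          X = np.vstack([mfcc_frames(p) for p in paths])
          return GaussianMixture(n_components=n_components, covariance_type="diag").fit(X)

      def classify(path, gmm_ph, gmm_normal):
          # Compare average frame log-likelihood under the two class models.
          X = mfcc_frames(path)
          return "PH" if gmm_ph.score(X) > gmm_normal.score(X) else "normal"

      # gmm_ph = train_gmm(ph_training_files); gmm_normal = train_gmm(normal_training_files)
      # label = classify("new_recording.wav", gmm_ph, gmm_normal)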

  15. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians

    NASA Astrophysics Data System (ADS)

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y.; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

    2016-09-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% and 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and that could be used to screen for PH and encourage earlier specialist referral.

  16. Link Budget Analysis for Undersea Acoustic Signaling

    DTIC Science & Technology

    2002-06-01

    ... wireless communications for estimating signal-to-noise ratio (SNR) at the receiver. Link-budget analysis considers transmitter power, transmitter ... is represented as an intermediate result called the channel SNR. The channel SNR includes ambient-noise and transmission-loss components. Several ...
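
    The channel-SNR idea in the excerpt above parallels the passive sonar equation, SNR = SL - TL - NL + DI (all terms in dB). The sketch below is a generic illustration under a simple spherical-spreading-plus-absorption loss model; the numeric values and the loss model itself are assumptions, not figures from the report.

      import math

      def transmission_loss(range_m, alpha_db_per_km=1.0):
          # Spherical spreading (20 log10 r) plus a linear absorption term.
          return 20 * math.log10(range_m) + alpha_db_per_km * (range_m / 1000.0)

      def channel_snr(source_level_db, range_m, noise_level_db, directivity_index_db=0.0):
          return (source_level_db - transmission_loss(range_m)
                  - noise_level_db + directivity_index_db)

      # Example: 180 dB source, 5 km range, 70 dB noise, 10 dB array gain -> ~41 dB SNR
      print(channel_snr(source_level_db=180, range_m=5000,
                        noise_level_db=70, directivity_index_db=10))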

  17. Transient Auditory Storage of Acoustic Details Is Associated with Release of Speech from Informational Masking in Reverberant Conditions

    ERIC Educational Resources Information Center

    Huang, Ying; Huang, Qiang; Chen, Xun; Wu, Xihong; Li, Liang

    2009-01-01

    Perceptual integration of the sound directly emanating from the source with reflections needs both temporal storage and correlation computation of acoustic details. We examined whether the temporal storage is frequency dependent and associated with speech unmasking. In Experiment 1, a break in correlation (BIC) between interaurally correlated…

  18. Effect of Digital Frequency Compression (DFC) on Speech Recognition in Candidates for Combined Electric and Acoustic Stimulation (EAS)

    ERIC Educational Resources Information Center

    Gifford, Rene H.; Dorman, Michael F.; Spahr, Anthony J.; McKarns, Sharon A.

    2007-01-01

    Purpose: To compare the effects of conventional amplification (CA) and digital frequency compression (DFC) amplification on the speech recognition abilities of candidates for a partial-insertion cochlear implant, that is, candidates for combined electric and acoustic stimulation (EAS). Method: The participants were 6 patients whose audiometric…

  19. Comments on "Effects of Noise on Speech Production: Acoustic and Perceptual Analyses" [J. Acoust. Soc. Am. 84, 917-928 (1988)].

    PubMed

    Fitch, H

    1989-11-01

    The effect of background noise on speech production is an important issue, both from the practical standpoint of developing speech recognition algorithms and from the theoretical standpoint of understanding how speech is tuned to the environment in which it is spoken. Summers et al. [J. Acoust. Soc. Am. 84, 917-928 (1988)] address this issue by experimentally manipulating the level of noise delivered through headphones to two talkers and making several kinds of acoustic measurements on the resulting speech. They indicate that they have replicated effects on amplitude, duration, and pitch and have found effects on spectral tilt and first-formant frequency (F1). The authors regard these acoustic changes as effects in themselves rather than as consequences of a change in vocal effort, and thus treat equally the change in spectral tilt and the change in F1. In fact, the change in spectral tilt is a well-documented and understood consequence of the change in the glottal waveform, which is known to occur with increased effort. The situation with F1 is less clear and is made difficult by measurement problems. The bias in linear predictive coding (LPC) techniques related to two of the other changes-fundamental frequency and spectral tilt-is discussed.

  20. Using acoustic emission signals for monitoring of production processes.

    PubMed

    Tönshoff, H K; Jung, M; Männel, S; Rietz, W

    2000-07-01

    The systems for in-process quality assurance offer the possibility of estimating the workpiece quality during machining. Especially for finishing processes like grinding or turning of hardened steels, it is important to control the process continuously in order to avoid rejects and refinishing. This paper describes the use of on-line monitoring systems with process-integrated measurement of acoustic emission to evaluate hard turning and grinding processes. The correlation between acoustic emission signals and subsurface integrity is determined to analyse the progression of the processes and the workpiece quality.

  1. An ALE Meta-Analysis on the Audiovisual Integration of Speech Signals

    PubMed Central

    Erickson, Laura C.; Heeg, Elizabeth; Rauschecker, Josef P.; Turkeltaub, Peter E.

    2014-01-01

    The brain improves speech processing through the integration of audiovisual (AV) signals. Situations involving AV speech integration may be crudely dichotomized into those where auditory and visual inputs contain 1) equivalent, complementary signals (validating AV speech), or 2) inconsistent, different signals (conflicting AV speech). This simple framework may allow for the systematic examination of broad commonalities and differences between AV neural processes engaged by various experimental paradigms frequently used to study AV speech integration. We conducted an activation likelihood estimation (ALE) meta-analysis of 22 functional imaging studies comprising 33 experiments, 311 subjects, and 347 foci examining “conflicting” versus “validating” AV speech. Experimental paradigms included content congruency, timing synchrony, and perceptual measures, such as the McGurk effect or synchrony judgments, across AV speech stimulus types (sub-lexical to sentence). Co-localization of conflicting AV speech experiments revealed consistency across at least two contrast types (e.g., synchrony and congruency) in a network of dorsal-stream regions in the frontal, parietal, and temporal lobes. There was consistency across all contrast types (synchrony, congruency, and percept) in the bilateral posterior superior/middle temporal cortex. Although fewer studies were available, validating AV speech experiments were localized to other regions, such as ventral-stream visual areas in the occipital and inferior temporal cortex. These results suggest that while equivalent, complementary AV speech signals may evoke activity in regions related to the corroboration of sensory input, conflicting AV speech signals recruit widespread dorsal-stream areas likely involved in the resolution of conflicting sensory signals. PMID:24996043

  2. An acoustic key to eight languages/dialects: Factor analyses of critical-band-filtered speech

    PubMed Central

    Ueda, Kazuo; Nakajima, Yoshitaka

    2017-01-01

    The peripheral auditory system functions like a frequency analyser, often modelled as a bank of non-overlapping band-pass filters called critical bands; 20 bands are necessary for simulating frequency resolution of the ear within an ordinary frequency range of speech (up to 7,000 Hz). A far smaller number of filters seemed sufficient, however, to re-synthesise intelligible speech sentences with power fluctuations of the speech signals passing through them; nevertheless, the number and frequency ranges of the frequency bands for efficient speech communication are yet unknown. We derived four common frequency bands—covering approximately 50–540, 540–1,700, 1,700–3,300, and above 3,300 Hz—from factor analyses of spectral fluctuations in eight different spoken languages/dialects. The analyses robustly led to three factors common to all languages investigated—the low & mid-high factor related to the two separate frequency ranges of 50–540 and 1,700–3,300 Hz, the mid-low factor the range of 540–1,700 Hz, and the high factor the range above 3,300 Hz—in these different languages/dialects, suggesting a language universal. PMID:28198405

  3. An acoustic key to eight languages/dialects: Factor analyses of critical-band-filtered speech.

    PubMed

    Ueda, Kazuo; Nakajima, Yoshitaka

    2017-02-15

    The peripheral auditory system functions like a frequency analyser, often modelled as a bank of non-overlapping band-pass filters called critical bands; 20 bands are necessary for simulating frequency resolution of the ear within an ordinary frequency range of speech (up to 7,000 Hz). A far smaller number of filters seemed sufficient, however, to re-synthesise intelligible speech sentences with power fluctuations of the speech signals passing through them; nevertheless, the number and frequency ranges of the frequency bands for efficient speech communication are yet unknown. We derived four common frequency bands-covering approximately 50-540, 540-1,700, 1,700-3,300, and above 3,300 Hz-from factor analyses of spectral fluctuations in eight different spoken languages/dialects. The analyses robustly led to three factors common to all languages investigated-the low & mid-high factor related to the two separate frequency ranges of 50-540 and 1,700-3,300 Hz, the mid-low factor the range of 540-1,700 Hz, and the high factor the range above 3,300 Hz-in these different languages/dialects, suggesting a language universal.
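
    The analysis pipeline summarised above (critical-band filtering, band-wise power fluctuations in dB, factor analysis across bands) can be sketched as follows. The band edges, frame length, and the noise stand-in for a speech recording are assumptions for illustration; scipy and scikit-learn are assumed to be available.

      import numpy as np
      from scipy.signal import butter, sosfiltfilt
      from sklearn.decomposition import FactorAnalysis

      sr = 16000
      rng = np.random.default_rng(1)
      speech = rng.standard_normal(sr * 10)          # stand-in for a speech recording

      edges = np.geomspace(50, 7000, 21)             # 20 bands up to 7 kHz (assumed spacing)
      frame = int(0.02 * sr)                         # 20-ms analysis frames

      def band_power_track(x, lo, hi):
          # Band-pass the signal and return its frame-by-frame power fluctuation in dB
          sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
          y = sosfiltfilt(sos, x)
          n = len(y) // frame
          p = (y[: n * frame].reshape(n, frame) ** 2).mean(axis=1)
          return 10 * np.log10(p + 1e-12)

      X = np.column_stack([band_power_track(speech, lo, hi)
                           for lo, hi in zip(edges[:-1], edges[1:])])
      fa = FactorAnalysis(n_components=3).fit(X)     # three common factors, as in the study
      print(np.round(fa.components_, 2))             # factor loadings across the 20 bands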

  4. Modulation of Radio Frequency Signals by Nonlinearly Generated Acoustic Fields

    NASA Astrophysics Data System (ADS)

    Johnson, Spencer Joseph

    Acousto-electromagnetic scattering is a process in which an acoustic excitation is utilized to induce modulation on an electromagnetic (EM) wave. This phenomenon can be exploited in remote sensing and detection schemes whereby target objects are mechanically excited by high powered acoustic waves resulting in unique object characterizations when interrogated with EM signals. Implementation of acousto-EM sensing schemes, however, is limited by a lack of fundamental understanding of the nonlinear interaction between acoustic and EM waves and by inefficient simulation methods for determining the radiation patterns of higher order scattered acoustic fields. To address the insufficient simulation issue, a computationally efficient mathematical model describing higher order scattered sound fields, particularly of third order, for which a 40x increase in computation speed is achieved, is derived using a multi-Gaussian beam (MGB) expansion that expresses the sound field of any arbitrary axially symmetric beam as a series of Gaussian base functions. The third-order intermodulation (IM3) frequency components are produced by considering the cascaded nonlinear second-order effects when analyzing the interaction between the first- and second-order frequency components during the nonlinear scattering of sound by sound from two noncollinear ultrasonic baffled piston sources. The theory is extended to the modeling of the sound beams generated by parametric transducer arrays, showing that the MGB model can be efficiently used to calculate both the second- and third-order sound fields of the array. Additionally, a near-to-far-field (NTFF) transformation method is developed to model the far-field characteristics of scattered sound fields, extending Kirchhoff's theorem, typically applied to EM waves, determining the far-field patterns of an acoustic source from amplitude and phase measurements made in the near-field by including the higher order sound fields generated by the

  5. The effects of acoustic attenuation in optoacoustic signals.

    PubMed

    Deán-Ben, X Luís; Razansky, Daniel; Ntziachristos, Vasilis

    2011-09-21

    In this paper, it is demonstrated that the effects of acoustic attenuation may play a significant role in establishing the quality of tomographic optoacoustic reconstructions. Accordingly, spatially dependent reduction of signal amplitude leads to quantification errors in the reconstructed distribution of the optical absorption coefficient, while signal broadening causes loss of image resolution. Here we propose a correction algorithm that accounts for attenuation effects and is applicable in both the time and frequency domains. It is further investigated which part of the optoacoustic signal spectrum is practically affected by those effects in realistic imaging scenarios. The validity and benefits of the suggested modelling and correction approaches are experimentally validated in phantom measurements.

  6. The sound of motion in spoken language: visual information conveyed by acoustic properties of speech.

    PubMed

    Shintel, Hadas; Nusbaum, Howard C

    2007-12-01

    Language is generally viewed as conveying information through symbols whose form is arbitrarily related to their meaning. This arbitrary relation is often assumed to also characterize the mental representations underlying language comprehension. We explore the idea that visuo-spatial information can be analogically conveyed through acoustic properties of speech and that such information is integrated into an analog perceptual representation as a natural part of comprehension. Listeners heard sentences describing objects, spoken at varying speaking rates. After each sentence, participants saw a picture of an object and judged whether it had been mentioned in the sentence. Participants were faster to recognize the object when motion implied by speaking rate matched the motion implied by the picture. Results suggest that visuo-spatial referential information can be analogically conveyed and represented.

  7. Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels

    PubMed Central

    Caballero-Morales, Santiago-Omar

    2013-01-01

    An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). Then, estimation of the emotional state from a spoken sentence is performed by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, accuracy of 87–100% was achieved for the recognition of emotional state of Mexican Spanish speech. PMID:23935410

  8. Recognition of emotions in Mexican Spanish speech: an approach based on acoustic modelling of emotion-specific vowels.

    PubMed

    Caballero-Morales, Santiago-Omar

    2013-01-01

    An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). Then, estimation of the emotional state from a spoken sentence is performed by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, accuracy of 87-100% was achieved for the recognition of emotional state of Mexican Spanish speech.
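
    The final decision step described above, counting emotion-specific vowels in the recognizer's output, reduces to a simple tally. The phoneme label format used below ("vowel_emotion") is a hypothetical stand-in for whatever labels the emotion-specific HMMs actually carry.

      from collections import Counter

      EMOTIONS = ("anger", "happiness", "neutral", "sadness")

      def estimate_emotion(asr_phones):
          # Count emotion-specific vowel labels and return the most frequent emotion
          counts = Counter()
          for phone in asr_phones:
              if "_" in phone:                       # e.g. "a_anger" = vowel /a/, anger model
                  _, emotion = phone.split("_", 1)
                  if emotion in EMOTIONS:
                      counts[emotion] += 1
          return counts.most_common(1)[0][0] if counts else "neutral"

      # Hypothetical decoded sentence: consonants are plain, vowels are emotion-specific
      print(estimate_emotion(["k", "a_anger", "s", "a_anger", "e_neutral", "t", "o_anger"]))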

  9. [A modified speech enhancement algorithm for electronic cochlear implant and its digital signal processing realization].

    PubMed

    Wang, Yulin; Tian, Xuelong

    2014-08-01

    In order to improve the speech quality and auditory perception of an electronic cochlear implant under a strong noise background, a speech enhancement system for the electronic cochlear implant front-end was constructed. Taking digital signal processing (DSP) as the core, the system combines its multi-channel buffered serial port (McBSP) data transmission channel with the extended audio interface chip TLV320AIC10, so that high-speed speech signal acquisition and output are realized. Meanwhile, because the traditional speech enhancement method suffers from poor adaptability, slow convergence and large steady-state error, a versiera function and a de-correlation principle were used to improve the existing adaptive filtering algorithm, which effectively enhanced the quality of voice communications. Test results verified the stability of the system and the de-noising performance of the algorithm, and also showed that it can provide clearer speech signals for deaf or tinnitus patients.

  10. What Information Is Necessary for Speech Categorization? Harnessing Variability in the Speech Signal by Integrating Cues Computed Relative to Expectations

    ERIC Educational Resources Information Center

    McMurray, Bob; Jongman, Allard

    2011-01-01

    Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model, the type of information subserving this mapping. This is crucial in speech perception where the signal is variable and context dependent. This study assessed the…

  11. Digital signal processing

    NASA Astrophysics Data System (ADS)

    Oppenheim, A. V.; Baggeroer, A. B.; Lim, J. S.; Musicus, B. R.; Mook, D. R.; Duckworth, G. L.; Bordley, T. E.; Curtis, S. R.; Deadrick, D. S.; Dove, W. P.

    1984-01-01

    Signal and image processing research projects are described. Topics include: (1) modeling underwater acoustic propagation; (2) image restoration; (3) signal reconstruction; (4) speech enhancement; (5) pitch detection; (6) spectral analysis; (7) speech synthesis; (8) speech enhancement; (9) autoregressive spectral estimation; (10) knowledge based array processing; (11) speech analysis; (12) estimating the degree of coronary stenosis with image processing; (13) automatic target detection; and (14) video conferencing.

  12. Speech Music Discrimination Using Class-Specific Features

    DTIC Science & Technology

    2004-08-01

    Speech Music Discrimination Using Class-Specific Features Thomas Beierholm...between speech and music. Feature extraction is class-specific and can therefore be tailored to each class meaning that segment size, model orders...interest. Some of the applications of audio signal classification are speech/music classification [1], acoustical environmental classification [2][3

  13. Coupled Research in Ocean Acoustics and Signal Processing for the Next Generation of Underwater Acoustic Communication Systems

    DTIC Science & Technology

    2015-08-09

    Coupled Research in Ocean Acoustics and Signal Processing for the Next Generation of Underwater Acoustic Communication Systems...Processing for the Next Generation of Underwater Acoustic Communication Systems Principal Investigator’s Name: Dr. James Preisig Period Covered By...correlation structure of received communications signals after they have been converted to the frequency domain via Fourier Transforms as described in

  14. Coupled Research in Ocean Acoustics and Signal Processing for the Next Generation of Underwater Acoustic Communication Systems

    DTIC Science & Technology

    2016-08-05

    Progress Report #9 Coupled Research in Ocean Acoustics and Signal Processing for the Next Generation...of Underwater Acoustic Communication Systems Principal Investigator’s Name: Dr. James Preisig Period Covered By Report: 4/20/2016 to 7/19/2016 Report...lower dimensional structures in acoustic communications data, specifically frequency domain transformations of received communications signals, to

  15. Cross Spectral Analysis of Acoustic Signals

    DTIC Science & Technology

    1978-03-01

    this -- for ground flashes they measured peaks (after correction for wind noise) in the 40 to 80 Hz range. Some attenuation occurs due to...between r₂ - r₁ - v₀τ and k: cos α = … (Eq. 7); v₀ is the wind velocity and usually is neglected. If this method is applied to signals received at...wind velocity, v₀, is ignored, and P(r₂, t) = P(r₁, t + τ), once again one can estimate the time lag τ and use the time lag to find the source, using

  16. Cavitating vortex characterization based on acoustic signal detection

    NASA Astrophysics Data System (ADS)

    Digulescu, A.; Murgan, I.; Candel, I.; Bunea, F.; Ciocan, G.; Bucur, D. M.; Dunca, G.; Ioana, C.; Vasile, G.; Serbanescu, A.

    2016-11-01

    In hydraulic turbines operating at part loads, a cavitating vortex structure appears at the runner outlet. This helical vortex, called the vortex rope, can cavitate in its core if the local pressure is lower than the vaporization pressure. A current concern is the detection of the onset of cavitation and the characterization of its level. This paper presents a potentially innovative method for detecting the presence of the cavitating vortex based on acoustic methods. The method is tested on a reduced-scale facility using two acoustic transceivers positioned in a "V" configuration. The received signals were continuously recorded, and their frequency content was chosen to match the flow and the cavitating vortex. Experimental results showed that, as the flow rate increases, the signal-vortex interaction is observed as modifications of the received signal's high-order statistics and bandwidth. Also, the signal processing results were correlated with data measured by a pressure sensor mounted in the cavitating vortex section. Finally, it is shown that this non-intrusive acoustic approach can indicate the onset, development and damping of the cavitating vortex. Applying this method to full-scale facilities is a work in progress.

  17. A Critical Examination of the Statistic Used for Processing Speech Signals.

    ERIC Educational Resources Information Center

    Knox, Keith

    This paper assesses certain properties of human mental processes by focusing on the tactics utilized in perceiving speech signals. Topics discussed in the paper include the power spectrum approach to fluctuations and noise, with particular reference to biological structures; "l/f-like" fluctuations in speech and music and the functioning of a…

  18. Statistical evidence that musical universals derive from the acoustic characteristics of human speech

    NASA Astrophysics Data System (ADS)

    Schwartz, David; Howe, Catharine; Purves, Dale

    2003-04-01

    Listeners of all ages and societies produce a similar consonance ordering of chromatic scale tone combinations. Despite intense interest in this perceptual phenomenon over several millennia, it has no generally accepted explanation in physical, psychological, or physiological terms. Here we show that the musical universal of consonance ordering can be understood in terms of the statistical relationship between a pattern of sound pressure at the ear and the possible generative sources of the acoustic energy pattern. Since human speech is the principal naturally occurring source of tone-evoking (i.e., periodic) sound energy for human listeners, we obtained normalized spectra from more than 100000 recorded speech segments. The probability distribution of amplitude/frequency combinations derived from these spectra predicts both the fundamental frequency ratios that define the chromatic scale intervals and the consonance ordering of chromatic scale tone combinations. We suggest that these observations reveal the statistical character of the perceptual process by which the auditory system guides biologically successful behavior in response to inherently ambiguous sound stimuli.

  19. Objective assessment of tracheoesophageal and esophageal speech using acoustic analysis of voice.

    PubMed

    Sirić, Ljiljana; Sos, Dario; Rosso, Marinela; Stevanović, Sinisa

    2012-11-01

    The aim of this study was to analyze the voice quality of alaryngeal tracheoesophageal and esophageal speech, and to determine which of them is more similar to laryngeal voice production, and thus more acceptable as a rehabilitation method of laryngectomized persons. Objective voice evaluation was performed on a sample of 20 totally laryngectomized subjects of both sexes, average age 61.3 years. Subjects were divided into two groups: 10 (50%) respondents with built tracheoesophageal prosthesis and 10 (50%) who acquired esophageal speech. Testing included 6 variables: 5 parameters of acoustic analysis of voice and one parameter of aerodynamic measurements. The obtained data were statistically analyzed by analysis of variance. Analysis of the data showed a statistically significant difference between the two groups in terms of intensity, fundamental frequency and maximum phonation time of vowel at a significance level of 5% and confidence interval of 95%. A statistically significant difference was not found between the values of jitter, shimmer, and harmonic-to-noise ratio between tracheoesophageal and esophageal voice. There is no ideal method of rehabilitation and every one of them requires an individual approach to the patient, but the results show the advantages of rehabilitation by means of installing a voice prosthesis.
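
    Two of the acoustic measures compared above, jitter and shimmer, have simple local definitions once glottal period durations and per-period peak amplitudes have been extracted. The sketch below uses made-up values, not data from the study.

      import numpy as np

      periods_s = np.array([0.0098, 0.0101, 0.0100, 0.0103, 0.0099])   # assumed period durations
      amps = np.array([0.81, 0.79, 0.83, 0.80, 0.82])                  # assumed peak amplitudes

      # Local jitter: mean absolute difference between consecutive periods / mean period
      jitter_local = np.mean(np.abs(np.diff(periods_s))) / np.mean(periods_s)
      # Local shimmer: mean absolute difference between consecutive amplitudes / mean amplitude
      shimmer_local = np.mean(np.abs(np.diff(amps))) / np.mean(amps)

      print(f"jitter  = {100 * jitter_local:.2f} %")
      print(f"shimmer = {100 * shimmer_local:.2f} %")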

  20. Compensation for Coarticulation: Disentangling Auditory and Gestural Theories of Perception of Coarticulatory Effects in Speech

    ERIC Educational Resources Information Center

    Viswanathan, Navin; Magnuson, James S.; Fowler, Carol A.

    2010-01-01

    According to one approach to speech perception, listeners perceive speech by applying general pattern matching mechanisms to the acoustic signal (e.g., Diehl, Lotto, & Holt, 2004). An alternative is that listeners perceive the phonetic gestures that structured the acoustic signal (e.g., Fowler, 1986). The two accounts have offered different…

  1. A Comparison of Signal Enhancement Methods for Extracting Tonal Acoustic Signals

    NASA Technical Reports Server (NTRS)

    Jones, Michael G.

    1998-01-01

    The measurement of pure tone acoustic pressure signals in the presence of masking noise, often generated by mean flow, is a continual problem in the field of passive liner duct acoustics research. In support of the Advanced Subsonic Technology Noise Reduction Program, methods were investigated for conducting measurements of advanced duct liner concepts in harsh, aeroacoustic environments. This report presents the results of a comparison study of three signal extraction methods for acquiring quality acoustic pressure measurements in the presence of broadband noise (used to simulate the effects of mean flow). The performance of each method was compared to a baseline measurement of a pure tone acoustic pressure 3 dB above a uniform, broadband noise background.

  2. Elicitation of the Acoustic Change Complex to Long-Duration Speech Stimuli in Four-Month-Old Infants

    PubMed Central

    Chen, Ke Heng; Small, Susan A.

    2015-01-01

    The acoustic change complex (ACC) is an auditory-evoked potential elicited to changes within an ongoing stimulus that indicates discrimination at the level of the auditory cortex. Only a few studies to date have attempted to record ACCs in young infants. The purpose of the present study was to investigate the elicitation of ACCs to long-duration speech stimuli in English-learning 4-month-old infants. ACCs were elicited to consonant contrasts made up of two concatenated speech tokens. The stimuli included native dental-dental /dada/ and dental-labial /daba/ contrasts and a nonnative Hindi dental-retroflex /daDa/ contrast. Each consonant-vowel speech token was 410 ms in duration. Slow cortical responses were recorded to the onset of the stimulus and to the acoustic change from /da/ to either /ba/ or /Da/ within the stimulus with significantly prolonged latencies compared with adults. ACCs were reliably elicited for all stimulus conditions with more robust morphology compared with our previous findings using stimuli that were shorter in duration. The P1 amplitudes elicited to the acoustic change in /daba/ and /daDa/ were significantly larger compared to /dada/ supporting that the brain discriminated between the speech tokens. These findings provide further evidence for the use of ACCs as an index of discrimination ability. PMID:26798343

  3. Elicitation of the Acoustic Change Complex to Long-Duration Speech Stimuli in Four-Month-Old Infants.

    PubMed

    Chen, Ke Heng; Small, Susan A

    2015-01-01

    The acoustic change complex (ACC) is an auditory-evoked potential elicited to changes within an ongoing stimulus that indicates discrimination at the level of the auditory cortex. Only a few studies to date have attempted to record ACCs in young infants. The purpose of the present study was to investigate the elicitation of ACCs to long-duration speech stimuli in English-learning 4-month-old infants. ACCs were elicited to consonant contrasts made up of two concatenated speech tokens. The stimuli included native dental-dental /dada/ and dental-labial /daba/ contrasts and a nonnative Hindi dental-retroflex /daDa/ contrast. Each consonant-vowel speech token was 410 ms in duration. Slow cortical responses were recorded to the onset of the stimulus and to the acoustic change from /da/ to either /ba/ or /Da/ within the stimulus with significantly prolonged latencies compared with adults. ACCs were reliably elicited for all stimulus conditions with more robust morphology compared with our previous findings using stimuli that were shorter in duration. The P1 amplitudes elicited to the acoustic change in /daba/ and /daDa/ were significantly larger compared to /dada/ supporting that the brain discriminated between the speech tokens. These findings provide further evidence for the use of ACCs as an index of discrimination ability.

  4. Channel noise enhances signal detectability in a model of acoustic neuron through the stochastic resonance paradigm.

    PubMed

    Liberti, M; Paffi, A; Maggio, F; De Angelis, A; Apollonio, F; d'Inzeo, G

    2009-01-01

    A number of experimental investigations have evidenced the extraordinary sensitivity of neuronal cells to weak input stimulations, including electromagnetic (EM) fields. Moreover, it has been shown that biological noise, due to random channel gating, acts as a tuning factor in neuronal processing, according to the stochastic resonance (SR) paradigm. In this work the attention is focused on noise arising from the stochastic gating of ionic channels in a model of the Ranvier node of acoustic fibers. The small number of channels gives rise to a high noise level, which is able to cause spike train generation even in the absence of stimulations. An SR behavior has been observed in the model for the detection of sinusoidal signals at frequencies typical of speech.

  5. Acoustics in human communication: evolving ideas about the nature of speech.

    PubMed

    Cooper, F S

    1980-07-01

    This paper discusses changes in attitude toward the nature of speech during the past half century. After reviewing early views on the subject, it considers the role of speech spectrograms, speech articulation, speech perception, messages and computers, and the nature of fluent speech.

  6. Signal processing methodologies for an acoustic fetal heart rate monitor

    NASA Technical Reports Server (NTRS)

    Pretlow, Robert A., III; Stoughton, John W.

    1992-01-01

    Research and development is presented of real time signal processing methodologies for the detection of fetal heart tones within a noise-contaminated signal from a passive acoustic sensor. A linear predictor algorithm is utilized for detection of the heart tone event and additional processing derives heart rate. The linear predictor is adaptively 'trained' in a least mean square error sense on generic fetal heart tones recorded from patients. A real time monitor system is described which outputs to a strip chart recorder for plotting the time history of the fetal heart rate. The system is validated in the context of the fetal nonstress test. Comparisons are made with ultrasonic nonstress tests on a series of patients. Comparative data provides favorable indications of the feasibility of the acoustic monitor for clinical use.

  7. Speaker height estimation from speech: Fusing spectral regression and statistical acoustic models.

    PubMed

    Hansen, John H L; Williams, Keri; Bořil, Hynek

    2015-08-01

    Estimating speaker height can assist in voice forensic analysis and provide additional side knowledge to benefit automatic speaker identification or acoustic model selection for automatic speech recognition. In this study, a statistical approach to height estimation that incorporates acoustic models within a non-uniform height bin width Gaussian mixture model structure as well as a formant analysis approach that employs linear regression on selected phones are presented. The accuracy and trade-offs of these systems are explored by examining the consistency of the results, location, and causes of error, as well as a combined fusion of the two systems, using data from the TIMIT corpus. Open set testing is also presented using the Multi-session Audio Research Project corpus and publicly available YouTube audio to examine the effect of channel mismatch between training and testing data and provide a realistic open domain testing scenario. The proposed algorithms achieve highly competitive performance compared to previously published literature. Although the different data partitioning in the literature and this study may prevent performance comparisons in absolute terms, the mean average error of 4.89 cm for males and 4.55 cm for females provided by the proposed algorithm on TIMIT utterances containing selected phones suggests a considerable estimation error decrease compared to past efforts.
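
    The formant-regression component mentioned above can be illustrated with an ordinary linear regression from formant frequencies to height. The training data below are synthetic placeholders (not TIMIT or MARP measurements) and scikit-learn is assumed to be available.

      import numpy as np
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(2)
      n = 200
      heights_cm = rng.uniform(150, 195, n)
      # Crude synthetic formants: taller speakers -> longer vocal tract -> lower formants
      f1 = 600 - 1.5 * (heights_cm - 170) + rng.normal(0, 20, n)
      f2 = 1700 - 4.0 * (heights_cm - 170) + rng.normal(0, 60, n)
      X = np.column_stack([f1, f2])

      model = LinearRegression().fit(X, heights_cm)
      pred = model.predict(X)
      print(f"mean absolute error: {np.mean(np.abs(pred - heights_cm)):.2f} cm")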

  8. Speech timing and linguistic rhythm: on the acoustic bases of rhythm typologies.

    PubMed

    Rathcke, Tamara V; Smith, Rachel H

    2015-05-01

    Research into linguistic rhythm has been dominated by the idea that languages can be classified according to rhythmic templates, amenable to assessment by acoustic measures of vowel and consonant durations. This study tested predictions of two proposals explaining the bases of rhythmic typologies: the Rhythm Class Hypothesis which assumes that the templates arise from an extensive vs a limited use of durational contrasts, and the Control and Compensation Hypothesis which proposes that the templates are rooted in more vs less flexible speech production strategies. Temporal properties of segments, syllables and rhythmic feet were examined in two accents of British English, a "stress-timed" variety from Leeds, and a "syllable-timed" variety spoken by Panjabi-English bilinguals from Bradford. Rhythm metrics were calculated. A perception study confirmed that the speakers of the two varieties differed in their perceived rhythm. The results revealed that both typologies were informative in that to a certain degree, they predicted temporal patterns of the two varieties. None of the metrics tested was capable of adequately reflecting the temporal complexity found in the durational data. These findings contribute to the critical evaluation of the explanatory adequacy of rhythm metrics. Acoustic bases and limitations of the traditional rhythmic typologies are discussed.

  9. Can acoustic vowel space predict the habitual speech rate of the speaker?

    PubMed

    Tsao, Y-C; Iqbal, K

    2005-01-01

    This study aims to find whether the acoustic vowel space reflects the habitual speaking rate of the speaker. The vowel space is defined as the area of the quadrilateral formed by the four corner vowels (i.e., /i/, /æ/, /u/, /ɑ/) in the F1-F2 plane. The study compares the acoustic vowel space in the speech of habitually slow and fast talkers and further analyzes them by gender. In addition to the measurement of vowel duration and midpoint frequencies of F1 and F2, the F1/F2 vowel space areas were measured and compared across speakers. The results indicate substantial overlap in vowel space area functions between slow and fast talkers, though the slow speakers were found to have larger vowel spaces. Furthermore, large interspeaker variability in vowel space area functions was noted in each group. Both F1 and F2 formant frequencies were found to be gender sensitive, consistent with existing data. No predictive relation between vowel duration and formant frequencies was observed among speakers.
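
    The vowel space area defined above is the area of the quadrilateral spanned by the four corner vowels in the F1-F2 plane, which the shoelace formula gives directly. The formant values below are illustrative, not measurements from the study.

      def polygon_area(points):
          # Shoelace formula for the area of a simple polygon given ordered vertices
          area = 0.0
          for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
              area += x1 * y2 - x2 * y1
          return abs(area) / 2.0

      # (F1, F2) in Hz for /i/, /ae/, /a/, /u/, ordered around the quadrilateral
      corners = [(300, 2300), (700, 1800), (750, 1100), (350, 900)]
      print(f"vowel space area: {polygon_area(corners):.0f} Hz^2")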

  10. Physical and perceptual factors shape the neural mechanisms that integrate audiovisual signals in speech comprehension.

    PubMed

    Lee, HweeLing; Noppeney, Uta

    2011-08-03

    Face-to-face communication challenges the human brain to integrate information from auditory and visual senses with linguistic representations. Yet the role of bottom-up physical (spectrotemporal structure) input and top-down linguistic constraints in shaping the neural mechanisms specialized for integrating audiovisual speech signals are currently unknown. Participants were presented with speech and sinewave speech analogs in visual, auditory, and audiovisual modalities. Before the fMRI study, they were trained to perceive physically identical sinewave speech analogs as speech (SWS-S) or nonspeech (SWS-N). Comparing audiovisual integration (interactions) of speech, SWS-S, and SWS-N revealed a posterior-anterior processing gradient within the left superior temporal sulcus/gyrus (STS/STG): Bilateral posterior STS/STG integrated audiovisual inputs regardless of spectrotemporal structure or speech percept; in left mid-STS, the integration profile was primarily determined by the spectrotemporal structure of the signals; more anterior STS regions discarded spectrotemporal structure and integrated audiovisual signals constrained by stimulus intelligibility and the availability of linguistic representations. In addition to this "ventral" processing stream, a "dorsal" circuitry encompassing posterior STS/STG and left inferior frontal gyrus differentially integrated audiovisual speech and SWS signals. Indeed, dynamic causal modeling and Bayesian model comparison provided strong evidence for a parallel processing structure encompassing a ventral and a dorsal stream with speech intelligibility training enhancing the connectivity between posterior and anterior STS/STG. In conclusion, audiovisual speech comprehension emerges in an interactive process with the integration of auditory and visual signals being progressively constrained by stimulus intelligibility along the STS and spectrotemporal structure in a dorsal fronto-temporal circuitry.

  11. Angle of Arrival Estimation for Saturated Acoustic Signals

    DTIC Science & Technology

    2013-03-01

    to close proximity to a large transient event, which can render target localization difficult with many standard algorithms. Our goal is to develop an...defined threshold on multiple channels. However, close proximity to an acoustic source can result in signal saturation, where data reach a...

  12. Very low-frequency signals support perceptual organization of implant-simulated speech for adults and children

    PubMed Central

    Nittrouer, Susan; Tarr, Eric; Bolster, Virginia; Caldwell-Tarr, Amanda; Moberly, Aaron C.; Lowenstein, Joanna H.

    2014-01-01

    Objective: Using signals processed to simulate speech received through cochlear implants and low-frequency extended hearing aids, this study examined the proposal that low-frequency signals facilitate the perceptual organization of broader, spectrally degraded signals. Design: In two experiments, words and sentences were presented in diotic and dichotic configurations as four-channel noise-vocoded signals (VOC-only), and as those signals combined with the acoustic signal below 250 Hz (LOW-plus). Dependent measures were percent correct recognition scores, and the difference between scores for the two processing conditions given as proportions of recognition scores for VOC-only. The influence of linguistic context was also examined. Study Sample: Participants had normal hearing. In all, 40 adults, 40 7-year-olds, and 20 5-year-olds participated. Results: Participants of all ages showed benefits of adding the low-frequency signal. The effect was greater for sentences than words, but no effect of configuration was found. The influence of linguistic context was similar across age groups, and did not contribute to the low-frequency effect. Listeners who scored more poorly with VOC-only stimuli showed greater low-frequency effects. Conclusion: The benefit of adding a very low-frequency signal to a broader, spectrally degraded signal seems to derive from its facilitative influence on perceptual organization of the sensory input. PMID:24456179
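
    A four-channel noise vocoder of the general kind used to create the VOC-only stimuli, together with the LOW-plus combination, can be sketched as follows. The band edges, filter orders, and the noise stand-in for a speech waveform are assumptions; this is not the authors' processing chain.

      import numpy as np
      from scipy.signal import butter, sosfiltfilt, hilbert

      sr = 16000
      rng = np.random.default_rng(3)
      speech = rng.standard_normal(2 * sr)             # stand-in for a speech waveform

      band_edges = [250, 875, 1750, 3500, 7000]        # four analysis bands (assumed)
      env_lp = butter(2, 300, btype="lowpass", fs=sr, output="sos")   # envelope smoother

      def bandpass(x, lo, hi):
          sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
          return sosfiltfilt(sos, x)

      vocoded = np.zeros_like(speech)
      for lo, hi in zip(band_edges[:-1], band_edges[1:]):
          band = bandpass(speech, lo, hi)
          envelope = sosfiltfilt(env_lp, np.abs(hilbert(band)))        # band envelope
          carrier = bandpass(rng.standard_normal(len(speech)), lo, hi) # band-limited noise
          vocoded += envelope * carrier

      # "LOW-plus" condition: add back the acoustic signal below 250 Hz
      low_sos = butter(4, 250, btype="lowpass", fs=sr, output="sos")
      low_plus = vocoded + sosfiltfilt(low_sos, speech)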

  13. Modeling of Acoustic Emission Signal Propagation in Waveguides

    PubMed Central

    Zelenyak, Andreea-Manuela; Hamstad, Marvin A.; Sause, Markus G. R.

    2015-01-01

    Acoustic emission (AE) testing is a widely used nondestructive testing (NDT) method to investigate material failure. When environmental conditions are harmful for the operation of the sensors, waveguides are typically mounted in between the inspected structure and the sensor. Such waveguides can be built from different materials or have different designs in accordance with the experimental needs. All these variations can cause changes in the acoustic emission signals in terms of modal conversion, additional attenuation or shift in frequency content. A finite element method (FEM) was used to model acoustic emission signal propagation in an aluminum plate with an attached waveguide and was validated against experimental data. The geometry of the waveguide is systematically changed by varying the radius and height to investigate the influence on the detected signals. Different waveguide materials were implemented and change of material properties as function of temperature were taken into account. Development of the option of modeling different waveguide options replaces the time consuming and expensive trial and error alternative of experiments. Thus, the aim of this research has important implications for those who use waveguides for AE testing. PMID:26007731

  14. The concept of signal-to-noise ratio in the modulation domain and speech intelligibility.

    PubMed

    Dubbelboer, Finn; Houtgast, Tammo

    2008-12-01

    A new concept is proposed that relates to intelligibility of speech in noise. The concept combines traditional estimations of signal-to-noise ratios (S/N) with elements from the modulation transfer function model, which results in the definition of the signal-to-noise ratio in the modulation domain: the (SN)mod. It is argued that this (SN)mod, quantifying the strength of speech modulations relative to a floor of spurious modulations arising from the speech-noise interaction, is the key factor in relation to speech intelligibility. It is shown that, by using a specific test signal, the strength of these spurious modulations can be measured, allowing an estimation of the (SN)mod for various conditions of additive noise, noise suppression, and amplitude compression. By relating these results to intelligibility data for these same conditions, the relevance of the (SN)mod as the key factor underlying speech intelligibility is clearly illustrated. For instance, it is shown that the commonly observed limited effect of noise suppression on speech intelligibility is correctly "predicted" by the (SN)mod, whereas traditional measures such as the speech transmission index, considering only the changes in the speech modulations, fall short in this respect. It is argued that (SN)mod may provide a relevant tool in the design of successful noise-suppression systems.

  15. INSTRUMENTATION FOR SURVEYING ACOUSTIC SIGNALS IN NATURAL GAS TRANSMISSION LINES

    SciTech Connect

    John L. Loth; Gary J. Morris; George M. Palmer; Richard Guiler; Deepak Mehra

    2003-09-01

    In the U.S., natural gas is distributed through more than one million miles of high-pressure transmission pipelines. If all leaks and infringements could be detected quickly, it would enhance safety and U.S. energy security. Only low frequency acoustic waves appear to be detectable over distances up to 60 km where pipeline shut-off valves provide access to the inside of the pipeline. This paper describes a Portable Acoustic Monitoring Package (PAMP) developed to record and identify acoustic signals characteristic of: leaks, pump noise, valve and flow metering noise, third party infringement, manual pipeline water and gas blow-off, etc. This PAMP consists of a stainless steel 1/2 inch NPT plumbing tree rated for use on 1000 psi pipelines. Its instrumentation is designed to measure acoustic waves over the entire frequency range from zero to 16,000 Hz by means of four instruments: (1) microphone, (2) 3-inch water full range differential pressure transducer with 0.1% of range sensitivity, (3) a novel 3 inch to 100 inch water range amplifier, using an accumulator with needle valve and (4) a line-pressure transducer. The weight of the PAMP complete with all accessories is 36 pounds. This includes a remote control battery/switch box assembly on a 25-foot extension cord, a laptop data acquisition computer on a field table and a sun shield.

  16. Emotional recognition from the speech signal for a virtual education agent

    NASA Astrophysics Data System (ADS)

    Tickle, A.; Raghu, S.; Elshaw, M.

    2013-06-01

    This paper explores the extraction of features from the speech wave to perform intelligent emotion recognition. A feature extraction tool (openSMILE) was used to obtain a baseline set of 998 acoustic features from a set of emotional speech recordings from a microphone. The initial features were reduced to the most important ones so that recognition of emotions using a supervised neural network could be performed. Given that the future use of virtual education agents lies with making the agents more interactive, developing agents with the capability to recognise and adapt to the emotional state of humans is an important step.

  17. Prediction Errors but Not Sharpened Signals Simulate Multivoxel fMRI Patterns during Speech Perception

    PubMed Central

    Davis, Matthew H.

    2016-01-01

    Successful perception depends on combining sensory input with prior knowledge. However, the underlying mechanism by which these two sources of information are combined is unknown. In speech perception, as in other domains, two functionally distinct coding schemes have been proposed for how expectations influence representation of sensory evidence. Traditional models suggest that expected features of the speech input are enhanced or sharpened via interactive activation (Sharpened Signals). Conversely, Predictive Coding suggests that expected features are suppressed so that unexpected features of the speech input (Prediction Errors) are processed further. The present work is aimed at distinguishing between these two accounts of how prior knowledge influences speech perception. By combining behavioural, univariate, and multivariate fMRI measures of how sensory detail and prior expectations influence speech perception with computational modelling, we provide evidence in favour of Prediction Error computations. Increased sensory detail and informative expectations have additive behavioural and univariate neural effects because they both improve the accuracy of word report and reduce the BOLD signal in lateral temporal lobe regions. However, sensory detail and informative expectations have interacting effects on speech representations shown by multivariate fMRI in the posterior superior temporal sulcus. When prior knowledge was absent, increased sensory detail enhanced the amount of speech information measured in superior temporal multivoxel patterns, but with informative expectations, increased sensory detail reduced the amount of measured information. Computational simulations of Sharpened Signals and Prediction Errors during speech perception could both explain these behavioural and univariate fMRI observations. However, the multivariate fMRI observations were uniquely simulated by a Prediction Error and not a Sharpened Signal model. The interaction between prior

  18. A self-organizing neural network architecture for auditory and speech perception with applications to acoustic and other temporal prediction problems

    NASA Astrophysics Data System (ADS)

    Cohen, Michael; Grossberg, Stephen

    1994-09-01

    This project is developing autonomous neural network models for the real-time perception and production of acoustic and speech signals. Our SPINET pitch model was developed to take real-time acoustic input and to simulate the key pitch data. SPINET was embedded into a model for auditory scene analysis, or how the auditory system separates sound sources in environments with multiple sources. The model groups frequency components based on pitch and spatial location cues and resonantly binds them within different streams. The model simulates psychophysical grouping data, such as how an ascending tone groups with a descending tone even if noise exists at the intersection point, and how a tone before and after a noise burst is perceived to continue through the noise. These resonant streams input to working memories, wherein phonetic percepts adapt to global speech rate. Computer simulations quantitatively generate the experimentally observed category boundary shifts for voiced stop pairs that have the same or different place of articulation, including why the interval to hear a double (geminate) stop is twice as long as that to hear two different stops. This model also uses resonant feedback, here between list categories and working memory.

  19. Speech transmission index from running speech: A neural network approach

    NASA Astrophysics Data System (ADS)

    Li, F. F.; Cox, T. J.

    2003-04-01

    Speech transmission index (STI) is an important objective parameter concerning speech intelligibility for sound transmission channels. It is normally measured with specific test signals to ensure high accuracy and good repeatability. Measurement with running speech was previously proposed, but accuracy is compromised and hence applications limited. A new approach that uses artificial neural networks to accurately extract the STI from received running speech is developed in this paper. Neural networks are trained on a large set of transmitted speech examples with prior knowledge of the transmission channels' STIs. The networks perform complicated nonlinear function mappings and spectral feature memorization to enable accurate objective parameter extraction from transmitted speech. Validations via simulations demonstrate the feasibility of this new method on a one-net-one-speech extract basis. In this case, accuracy is comparable with normal measurement methods. This provides an alternative to standard measurement techniques, and it is intended that the neural network method can facilitate occupied room acoustic measurements.

  20. Acoustic Source Characteristics, Across-Formant Integration, and Speech Intelligibility Under Competitive Conditions

    PubMed Central

    2015-01-01

    An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics—for example, fundamental frequency (F0) differences between formants promote their segregation. This study explored the role of more radical differences in source characteristics. Three-formant (F1+F2+F3) synthetic speech analogues were derived from natural sentences. In Experiment 1, F1+F3 were generated by passing a harmonic glottal source (F0 = 140 Hz) through second-order resonators (H1+H3); in Experiment 2, F1+F3 were tonal (sine-wave) analogues (T1+T3). F2 could take either form (H2 or T2). In some conditions, the target formants were presented alone, either monaurally or dichotically (left ear = F1+F3; right ear = F2). In others, they were accompanied by a competitor for F2 (F1+F2C+F3; F2), which listeners must reject to optimize recognition. Competitors (H2C or T2C) were created using the time-reversed frequency and amplitude contours of F2. Dichotic presentation of F2 and F2C ensured that the impact of the competitor arose primarily through informational masking. In the absence of F2C, the effect of a source mismatch between F1+F3 and F2 was relatively modest. When F2C was present, intelligibility was lowest when F2 was tonal and F2C was harmonic, irrespective of which type matched F1+F3. This finding suggests that source type and context, rather than similarity, govern the phonetic contribution of a formant. It is proposed that wideband harmonic analogues are more effective informational maskers than narrowband tonal analogues, and so become dominant in across-frequency integration of phonetic information when placed in competition. PMID:25751040

  1. Tracking the speech signal--time-locked MEG signals during perception of ultra-fast and moderately fast speech in blind and in sighted listeners.

    PubMed

    Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

    2013-01-01

    Blind people can learn to understand speech at ultra-high syllable rates (ca. 20 syllables/s), a capability associated with hemodynamic activation of the central-visual system. To further elucidate the neural mechanisms underlying this skill, magnetoencephalographic (MEG) measurements during listening to sentence utterances were cross-correlated with time courses derived from the speech signal (envelope, syllable onsets and pitch periodicity) to capture phase-locked MEG components (14 blind, 12 sighted subjects; speech rate=8 or 16 syllables/s, pre-defined source regions: auditory and visual cortex, inferior frontal gyrus). Blind individuals showed stronger phase locking in auditory cortex than sighted controls, and right-hemisphere visual cortex activity correlated with syllable onsets in case of ultra-fast speech. Furthermore, inferior-frontal MEG components time-locked to pitch periodicity displayed opposite lateralization effects in sighted (towards right hemisphere) and blind subjects (left). Thus, ultra-fast speech comprehension in blind individuals appears associated with changes in early signal-related processing mechanisms both within and outside the central-auditory terrain.

  2. A Robust Approach For Acoustic Noise Suppression In Speech Using ANFIS

    NASA Astrophysics Data System (ADS)

    Martinek, Radek; Kelnar, Michal; Vanus, Jan; Bilik, Petr; Zidek, Jan

    2015-11-01

    The authors of this article deal with the implementation of a combination of fuzzy-system and artificial intelligence techniques in the application area of non-linear noise and interference suppression. The structure used is called an Adaptive Neuro Fuzzy Inference System (ANFIS). This system finds practical use mainly in audio telephone (mobile) communication in a noisy environment (transport, production halls, sports matches, etc.). Experimental methods based on the two-input adaptive noise cancellation concept were clearly outlined. Within the experiments carried out, the authors created, based on the ANFIS structure, a comprehensive system for adaptive suppression of unwanted background interference that occurs in audio communication and degrades the audio signal. The system designed has been tested on real voice signals. This article presents the investigation and comparison amongst three distinct approaches to noise cancellation in speech; they are LMS (least mean squares) and RLS (recursive least squares) adaptive filtering and ANFIS. A careful review of the literature indicated the importance of non-linear adaptive algorithms over linear ones in noise cancellation. It was concluded that the ANFIS approach had the overall best performance as it efficiently cancelled noise even in highly noise-degraded speech. Results were drawn from the successful experimentation; subjective tests were used to analyse the comparative performance of the algorithms, while objective tests were used to validate them. Implementation of the algorithms was experimentally carried out in Matlab to justify the claims and determine their relative performances.
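
    A minimal two-input adaptive noise canceller using plain LMS, one of the baselines compared above, is sketched below. The toy signals, filter length, and step size are assumptions, and the ANFIS structure itself is not reproduced here.

      import numpy as np

      rng = np.random.default_rng(4)
      n, fs = 20000, 8000
      t = np.arange(n) / fs
      speech = np.sin(2 * np.pi * 300 * t)                            # toy stand-in for speech
      noise_ref = rng.standard_normal(n)                              # reference input: noise only
      noise_in_primary = np.convolve(noise_ref, [0.6, 0.3, 0.1])[:n]  # noise path to primary input
      primary = speech + noise_in_primary                             # primary input: speech + noise

      taps, step = 16, 0.01
      w = np.zeros(taps)
      cleaned = np.zeros(n)
      for i in range(taps - 1, n):
          x = noise_ref[i - taps + 1:i + 1][::-1]   # most recent reference samples, newest first
          y = w @ x                                 # adaptive estimate of the noise in the primary
          e = primary[i] - y                        # error = cleaned speech sample
          w += step * e * x                         # LMS weight update
          cleaned[i] = e

      print(f"noise power before: {np.var(noise_in_primary):.3f}, "
            f"after: {np.var(cleaned[2000:] - speech[2000:]):.3f}")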

  3. Differential Effects of Visual-Acoustic Biofeedback Intervention for Residual Speech Errors

    PubMed Central

    McAllister Byun, Tara; Campbell, Heather

    2016-01-01

    Recent evidence suggests that the incorporation of visual biofeedback technologies may enhance response to treatment in individuals with residual speech errors. However, there is a need for controlled research systematically comparing biofeedback versus non-biofeedback intervention approaches. This study implemented a single-subject experimental design with a crossover component to investigate the relative efficacy of visual-acoustic biofeedback and traditional articulatory treatment for residual rhotic errors. Eleven child/adolescent participants received ten sessions of visual-acoustic biofeedback and 10 sessions of traditional treatment, with the order of biofeedback and traditional phases counterbalanced across participants. Probe measures eliciting untreated rhotic words were administered in at least three sessions prior to the start of treatment (baseline), between the two treatment phases (midpoint), and after treatment ended (maintenance), as well as before and after each treatment session. Perceptual accuracy of rhotic production was assessed by outside listeners in a blinded, randomized fashion. Results were analyzed using a combination of visual inspection of treatment trajectories, individual effect sizes, and logistic mixed-effects regression. Effect sizes and visual inspection revealed that participants could be divided into categories of strong responders (n = 4), mixed/moderate responders (n = 3), and non-responders (n = 4). Individual results did not reveal a reliable pattern of stronger performance in biofeedback versus traditional blocks, or vice versa. Moreover, biofeedback versus traditional treatment was not a significant predictor of accuracy in the logistic mixed-effects model examining all within-treatment word probes. However, the interaction between treatment condition and treatment order was significant: biofeedback was more effective than traditional treatment in the first phase of treatment, and traditional treatment was more effective

  4. Low-Frequency Acoustic Signals Propagation in Buried Pipelines

    NASA Astrophysics Data System (ADS)

    Ovchinnikov, A. L.; Lapshin, B. M.

    2016-01-01

    The article deals with issues concerning the propagation of acoustic signals in large-diameter oil pipelines caused by mechanical action on the pipe body. Various mechanisms of signal attenuation are discussed. It is shown that calculating the attenuation caused only by internal energy loss, i.e., the presence of viscosity, thermal conductivity and liquid-to-pipe-wall friction, yields values that are too low. The results of experimental studies carried out on an existing pipeline with a diameter of 1200 mm are shown. It is experimentally proved that the main mechanism of signal attenuation is energy emission into the environment. The numerical values of the attenuation coefficients, 0.14-0.18 dB/m for the 1200 mm diameter pipeline in the frequency range from 50 Hz to 500 Hz, are determined.

  5. Fatigue crack localization with near-field acoustic emission signals

    NASA Astrophysics Data System (ADS)

    Zhou, Changjiang; Zhang, Yunfeng

    2013-04-01

    This paper presents a source localization technique using near-field acoustic emission (AE) signals induced by crack growth and propagation. The proposed AE source localization technique is based on the phase difference in the AE signals measured by two identical AE sensing elements spaced apart at a pre-specified distance. This phase difference results in the canceling out of certain frequency contents of the signals, which can be related to the AE source direction. Experimental data from simulated AE sources such as pencil breaks were used along with analytical results from moment tensor analysis. It is observed that the theoretical predictions, numerical simulations and experimental test results are in good agreement. Real data from field monitoring of an existing fatigue crack on a bridge were also used to test the system. Results show that the proposed method is fairly effective in determining the AE source direction in thick plates commonly encountered in civil engineering structures.
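
    The relation between the inter-sensor delay and the cancelled frequencies can be illustrated numerically. The sketch below assumes plane-wave propagation at a single wave speed c, a sensor spacing d, and the first-notch relation f = 1/(2*tau) with tau = d*cos(theta)/c; these are simplifying assumptions for illustration, not the paper's exact formulation.

        import numpy as np

        c = 3000.0     # assumed plate wave speed, m/s
        d = 0.05       # assumed spacing of the two AE sensing elements, m
        theta = 60.0   # assumed source direction relative to the sensor axis, degrees

        tau = d * np.cos(np.radians(theta)) / c               # inter-sensor delay
        print("predicted notch frequencies (Hz):", [(2 * k + 1) / (2 * tau) for k in range(3)])

        # Verify on a synthetic broadband burst: summing the two sensor signals
        # cancels energy exactly at the predicted frequencies.
        fs = 1_200_000
        t = np.arange(8192) / fs
        rng = np.random.default_rng(0)
        burst = rng.standard_normal(len(t)) * np.exp(-((t - 2e-3) / 5e-4) ** 2)
        delayed = np.roll(burst, int(round(tau * fs)))         # 10-sample delay for these values
        comb = np.abs(np.fft.rfft(burst + delayed)) / (np.abs(np.fft.rfft(burst)) + 1e-12)
        # comb ~ 2|cos(pi*f*tau)|: its minima sit at the printed notch frequencies,
        # so locating them in measured spectra gives tau and hence the direction theta.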

  6. Adaptive plasticity in wild field cricket's acoustic signaling.

    PubMed

    Bertram, Susan M; Harrison, Sarah J; Thomson, Ian R; Fitzsimmons, Lauren P

    2013-01-01

    Phenotypic plasticity can be adaptive when phenotypes are closely matched to changes in the environment. In crickets, rhythmic fluctuations in the biotic and abiotic environment regularly result in diel rhythms in density of sexually active individuals. Given that density strongly influences the intensity of sexual selection, we asked whether crickets exhibit plasticity in signaling behavior that aligns with these rhythmic fluctuations in the socio-sexual environment. We quantified the acoustic mate signaling behavior of wild-caught males of two cricket species, Gryllus veletis and G. pennsylvanicus. Crickets exhibited phenotypically plastic mate signaling behavior, with most males signaling more often and more attractively during the times of day when mating activity is highest in the wild. Most male G. pennsylvanicus chirped more often and louder, with shorter interpulse durations, pulse periods, chirp durations, and interchirp durations, and at slightly higher carrier frequencies during the time of the day that mating activity is highest in the wild. Similarly, most male G. veletis chirped more often, with more pulses per chirp, longer interpulse durations, pulse periods, and chirp durations, shorter interchirp durations, and at lower carrier frequencies during the time of peak mating activity in the wild. Among-male variation in signaling plasticity was high, with some males signaling in an apparently maladaptive manner. Body size explained some of the among-male variation in G. pennsylvanicus plasticity but not G. veletis plasticity. Overall, our findings suggest that crickets exhibit phenotypically plastic mate attraction signals that closely match the fluctuating socio-sexual context they experience.

  7. Automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Espy-Wilson, Carol

    2005-04-01

    Great strides have been made in the development of automatic speech recognition (ASR) technology over the past thirty years. Most of this effort has been centered on the extension and improvement of Hidden Markov Model (HMM) approaches to ASR. Current commercially available and industrial systems based on HMMs can perform well for certain situational tasks that restrict variability, such as phone dialing or limited voice commands. However, the holy grail of ASR systems is performance comparable to humans; in other words, the ability to automatically transcribe unrestricted conversational speech spoken by an unlimited number of speakers in varying acoustic environments. This goal is far from being reached. Key to the success of ASR is effective modeling of variability in the speech signal. This tutorial will review the basics of ASR and the various ways in which our current knowledge of speech production, speech perception and prosody can be exploited to improve robustness at every level of the system.

  8. Transient noise reduction in speech signal with a modified long-term predictor

    NASA Astrophysics Data System (ADS)

    Choi, Min-Seok; Kang, Hong-Goo

    2011-12-01

    This article proposes an efficient median-filter-based algorithm to remove transient noise in a speech signal. The proposed algorithm adopts a modified long-term predictor (LTP) as the pre-processor of the noise reduction process to reduce the speech distortion caused by the nonlinear nature of the median filter. This article shows that the LTP analysis does not alter the characteristics of transient noise during the speech modeling process. Conversely, if a short-term linear prediction (STP) filter is employed as a pre-processor, the enhanced output includes residual noise because the STP analysis and synthesis process preserves and restores transient noise components. To minimize residual noise and speech distortion after the transient noise reduction, a modified LTP method is proposed which estimates the characteristics of speech more accurately. By ignoring regions where transient noise is present in the pitch lag detection step, the modified LTP successfully avoids being affected by transient noise. A backward pitch prediction algorithm is also adopted to reduce speech distortion in the onset regions. Experimental results verify that the proposed system efficiently eliminates transient noise while preserving the desired speech signal.
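
    The modified LTP is specific to the article, but the median-filter core that it feeds is standard. A minimal scipy sketch of transient (click) removal by median filtering, on a synthetic residual-like signal with illustrative parameters:

        import numpy as np
        from scipy.signal import medfilt

        fs = 8000
        t = np.arange(fs) / fs
        residual = np.sin(2 * np.pi * 150 * t)        # stand-in for the LTP residual
        noisy = residual.copy()
        noisy[[1000, 2500, 6000]] += 4.0              # isolated transient (impulsive) noise

        # A short median filter removes isolated transients while largely preserving
        # the smooth waveform; applying it to the LTP residual rather than to the raw
        # speech is what limits the nonlinear speech distortion discussed above.
        enhanced_residual = medfilt(noisy, kernel_size=5)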

  9. Coding Method of LSP Residual Signals Using Wavelets for Speech Synthesis

    NASA Astrophysics Data System (ADS)

    Shimizu, Tadaaki; Kimoto, Masaya; Yoshimura, Hiroki; Isu, Naoki; Sugata, Kazuhiro

    This paper presents a method that uses wavelet analysis for speech coding and synthesis by rule. It is a coding system in which the LSP residual signal is transformed into wavelet coefficients. As wavelet analysis is implemented efficiently by filter banks, our method requires less computation than multipulse coding and other schemes in which complicated prediction procedures are essential. To achieve good-quality speech at low bit rates, we verified an allocation of different bit counts to the wavelet coefficients, with more bits at lower frequencies and fewer at higher frequencies. The speech synthesized with the Haar wavelet at 16.538 kbit/s has nearly the same perceptual quality as 6-bit μ-law PCM (66.15 kbit/s). We are convinced that coding LSP residual signals using wavelet analysis is an effective approach to speech synthesis.
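
    A minimal sketch of the frequency-dependent bit allocation over Haar wavelet coefficients, using PyWavelets; the frame length, decomposition depth and bit counts per band are illustrative choices, not the paper's allocation.

        import numpy as np
        import pywt

        def quantize(x, n_bits):
            # Uniform quantizer over the band's own range (illustrative codec stand-in)
            levels = 2 ** n_bits
            lo, hi = x.min(), x.max() + 1e-12
            q = np.clip(np.floor((x - lo) / (hi - lo) * levels), 0, levels - 1)
            return lo + (q + 0.5) * (hi - lo) / levels

        rng = np.random.default_rng(1)
        residual = rng.standard_normal(256)                  # stand-in for one LSP residual frame

        coeffs = pywt.wavedec(residual, "haar", level=4)     # [cA4, cD4, cD3, cD2, cD1]
        bits_per_band = [8, 6, 5, 4, 3]                      # more bits at lower frequencies
        coded = [quantize(c, b) for c, b in zip(coeffs, bits_per_band)]
        reconstructed = pywt.waverec(coded, "haar")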

  10. Precursory acoustic signals and ground deformation in volcanic explosions

    NASA Astrophysics Data System (ADS)

    Bowman, D. C.; Kim, K.; Anderson, J.; Lees, J. M.; Taddeucci, J.; Graettinger, A. H.; Sonder, I.; Valentine, G.

    2013-12-01

    We investigate precursory acoustic signals that appear prior to volcanic explosions in real and experimental settings. Acoustic records of a series of experimental blasts designed to mimic maar explosions show precursory energy 0.02 to 0.05 seconds before the high amplitude overpressure arrival. These blasts consisted of 1 to 1/3 lb charges detonated in unconsolidated granular material at depths between 0.5 and 1 m, and were performed during the Buffalo Man Made Maars experiment in Springville, New York, USA. The preliminary acoustic arrival is 1 to 2 orders of magnitude lower in amplitude compared to the main blast wave. The waveforms vary from blast to blast, perhaps reflecting the different explosive yields and burial depths of each shot. Similar arrivals are present in some infrasound records at Santiaguito volcano, Guatemala, where they precede the main blast signal by about 2 seconds and are about 1 order of magnitude weaker. Precursory infrasound has also been described at Sakurajima volcano, Japan (Yokoo et al, 2013; Bull. Volc. Soc. Japan, 58, 163-181) and Suwanosejima volcano, Japan (Yokoo and Iguchi, 2010; JVGR, 196, 287-294), where it is attributed to rapid deformation of the vent region. Vent deformation has not been directly observed at these volcanoes because of the difficulty of visually observing the crater floor. However, particle image velocimetry of video records at Santiaguito has revealed rapid and widespread ground motion just prior to eruptions (Johnson et al, 2008; Nature, 456, 377-381) and may be the cause of much of the infrasound recorded at that volcano (Johnson and Lees, 2010; GRL, 37, L22305). High speed video records of the blasts during the Man Made Maars experiment also show rapid deformation of the ground immediately before the explosion plume breaches the surface. We examine the connection between source yield, burial depths, ground deformation, and the production of initial acoustic phases for each simulated maar explosion. We

  11. Integrated speech enhancement for functional MRI environment.

    PubMed

    Pathak, Nishank; Milani, Ali A; Panahi, Issa; Briggs, Richard

    2009-01-01

    This paper presents an integrated speech enhancement (SE) method for the noisy MRI environment. We show that the performance of the SE system improves considerably when the speech signal, dominated by MRI acoustic noise at very low SNR, is enhanced in two successive stages: a two-channel SE method followed by a single-channel post-processing SE algorithm. Actual MRI noisy speech data are used in our experiments, demonstrating the improved performance of the proposed SE method.

  12. Experimental investigation of the effects of the acoustical conditions in a simulated classroom on speech recognition and learning in children a

    PubMed Central

    Valente, Daniel L.; Plevinsky, Hallie M.; Franco, John M.; Heinrichs-Graham, Elizabeth C.; Lewis, Dawna E.

    2012-01-01

    The potential effects of acoustical environment on speech understanding are especially important as children enter school where students’ ability to hear and understand complex verbal information is critical to learning. However, this ability is compromised because of widely varied and unfavorable classroom acoustics. The extent to which unfavorable classroom acoustics affect children’s performance on longer learning tasks is largely unknown as most research has focused on testing children using words, syllables, or sentences as stimuli. In the current study, a simulated classroom environment was used to measure comprehension performance of two classroom learning activities: a discussion and lecture. Comprehension performance was measured for groups of elementary-aged students in one of four environments with varied reverberation times and background noise levels. The reverberation time was either 0.6 or 1.5 s, and the signal-to-noise level was either +10 or +7 dB. Performance is compared to adult subjects as well as to sentence-recognition in the same condition. Significant differences were seen in comprehension scores as a function of age and condition; both increasing background noise and reverberation degraded performance in comprehension tasks compared to minimal differences in measures of sentence-recognition. PMID:22280587

  13. What information is necessary for speech categorization? Harnessing variability in the speech signal by integrating cues computed relative to expectations

    PubMed Central

    McMurray, Bob; Jongman, Allard

    2012-01-01

    Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model, that is, the type of information subserving this mapping. This is crucial in speech perception, where the signal is variable and context-dependent. This study assessed the informational assumptions of several models of speech categorization, in particular the number of cues that are the basis of categorization and whether these cues represent the input veridically or have undergone compensation. We collected a corpus of 2880 fricative productions (Jongman, Wayland & Wong, 2000) spanning many talker- and vowel-contexts and measured 24 cues for each. A subset was also presented to listeners in an 8AFC phoneme categorization task. We then trained a common classification model based on logistic regression to categorize the fricative from the cue values, and manipulated the information in the training set to contrast 1) models based on a small number of invariant cues; 2) models using all cues without compensation; and 3) models in which cues underwent compensation for contextual factors. Compensation was modeled by Computing Cues Relative to Expectations (C-CuRE), a new approach to compensation that preserves fine-grained detail in the signal. Only the compensation model achieved an accuracy similar to that of listeners and showed the same effects of context. Thus, even simple categorization metrics can overcome the variability in speech when sufficient information is available and compensation schemes like C-CuRE are employed. PMID:21417542
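
    A minimal sketch of the compensation idea (cues computed relative to expectations): each raw cue is re-expressed as its deviation from the value expected given the context, and categorization operates on the residual. The data below are synthetic, the single cue and talker context are placeholders, and scikit-learn's logistic regression stands in for the classification model.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        n = 2000
        talker = rng.integers(0, 20, n)                   # context factor (e.g., talker identity)
        category = rng.integers(0, 2, n)                  # fricative category to be recovered
        # One cue driven by both the category and a talker-specific offset
        cue = 1.5 * category + 0.8 * rng.standard_normal(20)[talker] + 0.3 * rng.standard_normal(n)

        # C-CuRE-style compensation (sketch): subtract the expectation given the context,
        # keeping the residual as the relative cue.
        talker_mean = np.array([cue[talker == k].mean() for k in range(20)])
        relative_cue = cue - talker_mean[talker]

        raw = LogisticRegression().fit(cue[:, None], category).score(cue[:, None], category)
        comp = LogisticRegression().fit(relative_cue[:, None], category).score(relative_cue[:, None], category)
        print(raw, comp)    # the compensated cue typically separates the categories better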

  14. Automatic speech recognition in cocktail-party situations: a specific training for separated speech.

    PubMed

    Marti, Amparo; Cobos, Maximo; Lopez, Jose J

    2012-02-01

    Automatic speech recognition (ASR) refers to the task of automatically extracting a transcription of the linguistic content of an acoustical speech signal. Despite several decades of research in this important area of acoustic signal processing, the accuracy of ASR systems is still far behind human performance, especially in adverse acoustic scenarios. In this context, one of the most challenging situations is the one concerning simultaneous speech in cocktail-party environments. Although source separation methods have already been investigated to deal with this problem, the separation process is not perfect and the resulting artifacts pose an additional problem for ASR performance. In this paper, a specific training to improve the percentage of recognized words in real simultaneous speech cases is proposed. The combination of source separation and this specific training is explored and evaluated under different acoustical conditions, leading to improvements of up to 35% in ASR performance.

  15. Modern Techniques in Acoustical Signal and Image Processing

    SciTech Connect

    Candy, J V

    2002-04-04

    Acoustical signal processing problems can lead to some complex and intricate techniques to extract the desired information from noisy, sometimes inadequate, measurements. The challenge is to formulate a meaningful strategy that is aimed at performing the processing required even in the face of uncertainties. This strategy can be as simple as a transformation of the measured data to another domain for analysis or as complex as embedding a full-scale propagation model into the processor. The aims of both approaches are the same--to extract the desired information and reject the extraneous, that is, to develop a signal processing scheme that achieves this goal. In this paper, we briefly discuss this underlying philosophy from a "bottom-up" approach, enabling the problem to dictate the solution rather than vice versa.

  16. Study on demodulated signal distribution and acoustic pressure phase sensitivity of a self-interfered distributed acoustic sensing system

    NASA Astrophysics Data System (ADS)

    Shang, Ying; Yang, Yuan-Hong; Wang, Chen; Liu, Xiao-Hui; Wang, Chang; Peng, Gang-Ding

    2016-06-01

    We propose a demodulated signal distribution theory for a self-interfered distributed acoustic sensing system. The distribution region of Rayleigh backscattering, including the acoustic sensing signal in the sensing fiber, is investigated theoretically under different combinations of the path difference and pulse width. Additionally, we determine the optimal solution between the path difference and pulse width to obtain the maximum phase change per unit length. We experimentally test this theory and realize a good acoustic pressure phase sensitivity of -150 dB re rad/(μPa·m) of fiber in the frequency range from 200 Hz to 1 kHz.

  17. Observer-based beamforming algorithm for acoustic array signal processing.

    PubMed

    Bai, Long; Huang, Xun

    2011-12-01

    In the field of noise identification with microphone arrays, conventional delay-and-sum (DAS) beamforming is the most popular signal processing technique. However, acoustic imaging results generated by DAS beamforming are easily influenced by background noise, particularly for in situ wind tunnel tests. Even when arithmetic averaging is used to statistically remove the interference from the background noise, the results are far from perfect because the interference from coherent background noise is still present. In addition, DAS beamforming based on arithmetic averaging fails to deliver real-time computational capability. An observer-based approach is introduced in this paper. This so-called observer-based beamforming method has a recursive form similar to the state observer in classical control theory and thus has real-time computational capability. In addition, coherent background noise is gradually rejected over the iterations. Theoretical derivations of the observer-based beamforming algorithm are carefully developed in this paper. Two numerical simulations demonstrate the good coherent background noise rejection and real-time computational capability of observer-based beamforming, which can therefore be regarded as an attractive algorithm for acoustic array signal processing.
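
    A minimal sketch of the conventional delay-and-sum baseline mentioned above, for a uniform linear microphone array and a single analysis frequency; the geometry, frequency and noise level are illustrative, and the observer-based recursion itself is not reproduced here.

        import numpy as np

        c = 343.0                                 # speed of sound, m/s
        fs = 16000
        mic_x = np.arange(8) * 0.05               # uniform linear array, 5 cm spacing

        def das_power(frames, angles_deg, f0):
            # Narrowband delay-and-sum power over candidate arrival angles.
            # frames: (n_mics, n_samples) time signals; f0: analysis frequency in Hz.
            spectra = np.fft.rfft(frames, axis=1)
            x = spectra[:, int(round(f0 * frames.shape[1] / fs))]    # one bin per microphone
            powers = []
            for a in np.radians(angles_deg):
                delays = mic_x * np.sin(a) / c                       # far-field plane-wave delays
                steering = np.exp(-2j * np.pi * f0 * delays)         # model of the propagation phase
                powers.append(np.abs(np.vdot(steering, x)) ** 2)     # coherent sum when a matches the source
            return np.array(powers)

        # Synthetic 2 kHz plane wave arriving from 20 degrees, plus sensor noise
        theta = np.radians(20.0)
        t = np.arange(2048) / fs
        frames = np.vstack([np.sin(2 * np.pi * 2000 * (t - x * np.sin(theta) / c)) for x in mic_x])
        frames += 0.1 * np.random.randn(*frames.shape)
        angles = np.arange(-90, 91)
        print(angles[np.argmax(das_power(frames, angles, 2000.0))])  # peaks near 20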

  18. Articulatory Mediation of Speech Perception: A Causal Analysis of Multi-Modal Imaging Data

    ERIC Educational Resources Information Center

    Gow, David W., Jr.; Segawa, Jennifer A.

    2009-01-01

    The inherent confound between the organization of articulation and the acoustic-phonetic structure of the speech signal makes it exceptionally difficult to evaluate the competing claims of motor and acoustic-phonetic accounts of how listeners recognize coarticulated speech. Here we use Granger causation analysis of high spatiotemporal resolution…

  19. Deconvolution and signal extraction in geophysics and acoustics

    NASA Astrophysics Data System (ADS)

    Sibul, Leon H.; Roan, Michael J.; Erling, Josh

    2002-11-01

    Deconvolution and signal extraction are fundamental signal processing techniques in geophysics and acoustics. An introductory overview of the standard second-order methods and minimum entropy deconvolution is presented. Limitations of the second-order methods are discussed and the need for more general methods is established. Minimum entropy deconvolution (MED), as proposed by Wiggins in 1977, is a technique for the deconvolution of seismic signals that overcomes limitations of the second-order method of deconvolution. The unifying conceptual framework of MED, as presented in Donoho's classic paper (1981), is discussed. The basic assumption of MED is that the input signals to the forward filter are independent, identically distributed non-Gaussian random processes. A forward convolution filter "makes" its output more Gaussian, which increases its entropy. The minimization of entropy restores the original non-Gaussian input. We also give an overview of recent developments in blind deconvolution (BDC), blind source separation (BSS), and blind signal extraction (BSE). Recent research in these areas uses information theoretic (IT) criteria (entropy, mutual information, K-L divergence, etc.) as optimization objective functions. The gradients of these objective functions are nonlinear, resulting in nonlinear algorithms. Some of the recursive algorithms for nonlinear optimization are reviewed.
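
    A small numerical illustration of the stated premise, not Wiggins' full MED iteration: convolving an i.i.d., strongly non-Gaussian (sparse) sequence with a filter drives its excess kurtosis toward zero, i.e., makes it more Gaussian, which is why MED can deconvolve by searching for the inverse filter that restores a maximally non-Gaussian output.

        import numpy as np
        from scipy.stats import kurtosis

        rng = np.random.default_rng(0)
        n = 50_000
        # Sparse, super-Gaussian "reflectivity" sequence (the filter input)
        reflectivity = rng.standard_normal(n) * (rng.random(n) < 0.02)

        # A smearing wavelet: the unknown forward filter of the deconvolution problem
        wavelet = np.exp(-np.arange(40) / 8.0) * np.cos(np.arange(40) / 2.0)
        trace = np.convolve(reflectivity, wavelet, mode="same")

        print("excess kurtosis of input :", kurtosis(reflectivity))   # large (non-Gaussian)
        print("excess kurtosis of output:", kurtosis(trace))          # much closer to 0 (more Gaussian)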

  20. Signal Restoration of Non-stationary Acoustic Signals in the Time Domain

    NASA Technical Reports Server (NTRS)

    Babkin, Alexander S.

    1988-01-01

    Signal restoration is a method of transforming a nonstationary signal acquired by a ground-based microphone into an equivalent stationary signal. The benefit of signal restoration is a simplification of the flight test requirements, because it can dispense with the need to acquire acoustic data with another aircraft flying in concert with the rotorcraft. The data quality is also generally improved because contamination of the signal by propeller and wind noise is not present. The restoration methodology can also be combined with other data acquisition methods, such as a multiple linear microphone array, for further improvement of the test results. The methodology and software are presented for performing the signal restoration in the time domain. The method has no restrictions on flight path geometry or flight regimes. The only requirement is that the aircraft's spatial position be known relative to the microphone location and synchronized with the acoustic data. The restoration process assumes that the moving source radiates a stationary signal, which is then transformed into a nonstationary signal by various modulation processes. The restoration accounts only for the modulation due to the source motion.
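
    A minimal sketch of the core time-domain resampling step under simplifying assumptions (straight-and-level flyover, known trajectory, still air, no amplitude correction); the values and the synthetic 400 Hz source are illustrative, and the full methodology handles arbitrary flight paths.

        import numpy as np

        c, fs = 340.0, 10000                 # speed of sound (m/s), sample rate (Hz)
        v, h = 60.0, 50.0                    # assumed aircraft speed (m/s) and altitude (m)

        # Synthesize a ground-microphone record from a stationary 400 Hz source so the
        # restoration can be checked: reception time = emission time + range / c.
        t_emit = np.arange(0, 4.0, 1.0 / fs)
        ranges = np.hypot(v * (t_emit - 2.0), h)            # slant range along the flyover
        t_recv = t_emit + ranges / c
        stationary = np.sin(2 * np.pi * 400.0 * t_emit)
        t_mic = np.arange(t_recv[0], t_recv[-1], 1.0 / fs)
        mic = np.interp(t_mic, t_recv, stationary)          # Doppler-shifted, as received

        # Restoration: map each reception time back to its emission time and resample
        # the record onto a uniform emission-time grid, undoing the motion-induced
        # frequency modulation.
        t_emit_of_mic = np.interp(t_mic, t_recv, t_emit)
        uniform = np.arange(t_emit_of_mic[0], t_emit_of_mic[-1], 1.0 / fs)
        restored = np.interp(uniform, t_emit_of_mic, mic)   # ~ a clean 400 Hz tone again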

  1. Language-Specific Developmental Differences in Speech Production: A Cross-Language Acoustic Study

    ERIC Educational Resources Information Center

    Li, Fangfang

    2012-01-01

    Speech productions of 40 English- and 40 Japanese-speaking children (aged 2-5) were examined and compared with the speech produced by 20 adult speakers (10 speakers per language). Participants were recorded while repeating words that began with "s" and "sh" sounds. Clear language-specific patterns in adults' speech were found,…

  2. Floc Growth and Changes in ADV Acoustic Backscatter Signal

    NASA Astrophysics Data System (ADS)

    Rouhnia, M.; Keyvani, A.; Strom, K.

    2013-12-01

    A series of experiments were conducted to examine the effect of mud floc growth on the acoustic back-scatter signal recorded by a Nortek Vector acoustic Doppler velocimeter (ADV). Several studies have shown that calibration equations can be developed to link the backscatter strength with average suspended sediment concentration (SSC) when the sediment particle size distribution remains constant. However, when mud is present, the process of flocculation can alter the suspended particle size distribution. Past studies have shown that it is still unclear as to the degree of dependence of the calibration equation on changes in floc size. Part of the ambiguity lies in the fact that flocs can be porous and rather loosely packed and therefore might not scatter to the same extent as a grain of sand. In addition, direct, detailed measurements of floc size have not accompanied experiments examining the dependence of ADV backscatter and suspended sediment concentration. In this research, a set of laboratory experiments is used to test how floc growth affects the backscatter strength. The laboratory data is examined in light of an analytic model that was developed based on scatter theory to account for changes in both SSC and the floc properties of size and density. For the experiments, a turbulent suspension was created in a tank with a rotating paddle. Fixed concentrations of a mixture of kaolinite and montmorillonite were added to the tank in a step-wise manner. For each step, the flocs were allowed to grow to their equilibrium size before breaking the flocs with high turbulent mixing, adding more sediment, and then returning the mixing rate to a range suitable for the re-growth of flocs. During each floc growth phase, data was simultaneously collected at the same elevation in the tank using a floc camera to capture the changes in floc size, a Nortek Vector ADV for the acoustic backscatter, and a Campbell Scientific OBS 3+ for optical backscatter. Physical samples of the

  3. Interactions between distal speech rate, linguistic knowledge, and speech environment.

    PubMed

    Morrill, Tuuli; Baese-Berk, Melissa; Heffner, Christopher; Dilley, Laura

    2015-10-01

    During lexical access, listeners use both signal-based and knowledge-based cues, and information from the linguistic context can affect the perception of acoustic speech information. Recent findings suggest that the various cues used in lexical access are implemented with flexibility and may be affected by information from the larger speech context. We conducted 2 experiments to examine effects of a signal-based cue (distal speech rate) and a knowledge-based cue (linguistic structure) on lexical perception. In Experiment 1, we manipulated distal speech rate in utterances where an acoustically ambiguous critical word was either obligatory for the utterance to be syntactically well formed (e.g., Conner knew that bread and butter (are) both in the pantry) or optional (e.g., Don must see the harbor (or) boats). In Experiment 2, we examined identical target utterances as in Experiment 1 but changed the distribution of linguistic structures in the fillers. The results of the 2 experiments demonstrate that speech rate and linguistic knowledge about critical word obligatoriness can both influence speech perception. In addition, it is possible to alter the strength of a signal-based cue by changing information in the speech environment. These results provide support for models of word segmentation that include flexible weighting of signal-based and knowledge-based cues.

  4. Adaptive Plasticity in Wild Field Cricket’s Acoustic Signaling

    PubMed Central

    Bertram, Susan M.; Harrison, Sarah J.; Thomson, Ian R.; Fitzsimmons, Lauren P.

    2013-01-01

    Phenotypic plasticity can be adaptive when phenotypes are closely matched to changes in the environment. In crickets, rhythmic fluctuations in the biotic and abiotic environment regularly result in diel rhythms in density of sexually active individuals. Given that density strongly influences the intensity of sexual selection, we asked whether crickets exhibit plasticity in signaling behavior that aligns with these rhythmic fluctuations in the socio-sexual environment. We quantified the acoustic mate signaling behavior of wild-caught males of two cricket species, Gryllus veletis and G. pennsylvanicus. Crickets exhibited phenotypically plastic mate signaling behavior, with most males signaling more often and more attractively during the times of day when mating activity is highest in the wild. Most male G. pennsylvanicus chirped more often and louder, with shorter interpulse durations, pulse periods, chirp durations, and interchirp durations, and at slightly higher carrier frequencies during the time of the day that mating activity is highest in the wild. Similarly, most male G. veletis chirped more often, with more pulses per chirp, longer interpulse durations, pulse periods, and chirp durations, shorter interchirp durations, and at lower carrier frequencies during the time of peak mating activity in the wild. Among-male variation in signaling plasticity was high, with some males signaling in an apparently maladaptive manner. Body size explained some of the among-male variation in G. pennsylvanicus plasticity but not G. veletis plasticity. Overall, our findings suggest that crickets exhibit phenotypically plastic mate attraction signals that closely match the fluctuating socio-sexual context they experience. PMID:23935965

  5. Extended amplification of acoustic signals by amphibian burrows.

    PubMed

    Muñoz, Matías I; Penna, Mario

    2016-07-01

    Animals relying on acoustic signals for communication must cope with the constraints imposed by the environment for sound propagation. A resource to improve signal broadcast is the use of structures that favor the emission or the reception of sounds. We conducted playback experiments to assess the effect of the burrows occupied by the frogs Eupsophus emiliopugini and E. calcaratus on the amplitude of outgoing vocalizations. In addition, we evaluated the influence of these cavities on the reception of externally generated sounds potentially interfering with conspecific communication, namely, the vocalizations emitted by four syntopic species of anurans (E. emiliopugini, E. calcaratus, Batrachyla antartandica, and Pleurodema thaul) and the nocturnal owls Strix rufipes and Glaucidium nanum. Eupsophus advertisement calls emitted from within the burrows experienced average amplitude gains of 3-6 dB at 100 cm from the burrow openings. Likewise, the incoming vocalizations of amphibians and birds were amplified on average above 6 dB inside the cavities. The amplification of internally broadcast Eupsophus vocalizations favors signal detection by nearby conspecifics. Reciprocally, the amplification of incoming conspecific and heterospecific signals facilitates the detection of neighboring males and the monitoring of the levels of potentially interfering biotic noise by resident frogs, respectively.

  6. Audiovisual Speech Perception in Children with Developmental Language Disorder in Degraded Listening Conditions

    ERIC Educational Resources Information Center

    Meronen, Auli; Tiippana, Kaisa; Westerholm, Jari; Ahonen, Timo

    2013-01-01

    Purpose: The effect of the signal-to-noise ratio (SNR) on the perception of audiovisual speech in children with and without developmental language disorder (DLD) was investigated by varying the noise level and the sound intensity of acoustic speech. The main hypotheses were that the McGurk effect (in which incongruent visual speech alters the…

  7. Applications of sub-audible speech recognition based upon electromyographic signals

    NASA Technical Reports Server (NTRS)

    Jorgensen, C. Charles (Inventor); Betts, Bradley J. (Inventor)

    2009-01-01

    Method and system for generating electromyographic or sub-audible signals ("SAWPs") and for transmitting and recognizing the SAWPs that represent the original words and/or phrases. The SAWPs may be generated in an environment that interferes excessively with normal speech or that requires stealth communications, and may be transmitted using encoded, enciphered or otherwise transformed signals that are less subject to signal distortion or degradation in the ambient environment.

  8. Acoustic Emission Signals in Thin Plates Produced by Impact Damage

    NASA Technical Reports Server (NTRS)

    Prosser, William H.; Gorman, Michael R.; Humes, Donald H.

    1999-01-01

    Acoustic emission (AE) signals created by impact sources in thin aluminum and graphite/epoxy composite plates were analyzed. Two different impact velocity regimes were studied. Low-velocity (less than 0.21 km/s) impacts were created with an airgun firing spherical steel projectiles (4.5 mm diameter). High-velocity (1.8 to 7 km/s) impacts were generated with a two-stage light-gas gun firing small cylindrical nylon projectiles (1.5 mm diameter). Both the impact velocity and impact angle were varied. The impacts did not penetrate the aluminum plates at either low or high velocities. For high-velocity impacts in composites, there were both impacts that fully penetrated the plate as well as impacts that did not. All impacts generated very large amplitude AE signals (1-5 V at the sensor), which propagated as plate (extensional and/or flexural) modes. In the low-velocity impact studies, the signal was dominated by a large flexural mode with only a small extensional mode component detected. As the impact velocity was increased within the low velocity regime, the overall amplitudes of both the extensional and flexural modes increased. In addition, a relative increase in the amplitude of high-frequency components of the flexural mode was also observed. Signals caused by high-velocity impacts that did not penetrate the plate contained both a large extensional and flexural mode component of comparable amplitudes. The signals also contained components of much higher frequency and were easily differentiated from those caused by low-velocity impacts. An interesting phenomenon was observed in that the large flexural mode component, seen in every other case, was absent from the signal when the impact particle fully penetrated through the composite plates.

  9. Removal of Noise from Noise-Degraded Speech Signals. Panel on Removal of Noise from a Speech/Noise Signal

    DTIC Science & Technology

    1989-06-01

    listeners with a sensorineural hearing loss. The largest improvements in intelligibility scores were observed with low-frequency noise (600 to 800 Hz) and ... spectrum. Another study of the Zeta Noise Blocker was carried out by Wolinsky (1986) using 18 subjects with moderate to severe sensorineural hearing loss ... speech-enhancement devices for hearing-impaired people was reviewed. Evaluation techniques were reviewed to determine their suitability, particularly for

  10. Fast multi-feature paradigm for recording several mismatch negativities (MMNs) to phonetic and acoustic changes in speech sounds.

    PubMed

    Pakarinen, Satu; Lovio, Riikka; Huotilainen, Minna; Alku, Paavo; Näätänen, Risto; Kujala, Teija

    2009-12-01

    In this study, we addressed whether a new fast multi-feature mismatch negativity (MMN) paradigm can be used for determining the central auditory discrimination accuracy for several acoustic and phonetic changes in speech sounds. We recorded the MMNs in the multi-feature paradigm to changes in syllable intensity, frequency, and vowel length, as well as for consonant and vowel change, and compared these MMNs to those obtained with the traditional oddball paradigm. In addition, we examined the reliability of the multi-feature paradigm by repeating the recordings with the same subjects 1-7 days after the first recordings. The MMNs recorded with the multi-feature paradigm were similar to those obtained with the oddball paradigm. Furthermore, only minor differences were observed in the MMN amplitudes across the two recording sessions. Thus, this new multi-feature paradigm with speech stimuli provides similar results as the oddball paradigm, and the MMNs recorded with the new paradigm were reproducible.

  11. Spectral models of additive and modulation noise in speech and phonatory excitation signals

    NASA Astrophysics Data System (ADS)

    Schoentgen, Jean

    2003-01-01

    The article presents spectral models of additive and modulation noise in speech. The purpose is to learn about the causes of noise in the spectra of normal and disordered voices and to gauge whether the spectral properties of the perturbations of the phonatory excitation signal can be inferred from the spectral properties of the speech signal. The approach to modeling consists of deducing the Fourier series of the perturbed speech, assuming that the Fourier series of the noise and of the clean monocycle-periodic excitation are known. The models explain published data, take into account the effects of supraglottal tremor, demonstrate the modulation distortion owing to vocal tract filtering, establish conditions under which noise cues of different speech signals may be compared, and predict the impossibility of inferring the spectral properties of the frequency modulating noise from the spectral properties of the frequency modulation noise (e.g., phonatory jitter and frequency tremor). The general conclusion is that only phonatory frequency modulation noise is spectrally relevant. Other types of noise in speech are either epiphenomenal, or their spectral effects are masked by the spectral effects of frequency modulation noise.
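
    One claim in this family of models can be illustrated numerically: frequency modulation of the phonatory fundamental by a slow tremor puts sidebands around the carrier spaced at the tremor rate, which is how modulation noise appears in the spectrum. The rates and depths below are illustrative, not taken from the article.

        import numpy as np

        fs = 16000
        t = np.arange(fs) / fs
        f0, fm, df = 120.0, 5.0, 3.0     # fundamental, tremor rate, frequency deviation (Hz)

        # Instantaneous frequency f0 + df*sin(2*pi*fm*t), integrated to a phase
        phase = 2 * np.pi * (f0 * t - df / (2 * np.pi * fm) * np.cos(2 * np.pi * fm * t))
        x = np.sin(phase)

        spec_db = 20 * np.log10(np.abs(np.fft.rfft(x * np.hanning(len(x)))) + 1e-12)
        freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
        # spec_db shows the carrier at 120 Hz flanked by sidebands at 120 +/- 5 Hz,
        # 120 +/- 10 Hz, ...: frequency-modulation noise concentrated around the harmonic.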

  12. Detection of Gear Failures via Vibration and Acoustic Signals Using Wavelet Transform

    NASA Astrophysics Data System (ADS)

    Baydar, N.; Ball, A.

    2003-07-01

    Vibration analysis is widely used in machinery diagnostics, and the wavelet transform has been implemented in many condition-monitoring applications. In contrast to previous applications, this paper examines whether the acoustic signal can be used effectively alongside the vibration signal to detect various local faults in gearboxes using the wavelet transform. Two commonly encountered local faults, tooth breakage and tooth crack, were simulated. The results from acoustic signals were compared with those from vibration signals. The results suggest that acoustic signals are very effective for the early detection of faults and may provide a powerful tool for indicating various types of progressing faults in gearboxes.
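
    A minimal sketch of the analysis step on a synthetic record, assuming a local tooth fault that injects a short burst once per shaft revolution; PyWavelets' continuous wavelet transform with a Morlet wavelet is used here, and the simulated speeds, frequencies and scales are illustrative rather than the paper's.

        import numpy as np
        import pywt

        fs = 20000
        t = np.arange(int(0.5 * fs)) / fs
        vib = np.sin(2 * np.pi * 1200 * t) + 0.05 * np.random.randn(len(t))   # gear-mesh tone + noise

        # Tooth fault: a short impulsive burst once per revolution of a 25 Hz shaft
        burst = np.exp(-np.arange(40) / 8.0) * np.sin(2 * np.pi * 3000 * np.arange(40) / fs)
        for t0 in np.arange(0.02, 0.5, 1.0 / 25):
            idx = int(t0 * fs)
            vib[idx:idx + 40] += burst

        scales = np.arange(2, 64)
        coeffs, freqs = pywt.cwt(vib, scales, "morl", sampling_period=1.0 / fs)
        # |coeffs| shows ridges at the fault instants in the small-scale (high-frequency)
        # rows, repeating at the shaft period: the signature used to flag tooth damage.
        band_energy = np.abs(coeffs[:20, :]).sum(axis=0)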

  13. The Effects of Noise on Speech and Warning Signals

    DTIC Science & Technology

    1989-06-01

    talking before an audience, on the telephone, and even in the presence of a microphone (Webster, 1984). On the basis of data from van Heusden et al. ... listen to speech within a certain range of levels. A study by van Heusden et al. (1979) explores the relationships between selected listening levels ... would account for a portion (about 6 dB) of the difference between the results of Beattie et al. and the work of van Heusden et al. (1979) and Polo et al.

  14. The Perception of Telephone-Processed Speech by Combined Electric and Acoustic Stimulation

    PubMed Central

    Tahmina, Qudsia; Runge, Christina; Friedland, David R.

    2013-01-01

    This study assesses the effects of adding low- or high-frequency information to the band-limited telephone-processed speech on bimodal listeners’ telephone speech perception in quiet environments. In the proposed experiments, bimodal users were presented under quiet listening conditions with wideband speech (WB), bandpass-filtered telephone speech (300–3,400 Hz, BP), high-pass filtered speech (f > 300 Hz, HP, i.e., distorted frequency components above 3,400 Hz in telephone speech were restored), and low-pass filtered speech (f < 3,400 Hz, LP, i.e., distorted frequency components below 300 Hz in telephone speech were restored). Results indicated that in quiet environments, for all four types of stimuli, listening with both hearing aid (HA) and cochlear implant (CI) was significantly better than listening with CI alone. For both bimodal and CI-alone modes, there were no statistically significant differences between the LP and BP scores and between the WB and HP scores. However, the HP scores were significantly better than the BP scores. In quiet conditions, both CI alone and bimodal listening achieved the largest benefits when telephone speech was augmented with high rather than low-frequency information. These findings provide support for the design of algorithms that would extend higher frequency information, at least in quiet environments. PMID:24265213
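
    A minimal sketch of how the four stimulus conditions described above can be generated from a wideband recording; the Butterworth order and zero-phase filtering are assumptions, and the white-noise stand-in replaces an actual speech file.

        import numpy as np
        from scipy.signal import butter, sosfiltfilt

        fs = 16000
        t = np.arange(fs) / fs
        wb = np.random.randn(len(t))          # stand-in for a wideband speech recording

        def bandlimit(x, low=None, high=None, order=6):
            # Zero-phase Butterworth filtering; low/high in Hz, None = no edge on that side
            if low and high:
                sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
            elif low:
                sos = butter(order, low, btype="highpass", fs=fs, output="sos")
            else:
                sos = butter(order, high, btype="lowpass", fs=fs, output="sos")
            return sosfiltfilt(sos, x)

        bp = bandlimit(wb, 300, 3400)         # band-limited telephone speech
        hp = bandlimit(wb, 300, None)         # telephone band plus restored highs (> 3,400 Hz)
        lp = bandlimit(wb, None, 3400)        # telephone band plus restored lows (< 300 Hz)
        # wb, bp, hp and lp correspond to the WB, BP, HP and LP conditions described above.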

  15. The Effect of Residual Acoustic Hearing and Adaptation to Uncertainty on Speech Perception in Cochlear Implant Users: Evidence from Eye-Tracking

    PubMed Central

    McMurray, Bob; Farris-Trimble, Ashley; Seedorff, Michael; Rigler, Hannah

    2015-01-01

    Objectives While outcomes with cochlear implants (CIs) are generally good, performance can be fragile. The authors examined two factors that are crucial for good CI performance. First, while there is a clear benefit for adding residual acoustic hearing to CI stimulation (typically in low frequencies), it is unclear whether this contributes directly to phonetic categorization. Thus, the authors examined perception of voicing (which uses low-frequency acoustic cues) and fricative place of articulation (s/ʃ, which does not) in CI users with and without residual acoustic hearing. Second, in speech categorization experiments, CI users typically show shallower identification functions. These are typically interpreted as deriving from noisy encoding of the signal. However, psycholinguistic work suggests shallow slopes may also be a useful way to adapt to uncertainty. The authors thus employed an eye-tracking paradigm to examine this in CI users. Design Participants were 30 CI users (with a variety of configurations) and 22 age-matched normal hearing (NH) controls. Participants heard tokens from six b/p and six s/ʃ continua (eight steps) spanning real words (e.g., beach/peach, sip/ship). Participants selected the picture corresponding to the word they heard from a screen containing four items (a b-, p-, s- and ʃ-initial item). Eye movements to each object were monitored as a measure of how strongly they were considering each interpretation in the moments leading up to their final percept. Results Mouse-click results (analogous to phoneme identification) for voicing showed a shallower slope for CI users than NH listeners, but no differences between CI users with and without residual acoustic hearing. For fricatives, CI users also showed a shallower slope, but unexpectedly, acoustic + electric listeners showed an even shallower slope. Eye movements showed a gradient response to fine-grained acoustic differences for all listeners. Even considering only trials in which a

  16. Acceptance Noise Level: Effects of the Speech Signal, Babble, and Listener Language

    ERIC Educational Resources Information Center

    Shi, Lu-Feng; Azcona, Gabrielly; Buten, Lupe

    2015-01-01

    Purpose: The acceptable noise level (ANL) measure has gained much research/clinical interest in recent years. The present study examined how the characteristics of the speech signal and the babble used in the measure may affect the ANL in listeners with different native languages. Method: Fifteen English monolingual, 16 Russian-English bilingual,…

  17. The Effect of Asymmetrical Signal Degradation on Binaural Speech Recognition in Children and Adults.

    ERIC Educational Resources Information Center

    Rothpletz, Ann M.; Tharpe, Anne Marie; Grantham, D. Wesley

    2004-01-01

    To determine the effect of asymmetrical signal degradation on binaural speech recognition, 28 children and 14 adults were administered a sentence recognition task amidst multitalker babble. There were 3 listening conditions: (a) monaural, with mild degradation in 1 ear; (b) binaural, with mild degradation in both ears (symmetric degradation); and…

  18. Prosodic Contrasts in Ironic Speech

    ERIC Educational Resources Information Center

    Bryant, Gregory A.

    2010-01-01

    Prosodic features in spontaneous speech help disambiguate implied meaning not explicit in linguistic surface structure, but little research has examined how these signals manifest themselves in real conversations. Spontaneously produced verbal irony utterances generated between familiar speakers in conversational dyads were acoustically analyzed…

  19. An acoustic feature-based similarity scoring system for speech rehabilitation assistance.

    PubMed

    Syauqy, Dahnial; Wu, Chao-Min; Setyawati, Onny

    2016-08-01

    The purpose of this study is to develop a tool to assist speech therapy and rehabilitation, focused on automatic scoring based on the comparison of the patient's speech with normal speech on several aspects including pitch, vowels, voiced-unvoiced segments, strident fricatives and sound intensity. The pitch estimation employed a cepstrum-based algorithm for its robustness; the vowel classification used a multilayer perceptron (MLP) to classify vowels from pitch and formants; and the strident fricative detection was based on the major spectral peak intensity, its location and the presence of pitch in the segment. In order to evaluate the performance of the system, this study analyzed eight patients' speech recordings (four males, four females; aged 4-58 years), which had been recorded in a previous study in cooperation with Taipei Veterans General Hospital and Taoyuan General Hospital. The experimental result for the pitch algorithm showed that the cepstrum method had a 5.3% gross pitch error over a total of 2086 frames. For the vowel classification algorithm, the MLP method provided 93% accuracy (men), 87% (women) and 84% (children). In total, 156 of the tool's 192 grading results (81%) were consistent with the audio and visual observations made by four experienced respondents. Implication for Rehabilitation: Difficulties in communication may limit the ability of a person to transfer and exchange information. The fact that speech is one of the primary means of communication has encouraged the need for speech diagnosis and rehabilitation. Advances in computer-assisted speech therapy (CAST) improve the quality and time efficiency of the diagnosis and treatment of the disorders. The present study attempted to develop a tool to assist speech therapy and rehabilitation, providing a simple interface to let the assessment be done even by the patient himself without the need for particular knowledge of speech processing while at the
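
    A minimal sketch of the cepstrum-based pitch estimation step described above, on a single synthetic voiced frame; the window, search range and pulse-train test signal are illustrative, and the full tool layers vowel classification and fricative detection on top of this.

        import numpy as np

        def cepstral_pitch(frame, fs, f_min=60.0, f_max=400.0):
            # Estimate F0 of one voiced frame from the peak of its real cepstrum
            windowed = frame * np.hamming(len(frame))
            spectrum = np.abs(np.fft.rfft(windowed)) + 1e-12
            cepstrum = np.fft.irfft(np.log(spectrum))
            q_min, q_max = int(fs / f_max), int(fs / f_min)     # quefrency search range
            peak = q_min + np.argmax(cepstrum[q_min:q_max])
            return fs / peak

        # Synthetic voiced frame: pulse train near 150 Hz through a simple decaying resonance
        fs = 16000
        frame = np.zeros(512)
        frame[::int(fs / 150)] = 1.0
        frame = np.convolve(frame, np.exp(-np.arange(60) / 10.0), mode="same")
        print(cepstral_pitch(frame, fs))       # close to 150 Hz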

  20. Designing acoustics for linguistically diverse classrooms: Effects of background noise, reverberation and talker foreign accent on speech comprehension by native and non-native English-speaking listeners

    NASA Astrophysics Data System (ADS)

    Peng, Zhao Ellen

    The current classroom acoustics standard (ANSI S12.60-2010) recommends core learning spaces not to exceed background noise level (BNL) of 35 dBA and reverberation time (RT) of 0.6 second, based on speech intelligibility performance mainly by the native English-speaking population. Existing literature has not correlated these recommended values well with student learning outcomes. With a growing population of non-native English speakers in American classrooms, the special needs for perceiving degraded speech among non-native listeners, either due to realistic room acoustics or talker foreign accent, have not been addressed in the current standard. This research seeks to investigate the effects of BNL and RT on the comprehension of English speech from native English and native Mandarin Chinese talkers as perceived by native and non-native English listeners, and to provide acoustic design guidelines to supplement the existing standard. This dissertation presents two studies on the effects of RT and BNL on more realistic classroom learning experiences. How do native and non-native English-speaking listeners perform on speech comprehension tasks under adverse acoustic conditions, if the English speech is produced by talkers of native English (Study 1) versus native Mandarin Chinese (Study 2)? Speech comprehension materials were played back in a listening chamber to individual listeners: native and non-native English-speaking in Study 1; native English, native Mandarin Chinese, and other non-native English-speaking in Study 2. Each listener was screened for baseline English proficiency level, and completed dual tasks simultaneously involving speech comprehension and adaptive dot-tracing under 15 acoustic conditions, comprised of three BNL conditions (RC-30, 40, and 50) and five RT scenarios (0.4 to 1.2 seconds). The results show that BNL and RT negatively affect both objective performance and subjective perception of speech comprehension, more severely for non

  1. Speech perception as categorization

    PubMed Central

    Holt, Lori L.; Lotto, Andrew J.

    2010-01-01

    Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has resulted from these challenges. We focus here on issues and experiments that define open research questions relevant to phoneme categorization, arguing that SP is best understood as perceptual categorization, a position that places SP in direct contact with research from other areas of perception and cognition. PMID:20601702

  2. On-Line Acoustic and Semantic Interpretation of Talker Information

    ERIC Educational Resources Information Center

    Creel, Sarah C.; Tumlin, Melanie A.

    2011-01-01

    Recent work demonstrates that listeners utilize talker-specific information in the speech signal to inform real-time language processing. However, there are multiple representational levels at which this may take place. Listeners might use acoustic cues in the speech signal to access the talker's identity and information about what they tend to…

  3. Frontal top-down signals increase coupling of auditory low-frequency oscillations to continuous speech in human listeners.

    PubMed

    Park, Hyojin; Ince, Robin A A; Schyns, Philippe G; Thut, Gregor; Gross, Joachim

    2015-06-15

    Humans show a remarkable ability to understand continuous speech even under adverse listening conditions. This ability critically relies on dynamically updated predictions of incoming sensory information, but exactly how top-down predictions improve speech processing is still unclear. Brain oscillations are a likely mechanism for these top-down predictions [1, 2]. Quasi-rhythmic components in speech are known to entrain low-frequency oscillations in auditory areas [3, 4], and this entrainment increases with intelligibility [5]. We hypothesize that top-down signals from frontal brain areas causally modulate the phase of brain oscillations in auditory cortex. We use magnetoencephalography (MEG) to monitor brain oscillations in 22 participants during continuous speech perception. We characterize prominent spectral components of speech-brain coupling in auditory cortex and use causal connectivity analysis (transfer entropy) to identify the top-down signals driving this coupling more strongly during intelligible speech than during unintelligible speech. We report three main findings. First, frontal and motor cortices significantly modulate the phase of speech-coupled low-frequency oscillations in auditory cortex, and this effect depends on intelligibility of speech. Second, top-down signals are significantly stronger for left auditory cortex than for right auditory cortex. Third, speech-auditory cortex coupling is enhanced as a function of stronger top-down signals. Together, our results suggest that low-frequency brain oscillations play a role in implementing predictive top-down control during continuous speech perception and that top-down control is largely directed at left auditory cortex. This suggests a close relationship between (left-lateralized) speech production areas and the implementation of top-down control in continuous speech perception.

  4. A Computational Model of Word Segmentation from Continuous Speech Using Transitional Probabilities of Atomic Acoustic Events

    ERIC Educational Resources Information Center

    Rasanen, Okko

    2011-01-01

    Word segmentation from continuous speech is a difficult task that is faced by human infants when they start to learn their native language. Several studies indicate that infants might use several different cues to solve this problem, including intonation, linguistic stress, and transitional probabilities between subsequent speech sounds. In this…

  5. Acoustic Analysis of Clear Versus Conversational Speech in Individuals with Parkinson Disease

    ERIC Educational Resources Information Center

    Goberman, A.M.; Elmer, L.W.

    2005-01-01

    A number of studies have been devoted to the examination of clear versus conversational speech in non-impaired speakers. The purpose of these previous studies has been primarily to help increase speech intelligibility for the benefit of hearing-impaired listeners. The goal of the present study was to examine differences between conversational and…

  6. Real Time Implementation of an LPC Algorithm. Speech Signal Processing Research at CHI

    DTIC Science & Technology

    1975-05-01

    large distortion values. Since timbre is just overtone content, some harmonic distortion is tolerable in speech and music; it is effectively masked by ... the signal. Olson has shown that 0.75% second harmonic distortion is perceptible in 15 kHz music [4]. The higher harmonics, however, are more easily ... "musically" related to the signal. The threshold for IM perception is said to be 0.5% [1]. There are two accepted methods for measuring IM distortion: i

  7. Integrated Spacesuit Audio System Enhances Speech Quality and Reduces Noise

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng Arden; Chen, Jingdong; Chen, Shaoyan Sharyl

    2009-01-01

    A new approach has been proposed for increasing astronaut comfort and speech capture. Currently, the special design of a spacesuit forms an extreme acoustic environment making it difficult to capture clear speech without compromising comfort. The proposed Integrated Spacesuit Audio (ISA) system is to incorporate the microphones into the helmet and use software to extract voice signals from background noise.

  8. Impacts of Underwater Turbulence on Acoustical and Optical Signals and Their Linkage

    DTIC Science & Technology

    2013-02-12

    convected quantities like temperature in turbulent fluid," J. Fluid Mech. 5, 113-133 (1959). 26. J. W. Goodman, Introduction to Fourier Optics (Roberts ... Acoustical and optical signal transmission underwater is of vital interest for both civilian and military applications. The range and signal to noise

  9. Effect of Digital Frequency Compression (DFC) on Speech Recognition in Candidates for Combined Electric and Acoustic Stimulation (EAS)

    PubMed Central

    Gifford, René H.; Dorman, Michael F.; Spahr, Anthony J.; McKarns, Sharon A.

    2008-01-01

    Purpose To compare the effects of conventional amplification (CA) and digital frequency compression (DFC) amplification on the speech recognition abilities of candidates for a partial-insertion cochlear implant, that is, candidates for combined electric and acoustic stimulation (EAS). Method The participants were 6 patients whose audiometric thresholds at 500 Hz and below were ≤60 dB HL and whose thresholds at 2000 Hz and above were ≥80 dB HL. Six tests of speech understanding were administered with CA and DFC. The Abbreviated Profile of Hearing Aid Benefit (APHAB) was also administered following use of CA and DFC. Results Group mean scores were not statistically different in the CA and DFC conditions. However, 2 patients received substantial benefit in DFC conditions. APHAB scores suggested increased ease of communication, but also increased aversive sound quality. Conclusion Results suggest that a relatively small proportion of individuals who meet EAS candidacy will receive substantial benefit from a DFC hearing aid and that a larger proportion will receive at least a small benefit when speech is presented against a background of noise. This benefit, however, comes at a cost—aversive sound quality. PMID:17905905

  10. Cumulative and Synergistic Effects of Physical, Biological and Acoustic Signals on Marine Mammal Habitat Use

    DTIC Science & Technology

    2009-09-30

    rather than animals. Note that some animals do utilize the higher frequency bands, e.g. killer and beluga whales, but these animals are only ... NOAA-supported projects, including Passive Acoustic monitoring of killer and beluga whales at the Barren Islands, Alaska, the Bering Sea Acoustic ... physical, biological and acoustic signals impact marine mammal habitat use. In particular, what are the effects of manmade underwater sound on

  11. Signal-to-noise ratio adaptive post-filtering method for intelligibility enhancement of telephone speech.

    PubMed

    Jokinen, Emma; Yrttiaho, Santeri; Pulakka, Hannu; Vainio, Martti; Alku, Paavo

    2012-12-01

    Post-filtering can be utilized to improve the quality and intelligibility of telephone speech. Previous studies have shown that energy reallocation with a high-pass type filter works effectively in improving the intelligibility of speech in difficult noise conditions. The present study introduces a signal-to-noise ratio adaptive post-filtering method that utilizes energy reallocation to transfer energy from the first formant to higher frequencies. The proposed method adapts to the level of the background noise so that, in favorable noise conditions, the post-filter has a flat frequency response and the effect of the post-filtering is increased as the level of the ambient noise increases. The performance of the proposed method is compared with a similar post-filtering algorithm and unprocessed speech in subjective listening tests which evaluate both intelligibility and listener preference. The results indicate that both of the post-filtering methods maintain the quality of speech in negligible noise conditions and are able to provide intelligibility improvement over unprocessed speech in adverse noise conditions. Furthermore, the proposed post-filtering algorithm performs better than the other post-filtering method under evaluation in moderate to difficult noise conditions, where intelligibility improvement is mostly required.
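
    As a rough illustration of the idea described above (a post-filter that is flat in quiet and reallocates energy away from the first-formant region toward higher frequencies as noise increases), the sketch below builds an SNR-dependent FIR filter with SciPy. The band edges, the tilt shape, and the SNR-to-strength mapping are illustrative assumptions, not the published algorithm.

```python
# Sketch of an SNR-adaptive post-filter: flat response in quiet, progressively
# stronger high-frequency tilt (energy moved away from the first-formant region)
# as the noise level increases. Band edges and mappings are assumptions.
import numpy as np
from scipy.signal import firwin2, lfilter

def snr_adaptive_postfilter(speech, snr_db, fs=8000, numtaps=129):
    # Map SNR to a filter "strength" in [0, 1]: 0 above 20 dB (flat), 1 below 0 dB.
    strength = np.clip((20.0 - snr_db) / 20.0, 0.0, 1.0)

    # Target magnitude response: attenuate the first-formant region (~300-800 Hz)
    # and boost 1-3 kHz; firwin2 interpolates between the listed points.
    freq = [0.0, 300.0, 800.0, 1500.0, 3000.0, fs / 2.0]
    tilt = np.array([1.0, 0.9, 0.5, 1.3, 1.6, 1.2])
    gain = (1.0 - strength) * np.ones_like(tilt) + strength * tilt

    taps = firwin2(numtaps, freq, gain, fs=fs)
    out = lfilter(taps, [1.0], speech)
    # Keep overall energy roughly unchanged (reallocation, not amplification).
    out *= np.sqrt(np.sum(speech ** 2) / (np.sum(out ** 2) + 1e-12))
    return out

if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    demo = np.sin(2 * np.pi * 500 * t) + 0.3 * np.sin(2 * np.pi * 2000 * t)
    print(snr_adaptive_postfilter(demo, snr_db=5.0, fs=fs).shape)
```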

  12. MASS-DEPENDENT BARYON ACOUSTIC OSCILLATION SIGNAL AND HALO BIAS

    SciTech Connect

    Wang Qiao; Zhan Hu

    2013-05-10

    We characterize the baryon acoustic oscillations (BAO) feature in halo two-point statistics using N-body simulations. We find that nonlinear damping of the BAO signal is less severe for halos in the mass range we investigate than for dark matter. The amount of damping depends weakly on the halo mass. The correlation functions show a mass-dependent drop of the halo clustering bias below roughly 90 h {sup -1} Mpc, which coincides with the scale of the BAO trough. The drop of bias is 4% for halos with mass M > 10{sup 14} h {sup -1} M{sub Sun} and reduces to roughly 2% for halos with mass M > 10{sup 13} h {sup -1} M{sub Sun }. In contrast, halo biases in simulations without BAO change more smoothly around 90 h {sup -1} Mpc. In Fourier space, the bias of M > 10{sup 14} h {sup -1} M{sub Sun} halos decreases smoothly by 11% from wavenumber k = 0.012 h Mpc{sup -1} to 0.2 h Mpc{sup -1}, whereas that of M > 10{sup 13} h {sup -1} M{sub Sun} halos decreases by less than 4% over the same range. By comparing the halo biases in pairs of otherwise identical simulations, one with and the other without BAO, we also observe a modulation of the halo bias. These results suggest that precise calibrations of the mass-dependent BAO signal and scale-dependent bias on large scales would be needed for interpreting precise measurements of the two-point statistics of clusters or massive galaxies in the future.

  13. Speech endpoint detection with non-language speech sounds for generic speech processing applications

    NASA Astrophysics Data System (ADS)

    McClain, Matthew; Romanowski, Brian

    2009-05-01

    Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS such as filled pauses will require future research.
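
    A minimal stand-in for this kind of language-independent detector (frame-level acoustic features scored by one model per class) is sketched below using MFCC features and per-class Gaussian HMMs from hmmlearn. The feature choice, model sizes, and two-class framing are assumptions for illustration; the authors' own feature set and training procedure are not reproduced here.

```python
# Two-class (LSS vs. NLSS) sketch: MFCC frames scored by one Gaussian HMM per
# class; the higher log-likelihood decides the label. Settings are assumptions.
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

def frame_features(y, sr):
    # 13 MFCCs per frame, transposed to (n_frames, n_features) for hmmlearn.
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

def train_class_hmm(segments, sr, n_states=3):
    # `segments` is a list of labelled audio examples for one class.
    feats = [frame_features(seg, sr) for seg in segments]
    X = np.vstack(feats)
    lengths = [f.shape[0] for f in feats]
    model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

def classify(segment, sr, lss_hmm, nlss_hmm):
    X = frame_features(segment, sr)
    # score() returns the total log-likelihood of the observation sequence.
    return "LSS" if lss_hmm.score(X) > nlss_hmm.score(X) else "NLSS"
```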

  14. Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language

    PubMed Central

    Narayanan, Shrikanth; Georgiou, Panayiotis G.

    2013-01-01

    The expression and experience of human behavior are complex and multimodal and characterized by individual and contextual heterogeneity and variability. Speech and spoken language communication cues offer an important means for measuring and modeling human behavior. Observational research and practice across a variety of domains from commerce to healthcare rely on speech- and language-based informatics for crucial assessment and diagnostic information and for planning and tracking response to an intervention. In this paper, we describe some of the opportunities as well as emerging methodologies and applications of human behavioral signal processing (BSP) technology and algorithms for quantitatively understanding and modeling typical, atypical, and distressed human behavior with a specific focus on speech- and language-based communicative, affective, and social behavior. We describe the three important BSP components of acquiring behavioral data in an ecologically valid manner across laboratory to real-world settings, extracting and analyzing behavioral cues from measured data, and developing models offering predictive and decision-making support. We highlight both the foundational speech and language processing building blocks as well as the novel processing and modeling opportunities. Using examples drawn from specific real-world applications ranging from literacy assessment and autism diagnostics to psychotherapy for addiction and marital well being, we illustrate behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion. PMID:24039277

  15. Acoustic Variations in Adductor Spasmodic Dysphonia as a Function of Speech Task.

    ERIC Educational Resources Information Center

    Sapienza, Christine M.; Walton, Suzanne; Murry, Thomas

    1999-01-01

    Acoustic phonatory events were identified in 14 women diagnosed with adductor spasmodic dysphonia (ADSD), a focal laryngeal dystonia that disturbs phonatory function, and compared with those of 14 age-matched women with no vocal dysfunction. Findings indicated ADSD subjects produced more aberrant acoustic events than controls during tasks of…

  16. Delay-coordinates embeddings as a data mining tool for denoising speech signals.

    PubMed

    Napoletani, D; Struppa, D C; Sauer, T; Berenstein, C A; Walnut, D

    2006-12-01

    In this paper, we utilize techniques from the theory of nonlinear dynamical systems to define a notion of embedding estimators. More specifically, we use delay-coordinates embeddings of sets of coefficients of the measured signal (in some chosen frame) as a data mining tool to separate structures that are likely to be generated by signals belonging to some predetermined data set. We implement the embedding estimator in a windowed Fourier frame, and we apply it to speech signals heavily corrupted by white noise. Our experimental work suggests that, after training on the data sets of interest, these estimators perform well for a variety of white noise processes and noise intensity levels.
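
    The basic building block, a delay-coordinates embedding of a sequence of frame coefficients, is straightforward to sketch. The snippet below shows the embedding plus a naive nearest-neighbour rule that keeps coefficients whose embedded points lie close to a reference cloud built from clean signals; this rule and its radius are illustrative assumptions standing in for the estimator actually proposed.

```python
import numpy as np

def delay_embed(x, dim=3, lag=1):
    """Map a 1-D coefficient sequence x into points in R^dim using delay
    coordinates: [x[n], x[n+lag], ..., x[n+(dim-1)*lag]]."""
    n = len(x) - (dim - 1) * lag
    return np.stack([x[i * lag: i * lag + n] for i in range(dim)], axis=1)

def keep_structured(coeffs, reference_cloud, dim=3, lag=1, radius=0.1):
    """Zero out coefficients whose embedded points fall far from the cloud of
    embedded points built from clean reference signals (illustrative rule)."""
    pts = delay_embed(coeffs, dim, lag)
    keep = np.ones(len(coeffs), dtype=bool)
    for i, p in enumerate(pts):
        d = np.min(np.linalg.norm(reference_cloud - p, axis=1))
        if d > radius:
            keep[i + (dim - 1) * lag] = False
    return np.where(keep, coeffs, 0.0)
```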

  17. Intensity Accents in French 2 Year Olds' Speech.

    ERIC Educational Resources Information Center

    Allen, George D.

    The acoustic features and functions of accentuation in French are discussed, and features of accentuation in the speech of French 2-year-olds are explored. The four major acoustic features used to signal accentual distinctions are fundamental frequency of voicing, duration of segments and syllables, intensity of segments and syllables, and…

  18. Sub-Audible Speech Recognition Based upon Electromyographic Signals

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles C. (Inventor); Lee, Diana D. (Inventor); Agabon, Shane T. (Inventor)

    2012-01-01

    Method and system for processing and identifying a sub-audible signal formed by a source of sub-audible sounds. Sequences of samples of sub-audible sound patterns ("SASPs") for known words/phrases in a selected database are received for overlapping time intervals, and Signal Processing Transforms ("SPTs") are formed for each sample, as part of a matrix of entry values. The matrix is decomposed into contiguous, non-overlapping two-dimensional cells of entries, and neural net analysis is applied to estimate reference sets of weight coefficients that provide sums with optimal matches to reference sets of values. The reference sets of weight coefficients are used to determine a correspondence between a new (unknown) word/phrase and a word/phrase in the database.

  19. Ecology of acoustic signalling and the problem of masking interference in insects.

    PubMed

    Schmidt, Arne K D; Balakrishnan, Rohini

    2015-01-01

    The efficiency of long-distance acoustic signalling of insects in their natural habitat is constrained in several ways. Acoustic signals are not only subjected to changes imposed by the physical structure of the habitat such as attenuation and degradation but also to masking interference from co-occurring signals of other acoustically communicating species. Masking interference is likely to be a ubiquitous problem in multi-species assemblages, but successful communication in natural environments under noisy conditions suggests powerful strategies to deal with the detection and recognition of relevant signals. In this review we present recent work on the role of the habitat as a driving force in shaping insect signal structures. In the context of acoustic masking interference, we discuss the ecological niche concept and examine the role of acoustic resource partitioning in the temporal, spatial and spectral domains as sender strategies to counter masking. We then examine the efficacy of different receiver strategies: physiological mechanisms such as frequency tuning, spatial release from masking and gain control as useful strategies to counteract acoustic masking. We also review recent work on the effects of anthropogenic noise on insect acoustic communication and the importance of insect sounds as indicators of biodiversity and ecosystem health.

  20. Nonlinear Statistical Modeling of Speech

    NASA Astrophysics Data System (ADS)

    Srinivasan, S.; Ma, T.; May, D.; Lazarou, G.; Picone, J.

    2009-12-01

    Contemporary approaches to speech and speaker recognition decompose the problem into four components: feature extraction, acoustic modeling, language modeling and search. Statistical signal processing is an integral part of each of these components, and Bayes Rule is used to merge these components into a single optimal choice. Acoustic models typically use hidden Markov models based on Gaussian mixture models for state output probabilities. This popular approach suffers from an inherent assumption of linearity in speech signal dynamics. Language models often employ a variety of maximum entropy techniques, but can employ many of the same statistical techniques used for acoustic models. In this paper, we focus on introducing nonlinear statistical models to the feature extraction and acoustic modeling problems as a first step towards speech and speaker recognition systems based on notions of chaos and strange attractors. Our goal in this work is to improve the generalization and robustness properties of a speech recognition system. Three nonlinear invariants are proposed for feature extraction: Lyapunov exponents, correlation fractal dimension, and correlation entropy. We demonstrate an 11% relative improvement on speech recorded under noise-free conditions, but show a comparable degradation occurs for mismatched training conditions on noisy speech. We conjecture that the degradation is due to difficulties in estimating invariants reliably from noisy data. To circumvent these problems, we introduce two dynamic models to the acoustic modeling problem: (1) a linear dynamic model (LDM) that uses a state space-like formulation to explicitly model the evolution of hidden states using an autoregressive process, and (2) a data-dependent mixture of autoregressive (MixAR) models. Results show that LDM and MixAR models can achieve comparable performance with HMM systems while using significantly fewer parameters. Currently we are developing Bayesian parameter estimation and
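
    Of the three invariants mentioned, the correlation fractal dimension is the simplest to sketch. The snippet below gives a generic Grassberger-Procaccia-style estimate (delay embedding followed by the log-log slope of the correlation sum); the embedding dimension, lag, and radius range are arbitrary choices, not the settings used in the paper.

```python
import numpy as np

def correlation_dimension(x, dim=5, lag=2, n_radii=10):
    """Grassberger-Procaccia style estimate of the correlation dimension of a
    scalar time series x (for example, one speech frame), via delay embedding."""
    n = len(x) - (dim - 1) * lag
    pts = np.stack([x[i * lag: i * lag + n] for i in range(dim)], axis=1)
    # Pairwise distances between embedded points (upper triangle only).
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    dists = d[np.triu_indices(n, k=1)]
    pos = dists[dists > 0]
    radii = np.logspace(np.log10(np.percentile(pos, 5)),
                        np.log10(np.percentile(pos, 50)), n_radii)
    # Correlation sum C(r): fraction of pairs closer than r.
    c = np.array([np.mean(dists < r) for r in radii])
    # Slope of log C(r) versus log r approximates the correlation dimension.
    slope, _ = np.polyfit(np.log(radii), np.log(c + 1e-12), 1)
    return slope
```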

  1. Estimation of the Tool Condition by Applying the Wavelet Transform to Acoustic Emission Signals

    SciTech Connect

    Gomez, M. P.; Piotrkowski, R.; Ruzzante, J. E.; D'Attellis, C. E.

    2007-03-21

    This work continues the search for parameters to evaluate the tool condition in machining processes. The selected sensing technique is acoustic emission, applied to a turning process on steel samples. The obtained signals are studied using the wavelet transform. The tool wear level is quantified as a percentage of the final wear specified by the standard ISO 3685. The amplitude and relevant scales obtained from the acoustic emission signals could be related to the wear level.
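
    A rough Python counterpart of the processing chain (wavelet decomposition of an acoustic emission record followed by per-scale amplitude and energy summaries that could then be related to wear level) is sketched below with PyWavelets. The wavelet family, decomposition depth, and summary statistics are assumptions.

```python
import numpy as np
import pywt

def ae_scale_features(signal, wavelet="db4", level=6):
    """Decompose an acoustic emission record and return per-scale energy and
    peak amplitude, the kind of quantities one might relate to tool wear."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    features = {}
    for i, c in enumerate(coeffs):
        name = "approx" if i == 0 else f"detail_{level - i + 1}"
        features[name] = {"energy": float(np.sum(c ** 2)),
                          "peak": float(np.max(np.abs(c)))}
    return features
```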

  2. A human vocal utterance corpus for perceptual and acoustic analysis of speech, singing, and intermediate vocalizations

    NASA Astrophysics Data System (ADS)

    Gerhard, David

    2002-11-01

    In this paper we present the collection and annotation process of a corpus of human utterance vocalizations used for speech and song research. The corpus was collected to fill a void in current research tools, since no corpus currently exists which is useful for the classification of intermediate utterances between speech and monophonic singing. Much work has been done in the domain of speech versus music discrimination, and several corpora exist which can be used for this research. A specific example is the work done by Eric Scheirer and Malcolm Slaney [IEEE ICASSP, 1997, pp. 1331-1334]. The collection of the corpus is described, including questionnaire design and intended and actual response characteristics, as well as the collection and annotation of pre-existing samples. The annotation of the corpus consisted of a survey tool for a subset of the corpus samples, including ratings of the clips based on a speech-song continuum, and questions on the perceptual qualities of speech and song, both generally and corresponding to particular clips in the corpus.

  3. A comparison of signal-processing front ends for automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Jankowski, C. R., Jr.; Vo, H.-D. H.; Lippmann, R. P.

    1994-07-01

    The first stage of any system for automatic speech recognition (ASR) is a signal-processing front end that converts a sampled speech waveform into a more suitable representation for later processing. Several front ends are compared, three of which are based on knowledge about the human auditory system. The performance of an ASR system with these front ends was compared to a control mel filter bank (MFB)-based cepstral representation in clean speech and with speech degraded by noise and spectral variability. Using the TI-105 isolated word database, it was found that auditory front ends performed comparably to MFB cepstra, sometimes slightly better in noise. With MFB cepstral recognition error rates ranging from 0.5% to 26.9%, depending on signal-to-noise ratio (SNR), auditory models could perform up to four percentage points better. With speech degraded by linear filtering, where MFB cepstra showed error rates ranging from 0.5% to 3.1%, auditory outputs could improve performance by as much as 0.4% for conditions with high baseline error rates. This performance gain comes at a significant computational expense: approximately one-third of real time for MFB cepstra as opposed to over 100 times real time for auditory models. These results disagree with previous studies that suggest considerably more improvement with auditory models. However, these earlier studies used a linear predictive coding (LPC)-based control front end, which is shown to perform significantly worse than MFB cepstra under noisy conditions (e.g., 2.7% error rate with mel-cepstra vs. 25.3% with LPC at 18-dB SNR). Data-reduction techniques such as principal component analysis (PCA) and linear discriminant analysis (LDA) were also evaluated. PCA provided no gain in noise and slight gain with spectral variability.
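
    For reference, the two baseline front ends compared above, mel-filter-bank cepstra and LPC coefficients, can each be computed in a few lines. The sketch below uses librosa; the frame sizes and analysis orders are common defaults rather than the exact settings of the study.

```python
import numpy as np
import librosa

def mfb_cepstra(y, sr, n_mfcc=13):
    # Mel-filter-bank cepstral coefficients (MFCCs), shape (n_mfcc, n_frames).
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

def lpc_frames(y, sr, order=12, frame_len=400, hop=160):
    # Frame-wise linear prediction coefficients via librosa.lpc,
    # shape (order + 1, n_frames).
    frames = librosa.util.frame(y, frame_length=frame_len, hop_length=hop)
    return np.stack([librosa.lpc(frames[:, i], order=order)
                     for i in range(frames.shape[1])], axis=1)
```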

  4. Speech encoding strategies for multielectrode cochlear implants: a digital signal processor approach.

    PubMed

    Dillier, N; Bögli, H; Spillmann, T

    1993-01-01

    The following processing strategies have been implemented on an experimental laboratory system of a cochlear implant digital speech processor (CIDSP) for the Nucleus 22-channel cochlear prosthesis. The first approach (PES, Pitch Excited Sampler) is based on the maximum peak channel vocoder concept whereby the time-varying spectral energy of a number of frequency bands is transformed into electrical stimulation parameters for up to 22 electrodes. The pulse rate at any electrode is controlled by the voice pitch of the input speech signal. The second approach (CIS, Continuous Interleaved Sampler) uses a stimulation pulse rate which is independent of the input signal. The algorithm continuously scans all specified frequency bands (typically between four and 22) and samples their energy levels. As only one electrode can be stimulated at any instant in time, the maximum achievable rate of stimulation is limited by the required stimulus pulse widths (determined individually for each subject) and some additional constraints and parameters. A number of variations of the CIS approach have therefore been implemented which either maximize the number of quasi-simultaneous stimulation channels or the pulse rate on a reduced number of electrodes. Evaluation experiments with five experienced cochlear implant users showed significantly better performance in consonant identification tests with the new processing strategies than with the subjects' own wearable speech processors; improvements in vowel identification tasks were rarely observed. Modifications of the basic PES and CIS strategies resulted in large variations of identification scores. Information transmission analysis of confusion matrices revealed a rather complex pattern across conditions and speech features. Optimization and fine-tuning of processing parameters for these coding strategies will require more data both from speech identification and discrimination evaluations and from psychophysical experiments.
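
    The CIS idea (a fixed-rate scan of band envelopes, with only one electrode driven at a time) can be illustrated with a toy filter-bank simulation. In the sketch below, the band edges, envelope extraction, and frame rate are assumptions, and the output is simply a matrix of per-band amplitudes that would, in a real processor, drive non-simultaneous stimulation pulses.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def cis_frame_amplitudes(speech, fs, n_bands=8, frame_ms=2.0):
    """Toy continuous-interleaved-sampler front end: band-pass filter bank,
    envelope extraction, and one envelope sample per band per frame."""
    edges = np.logspace(np.log10(200.0),
                        np.log10(min(7000.0, fs / 2.0 - 1.0)), n_bands + 1)
    envelopes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, speech)
        envelopes.append(np.abs(hilbert(band)))   # magnitude envelope
    env = np.stack(envelopes)                      # (n_bands, n_samples)
    hop = max(1, int(fs * frame_ms / 1000.0))
    return env[:, ::hop]                           # rows = bands/electrodes
```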

  5. Study of acoustic emission signals during fracture shear deformation

    NASA Astrophysics Data System (ADS)

    Ostapchuk, A. A.; Pavlov, D. V.; Markov, V. K.; Krasheninnikov, A. V.

    2016-07-01

    We study acoustic manifestations of different regimes of shear deformation of a fracture filled with a thin layer of granular material. It is established that the observed acoustic portrait is determined by the structure of the fracture at the mesolevel. Joint analysis of the activity of acoustic pulses and their spectral characteristics makes it possible to construct the pattern of internal evolutionary processes occurring in the thin layer of the interblock contact and consider the fracture deformation process as the evolution of a self-organizing system.

  6. Courtship Initiation Is Stimulated by Acoustic Signals in Drosophila melanogaster

    PubMed Central

    Ejima, Aki; Griffith, Leslie C.

    2008-01-01

    Finding a mating partner is a critical task for many organisms. It is in the interest of males to employ multiple sensory modalities to search for females. In Drosophila melanogaster, vision is thought to be the most important courtship stimulating cue at long distance, while chemosensory cues are used at relatively short distance. In this report, we show that when visual cues are not available, sounds produced by the female allow the male to detect her presence in a large arena. When the target female was artificially immobilized, the male spent a prolonged time searching before starting courtship. This delay in courtship initiation was completely rescued by playing either white noise or recorded fly movement sounds to the male, indicating that the acoustic and/or seismic stimulus produced by movement stimulates courtship initiation, most likely by increasing the general arousal state of the male. Mutant males expressing tetanus toxin (TNT) under the control of Gr68a-GAL4 had a defect in finding active females and a delay in courtship initiation in a large arena, but not in a small arena. Gr68a-GAL4 was found to be expressed pleiotropically not only in putative gustatory pheromone receptor neurons but also in mechanosensory neurons, suggesting that Gr68a-positive mechanosensory neurons, not gustatory neurons, provide motion detection necessary for courtship initiation. TNT/Gr68a males were capable of discriminating the copulation status and age of target females in courtship conditioning, indicating that female discrimination and formation of olfactory courtship memory are independent of the Gr68a-expressing neurons that subserve gustation and mechanosensation. This study suggests for the first time that mechanical signals generated by a female fly have a prominent effect on males' courtship in the dark and leads the way to studying how multimodal sensory information and arousal are integrated in behavioral decision making. PMID:18802468

  7. Synergy of seismic, acoustic, and video signals in blast analysis

    SciTech Connect

    Anderson, D.P.; Stump, B.W.; Weigand, J.

    1997-09-01

    The range of mining applications from hard rock quarrying to coal exposure to mineral recovery leads to a great variety of blasting practices. A common characteristic of many of the sources is that they are detonated at or near the earth's surface and thus can be recorded by camera or video. Although the primary interest is in the seismic waveforms that these blasts generate, the visual observations of the blasts provide important constraints that can be applied to the physical interpretation of the seismic source function. In particular, high speed images can provide information on detonation times of individual charges, the timing and amount of mass movement during the blasting process and, in some instances, evidence of wave propagation away from the source. All of these characteristics can be valuable in interpreting the equivalent seismic source function for a set of mine explosions and quantifying the relative importance of the different processes. This paper documents work done at the Los Alamos National Laboratory and Southern Methodist University to take standard Hi-8 video of mine blasts, recover digital images from them, and combine them with ground motion records for interpretation. The steps in the data acquisition, processing, display, and interpretation are outlined. The authors conclude that the combination of video with seismic and acoustic signals can be a powerful diagnostic tool for the study of blasting techniques and seismology. A low cost system for generating similar diagnostics using a consumer-grade video camera and direct-to-disk video hardware is proposed. Application is to verification of the Comprehensive Test Ban Treaty.

  8. Direct classification of all American English phonemes using signals from functional speech motor cortex

    NASA Astrophysics Data System (ADS)

    Mugler, Emily M.; Patton, James L.; Flint, Robert D.; Wright, Zachary A.; Schuele, Stephan U.; Rosenow, Joshua; Shih, Jerry J.; Krusienski, Dean J.; Slutzky, Marc W.

    2014-06-01

    Objective. Although brain-computer interfaces (BCIs) can be used in several different ways to restore communication, communicative BCI has not approached the rate or efficiency of natural human speech. Electrocorticography (ECoG) has precise spatiotemporal resolution that enables recording of brain activity distributed over a wide area of cortex, such as during speech production. In this study, we sought to decode elements of speech production using ECoG. Approach. We investigated words that contain the entire set of phonemes in the general American accent using ECoG with four subjects. Using a linear classifier, we evaluated the degree to which individual phonemes within each word could be correctly identified from cortical signal. Main results. We classified phonemes with up to 36% accuracy when classifying all phonemes and up to 63% accuracy for a single phoneme. Further, misclassified phonemes follow articulation organization described in phonology literature, aiding classification of whole words. Precise temporal alignment to phoneme onset was crucial for classification success. Significance. We identified specific spatiotemporal features that aid classification, which could guide future applications. Word identification was equivalent to information transfer rates as high as 3.0 bits s-1 (33.6 words min-1), supporting pursuit of speech articulation for BCI control.

  9. Auditory language comprehension of temporally reversed speech signals in native and non-native speakers.

    PubMed

    Kiss, Miklos; Cristescu, Tamara; Fink, Martina; Wittmann, Marc

    2008-01-01

    Neuropsychological studies in brain-injured patients with aphasia and children with specific language-learning deficits have shown the dependence of language comprehension on auditory processing abilities, i.e. the detection of temporal order. An impairment of temporal-order perception can be simulated by time reversing segments of the speech signal. In our study, we investigated how different lengths of time-reversed segments in speech influenced comprehension in ten native German speakers and ten participants who had acquired German as a second language. Results show that native speakers were still able to understand the distorted speech at segment lengths of 50 ms, whereas non-native speakers only could identify sentences with reversed intervals of 32 ms duration. These differences in performance can be interpreted by different levels of semantic and lexical proficiency. Our method of temporally-distorted speech offers a new approach to assess language skills that indirectly taps into lexical and semantic competence of non-native speakers.
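
    The stimulus manipulation used here, locally time-reversing fixed-length segments of the waveform while preserving segment order, is simple to reproduce. The sketch below assumes a mono NumPy signal and a segment length given in milliseconds.

```python
import numpy as np

def locally_time_reverse(speech, fs, segment_ms=50.0):
    """Reverse the samples inside consecutive fixed-length segments while
    keeping the segment order, as in locally time-reversed speech stimuli."""
    seg = max(1, int(round(fs * segment_ms / 1000.0)))
    out = speech.copy()
    for start in range(0, len(speech), seg):
        out[start:start + seg] = out[start:start + seg][::-1]
    return out
```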

  10. Modulation of Radio Frequency Signals by Nonlinearly Generated Acoustic Fields

    DTIC Science & Technology

    2014-01-01

    Kirchhoff's theorem, typically applied to EM waves, determining the far-field patterns of an acoustic source from amplitude and phase measurements made in the near-field by ... two noncollinear ultrasonic baffled piston sources. The theory is extended to the modeling of the sound beams generated by parametric transducer arrays ...

  11. Revisiting Neil Armstrong's Moon-Landing Quote: Implications for Speech Perception, Function Word Reduction, and Acoustic Ambiguity

    PubMed Central

    Baese-Berk, Melissa M.; Dilley, Laura C.; Schmidt, Stephanie; Morrill, Tuuli H.; Pitt, Mark A.

    2016-01-01

    Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said "one small step for a man," instead of "one small step for man." What he said is unclear in part because function words like "a" can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of "for" and "for a" in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of "for" and "for a". The results suggest that the distributions of "for" and "for a" overlap substantially, both in terms of temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word "a" varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate. PMID:27603209

  12. Revisiting Neil Armstrong's Moon-Landing Quote: Implications for Speech Perception, Function Word Reduction, and Acoustic Ambiguity.

    PubMed

    Baese-Berk, Melissa M; Dilley, Laura C; Schmidt, Stephanie; Morrill, Tuuli H; Pitt, Mark A

    2016-01-01

    Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said "one small step for a man," instead of "one small step for man." What he said is unclear in part because function words like "a" can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of "for" and "for a" in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of "for" and "for a". The results suggest that the distributions of "for" and "for a" overlap substantially, both in terms of temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word "a" varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate.

  13. Psychophysics of Complex Auditory and Speech Stimuli.

    DTIC Science & Technology

    1996-10-01

    it more distinctive (i.e., in a different instrument timbre than the other musical voices) and less distinctive (i.e., presenting the musical pieces in ... complex acoustic signals, including speech and music. Traditional, solid psychophysical procedures were employed to systematically investigate ... result in the perception of classes of complex auditory stimuli, including speech and music. In health, industry, and human factors, the ...

  14. Design of acoustic logging signal source of imitation based on field programmable gate array

    NASA Astrophysics Data System (ADS)

    Zhang, K.; Ju, X. D.; Lu, J. Q.; Men, B. Y.

    2014-08-01

    An imitation acoustic logging signal source is designed and realized, based on a field-programmable gate array (FPGA), to improve the efficiency of examining and repairing acoustic logging tools during research and field application, and to inspect and verify acoustic receiving circuits and corresponding algorithms. The design of this signal source comprises hardware and software. The hardware design uses an FPGA as the control core: four signals are first generated by reading data stored in the FPGA's internal random access memory (RAM) and then processing the data through digital-to-analog conversion, amplification, smoothing, and so on. The software design uses VHDL, a hardware description language, to program the FPGA. Experiments illustrate that the signal-to-noise ratio of the source is high, the waveforms are stable, and its amplitude, frequency, and delay adjustments accord with the characteristics of real acoustic logging waveforms. These adjustments can be used to imitate the influence on received sonic logging waveforms of factors such as the spacing and span of acoustic tools, the sonic speeds of different layers and fluids, and the acoustic attenuation of different cementation planes.

  15. Limited condition dependence of male acoustic signals in the grasshopper Chorthippus biguttulus

    PubMed Central

    Franzke, Alexandra; Reinhold, Klaus

    2012-01-01

    In many animal species, male acoustic signals serve to attract a mate and therefore often play a major role for male mating success. Male body condition is likely to be correlated with male acoustic signal traits, which signal male quality and provide choosy females indirect benefits. Environmental factors such as food quantity or quality can influence male body condition and therefore possibly lead to condition-dependent changes in the attractiveness of acoustic signals. Here, we test whether stressing food plants influences acoustic signal traits of males via condition-dependent expression of these traits. We examined four male song characteristics, which are vital for mate choice in females of the grasshopper Chorthippus biguttulus. Only one of the examined acoustic traits, loudness, was significantly altered by changing body condition because of drought- and moisture-related stress of food plants. No condition dependence could be observed for syllable to pause ratio, gap duration within syllables, and onset accentuation. We suggest that food plant stress and therefore food plant quality led to shifts in loudness of male grasshopper songs via body condition changes. The other three examined acoustic traits of males do not reflect male body condition induced by food plant quality. PMID:22957192

  16. Impact of Aberrant Acoustic Properties on the Perception of Sound Quality in Electrolarynx Speech

    ERIC Educational Resources Information Center

    Meltzner, Geoffrey S.; Hillman, Robert E.

    2005-01-01

    A large percentage of patients who have undergone laryngectomy to treat advanced laryngeal cancer rely on an electrolarynx (EL) to communicate verbally. Although serviceable, EL speech is plagued by shortcomings in both sound quality and intelligibility. This study sought to better quantify the relative contributions of previously identified…

  17. The Exploitation of Subphonemic Acoustic Detail in L2 Speech Segmentation

    ERIC Educational Resources Information Center

    Shoemaker, Ellenor

    2014-01-01

    The current study addresses an aspect of second language (L2) phonological acquisition that has received little attention to date--namely, the acquisition of allophonic variation as a word boundary cue. The role of subphonemic variation in the segmentation of speech by native speakers has been indisputably demonstrated; however, the acquisition of…

  18. Comparison of Acoustic and Kinematic Approaches to Measuring Utterance-Level Speech Variability

    ERIC Educational Resources Information Center

    Howell, Peter; Anderson, Andrew J.; Bartrip, Jon; Bailey, Eleanor

    2009-01-01

    Purpose: The spatiotemporal index (STI) is one measure of variability. As currently implemented, kinematic data are used, requiring equipment that cannot be used with some patient groups or in scanners. An experiment is reported that addressed whether STI can be extended to an audio measure of sound pressure of the speech envelope over time that…

  19. Call transmission efficiency in native and invasive anurans: competing hypotheses of divergence in acoustic signals.

    PubMed

    Llusia, Diego; Gómez, Miguel; Penna, Mario; Márquez, Rafael

    2013-01-01

    Invasive species are a leading cause of the current biodiversity decline, and hence examining the major traits favouring invasion is a key and long-standing goal of invasion biology. Despite the prominent role of the advertisement calls in sexual selection and reproduction, very little attention has been paid to the features of acoustic communication of invasive species in nonindigenous habitats and their potential impacts on native species. Here we compare for the first time the transmission efficiency of the advertisement calls of native and invasive species, searching for competitive advantages for acoustic communication and reproduction of introduced taxa, and providing insights into competing hypotheses in evolutionary divergence of acoustic signals: acoustic adaptation vs. morphological constraints. Using sound propagation experiments, we measured the attenuation rates of pure tones (0.2-5 kHz) and playback calls (Lithobates catesbeianus and Pelophylax perezi) across four distances (1, 2, 4, and 8 m) and over two substrates (water and soil) in seven Iberian localities. All factors considered (signal type, distance, substrate, and locality) affected transmission efficiency of acoustic signals, which was maximized with lower frequency sounds, shorter distances, and over water surface. Despite being broadcast in nonindigenous habitats, the advertisement calls of invasive L. catesbeianus were propagated more efficiently than those of the native species, in both aquatic and terrestrial substrates, and in most of the study sites. This implies the absence of an optimal relationship between native environments and the propagation of acoustic signals in anurans, in contrast to what is predicted by the acoustic adaptation hypothesis, and it might render these vertebrates particularly vulnerable to intrusion of invasive species producing low frequency signals, such as L. catesbeianus. Our findings suggest that mechanisms optimizing sound transmission in native habitat can play a less

  20. Call Transmission Efficiency in Native and Invasive Anurans: Competing Hypotheses of Divergence in Acoustic Signals

    PubMed Central

    Llusia, Diego; Gómez, Miguel; Penna, Mario; Márquez, Rafael

    2013-01-01

    Invasive species are a leading cause of the current biodiversity decline, and hence examining the major traits favouring invasion is a key and long-standing goal of invasion biology. Despite the prominent role of the advertisement calls in sexual selection and reproduction, very little attention has been paid to the features of acoustic communication of invasive species in nonindigenous habitats and their potential impacts on native species. Here we compare for the first time the transmission efficiency of the advertisement calls of native and invasive species, searching for competitive advantages for acoustic communication and reproduction of introduced taxa, and providing insights into competing hypotheses in evolutionary divergence of acoustic signals: acoustic adaptation vs. morphological constraints. Using sound propagation experiments, we measured the attenuation rates of pure tones (0.2–5 kHz) and playback calls (Lithobates catesbeianus and Pelophylax perezi) across four distances (1, 2, 4, and 8 m) and over two substrates (water and soil) in seven Iberian localities. All factors considered (signal type, distance, substrate, and locality) affected transmission efficiency of acoustic signals, which was maximized with lower frequency sounds, shorter distances, and over water surface. Despite being broadcast in nonindigenous habitats, the advertisement calls of invasive L. catesbeianus were propagated more efficiently than those of the native species, in both aquatic and terrestrial substrates, and in most of the study sites. This implies the absence of an optimal relationship between native environments and the propagation of acoustic signals in anurans, in contrast to what is predicted by the acoustic adaptation hypothesis, and it might render these vertebrates particularly vulnerable to intrusion of invasive species producing low frequency signals, such as L. catesbeianus. Our findings suggest that mechanisms optimizing sound transmission in native habitat can play a

  1. Wavelet packet transform for detection of single events in acoustic emission signals

    NASA Astrophysics Data System (ADS)

    Bianchi, Davide; Mayrhofer, Erwin; Gröschl, Martin; Betz, Gerhard; Vernes, András

    2015-12-01

    Acoustic emission signals in tribology can be used for monitoring the state of bodies in contact and relative motion. The recorded signal includes information which can be associated with different events, such as the formation and propagation of cracks, the appearance of scratches, and so on. One of the major challenges in analyzing these acoustic emission signals is to identify the parts of the signal that belong to such an event and to discern them from noise. In this contribution, a wavelet packet decomposition within the framework of multiresolution analysis theory is considered to analyze acoustic emission signals to investigate the failure of tribological systems. By applying the wavelet packet transform, a method for the extraction of single events in a rail contact fatigue test is proposed. The extraction of such events at several stages of the test permits a classification and the analysis of the evolution of cracks in the rail.
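
    A coarse sketch of this kind of processing, wavelet packet decomposition followed by thresholding of short-time node energies to flag candidate single events, is given below with PyWavelets. The wavelet, decomposition level, block length, and threshold rule are assumptions, not the method of the paper.

```python
import numpy as np
import pywt

def detect_ae_events(signal, wavelet="db4", level=4, block=64, factor=5.0):
    """Flag candidate acoustic emission events: decompose with a wavelet packet
    transform, sum squared coefficients of all level-`level` nodes block by
    block, and mark blocks whose energy exceeds `factor` times the median."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")
    coeffs = np.stack([n.data for n in nodes])            # (n_nodes, n_coeffs)
    n_blocks = coeffs.shape[1] // block
    energy = np.array([np.sum(coeffs[:, i * block:(i + 1) * block] ** 2)
                       for i in range(n_blocks)])
    threshold = factor * np.median(energy)
    return np.where(energy > threshold)[0]                 # indices of event blocks
```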

  2. A unique method to study acoustic transmission through ducts using signal synthesis and averaging of acoustic pulses

    NASA Technical Reports Server (NTRS)

    Salikuddin, M.; Ramakrishnan, R.; Ahuja, K. K.; Brown, W. H.

    1981-01-01

    An acoustic impulse technique using a loudspeaker driver is developed to measure the acoustic properties of a duct/nozzle system. A signal synthesis method is used to generate a desired single pulse with a flat spectrum. The convolution of the desired signal and the inverse Fourier transform of the reciprocal of the driver's response is then fed to the driver. A signal averaging process eliminates the jet mixing noise from the mixture of jet noise and the internal noise, thereby allowing very low intensity signals to be measured accurately, even for high velocity jets. A theoretical analysis is carried out to predict the incident sound field; this is used to help determine the number and locations of the in-duct measurement points to account for the contributions due to higher order modes present in the incident field. The impulse technique is validated by comparing experimentally determined acoustic characteristics of a duct-nozzle system with similar results obtained by the impedance tube method. Absolute agreement in the comparisons was poor, but the overall shapes of the time histories and spectral distributions were much alike.
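
    The signal-synthesis step described above, pre-shaping the drive signal by the inverse of the driver's response so that the radiated pulse has the desired flat spectrum, amounts to a regularised frequency-domain division. The sketch below assumes the driver's impulse response has already been measured; the regularisation constant is an arbitrary choice.

```python
import numpy as np

def synthesize_drive_signal(desired_pulse, driver_impulse_response, eps=1e-3):
    """Compute an electrical drive signal whose convolution with the driver's
    impulse response approximates the desired acoustic pulse: divide the two
    spectra (with regularisation) and invert the FFT."""
    n = len(desired_pulse) + len(driver_impulse_response) - 1
    D = np.fft.rfft(desired_pulse, n)
    H = np.fft.rfft(driver_impulse_response, n)
    # Regularised inverse filter avoids blow-up where the driver response is weak.
    return np.fft.irfft(D * np.conj(H) / (np.abs(H) ** 2 + eps), n)
```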

  3. Development and testing of an audio forensic software for enhancing speech signals masked by loud music

    NASA Astrophysics Data System (ADS)

    Dobre, Robert A.; Negrescu, Cristian; Stanomir, Dumitru

    2016-12-01

    In many situations audio recordings can decide the fate of a trial when accepted as evidence. Before they can be taken into account, however, they must first be authenticated, and the quality of the targeted content (speech in most cases) must be good enough to remove any doubt. Two main directions of multimedia forensics come into play here: content authentication and noise reduction. This paper presents an application belonging to the latter. If someone wanted to conceal a conversation, the easiest way would be to turn up the nearest audio system. In this situation, if a microphone were placed close by, the recorded signal would appear useless because the speech signal would be masked by the loud music signal. The paper proposes an adaptive-filter-based solution to remove the musical content from the previously described signal mixture in order to recover the masked vocal signal. Two adaptive filtering algorithms were tested in the proposed solution: Normalised Least Mean Squares (NLMS) and Recursive Least Squares (RLS). Their performance in the described situation was evaluated using Simulink, compared, and reported in the paper.
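
    A bare-bones NLMS canceller of the type evaluated can be written in a few lines, assuming a reference copy of the music (for example, the original recording played by the audio system) is available alongside the microphone mixture. The filter length and step size below are arbitrary choices, and the RLS variant is not shown.

```python
import numpy as np

def nlms_cancel(mixture, music_ref, taps=64, mu=0.5, eps=1e-6):
    """Normalised LMS: adapt an FIR filter so that the filtered music reference
    matches the music component picked up by the microphone; the error signal
    is the recovered speech estimate."""
    w = np.zeros(taps)
    speech_est = np.zeros_like(mixture)
    for n in range(taps, len(mixture)):
        x = music_ref[n - taps:n][::-1]            # most recent reference samples
        y = np.dot(w, x)                           # estimated music at the mic
        e = mixture[n] - y                         # residual = speech estimate
        w += mu * e * x / (np.dot(x, x) + eps)     # normalised weight update
        speech_est[n] = e
    return speech_est
```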

  4. Acoustic correlates of inflectional morphology in the speech of children with specific language impairment and their typically developing peers.

    PubMed

    Owen, Amanda J; Goffman, Lisa

    2007-07-01

    The development of the use of the third-person singular -s in open syllable verbs in children with specific language impairment (SLI) and their typically developing peers was examined. Verbs that included overt productions of the third-person singular -s morpheme (e.g. Bobby plays ball everyday; Bear laughs when mommy buys popcorn) were contrasted with clearly bare stem contexts (e.g. Mommy, buy popcorn; I saw Bobby play ball) on both global and local measures of acoustic duration. A durational signature for verbs inflected with -s was identified separately from factors related to sentence length. These duration measures were also used to identify acoustic changes related to the omission of the -s morpheme. The omitted productions from the children with SLI were significantly longer than their correct third-person singular and bare stem productions. This result was unexpected given that the omitted productions have fewer phonemes than correctly inflected productions. Typically developing children did not show the same pattern, instead producing omitted productions that patterned most closely with bare stem forms. These results are discussed in relation to current theoretical approaches to SLI, with an emphasis on performance and speech-motor accounts.

  5. A New Method to Represent Speech Signals Via Predefined Signature and Envelope Sequences

    NASA Astrophysics Data System (ADS)

    Güz, Ümit; Gürkan, Hakan; Yarman, Binboga Sıddık

    2006-12-01

    A novel systematic procedure referred to as "SYMPES" to model speech signals is introduced. The structure of SYMPES is based on the creation of so-called predefined "signature" and "envelope" sets. These sets are speaker and language independent. Once the speech signals are divided into frames with selected lengths, each frame sequence is reconstructed as the product of a gain factor with a signature sequence and an envelope sequence, both properly assigned from the predefined signature and envelope sets, respectively. Examples are given to exhibit the implementation of SYMPES. It is shown that for the same compression ratio or better, SYMPES yields considerably better speech quality than commercially available coders such as G.726 (ADPCM) at 16 kbps and voice-excited LPC-10E (FS1015) at 2.4 kbps.
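
    The reconstruction step has roughly the flavour of the sketch below: each frame is approximated by a gain times the element-wise product of one envelope and one signature sequence chosen from the predefined sets. The exhaustive least-squares search and the set sizes are assumptions for illustration, not the published selection procedure.

```python
import numpy as np

def reconstruct_frame(frame, signatures, envelopes):
    """Approximate one speech frame as gain * (envelope * signature), searching
    the predefined signature and envelope sets for the least-squares best pair."""
    best_err, best_frame = np.inf, None
    for env in envelopes:
        for sig in signatures:
            basis = env * sig                       # element-wise product
            gain = np.dot(frame, basis) / (np.dot(basis, basis) + 1e-12)
            err = np.sum((frame - gain * basis) ** 2)
            if err < best_err:
                best_err, best_frame = err, gain * basis
    return best_frame
```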

  6. Real-Time Speech/Music Classification With a Hierarchical Oblique Decision Tree

    DTIC Science & Technology

    2008-04-01

    ... real-time speech/music classification with a hierarchical oblique decision tree. A set of discrimination features in the frequency domain are selected ... handle signals without discrimination and cannot work properly in the presence of multimedia signals. This paper proposes a real-time speech/music ...

  7. Cerebral Processing of Emotionally Loaded Acoustic Signals by Tinnitus Patients.

    PubMed

    Georgiewa, Petra; Szczepek, Agnieszka J; Rose, Matthias; Klapp, Burghard F; Mazurek, Birgit

    2016-01-01

    This exploratory study determined the activation pattern in nonauditory brain areas in response to acoustic, emotionally positive, negative or neutral stimuli presented to tinnitus patients and control subjects. Ten patients with chronic tinnitus and without measurable hearing loss and 13 matched control subjects were included in the study and subjected to fMRI with a 1.5-tesla scanner. During the scanning procedure, acoustic stimuli of different emotional value were presented to the subjects. Statistical analyses were performed using statistical parametric mapping (SPM 99). The activation pattern induced by emotionally loaded acoustic stimuli differed significantly within and between both groups tested, depending on the kind of stimuli used. Within-group differences included the limbic system, prefrontal regions, temporal association cortices and striatal regions. Tinnitus patients had a pronounced involvement of limbic regions involved in the processing of chimes (positive stimulus) and neutral words (neutral stimulus), strongly suggesting improperly functioning inhibitory mechanisms that were functioning well in the control subjects. This study supports the hypothesis about the existence of a tinnitus-specific brain network. Such a network could respond to any acoustic stimuli by activating limbic areas involved in stress reactivity and emotional processing and by reducing activation of areas responsible for attention and acoustic filtering (thalamus, frontal regions), possibly reinforcing negative effects of tinnitus.

  8. Speech Signal and Facial Image Processing for Obstructive Sleep Apnea Assessment

    PubMed Central

    Espinoza-Cuadros, Fernando; Fernández-Pozo, Rubén; Toledano, Doroteo T.; Alcázar-Ramírez, José D.; López-Gonzalo, Eduardo; Hernández-Gómez, Luis A.

    2015-01-01

    Obstructive sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). OSA is generally diagnosed through a costly procedure requiring an overnight stay of the patient at the hospital. This has led to proposing less costly procedures based on the analysis of patients' facial images and voice recordings to help in OSA detection and severity assessment. In this paper we investigate the use of both image and speech processing to estimate the apnea-hypopnea index, AHI (which describes the severity of the condition), over a population of 285 male Spanish subjects suspected to suffer from OSA and referred to a Sleep Disorders Unit. Photographs and voice recordings were collected in a supervised but not highly controlled way trying to test a scenario close to an OSA assessment application running on a mobile device (i.e., smartphones or tablets). Spectral information in speech utterances is modeled by a state-of-the-art low-dimensional acoustic representation, called i-vector. A set of local craniofacial features related to OSA are extracted from images after detecting facial landmarks using Active Appearance Models (AAMs). Support vector regression (SVR) is applied on facial features and i-vectors to estimate the AHI. PMID:26664493
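
    The final regression stage, support vector regression over concatenated speech and facial features, reduces to a few lines of scikit-learn once i-vectors and craniofacial measurements are available. The feature dimensions, kernel, and scaling below are placeholders rather than the settings reported in the paper.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def train_ahi_regressor(ivectors, facial_feats, ahi_labels):
    """Fit an SVR that predicts AHI from concatenated speech and facial features."""
    X = np.hstack([ivectors, facial_feats])   # (n_subjects, d_speech + d_face)
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
    model.fit(X, ahi_labels)
    return model

# Usage (shapes are illustrative):
#   model = train_ahi_regressor(iv_train, face_train, ahi_train)
#   predicted_ahi = model.predict(np.hstack([iv_new, face_new]))
```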

  9. Acoustic evaluation of cementing quality using obliquely incident ultrasonic signals

    NASA Astrophysics Data System (ADS)

    Duan, Wen-Xing; Qiao, Wen-Xiao; Che, Xiao-Hua; Xie, Hui

    2014-09-01

    Ultrasonic cement bond logging is a widely used method for evaluating cementing quality. Conventional ultrasonic cement bond logging uses vertical incidence and cannot accurately evaluate lightweight cement bonding. Oblique incidence is a new technology for evaluating cement quality with improved accuracy for lightweight cements. In this study, we simulated models of acoustic impedance of cement and cementing quality using ultrasonic oblique incidence, and we obtained the relation between cementing quality, acoustic impedance of cement, and the acoustic attenuation coefficient of the A0-mode and S0-mode Lamb waves. Then, we simulated models of different cement thickness and we obtained the relation between cement thickness and the time difference of the arrival between the A0 and A0' modes.

  10. System and method for investigating sub-surface features of a rock formation with acoustic sources generating coded signals

    SciTech Connect

    Vu, Cung Khac; Nihei, Kurt; Johnson, Paul A; Guyer, Robert; Ten Cate, James A; Le Bas, Pierre-Yves; Larmat, Carene S

    2014-12-30

    A system and a method for investigating rock formations includes generating, by a first acoustic source, a first acoustic signal comprising a first plurality of pulses, each pulse including a first modulated signal at a central frequency; and generating, by a second acoustic source, a second acoustic signal comprising a second plurality of pulses. A receiver arranged within the borehole receives a detected signal including a signal generated by a non-linear mixing process from the first and second acoustic signals in a non-linear mixing zone within the intersection volume. The method also includes processing the received signal to extract the signal generated by the non-linear mixing process over noise or over signals generated by a linear interaction process, or both.

  11. The Auditory-Brainstem Response to Continuous, Non-repetitive Speech Is Modulated by the Speech Envelope and Reflects Speech Processing

    PubMed Central

    Reichenbach, Chagit S.; Braiman, Chananel; Schiff, Nicholas D.; Hudspeth, A. J.; Reichenbach, Tobias

    2016-01-01

    The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the ABR is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function. PMID:27303286

  12. Two-microphone spatial filtering provides speech reception benefits for cochlear implant users in difficult acoustic environments

    PubMed Central

    Goldsworthy, Raymond L.; Delhorne, Lorraine A.; Desloge, Joseph G.; Braida, Louis D.

    2014-01-01

    This article introduces and provides an assessment of a spatial-filtering algorithm based on two closely-spaced (∼1 cm) microphones in a behind-the-ear shell. The evaluated spatial-filtering algorithm used fast (∼10 ms) temporal-spectral analysis to determine the location of incoming sounds and to enhance sounds arriving from straight ahead of the listener. Speech reception thresholds (SRTs) were measured for eight cochlear implant (CI) users using consonant and vowel materials under three processing conditions: An omni-directional response, a dipole-directional response, and the spatial-filtering algorithm. The background noise condition used three simultaneous time-reversed speech signals as interferers located at 90°, 180°, and 270°. Results indicated that the spatial-filtering algorithm can provide speech reception benefits of 5.8 to 10.7 dB SRT compared to an omni-directional response in a reverberant room with multiple noise sources. Given the observed SRT benefits, coupled with an efficient design, the proposed algorithm is promising as a CI noise-reduction solution. PMID:25096120
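
    One way to approximate the described fast temporal-spectral analysis with two closely spaced microphones is a per-bin mask driven by the inter-microphone phase difference: time-frequency bins consistent with a frontal source are kept and the rest attenuated. The sketch below assumes a broadside pair (microphone axis perpendicular to the look direction), so a frontal source arrives roughly in phase at both microphones; the STFT settings, threshold, and attenuation floor are assumptions, not the evaluated algorithm.

```python
import numpy as np
from scipy.signal import stft, istft

def frontal_enhance(mic_a, mic_b, fs, mic_dist=0.01, c=343.0,
                    nperseg=256, floor=0.1):
    """Keep time-frequency bins whose inter-microphone phase difference is
    consistent with a source straight ahead of a broadside pair; attenuate
    the remaining bins."""
    f, _, Xa = stft(mic_a, fs, nperseg=nperseg)
    _, _, Xb = stft(mic_b, fs, nperseg=nperseg)
    phase_diff = np.angle(Xa * np.conj(Xb))
    # Largest physically possible phase difference (source along the mic axis).
    max_phase = 2.0 * np.pi * f[:, None] * (mic_dist / c)
    # Frontal sources give near-zero phase difference for a broadside pair.
    mask = np.where(np.abs(phase_diff) < 0.25 * np.maximum(max_phase, 1e-6),
                    1.0, floor)
    _, enhanced = istft(Xa * mask, fs, nperseg=nperseg)
    return enhanced
```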

  13. Quantifying the Effect of Compression Hearing Aid Release Time on Speech Acoustics and Intelligibility

    ERIC Educational Resources Information Center

    Jenstad, Lorienne M.; Souza, Pamela E.

    2005-01-01

    Compression hearing aids have the inherent, and often adjustable, feature of release time from compression. Research to date does not provide a consensus on how to choose or set release time. The current study had 2 purposes: (a) a comprehensive evaluation of the acoustic effects of release time for a single-channel compression system in quiet and…

  14. Exploration of Acoustic Features for Automatic Vowel Discrimination in Spontaneous Speech

    ERIC Educational Resources Information Center

    Tyson, Na'im R.

    2012-01-01

    In an attempt to understand what acoustic/auditory feature sets motivated transcribers towards certain labeling decisions, I built machine learning models that were capable of discriminating between canonical and non-canonical vowels excised from the Buckeye Corpus. Specifically, I wanted to model when the dictionary form and the transcribed-form…

  15. Depression Diagnoses and Fundamental Frequency-Based Acoustic Cues in Maternal Infant-Directed Speech

    ERIC Educational Resources Information Center

    Porritt, Laura L.; Zinser, Michael C.; Bachorowski, Jo-Anne; Kaplan, Peter S.

    2014-01-01

    F0-based acoustic measures were extracted from a brief, sentence-final target word spoken during structured play interactions between mothers and their 3- to 14-month-old infants and were analyzed based on demographic variables and DSM-IV Axis-I clinical diagnoses and their common modifiers. F0 range (ΔF0) was…

  16. Changes in Speech Production in a Child with a Cochlear Implant: Acoustic and Kinematic Evidence.

    ERIC Educational Resources Information Center

    Goffman, Lisa; Ertmer, David J.; Erdle, Christa

    2002-01-01

    A method is presented for examining change in motor patterns used to produce linguistic contrasts. In this case study, the method is applied to a child who experienced hearing loss at age 3 and received a multi-channel cochlear implant at 7. Post-implant, acoustic durations showed a maturational change. (Contains references.) (Author/CR)

  17. Characterizing, synthesizing, and/or canceling out acoustic signals from sound sources

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    2007-03-13

    A system for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate and animate sound sources. Electromagnetic sensors monitor excitation sources in sound producing systems, such as animate sound sources such as the human voice, or from machines, musical instruments, and various other structures. Acoustical output from these sound producing systems is also monitored. From such information, a transfer function characterizing the sound producing system is generated. From the transfer function, acoustical output from the sound producing system may be synthesized or canceled. The systems disclosed enable accurate calculation of transfer functions relating specific excitations to specific acoustical outputs. Knowledge of such signals and functions can be used to effect various sound replication, sound source identification, and sound cancellation applications.

  18. Recognizing articulatory gestures from speech for robust speech recognition.

    PubMed

    Mitra, Vikramjit; Nam, Hosung; Espy-Wilson, Carol; Saltzman, Elliot; Goldstein, Louis

    2012-03-01

    Studies have shown that supplementary articulatory information can help to improve the recognition rate of automatic speech recognition systems. Unfortunately, articulatory information is not directly observable, necessitating its estimation from the speech signal. This study describes a system that recognizes articulatory gestures from speech, and uses the recognized gestures in a speech recognition system. Recognizing gestures for a given utterance involves recovering the set of underlying gestural activations and their associated dynamic parameters. This paper proposes a neural network architecture for recognizing articulatory gestures from speech and presents ways to incorporate articulatory gestures for a digit recognition task. The lack of a natural speech database containing gestural information prompted us to use three stages of evaluation. First, the proposed gestural annotation architecture was tested on a synthetic speech dataset, which showed that the use of estimated tract-variable time functions improved gesture recognition performance. In the second stage, gesture-recognition models were applied to natural speech waveforms and word recognition experiments revealed that the recognized gestures can improve the noise-robustness of a word recognition system. In the final stage, a gesture-based Dynamic Bayesian Network was trained and the results indicate that incorporating gestural information can improve word recognition performance compared to acoustic-only systems.

  19. Prediction and constraint in audiovisual speech perception.

    PubMed

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration

  20. Applications for Subvocal Speech

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles; Betts, Bradley

    2007-01-01

    A research and development effort now underway is directed toward the use of subvocal speech for communication in settings in which (1) acoustic noise could interfere excessively with ordinary vocal communication and/or (2) acoustic silence or secrecy of communication is required. By "subvocal speech" is meant sub-audible electromyographic (EMG) signals, associated with speech, that are acquired from the surface of the larynx and lingual areas of the throat. Topics addressed in this effort include recognition of the sub-vocal EMG signals that represent specific original words or phrases; transformation (including encoding and/or enciphering) of the signals into forms that are less vulnerable to distortion, degradation, and/or interception; and reconstruction of the original words or phrases at the receiving end of a communication link. Potential applications include ordinary verbal communications among hazardous-material-cleanup workers in protective suits, workers in noisy environments, divers, and firefighters, and secret communications among law-enforcement officers and military personnel in combat and other confrontational situations.

  1. Functional coupling of acoustic and chemical signals in the courtship behaviour of the male Drosophila melanogaster.

    PubMed Central

    Rybak, F; Sureau, G; Aubin, T

    2002-01-01

    During courtship, the male Drosophila melanogaster sends signals to the female through two major sensory channels: chemical and acoustic. These signals are involved in stimulating the female to accept copulation. In order to determine the respective importance of these signals in courtship, their production was controlled using genetic and surgical techniques. Males deprived of the ability to emit both signals are unable to mate, demonstrating that other (e.g. visual or tactile) signals are not sufficient to stimulate the female. If either acoustic or chemical signals are lacking, courtship success is strongly reduced, the lack of the former having significantly more drastic effects. However, the faster matings observed with males bearing wild-type hydrocarbons compared with defective ones, whatever the modality of acoustic performance (wing vibration or playback), strongly support the role of cuticular compounds in stimulating females. We can conclude that among the possible factors involved in communication during courtship, acoustic and chemical signals may act in a synergistic way and not separately in D. melanogaster. PMID:11934360

  2. Speech signal denoising with wavelet-transforms and the mean opinion score characterizing the filtering quality

    NASA Astrophysics Data System (ADS)

    Yaseen, Alauldeen S.; Pavlov, Alexey N.; Hramov, Alexander E.

    2016-03-01

    Speech signal processing is widely used to reduce noise impact in acquired data. During the last decades, wavelet-based filtering techniques have often been applied in communication systems due to their advantages in signal denoising as compared with Fourier-based methods. In this study we consider applications of a 1-D double density complex wavelet transform (1D-DDCWT) and compare the results with the standard 1-D discrete wavelet transform (1D-DWT). The performances of the considered techniques are compared using the mean opinion score (MOS), the primary metric for the quality of the processed signals. A two-dimensional extension of this approach can be used for effective image denoising.
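
    The double-density complex wavelet transform is not available in common Python packages, but the standard 1-D DWT baseline that the study compares against can be sketched with PyWavelets; the wavelet family, decomposition level, and universal soft threshold below are illustrative choices.

      import numpy as np
      import pywt

      def dwt_denoise(x, wavelet="db8", level=5):
          """Soft-threshold denoising of a speech signal with the standard 1-D DWT
          (the baseline method; the 1D-DDCWT itself is not sketched here)."""
          coeffs = pywt.wavedec(x, wavelet, level=level)
          # Noise level estimated from the finest detail coefficients (MAD estimator)
          sigma = np.median(np.abs(coeffs[-1])) / 0.6745
          thresh = sigma * np.sqrt(2 * np.log(len(x)))      # universal threshold
          coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
          return pywt.waverec(coeffs, wavelet)[: len(x)]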

  3. Pulse analysis of acoustic emission signals. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Houghton, J. R.

    1976-01-01

    A method for the signature analysis of pulses in the frequency domain and the time domain is presented. Fourier spectrum, Fourier transfer function, shock spectrum and shock spectrum ratio are examined in the frequency domain analysis, and pulse shape deconvolution is developed for use in the time domain analysis. To demonstrate the relative sensitivity of each of the methods to small changes in the pulse shape, signatures of computer modeled systems with analytical pulses are presented. Optimization techniques are developed and used to indicate the best design parameter values for deconvolution of the pulse shape. Several experiments are presented that test the pulse signature analysis methods on different acoustic emission sources. These include acoustic emissions associated with: (1) crack propagation, (2) ball dropping on a plate, (3) spark discharge and (4) defective and good ball bearings.

  4. Perceptually-Driven Signal Analysis for Acoustic Event Classification

    DTIC Science & Technology

    2007-09-26

    Report excerpt: the work draws on the study of musical timbre, defined as "the subjective attribute of sound which differentiates two or more sounds that have the same loudness, pitch and…", and cites J. M. Grey's work on multidimensional perceptual scaling of musical timbres and on perceptual effects of spectral modifications on musical timbres (Journal of the Acoustical Society of America, vol. 63).

  5. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    ERIC Educational Resources Information Center

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  6. Surface Roughness Evaluation Based on Acoustic Emission Signals in Robot Assisted Polishing

    PubMed Central

    de Agustina, Beatriz; Marín, Marta María; Teti, Roberto; Rubio, Eva María

    2014-01-01

    The polishing process is the most common technology used in applications where a high level of surface quality is demanded. The automation of polishing processes is especially difficult due to the high level of skill and dexterity that is required. Much of this difficulty arises because of the lack of reliable data on the effect of the polishing parameters on the resulting surface roughness. An experimental study was developed to evaluate the surface roughness obtained during Robot Assisted Polishing processes through the analysis of acoustic emission signals in the frequency domain. The aim is to identify a trend in one or more features calculated from the acoustic emission signals detected throughout the process. This evaluation was made with the objective of collecting valuable information for establishing end-point detection for the polishing process. As a main conclusion, acoustic emission (AE) signals can be considered useful for monitoring the state of the polishing process. PMID:25405509

  7. Improving the speech intelligibility in classrooms

    NASA Astrophysics Data System (ADS)

    Lam, Choi Ling Coriolanus

    of the reverberation time, the indoor ambient noise (or background noise level), the signal-to-noise ratio, and the speech transmission index, it aims to establish a guideline for improving the speech intelligibility in classrooms for any country and any environmental conditions. The study showed that the acoustical conditions of most of the measured classrooms in Hong Kong are unsatisfactory. The selection of materials inside a classroom, especially the acoustic ceiling, is important for improving speech intelligibility at the design stage because it shortens the reverberation time inside the classroom. The signal-to-noise ratio should be higher than 11 dB(A) for over 70% speech perception, for either tonal or non-tonal languages, without the use of an address system. The unexpected results call for revising the standard design and devising acceptable standards for classrooms in Hong Kong. A method is also demonstrated for assessing classrooms in other cities with similar environmental conditions.

  8. System and method for investigating sub-surface features of a rock formation with acoustic sources generating conical broadcast signals

    DOEpatents

    Vu, Cung Khac; Skelt, Christopher; Nihei, Kurt; Johnson, Paul A.; Guyer, Robert; Ten Cate, James A.; Le Bas, Pierre -Yves; Larmat, Carene S.

    2015-08-18

    A method of interrogating a formation includes generating a first conical acoustic signal at a first frequency and a second conical acoustic signal at a second frequency, each between approximately 500 Hz and 500 kHz, such that the signals intersect in a desired intersection volume outside the borehole. The method further includes receiving a difference signal returning to the borehole resulting from a non-linear mixing of the signals in a mixing zone within the intersection volume.

  9. The effect of habitat acoustics on common marmoset vocal signal transmission.

    PubMed

    Morrill, Ryan J; Thomas, A Wren; Schiel, Nicola; Souto, Antonio; Miller, Cory T

    2013-09-01

    Noisy acoustic environments present several challenges for the evolution of acoustic communication systems. Among the most significant is the need to limit degradation of spectro-temporal signal structure in order to maintain communicative efficacy. This can be achieved by selecting for several potentially complementary processes. Selection can act on behavioral mechanisms permitting signalers to control the timing and occurrence of signal production to avoid acoustic interference. Likewise, the signal itself may be the target of selection, biasing the evolution of its structure to comprise acoustic features that avoid interference from ambient noise or degrade minimally in the habitat. Here, we address the latter topic for common marmoset (Callithrix jacchus) long-distance contact vocalizations, known as phee calls. Our aim was to test whether this vocalization is specifically adapted for transmission in a species-typical forest habitat, the Atlantic forests of northeastern Brazil. We combined seasonal analyses of ambient habitat acoustics with experiments in which pure tones, clicks, and vocalizations were broadcast and rerecorded at different distances to characterize signal degradation in the habitat. Ambient sound was analyzed from intervals throughout the day and over rainy and dry seasons, showing temporal regularities across varied timescales. Broadcast experiment results indicated that the tone and click stimuli showed the typically inverse relationship between frequency and signaling efficacy. Although marmoset phee calls degraded over distance with marked predictability compared with artificial sounds, they did not otherwise appear to be specially designed for increased transmission efficacy or minimal interference in this habitat. We discuss these data in the context of other similar studies and evidence of potential behavioral mechanisms for avoiding acoustic interference in order to maintain effective vocal communication in common marmosets.

  10. Acousto-Optic Interaction in Surface Acoustic Waves and Its Application to Real Time Signal Processing.

    DTIC Science & Technology

    1977-12-30

    DTIC record excerpt (garbled cover-page text): "Acousto-Optic Interaction in Surface Acoustic Waves and Its Application to Real Time Signal Processing," D. Schumer and P. Das, December 1977, report MA-ONR-30, unclassified. Keywords: acousto-optics, integrated optics, optical signal processing.

  11. The Role of the Listener's State in Speech Perception

    ERIC Educational Resources Information Center

    Viswanathan, Navin

    2009-01-01

    Accounts of speech perception disagree on whether listeners perceive the acoustic signal (Diehl, Lotto, & Holt, 2004) or the vocal tract gestures that produce the signal (e.g., Fowler, 1986). In this dissertation, I outline a research program using a phenomenon called "perceptual compensation for coarticulation" (Mann, 1980) to examine this…

  12. Beeping and piping: characterization of two mechano-acoustic signals used by honey bees in swarming

    NASA Astrophysics Data System (ADS)

    Schlegel, Thomas; Visscher, P. Kirk; Seeley, Thomas D.

    2012-12-01

    Of the many signals used by honey bees during the process of swarming, two of them—the stop signal and the worker piping signal—are not easily distinguished for both are mechano-acoustic signals produced by scout bees who press their bodies against other bees while vibrating their wing muscles. To clarify the acoustic differences between these two signals, we recorded both signals from the same swarm and at the same time, and compared them in terms of signal duration, fundamental frequency, and frequency modulation. Stop signals and worker piping signals differ in all three variables: duration, 174 ± 64 vs. 602 ± 377 ms; fundamental frequency, 407 vs. 451 Hz; and frequency modulation, absent vs. present. While it remains unclear which differences the bees use to distinguish the two signals, it is clear that they do so for the signals have opposite effects. Stop signals cause inhibition of actively dancing scout bees whereas piping signals cause excitation of quietly resting non-scout bees.

  13. Speaker Race Identification from Acoustic Cues in the Vocal Signal.

    NASA Astrophysics Data System (ADS)

    Walton, Julie Hart

    Sustained /a/ sounds were tape recorded from 50 adult male African-American and 50 adult male European-American speakers. A one-second acoustic sample was extracted from the mid-portion of each sustained vowel. Vowel samples from each African-American subject were randomly paired with those from European-American subjects. A one-second inter-stimulus interval of silence separated the two voices in the pair; the order of the voices in each pair was randomly selected. When presented with a tape of the 50 voice pairs, listeners could determine the race of the speaker with 60% accuracy. An acoustic analysis of the voices revealed that African-American speakers had a tendency toward greater frequency perturbation, significantly greater amplitude perturbation, and a significantly lower harmonics-to-noise ratio than the European-American speakers. An analysis of the listeners' responses revealed that the listeners may have relied on a combination of increased frequency perturbation, increased amplitude perturbation, and a lower harmonics-to-noise ratio to identify the African-American speakers.
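
    The perturbation measures mentioned here (frequency and amplitude perturbation) can be sketched from per-cycle period and peak-amplitude estimates; the formulas below are the common local jitter and shimmer definitions, not necessarily the exact measures used in this study.

      import numpy as np

      def local_jitter(periods):
          """Mean absolute difference of consecutive pitch periods,
          relative to the mean period (often reported as a percentage)."""
          periods = np.asarray(periods, dtype=float)
          return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

      def local_shimmer(amplitudes):
          """Mean absolute difference of consecutive cycle peak amplitudes,
          relative to the mean amplitude."""
          amplitudes = np.asarray(amplitudes, dtype=float)
          return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

      # Example with made-up cycle measurements from a sustained /a/
      periods_ms = [8.00, 8.05, 7.98, 8.10, 8.02]
      peaks = [0.61, 0.60, 0.63, 0.59, 0.62]
      print(local_jitter(periods_ms), local_shimmer(peaks))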

  14. Speech communications in noise

    NASA Technical Reports Server (NTRS)

    1984-01-01

    The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.

  15. Speech communications in noise

    NASA Astrophysics Data System (ADS)

    1984-07-01

    The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.

  16. Modelling speech intelligibility in adverse conditions.

    PubMed

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    Jørgensen and Dau (J Acoust Soc Am 130:1475-1487, 2011) proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII) in conditions with nonlinearly processed speech. Instead of considering the reduction of the temporal modulation energy as the intelligibility metric, as assumed in the STI, the sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv). This metric was shown to be the key for predicting the intelligibility of reverberant speech as well as noisy speech processed by spectral subtraction. The key role of the SNRenv metric is further supported here by the ability of a short-term version of the sEPSM to predict speech masking release for different speech materials and modulated interferers. However, the sEPSM cannot account for speech subjected to phase jitter, a condition in which the spectral structure of the speech signal is strongly affected, while the broadband temporal envelope is kept largely intact. In contrast, the effects of this distortion can be predicted successfully by the spectro-temporal modulation index (STMI) (Elhilali et al., Speech Commun 41:331-348, 2003), which assumes an explicit analysis of the spectral "ripple" structure of the speech signal. However, since the STMI applies the same decision metric as the STI, it fails to account for spectral subtraction. The results from this study suggest that the SNRenv might reflect a powerful decision metric, while some explicit across-frequency analysis seems crucial in some conditions. How such across-frequency analysis is "realized" in the auditory system remains unresolved.
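
    A toy, broadband sketch of the envelope-domain signal-to-noise ratio idea behind the sEPSM is given below, assuming the noisy mixture and the noise alone are both available; a faithful implementation would use a gammatone filterbank and a modulation filterbank, which are omitted here, and the 30 Hz envelope cutoff is an assumption.

      import numpy as np
      from scipy.signal import butter, sosfiltfilt, hilbert

      def envelope_power(x, fs, mod_cutoff=30.0):
          """Normalized AC power of the low-pass-filtered Hilbert envelope
          (a broadband stand-in for per-band modulation power)."""
          sos = butter(2, mod_cutoff, btype="low", fs=fs, output="sos")
          env = sosfiltfilt(sos, np.abs(hilbert(x)))
          return np.var(env) / np.mean(env) ** 2

      def snr_env_db(noisy_speech, noise, fs):
          """Envelope-domain SNR in dB: the speech envelope power is estimated
          as the mixture envelope power minus the noise envelope power (floored)."""
          p_mix = envelope_power(noisy_speech, fs)
          p_noise = envelope_power(noise, fs)
          return 10 * np.log10(max(p_mix - p_noise, 1e-6) / p_noise)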

  17. Acoustic signal characteristics of laser induced cavitation in DDFP droplet: Spectrum and time-frequency analysis.

    PubMed

    Feng, Yi; Qin, Dui; Zhang, Jun; Ma, Chenxiang; Wan, Mingxi

    2015-01-01

    Cavitation has great application potential in microvessel damage and targeted drug delivery. Concerning cavitation, droplet vaporization has been widely investigated in vitro and in vivo with plasmonic nanoparticles. Droplets with a liquid dodecafluoropentane (DDFP) core enclosed in an albumin shell have a stable and simple structure with good characteristics of laser absorbing; thus, DDFP droplets could be an effective aim for laser-induced cavitation. The DDFP droplet was prepared and perfused in a mimic microvessel in the optical microscopic system with a passive acoustic detection module. Three patterns of laser-induced cavitation in the droplets were observed. The emitted acoustic signals showed specific spectrum components at specific time points. It was suggested that a nanosecond laser pulse could induce cavitation in DDFP droplets, and specific acoustic signals would be emitted. Analyzing its characteristics could aid in monitoring the laser-induced cavitation process in droplets, which is meaningful to theranostic application.

  18. Analysis of acoustic emission signals and monitoring of machining processes

    PubMed

    Govekar; Gradisek; Grabec

    2000-03-01

    Monitoring of a machining process on the basis of sensor signals requires a selection of informative inputs in order to reliably characterize and model the process. In this article, a system for selection of informative characteristics from signals of multiple sensors is presented. For signal analysis, methods of spectral analysis and methods of nonlinear time series analysis are used. With the aim of modeling relationships between signal characteristics and the corresponding process state, an adaptive empirical modeler is applied. The application of the system is demonstrated by characterization of different parameters defining the states of a turning machining process, such as: chip form, tool wear, and onset of chatter vibration. The results show that, in spite of the complexity of the turning process, the state of the process can be well characterized by just a few proper characteristics extracted from a representative sensor signal. The process characterization can be further improved by joining characteristics from multiple sensors and by application of chaotic characteristics.

  19. Development of an Acoustic Signal Analysis Tool “Auto-F” Based on the Temperament Scale

    NASA Astrophysics Data System (ADS)

    Modegi, Toshio

    The MIDI interface was originally designed for electronic musical instruments, but we consider that this music-note-based coding concept can be extended to general acoustic signal description. We proposed applying MIDI technology to the coding of bio-medical auscultation sound signals such as heart sounds, for retrieving medical records and performing telemedicine. We have since tried to extend our encoding targets to include vocal sounds, natural sounds, and electronic bio-signals such as ECG, using the Generalized Harmonic Analysis method. Currently, we are trying to separate vocal sounds included in popular songs and encode both the vocal sounds and the background instrumental sounds into separate MIDI channels. We are also trying to extract articulation parameters such as MIDI pitch-bend parameters in order to reproduce natural acoustic sounds using a GM-standard MIDI tone generator. In this paper, we present the overall algorithm of our acoustic signal analysis tool, based on those research works, which can analyze given time-based signals on the musical temperament scale. The prominent feature of this tool is that it produces high-precision MIDI codes, which reproduce signals similar to the given source signal using a GM-standard MIDI tone generator, and it also provides the analysis results as text in XML format.
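
    The mapping from an analyzed frequency to a tempered-scale MIDI note plus a pitch-bend offset, on which such a tool relies, can be sketched as follows; the 14-bit bend encoding assumes the common ±2-semitone pitch-bend range, which the actual tool may configure differently.

      import math

      def freq_to_midi(freq_hz, a4=440.0):
          """Convert a frequency to the nearest MIDI note number and the residual
          in semitones (negative = flat, positive = sharp)."""
          semis = 69 + 12 * math.log2(freq_hz / a4)
          note = int(round(semis))
          return note, semis - note

      def bend_value(residual_semitones, bend_range=2.0):
          """Encode the residual as a 14-bit MIDI pitch-bend value
          (8192 = no bend), assuming a +/- bend_range semitone wheel range."""
          v = int(round(8192 + residual_semitones / bend_range * 8192))
          return max(0, min(16383, v))

      note, resid = freq_to_midi(261.0)       # slightly flat middle C
      print(note, resid, bend_value(resid))   # note 60 with a small downward bend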

  20. Analysis of acoustic signals on CO{sub 2} arc welding

    SciTech Connect

    Ogawa, Y.; Morita, T.; Sumitomo, T.; Koga, H.

    1995-12-31

    The sound emitted during the arc welding process is closely related to the welding phenomena and sometimes provides useful information for monitoring and controlling the welding process. It is important to use different kinds of information to control the welding process and improve the quality of the control system, especially for underwater welding, because repairing weld defects is time-consuming and costly and it is sometimes difficult to monitor the arc condition with a visual system. A fundamental analysis of acoustic signals and their relations with other parameters, such as arc voltage, arc current, and the vibration of the weld plate, was carried out in order to understand the features of the acoustic signals and to develop an effective signal-processing algorithm. All of the data were recorded with a cassette recorder. After the experiments were completed, the recorded data were analyzed using a signal processor and a computer system.

  1. A novel multipitch measurement algorithm for acoustic signals of moving targets

    NASA Astrophysics Data System (ADS)

    Huang, Jingchang; Guo, Feng; Zu, Xingshui; Li, Haiyan; Liu, Huawei; Li, Baoqing

    2016-12-01

    In this paper, a novel multipitch measurement (MPM) method is proposed for acoustic signals. Starting from the analysis of moving targets' acoustic signatures, a pitch-based harmonics representation model of the acoustic signal is put forward. According to the proposed harmonics model, a modified greatest common divisor (MGCD) method is developed to obtain an initial multipitch set (IMS). Subsequently, the harmonic number vector (HNV) associated with the IMS is determined by maximizing the objective function formulated as a multi-impulse-train weighted symmetric average magnitude sum function (SAMSF) of the observed signal. The frequencies of the SAMSF are determined by the target acoustic signal, the periods of the multi-impulse-train are governed by the estimated IMS harmonics, and the maximization of the objective function is carried out through a time-domain matching of the periodicities of the multi-impulse-train with those of the SAMSF. Finally, by using the obtained IMS and its HNV, a precise fundamental frequency set is achieved. Evaluation of the algorithm performance in comparison with state-of-the-art methods indicates that MPM is practical for the multipitch extraction of moving targets.
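
    The idea of recovering a fundamental as an approximate greatest common divisor of detected harmonic frequencies can be illustrated with the single-pitch sketch below; this is a generic illustration, not the MGCD/SAMSF procedure itself, and the search range and step are assumptions.

      import numpy as np

      def approx_gcd_f0(peak_freqs, f0_min=5.0, f0_max=200.0, step=0.1):
          """Search for the candidate fundamental that best explains a set of
          measured spectral peak frequencies, i.e., an approximate GCD: each
          peak should lie close to an integer multiple of f0."""
          peak_freqs = np.asarray(peak_freqs, dtype=float)
          candidates = np.arange(f0_min, f0_max, step)
          errors = []
          for f0 in candidates:
              harmonics = np.round(peak_freqs / f0)
              harmonics[harmonics < 1] = 1
              errors.append(np.mean(np.abs(peak_freqs - harmonics * f0) / f0))
          return candidates[int(np.argmin(errors))]

      # Harmonics of a roughly 24 Hz engine line with small measurement errors
      print(approx_gcd_f0([24.1, 47.8, 72.3, 96.0, 143.9]))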

  2. Acoustic Signal Processing for Pipe Condition Assessment (WaterRF Report 4360)

    EPA Science Inventory

    Unique to prestressed concrete cylinder pipe (PCCP), individual wire breaks create an excitation in the pipe wall that may vary in response to the remaining compression of the pipe core. This project was designed to improve acoustic signal processing for pipe condition assessment...

  3. Infrasonic and seismic signals from earthquakes and explosions observed with Plostina seismo-acoustic array

    NASA Astrophysics Data System (ADS)

    Ghica, D.; Ionescu, C.

    2012-04-01

    Plostina seismo-acoustic array has been recently deployed by the National Institute for Earth Physics in the central part of Romania, near the Vrancea epicentral area. The array has a 2.5 km aperture and consists of 7 seismic sites (PLOR) and 7 collocated infrasound instruments (IPLOR). The array is being used to assess the importance of collocated seismic and acoustic sensors for the purposes of (1) seismic monitoring of the local and regional events, and (2) acoustic measurement, consisting of detection of the infrasound events (explosions, mine and quarry blasts, earthquakes, aircraft etc.). This paper focuses on characterization of infrasonic and seismic signals from the earthquakes and explosions (accidental and mining type). Two Vrancea earthquakes with magnitude above 5.0 were selected to this study: one occurred on 1st of May 2011 (MD = 5.3, h = 146 km), and the other one, on 4th October 2011 (MD = 5.2, h = 142 km). The infrasonic signals from the earthquakes have the appearance of the vertical component of seismic signals. Because the mechanism of the infrasonic wave formation is the coupling of seismic waves with the atmosphere, trace velocity values for such signals are compatible with the characteristics of the various seismic phases observed with PLOR array. The study evaluates and characterizes, as well, infrasound and seismic data recorded from the explosion caused by the military accident produced at Evangelos Florakis Naval Base, in Cyprus, on 11th July 2011. Additionally, seismo-acoustic signals presumed to be related to strong mine and quarry blasts were investigated. Ground truth of mine observations provides validation of this interpretation. The combined seismo-acoustic analysis uses two types of detectors for signal identification: one is the automatic detector DFX-PMCC, applied for infrasound detection and characterization, while the other one, which is used for seismic data, is based on array processing techniques (beamforming and frequency

  4. Information-bearing acoustic change outperforms duration in predicting intelligibility of full-spectrum and noise-vocoded sentences.

    PubMed

    Stilp, Christian E

    2014-03-01

    Recent research has demonstrated a strong relationship between information-bearing acoustic changes in the speech signal and speech intelligibility. The availability of information-bearing acoustic changes reliably predicts intelligibility of full-spectrum [Stilp and Kluender (2010). Proc. Natl. Acad. Sci. U.S.A. 107(27), 12387-12392] and noise-vocoded sentences amid noise interruption [Stilp et al. (2013). J. Acoust. Soc. Am. 133(2), EL136-EL141]. However, other research reports that proportion of signal duration preserved also predicts intelligibility of noise-interrupted speech. These factors have only ever been investigated independently, obscuring whether one better explains speech perception. The present experiments manipulated both factors to answer this question. A broad range of sentence durations (160-480 ms) containing high or low information-bearing acoustic changes were replaced by speech-shaped noise in noise-vocoded (Experiment 1) and full-spectrum sentences (Experiment 2). Sentence intelligibility worsened with increasing noise replacement, but in both experiments, information-bearing acoustic change was a statistically superior predictor of performance. Perception relied more heavily on information-bearing acoustic changes in poorer listening conditions (in spectrally degraded sentences and amid increasing noise replacement). Highly linear relationships between measures of information and performance suggest that exploiting information-bearing acoustic change is a shared principle underlying perception of acoustically rich and degraded speech. Results demonstrate the explanatory power of information-theoretic approaches for speech perception.

  5. Cumulative and Synergistic Effects of Physical, Biological, and Acoustic Signals on Marine Mammal Habitat Use

    DTIC Science & Technology

    2011-09-30

    Jennifer L. Miksis-Olds, Applied Research Laboratory, The Pennsylvania State University. Report excerpt: the project examines how physical, biological, and acoustic signals impact marine mammal prey and the resulting marine mammal habitat use, which is especially critical in areas like the Bering Sea where global climate change is altering animal presence and habitat use. Objective 1: What effect do changing sea ice dynamics have on zooplankton populations? (a) How does zooplankton…

  6. Differentiating speech and nonspeech sounds via amplitude envelope cues

    NASA Astrophysics Data System (ADS)

    Lehnhoff, Robert J.; Strange, Winifred; Long, Glenis

    2001-05-01

    Recent evidence from neuroscience and behavioral speech science suggests that the temporal modulation pattern of the speech signal plays a distinctive role in speech perception. As a first step in exploring the nature of the perceptually relevant information in the temporal pattern of speech, this experiment examined whether speech versus nonspeech environmental sounds could be differentiated on the basis of their amplitude envelopes. Conversational speech was recorded from native speakers of six different languages (French, German, Hebrew, Hindi, Japanese, and Russian) along with samples of their English. Nonspeech sounds included animal vocalizations, water sounds, and other environmental sounds (e.g., thunder). The stimulus set included 30 2-s speech segments and 30 2-s nonspeech events. Frequency information was removed from all stimuli using a technique described by Dorman et al. [J. Acoust. Soc. Am. 102 (1997)]. Nine normal-hearing adult listeners participated in the experiment. Subjects decided whether each sound was (originally) speech or nonspeech and rated their confidence (7-point Likert scale). Overall, subjects differentiated speech from nonspeech very accurately (84% correct). Only 12 stimuli were not correctly categorized at greater than chance levels. Acoustical analysis is underway to determine what parameters of the amplitude envelope differentiate speech from nonspeech sounds.
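
    A sketch of the kind of envelope-only manipulation described (removing spectral detail while preserving the temporal envelope), assuming the broadband envelope is simply imposed on white noise; the study itself used the channel-based technique of Dorman et al., which differs in detail, and the 50 Hz envelope cutoff is an assumption.

      import numpy as np
      from scipy.signal import butter, sosfiltfilt, hilbert

      def envelope_noise(x, fs, cutoff=50.0, seed=0):
          """Replace the fine structure of a sound with white noise modulated by
          the sound's low-pass-filtered amplitude envelope, so that only temporal
          envelope cues remain (broadband sketch of the idea)."""
          sos = butter(4, cutoff, btype="low", fs=fs, output="sos")
          env = np.maximum(sosfiltfilt(sos, np.abs(hilbert(x))), 0.0)
          rng = np.random.default_rng(seed)
          y = env * rng.standard_normal(len(x))
          return y / (np.max(np.abs(y)) + 1e-12)     # normalize to avoid clipping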

  7. Differentiating speech and nonspeech sounds via amplitude envelope cues

    NASA Astrophysics Data System (ADS)

    Lehnhoff, Robert J.; Strange, Winifred; Long, Glenis

    2004-05-01

    Recent evidence from neuroscience and behavioral speech science suggests that the temporal modulation pattern of the speech signal plays a distinctive role in speech perception. As a first step in exploring the nature of the perceptually relevant information in the temporal pattern of speech, this experiment examined whether speech versus nonspeech environmental sounds could be differentiated on the basis of their amplitude envelopes. Conversational speech was recorded from native speakers of six different languages (French, German, Hebrew, Hindi, Japanese, and Russian) along with samples of their English. Nonspeech sounds included animal vocalizations, water sounds, and other environmental sounds (e.g., thunder). The stimulus set included 30 2-s speech segments and 30 2-s nonspeech events. Frequency information was removed from all stimuli using a technique described by Dorman et al. [J. Acoust. Soc. Am. 102 (1997)]. Nine normal-hearing adult listeners participated in the experiment. Subjects decided whether each sound was (originally) speech or nonspeech and rated their confidence (7-point Likert scale). Overall, subjects differentiated speech from nonspeech very accurately (84% correct). Only 12 stimuli were not correctly categorized at greater than chance levels. Acoustical analysis is underway to determine what parameters of the amplitude envelope differentiate speech from nonspeech sounds.

  8. Zipf's law in short-time timbral codings of speech, music, and environmental sound signals.

    PubMed

    Haro, Martín; Serrà, Joan; Herrera, Perfecto; Corral, Alvaro

    2012-01-01

    Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis of the intrinsic characteristics of the most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that speech and music databases have specific, distinctive code-words while, in the case of the environmental sounds, these database-specific code-words are not present. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common simple generative mechanism for all considered sound sources.
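
    The rank-frequency analysis reported here can be sketched by counting code-word occurrences, sorting by frequency, and fitting the exponent on log-log axes; the least-squares fit below is a rough stand-in for the more careful estimators typically used for power laws, and the synthetic sample is only for illustration.

      import numpy as np
      from collections import Counter

      def zipf_exponent(codewords):
          """Return ranks, frequencies, and a least-squares estimate of the Zipf
          exponent (slope magnitude of log-frequency vs. log-rank)."""
          freqs = np.array(sorted(Counter(codewords).values(), reverse=True), dtype=float)
          ranks = np.arange(1, len(freqs) + 1, dtype=float)
          slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
          return ranks, freqs, -slope

      # Toy example with integer-valued "timbral code-words"
      rng = np.random.default_rng(1)
      sample = rng.zipf(2.0, size=5000)          # heavy-tailed synthetic data
      _, _, exponent = zipf_exponent(sample)
      print(round(exponent, 2))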

  9. Zipf's Law in Short-Time Timbral Codings of Speech, Music, and Environmental Sound Signals

    PubMed Central

    Haro, Martín; Serrà, Joan; Herrera, Perfecto; Corral, Álvaro

    2012-01-01

    Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis of the intrinsic characteristics of the most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that speech and music databases have specific, distinctive code-words while, in the case of the environmental sounds, these database-specific code-words are not present. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common simple generative mechanism for all considered sound sources. PMID:22479497

  10. Antifade sonar employs acoustic field diversity to recover signals from multipath fading

    SciTech Connect

    Lubman, D.

    1996-04-01

    Co-located pressure and particle motion (PM) hydrophones together with four-channel diversity combiners may be used to recover signals from multipath fading. Multipath fading is important in both shallow and deep water propagation and can be an important source of signal loss. The acoustic field diversity concept arises from the notion of conservation of signal energy and the observation that in rooms at least, the total acoustic energy density is the sum of potential energy (scalar field-sound pressure) and kinetic energy (vector field-sound PM) portions. One pressure hydrophone determines acoustic potential energy density at a point. In principle, three PM sensors (displacement, velocity, or acceleration) directed along orthogonal axes describe the kinetic energy density at a point. For a single plane wave, the time-averaged potential and kinetic field energies are identical everywhere. In multipath interference, however, potential and kinetic field energies at a point are partitioned unequally, depending mainly on relative signal phases. Thus, when pressure signals are in deep fade, abundant kinetic field signal energy may be available at that location. Performance benefits require a degree of uncorrelated fading between channels. The expectation of nearly uncorrelated fading is motivated from room theory. Performance benefits for sonar limited by independent Rayleigh fading are suggested by analogy to antifade radio. Average SNR can be improved by several decibels, holding time on target is multiplied manifold, and the bit error rate for data communication is reduced substantially. {copyright} {ital 1996 American Institute of Physics.}
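
    The partition of acoustic energy density into potential (pressure) and kinetic (particle-motion) parts that motivates the field-diversity argument is, in standard linear-acoustics notation (p sound pressure, v particle velocity, ρ density, c sound speed):

      E = \frac{p^{2}}{2\rho c^{2}} + \frac{\rho\,|\mathbf{v}|^{2}}{2},
      \qquad
      \langle E_{\mathrm{pot}} \rangle = \langle E_{\mathrm{kin}} \rangle
      \quad \text{for a single plane wave.}

    Under multipath interference the two terms at a given point can be very unequal, which is the opening that a combined pressure/particle-motion receiver exploits.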

  11. Hidden Markov models in automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Wrzoskowicz, Adam

    1993-11-01

    This article describes a method for constructing an automatic speech recognition system based on hidden Markov models (HMMs). The author discusses the basic concepts of HMM theory and the application of these models to the analysis and recognition of speech signals. The author provides algorithms which make it possible to train the ASR system and recognize signals on the basis of distinct stochastic models of selected speech sound classes. The author describes the specific components of the system and the procedures used to model and recognize speech. The author discusses problems associated with the choice of optimal signal detection and parameterization characteristics and their effect on the performance of the system. The author presents different options for the choice of speech signal segments and their consequences for the ASR process. The author gives special attention to the use of lexical, syntactic, and semantic information for the purpose of improving the quality and efficiency of the system. The author also describes an ASR system developed by the Speech Acoustics Laboratory of the IBPT PAS. The author discusses the results of experiments on the effect of noise on the performance of the ASR system and describes methods of constructing HMM's designed to operate in a noisy environment. The author also describes a language for human-robot communications which was defined as a complex multilevel network from an HMM model of speech sounds geared towards Polish inflections. The author also added mandatory lexical and syntactic rules to the system for its communications vocabulary.
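
    The core computation behind such a recognizer, scoring an observation sequence against a word or phone model with the forward algorithm, can be sketched for the discrete-observation case; real systems use continuous-density HMMs over spectral feature vectors, and the toy model parameters below are invented for illustration.

      import numpy as np

      def log_likelihood(obs, pi, A, B):
          """Forward algorithm for a discrete HMM.
          obs : sequence of observation symbol indices
          pi  : (N,) initial state probabilities
          A   : (N, N) transition matrix, A[i, j] = P(state j | state i)
          B   : (N, M) emission matrix, B[i, k] = P(symbol k | state i)
          Returns log P(obs | model), scaled to avoid underflow."""
          alpha = pi * B[:, obs[0]]
          log_prob = 0.0
          for t in range(1, len(obs)):
              scale = alpha.sum()
              log_prob += np.log(scale)
              alpha = (alpha / scale) @ A * B[:, obs[t]]
          return log_prob + np.log(alpha.sum())

      # Two-state toy model; in an ASR system one such model per word or phone
      pi = np.array([1.0, 0.0])
      A = np.array([[0.7, 0.3],
                    [0.0, 1.0]])
      B = np.array([[0.8, 0.2],
                    [0.1, 0.9]])
      print(log_likelihood([0, 0, 1, 1], pi, A, B))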

  12. Seismic and Acoustic Array Monitoring of Signal from Tungurahua Volcano, Ecuador

    NASA Astrophysics Data System (ADS)

    Terbush, B. R.; Anthony, R. E.; Johnson, J. B.; Ruiz, M. C.

    2012-12-01

    Tungurahua Volcano is an active stratovolcano located in Ecuador's eastern Cordillera. Since its most recent cycle of eruptive activity, beginning in 1999, it has produced both strombolian-to-vulcanian eruptions, and regular vapor emissions. Tungurahua is located above the city of Baños, so volcanic activity is well-monitored by Ecuador's Instituto Geofisico Nacional with a seismic and infrasound network, and other surveillance tools. Toward better understanding of the complex seismic and acoustic signals associated with low-level Tungurahua activity, and which are often low in signal-to-noise, we deployed temporary seismo-acoustic arrays between June 9th and 20th in 2012. This deployment was part of a Field Volcano Geophysics class, a collaboration between New Mexico Institute of Mining and Technology and the Escuela Politecnica Nacional's Instituto Geofísico in Ecuador. Two six-element arrays were deployed on the flank of the volcano. A seismo-acoustic array, which consisted of combined broadband seismic and infrasound sensors, possessed 100-meter spacing, and was deployed five kilometers north of the vent in an open field at 2700 m. The second array had only acoustic sensors with 30-meter spacing, and was deployed approximately six kilometers northwest of the vent, on an old pyroclastic flow deposit. The arrays picked up signals from four distinct explosion events, a number of diverse tremor signals, local volcano tectonic and long period earthquakes, and a regional tectonic event of magnitude 4.9. Coherency of both seismic and acoustic array data was quantified using Fisher Statistics, which was effective for identifying myriad signals. For most signals Fisher Statistics were particularly high in low frequency bands, between 0.5 and 2 Hz. Array analyses helped to filter out noise induced by cultural sources and livestock signals, which were particularly pronounced in the deployment site. Volcan Tungurahua sources were considered plane wave signals and could
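
    One common form of the Fisher statistic used to quantify coherence across array traces is a beam-to-residual variance ratio; the sketch below assumes the traces have already been time-shifted for a candidate slowness, and the exact normalization used in the study may differ.

      import numpy as np

      def fisher_statistic(traces):
          """traces : (n_sensors, n_samples) array of time-aligned waveforms.
          Returns the ratio of beam power to average residual power; values near
          1 indicate incoherent noise, large values a coherent arrival."""
          traces = np.asarray(traces, dtype=float)
          n = traces.shape[0]
          beam = traces.mean(axis=0)
          signal_power = n * np.sum(beam ** 2)
          residual_power = np.sum((traces - beam) ** 2) / (n - 1)
          return signal_power / residual_power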

  13. Depression Diagnoses and Fundamental Frequency-Based Acoustic Cues in Maternal Infant-Directed Speech

    PubMed Central

    Porritt, Laura L.; Zinser, Michael C.; Bachorowski, Jo-Anne; Kaplan, Peter S.

    2013-01-01

    F0-based acoustic measures were extracted from a brief, sentence-final target word spoken during structured play interactions between mothers and their 3- to 14-month-old infants, and were analyzed based on demographic variables and DSM-IV Axis-I clinical diagnoses and their common modifiers. F0 range (ΔF0) was negatively correlated with infant age and number of children. ΔF0 was significantly smaller in clinically depressed mothers and mothers diagnosed with depression in partial remission, relative to non-depressed mothers, mothers diagnosed with depression in full remission, and those diagnosed with depressive disorder not otherwise specified. ΔF0 was significantly lower in mothers experiencing their first major depressive episode relative to mothers with recurrent depression. Deficits in ΔF0 were specific to diagnosed clinical depression, and were not well predicted by elevated self-report scores only, or by diagnosed anxiety disorders. Mothers with higher ΔF0 had infants with reportedly larger productive vocabularies, but depression was unrelated to vocabulary development. Implications for cognitive-linguistic development are discussed. PMID:24489521

  14. Fatigue Level Estimation of Bill Based on Acoustic Signal Feature by Supervised SOM

    NASA Astrophysics Data System (ADS)

    Teranishi, Masaru; Omatu, Sigeru; Kosaka, Toshihisa

    Fatigued bills have a harmful influence on the daily operation of Automated Teller Machines (ATMs). To make fatigued-bill classification more efficient, the development of an automatic fatigued-bill classification method is desired. We propose a new method to estimate the bending rigidity of a bill from acoustic signal features of banking machines. The estimated bending rigidities are used as a continuous fatigue level for the classification of fatigued bills. Using a supervised Self-Organizing Map (supervised SOM), we effectively estimate the bending rigidity from the acoustic energy pattern alone. Experimental results with real bill samples show the effectiveness of the proposed method.

  15. Time-frequency Analysis for Acoustic Emission Signals of Hypervelocity Impact

    NASA Astrophysics Data System (ADS)

    Liu, W. G.; Pang, B. J.; Zhang, W.; Sun, F.; Guan, G. S.

    The risk of collision of man-made orbital debris with spacecraft in near-Earth orbits continues to increase. Much of the space debris between 1 mm and 10 mm cannot be well tracked in Earth orbit, and damage from impacts by these untracked debris is a serious hazard to aircraft and spacecraft. These on-orbit collisions occur at velocities exceeding 10 km/s, and at these velocities even very small particles can create significant damage. The development of an in-situ impact detection system is therefore indispensable for protecting spacecraft from catastrophic malfunction caused by debris. The acoustic emission (AE) detection technique has been recognized as an important non-destructive technology, because AE signals offer a potentially useful additional means of non-invasively gathering information concerning the state of a spacecraft. Acoustic emission health monitoring can also detect, locate, and assess impact damage when a spacecraft is struck by hypervelocity space debris and micrometeoroids; this information can help operators and designers at the ground station take effective measures to maintain the function of the spacecraft. In this article, acoustic emission is used for the characterization and location of hypervelocity impacts. Two different AE sensors were used to detect the arrival times and signals of the hits. Hypervelocity impacts were generated with a two-stage light-gas gun firing small aluminum ball projectiles (4 mm and 6.4 mm). In the impact studies the signals were recorded with Disp AEwin PAC instruments by the conventional crossing

  16. Acoustic tweezers for studying intracellular calcium signaling in SKBR-3 human breast cancer cells.

    PubMed

    Hwang, Jae Youn; Yoon, Chi Woo; Lim, Hae Gyun; Park, Jin Man; Yoon, Sangpil; Lee, Jungwoo; Shung, K Kirk

    2015-12-01

    Extracellular matrix proteins such as fibronectin (FNT) play crucial roles in cell proliferation, adhesion, and migration. For better understanding of these associated cellular activities, various microscopic manipulation tools have been used to study their intracellular signaling pathways. Recently, it has appeared that acoustic tweezers may possess similar capabilities in the study. Therefore, we here demonstrate that our newly developed acoustic tweezers with a high-frequency lithium niobate ultrasonic transducer have potentials to study intracellular calcium signaling by FNT-binding to human breast cancer cells (SKBR-3). It is found that intracellular calcium elevations in SKBR-3 cells, initially occurring on the microbead-contacted spot and then eventually spreading over the entire cell, are elicited by attaching an acoustically trapped FNT-coated microbead. Interestingly, they are suppressed by either extracellular calcium elimination or phospholipase C (PLC) inhibition. Hence, this suggests that our acoustic tweezers may serve as an alternative tool in the study of intracellular signaling by FNT-binding activities.

  17. Data quality enhancement and knowledge discovery from relevant signals in acoustic emission

    NASA Astrophysics Data System (ADS)

    Mejia, Felipe; Shyu, Mei-Ling; Nanni, Antonio

    2015-10-01

    The increasing popularity of structural health monitoring has brought with it a growing need for automated data management and data analysis tools. Of great importance are filters that can systematically detect unwanted signals in acoustic emission datasets. This study presents a semi-supervised data mining scheme that detects data belonging to unfamiliar distributions. This type of outlier detection scheme is useful for detecting the presence of new acoustic emission sources, given a training dataset of unwanted signals. In addition to classifying new observations (herein referred to as "outliers") within a dataset, the scheme generates a decision tree that classifies sub-clusters within the outlier context set. The obtained tree can be interpreted as a series of characterization rules for newly-observed data, and these rules can potentially describe the basic structure of different modes within the outlier distribution. The data mining scheme is first validated on a synthetic dataset, and an attempt is made to confirm the algorithms' ability to discriminate outlier acoustic emission sources from a controlled pencil-lead-break experiment. Finally, the scheme is applied to data from two fatigue crack-growth steel specimens, where it is shown that extracted rules can adequately describe crack-growth related acoustic emission sources while filtering out background "noise." Results show promising performance in filter generation, thereby allowing analysts to extract, characterize, and focus only on meaningful signals.
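
    A hedged sketch of the two-stage idea described above (flag observations that do not fit the training distribution of unwanted signals, then summarize the flagged outliers with an interpretable tree), using scikit-learn stand-ins; IsolationForest and KMeans are illustrative substitutes, since the paper's actual detector and clustering step are not specified here.

      import numpy as np
      from sklearn.ensemble import IsolationForest
      from sklearn.cluster import KMeans
      from sklearn.tree import DecisionTreeClassifier, export_text

      def characterize_outliers(train_features, new_features, n_clusters=3, seed=0):
          """Stage 1: flag new AE feature vectors that look unlike the training
          (unwanted-signal) distribution.  Stage 2: cluster the flagged outliers
          and fit a shallow decision tree that yields human-readable rules for
          each sub-cluster.  Illustrative substitutes for the paper's methods."""
          detector = IsolationForest(random_state=seed).fit(train_features)
          is_outlier = detector.predict(new_features) == -1
          outliers = np.asarray(new_features)[is_outlier]
          if len(outliers) < n_clusters:
              return is_outlier, None
          labels = KMeans(n_clusters=n_clusters, random_state=seed,
                          n_init=10).fit_predict(outliers)
          tree = DecisionTreeClassifier(max_depth=3, random_state=seed).fit(outliers, labels)
          return is_outlier, export_text(tree)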

  18. Signal classification and event reconstruction for acoustic neutrino detection in sea water with KM3NeT

    NASA Astrophysics Data System (ADS)

    Kießling, Dominik

    2017-03-01

    The research infrastructure KM3NeT will comprise a multi cubic kilometer neutrino telescope that is currently being constructed in the Mediterranean Sea. Modules with optical and acoustic sensors are used in the detector. While the main purpose of the acoustic sensors is the position calibration of the detection units, they can be used as instruments for studies on acoustic neutrino detection, too. In this article, methods for signal classification and event reconstruction for acoustic neutrino detectors will be presented, which were developed using Monte Carlo simulations. For the signal classification the disk-like emission pattern of the acoustic neutrino signal is used. This approach improves the suppression of transient background by several orders of magnitude. Additionally, an event reconstruction is developed based on the signal classification. An overview of these algorithms will be presented and the efficiency of the classification will be discussed. The quality of the event reconstruction will also be presented.

  19. Robust Speech Rate Estimation for Spontaneous Speech

    PubMed Central

    Wang, Dagen; Narayanan, Shrikanth S.

    2010-01-01

    In this paper, we propose a direct method for speech rate estimation from acoustic features without requiring any automatic speech transcription. We compare various spectral and temporal signal analysis and smoothing strategies to better characterize the underlying syllable structure from which speech rate is derived. The proposed algorithm extends the methods of spectral subband correlation by including temporal correlation and the use of prominent spectral subbands for improving the signal correlation essential for syllable detection. Furthermore, to address some of the practical robustness issues in previously proposed methods, we introduce novel components into the algorithm such as the use of pitch confidence for filtering spurious syllable envelope peaks, a magnifying window for tackling neighboring-syllable smearing, and relative peak measure thresholds for pseudo-peak rejection. We also describe an automated approach for learning algorithm parameters from data, and find the optimal settings through Monte Carlo simulations and parameter sensitivity analysis. Final experimental evaluations are conducted on a portion of the Switchboard corpus for which manual phonetic segmentation information and published results for direct comparison are available. The results show a correlation coefficient of 0.745 with respect to the ground truth based on manual segmentation. This result is about a 17% improvement over the current best single estimator and an 11% improvement over the multiestimator evaluated on the same Switchboard database. PMID:20428476
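    As a rough illustration of direct, transcription-free rate estimation (a simplification, not the authors' algorithm), the sketch below low-pass filters the amplitude envelope so that only slow, syllable-scale modulations remain and then counts prominent peaks per second; the cutoff, peak spacing, and height threshold are illustrative assumptions:

      import numpy as np
      from scipy.signal import butter, filtfilt, find_peaks, hilbert

      def speech_rate(x, fs):
          env = np.abs(hilbert(x))                       # amplitude envelope
          b, a = butter(4, 8.0 / (fs / 2), btype="low")  # keep slow (<8 Hz) modulations
          env = filtfilt(b, a, env)
          # Peaks at least 100 ms apart and above 30% of the envelope maximum
          peaks, _ = find_peaks(env, distance=int(0.1 * fs), height=0.3 * env.max())
          return len(peaks) / (len(x) / fs)              # approximate syllables per second

      fs = 16000
      t = np.arange(0, 2.0, 1 / fs)
      # Synthetic "speech": a 150 Hz tone with a 4 Hz syllable-like modulation
      x = np.sin(2 * np.pi * 150 * t) * (0.6 + 0.4 * np.sin(2 * np.pi * 4 * t))
      print(round(speech_rate(x, fs), 2))                # ~4 "syllables" per second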

  20. Temperature and Pressure Dependence of Signal Amplitudes for Electrostriction Laser-Induced Thermal Acoustics

    NASA Technical Reports Server (NTRS)

    Herring, Gregory C.

    2015-01-01

    The relative signal strength of electrostriction-only (no thermal grating) laser-induced thermal acoustics (LITA) in gas-phase air is reported as a function of temperature T and pressure P. Measurements were made in the free stream of a variable Mach number supersonic wind tunnel, where T and P are varied simultaneously as Mach number is varied. Using optical heterodyning, the measured signal amplitude (related to the optical reflectivity of the acoustic grating) was averaged for each of 11 flow conditions and compared to the expected theoretical dependence of a pure-electrostriction LITA process, where the signal is proportional to sqrt(P^2 / T^3).

  1. Multi-scale morphology analysis of acoustic emission signal and quantitative diagnosis for bearing fault

    NASA Astrophysics Data System (ADS)

    Wang, Wen-Jing; Cui, Ling-Li; Chen, Dao-Yun

    2016-04-01

    Monitoring bearings for potential faults during operation is of critical importance to the safe operation of high-speed trains. One of the major challenges is how to differentiate signals relevant to the operational condition of the bearings from noise emitted by the surrounding environment. In this work, we report a procedure for analyzing acoustic emission signals collected from rolling bearings for diagnosis of bearing health conditions by examining their morphological pattern spectrum (MPS) through a multi-scale morphology analysis procedure. The results show that acoustic emission signals resulting from a given type of bearing fault share rather similar MPS curves. Further examination of the sample entropy and Lempel-Ziv complexity of the MPS curves suggests that these two parameters can be utilized to determine damage modes.
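    The multi-scale morphology idea can be sketched with grey-scale openings at increasing structuring-element sizes, the pattern spectrum being the scale-by-scale loss of area; this is a generic illustration on a synthetic burst, not the exact procedure of the paper:

      import numpy as np
      from scipy.ndimage import grey_opening

      def pattern_spectrum(x, max_scale=20):
          """MPS of a nonnegative 1-D signal via multi-scale grey-scale openings."""
          x = np.asarray(x, dtype=float)
          areas = [np.sum(grey_opening(x, size=2 * s + 1)) for s in range(max_scale + 1)]
          # The MPS is the (negative) derivative of the opened-signal area w.r.t. scale.
          return -np.diff(areas)

      rng = np.random.default_rng(1)
      t = np.linspace(0, 1, 2000)
      burst = np.exp(-((t - 0.5) ** 2) / 1e-4) * np.sin(2 * np.pi * 400 * t)  # AE-like burst
      mps = pattern_spectrum(np.abs(burst) + 0.05 * rng.random(t.size))
      print(mps[:5])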

  2. Abnormal cortical processing of the syllable rate of speech in poor readers

    PubMed Central

    Abrams, Daniel A.; Nicol, Trent; Zecker, Steven; Kraus, Nina

    2009-01-01

    Reading impairment in children has long been associated with impaired perception of rapidly presented acoustic stimuli, and deficits have recently been shown for slower acoustic features as well. It is not known whether impairments for low-frequency acoustic features negatively impact processing of speech in reading-impaired individuals. Here we provide neurophysiological evidence that poor readers have impaired representation of the speech envelope, the acoustic cue that provides syllable pattern information in speech. We measured cortical evoked potentials in response to sentence stimuli and found that good readers showed consistent right-hemisphere dominance in auditory cortex for all measures of speech envelope representation, including the precision, timing, and magnitude of cortical responses. Poor readers showed abnormal patterns of cerebral asymmetry for all measures of speech envelope representation. Moreover, cortical measures of speech envelope representation predicted up to 44% of the variability in standardized reading scores and 50% in measures of phonological processing across a wide range of abilities. The findings strongly support a relationship between acoustic-level processing and higher-level language abilities, and are the first to link reading ability with cortical processing of low-frequency acoustic features in the speech signal. The results also support the hypothesis that asymmetric routing between cerebral hemispheres represents an important mechanism for temporal encoding in the human auditory system, and indicate the need to expand the temporal processing hypothesis for reading disabilities to encompass impairments for a wider range of speech features than previously acknowledged. PMID:19535580

  3. [Research on Time-frequency Characteristics of Magneto-acoustic Signal of Different Thickness Medium Based on Wave Summing Method].

    PubMed

    Zhang, Shunqi; Yin, Tao; Ma, Ren; Liu, Zhipeng

    2015-08-01

    Functional imaging of the electrical characteristics of biological tissue based on the magneto-acoustic effect provides valuable information for early tumor diagnosis; in this context, analysis of the time and frequency characteristics of the magneto-acoustic signal is important for image reconstruction. This paper proposes a wave-summing method based on the Green's function solution for the acoustic source of the magneto-acoustic effect. Simulations and analyses under a quasi-1D transmission condition were carried out on the time and frequency characteristics of the magneto-acoustic signal for models of different thicknesses. The simulated magneto-acoustic signals were verified through experiments. Simulation results for different thicknesses showed that the time-frequency characteristics of the magneto-acoustic signal reflect the thickness of the sample. A thin sample, less than one wavelength of the pulse, and a thick sample, larger than one wavelength, showed different summed waveforms and frequency characteristics owing to the difference in summing thickness. Experimental results verified the theoretical analysis and simulation results. This research lays a foundation for acoustic source and conductivity reconstruction in media of different thicknesses in magneto-acoustic imaging.

  4. Automatic Speech Recognition Based on Electromyographic Biosignals

    NASA Astrophysics Data System (ADS)

    Jou, Szu-Chen Stan; Schultz, Tanja

    This paper presents our studies of automatic speech recognition based on electromyographic biosignals captured from the articulatory muscles in the face using surface electrodes. We develop a phone-based speech recognizer and describe how the performance of this recognizer improves by carefully designing and tailoring the extraction of relevant speech features to the electromyographic signals. Our experimental design includes the collection of audibly spoken speech simultaneously recorded as acoustic data using a close-speaking microphone and as electromyographic signals using electrodes. Our experiments indicate that the electromyographic signals precede the acoustic signal by about 0.05-0.06 seconds. Furthermore, we introduce articulatory feature classifiers, which have recently been shown to improve classical speech recognition significantly. We show that the classification accuracy of articulatory features clearly benefits from the tailored feature extraction. Finally, these classifiers are integrated into the overall decoding framework using a stream architecture. Our final system achieves a word error rate of 29.9% on a 100-word recognition task.

  5. Some Interactions of Speech Rate, Signal Distortion, and Certain Linguistic Factors in Listening Comprehension. Professional Paper No. 39-68.

    ERIC Educational Resources Information Center

    Sticht, Thomas G.

    This experiment was designed to determine the relative effects of speech rate and signal distortion due to the time-compression process on listening comprehension. In addition, linguistic factors--including sequencing of random words into story form, and inflection and phraseology--were qualitatively considered for their effects on listening…

  6. (A new time of flight) Acoustic flow meter using wide band signals and adaptive beamforming techniques

    NASA Astrophysics Data System (ADS)

    Murgan, I.; Ioana, C.; Candel, I.; Anghel, A.; Ballester, J. L.; Reeb, B.; Combes, G.

    2016-11-01

    In this paper we present the results of our research on improving acoustic time-of-flight flow metering for water pipes. Current flow meters are based on estimating the direct time of flight by matched filtering of the signals received and emitted by acoustic transducers. Currently, narrow-band signals are used, together with a single emitter/receiver transducer configuration. Although simple, this configuration has a series of limitations such as energy losses due to the pipe wall/water interface, pressure/flow transients, sensitivity to flow-induced vibrations, and acoustic beam deformation and shift due to changes in flow velocity and turbulence embedded in the flow. The errors associated with these limitations reduce the overall robustness of existing flow meters, narrow the measurable flow rate range, and lower accuracy. In order to overcome these limitations, two major innovations were implemented at the signal processing level. The first concerns the use of wide-band signals that optimize the power transfer along the acoustic path and also increase the number of velocity/flow readings per second. Using wide-band signals with a high duration-bandwidth product increases the precision of time-of-flight measurements and, at the same time, improves system robustness. The second contribution consists of a multiple-emitter, multiple-receiver configuration (per path) in order to compensate for the emitted acoustic beam shift, compensate for time-of-flight estimation errors, and thus increase the flow meter's robustness against undesired effects such as “flow blow” and transient/rapid flow rate and velocity changes. Using a new signal processing algorithm that takes advantage of the controlled wide-band content coming from multiple receivers, the new flow meter achieves higher accuracy in flow velocity over a wider velocity range than existing systems. Tests carried out on real scale experimental
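    The role of the duration-bandwidth product can be illustrated with a matched-filter time-of-flight estimate on a synthetic wide-band chirp; all values below (sample rate, chirp band, delay, noise level) are invented for the example and are not taken from the paper:

      import numpy as np
      from scipy.signal import chirp, correlate

      fs = 1_000_000                                   # 1 MHz sampling
      t = np.arange(0, 2e-3, 1 / fs)                   # 2 ms emission
      tx = chirp(t, f0=50e3, f1=250e3, t1=t[-1])       # wide-band emitted signal

      true_delay = 450e-6                              # 450 microseconds of travel
      n_delay = int(round(true_delay * fs))
      rx = np.zeros(len(tx) + n_delay)
      rx[n_delay:] = 0.3 * tx                          # attenuated, delayed echo
      rx += 0.05 * np.random.default_rng(2).normal(size=rx.size)

      # Matched filter: the lag of the correlation maximum is the time of flight.
      corr = correlate(rx, tx, mode="full")
      lag = np.argmax(corr) - (len(tx) - 1)
      print(f"estimated time of flight: {lag / fs * 1e6:.1f} us")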

  7. Neural Oscillations Carry Speech Rhythm through to Comprehension

    PubMed Central

    Peelle, Jonathan E.; Davis, Matthew H.

    2012-01-01

    A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners’ processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging – particularly electroencephalography (EEG) and magnetoencephalography (MEG) – point to phase locking by ongoing cortical oscillations to low-frequency information (~4–8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain. PMID:22973251

  8. Speech research: Studies on the nature of speech, instrumentation for its investigation, and practical applications

    NASA Astrophysics Data System (ADS)

    Liberman, A. M.

    1982-03-01

    This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: Speech perception and memory coding in relation to reading ability; The use of orthographic structure by deaf adults: Recognition of finger-spelled letters; Exploring the information support for speech; The stream of speech; Using the acoustic signal to make inferences about place and duration of tongue-palate contact; Patterns of human interlimb coordination emerge from the properties of nonlinear limit cycle oscillatory processes: Theory and data; Motor control: Which themes do we orchestrate?; Exploring the nature of motor control in Down's syndrome; Periodicity and auditory memory: A pilot study; Reading skill and language skill: On the role of sign order and morphological structure in memory for American Sign Language sentences; Perception of nasal consonants with special reference to Catalan; and Speech production characteristics of the hearing impaired.

  9. Ductile Deformation of Dehydrating Serpentinite Evidenced by Acoustic Signal Monitoring

    NASA Astrophysics Data System (ADS)

    Gasc, J.; Hilairet, N.; Wang, Y.; Schubnel, A. J.

    2012-12-01

    Serpentinite dehydration is believed to be responsible for triggering earthquakes at intermediate depths (i.e., 60-300 km) in subduction zones. Based on experimental results, some authors have proposed mechanisms that explain how brittle deformation can occur despite high pressure and temperature conditions [1]. However, reproducing microseismicity in the laboratory associated with the deformation of dehydrating serpentinite remains challenging. A recent study showed that, even for fast dehydration kinetics, ductile deformation could take place rather than brittle faulting in the sample [2]. This latter study was conducted in a multi-anvil apparatus without the ability to control differential stress during dehydration. We have since conducted controlled deformation experiments in the deformation-DIA (D-DIA) on natural serpentinite samples at sector 13 (GSECARS) of the APS. Monochromatic radiation was used with both a 2D MAR-CCD detector and a CCD camera to determine the stress and the strain of the sample during the deformation process [3]. In addition, an Acoustic Emission (AE) recording setup was used to monitor the microseismicity from the sample, using piezo-ceramic transducers glued on the basal truncation of the anvils. The use of six independent transducers allows locating the AEs and calculating the corresponding focal mechanisms. The samples were deformed at strain rates of 10-5-10-4 s-1 under confining pressures of 3-5 GPa. Dehydration was triggered during the deformation by heating the samples at rates ranging from 5 to 60 K/min. Before the onset of the dehydration, X-ray diffraction data showed that the serpentinite sustained ~1 GPa of stress which plummeted when dehydration occurred. Although AEs were recorded during the compression and decompression stages, no AEs ever accompanied this stress drop, suggesting ductile deformation of the samples. Hence, unlike many previous studies, no evidence for fluid embrittlement and anticrack generation was found

  10. Investigation of ELF Signals Associated with Mine Warfare: A University of Idaho and Acoustic Research Detachment Collaboration, Phase Three

    DTIC Science & Technology

    2012-07-01

    Report documentation fragment for "Investigation of ELF Signals Associated with Mine Warfare: A University of Idaho and Acoustic Research Detachment Collaboration, Phase Three." Phase Three is a continuation of the Phase One and Two efforts.

  11. Extraction of fault component from abnormal sound in diesel engines using acoustic signals

    NASA Astrophysics Data System (ADS)

    Dayong, Ning; Changle, Sun; Yongjun, Gong; Zengmeng, Zhang; Jiaoyi, Hou

    2016-06-01

    In this paper, a method for extracting fault components from abnormal acoustic signals and automatically diagnosing diesel engine faults is presented. The method, named the dislocation superimposed method (DSM), is based on the improved random decrement technique (IRDT), a differential function (DF), and correlation analysis (CA). The aim of the DSM is to linearly superimpose multiple segments of the abnormal acoustic signal, exploiting the waveform similarity of the faulty components. The method uses the sample points at which the abnormal sound first appears as the starting position of each segment. In this study, the abnormal sound belonged to the shock-type fault class; thus, a starting-position search method based on gradient variance was adopted. A coefficient of similarity between two signals of equal length is presented. By comparison against this similarity measure, the extracted fault component can be judged automatically. The results show that this method is capable of accurately extracting the fault component from abnormal acoustic signals induced by shock-type faults, and that the extracted component can be used to identify the fault type.
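    The superposition step can be sketched as follows, with onset picking simplified to an envelope threshold rather than the gradient-variance search used in the paper; the segment length, threshold, and synthetic shock train are illustrative assumptions:

      import numpy as np

      def superimpose_fault_component(x, fs, seg_len=0.05, threshold=0.5):
          """Average equal-length segments starting at abnormal-sound onsets."""
          seg = int(seg_len * fs)
          env = np.abs(x)
          above = env > threshold * env.max()
          onsets = np.flatnonzero(above & ~np.roll(above, 1))   # rising-edge candidates
          # Keep only onsets separated by at least one segment (crude refractory period).
          kept = []
          for i in onsets:
              if not kept or i - kept[-1] >= seg:
                  kept.append(i)
          segments = [x[i:i + seg] for i in kept if i + seg <= len(x)]
          return np.mean(segments, axis=0) if segments else np.zeros(seg)

      fs = 20000
      t = np.arange(0, 1.0, 1 / fs)
      rng = np.random.default_rng(3)
      x = 0.05 * rng.normal(size=t.size)
      for start in (0.1, 0.35, 0.6, 0.85):                       # repeated fault shocks
          idx = int(start * fs)
          x[idx:idx + 200] += np.hanning(200) * np.sin(2 * np.pi * 3000 * np.arange(200) / fs)
      fault = superimpose_fault_component(x, fs)
      print(fault.shape)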

  12. Research on the characteristic of acoustic signal induced by thermoelastic mechanism

    NASA Astrophysics Data System (ADS)

    Zhou, Ju; Lei, Li Hua; Zhang, Jian Jun; Xue, Ming

    2016-10-01

    When a laser irradiates a liquid medium, the medium absorbs the laser energy and a sound source is induced. As a new method of generating underwater sound waves, laser acoustics has a variety of commercial and oceanographic applications, including information transmission between aerial and underwater platforms, underwater target detection, and marine environment measurement, owing to its merits of high acoustic intensity, spike-like pulses, and a wide frequency spectrum. Depending on the energy of the laser pulse and the spatial and temporal distribution of energy in the interaction region, the mechanisms by which the laser interacts with water to generate sound are classified mainly as thermoelastic, vaporization, and optical breakdown. The thermoelastic mechanism is an important one in laser acoustics. The characteristics of the photoacoustic signal induced by the thermoelastic mechanism are summarized and analyzed comprehensively here. For different excitation conditions, theoretical models of the photoacoustic signal induced by a δ pulse and by a long-pulse laser are summarized, and their characteristics in the time and frequency domains are analyzed. Through simulation, the theoretical curve of the sound directivity was drawn. These studies provide a reference for the practical application of laser-acoustics technology.

  13. Classroom Acoustics: The Problem, Impact, and Solution.

    ERIC Educational Resources Information Center

    Berg, Frederick S.; And Others

    1996-01-01

    This article describes aspects of classroom acoustics that interfere with the ability of listeners to understand speech. It considers impacts on students and teachers and offers four possible solutions: noise control, signal control without amplification, individual amplification systems, and sound field amplification systems. (Author/DB)

  14. Acoustic cardiac signals analysis: a Kalman filter-based approach.

    PubMed

    Salleh, Sheik Hussain; Hussain, Hadrina Sheik; Swee, Tan Tian; Ting, Chee-Ming; Noor, Alias Mohd; Pipatsart, Surasak; Ali, Jalil; Yupapin, Preecha P

    2012-01-01

    Auscultation of the heart is accompanied by both electrical activity and sound. Heart auscultation provides clues to diagnose many cardiac abnormalities. Unfortunately, detection of relevant symptoms and diagnosis based on heart sound through a stethoscope is difficult. The reason GPs find this difficult is that the heart sounds are of short duration and separated from one another by less than 30 ms. In addition, the cost of false positives constitutes wasted time and emotional anxiety for both patient and GP. Many heart diseases cause changes in heart sound, waveform, and additional murmurs before other signs and symptoms appear. Heart-sound auscultation is the primary test conducted by GPs. These sounds are generated primarily by turbulent flow of blood in the heart. Analysis of heart sounds requires a quiet environment with minimum ambient noise. In order to address these issues, a technique for denoising and estimating the biomedical heart signal is proposed in this investigation. Normally, the performance of the filter depends on prior information related to the statistical properties of the signal and the background noise. This paper proposes Kalman filtering for denoising the statistical heart sound. The cycles of heart sounds are modeled as a first-order Gauss-Markov process, observed with additive noise in the measurements. The model is formulated in state-space form to enable use of a Kalman filter to estimate the clean cycles of heart sounds. The estimates obtained by Kalman filtering are optimal in the mean-squared sense.
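    A toy scalar version of such a state-space formulation is sketched below: the heart-sound cycle is modelled as a first-order Gauss-Markov process observed in additive noise, and a Kalman filter recursively estimates the clean samples. The AR coefficient, noise variances, and test signal are made-up values, not parameters from the paper:

      import numpy as np

      def kalman_denoise(y, a=0.98, q=1e-4, r=1e-2):
          """y: noisy samples; a: AR(1) coefficient; q/r: process/measurement noise."""
          x_hat, p = 0.0, 1.0
          out = np.empty_like(y)
          for k, yk in enumerate(y):
              # Predict (first-order Gauss-Markov state model x_k = a*x_{k-1} + w_k).
              x_pred, p_pred = a * x_hat, a * a * p + q
              # Update with the measurement y_k = x_k + v_k.
              gain = p_pred / (p_pred + r)
              x_hat = x_pred + gain * (yk - x_pred)
              p = (1.0 - gain) * p_pred
              out[k] = x_hat
          return out

      fs = 2000
      t = np.arange(0, 1.0, 1 / fs)
      clean = np.sin(2 * np.pi * 40 * t) * np.exp(-5 * t)          # crude S1-like burst
      noisy = clean + 0.3 * np.random.default_rng(4).normal(size=t.size)
      print(np.round(kalman_denoise(noisy)[:5], 3))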

  15. Automatic detection of the dominant melody in acoustic musical signals

    NASA Astrophysics Data System (ADS)

    Klapuri, Anssi P.

    2005-09-01

    An auditory-model based method is described for estimating the fundamental frequency contour of the dominant melody in complex music signals. The core method consists of a conventional cochlear model followed by a novel periodicity analysis mechanism within the subbands. As the output, the method computes the salience (i.e., strength) of different fundamental frequency candidates in successive time frames. The maximum value of this vector in each frame can be used to indicate the dominant fundamental frequency directly. In addition, however, it was noted that the first-order time differential of the salience vector leads to an efficient use of temporal features which improve the performance in the presence of a large number of concurrent sounds. These temporal features include particularly the common amplitude or frequency modulation of the partials of the sound that is used to communicate the melody. A noise-suppression mechanism is described which improves the robustness of estimation in the presence of drums and percussive instruments. In evaluations, a database of complex music signals was used where the melody was manually annotated. Use of the method for music information retrieval and music summarization is discussed.
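    A greatly simplified stand-in for the salience computation is a frame-wise normalised autocorrelation, where the lag of the strongest peak within a plausible F0 range gives the dominant fundamental frequency; the auditory model, subband processing, and temporal-difference features of the actual method are omitted, and the test signal is synthetic:

      import numpy as np

      def dominant_f0(x, fs, frame=2048, fmin=80.0, fmax=800.0):
          """Per-frame dominant F0 from the strongest normalised autocorrelation peak."""
          lag_min, lag_max = int(fs / fmax), int(fs / fmin)
          f0s = []
          for start in range(0, len(x) - frame, frame):
              seg = x[start:start + frame] * np.hanning(frame)
              ac = np.correlate(seg, seg, mode="full")[frame - 1:]   # lags 0..frame-1
              ac /= ac[0] + 1e-12                                    # salience per lag
              lag = lag_min + np.argmax(ac[lag_min:lag_max])
              f0s.append(fs / lag)
          return np.array(f0s)

      fs = 16000
      t = np.arange(0, 1.0, 1 / fs)
      rng = np.random.default_rng(8)
      x = np.sin(2 * np.pi * 220 * t) + 0.1 * rng.normal(size=t.size)   # 220 Hz "melody"
      print(np.round(dominant_f0(x, fs)[:5], 1))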

  16. Speech coding

    SciTech Connect

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal being corrupted by noise, cross-talk, and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk, and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence, the end-to-end performance of the digital link becomes essentially independent of the length and operating frequency bands of the link. From a transmission point of view, digital transmission has therefore been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible, and secure services that can carry a multitude of signal types (such as voice, data, and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the

  17. Ultrasonic speech translator and communications system

    DOEpatents

    Akerman, M.A.; Ayers, C.W.; Haynes, H.D.

    1996-07-23

    A wireless communication system undetectable by radio frequency methods for converting audio signals, including human voice, to electronic signals in the ultrasonic frequency range, transmitting the ultrasonic signal by way of acoustical pressure waves across a carrier medium, including gases, liquids, or solids, and reconverting the ultrasonic acoustical pressure waves back to the original audio signal. The ultrasonic speech translator and communication system includes an ultrasonic transmitting device and an ultrasonic receiving device. The ultrasonic transmitting device accepts as input an audio signal such as human voice input from a microphone or tape deck. The ultrasonic transmitting device frequency modulates an ultrasonic carrier signal with the audio signal producing a frequency modulated ultrasonic carrier signal, which is transmitted via acoustical pressure waves across a carrier medium such as gases, liquids or solids. The ultrasonic receiving device converts the frequency modulated ultrasonic acoustical pressure waves to a frequency modulated electronic signal, demodulates the audio signal from the ultrasonic carrier signal, and conditions the demodulated audio signal to reproduce the original audio signal at its output. 7 figs.
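    The signal chain can be modelled conceptually in a few lines: frequency-modulate an ultrasonic carrier with the audio signal, then recover the audio by differentiating the instantaneous phase of the analytic signal. The carrier frequency, deviation, and test tone below are illustrative choices, not values from the patent:

      import numpy as np
      from scipy.signal import hilbert

      fs = 200_000                 # sample rate high enough for a 40 kHz carrier
      fc, dev = 40_000, 5_000      # carrier frequency and peak frequency deviation
      t = np.arange(0, 0.05, 1 / fs)
      audio = np.sin(2 * np.pi * 440 * t)                       # "voice" test tone

      # FM modulation: phase is the running integral of the instantaneous frequency.
      phase = 2 * np.pi * np.cumsum(fc + dev * audio) / fs
      tx = np.cos(phase)

      # FM demodulation: derivative of the unwrapped analytic phase, carrier removed.
      inst_freq = np.diff(np.unwrap(np.angle(hilbert(tx)))) * fs / (2 * np.pi)
      recovered = (inst_freq - fc) / dev                        # rescaled audio estimate
      print(np.round(recovered[1000:1005], 2))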

  18. Ultrasonic speech translator and communications system

    DOEpatents

    Akerman, M. Alfred; Ayers, Curtis W.; Haynes, Howard D.

    1996-01-01

    A wireless communication system undetectable by radio frequency methods for converting audio signals, including human voice, to electronic signals in the ultrasonic frequency range, transmitting the ultrasonic signal by way of acoustical pressure waves across a carrier medium, including gases, liquids, or solids, and reconverting the ultrasonic acoustical pressure waves back to the original audio signal. The ultrasonic speech translator and communication system (20) includes an ultrasonic transmitting device (100) and an ultrasonic receiving device (200). The ultrasonic transmitting device (100) accepts as input (115) an audio signal such as human voice input from a microphone (114) or tape deck. The ultrasonic transmitting device (100) frequency modulates an ultrasonic carrier signal with the audio signal producing a frequency modulated ultrasonic carrier signal, which is transmitted via acoustical pressure waves across a carrier medium such as gases, liquids or solids. The ultrasonic receiving device (200) converts the frequency modulated ultrasonic acoustical pressure waves to a frequency modulated electronic signal, demodulates the audio signal from the ultrasonic carrier signal, and conditions the demodulated audio signal to reproduce the original audio signal at its output (250).

  19. Early recognition of speech

    PubMed Central

    Remez, Robert E; Thomas, Emily F

    2013-01-01

    Classic research on the perception of speech sought to identify minimal acoustic correlates of each consonant and vowel. In explaining perception, this view designated momentary components of an acoustic spectrum as cues to the recognition of elementary phonemes. This conceptualization of speech perception is untenable given the findings of phonetic sensitivity to modulation independent of the acoustic and auditory form of the carrier. The empirical key is provided by studies of the perceptual organization of speech, a low-level integrative function that finds and follows the sensory effects of speech amid concurrent events. These projects have shown that the perceptual organization of speech is keyed to modulation; is fast, unlearned, nonsymbolic, and indifferent to short-term auditory properties; and requires attention. The ineluctably multisensory nature of speech perception also imposes conditions that distinguish language among cognitive systems. WIREs Cogn Sci 2013, 4:213–223. doi: 10.1002/wcs.1213 PMID:23926454

  20. An initial investigation into the real-time conversion of facial surface EMG signals to audible speech.

    PubMed

    Diener, Lorenz; Herff, Christian; Janke, Matthias; Schultz, Tanja

    2016-08-01

    This paper presents early-stage results of our investigations into the direct conversion of facial surface electromyographic (EMG) signals into audible speech in a real-time setting, enabling novel avenues for research and system improvement through real-time feedback. The system uses a pipeline approach to enable online acquisition of EMG data, extraction of EMG features, mapping of EMG features to audio features, synthesis of audio waveforms from audio features, and output of the audio waveforms via speakers or headphones. Our system allows EMG-to-speech conversion to be performed with low latency and on a continuous stream of EMG data, enabling near-instantaneous audio output during audible as well as silent speech production. In this paper, we present an analysis of our system's components with respect to the latency incurred, as well as the tradeoffs between conversion quality, latency, and training duration required.

  1. Seismo-acoustic signals associated with degassing explosions recorded at Shishaldin Volcano, Alaska, 2003-2004

    USGS Publications Warehouse

    Petersen, T.

    2007-01-01

    In summer 2003, a Chaparral Model 2 microphone was deployed at Shishaldin Volcano, Aleutian Islands, Alaska. The pressure sensor was co-located with a short-period seismometer on the volcano’s north flank at a distance of 6.62 km from the active summit vent. The seismo-acoustic data exhibit a correlation between impulsive acoustic signals (1–2 Pa) and long-period (LP, 1–2 Hz) earthquakes. Since it last erupted in 1999, Shishaldin has been characterized by sustained seismicity consisting of many hundreds to two thousand LP events per day. The activity is accompanied by up to ∼200 m high discrete gas puffs exiting the small summit vent, but no significant eruptive activity has been confirmed. The acoustic waveforms possess similarity throughout the data set (July 2003–November 2004) indicating a repetitive source mechanism. The simplicity of the acoustic waveforms, the impulsive onsets with relatively short (∼10–20 s) gradually decaying codas and the waveform similarities suggest that the acoustic pulses are generated at the fluid–air interface within an open-vent system. SO2 measurements have revealed a low SO2 flux, suggesting a hydrothermal system with magmatic gases leaking through. This hypothesis is supported by the steady-state nature of Shishaldin’s volcanic system since 1999. Time delays between the seismic LP and infrasound onsets were acquired from a representative day of seismo-acoustic data. A simple model was used to estimate source depths. The short seismo-acoustic delay times have revealed that the seismic and acoustic sources are co-located at a depth of 240±200 m below the crater rim. This shallow depth is confirmed by resonance of the upper portion of the open conduit, which produces standing waves with f=0.3 Hz in the acoustic waveform codas. The infrasound data has allowed us to relate Shishaldin’s LP earthquakes to degassing explosions, created by gas volume ruptures from a fluid–air interface.

  2. Visual-tactile integration in speech perception: Evidence for modality neutral speech primitives.

    PubMed

    Bicevskis, Katie; Derrick, Donald; Gick, Bryan

    2016-11-01

    Audio-visual [McGurk and MacDonald (1976). Nature 264, 746-748] and audio-tactile [Gick and Derrick (2009). Nature 462(7272), 502-504] speech stimuli enhance speech perception over audio stimuli alone. In addition, multimodal speech stimuli form an asymmetric window of integration that is consistent with the relative speeds of the various signals [Munhall, Gribble, Sacco, and Ward (1996). Percept. Psychophys. 58(3), 351-362; Gick, Ikegami, and Derrick (2010). J. Acoust. Soc. Am. 128(5), EL342-EL346]. In this experiment, participants were presented video of faces producing /pa/ and /ba/ syllables, both alone and with air puffs occurring synchronously and at different timings up to 300 ms before and after the stop release. Perceivers were asked to identify the syllable they perceived, and were more likely to respond that they perceived /pa/ when air puffs were present, with asymmetrical preference for puffs following the video signal, consistent with the relative speeds of visual and air puff signals. The results demonstrate that visual-tactile integration of speech perception occurs much as it does with audio-visual and audio-tactile stimuli. This finding contributes to the understanding of multimodal speech perception, lending support to the idea that speech is not perceived as an audio signal that is supplemented by information from other modes, but rather that primitives of speech perception are, in principle, modality neutral.

  3. Speech perception as an active cognitive process.

    PubMed

    Heald, Shannon L M; Nusbaum, Howard C

    2014-01-01

    One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process, which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided, but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing, whether through masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine the cognitive resources recruited during perception, including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated, either through augmentation or therapy.

  4. Particle Swarm Optimization Based Feature Enhancement and Feature Selection for Improved Emotion Recognition in Speech and Glottal Signals

    PubMed Central

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features, and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were utilized to gauge the proposed method. An extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted, and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature. PMID:25799141

  5. Particle swarm optimization based feature enhancement and feature selection for improved emotion recognition in speech and glottal signals.

    PubMed

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features, and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were utilized to gauge the proposed method. An extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted, and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature.

  6. Circuit for echo and noise suppression of acoustic signals transmitted through a drill string

    DOEpatents

    Drumheller, D.S.; Scott, D.D.

    1993-12-28

    An electronic circuit for digitally processing analog electrical signals produced by at least one acoustic transducer is presented. In a preferred embodiment of the present invention, a novel digital time delay circuit is utilized which employs an array of First-in-First-out (FiFo) microchips. Also, a bandpass filter is used at the input to this circuit for isolating drill string noise and eliminating high frequency output. 20 figures.
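    A software analogue of this circuit, assuming a sampled transducer signal, is a band-pass filter followed by a FIFO used as a fixed delay line; the band edges, delay length, and test signal below are placeholders rather than values from the patent:

      from collections import deque
      import numpy as np
      from scipy.signal import butter, lfilter

      def delay_and_filter(x, fs, delay_samples=128, band=(400.0, 1200.0)):
          b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
          filtered = lfilter(b, a, x)                   # isolate the signalling band
          fifo = deque([0.0] * delay_samples, maxlen=delay_samples)
          out = np.empty_like(filtered)
          for i, sample in enumerate(filtered):         # the FIFO acts as a fixed delay line
              out[i] = fifo[0]
              fifo.append(sample)
          return out

      fs = 10_000
      t = np.arange(0, 0.1, 1 / fs)
      x = np.sin(2 * np.pi * 800 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)   # tone + low-frequency noise
      print(np.round(delay_and_filter(x, fs)[:5], 4))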

  7. Application of Acoustic Signal Processing Techniques for Improved Underwater Source Detection and Localization

    DTIC Science & Technology

    1988-08-31


  8. Sparsity-Based Representation for Classification Algorithms and Comparison Results for Transient Acoustic Signals

    DTIC Science & Technology

    2016-05-01

    …large but correlated noise and signal interference (i.e., low-rank interference). Another contribution is the implementation of deep learning…

  9. Non-invasive estimation of static and pulsatile intracranial pressure from transcranial acoustic signals.

    PubMed

    Levinsky, Alexandra; Papyan, Surik; Weinberg, Guy; Stadheim, Trond; Eide, Per Kristian

    2016-05-01

    The aim of the present study was to examine whether a method for estimation of non-invasive ICP (nICP) from transcranial acoustic (TCA) signals mixed with head-generated sounds estimates the static and pulsatile invasive ICP (iICP). For that purpose, simultaneous iICP and mixed TCA signals were obtained from patients undergoing continuous iICP monitoring as part of clinical management. The ear probe placed in the right outer ear channel sent a TCA signal with fixed frequency (621 Hz) that was picked up by the left ear probe along with acoustic signals generated by the intracranial compartment. Based on a mathematical model of the association between mixed TCA and iICP, the static and pulsatile nICP values were determined. A total of 39 patients were included in the study; the total numbers of observations for prediction of static and pulsatile iICP were 5789 and 6791, respectively. The results demonstrated a good agreement between iICP and nICP observations, with mean differences of 0.39 mmHg and 0.53 mmHg for static and pulsatile ICP, respectively. In summary, in this cohort of patients, mixed TCA signals estimated the static and pulsatile iICP with rather good accuracy. Further studies are required to validate whether mixed TCA signals may become useful for measurement of nICP.

  10. Acoustic effects of the ATOC signal (75 Hz, 195 dB) on dolphins and whales.

    PubMed

    Au, W W; Nachtigall, P E; Pawloski, J L

    1997-05-01

    The Acoustic Thermometry of Ocean Climate (ATOC) program of Scripps Institution of Oceanography and the Applied Physics Laboratory, University of Washington, will broadcast a low-frequency 75-Hz phase modulated acoustic signal over ocean basins in order to study ocean temperatures on a global scale and examine the effects of global warming. One of the major concerns is the possible effect of the ATOC signal on marine life, especially on dolphins and whales. In order to address this issue, the hearing sensitivity of a false killer whale (Pseudorca crassidens) and a Risso's dolphin (Grampus griseus) to the ATOC sound was measured behaviorally. A staircase procedure with the signal levels being changed in 1-dB steps was used to measure the animals' threshold to the actual ATOC coded signal. The results indicate that small odontocetes such as the Pseudorca and Grampus swimming directly above the ATOC source will not hear the signal unless they dive to a depth of approximately 400 m. A sound propagation analysis suggests that the sound-pressure level at ranges greater than 0.5 km will be less than 130 dB for depths down to about 500 m. Several species of baleen whales produce sounds much greater than 170-180 dB. With the ATOC source on the axis of the deep sound channel (greater than 800 m), the ATOC signal will probably have minimal physical and physiological effects on cetaceans.

  11. Punch stretching process monitoring using acoustic emission signal analysis. II - Application of frequency domain deconvolution

    NASA Technical Reports Server (NTRS)

    Liang, Steven Y.; Dornfeld, David A.; Nickerson, Jackson A.

    1987-01-01

    The coloring effect on the acoustic emission signal due to the frequency response of the data acquisition/processing instrumentation may bias the interpretation of AE signal characteristics. In this paper, a frequency domain deconvolution technique, which involves the identification of the instrumentation transfer functions and multiplication of the AE signal spectrum by the inverse of these system functions, has been carried out. In this way, a change in AE signal characteristics can be better interpreted as the result of a change in the state of the process alone. The punch stretching process was used as an example to demonstrate the application of the technique. Results showed that, through the deconvolution, the frequency characteristics of AE signals generated during stretching become more distinctive and can be used more effectively as a tool for process monitoring.
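    The deconvolution step amounts to dividing the measured AE spectrum by the instrumentation system function, with regularisation to avoid amplifying bands where the response is small. The sketch below substitutes an invented band-pass FIR response for an identified transfer function:

      import numpy as np
      from scipy.signal import lfilter, firwin

      fs = 1_000_000
      n = 4096
      raw_source = np.random.default_rng(5).normal(size=n)          # stand-in AE source
      h = firwin(101, [50e3, 300e3], pass_zero=False, fs=fs)        # assumed "instrument" response
      measured = lfilter(h, 1.0, raw_source)                        # coloured AE signal

      H = np.fft.rfft(h, n)                                         # system function
      Y = np.fft.rfft(measured)
      eps = 1e-3 * np.max(np.abs(H))
      X_hat = Y * np.conj(H) / (np.abs(H) ** 2 + eps ** 2)          # regularised inverse filter
      recovered = np.fft.irfft(X_hat, n)
      print(np.round(recovered[:5], 3))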

  12. Problems Associated with Statistical Pattern Recognition of Acoustic Emission Signals in a Compact Tension Fatigue Specimen

    NASA Technical Reports Server (NTRS)

    Hinton, Yolanda L.

    1999-01-01

    Acoustic emission (AE) data were acquired during fatigue testing of an aluminum 2024-T4 compact tension specimen using a commercially available AE system. AE signals from crack extension were identified and separated from noise spikes, signals that reflected from the specimen edges, and signals that saturated the instrumentation. A commercially available software package was used to train a statistical pattern recognition system to classify the signals. The software trained a network to recognize signals with a 91-percent accuracy when compared with the researcher's interpretation of the data. Reasons for the discrepancies are examined and it is postulated that additional preprocessing of the AE data to focus on the extensional wave mode and eliminate other effects before training the pattern recognition system will result in increased accuracy.

  13. Acoustics

    NASA Technical Reports Server (NTRS)

    Goodman, Jerry R.; Grosveld, Ferdinand

    2007-01-01

    The acoustics environment in space operations is important to maintain at manageable levels so that the crewperson can remain safe, functional, effective, and reasonably comfortable. High acoustic levels can produce temporary or permanent hearing loss, or cause other physiological symptoms such as auditory pain, headaches, discomfort, strain in the vocal cords, or fatigue. Noise is defined as undesirable sound. Excessive noise may result in psychological effects such as irritability, inability to concentrate, decrease in productivity, annoyance, errors in judgment, and distraction. A noisy environment can also result in the inability to sleep, or sleep well. Elevated noise levels can affect the ability to communicate, understand what is being said, hear what is going on in the environment, degrade crew performance and operations, and create habitability concerns. Superfluous noise emissions can also create the inability to hear alarms or other important auditory cues such as an equipment malfunctioning. Recent space flight experience, evaluations of the requirements in crew habitable areas, and lessons learned (Goodman 2003; Allen and Goodman 2003; Pilkinton 2003; Grosveld et al. 2003) show the importance of maintaining an acceptable acoustics environment. This is best accomplished by having a high-quality set of limits/requirements early in the program, the "designing in" of acoustics in the development of hardware and systems, and by monitoring, testing and verifying the levels to ensure that they are acceptable.

  14. Tracking the Speech Signal--Time-Locked MEG Signals during Perception of Ultra-Fast and Moderately Fast Speech in Blind and in Sighted Listeners

    ERIC Educational Resources Information Center

    Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

    2013-01-01

    Blind people can learn to understand speech at ultra-high syllable rates (ca. 20 syllables/s), a capability associated with hemodynamic activation of the central-visual system. To further elucidate the neural mechanisms underlying this skill, magnetoencephalographic (MEG) measurements during listening to sentence utterances were cross-correlated…

  15. The Acoustic Structure and Information Content of Female Koala Vocal Signals

    PubMed Central

    Charlton, Benjamin D.

    2015-01-01

    Determining the information content of animal vocalisations can give valuable insights into the potential functions of vocal signals. The source-filter theory of vocal production allows researchers to examine the information content of mammal vocalisations by linking variation in acoustic features with variation in relevant physical characteristics of the caller. Here I used a source-filter theory approach to classify female koala vocalisations into different call-types, and determine which acoustic features have the potential to convey important information about the caller to other conspecifics. A two-step cluster analysis classified female calls into bellows, snarls and tonal rejection calls. Additional results revealed that female koala vocalisations differed in their potential to provide information about a given caller’s phenotype that may be of importance to receivers. Female snarls did not contain reliable acoustic cues to the caller’s identity and age. In contrast, female bellows and tonal rejection calls were individually distinctive, and the tonal rejection calls of older female koalas had consistently lower mean, minimum and maximum fundamental frequency. In addition, female bellows were significantly shorter in duration and had higher fundamental frequency, formant frequencies, and formant frequency spacing than male bellows. These results indicate that female koala vocalisations have the potential to signal the caller’s identity, age and sex. I go on to discuss the anatomical basis for these findings, and consider the possible functional relevance of signalling this type of information in the koala’s natural habitat. PMID:26465340

  16. The Acoustic Structure and Information Content of Female Koala Vocal Signals.

    PubMed

    Charlton, Benjamin D

    2015-01-01

    Determining the information content of animal vocalisations can give valuable insights into the potential functions of vocal signals. The source-filter theory of vocal production allows researchers to examine the information content of mammal vocalisations by linking variation in acoustic features with variation in relevant physical characteristics of the caller. Here I used a source-filter theory approach to classify female koala vocalisations into different call-types, and determine which acoustic features have the potential to convey important information about the caller to other conspecifics. A two-step cluster analysis classified female calls into bellows, snarls and tonal rejection calls. Additional results revealed that female koala vocalisations differed in their potential to provide information about a given caller's phenotype that may be of importance to receivers. Female snarls did not contain reliable acoustic cues to the caller's identity and age. In contrast, female bellows and tonal rejection calls were individually distinctive, and the tonal rejection calls of older female koalas had consistently lower mean, minimum and maximum fundamental frequency. In addition, female bellows were significantly shorter in duration and had higher fundamental frequency, formant frequencies, and formant frequency spacing than male bellows. These results indicate that female koala vocalisations have the potential to signal the caller's identity, age and sex. I go on to discuss the anatomical basis for these findings, and consider the possible functional relevance of signalling this type of information in the koala's natural habitat.

  17. Moisture estimation in power transformer oil using acoustic signals and spectral kurtosis

    NASA Astrophysics Data System (ADS)

    Leite, Valéria C. M. N.; Veloso, Giscard F. C.; Borges da Silva, Luiz Eduardo; Lambert-Torres, Germano; Borges da Silva, Jonas G.; Onofre Pereira Pinto, João

    2016-03-01

    The aim of this paper is to present a new technique for estimating the contamination by moisture in power transformer insulating oil based on the spectral kurtosis analysis of the acoustic signals of partial discharges (PDs). Basically, in this approach, the spectral kurtosis of the PD acoustic signal is calculated and the correlation between its maximum value and the moisture percentage is explored to find a function that calculates the moisture percentage. The function can be easily implemented in DSP, FPGA, or any other type of embedded system for online moisture monitoring. To evaluate the proposed approach, an experiment is assembled with a piezoelectric sensor attached to a tank, which is filled with insulating oil samples contaminated by different levels of moisture. A device generating electrical discharges is submerged into the oil to simulate the occurrence of PDs. Detected acoustic signals are processed using fast kurtogram algorithm to extract spectral kurtosis values. The obtained data are used to find the fitting function that relates the water contamination to the maximum value of the spectral kurtosis. Experimental results show that the proposed method is suitable for online monitoring system of power transformers.
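    A generic spectral-kurtosis estimate (not the fast kurtogram itself) can be obtained from the STFT by computing the kurtosis of the spectral magnitude across frames for each frequency bin and taking the maximum, as sketched here on a synthetic partial-discharge-like burst with made-up parameters:

      import numpy as np
      from scipy.signal import stft

      def max_spectral_kurtosis(x, fs, nperseg=256):
          _, _, Z = stft(x, fs=fs, nperseg=nperseg)
          mag2 = np.abs(Z) ** 2
          # Excess kurtosis of the complex envelope, per frequency bin.
          sk = np.mean(mag2 ** 2, axis=1) / np.mean(mag2, axis=1) ** 2 - 2.0
          return sk.max()

      fs = 500_000
      t = np.arange(0, 0.05, 1 / fs)
      rng = np.random.default_rng(6)
      x = 0.1 * rng.normal(size=t.size)
      x[10_000:10_200] += np.sin(2 * np.pi * 150e3 * t[:200])     # PD-like transient burst
      print(round(max_spectral_kurtosis(x, fs), 2))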

  18. Gearbox fault diagnosis based on deep random forest fusion of acoustic and vibratory signals

    NASA Astrophysics Data System (ADS)

    Li, Chuan; Sanchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego; Vásquez, Rafael E.

    2016-08-01

    Fault diagnosis is an effective tool to guarantee safe operation of gearboxes. Acoustic and vibratory measurements in such mechanical devices are both sensitive to the existence of faults. This work addresses the use of a deep random forest fusion (DRFF) technique to improve fault diagnosis performance for gearboxes by using the measurements of an acoustic emission (AE) sensor and an accelerometer that monitor the gearbox condition simultaneously. The statistical parameters of the wavelet packet transform (WPT) are first produced from the AE signal and the vibratory signal, respectively. Two deep Boltzmann machines (DBMs) are then developed for deep representations of the WPT statistical parameters. A random forest is finally suggested to fuse the outputs of the two DBMs as the integrated DRFF model. The proposed DRFF technique is evaluated using gearbox fault diagnosis experiments under different operational conditions, and achieves a classification rate of 97.68% across 11 different condition patterns. Compared to other peer algorithms, the proposed method exhibits the best performance. The results indicate that deep learning fusion of acoustic and vibratory signals may improve fault diagnosis capabilities for gearboxes.
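    A much-reduced stand-in for the fusion idea is sketched below: wavelet packet statistics are extracted from the AE and vibration channels and a random forest classifies the concatenated features. The deep Boltzmann machine stage of the original DRFF method is omitted, and the two-class synthetic data are purely illustrative (PyWavelets and scikit-learn are assumed available):

      import numpy as np
      import pywt
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import train_test_split

      def wpt_stats(x, wavelet="db4", level=3):
          """Mean and standard deviation of each terminal wavelet packet node."""
          nodes = pywt.WaveletPacket(x, wavelet, maxlevel=level).get_level(level)
          return np.array([f(n.data) for n in nodes for f in (np.mean, np.std)])

      rng = np.random.default_rng(7)
      t = np.arange(0, 0.1, 1 / 20_000)
      X, y = [], []
      for label, freq in enumerate((500.0, 900.0)):        # two hypothetical condition patterns
          for _ in range(40):
              ae = np.sin(2 * np.pi * freq * t) + 0.5 * rng.normal(size=t.size)
              vib = np.sin(2 * np.pi * freq / 5 * t) + 0.5 * rng.normal(size=t.size)
              X.append(np.concatenate([wpt_stats(ae), wpt_stats(vib)]))
              y.append(label)

      Xtr, Xte, ytr, yte = train_test_split(np.array(X), np.array(y), test_size=0.25,
                                            random_state=0, stratify=y)
      clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)
      print(clf.score(Xte, yte))                            # held-out classification rate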

  19. Child directed speech, speech in noise and hyperarticulated speech in the Pacific Northwest

    NASA Astrophysics Data System (ADS)

    Wright, Richard; Carmichael, Lesley; Beckford Wassink, Alicia; Galvin, Lisa

    2004-05-01

    Three types of exaggerated speech are thought to be systematic responses to accommodate the needs of the listener: child-directed speech (CDS), hyperspeech, and the Lombard response. CDS (e.g., Kuhl et al., 1997) occurs in interactions with young children and infants. Hyperspeech (Johnson et al., 1993) is a modification in response to listeners' difficulties in recovering the intended message. The Lombard response (e.g., Lane et al., 1970) is a compensation for increased noise in the signal. While all three result from adaptations to accommodate the needs of the listener, and therefore should share some features, their triggering conditions are quite different, and they should therefore exhibit differences in their phonetic outcomes. While CDS has been the subject of a variety of acoustic studies, it has never been studied in the broader context of the other "exaggerated" speech styles. A large crosslinguistic study was undertaken that compares speech produced under four conditions: spontaneous conversations, CDS aimed at 6-9-month-old infants, hyperarticulated speech, and speech in noise. This talk will present some findings for North American English as spoken in the Pacific Northwest. The measures include f0, vowel duration, F1 and F2 at vowel midpoint, and intensity.

  20. Acoustic effects of the ATOC signal (75 Hz, 195 dB) on dolphins and whales

    SciTech Connect

    Au, W.W.; Nachtigall, P.E.; Pawloski, J.L.

    1997-05-01

    The Acoustic Thermometry of Ocean Climate (ATOC) program of Scripps Institution of Oceanography and the Applied Physics Laboratory, University of Washington, will broadcast a low-frequency 75-Hz phase modulated acoustic signal over ocean basins in order to study ocean temperatures on a global scale and examine the effects of global warming. One of the major concerns is the possible effect of the ATOC signal on marine life, especially on dolphins and whales. In order to address this issue, the hearing sensitivity of a false killer whale (Pseudorca crassidens) and a Risso's dolphin (Grampus griseus) to the ATOC sound was measured behaviorally. A staircase procedure with the signal levels being changed in 1-dB steps was used to measure the animals' threshold to the actual ATOC coded signal. The results indicate that small odontocetes such as Pseudorca and Grampus swimming directly above the ATOC source will not hear the signal unless they dive to a depth of approximately 400 m. A sound propagation analysis suggests that the sound-pressure level at ranges greater than 0.5 km will be less than 130 dB for depths down to about 500 m. Several species of baleen whales produce sounds much greater than 170-180 dB. With the ATOC source on the axis of the deep sound channel (greater than 800 m), the ATOC signal will probably have minimal physical and physiological effects on cetaceans. © 1997 Acoustical Society of America.

  1. Design Foundations for Content-Rich Acoustic Interfaces: Investigating Audemes as Referential Non-Speech Audio Cues

    ERIC Educational Resources Information Center

    Ferati, Mexhid Adem

    2012-01-01

    To access interactive systems, blind and visually impaired users can leverage their auditory senses by using non-speech sounds. The current structure of non-speech sounds, however, is geared toward conveying user interface operations (e.g., opening a file) rather than large theme-based information (e.g., a history passage) and, thus, is ill-suited…

  2. Influences of an acoustic signal with ultrasound components on the acquisition of a defensive conditioned reflex in Wistar rats.

    PubMed

    Loseva, E V; Alekseeva, T G

    2007-06-01

    The effects of short (90 sec) exposures to a complex acoustic signal with ultrasound components on the acquisition of a defensive conditioned two-way avoidance reflex using an electric shock as the unconditioned stimulus in a shuttle box were studied in female Wistar rats. This stimulus induced audiogenic convulsions of different severities in 59% of the animals. A scale for assessing the ability of rats to acquire the conditioned two-way avoidance reflex was developed. Presentation of the complex acoustic signal was found to be a powerful stressor for Wistar rats, preventing the acquisition of the reflex in the early stages (four and six days) after presentation. This effect was independent of the presence and severity of audiogenic convulsions in the rats during presentation of the acoustic signal. On repeat training nine days after the acoustic signal (with the first session after four days), acquisition of the reflex was hindered (as compared with controls not presented with the acoustic signal). However, on repeat training at later time points (1.5 months after the complex acoustic signal, with the first session after six days), the rats rapidly achieved the learning criterion (10 correct avoidance responses in a row). On the other hand, if the acoustic signal was presented at different times (immediately or at three or 45 days) after the first training session, the animals' ability to acquire the reflex on repeat training was not impaired at either the early or late periods after exposure to the stressor. These results suggest that the complex acoustic signal impairs short-term memory (the process of acquisition of the conditioned two-way avoidance reflex at the early post-presentation time point) but has no effect on long-term memory or consolidation of the memory trace.

  3. Critical Issues in Airborne Applications of Speech Recognition.

    DTIC Science & Technology

    1979-01-01

    human's tongue, lips, and other articulators to get ready for the next vowel or consonant to be spoken, and to gradually move away from the...acoustic tube, so that formants and other interesting features of the speech signal could be more readily and accurately detected). Of particular

  4. "Perception of the speech code" revisited: Speech is alphabetic after all.

    PubMed

    Fowler, Carol A; Shankweiler, Donald; Studdert-Kennedy, Michael

    2016-03-01

    We revisit an article, "Perception of the Speech Code" (PSC), published in this journal 50 years ago (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967) and address one of its legacies concerning the status of phonetic segments, which persists in theories of speech today. In the perspective of PSC, segments both exist (in language as known) and do not exist (in articulation or the acoustic speech signal). Findings interpreted as showing that speech is not a sound alphabet, but, rather, phonemes are encoded in the signal, coupled with findings that listeners perceive articulation, led to the motor theory of speech perception, a highly controversial legacy of PSC. However, a second legacy, the paradoxical perspective on segments, has remained largely unquestioned. We remove the paradox by offering an alternative supported by converging evidence that segments exist in language both as known and as used. We support the existence of segments in both language knowledge and in production by showing that phonetic segments are articulatory and dynamic and that coarticulation does not eliminate them. We show that segments leave an acoustic signature that listeners can track. This suggests that speech is well-adapted to public communication in facilitating, not creating a barrier to, exchange of language forms.

  5. Quadratic Time-Frequency Analysis of Hydroacoustic Signals as Applied to Acoustic Emissions of Large Whales

    NASA Astrophysics Data System (ADS)

    Le Bras, Ronan; Victor, Sucic; Damir, Malnar; Götz, Bokelmann

    2014-05-01

    In order to enrich the set of attributes in setting up a large database of whale signals, as envisioned in the Baleakanta project, we investigate methods of time-frequency analysis. The purpose of establishing the database is to increase and refine knowledge of the emitted signal and of its propagation characteristics, leading to a better understanding of animal migrations in a non-invasive manner and to a characterization of acoustic propagation in oceanic media. The higher resolution for signal extraction and a better separation from other signals and noise will be used for various purposes, including improved signal detection and individual animal identification. The quadratic class of time-frequency distributions (TFDs) is the most popular set of time-frequency tools for analysis and processing of non-stationary signals. The two best-known and most-studied members of this class are the spectrogram and the Wigner-Ville distribution. However, to be used efficiently, i.e. to have highly concentrated signal components while significantly suppressing interference and noise simultaneously, TFDs need to be optimized first. The optimization method used in this paper is based on the Cross-Wigner-Ville distribution, and unlike similar approaches it does not require prior information on the analysed signal. The method is applied to whale signals, which, just like the majority of other real-life signals, can generally be classified as multicomponent non-stationary signals, and hence time-frequency techniques are a natural choice for their representation, analysis, and processing. We present processed data from a set containing hundreds of individual calls. The TFD optimization method results in a high-resolution time-frequency representation of the signals. It allows for a simple extraction of signal components from the TFD's dominant ridges. The local peaks of those ridges can then be used for estimating the instantaneous frequency of the signal components, which in turn can be used as
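
    For orientation only, the sketch below computes the simplest member of the quadratic TFD class, the spectrogram, for a synthetic whale-like down-sweep and extracts a crude dominant ridge per time frame. It does not reproduce the optimized Cross-Wigner-Ville distribution described in the abstract, and the signal parameters are invented.

```python
"""Sketch: spectrogram (simplest quadratic TFD) of a synthetic call plus
a crude ridge / instantaneous-frequency extraction."""
import numpy as np
from scipy.signal import spectrogram, chirp

fs = 2000.0
t = np.arange(0, 4.0, 1 / fs)
# Synthetic down-sweeping call, loosely mimicking a baleen whale vocalization.
x = chirp(t, f0=400, f1=80, t1=4.0, method="logarithmic")
x += 0.2 * np.random.randn(t.size)

f, frames, Sxx = spectrogram(x, fs=fs, nperseg=256, noverlap=192)

# Dominant ridge: frequency of the spectral peak in each time slice.
ridge = f[np.argmax(Sxx, axis=0)]
print("instantaneous-frequency estimate (Hz), first 5 frames:", ridge[:5])
```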

  6. Similarity assessment of acoustic emission signals and its application in source localization.

    PubMed

    Chen, Shiwan; Yang, Chunhe; Wang, Guibin; Liu, Wei

    2017-03-01

    In conventional acoustic emission (AE) source localization, AE signals are applied directly to localize the source without any waveform identification or quality evaluation, which often leads to large localization errors. To improve the reliability and accuracy of AE source localization, an identification procedure is developed to assess the similarity of AE signals and to select high-quality signals for localizing the AE source. Magnitude squared coherence (MSC), wavelet coherence and dynamic time warping (DTW) are successively applied for similarity assessment. Results show that cluster analysis based on DTW distance is effective in selecting AE signals with high similarity. Similarity assessment results of the proposed method are almost completely consistent with manual identification. A novel AE source localization procedure is developed by combining the selected high-quality AE signals with a direct source localization algorithm. AE data from thermal-cracking tests on Beishan granite are analyzed to demonstrate the effectiveness of the proposed AE localization procedure. AE events are re-localized by the proposed procedure, and the accuracy of event localization is improved significantly. The reliability and credibility of AE source localization are thus improved by the proposed method.
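
    A minimal sketch of the DTW-based similarity step follows, assuming synthetic AE bursts: a textbook dynamic-programming DTW distance is computed pairwise and the resulting distance matrix is clustered hierarchically. The MSC and wavelet-coherence stages of the paper are not reproduced, and all signal parameters are illustrative.

```python
"""Sketch: DTW distance between AE waveforms and hierarchical clustering
on the resulting distance matrix (synthetic data)."""
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def dtw_distance(a, b):
    # Classic O(len(a)*len(b)) dynamic-programming DTW with absolute cost.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 200)
# Two families of synthetic AE bursts differing in decay rate.
signals = [np.exp(-t / tau) * np.sin(2 * np.pi * 30 * t)
           + 0.05 * rng.standard_normal(t.size)
           for tau in [0.10, 0.12, 0.11, 0.40, 0.45, 0.42]]

n = len(signals)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = dtw_distance(signals[i], signals[j])

labels = fcluster(linkage(squareform(D), method="average"), t=2, criterion="maxclust")
print("cluster labels:", labels)  # similar waveforms should share a label
```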

  7. Seismo-acoustic Signals Recorded at KSIAR, the Infrasound Array Installed at PS31

    NASA Astrophysics Data System (ADS)

    Kim, T. S.; Che, I. Y.; Jeon, J. S.; Chi, H. C.; Kang, I. B.

    2014-12-01

    One of the International Monitoring System (IMS) primary seismic stations, PS31, called the Korea Seismic Research Station (KSRS), was installed near Wonju, Korea in the 1970s. It has been operated by the US Air Force Technical Applications Center (AFTAC) for more than 40 years. KSRS is composed of 26 seismic sensors: 19 short-period, 6 long-period and 1 broadband seismometer. The 19 short-period sensors form an array with a 10-km aperture, while the 6 long-period sensors form a longer-period array with a 40-km aperture. After KSRS was certified as an IMS station in 2006 by the Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO), the Korea Institute of Geoscience and Mineral Resources (KIGAM), which is the Korean National Data Center, began taking over responsibility for the operation and maintenance of KSRS from AFTAC. In April 2014, KIGAM installed an infrasound array, KSIAR, at four existing short-period seismic sites of KSRS: KS05, KS06, KS07 and KS16. The collocated KSIAR changed KSRS from a seismic array into a seismo-acoustic array. The aperture of KSIAR is 3.3 km, and KSIAR also has a 100-m small-aperture infrasound array at KS07. The infrasound data from KSIAR, except those from site KS06, are transmitted in real time to KIGAM over a VPN and internet line. An initial analysis of seismo-acoustic signals originating from local and regional distance ranges has been performed since May 2014. The analysis, using an array process called Progressive Multi-Channel Correlation (PMCC), detected seismo-acoustic signals caused by various sources, including small explosions related to the construction of local tunnels and roads. Some of them were not found in the automatic bulletin of KIGAM. The seismo-acoustic signals recorded by KSIAR provide useful information for discriminating local and regional man-made events from natural events.

  8. Automatic parameter optimization in epsilon-filter for acoustical signal processing utilizing correlation coefficient.

    PubMed

    Abe, Tomomi; Hashimoto, Shuji; Matsumoto, Mitsuharu

    2010-02-01

    The epsilon-filter can reduce most kinds of noise from a single-channel noisy signal while preserving signals that vary drastically, such as speech signals. It can reduce not only stationary noise but also nonstationary noise. However, it has some parameters whose values are usually set empirically. So far, there have been few studies evaluating the appropriateness of the parameter settings for the epsilon-filter. This paper employs the correlation coefficient between the filter output and the difference between the filter input and output as the evaluation function of the parameter setting. This paper also describes an algorithm to set the optimal parameter value of the epsilon-filter automatically. To evaluate the adequacy of the obtained parameter, the mean absolute error is calculated. The experimental results show that an adequate parameter for the epsilon-filter can be obtained automatically by using the proposed method.
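
    The sketch below shows one plausible reading of this scheme, assuming the common window-average form of the epsilon-filter: epsilon is chosen from a candidate grid by minimizing the magnitude of the correlation between the filter output and the input-output difference. The filter form, the candidate grid, and the test signal are assumptions, not the authors' exact algorithm.

```python
"""Sketch: epsilon-filter with automatic parameter selection via the
output/residual correlation coefficient (illustrative only)."""
import numpy as np

def epsilon_filter(x, eps, half_window=5):
    y = np.empty_like(x)
    for n in range(len(x)):
        lo, hi = max(0, n - half_window), min(len(x), n + half_window + 1)
        w = x[lo:hi].copy()
        # Neighbors differing from the centre by more than eps are replaced
        # by the centre value, so sharp transitions (e.g. speech) survive.
        w[np.abs(w - x[n]) > eps] = x[n]
        y[n] = w.mean()
    return y

def choose_epsilon(x, candidates):
    best_eps, best_score = None, np.inf
    for eps in candidates:
        y = epsilon_filter(x, eps)
        score = abs(np.corrcoef(y, x - y)[0, 1])  # output and residual should be uncorrelated
        if score < best_score:
            best_eps, best_score = eps, score
    return best_eps

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 2000)
clean = np.sign(np.sin(2 * np.pi * 3 * t))          # abrupt "speech-like" transitions
noisy = clean + 0.3 * rng.standard_normal(t.size)
print("selected epsilon:", choose_epsilon(noisy, np.linspace(0.2, 2.0, 10)))
```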

  9. Acoustic measurements of articulator motions.

    PubMed

    Schroeder, M R; Strube, H W

    1979-01-01

    Methods for estimating articulatory data from acoustic measurements are reviewed. First, relations between the vocal-tract area function and formant or impedance data are pointed out. Then the possibility of determining a (discretized) area function from the speech signal itself is considered. Finally, we look at the estimation of certain articulatory parameters rather than the area function. By using a regression method, such parameters can even be estimated independently of any vocal-tract model. Results for real-speech data are given.

  10. Processing of acoustic signals in grasshoppers - a neuroethological approach towards female choice.

    PubMed

    Ronacher, Bernhard; Stange, Nicole

    2013-01-01

    Acoustic communication is a major factor for mate attraction in many grasshopper species and thus plays a vital role in a grasshopper's life. First of all, the recognition of species-specific sound patterns is crucial for preventing hybridization with other species, which would result in a drastic fitness loss. In addition, there is evidence that females are choosy with respect to conspecific males and prefer or reject the songs of some individuals, thereby exerting sexual selection on males. Remarkably, the preferences of females are preserved even under masking noise. Discriminating between the basically similar signals of conspecifics is obviously a challenge for small nervous systems. We therefore ask how the acoustic signals are processed and represented in the grasshopper's nervous system to allow for fine discrimination and assessment of individual songs. The discrimination of similar signals may be impeded not only by signal masking due to external noise sources, but also by intrinsic noise due to the inherent variability of spike trains. Using a spike train metric we could estimate how well, in principle, the songs of different individuals can be discriminated on the basis of neuronal responses, and found a remarkable potential for discrimination performance at the first stage, but not at higher stages, of the auditory pathway. Next, we ask what benefits a grasshopper female may gain from being choosy. New results, which revealed correlations between specific song features and the size and immunocompetence of the males, suggest that females may derive clues about the condition and health of the sending male from his acoustic signals. However, we observed substantial differences between the preference functions of individual females, and it may be particularly rewarding to relate the variation in female preferences to individual differences in the responses of identified neurons.

  11. Classroom acoustics: Three pilot studies

    NASA Astrophysics Data System (ADS)

    Smaldino, Joseph J.

    2005-04-01

    This paper summarizes three related pilot projects designed to focus on the possible effects of classroom acoustics on fine auditory discrimination as it relates to language acquisition, especially English as a second language. The first study investigated the influence of improving the signal-to-noise ratio on the differentiation of English phonemes. The results showed better differentiation with better signal-to-noise ratio. The second studied speech perception in noise by young adults for whom English was a second language. The outcome indicated that the second language learners required a better signal-to-noise ratio to perform equally to the native language participants. The last study surveyed the acoustic conditions of preschool and day care classrooms, wherein first and second language learning occurs. The survey suggested an unfavorable acoustic environment for language learning.

  12. Extruded Bread Classification on the Basis of Acoustic Emission Signal With Application of Artificial Neural Networks

    NASA Astrophysics Data System (ADS)

    Świetlicka, Izabela; Muszyński, Siemowit; Marzec, Agata

    2015-04-01

    The presented work covers the problem of developing a method for extruded bread classification with the application of artificial neural networks. Extruded flat graham, corn, and rye breads differing in water activity were used. The breads were subjected to a compression test with simultaneous registration of the acoustic signal. The amplitude-time records were analyzed both in the time and frequency domains. Acoustic emission signal parameters (single energy, counts, amplitude, and duration) were determined for the breads at four water activities: initial (0.362 for rye, 0.377 for corn, and 0.371 for graham bread), 0.432, 0.529, and 0.648. For classification and clustering, radial basis function networks and self-organizing maps (Kohonen networks) were used. The artificial neural networks were examined with respect to their ability to classify or cluster samples according to bread type, water activity value, or both. The best results were achieved by the radial basis function network in classification according to water activity (88%), while the self-organizing map network yielded 81% in bread type clustering.

  13. Identification of blasting sources in the Dobrogea seismogenic region, Romania using seismo-acoustic signals

    NASA Astrophysics Data System (ADS)

    Ghica, Daniela Veronica; Grecu, Bogdan; Popa, Mihaela; Radulian, Mircea

    2016-10-01

    In order to discriminate between quarry blasts and earthquakes observed in the Dobrogea seismogenic region, a seismo-acoustic analysis was performed on 520 events listed in the updated Romanian seismic catalogue from January 2011 to December 2012. During this time interval, 104 seismo-acoustic events observed at distances between 110 and 230 km and in the backazimuth interval 110-160° from the IPLOR infrasound array were identified as explosions by association with infrasonic signals. The WinPMCC software for interactive analysis was applied to detect and characterize infrasonic signals in terms of backazimuth, speed and frequency content. The measured and expected values of both backazimuths and arrival times for the studied events were compared in order to identify the sources of infrasound. Two predominant alignment directions of the seismo-acoustic sources were observed, corresponding to the northern and central parts of Dobrogea, and these directions are further considered as references in the process of discriminating explosions from earthquakes. A predominance of high-frequency detections (above 1 Hz) is also observed in the infrasound data. The strong influence of seasonally dependent stratospheric winds on the IPLOR detection capability limits the efficiency of the discrimination procedure proposed by this study.

  14. Temporal patterns in the acoustic signals of beaked whales at Cross Seamount.

    PubMed

    Johnston, D W; McDonald, M; Polovina, J; Domokos, R; Wiggins, S; Hildebrand, J

    2008-04-23

    Seamounts may influence the distribution of marine mammals through a combination of increased ocean mixing, enhanced local productivity and greater prey availability. To study the effects of seamounts on the presence and acoustic behaviour of cetaceans, we deployed a high-frequency acoustic recording package on the summit of Cross Seamount during April through October 2005. The most frequently detected cetacean vocalizations were echolocation sounds similar to those produced by ziphiid and mesoplodont beaked whales together with buzz-type signals consistent with prey-capture attempts. Beaked whale signals occurred almost entirely at night throughout the six-month deployment. Measurements of prey presence with a Simrad EK-60 fisheries acoustics echo sounder indicate that Cross Seamount may enhance local productivity in near-surface waters. Concentrations of micronekton were aggregated over the seamount in near-surface waters at night, and dense concentrations of nekton were detected across the surface of the summit. Our results suggest that seamounts may provide enhanced foraging opportunities for beaked whales during the night through a combination of increased productivity, vertical migrations by micronekton and local retention of prey. Furthermore, the summit of the seamount may act as a barrier against which whales concentrate prey.

  15. Signal Processing Methods for Removing the Effects of Whole Body Vibration upon Speech

    NASA Technical Reports Server (NTRS)

    Bitner, Rachel M.; Begault, Durand R.

    2014-01-01

    Humans may be exposed to whole-body vibration in environments where clear speech communication is crucial, particularly during the launch phases of space flight and in high-performance aircraft. Prior research has shown that high levels of vibration cause a decrease in speech intelligibility. However, the effects of whole-body vibration upon speech are not well understood, and no attempt has been made to restore speech distorted by whole-body vibration. In this paper, a model for speech under whole-body vibration is proposed and a method to remove its effect is described. The method described reduces the perceptual effects of vibration, yields higher ASR accuracy scores, and may significantly improve intelligibility. Possible applications include incorporation within communication systems to improve radio communication in environments such as spaceflight, aviation, or off-road vehicle operations.

  16. Brain estrogen signaling and acute modulation of acoustic communication behaviors: a working hypothesis

    PubMed Central

    Remage-Healey, Luke

    2013-01-01

    Although estrogens are widely considered circulating ‘sex steroid hormones’ typically associated with female reproduction, recent evidence suggests that estrogens can act as local modulators of brain circuits in both males and females. Functional implications of this newly-characterized estrogen signaling system have begun to emerge. This essay summarizes evidence in support of the hypothesis that the rapid production of estrogens in brain circuits can drive acute changes in both the production and perception of acoustic communication behaviors. These studies reveal two fundamental neurobiological concepts: 1) estrogens can be produced locally in brain circuits independent of levels in nearby circuits and in the circulation, and 2) estrogens can have very rapid effects within these brain circuits to modulate social vocalizations, acoustic processing, and sensorimotor integration. This research relies on a vertebrate-wide span of investigations, including vocalizing fishes, amphibians and birds, emphasizing the importance of comparative model systems in understanding principles of neurobiology. PMID:23065844

  17. Wavelet Transform Of Acoustic Signal From A Ranque- Hilsch Vortex Tube

    NASA Astrophysics Data System (ADS)

    Istihat, Y.; Wisnoe, W.

    2015-09-01

    This paper presents the frequency analysis of the flow in a Ranque-Hilsch Vortex Tube (RHVT) obtained from the acoustic signal recorded by microphones in an isolated formation setup. A Data Acquisition System (DAS) incorporating an Analog to Digital Converter (ADC) with a laptop computer was used to acquire the waveform data. Different inlet pressures (20, 30, 40, 50 and 60 psi) were supplied and temperature differences were recorded. Frequencies produced by the RHVT were experimentally measured and analyzed by means of the Wavelet Transform (WT). The Morlet wavelet was used, and the relation between pressure variation, temperature and frequency was studied. The acoustic data were analyzed using Matlab® and a time-frequency analysis (scalogram) is presented. Results show that the pressure is proportional to the frequency inside the RHVT, with two distinct working frequencies pronounced between 4 and 8 kHz.
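
    As a small illustration of the scalogram analysis, the sketch below computes a Morlet continuous wavelet transform of a synthetic two-tone signal with components in the 4-8 kHz band reported above, using PyWavelets rather than the authors' Matlab workflow. The sampling rate, scale range, and test signal are illustrative assumptions.

```python
"""Sketch: Morlet CWT (scalogram) of a synthetic acoustic record standing
in for the RHVT microphone data."""
import numpy as np
import pywt

fs = 44100.0
t = np.arange(0, 0.05, 1 / fs)
# Two "working frequencies" in the 4-8 kHz band, as reported in the abstract.
x = np.sin(2 * np.pi * 5000 * t) + 0.7 * np.sin(2 * np.pi * 7000 * t)
x += 0.2 * np.random.randn(t.size)

scales = np.arange(2, 64)
coef, freqs = pywt.cwt(x, scales, "morl", sampling_period=1 / fs)

# Scale-averaged power highlights the dominant frequencies.
power = np.mean(np.abs(coef) ** 2, axis=1)
for f, p in sorted(zip(freqs, power), key=lambda fp: -fp[1])[:3]:
    print(f"{f:7.0f} Hz  power {p:.3f}")
```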

  18. The effect of artificial rain on backscattered acoustic signal: first measurements

    NASA Astrophysics Data System (ADS)

    Titchenko, Yuriy; Karaev, Vladimir; Meshkov, Evgeny; Goldblat, Vladimir

    The influence of rain on the characteristics of ultrasonic and microwave signals backscattered by the water surface is considered. The influence of rain on the backscattering of electromagnetic waves has been investigated in laboratory and field experiments, for example [1-3]. Raindrops have a significant impact on microwave backscattering and affect the accuracy of wave spectrum measurements made by a string wave gauge. This occurs due to the presence of raindrops in the atmosphere and modification of the water surface. For measurements of water surface characteristics during precipitation we propose to use an acoustic system. This allows us to obtain water surface parameters independently of precipitation in the atmosphere. Measurements of the significant wave height of the water surface using underwater acoustic systems are well known [4, 5]. Moreover, the variance of the orbital velocity can be measured using these systems. However, these methods cannot be used for measurements of slope variance and the other second statistical moments of the water surface that are required for analyzing the radar backscatter signal. An original-design Doppler underwater acoustic wave gauge allows direct measurement of the surface roughness characteristics that affect the backscattering of electromagnetic waves of the same wavelength [6]. The acoustic wave gauge is a Doppler ultrasonic sonar fixed near the bottom on a floating disk. Measurements are carried out with the sonar antennas oriented vertically towards the water surface. The first experiments were conducted with the first model of the acoustic wave gauge. The acoustic wave gauge (8 mm wavelength) is equipped with a transceiving antenna with a wide symmetrical antenna pattern. The gauge allows us to measure the Doppler spectrum and cross section of the backscattered signal. The variance of the vertical component of the orbital velocity can be retrieved from the Doppler spectrum with high accuracy. The results of laboratory and field experiments during artificial rain are presented.

  19. A hardware model of the auditory periphery to transduce acoustic signals into neural activity

    PubMed Central

    Tateno, Takashi; Nishikawa, Jun; Tsuchioka, Nobuyoshi; Shintaku, Hirofumi; Kawano, Satoyuki

    2013-01-01

    To improve the performance of cochlear implants, we have integrated a microdevice into a model of the auditory periphery with the goal of creating a microprocessor. We constructed an artificial peripheral auditory system using a hybrid model in which polyvinylidene difluoride was used as a piezoelectric sensor to convert mechanical stimuli into electric signals. To produce frequency selectivity, the slit on a stainless steel base plate was designed such that the local resonance frequency of the membrane over the slit reflected the transfer function. In the acoustic sensor, electric signals were generated based on the piezoelectric effect from local stress in the membrane. The electrodes on the resonating plate produced relatively large electric output signals. The signals were fed into a computer model that mimicked some functions of inner hair cells, inner hair cell–auditory nerve synapses, and auditory nerve fibers. In general, the responses of the model to pure-tone burst and complex stimuli accurately represented the discharge rates of high-spontaneous-rate auditory nerve fibers across a range of frequencies greater than 1 kHz and middle to high sound pressure levels. Thus, the model provides a tool to understand information processing in the peripheral auditory system and a basic design for connecting artificial acoustic sensors to the peripheral auditory nervous system. Finally, we discuss the need for stimulus control with an appropriate model of the auditory periphery based on auditory brainstem responses that were electrically evoked by different temporal pulse patterns with the same pulse number. PMID:24324432

  20. Demodulation of acoustic telemetry binary phase shift keying signal based on high-order Duffing system

    NASA Astrophysics Data System (ADS)

    Yan, Bing-Nan; Liu, Chong-Xin; Ni, Jun-Kang; Zhao, Liang

    2016-10-01

    In order to grasp the downhole situation immediately, logging while drilling (LWD) technology is adopted. One of the LWD technologies, called acoustic telemetry, can be successfully applied to modern drilling. It is critical for acoustic telemetry technology that the signal is successfully transmitted to the ground. In this paper, binary phase shift keying (BPSK) is used to modulate carrier waves for the transmission and a new BPSK demodulation scheme based on Duffing chaos is investigated. Firstly, a high-order system is given in order to enhance the signal detection capability and it is realized through building a virtual circuit using an electronic workbench (EWB). Secondly, a new BPSK demodulation scheme is proposed based on the intermittent chaos phenomena of the new Duffing system. Finally, a system variable crossing zero-point equidistance method is proposed to obtain the phase difference between the system and the BPSK signal. Then it is determined that the digital signal transmitted from the bottom of the well is ‘0’ or ‘1’. The simulation results show that the demodulation method is feasible. Project supported by the National Natural Science Foundation of China (Grant No. 51177117) and the National Key Science & Technology Special Projects, China (Grant No. 2011ZX05021-005).
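
    For context, the sketch below shows plain BPSK modulation and a conventional coherent (correlator-based) demodulator; it is explicitly not the Duffing-oscillator detector of the paper, only an illustration of the BPSK signalling that the telemetry scheme relies on. Carrier frequency, symbol rate, and noise level are arbitrary choices.

```python
"""Sketch: BPSK modulation and a conventional coherent demodulator
(a swapped-in stand-in for the chaos-based detector of the paper)."""
import numpy as np

fs, fc, baud = 10000.0, 1000.0, 100.0      # sample rate, carrier, symbol rate
spb = int(fs / baud)                       # samples per bit
bits = np.array([1, 0, 1, 1, 0, 0, 1, 0])

# BPSK: phase 0 for '1', phase pi for '0'.
t = np.arange(bits.size * spb) / fs
symbols = np.repeat(2 * bits - 1, spb)     # +1 / -1
tx = symbols * np.cos(2 * np.pi * fc * t)
rx = tx + 0.5 * np.random.randn(tx.size)   # additive channel noise

# Coherent demodulation: correlate each bit interval with the reference carrier.
ref = np.cos(2 * np.pi * fc * t)
decisions = []
for k in range(bits.size):
    seg = slice(k * spb, (k + 1) * spb)
    decisions.append(1 if np.dot(rx[seg], ref[seg]) > 0 else 0)

print("sent:    ", bits.tolist())
print("decoded: ", decisions)
```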

  1. a Wavelet Model for Vocalic Speech Coarticulation

    NASA Astrophysics Data System (ADS)

    Lange, Robert Charles

    A known aspect of human speech is that a vowel produced in isolation (for example, "ee") is acoustically different from a production of the same vowel in the company of two consonants ("deed"). This phenomenon, natural to the speech of any language, is known as consonant-vowel-consonant coarticulation. The effect of coarticulation results when a speech segment ("d") dynamically influences the articulation of an adjacent segment ("ee" within "deed"). A recent development in the theory of wavelet signal processing is wavelet system characterization. In wavelet system theory, the wavelet transform is used to describe the time-frequency behavior of a transmission channel, by virtue of its ability to describe the time-frequency content of the system's input and output signals. The present research proposes a wavelet-system model for speech coarticulation, wherein the system is the process of transformation from a control speech state (input) to an effected speech state (output). Specifically, a vowel produced in isolation is transformed into an effected version of the same vowel produced in consonant-vowel-consonant context, via the "coarticulation channel". Quantitatively, the channel is determined by the wavelet transform of the effected vowel's signal, using the control vowel's signal as the mother wavelet. A practical experiment is conducted to evaluate the coarticulation channel using samples of real speech. The results show that the model is capable of depicting coarticulation effects associated with certain vowel-consonant combinations. They suggest that elements of the vowel's acoustic composition are continuously present, in a modified form, throughout the consonant-vowel transition. For other phonetic combinations, however, the model does not respond to instances of segmental transition in a characteristic way. The conclusions drawn from the study are that the wavelet techniques employed here are effective tools for the general analysis of speech sounds, and can
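
    A very rough sketch of the coarticulation-channel idea follows: a synthetic 'control' vowel is used as the mother wavelet, and scaled, translated copies of it are correlated against a synthetic 'effected' vowel to form a coefficient matrix. The scaling grid, the stand-in vowel signals, and the normalization are assumptions; real formant-rich speech would be needed for meaningful results.

```python
"""Sketch: using a 'control' vowel as the mother wavelet and correlating
scaled, translated copies of it with an 'effected' vowel (synthetic data)."""
import numpy as np

def analyze(effected, control, scales):
    coeffs = np.zeros((len(scales), len(effected)))
    for i, s in enumerate(scales):
        # Time-scale the control vowel (the "mother wavelet") by factor s.
        n = max(4, int(round(len(control) * s)))
        scaled = np.interp(np.linspace(0, len(control) - 1, n),
                           np.arange(len(control)), control)
        scaled /= np.sqrt(np.sum(scaled ** 2)) + 1e-12   # unit energy
        # Translation is handled by cross-correlation along the signal.
        coeffs[i] = np.correlate(effected, scaled, mode="same")
    return coeffs

fs = 8000
t = np.arange(0, 0.05, 1 / fs)
control = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 2300 * t)   # stand-in isolated vowel
effected = np.sin(2 * np.pi * 280 * t) + 0.5 * np.sin(2 * np.pi * 2100 * t)  # stand-in "deed"-like vowel

coeffs = analyze(effected, control, scales=[0.7, 0.8, 0.9, 1.0])
best = np.unravel_index(np.argmax(np.abs(coeffs)), coeffs.shape)
print("best-matching scale index and sample offset:", best)
```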

  2. Nondestructive evaluation of steels using acoustic and magnetic Barkhausen signals - I. Effect of carbide precipitation and hardness

    SciTech Connect

    Kameda, J.; Ranjan, R.

    1987-07-01

    The effect of microstructure on acoustic and magnetic Barkhausen signals has been investigated in a quenched and tempered steel and in spheroidized steels with various carbon contents. A major peak of the acoustic Barkhausen signal was induced when the magnetic field was increased from zero to the saturation state. A minor peak of the acoustic signal and a single peak of the magnetic signal appeared during the decreasing field. The peak value of the acoustic Barkhausen signal shows a linear dependence on the sweep rate of the magnetic field, while that of the magnetic Barkhausen signal shows a nonlinear one. Increasing the tempering temperature, which gives rise to a decrease in hardness and an increase in carbide size and spacing, caused the acoustic and magnetic Barkhausen peak voltages to increase precipitously and gradually, respectively. In the spheroidized steels, the acoustic peak voltage monotonically decreased with increasing carbon content from 0.17 to 0.96 wt%, and the magnetic peak voltage was greatest when the carbon content was 0.46 wt%.

  3. Associations between tongue movement pattern consistency and formant movement pattern consistency in response to speech behavioral modifications.

    PubMed

    Mefferd, Antje S

    2016-11-01

    The degree of speech movement pattern consistency can provide information about speech motor control. Although tongue motor control is particularly important because of the tongue's primary contribution to the speech acoustic signal, capturing tongue movements during speech remains difficult and costly. This study sought to determine if formant movements could be used to estimate tongue movement pattern consistency indirectly. Two age groups (seven young adults and seven older adults) and six speech conditions (typical, slow, loud, clear, fast, and bite block speech) were selected to elicit an age- and task-dependent performance range in tongue movement pattern consistency. Kinematic and acoustic spatiotemporal indexes (STI) were calculated based on sentence-length tongue movement and formant movement signals, respectively. Kinematic and acoustic STI values showed strong associations across talkers and moderate to strong associations for each talker across speech tasks, although in cases where task-related tongue motor performance changes were relatively small, the acoustic STI values were poorly associated with kinematic STI values. These findings suggest that, depending on the sensitivity needs, formant movement pattern consistency could be used in lieu of direct kinematic analysis to indirectly examine speech motor control.
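
    The sketch below computes a spatiotemporal index in its commonly used form (amplitude z-scoring, linear time normalization, summed across-repetition standard deviations), which is assumed here to correspond to the kinematic and acoustic STI of the abstract; the repetition data are synthetic.

```python
"""Sketch: a spatiotemporal index (STI) computed from repeated movement
or formant trajectories (synthetic repetitions)."""
import numpy as np

def spatiotemporal_index(trials, n_points=50):
    normed = []
    for x in trials:
        x = (x - x.mean()) / x.std()                        # amplitude normalization
        u = np.linspace(0, 1, len(x))
        normed.append(np.interp(np.linspace(0, 1, n_points), u, x))  # time normalization
    # Sum of standard deviations across repetitions at fixed relative times.
    return float(np.sum(np.std(np.vstack(normed), axis=0)))

rng = np.random.default_rng(3)
template = np.sin(2 * np.pi * np.linspace(0, 2, 300))
trials = []
for _ in range(10):
    n = 300 - rng.integers(0, 30)                           # small duration jitter
    x = template[:n] * (1 + 0.1 * rng.standard_normal())    # small amplitude jitter
    trials.append(x + 0.05 * rng.standard_normal(n))

print("STI:", round(spatiotemporal_index(trials), 3))
```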

  4. Time course of a perceptual enhancement effect for noise-masked speech in reverberant environments.

    PubMed

    Brandewie, Eugene; Zahorik, Pavel

    2013-08-01

    Speech intelligibility has been shown to improve with prior exposure to a reverberant room environment [Brandewie and Zahorik (2010). J. Acoust. Soc. Am. 128, 291-299] with a spatially separated noise masker. Here, this speech enhancement effect was examined in multiple room environments using carrier phrases of varying lengths in order to control the amount of exposure. Speech intelligibility enhancement of between 5% and 18% was observed with as little as 850 ms of exposure, although the effect's time course varied considerably with reverberation and signal-to-noise ratio. In agreement with previous work, greater speech enhancement was found for reverberant environments compared to anechoic space.

  5. A comparison of automatic and human speech recognition in null grammar.

    PubMed

    Juneja, Amit

    2012-03-01

    The accuracy of automatic speech recognition (ASR) systems is generally evaluated using corpora of grammatically sound read speech or natural spontaneous speech. This prohibits an accurate estimation of the performance of the acoustic modeling part of ASR because the language modeling performance is inherently integrated into the overall performance metric. In this work, ASR and human speech recognition (HSR) accuracies are compared for null-grammar sentences at different signal-to-noise ratios and vocabulary sizes of 1000, 2000, 4000, and 8000 words. The results shed light on differences between ASR and HSR in the relative significance of bottom-up word recognition and context awareness.

  6. Motor representations of articulators contribute to categorical perception of speech sounds.

    PubMed

    Möttönen, Riikka; Watkins, Kate E

    2009-08-05

    Listening to speech modulates activity in human motor cortex. It is unclear, however, whether the motor cortex has an essential role in speech perception. Here, we aimed to determine whether the motor representations of articulators contribute to categorical perception of speech sounds. Categorization of continuously variable acoustic signals into discrete phonemes is a fundamental feature of speech communication. We used repetitive transcranial magnetic stimulation (rTMS) to temporarily disrupt the lip representation in the left primary motor cortex. This disruption impaired categorical perception of artificial acoustic continua ranging between two speech sounds that differed in place of articulation, in that the vocal tract is opened and closed rapidly either with the lips or the tip of the tongue (/ba/-/da/ and /pa/-/ta/). In contrast, it did not impair categorical perception of continua ranging between speech sounds that do not involve the lips in their articulation (/ka/-/ga/ and /da/-/ga/). Furthermore, an rTMS-induced disruption of the hand representation had no effect on categorical perception of either of the tested continua (/ba/-/da/ and /ka/-/ga/). These findings indicate that motor circuits controlling production of speech sounds also contribute to their perception. Mapping acoustically highly variable speech sounds onto less variable motor representations may facilitate their phonemic categorization and be important for robust speech perception.

  7. A Novel Method for Speech Acquisition and Enhancement by 94 GHz Millimeter-Wave Sensor

    PubMed Central

    Chen, Fuming; Li, Sheng; Li, Chuantao; Liu, Miao; Li, Zhao; Xue, Huijun; Jing, Xijing; Wang, Jianqi

    2015-01-01

    In order to improve the speech acquisition ability of a non-contact method, a 94 GHz millimeter wave (MMW) radar sensor was employed to detect speech signals. This novel non-contact speech acquisition method was shown to have high directional sensitivity, and to be immune to strong acoustical disturbance. However, MMW radar speech is often degraded by combined sources of noise, which mainly include harmonic, electrical circuit and channel noise. In this paper, an algorithm combining empirical mode decomposition (EMD) and mutual information entropy (MIE) was proposed for enhancing the perceptibility and intelligibility of radar speech. Firstly, the radar speech signal was adaptively decomposed into oscillatory components called intrinsic mode functions (IMFs) by EMD. Secondly, MIE was used to determine the number of reconstructive components, and then an adaptive threshold was employed to remove the noise from the radar speech. The experimental results show that human speech can be effectively acquired by a 94 GHz MMW radar sensor when the detection distance is 20 m. Moreover, the noise of the radar speech is greatly suppressed and the speech sounds become more pleasant to human listeners after being enhanced by the proposed algorithm, suggesting that this novel speech acquisition and enhancement method will provide a promising alternative for various applications associated with speech detection. PMID:26729126
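
    A reduced sketch of the denoising idea follows, assuming the PyEMD package is available: the noisy record is decomposed into IMFs and components are retained by a simple correlation test against the raw signal, which stands in for the mutual-information-entropy criterion of the paper. The threshold and the synthetic stand-in for the radar speech signal are illustrative.

```python
"""Sketch: EMD-based denoising with a crude correlation-based component
selection (stand-in for the MIE criterion)."""
import numpy as np
from PyEMD import EMD   # assumed installed (pip package: EMD-signal)

fs = 8000
t = np.arange(0, 0.5, 1 / fs)
clean = np.sin(2 * np.pi * 200 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))  # speech-like AM tone
noisy = clean + 0.4 * np.random.randn(t.size)

imfs = EMD().emd(noisy)                       # intrinsic mode functions

# Keep IMFs that correlate appreciably with the raw signal.
keep = [imf for imf in imfs if abs(np.corrcoef(imf, noisy)[0, 1]) > 0.2]
denoised = np.sum(keep, axis=0) if keep else noisy

print(f"{len(imfs)} IMFs extracted, {len(keep)} retained")
print("residual error power: %.4f" % np.mean((denoised - clean) ** 2))
```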

  8. A Novel Method for Speech Acquisition and Enhancement by 94 GHz Millimeter-Wave Sensor.

    PubMed

    Chen, Fuming; Li, Sheng; Li, Chuantao; Liu, Miao; Li, Zhao; Xue, Huijun; Jing, Xijing; Wang, Jianqi

    2015-12-31

    In order to improve the speech acquisition ability of a non-contact method, a 94 GHz millimeter wave (MMW) radar sensor was employed to detect speech signals. This novel non-contact speech acquisition method was shown to have high directional sensitivity, and to be immune to strong acoustical disturbance. However, MMW radar speech is often degraded by combined sources of noise, which mainly include harmonic, electrical circuit and channel noise. In this paper, an algorithm combining empirical mode decomposition (EMD) and mutual information entropy (MIE) was proposed for enhancing the perceptibility and intelligibility of radar speech. Firstly, the radar speech signal was adaptively decomposed into oscillatory components called intrinsic mode functions (IMFs) by EMD. Secondly, MIE was used to determine the number of reconstructive components, and then an adaptive threshold was employed to remove the noise from the radar speech. The experimental results show that human speech can be effectively acquired by a 94 GHz MMW radar sensor when the detection distance is 20 m. Moreover, the noise of the radar speech is greatly suppressed and the speech sounds become more pleasant to human listeners after being enhanced by the proposed algorithm, suggesting that this novel speech acquisition and enhancement method will provide a promising alternative for various applications associated with speech detection.

  9. Comparison of a short-time speech-based intelligibility metric to the speech transmission index and intelligibility data.

    PubMed

    Payton, Karen L; Shrestha, Mona

    2013-11-01

    Several algorithms have been shown to generate a metric corresponding to the Speech Transmission Index (STI) using speech as a probe stimulus [e.g., Goldsworthy and Greenberg, J. Acoust. Soc. Am. 116, 3679-3689 (2004)]. The time-domain approaches work well on long speech segments and have the added potential to be used for short-time analysis. This study investigates the performance of the Envelope Regression (ER) time-domain STI method as a function of window length, in acoustically degraded environments with multiple talkers and speaking styles. The ER method is compared with a short-time Theoretical STI, derived from octave-band signal-to-noise ratios and reverberation times. For windows as short as 0.3 s, the ER method tracks short-time Theoretical STI changes in stationary speech-shaped noise, fluctuating restaurant babble and stationary noise plus reverberation. The metric is also compared to intelligibility scores on conversational speech and speech articulated clearly but at normal speaking rates (Clear/Norm) in stationary noise. Correlation between the metric and intelligibility scores is high and, consistent with the subject scores, the metrics are higher for Clear/Norm speech than for conversational speech and higher for the first word in a sentence than for the last word.

  10. Comparison of a short-time speech-based intelligibility metric to the speech transmission index and intelligibility data

    PubMed Central

    Payton, Karen L.; Shrestha, Mona

    2013-01-01

    Several algorithms have been shown to generate a metric corresponding to the Speech Transmission Index (STI) using speech as a probe stimulus [e.g., Goldsworthy and Greenberg, J. Acoust. Soc. Am. 116, 3679–3689 (2004)]. The time-domain approaches work well on long speech segments and have the added potential to be used for short-time analysis. This study investigates the performance of the Envelope Regression (ER) time-domain STI method as a function of window length, in acoustically degraded environments with multiple talkers and speaking styles. The ER method is compared with a short-time Theoretical STI, derived from octave-band signal-to-noise ratios and reverberation times. For windows as short as 0.3 s, the ER method tracks short-time Theoretical STI changes in stationary speech-shaped noise, fluctuating restaurant babble and stationary noise plus reverberation. The metric is also compared to intelligibility scores on conversational speech and speech articulated clearly but at normal speaking rates (Clear/Norm) in stationary noise. Correlation between the metric and intelligibility scores is high and, consistent with the subject scores, the metrics are higher for Clear/Norm speech than for conversational speech and higher for the first word in a sentence than for the last word. PMID:24180791

  11. Speech analyzer

    NASA Technical Reports Server (NTRS)

    Lokerson, D. C. (Inventor)

    1977-01-01

    A speech signal is analyzed by applying the signal to formant filters which derive first, second and third signals respectively representing the frequency of the speech waveform in the first, second and third formants. A first pulse train having approximately a pulse rate representing the average frequency of the first formant is derived; second and third pulse trains having pulse rates respectively representing zero crossings of the second and third formants are derived. The first formant pulse train is derived by establishing N signal level bands, where N is an integer at least equal to two. Adjacent ones of the signal bands have common boundaries, each of which is a predetermined percentage of the peak level of a complete cycle of the speech waveform.
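
    The sketch below imitates the front end described above in software: three band-pass "formant filters" are applied to a synthetic vowel-like waveform, and a pulse train is formed from positive-going zero crossings in each band. The band edges, filter order, and test signal are illustrative choices, not values from the patent.

```python
"""Sketch: band-pass 'formant filters' followed by zero-crossing pulse
trains, loosely following the analyzer described above."""
import numpy as np
from scipy.signal import butter, filtfilt

fs = 10000.0
t = np.arange(0, 0.1, 1 / fs)
# Synthetic vowel-like waveform with energy near 500, 1500 and 2500 Hz.
x = (np.sin(2 * np.pi * 500 * t) + 0.6 * np.sin(2 * np.pi * 1500 * t)
     + 0.3 * np.sin(2 * np.pi * 2500 * t))

bands = {"F1": (300, 900), "F2": (900, 2000), "F3": (2000, 3000)}
for name, (lo, hi) in bands.items():
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="bandpass")
    y = filtfilt(b, a, x)
    # Each positive-going zero crossing contributes one pulse to the train.
    crossings = np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0))
    rate = crossings.size / t[-1]
    print(f"{name}: ~{rate:.0f} zero-crossing pulses per second")
```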

  12. Analysis of Acoustic Emission Signals During Laser Spot Welding of SS304 Stainless Steel

    NASA Astrophysics Data System (ADS)

    Lee, Seounghwan; Ahn, Suneung; Park, Changsoon

    2014-03-01

    In this article, an in-process monitoring scheme for a pulsed Nd:YAG laser spot welding (LSW) is presented. Acoustic emission (AE) was selected for the feedback signal, and the AE data during LSW were sampled and analyzed for varying process conditions such as laser power and pulse duration. In the analysis, possible AE generation sources such as melting and solidification mechanism during welding were investigated using both the time- and frequency-domain signal processings. The results, which show close relationships between LSW and AE signals, were adopted in the feature (input) selection of a back-propagation artificial neural network, to predict the weldability of stainless steel sheets. Processed outputs agree well with LSW experimental data, which confirms the usefulness of the proposed scheme.

  13. Influence of attenuation on acoustic emission signals in carbon fiber reinforced polymer panels.

    PubMed

    Asamene, Kassahun; Hudson, Larry; Sundaresan, Mannur

    2015-05-01

    Influence of attenuation on acoustic emission (AE) signals in Carbon Fiber Reinforced Polymer (CFRP) crossply and quasi-isotropic panels is examined in this paper. Attenuation coefficients of the fundamental antisymmetric (A0) and symmetric (S0) wave modes were determined experimentally along different directions for the two types of CFRP panels. In the frequency range from 100 kHz to 500 kHz, the A0 mode undergoes significantly greater changes due to material related attenuation compared to the S0 mode. Moderate to strong changes in the attenuation levels were noted with propagation directions. Such mode and frequency dependent attenuation introduces major changes in the characteristics of AE signals depending on the position of the AE sensor relative to the source. Results from finite element simulations of a microscopic damage event in the composite laminates are used to illustrate attenuation related changes in modal and frequency components of AE signals.

  14. Perceptual organization of speech signals by children with and without dyslexia

    PubMed Central

    Nittrouer, Susan; Lowenstein, Joanna H.

    2013-01-01

    Developmental dyslexia is a condition in which children encounter difficulty learning to read in spite of adequate instruction. Although considerable effort has been expended trying to identify the source of the problem, no single solution has been agreed upon. The current study explored a new hypothesis, that developmental dyslexia may be due to faulty perceptual organization of linguistically relevant sensory input. To test that idea, sentence-length speech signals were processed to create either sine-wave or noise-vocoded analogs. Seventy children between 8 and 11 years of age, with and without dyslexia, participated. Children with dyslexia were selected to have phonological awareness deficits, although those without such deficits were retained in the study. The processed sentences were presented for recognition, and measures of reading, phonological awareness, and expressive vocabulary were collected. Results showed that children with dyslexia, regardless of phonological subtype, had poorer recognition scores than children without dyslexia for both kinds of degraded sentences. Older children with dyslexia recognized the sine-wave sentences better than younger children with dyslexia, but no such effect of age was found for the vocoded materials. Recognition scores were used as predictor variables in regression analyses with reading, phonological awareness, and vocabulary measures used as dependent variables. Scores for both sorts of sentence materials were strong predictors of performance on all three dependent measures when all children were included, but only performance for the sine-wave materials explained significant proportions of variance when only children with dyslexia were included. Finally, matching young, typical readers with older children with dyslexia on reading abilities did not mitigate the group difference in recognition of vocoded sentences. Conclusions were that children with dyslexia have difficulty organizing linguistically relevant sensory input.
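
    For reference, the sketch below implements a basic noise vocoder of the kind used to create such degraded sentences: band envelopes are extracted and re-imposed on band-limited noise. The band edges, envelope cut-off, and the synthetic input are illustrative assumptions, not the stimuli used in the study.

```python
"""Sketch: a simple noise vocoder (band envelopes re-imposed on
band-limited noise)."""
import numpy as np
from scipy.signal import butter, filtfilt

def noise_vocode(x, fs, edges=(100, 500, 1200, 2500, 4000)):
    rng = np.random.default_rng(0)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="bandpass")
        band = filtfilt(b, a, x)
        # Envelope: rectify, then low-pass at 30 Hz.
        be, ae = butter(2, 30 / (fs / 2))
        env = filtfilt(be, ae, np.abs(band))
        carrier = filtfilt(b, a, rng.standard_normal(x.size))   # band-limited noise
        out += np.clip(env, 0, None) * carrier
    return out

fs = 16000
t = np.arange(0, 0.5, 1 / fs)
speechlike = np.sin(2 * np.pi * 220 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))
vocoded = noise_vocode(speechlike, fs)
print("output RMS:", np.sqrt(np.mean(vocoded ** 2)))
```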

  15. Speaker specificity in speech perception: the importance of what is and is not in the signal

    NASA Astrophysics Data System (ADS)

    Dahan, Delphine; Scarborough, Rebecca A.

    2005-09-01

    In some American English dialects, /ae/ before /g/ (but not before /k/) raises to a vowel approaching [E], in effect reducing phonetic overlap between (e.g.) ``bag'' and ``back.'' Here, participants saw four written words on a computer screen (e.g., ``bag,'' ``back,'' ``dog,'' ``dock'') and heard a spoken word. Their task was to indicate which word they heard. Participants' eye movements to the written words were recorded. Participants in the ``ae-raising'' group heard identity-spliced ``bag''-like words containing the raised vowel [E]; participants in the ``control'' group heard cross-spliced ``bag''-like words containing standard [ae]. Acoustically identical ``back''-like words were subsequently presented to both groups. The ae-raising-group participants identified ``back''-like words faster and more accurately, and made fewer fixations to the competitor ``bag,'' than control-group participants did. Thus, exposure to ae-raised realizations of ``bag'' facilitated the identification of ``back'' because of the reduced fit between the input and the altered representation of the competing hypothesis ``bag.'' This demonstrates that listeners evaluate the spoken input with respect to what is, but also what is not, in the signal, and that this evaluation involves speaker-specific representations. [Work supported by NSF Human and Social Dynamics 0433567.]

  16. Auditory-tactile echo-reverberating stuttering speech corrector

    NASA Astrophysics Data System (ADS)

    Kuniszyk-Jozkowiak, Wieslawa; Adamczyk, Bogdan

    1997-02-01

    The work presents the construction of a device which transforms speech sounds into acoustic and tactile signals of echo and reverberation. Research has been done on the influence of the echo and reverberation, transmitted as acoustic and tactile stimuli, on speech fluency. Introducing the echo or reverberation into the auditory feedback circuit results in a reduction of stuttering. Somewhat smaller, but still significant, corrective effects are observed when the tactile channel is used for transmitting the signals. The use of joined auditory and tactile channels increases their corrective influence on the stutterers' speech. The results of the experiment justify the use of the tactile channel in stutterers' therapy.
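
    The echo and reverberation transformations can be sketched as simple delay-line operations on the speech signal, as below; the delays, gains, and number of reflections are illustrative values, not those of the device described above.

```python
"""Sketch: echo and reverberation as delay-line operations on a signal."""
import numpy as np

def add_echo(x, fs, delay_s=0.1, gain=0.5):
    d = int(delay_s * fs)
    y = np.copy(x)
    y[d:] += gain * x[:-d]                  # single delayed copy
    return y

def add_reverb(x, fs, delay_s=0.05, gain=0.6, taps=6):
    y = np.copy(x)
    for k in range(1, taps + 1):
        d = k * int(delay_s * fs)
        if d < len(x):
            y[d:] += (gain ** k) * x[:-d]   # decaying train of reflections
    return y

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
speech = np.sin(2 * np.pi * 150 * t) * np.exp(-3 * t)   # stand-in for a speech burst
echoed = add_echo(speech, fs)
reverberated = add_reverb(speech, fs)
print("peak levels:", speech.max(), echoed.max(), reverberated.max())
```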

  17. Pipe wall damage detection by electromagnetic acoustic transducer generated guided waves in absence of defect signals.

    PubMed

    Vasiljevic, Milos; Kundu, Tribikram; Grill, Wolfgang; Twerdowski, Evgeny

    2008-05-01

    Most investigators emphasize the importance of detecting the reflected signal from the defect to determine if the pipe wall has any damage and to predict the damage location. However, often the small signal from the defect is hidden behind the other arriving wave modes and signal noise. To overcome the difficulties associated with the identification of the small defect signal in the time history plots, in this paper the time history is analyzed well after the arrival of the first defect signal, and after different wave modes have propagated multiple times through the pipe. It is shown that the defective pipe can be clearly identified by analyzing these late arriving diffuse ultrasonic signals. Multiple reflections and scattering of the propagating wave modes by the defect and pipe ends do not hamper the defect detection capability; on the contrary, it apparently stabilizes the signal and makes it easier to distinguish the defective pipe from the defect-free pipe. This paper also highlights difficulties associated with the interpretation of the recorded time histories due to mode conversion by the defect. The design of electro-magnetic acoustic transducers used to generate and receive the guided waves in the pipe is briefly described in the paper.

  18. Acoustic emission signals frequency-amplitude characteristics of sandstone after thermal treated under uniaxial compression

    NASA Astrophysics Data System (ADS)

    Kong, Biao; Wang, Enyuan; Li, Zenghua; Wang, Xiaoran; Niu, Yue; Kong, Xiangguo

    2017-01-01

    Deformation and fracture of thermally treated sandstone produce abundant acoustic emission (AE) signals, whose waveforms contain plentiful precursor information about the deformation and fracture behavior. In this paper, uniaxial compression tests of sandstone after different temperature treatments were conducted, the frequency-amplitude characteristics of the AE signals were studied, and the main-frequency distribution at different stress levels was analyzed. The frequency-amplitude characteristics of the AE signals differed greatly after different high-temperature treatments, and significant differences existed in the main-frequency distribution of AE signals during deformation and fracture of the thermally treated sandstone. The main frequency band containing the largest proportion of waveforms changed after different high-temperature treatments. High temperature caused thermal damage to the sandstone, making deformation and fracture more pronounced than at room temperature, and the number of AE signals during the initial loading stage was larger than at room temperature. Low-frequency AE signals formed a larger proportion when the stress level was 0.1, and the maximum amplitude of the low-frequency signals was larger than that of the high-frequency signals. With increasing stress, both low- and high-frequency AE signals gradually increased, indicating that ruptures of different scales occurred in the sandstone. After high-temperature treatment, the number of high-frequency AE signals was significantly larger than that of low-frequency AE signals during the later loading stage, indicating that small-scale ruptures recurred more often than large-scale ruptures. The AE ratio reached its maximum during the period of sandstone instability and failure, and large-scale rupture dominated the failure process. AE amplitude increased as the loading increased, and the deformation and fracture of the sandstone increased gradually. By comparison, the value of the low frequency

  19. The potential influence of morphology on the evolutionary divergence of an acoustic signal

    PubMed Central

    Pitchers, W. R.; Klingenberg, C.P.; Tregenza, Tom; Hunt, J.; Dworkin, I.

    2014-01-01

    The evolution of acoustic behaviour and that of the morphological traits mediating its production are often coupled. Lack of variation in the underlying morphology of signalling traits has the potential to constrain signal evolution. This relationship is particularly likely in field crickets, where males produce acoustic advertisement signals to attract females by stridulating with specialized structures on their forewings. In this study, we characterise the size and geometric shape of the forewings of males from six allopatric populations of the black field cricket (Teleogryllus commodus) known to have divergent advertisement calls. We sample from each of these populations using both wild-caught and common-garden reared cohorts, allowing us to test for multivariate relationships between wing morphology and call structure. We show that the allometry of shape has diverged across populations. However, there was a surprisingly small amount of covariation between wing shape and call structure within populations. Given the importance of male size for sexual selection in crickets, the divergence we observe among populations has the potential to influence the evolution of advertisement calls in this species. PMID:25223712

  20. Long recording sequences: how to track the intra-individual variability of acoustic signals.

    PubMed

    Lengagne, Thierry; Gomez, Doris; Josserand, Rémy; Voituron, Yann

    2015-01-01

    Recently developed acoustic technologies, such as automatic recording units, allow long sequences to be recorded in natural environments. These devices are used for biodiversity surveys, but they could also help researchers estimate global signal variability at various scales (individual, population, species). While sexually selected signals are expected to show low intra-individual variability over relatively short time scales, this variability has never been estimated so far. Yet measuring signal variability under controlled conditions should prove useful for understanding sexual selection processes and should help in designing acoustic sampling schedules and analysing long call recordings. Here we use the overall call production of 36 male tree frogs (Hyla arborea) during one night to evaluate within-individual variability in call dominant frequency and to test the efficiency of different sampling methods at capturing this variability. Our results confirm that using a low number of calls underestimates variation in call dominant frequency by about 35% in the tree frog, and they suggest that this variability is better assessed with 2 or 3 short, well-distributed recordings than with samples of consecutive calls. Hence, three well-distributed 2-minute recordings (beginning, middle and end of the calling period) are sufficient to capture, on average, all of the nightly variability, whereas a sample of 10,000 consecutive calls captures only 86% of it. From a biological point of view, the variability in call dominant frequency observed in H. arborea (116 Hz on average, but up to 470 Hz over the course of the night for one male) challenges its reliability for mate-quality assessment. Automatic acoustic recording units will provide long call sequences in the near future, and it will then be possible to confirm these results on large samples recorded in more complex field conditions.
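
    A small sketch of the sampling comparison described above, using synthetic per-call dominant frequencies; the drift model, call counts, and record lengths are hypothetical and chosen only to illustrate how a block of consecutive calls can miss variability that short, well-distributed records capture:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-call dominant frequencies for one male over one night (Hz):
# a slow drift plus call-to-call scatter, loosely inspired by H. arborea values.
n_calls = 10_000
drift = 2300 + 60 * np.sin(np.linspace(0, np.pi, n_calls))
f0 = drift + rng.normal(0.0, 5.0, n_calls)

nightly_range = f0.max() - f0.min()

# Scheme A: one block of consecutive calls.
block = f0[:2000]
range_block = block.max() - block.min()

# Scheme B: three short, well-distributed records (start, middle, end of the night).
idx = np.concatenate([np.arange(0, 300),
                      np.arange(n_calls // 2, n_calls // 2 + 300),
                      np.arange(n_calls - 300, n_calls)])
distributed = f0[idx]
range_distributed = distributed.max() - distributed.min()

print(f"captured by consecutive block:   {100 * range_block / nightly_range:.0f}%")
print(f"captured by distributed records: {100 * range_distributed / nightly_range:.0f}%")
```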

  1. Long Recording Sequences: How to Track the Intra-Individual Variability of Acoustic Signals

    PubMed Central

    Lengagne, Thierry; Gomez, Doris; Josserand, Rémy; Voituron, Yann

    2015-01-01

    Recently developed acoustic technologies, such as automatic recording units, allow long sequences to be recorded in natural environments. These devices are used for biodiversity surveys, but they could also help researchers estimate global signal variability at various scales (individual, population, species). While sexually selected signals are expected to show low intra-individual variability over relatively short time scales, this variability has never been estimated so far. Yet measuring signal variability under controlled conditions should prove useful for understanding sexual selection processes and should help in designing acoustic sampling schedules and analysing long call recordings. Here we use the overall call production of 36 male tree frogs (Hyla arborea) during one night to evaluate within-individual variability in call dominant frequency and to test the efficiency of different sampling methods at capturing this variability. Our results confirm that using a low number of calls underestimates variation in call dominant frequency by about 35% in the tree frog, and they suggest that this variability is better assessed with 2 or 3 short, well-distributed recordings than with samples of consecutive calls. Hence, three well-distributed 2-minute recordings (beginning, middle and end of the calling period) are sufficient to capture, on average, all of the nightly variability, whereas a sample of 10,000 consecutive calls captures only 86% of it. From a biological point of view, the variability in call dominant frequency observed in H. arborea (116 Hz on average, but up to 470 Hz over the course of the night for one male) challenges its reliability for mate-quality assessment. Automatic acoustic recording units will provide long call sequences in the near future, and it will then be possible to confirm these results on large samples recorded in more complex field conditions. PMID:25970183

  2. Surface Reflection Phase in Two Way Acoustic Signal in Oceanic Crustal Deformation Measurement

    NASA Astrophysics Data System (ADS)

    Ikuta, R.; Tadokoro, K.; Watanabe, T.; Nagai, S.; Okuda, T.

    2011-12-01

    We are developing a geodetic method for monitoring crustal deformation under the ocean using kinematic GPS and acoustic ranging. The measurements are made by determining the two-way traveltime of an acoustic signal between a vessel, whose position is precisely determined by kinematic GPS, and a transponder array (benchmark) on the ocean bottom. The goal of our research is to achieve sub-centimeter accuracy in positioning the benchmark with a very short measurement of about 10 hours. In this study, we focused on the underwater acoustic part of the system in order to improve the data acquisition rate, and hence the number of observation equations, so that the benchmark position can be solved with better accuracy. The measurements started in Suruga Bay in 2003 and in Kumano Basin in 2004 and have been repeated a few times a year. The accuracy of the benchmark positioning depends on the quality and quantity of the acoustic signal data. We use an M-sequence signal because of its robustness against ambient noise (signal length 14.322 ms, carrier frequency 12.987 kHz). We calculate the cross-correlation between the emitted and received signals and accept a signal only if its cross-correlation coefficient exceeds a threshold. However, we often fail to obtain well-correlated signals and thus acquire very few traveltime data in a single cruise. In a cruise under good conditions, 70% of the acoustic data have correlation coefficients above 0.7, whereas under bad conditions only 10% of the data reach 0.7. We found that increases in ambient noise and contamination by a later phase resembling the main signal occur independently of each other. The ambient noise is probably due to the screw noise of the vessel, because the noise increased when sailing against the wind and current. The later phases have the following features: 1. They arrive between 1 and 2 ms after the main signal arrival. 2. The cross-correlation coefficient sometimes
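
    A minimal sketch of the correlation-based traveltime picking described above, assuming a replica of the emitted M-sequence and a recorded trace; the normalization scheme, sampling rate, and synthetic signals are assumptions for illustration, not the authors' actual processing chain:

```python
import numpy as np
from scipy.signal import correlate

def traveltime_from_correlation(emitted, received, fs, threshold=0.7):
    """Slide the emitted M-sequence replica along the received trace, compute the
    normalized cross-correlation coefficient at every lag, and accept the pick
    only if the peak coefficient exceeds the threshold.
    Returns (traveltime_in_seconds or None, peak_coefficient)."""
    e = emitted - emitted.mean()
    n = len(e)
    num = correlate(received, e, mode="valid")                 # demeaned dot product per lag
    seg_energy = correlate(received**2, np.ones(n), mode="valid")
    seg_sum = correlate(received, np.ones(n), mode="valid")
    seg_var = np.maximum(seg_energy - seg_sum**2 / n, 1e-12)   # local energy about the mean
    cc = num / (np.linalg.norm(e) * np.sqrt(seg_var))
    k = int(np.argmax(cc))
    peak = float(cc[k])
    return (k / fs if peak >= threshold else None), peak

# Hypothetical usage: a +/-1 pseudo-random replica buried in noise, echo arriving at 0.2 s.
rng = np.random.default_rng(0)
fs = 100_000.0
replica = rng.choice([-1.0, 1.0], size=1433)                   # stand-in for the 14.322 ms code
trace = rng.normal(0.0, 1.0, size=50_000)
trace[20_000:20_000 + replica.size] += 3.0 * replica
traveltime, coeff = traveltime_from_correlation(replica, trace, fs)
print(f"traveltime = {traveltime} s, peak correlation = {coeff:.2f}")
```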

  3. Cued Speech for Enhancing Speech Perception and First Language Development of Children With Cochlear Implants

    PubMed Central

    Leybaert, Jacqueline; LaSasso, Carol J.

    2010-01-01

    Nearly 300 million people worldwide have moderate to profound hearing loss. Hearing impairment, if not adequately managed, has strong socioeconomic and affective impact on individuals. Cochlear implants have become the most effective vehicle for helping profoundly deaf children and adults to understand spoken language, to be sensitive to environmental sounds, and, to some extent, to listen to music. The auditory information delivered by the cochlear implant remains non-optimal for speech perception because it delivers a spectrally degraded signal and lacks some of the fine temporal acoustic structure. In this article, we discuss research revealing the multimodal nature of speech perception in normally-hearing individuals, with important inter-subject variability in the weighting of auditory or visual information. We also discuss how audio-visual training, via Cued Speech, can improve speech perception in cochlear implantees, particularly in noisy contexts. Cued Speech is a system that makes use of visual information from speechreading combined with hand shapes positioned in different places around the face in order to deliver completely unambiguous information about the syllables and the phonemes of spoken language. We support our view that exposure to Cued Speech before or after the implantation could be important in the aural rehabilitation process of cochlear implantees. We describe five lines of research that are converging to support the view that Cued Speech can enhance speech perception in individuals with cochlear implants. PMID:20724357

  4. Traits of acoustic signalization and generation of sounds by some schooling physostomous fish

    NASA Astrophysics Data System (ADS)

    Kuznetsov, M. Yu.

    2009-11-01

    The results of experimental investigations of the acoustic activity of schooling physostomous fish (chum salmon, pink salmon, Pacific herring, and sardine) are discussed. The dynamic spectra of most of the investigated fish are concentrated within two frequency subranges, which differ between the investigated species. Direct participation of the swimming bladder in sound formation in the investigated fish is demonstrated. Morphological traits of the sound-producing organs of salmons and herrings are considered, and mechanisms of signal generation in physostomous fish involving the muscular sphincter and the swimming bladder are analyzed.

  5. Oscillating bubble as a sensor of low frequency electro-acoustic signals in electrolytes.

    PubMed

    Tankovsky, N; Baerner, K; Barey, Dooa Abdel

    2006-08-16

    Small air-bubble deformations caused by electro-acoustic signals generated in electrolytic solutions have been detected by angle modulation of a refracted He-Ne laser beam. The observed electromechanical resonance at low frequency, below 100 Hz, proved to be directly related to the oscillations of characteristic ion-doped water structures driven by an external electric field. The presence of structure-breaking or structure-making ions modifies the water structure, which varies the mechanical losses of the oscillating system and can be registered as changes in the width of the observed resonance curves.

  6. Differential Diagnosis of Severe Speech Disorders Using Speech Gestures

    ERIC Educational Resources Information Center

    Bahr, Ruth Huntley

    2005-01-01

    The differentiation of childhood apraxia of speech from severe phonological disorder is a common clinical problem. This article reports on an attempt to describe speech errors in children with childhood apraxia of speech on the basis of gesture use and acoustic analyses of articulatory gestures. The focus was on the movement of articulators and…

  7. Mate preference in the painted goby: the influence of visual and acoustic courtship signals.

    PubMed

    Amorim, M Clara P; da Ponte, Ana Nunes; Caiano, Manuel; Pedroso, Silvia S; Pereira, Ricardo; Fonseca, Paulo J

    2013-11-01

    We tested the hypothesis that females of a small vocal marine fish with exclusive paternal care, the painted goby, prefer mates of high parental quality, such as large or high-condition males. We tested the effect of male body size and of male visual and acoustic courtship behaviour (playback experiments) on female mating preferences by measuring the time spent near each of two alternative stimuli in a two-choice design. Females did not show a preference for male size but preferred males that showed higher levels of courtship, a trait known to advertise condition (fat reserves). Also, the time spent near the preferred male depended on male courtship effort. Playback experiments showed that when sound was combined with visual stimuli (a male confined in a small aquarium placed near each speaker), females spent more time near the male associated with courtship sound than near the control male (associated with white noise or silence). Although male visual courtship effort also affected female preference in the pre-playback period, this effect decreased during playback and disappeared in the post-playback period. Courtship sound stimuli alone did not elicit female preference relative to a control. Taken together, the results suggest that visual and, especially, acoustic courtship displays are subject to mate preference and may advertise parental quality in this species. Our results indicate that visual and acoustic signals interact in a complex fashion and highlight the need to examine how different sensory modalities affect mating preferences in fish and other vertebrates.

  8. Multichannel signal processing at Bell Labs Acoustics Research-Sampled by a postdoc

    NASA Astrophysics Data System (ADS)

    Kellermann, Walter

    2004-05-01

    In the mid-1980s, the first large microphone arrays for audio capture were designed and realized by Jim Flanagan and Gary Elko. After the author joined Bell Labs in 1989, the first real-time digital beamformer for teleconferencing applications was implemented and formed a starting point for the development of several novel beamforming techniques. In parallel, multichannel loudspeaker systems were already being investigated, and research on acoustic echo cancellation, small-aperture directional microphones, and sensor technology complemented the research scenario aiming at seamless hands-free acoustic communication. Arrays of many sensors and loudspeakers for sampling the spatial domain, combined with advanced signal processing, sparked new concepts that are still fueling ongoing research around the world, including in the author's research group. Here, robust adaptive beamforming has found its way from large-scale arrays into many applications using smaller apertures. Blind source separation algorithms allow for effective spatial filtering without a priori information on source positions. Full-duplex communication using multiple channels for both reproduction and recording is enabled by multichannel acoustic echo cancellation combined with beamforming. Recently, wave domain adaptive filtering, a new concept for handling many sensors and many loudspeakers, has been verified for arrays that may well remind some observers of former Bell Labs projects.

  9. SPEECH COMMUNICATION RESEARCH.

    DTIC Science & Technology

    studies of the dynamics of speech production through cineradiographic techniques and through acoustic analysis of formant motions in vowels in various...particular, the activity of the vocal cords and the dynamics of tongue motion. Research on speech perception has included experiments on vowel

  10. Neural Mechanisms for Acoustic Signal Detection under Strong Masking in an Insect

    PubMed Central

    Römer, Heiner

    2015-01-01

    Communication is fundamental for our understanding of behavior. In the acoustic modality, natural scenes for communication in humans and animals are often very noisy, decreasing the chances of signal detection and discrimination. We investigated the mechanisms enabling selective hearing under natural noisy conditions in auditory receptors and interneurons of an insect. In the studied katydid, Mecopoda elongata, the species-specific calling songs (chirps) are strongly masked by signals of another species, with both species communicating in sympatry. The spectral properties of the two signals are similar and differ only in a small frequency band at 2 kHz that is present in the chirping species. Receptors sharply tuned to 2 kHz are completely unaffected by the masking signal of the other species, whereas receptors tuned to higher audio and ultrasonic frequencies show complete masking. Intracellular recordings of identified interneurons revealed two mechanisms providing response selectivity to the chirp. (1) Response selectivity: several identified interneurons exhibit remarkably selective responses to the chirps, even at signal-to-noise ratios of −21 dB, because they are sharply tuned to 2 kHz; their dendritic arborizations indicate selective connectivity with low-frequency receptors tuned to 2 kHz. (2) Novelty detection: a second group of interneurons is broadly tuned but, because of strong stimulus-specific adaptation to the masker spectrum and "novelty detection" of the 2 kHz band present only in the conspecific signal, these interneurons start to respond selectively to the chirp shortly after the onset of the continuous masker. Both mechanisms provide the sensory basis for hearing at unfavorable signal-to-noise ratios. SIGNIFICANCE STATEMENT Animal and human acoustic communication may suffer from the same "cocktail party problem" when communication happens in noisy social groups. We address solutions for this problem in a model system of two katydids, where one

  11. System and method for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources

    DOEpatents

    Holzrichter, John F.; Burnett, Greg C.; Ng, Lawrence C.

    2007-10-16

    A system and method for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources is disclosed. Propagating wave electromagnetic sensors monitor excitation sources in sound producing systems, such as machines, musical instruments, and various other structures. Acoustical output from these sound producing systems is also monitored. From such information, a transfer function characterizing the sound producing system is generated. From the transfer function, acoustical output from the sound producing system may be synthesized or canceled. The methods disclosed enable accurate calculation of matched transfer functions relating specific excitations to specific acoustical outputs. Knowledge of such signals and functions can be used to effect various sound replication, sound source identification, and sound cancellation applications.
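
    As one plausible, simplified reading of the transfer-function step (not the patented method itself), the frequency response relating a measured excitation to a measured acoustic output can be estimated with the standard H1 estimator, S_xy / S_xx. Everything in the sketch below, including the synthetic resonator standing in for the "sound producing system", is illustrative:

```python
import numpy as np
from scipy.signal import csd, welch, lfilter

def estimate_transfer_function(excitation, acoustic_out, fs, nperseg=1024):
    """Estimate a frequency-domain transfer function H(f) = S_xy(f) / S_xx(f)
    relating a measured excitation to the measured acoustic output (the classical
    H1 estimator); all names and parameters here are illustrative assumptions."""
    f, s_xx = welch(excitation, fs=fs, nperseg=nperseg)
    _, s_xy = csd(excitation, acoustic_out, fs=fs, nperseg=nperseg)
    return f, s_xy / np.maximum(s_xx, 1e-20)

# Hypothetical usage: a noise-driven second-order resonator; its resonance
# should reappear as the peak of |H(f)|.
rng = np.random.default_rng(0)
fs = 8000.0
x = rng.normal(size=8 * int(fs))                            # stand-in excitation signal
b = [1.0]
a = [1.0, -1.8 * np.cos(2 * np.pi * 440 / fs), 0.81]        # resonator near 440 Hz
y = lfilter(b, a, x) + 0.01 * rng.normal(size=x.size)       # measured acoustic output
f, H = estimate_transfer_function(x, y, fs)
print(f"peak of |H(f)| near {f[np.argmax(np.abs(H))]:.0f} Hz")
```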

  12. System and method for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources

    SciTech Connect

    Holzrichter, John F; Burnett, Greg C; Ng, Lawrence C

    2013-05-21

    A system and method for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources is disclosed. Propagating wave electromagnetic sensors monitor excitation sources in sound producing systems, such as machines, musical instruments, and various other structures. Acoustical output from these sound producing systems is also monitored. From such information, a transfer function characterizing the sound producing system is generated. From the transfer function, acoustical output from the sound producing system may be synthesized or canceled. The methods disclosed enable accurate calculation of matched transfer functions relating specific excitations to specific acoustical outputs. Knowledge of such signals and functions can be used to effect various sound replication, sound source identification, and sound cancellation applications.

  13. System and method for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources

    DOEpatents

    Holzrichter, John F.; Burnett, Greg C.; Ng, Lawrence C.

    2003-01-01

    A system and method for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources is disclosed. Propagating wave electromagnetic sensors monitor excitation sources in sound producing systems, such as machines, musical instruments, and various other structures. Acoustical output from these sound producing systems is also monitored. From such information, a transfer function characterizing the sound producing system is generated. From the transfer function, acoustical output from the sound producing system may be synthesized or canceled. The methods disclosed enable accurate calculation of matched transfer functions relating specific excitations to specific acoustical outputs. Knowledge of such signals and functions can be used to effect various sound replication, sound source identification, and sound cancellation applications.

  14. Auditory perception bias in speech imitation.

    PubMed

    Postma-Nilsenová, Marie; Postma, Eric

    2013-01-01

    In an experimental study, we explored the role of auditory perception bias in vocal pitch imitation. Psychoacoustic tasks involving a missing fundamental indicate that some listeners are attuned to the relationship between all the higher harmonics present in the signal, which supports their perception of the fundamental frequency (the primary acoustic correlate of pitch). Other listeners focus on the lowest harmonic constituents of the complex sound signal, which may hamper the perception of the fundamental. These two listener types are referred to as fundamental and spectral listeners, respectively. We hypothesized that the individual differences in speakers' capacity to imitate F0 found in earlier studies may at least partly be due to the capacity to extract information about F0 from the speech signal. Participants' auditory perception bias was determined with a standard missing fundamental perceptual test. Subsequently, speech data were collected in a shadowing task with two conditions, one with the full speech signal and one with speech high-pass filtered above 300 Hz. The results showed that a perception bias toward the fundamental frequency was related to the degree of F0 imitation. The effect was stronger in the condition with high-pass filtered speech. The experimental outcomes suggest advantages for fundamental listeners in communicative situations where F0 imitation is used as a behavioral cue. Future research needs to determine to what extent auditory perception bias may be related to other individual properties known to improve imitation, such as phonetic talent.
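
    A compact sketch of why F0 can still be recovered from high-pass filtered speech: harmonics above 300 Hz retain the signal's periodicity even when the fundamental component itself is removed. The autocorrelation pitch estimator, filter settings, and synthetic vowel below are assumptions for illustration, not the study's stimuli or analysis:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def estimate_f0_autocorr(x, fs, f_min=75.0, f_max=400.0):
    """Estimate the fundamental frequency of a voiced frame from the peak of its
    autocorrelation within a plausible pitch-lag range."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(fs / f_max), int(fs / f_min)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

# Hypothetical usage: a synthetic "vowel" with F0 = 120 Hz and several harmonics.
fs = 16000.0
t = np.arange(0, 0.5, 1 / fs)
f0 = 120.0
voiced = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 9))

sos = butter(4, 300.0, btype="highpass", fs=fs, output="sos")
filtered = sosfilt(sos, voiced)      # fundamental removed, higher harmonics retained

print(f"full signal: {estimate_f0_autocorr(voiced, fs):.1f} Hz, "
      f"high-pass filtered: {estimate_f0_autocorr(filtered, fs):.1f} Hz")
```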

  15. The contribution of auditory temporal processing to the separation of competing speech signals in listeners with normal hearing

    NASA Astrophysics Data System (ADS)

    Adam, Trudy J.; Pichora-Fuller, Kathy

    2002-05-01

    The hallmark of auditory function in aging adults is difficulty listening in a background of competing talkers, even when hearing sensitivity in quiet is good. Age-related physiological changes may contribute by introducing small timing errors (jitter) to the neural representation of sound, compromising the fidelity of the signal's fine temporal structure. This may preclude the association of spectral features to form an accurate percept of one complex stimulus, distinct from competing sounds. For simple voiced speech (vowels), the separation of two competing stimuli can be achieved on the basis of their respective harmonic (temporal) structures. Fundamental frequency (F0) differences in competing stimuli facilitate their segregation. This benefit was hypothesized to rely on the adequate temporal representation of the speech signal(s). Auditory aging was simulated via the desynchronization (~0.25-ms jitter) of the spectral bands of synthesized vowels. The perceptual benefit of F0 difference for the identification of concurrent vowel pairs was examined for intact and jittered vowels in young adults with normal hearing thresholds. Results suggest a role for reduced signal fidelity in the perceptual difficulties encountered in noisy everyday environments by aging listeners. [Work generously supported by the Michael Smith Foundation for Health Research.]
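
    A minimal sketch of the kind of band desynchronization described above: split a synthetic vowel into adjacent frequency bands, delay each band by an independent random amount on the order of 0.25 ms, and recombine. The filterbank design, band edges, and delay implementation are assumptions for illustration only:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def jitter_spectral_bands(x, fs, band_edges, max_jitter_s=0.25e-3, seed=0):
    """Split a signal into adjacent frequency bands, delay each band by an
    independent random amount of up to max_jitter_s, and sum the bands again,
    roughly mimicking a desynchronized ("jittered") spectral representation."""
    rng = np.random.default_rng(seed)
    out = np.zeros_like(x)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, x)
        shift = int(rng.integers(0, int(max_jitter_s * fs) + 1))
        out += np.roll(band, shift)          # crude delay; edge wrap-around is negligible here
    return out
    # Note: this simple filterbank does not reconstruct the input exactly even
    # with zero jitter; it only illustrates the band-desynchronization idea.

# Hypothetical usage on a synthetic vowel-like harmonic complex.
fs = 16000.0
t = np.arange(0, 0.3, 1 / fs)
vowel = sum(np.sin(2 * np.pi * k * 125 * t) / k for k in range(1, 20))
edges = np.linspace(100.0, 4000.0, 9)        # eight bands between 100 Hz and 4 kHz
jittered = jitter_spectral_bands(vowel, fs, edges)
```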

  16. Holographic matched filtering of acoustic signals with the application of a membrane modulator

    NASA Astrophysics Data System (ADS)

    Larkin, A. I.; Minialga, V. L.; Petropavlovskii, V. M.

    1986-04-01

    The results of preliminary experiments on a holographic-matched-filtering space-time light modulator for use in the real-time digital analysis of acoustic signals (such as those from the multiple hydrophones of the DUMAND project) are reported. The modulator is based on a transverse-displacement traveling-wave membrane (in this case a taut metal ribbon with a diffusely reflective coating) illuminated by an electrooptic-shutter-pulsed laser beam to record Fresnel holograms. The effects of varying the illumination optics, the ribbon temperature and characteristics, and other device parameters are investigated, and the feasibility of analyzing signals from 0.1 to 100 kHz with a base of 1000 is demonstrated.

  17. Military Vehicle Classification via Acoustic and Seismic Signals Using Statistical Learning Methods

    NASA Astrophysics Data System (ADS)

    Xiao, Hanguang; Cai, Congzhong; Chen, Yuzong

    It is a difficult and important task to classify the types of military vehicles from the acoustic and seismic signals they generate. To improve classification accuracy and to reduce computing time and memory size, we investigated different pre-processing techniques and feature extraction and selection methods. The Short-Time Fourier Transform (STFT) was employed for feature extraction, and Genetic Algorithms (GA) and Principal Component Analysis (PCA) were used for further feature selection and extraction. A new feature vector construction method was proposed that combines PCA with another feature selection method. A K-Nearest Neighbor classifier (KNN) and Support Vector Machines (SVM) were used for classification. The experimental results showed that the accuracies of KNN and SVM were strongly affected by the window size used to frame the time series of the acoustic and seismic signals. The classification results indicated that SVM outperformed KNN. The comparison of the four feature selection and extraction methods showed that the proposed method is a simple, computationally inexpensive, and reliable technique for feature selection, and that it helps the SVM classifier achieve better results than using PCA, GA, or their combination alone.
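
    A compact sketch of the processing chain described above (STFT magnitude features, PCA for dimensionality reduction, and an SVM classifier), using synthetic stand-ins for the acoustic/seismic records; the feature definition, window size, and data are illustrative assumptions, and the GA-based selection step is omitted:

```python
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def stft_features(signal, fs, nperseg=256):
    """Average the STFT magnitude over time, giving one spectral feature vector
    per recording (the window size nperseg strongly affects the result)."""
    _, _, Z = stft(signal, fs=fs, nperseg=nperseg)
    return np.abs(Z).mean(axis=1)

# Hypothetical data: two "vehicle classes" as tones of different dominant
# frequencies buried in noise (placeholders for real acoustic/seismic records).
rng = np.random.default_rng(0)
fs = 1024.0
t = np.arange(0, 2.0, 1 / fs)

def fake_record(f_dominant):
    return np.sin(2 * np.pi * f_dominant * t) + 0.5 * rng.normal(size=t.size)

X = np.array([stft_features(fake_record(f), fs) for f in ([60] * 40 + [180] * 40)])
y = np.array([0] * 40 + [1] * 40)

clf = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="rbf"))
print(f"cross-validated accuracy: {cross_val_score(clf, X, y, cv=5).mean():.2f}")
```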

  18. Effects of computer-based intervention through acoustically modified speech (Fast ForWord) in severe mixed receptive-expressive language impairment: outcomes from a randomized controlled trial.

    PubMed

    Cohen, Wendy; Hodson, Ann; O'Hare, Anne; Boyle, James; Durrani, Tariq; McCartney, Elspeth; Mattey, Mike; Naftalin, Lionel; Watson, Jocelynne

    2005-06-01

    Seventy-seven children between the ages of 6 and 10 years, with severe mixed receptive-expressive specific language impairment (SLI), participated in a randomized controlled trial (RCT) of Fast ForWord (FFW; Scientific Learning Corporation, 1997, 2001). FFW is a computer-based intervention for treating SLI using acoustically enhanced speech stimuli. These stimuli are modified to exaggerate their time and intensity properties as part of an adaptive training process. All children who participated in the RCT maintained their regular speech and language therapy and school regime throughout the trial. Standardized measures of receptive and expressive language were used to assess performance at baseline and to measure outcome from treatment at 9 weeks and 6 months. Children were allocated to 1 of 3 groups. Group A (n = 23) received the FFW intervention as a home-based therapy for 6 weeks. Group B (n = 27) received commercially available computer-based activities designed to promote language as a control for computer games exposure. Group C (n = 27) received no additional study intervention. Each group made significant gains in language scores, but there was no additional effect for either computer intervention. Thus, the findings from this RCT do not support the efficacy of FFW as an intervention for children with severe mixed receptive-expressive SLI.

  19. Processing of simple and complex acoustic signals in a tonotopically organized ear

    PubMed Central

    Hummel, Jennifer; Wolf, Konstantin; Kössl, Manfred; Nowotny, Manuela

    2014-01-01

    Processing of complex signals in the hearing organ remains poorly understood. This paper aims to contribute to this topic by presenting investigations on the mechanical and neuronal response of the hearing organ of the tropical bushcricket species Mecopoda elongata to simple pure tone signals as well as to the conspecific song as a complex acoustic signal. The high-frequency hearing organ of bushcrickets, the crista acustica (CA), is tonotopically tuned to frequencies between about 4 and 70 kHz. Laser Doppler vibrometer measurements revealed a strong and dominant low-frequency-induced motion of the CA when stimulated with either pure tone or complex stimuli. Consequently, the high-frequency distal area of the CA is more strongly deflected by low-frequency-induced waves than by high-frequency-induced waves. This low-frequency dominance will have strong effects on the processing of complex signals. Therefore, we additionally studied the neuronal response of the CA to native and frequency-manipulated chirps. Again, we found a dominant influence of low-frequency components within the conspecific song, indicating that the mechanical vibration pattern highly determines the neuronal response of the sensory cells. Thus, we conclude that the encoding of communication signals is modulated by ear mechanics. PMID:25339727

  20. Processing of simple and complex acoustic signals in a tonotopically organized ear.

    PubMed

    Hummel, Jennifer; Wolf, Konstantin; Kössl, Manfred; Nowotny, Manuela

    2014-12-07

    Processing of complex signals in the hearing organ remains poorly understood. This paper aims to contribute to this topic by presenting investigations on the mechanical and neuronal response of the hearing organ of the tropical bushcricket species Mecopoda elongata to simple pure tone signals as well as to the conspecific song as a complex acoustic signal. The high-frequency hearing organ of bushcrickets, the crista acustica (CA), is tonotopically tuned to frequencies between about 4 and 70 kHz. Laser Doppler vibrometer measurements revealed a strong and dominant low-frequency-induced motion of the CA when stimulated with either pure tone or complex stimuli. Consequently, the high-frequency distal area of the CA is more strongly deflected by low-frequency-induced waves than by high-frequency-induced waves. This low-frequency dominance will have strong effects on the processing of complex signals. Therefore, we additionally studied the neuronal response of the CA to native and frequency-manipulated chirps. Again, we found a dominant influence of low-frequency components within the conspecific song, indicating that the mechanical vibration pattern highly determines the neuronal response of the sensory cells. Thus, we conclude that the encoding of communication signals is modulated by ear mechanics.