speech processing efficiency: Topics by Science.gov

Sample records for speech processing efficiency

Talking to children matters: Early language experience strengthens processing and builds vocabulary

PubMed Central

Weisleder, Adriana; Fernald, Anne

2016-01-01

Infants differ substantially in their rates of language growth, and slower growth predicts later academic difficulties. This study explored how the amount of speech to infants in Spanish-speaking families low in socioeconomic status (SES) influenced the development of children's skill in real-time language processing and vocabulary learning. All-day recordings of parent-infant interactions at home revealed striking variability among families in how much speech caregivers addressed to their child. Infants who experienced more child-directed speech became more efficient in processing familiar words in real time and had larger expressive vocabularies by 24 months, although speech simply overheard by the child was unrelated to vocabulary outcomes. Mediation analyses showed that the effect of child-directed speech on expressive vocabulary was explained by infants’ language-processing efficiency, suggesting that richer language experience strengthens processing skills that facilitate language growth. PMID:24022649
Neural Correlates of Intersensory Processing in Five-Month-Old Infants

PubMed Central

Reynolds, Greg D.; Bahrick, Lorraine E.; Lickliter, Robert; Guy, Maggie W.

2014-01-01

Two experiments assessing event-related potentials in 5-month-old infants were conducted to examine neural correlates of attentional salience and efficiency of processing of a visual event (woman speaking) paired with redundant (synchronous) speech, nonredundant (asynchronous) speech, or no speech. In Experiment 1, the Nc component associated with attentional salience was greater in amplitude following synchronous audiovisual as compared with asynchronous audiovisual and unimodal visual presentations. A block design was utilized in Experiment 2 to examine efficiency of processing of a visual event. Only infants exposed to synchronous audiovisual speech demonstrated a significant reduction in amplitude of the late slow wave associated with successful stimulus processing and recognition memory from early to late blocks of trials. These findings indicate that events that provide intersensory redundancy are associated with enhanced neural responsiveness indicative of greater attentional salience and more efficient stimulus processing as compared with the same events when they provide no intersensory redundancy in 5-month-old infants. PMID:23423948
Does input influence uptake? Links between maternal talk, processing speed and vocabulary size in Spanish-learning children

PubMed Central

Hurtado, Nereyda; Marchman, Virginia A.; Fernald, Anne

2010-01-01

It is well established that variation in caregivers' speech is associated with language outcomes, yet little is known about the learning principles that mediate these effects. This longitudinal study (n = 27) explores whether Spanish-learning children's early experiences with language predict efficiency in real-time comprehension and vocabulary learning. Measures of mothers' speech at 18 months were examined in relation to children's speech processing efficiency and reported vocabulary at 18 and 24 months. Children of mothers who provided more input at 18 months knew more words and were faster in word recognition at 24 months. Moreover, multiple regression analyses indicated that the influences of caregiver speech on speed of word recognition and vocabulary were largely overlapping. This study provides the first evidence that input shapes children's lexical processing efficiency and that vocabulary growth and increasing facility in spoken word comprehension work together to support the uptake of the information that rich input affords the young language learner. PMID:19046145
Auditory-Motor Processing of Speech Sounds

PubMed Central

Möttönen, Riikka; Dutton, Rebekah; Watkins, Kate E.

2013-01-01

The motor regions that control movements of the articulators activate during listening to speech and contribute to performance in demanding speech recognition and discrimination tasks. Whether the articulatory motor cortex modulates auditory processing of speech sounds is unknown. Here, we aimed to determine whether the articulatory motor cortex affects the auditory mechanisms underlying discrimination of speech sounds in the absence of demanding speech tasks. Using electroencephalography, we recorded responses to changes in sound sequences, while participants watched a silent video. We also disrupted the lip or the hand representation in left motor cortex using transcranial magnetic stimulation. Disruption of the lip representation suppressed responses to changes in speech sounds, but not piano tones. In contrast, disruption of the hand representation had no effect on responses to changes in speech sounds. These findings show that disruptions within, but not outside, the articulatory motor cortex impair automatic auditory discrimination of speech sounds. The findings provide evidence for the importance of auditory-motor processes in efficient neural analysis of speech sounds. PMID:22581846
Design of an efficient music-speech discriminator.

PubMed

Tardón, Lorenzo J; Sammartino, Simone; Barbancho, Isabel

2010-01-01

In this paper, the problem of the design of a simple and efficient music-speech discriminator for large audio data sets in which advanced music playing techniques are taught and voice and music are intrinsically interleaved is addressed. In the process, a number of features used in speech-music discrimination are defined and evaluated over the available data set. Specifically, the data set contains pieces of classical music played with different and unspecified instruments (or even lyrics) and the voice of a teacher (a top music performer) or even the overlapped voice of the translator and other persons. After an initial test of the performance of the features implemented, a selection process is started, which takes into account the type of classifier selected beforehand, to achieve good discrimination performance and computational efficiency, as shown in the experiments. The discrimination application has been defined and tested on a large data set supplied by Fundacion Albeniz, containing a large variety of classical music pieces played with different instrument, which include comments and speeches of famous performers.
Effects of Computer Architecture on FFT (Fast Fourier Transform) Algorithm Performance.

DTIC Science & Technology

1983-12-01

Criteria for Efficient Implementation of FFT Algorithms," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-30, pp. 107-109, Feb...1982. Burrus, C. S. and P. W. Eschenbacher. "An In-Place, In-Order Prime Factor FFT Algorithm," IEEE Transactions on Acoustics, Speech, and Signal... Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-30, pp. 217-226, Apr. 1982. Control Data Corporation. CDC Cyber 170 Computer Systems
[Assessment of the efficiency of the auditory training in children with dyslalia and auditory processing disorders].

PubMed

Włodarczyk, Elżbieta; Szkiełkowska, Agata; Skarżyński, Henryk; Piłka, Adam

2011-01-01

To assess effectiveness of the auditory training in children with dyslalia and central auditory processing disorders. Material consisted of 50 children aged 7-9-years-old. Children with articulation disorders stayed under long-term speech therapy care in the Auditory and Phoniatrics Clinic. All children were examined by a laryngologist and a phoniatrician. Assessment included tonal and impedance audiometry and speech therapists' and psychologist's consultations. Additionally, a set of electrophysiological examinations was performed - registration of N2, P2, N2, P2, P300 waves and psychoacoustic test of central auditory functions: FPT - frequency pattern test. Next children took part in the regular auditory training and attended speech therapy. Speech assessment followed treatment and therapy, again psychoacoustic tests were performed and P300 cortical potentials were recorded. After that statistical analyses were performed. Analyses revealed that application of auditory training in patients with dyslalia and other central auditory disorders is very efficient. Auditory training may be a very efficient therapy supporting speech therapy in children suffering from dyslalia coexisting with articulation and central auditory disorders and in children with educational problems of audiogenic origin. Copyright © 2011 Polish Otolaryngology Society. Published by Elsevier Urban & Partner (Poland). All rights reserved.
Language/Culture Modulates Brain and Gaze Processes in Audiovisual Speech Perception.

PubMed

Hisanaga, Satoko; Sekiyama, Kaoru; Igasaki, Tomohiko; Murayama, Nobuki

2016-10-13

Several behavioural studies have shown that the interplay between voice and face information in audiovisual speech perception is not universal. Native English speakers (ESs) are influenced by visual mouth movement to a greater degree than native Japanese speakers (JSs) when listening to speech. However, the biological basis of these group differences is unknown. Here, we demonstrate the time-varying processes of group differences in terms of event-related brain potentials (ERP) and eye gaze for audiovisual and audio-only speech perception. On a behavioural level, while congruent mouth movement shortened the ESs' response time for speech perception, the opposite effect was observed in JSs. Eye-tracking data revealed a gaze bias to the mouth for the ESs but not the JSs, especially before the audio onset. Additionally, the ERP P2 amplitude indicated that ESs processed multisensory speech more efficiently than auditory-only speech; however, the JSs exhibited the opposite pattern. Taken together, the ESs' early visual attention to the mouth was likely to promote phonetic anticipation, which was not the case for the JSs. These results clearly indicate the impact of language and/or culture on multisensory speech processing, suggesting that linguistic/cultural experiences lead to the development of unique neural systems for audiovisual speech perception.
Brain responses and looking behavior during audiovisual speech integration in infants predict auditory speech comprehension in the second year of life

PubMed Central

Kushnerenko, Elena; Tomalski, Przemyslaw; Ballieux, Haiko; Potton, Anita; Birtles, Deidre; Frostick, Caroline; Moore, Derek G.

2013-01-01

The use of visual cues during the processing of audiovisual (AV) speech is known to be less efficient in children and adults with language difficulties and difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6–9 months to 14–16 months of age. We used eye-tracking to examine whether individual differences in visual attention during AV processing of speech in 6–9 month old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6–9 month old infants also participated in an event-related potential (ERP) AV task within the same experimental session. Language development was then followed-up at the age of 14–16 months, using two measures of language development, the Preschool Language Scale and the Oxford Communicative Development Inventory. The results show that those infants who were less efficient in auditory speech processing at the age of 6–9 months had lower receptive language scores at 14–16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audiovisually incongruent stimuli at 6–9 months were both significantly associated with language development at 14–16 months. These findings add to the understanding of individual differences in neural signatures of AV processing and associated looking behavior in infants. PMID:23882240
Children and Adults Integrate Talker and Verb Information in Online Processing

ERIC Educational Resources Information Center

Borovsky, Arielle; Creel, Sarah C.

2014-01-01

Children seem able to efficiently interpret a variety of linguistic cues during speech comprehension, yet have difficulty interpreting sources of nonlinguistic and paralinguistic information that accompany speech. The current study asked whether (paralinguistic) voice-activated role knowledge is rapidly interpreted in coordination with a…
Estimating psycho-physiological state of a human by speech analysis

NASA Astrophysics Data System (ADS)

Ronzhin, A. L.

2005-05-01

Adverse effects of intoxication, fatigue and boredom could degrade performance of highly trained operators of complex technical systems with potentially catastrophic consequences. Existing physiological fitness for duty tests are time consuming, costly, invasive, and highly unpopular. Known non-physiological tests constitute a secondary task and interfere with the busy workload of the tested operator. Various attempts to assess the current status of the operator by processing of "normal operational data" often lead to excessive amount of computations, poorly justified metrics, and ambiguity of results. At the same time, speech analysis presents a natural, non-invasive approach based upon well-established efficient data processing. In addition, it supports both behavioral and physiological biometric. This paper presents an approach facilitating robust speech analysis/understanding process in spite of natural speech variability and background noise. Automatic speech recognition is suggested as a technique for the detection of changes in the psycho-physiological state of a human that typically manifest themselves by changes of characteristics of voice tract and semantic-syntactic connectivity of conversation. Preliminary tests have confirmed that the statistically significant correlation between the error rate of automatic speech recognition and the extent of alcohol intoxication does exist. In addition, the obtained data allowed exploring some interesting correlations and establishing some quantitative models. It is proposed to utilize this approach as a part of fitness for duty test and compare its efficiency with analyses of iris, face geometry, thermography and other popular non-invasive biometric techniques.
Speech Rhythms and Multiplexed Oscillatory Sensory Coding in the Human Brain

PubMed Central

Gross, Joachim; Hoogenboom, Nienke; Thut, Gregor; Schyns, Philippe; Panzeri, Stefano; Belin, Pascal; Garrod, Simon

2013-01-01

Cortical oscillations are likely candidates for segmentation and coding of continuous speech. Here, we monitored continuous speech processing with magnetoencephalography (MEG) to unravel the principles of speech segmentation and coding. We demonstrate that speech entrains the phase of low-frequency (delta, theta) and the amplitude of high-frequency (gamma) oscillations in the auditory cortex. Phase entrainment is stronger in the right and amplitude entrainment is stronger in the left auditory cortex. Furthermore, edges in the speech envelope phase reset auditory cortex oscillations thereby enhancing their entrainment to speech. This mechanism adapts to the changing physical features of the speech envelope and enables efficient, stimulus-specific speech sampling. Finally, we show that within the auditory cortex, coupling between delta, theta, and gamma oscillations increases following speech edges. Importantly, all couplings (i.e., brain-speech and also within the cortex) attenuate for backward-presented speech, suggesting top-down control. We conclude that segmentation and coding of speech relies on a nested hierarchy of entrained cortical oscillations. PMID:24391472
Evidence of degraded representation of speech in noise, in the aging midbrain and cortex

PubMed Central

Simon, Jonathan Z.; Anderson, Samira

2016-01-01

Humans have a remarkable ability to track and understand speech in unfavorable conditions, such as in background noise, but speech understanding in noise does deteriorate with age. Results from several studies have shown that in younger adults, low-frequency auditory cortical activity reliably synchronizes to the speech envelope, even when the background noise is considerably louder than the speech signal. However, cortical speech processing may be limited by age-related decreases in the precision of neural synchronization in the midbrain. To understand better the neural mechanisms contributing to impaired speech perception in older adults, we investigated how aging affects midbrain and cortical encoding of speech when presented in quiet and in the presence of a single-competing talker. Our results suggest that central auditory temporal processing deficits in older adults manifest in both the midbrain and in the cortex. Specifically, midbrain frequency following responses to a speech syllable are more degraded in noise in older adults than in younger adults. This suggests a failure of the midbrain auditory mechanisms needed to compensate for the presence of a competing talker. Similarly, in cortical responses, older adults show larger reductions than younger adults in their ability to encode the speech envelope when a competing talker is added. Interestingly, older adults showed an exaggerated cortical representation of speech in both quiet and noise conditions, suggesting a possible imbalance between inhibitory and excitatory processes, or diminished network connectivity that may impair their ability to encode speech efficiently. PMID:27535374
The neural processing of foreign-accented speech and its relationship to listener bias

PubMed Central

Yi, Han-Gyol; Smiljanic, Rajka; Chandrasekaran, Bharath

2014-01-01

Foreign-accented speech often presents a challenging listening condition. In addition to deviations from the target speech norms related to the inexperience of the nonnative speaker, listener characteristics may play a role in determining intelligibility levels. We have previously shown that an implicit visual bias for associating East Asian faces and foreignness predicts the listeners' perceptual ability to process Korean-accented English audiovisual speech (Yi et al., 2013). Here, we examine the neural mechanism underlying the influence of listener bias to foreign faces on speech perception. In a functional magnetic resonance imaging (fMRI) study, native English speakers listened to native- and Korean-accented English sentences, with or without faces. The participants' Asian-foreign association was measured using an implicit association test (IAT), conducted outside the scanner. We found that foreign-accented speech evoked greater activity in the bilateral primary auditory cortices and the inferior frontal gyri, potentially reflecting greater computational demand. Higher IAT scores, indicating greater bias, were associated with increased BOLD response to foreign-accented speech with faces in the primary auditory cortex, the early node for spectrotemporal analysis. We conclude the following: (1) foreign-accented speech perception places greater demand on the neural systems underlying speech perception; (2) face of the talker can exaggerate the perceived foreignness of foreign-accented speech; (3) implicit Asian-foreign association is associated with decreased neural efficiency in early spectrotemporal processing. PMID:25339883
Focal versus distributed temporal cortex activity for speech sound category assignment

PubMed Central

Bouton, Sophie; Chambon, Valérian; Tyrand, Rémi; Seeck, Margitta; Karkar, Sami; van de Ville, Dimitri; Giraud, Anne-Lise

2018-01-01

Percepts and words can be decoded from distributed neural activity measures. However, the existence of widespread representations might conflict with the more classical notions of hierarchical processing and efficient coding, which are especially relevant in speech processing. Using fMRI and magnetoencephalography during syllable identification, we show that sensory and decisional activity colocalize to a restricted part of the posterior superior temporal gyrus (pSTG). Next, using intracortical recordings, we demonstrate that early and focal neural activity in this region distinguishes correct from incorrect decisions and can be machine-decoded to classify syllables. Crucially, significant machine decoding was possible from neuronal activity sampled across different regions of the temporal and frontal lobes, despite weak or absent sensory or decision-related responses. These findings show that speech-sound categorization relies on an efficient readout of focal pSTG neural activity, while more distributed activity patterns, although classifiable by machine learning, instead reflect collateral processes of sensory perception and decision. PMID:29363598
An Assessment of Behavioral Dynamic Information Processing Measures in Audiovisual Speech Perception

PubMed Central

Altieri, Nicholas; Townsend, James T.

2011-01-01

Research has shown that visual speech perception can assist accuracy in identification of spoken words. However, little is known about the dynamics of the processing mechanisms involved in audiovisual integration. In particular, architecture and capacity, measured using response time methodologies, have not been investigated. An issue related to architecture concerns whether the auditory and visual sources of the speech signal are integrated “early” or “late.” We propose that “early” integration most naturally corresponds to coactive processing whereas “late” integration corresponds to separate decisions parallel processing. We implemented the double factorial paradigm in two studies. First, we carried out a pilot study using a two-alternative forced-choice discrimination task to assess architecture, decision rule, and provide a preliminary assessment of capacity (integration efficiency). Next, Experiment 1 was designed to specifically assess audiovisual integration efficiency in an ecologically valid way by including lower auditory S/N ratios and a larger response set size. Results from the pilot study support a separate decisions parallel, late integration model. Results from both studies showed that capacity was severely limited for high auditory signal-to-noise ratios. However, Experiment 1 demonstrated that capacity improved as the auditory signal became more degraded. This evidence strongly suggests that integration efficiency is vitally affected by the S/N ratio. PMID:21980314
The Hierarchical Cortical Organization of Human Speech Processing

PubMed Central

de Heer, Wendy A.; Huth, Alexander G.; Griffiths, Thomas L.

2017-01-01

Speech comprehension requires that the brain extract semantic meaning from the spectral features represented at the cochlea. To investigate this process, we performed an fMRI experiment in which five men and two women passively listened to several hours of natural narrative speech. We then used voxelwise modeling to predict BOLD responses based on three different feature spaces that represent the spectral, articulatory, and semantic properties of speech. The amount of variance explained by each feature space was then assessed using a separate validation dataset. Because some responses might be explained equally well by more than one feature space, we used a variance partitioning analysis to determine the fraction of the variance that was uniquely explained by each feature space. Consistent with previous studies, we found that speech comprehension involves hierarchical representations starting in primary auditory areas and moving laterally on the temporal lobe: spectral features are found in the core of A1, mixtures of spectral and articulatory in STG, mixtures of articulatory and semantic in STS, and semantic in STS and beyond. Our data also show that both hemispheres are equally and actively involved in speech perception and interpretation. Further, responses as early in the auditory hierarchy as in STS are more correlated with semantic than spectral representations. These results illustrate the importance of using natural speech in neurolinguistic research. Our methodology also provides an efficient way to simultaneously test multiple specific hypotheses about the representations of speech without using block designs and segmented or synthetic speech. SIGNIFICANCE STATEMENT To investigate the processing steps performed by the human brain to transform natural speech sound into meaningful language, we used models based on a hierarchical set of speech features to predict BOLD responses of individual voxels recorded in an fMRI experiment while subjects listened to natural speech. Both cerebral hemispheres were actively involved in speech processing in large and equal amounts. Also, the transformation from spectral features to semantic elements occurs early in the cortical speech-processing stream. Our experimental and analytical approaches are important alternatives and complements to standard approaches that use segmented speech and block designs, which report more laterality in speech processing and associated semantic processing to higher levels of cortex than reported here. PMID:28588065
Prediction and constraint in audiovisual speech perception

PubMed Central

Peelle, Jonathan E.; Sommers, Mitchell S.

2015-01-01

During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported by distinct neuroanatomical mechanisms. PMID:25890390
Prediction and constraint in audiovisual speech perception.

PubMed

Peelle, Jonathan E; Sommers, Mitchell S

2015-07-01

During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported by distinct neuroanatomical mechanisms. Copyright © 2015 Elsevier Ltd. All rights reserved.
Speech and pause characteristics in multiple sclerosis: A preliminary study of speakers with high and low neuropsychological test performance

PubMed Central

FEENAUGHTY, LYNDA; TJADEN, KRIS; BENEDICT, RALPH H.B.; WEINSTOCK-GUTTMAN, BIANCA

2017-01-01

This preliminary study investigated how cognitive-linguistic status in multiple sclerosis (MS) is reflected in two speech tasks (i.e. oral reading, narrative) that differ in cognitive-linguistic demand. Twenty individuals with MS were selected to comprise High and Low performance groups based on clinical tests of executive function and information processing speed and efficiency. Ten healthy controls were included for comparison. Speech samples were audio-recorded and measures of global speech timing were obtained. Results indicated predicted differences in global speech timing (i.e. speech rate and pause characteristics) for speech tasks differing in cognitive-linguistic demand, but the magnitude of these task-related differences was similar for all speaker groups. Findings suggest that assumptions concerning the cognitive-linguistic demands of reading aloud as compared to spontaneous speech may need to be re-considered for individuals with cognitive impairment. Qualitative trends suggest that additional studies investigating the association between cognitive-linguistic and speech motor variables in MS are warranted. PMID:23294227

Application of independent component analysis for speech-music separation using an efficient score function estimation

NASA Astrophysics Data System (ADS)

Pishravian, Arash; Aghabozorgi Sahaf, Masoud Reza

2012-12-01

In this paper speech-music separation using Blind Source Separation is discussed. The separating algorithm is based on the mutual information minimization where the natural gradient algorithm is used for minimization. In order to do that, score function estimation from observation signals (combination of speech and music) samples is needed. The accuracy and the speed of the mentioned estimation will affect on the quality of the separated signals and the processing time of the algorithm. The score function estimation in the presented algorithm is based on Gaussian mixture based kernel density estimation method. The experimental results of the presented algorithm on the speech-music separation and comparing to the separating algorithm which is based on the Minimum Mean Square Error estimator, indicate that it can cause better performance and less processing time
Toward dynamic magnetic resonance imaging of the vocal tract during speech production.

PubMed

Ventura, Sandra M Rua; Freitas, Diamantino Rui S; Tavares, João Manuel R S

2011-07-01

The most recent and significant magnetic resonance imaging (MRI) improvements allow for the visualization of the vocal tract during speech production, which has been revealed to be a powerful tool in dynamic speech research. However, a synchronization technique with enhanced temporal resolution is still required. The study design was transversal in nature. Throughout this work, a technique for the dynamic study of the vocal tract with MRI by using the heart's signal to synchronize and trigger the imaging-acquisition process is presented and described. The technique in question is then used in the measurement of four speech articulatory parameters to assess three different syllables (articulatory gestures) of European Portuguese Language. The acquired MR images are automatically reconstructed so as to result in a variable sequence of images (slices) of different vocal tract shapes in articulatory positions associated with Portuguese speech sounds. The knowledge obtained as a result of the proposed technique represents a direct contribution to the improvement of speech synthesis algorithms, thereby allowing for novel perceptions in coarticulation studies, in addition to providing further efficient clinical guidelines in the pursuit of more proficient speech rehabilitation processes. Copyright © 2011 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
Relatively effortless listening promotes understanding and recall of medical instructions in older adults

PubMed Central

DiDonato, Roberta M.; Surprenant, Aimée M.

2015-01-01

Communication success under adverse conditions requires efficient and effective recruitment of both bottom-up (sensori-perceptual) and top-down (cognitive-linguistic) resources to decode the intended auditory-verbal message. Employing these limited capacity resources has been shown to vary across the lifespan, with evidence indicating that younger adults out-perform older adults for both comprehension and memory of the message. This study examined how sources of interference arising from the speaker (message spoken with conversational vs. clear speech technique), the listener (hearing-listening and cognitive-linguistic factors), and the environment (in competing speech babble noise vs. quiet) interact and influence learning and memory performance using more ecologically valid methods than has been done previously. The results suggest that when older adults listened to complex medical prescription instructions with “clear speech,” (presented at audible levels through insertion earphones) their learning efficiency, immediate, and delayed memory performance improved relative to their performance when they listened with a normal conversational speech rate (presented at audible levels in sound field). This better learning and memory performance for clear speech listening was maintained even in the presence of speech babble noise. The finding that there was the largest learning-practice effect on 2nd trial performance in the conversational speech when the clear speech listening condition was first is suggestive of greater experience-dependent perceptual learning or adaptation to the speaker's speech and voice pattern in clear speech. This suggests that experience-dependent perceptual learning plays a role in facilitating the language processing and comprehension of a message and subsequent memory encoding. PMID:26106353
Dual Key Speech Encryption Algorithm Based Underdetermined BSS

PubMed Central

Zhao, Huan; Chen, Zuo; Zhang, Xixiang

2014-01-01

When the number of the mixed signals is less than that of the source signals, the underdetermined blind source separation (BSS) is a significant difficult problem. Due to the fact that the great amount data of speech communications and real-time communication has been required, we utilize the intractability of the underdetermined BSS problem to present a dual key speech encryption method. The original speech is mixed with dual key signals which consist of random key signals (one-time pad) generated by secret seed and chaotic signals generated from chaotic system. In the decryption process, approximate calculation is used to recover the original speech signals. The proposed algorithm for speech signals encryption can resist traditional attacks against the encryption system, and owing to approximate calculation, decryption becomes faster and more accurate. It is demonstrated that the proposed method has high level of security and can recover the original signals quickly and efficiently yet maintaining excellent audio quality. PMID:24955430
An eye-tracking paradigm for analyzing the processing time of sentences with different linguistic complexities.

PubMed

Wendt, Dorothea; Brand, Thomas; Kollmeier, Birger

2014-01-01

An eye-tracking paradigm was developed for use in audiology in order to enable online analysis of the speech comprehension process. This paradigm should be useful in assessing impediments in speech processing. In this paradigm, two scenes, a target picture and a competitor picture, were presented simultaneously with an aurally presented sentence that corresponded to the target picture. At the same time, eye fixations were recorded using an eye-tracking device. The effect of linguistic complexity on language processing time was assessed from eye fixation information by systematically varying linguistic complexity. This was achieved with a sentence corpus containing seven German sentence structures. A novel data analysis method computed the average tendency to fixate the target picture as a function of time during sentence processing. This allowed identification of the point in time at which the participant understood the sentence, referred to as the decision moment. Systematic differences in processing time were observed as a function of linguistic complexity. These differences in processing time may be used to assess the efficiency of cognitive processes involved in resolving linguistic complexity. Thus, the proposed method enables a temporal analysis of the speech comprehension process and has potential applications in speech audiology and psychoacoustics.
An Eye-Tracking Paradigm for Analyzing the Processing Time of Sentences with Different Linguistic Complexities

PubMed Central

Wendt, Dorothea; Brand, Thomas; Kollmeier, Birger

2014-01-01

An eye-tracking paradigm was developed for use in audiology in order to enable online analysis of the speech comprehension process. This paradigm should be useful in assessing impediments in speech processing. In this paradigm, two scenes, a target picture and a competitor picture, were presented simultaneously with an aurally presented sentence that corresponded to the target picture. At the same time, eye fixations were recorded using an eye-tracking device. The effect of linguistic complexity on language processing time was assessed from eye fixation information by systematically varying linguistic complexity. This was achieved with a sentence corpus containing seven German sentence structures. A novel data analysis method computed the average tendency to fixate the target picture as a function of time during sentence processing. This allowed identification of the point in time at which the participant understood the sentence, referred to as the decision moment. Systematic differences in processing time were observed as a function of linguistic complexity. These differences in processing time may be used to assess the efficiency of cognitive processes involved in resolving linguistic complexity. Thus, the proposed method enables a temporal analysis of the speech comprehension process and has potential applications in speech audiology and psychoacoustics. PMID:24950184
Communication skills and thalamic lesion: Strategies of rehabilitation.

PubMed

Amaddii, Luisa; Centorrino, Santi; Cambi, Jacopo; Passali, Desiderio

2014-01-01

To describe the speech rehabilitation history of patients with thalamic lesions. Thalamic lesions can affect speech and language according to diverse thalamic nuclei involved. Because of the strategic functional position of the thalamus within the cognitive networks, its lesion can also interfere with other cognitive processes, such as attention, memory and executive functions. Alterations of these cognitive domains contribute significantly to language deficits, leading to communicative inefficacy. This fact must be considered in the rehabilitation efforts. Whereas evaluation of cognitive functions and communicative efficiency is different from that of aphasic disorder, treatment should also be different. The treatment must be focused on specific cognitive deficits with belief in the regaining of communicative ability, as well as it occurs in therapy of pragmatic disorder in traumatic brain injury: attention process training, mnemotechnics and prospective memory training. According to our experience: (a) there is a close correlation between cognitive processes and communication skills; (b) alterations of attention, memory and executive functions cause a loss of efficiency in the language use; and (c) appropriate cognitive treatment improves pragmatic competence and therefore the linguistic disorder. For planning a speech-therapy it is important to consider the relationship between cognitive functions and communication. The cognitive/behavioral treatment confirms its therapeutic efficiency for thalamic lesions. Copyright © 2014 Polish Otorhinolaryngology - Head and Neck Surgery Society. Published by Elsevier Urban & Partner Sp. z.o.o. All rights reserved.
Influence of Gestational Age and Postnatal Age on Speech Sound Processing in NICU infants

PubMed Central

Key, Alexandra P.F.; Lambert, E. Warren; Aschner, Judy L.; Maitre, Nathalie L.

2012-01-01

The study examined the effect of gestational (GA) and postnatal (PNA) age on speech sound perception in infants. Auditory ERPs were recorded in response to speech sounds (CV syllables) in 50 infant NICU patients (born at 24–40 weeks gestation) prior to discharge. Efficiency of speech perception was quantified as absolute difference in mean amplitudes of ERPs in response to vowel (/a/–/u/) and consonant (/b/–/g/, /d/–/g/) contrasts within 150–250, 250–400, 400–700 ms after stimulus onset. Results indicated that both GA and PNA affected speech sound processing. These effects were more pronounced for consonant than vowel contrasts. Increasing PNA was associated with greater sound discrimination in infants born at or after 30 weeks GA, while minimal PNA-related changes were observed for infants with GA less than 30 weeks. Our findings suggest that a certain level of brain maturity at birth is necessary to benefit from postnatal experience in the first 4 months of life, and both gestational and postnatal ages need to be considered when evaluating infant brain responses. PMID:22332725
A Deep Ensemble Learning Method for Monaural Speech Separation.

PubMed

Zhang, Xiao-Lei; Wang, DeLiang

2016-03-01

Monaural speech separation is a fundamental problem in robust speech processing. Recently, deep neural network (DNN)-based speech separation methods, which predict either clean speech or an ideal time-frequency mask, have demonstrated remarkable performance improvement. However, a single DNN with a given window length does not leverage contextual information sufficiently, and the differences between the two optimization objectives are not well understood. In this paper, we propose a deep ensemble method, named multicontext networks, to address monaural speech separation. The first multicontext network averages the outputs of multiple DNNs whose inputs employ different window lengths. The second multicontext network is a stack of multiple DNNs. Each DNN in a module of the stack takes the concatenation of original acoustic features and expansion of the soft output of the lower module as its input, and predicts the ratio mask of the target speaker; the DNNs in the same module employ different contexts. We have conducted extensive experiments with three speech corpora. The results demonstrate the effectiveness of the proposed method. We have also compared the two optimization objectives systematically and found that predicting the ideal time-frequency mask is more efficient in utilizing clean training speech, while predicting clean speech is less sensitive to SNR variations.
Application of a VLSI vector quantization processor to real-time speech coding

NASA Technical Reports Server (NTRS)

Davidson, G.; Gersho, A.

1986-01-01

Attention is given to a working vector quantization processor for speech coding that is based on a first-generation VLSI chip which efficiently performs the pattern-matching operation needed for the codebook search process (CPS). Using this chip, the CPS architecture has been successfully incorporated into a compact, single-board Vector PCM implementation operating at 7-18 kbits/sec. A real time Adaptive Vector Predictive Coder system using the CPS has also been implemented.
Sparse gammatone signal model optimized for English speech does not match the human auditory filters.

PubMed

Strahl, Stefan; Mertins, Alfred

2008-07-18

Evidence that neurosensory systems use sparse signal representations as well as improved performance of signal processing algorithms using sparse signal models raised interest in sparse signal coding in the last years. For natural audio signals like speech and environmental sounds, gammatone atoms have been derived as expansion functions that generate a nearly optimal sparse signal model (Smith, E., Lewicki, M., 2006. Efficient auditory coding. Nature 439, 978-982). Furthermore, gammatone functions are established models for the human auditory filters. Thus far, a practical application of a sparse gammatone signal model has been prevented by the fact that deriving the sparsest representation is, in general, computationally intractable. In this paper, we applied an accelerated version of the matching pursuit algorithm for gammatone dictionaries allowing real-time and large data set applications. We show that a sparse signal model in general has advantages in audio coding and that a sparse gammatone signal model encodes speech more efficiently in terms of sparseness than a sparse modified discrete cosine transform (MDCT) signal model. We also show that the optimal gammatone parameters derived for English speech do not match the human auditory filters, suggesting for signal processing applications to derive the parameters individually for each applied signal class instead of using psychometrically derived parameters. For brain research, it means that care should be taken with directly transferring findings of optimality for technical to biological systems.
Brain networks engaged in audiovisual integration during speech perception revealed by persistent homology-based network filtration.

PubMed

Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo

2015-05-01

The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although the brain network is supposed to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual or unimodal speech perception with counterpart irrelevant noise. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework through hierarchical clustering, such as single linkage distance, to analyze the connected component of the functional network during speech perception using functional magnetic resonance imaging. For speech perception, bimodal (audio-visual speech cue) or unimodal speech cues with counterpart irrelevant noise (auditory white-noise or visual gum-chewing) were delivered to 15 subjects. In terms of positive relationship, similar connected components were observed in bimodal and unimodal speech conditions during filtration. However, during speech perception by congruent audiovisual stimuli, the tighter couplings of left anterior temporal gyrus-anterior insula component and right premotor-visual components were observed than auditory or visual speech cue conditions, respectively. Interestingly, visual speech is perceived under white noise by tight negative coupling in the left inferior frontal region-right anterior cingulate, left anterior insula, and bilateral visual regions, including right middle temporal gyrus, right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, and can reflect efficient or effortful processes during natural audiovisual integration or lip-reading, respectively, in speech perception.
Reduced efficiency of audiovisual integration for nonnative speech.

PubMed

Yi, Han-Gyol; Phelps, Jasmine E B; Smiljanic, Rajka; Chandrasekaran, Bharath

2013-11-01

The role of visual cues in native listeners' perception of speech produced by nonnative speakers has not been extensively studied. Native perception of English sentences produced by native English and Korean speakers in audio-only and audiovisual conditions was examined. Korean speakers were rated as more accented in audiovisual than in the audio-only condition. Visual cues enhanced word intelligibility for native English speech but less so for Korean-accented speech. Reduced intelligibility of Korean-accented audiovisual speech was associated with implicit visual biases, suggesting that listener-related factors partially influence the efficiency of audiovisual integration for nonnative speech perception.
Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

NASA Technical Reports Server (NTRS)

Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

2010-01-01

A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It can enjoy a fast rate of data/text entry, small overall size, and can be lightweight. In addition, this design will free the hands and eyes of a suited crewmember. The system components and steps include beam forming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaption, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMMbased ASR components were developed. They can help real-time ASR system designers select proper tasks when in the face of constraints in computational resources.
A Comparison of LBG and ADPCM Speech Compression Techniques

NASA Astrophysics Data System (ADS)

Bachu, Rajesh G.; Patel, Jignasa; Barkana, Buket D.

Speech compression is the technology of converting human speech into an efficiently encoded representation that can later be decoded to produce a close approximation of the original signal. In all speech there is a degree of predictability and speech coding techniques exploit this to reduce bit rates yet still maintain a suitable level of quality. This paper is a study and implementation of Linde-Buzo-Gray Algorithm (LBG) and Adaptive Differential Pulse Code Modulation (ADPCM) algorithms to compress speech signals. In here we implemented the methods using MATLAB 7.0. The methods we used in this study gave good results and performance in compressing the speech and listening tests showed that efficient and high quality coding is achieved.
The Speech multi features fusion perceptual hash algorithm based on tensor decomposition

NASA Astrophysics Data System (ADS)

Huang, Y. B.; Fan, M. H.; Zhang, Q. Y.

2018-03-01

With constant progress in modern speech communication technologies, the speech data is prone to be attacked by the noise or maliciously tampered. In order to make the speech perception hash algorithm has strong robustness and high efficiency, this paper put forward a speech perception hash algorithm based on the tensor decomposition and multi features is proposed. This algorithm analyses the speech perception feature acquires each speech component wavelet packet decomposition. LPCC, LSP and ISP feature of each speech component are extracted to constitute the speech feature tensor. Speech authentication is done by generating the hash values through feature matrix quantification which use mid-value. Experimental results showing that the proposed algorithm is robust for content to maintain operations compared with similar algorithms. It is able to resist the attack of the common background noise. Also, the algorithm is highly efficiency in terms of arithmetic, and is able to meet the real-time requirements of speech communication and complete the speech authentication quickly.
Applications of Hilbert Spectral Analysis for Speech and Sound Signals

NASA Technical Reports Server (NTRS)

Huang, Norden E.

2003-01-01

A new method for analyzing nonlinear and nonstationary data has been developed, and the natural applications are to speech and sound signals. The key part of the method is the Empirical Mode Decomposition method with which any complicated data set can be decomposed into a finite and often small number of Intrinsic Mode Functions (IMF). An IMF is defined as any function having the same numbers of zero-crossing and extrema, and also having symmetric envelopes defined by the local maxima and minima respectively. The IMF also admits well-behaved Hilbert transform. This decomposition method is adaptive, and, therefore, highly efficient. Since the decomposition is based on the local characteristic time scale of the data, it is applicable to nonlinear and nonstationary processes. With the Hilbert transform, the Intrinsic Mode Functions yield instantaneous frequencies as functions of time, which give sharp identifications of imbedded structures. This method invention can be used to process all acoustic signals. Specifically, it can process the speech signals for Speech synthesis, Speaker identification and verification, Speech recognition, and Sound signal enhancement and filtering. Additionally, as the acoustical signals from machinery are essentially the way the machines are talking to us. Therefore, the acoustical signals, from the machines, either from sound through air or vibration on the machines, can tell us the operating conditions of the machines. Thus, we can use the acoustic signal to diagnosis the problems of machines.
Perceptual Learning of Interrupted Speech

PubMed Central

Benard, Michel Ruben; Başkent, Deniz

2013-01-01

The intelligibility of periodically interrupted speech improves once the silent gaps are filled with noise bursts. This improvement has been attributed to phonemic restoration, a top-down repair mechanism that helps intelligibility of degraded speech in daily life. Two hypotheses were investigated using perceptual learning of interrupted speech. If different cognitive processes played a role in restoring interrupted speech with and without filler noise, the two forms of speech would be learned at different rates and with different perceived mental effort. If the restoration benefit were an artificial outcome of using the ecologically invalid stimulus of speech with silent gaps, this benefit would diminish with training. Two groups of normal-hearing listeners were trained, one with interrupted sentences with the filler noise, and the other without. Feedback was provided with the auditory playback of the unprocessed and processed sentences, as well as the visual display of the sentence text. Training increased the overall performance significantly, however restoration benefit did not diminish. The increase in intelligibility and the decrease in perceived mental effort were relatively similar between the groups, implying similar cognitive mechanisms for the restoration of the two types of interruptions. Training effects were generalizable, as both groups improved their performance also with the other form of speech than that they were trained with, and retainable. Due to null results and relatively small number of participants (10 per group), further research is needed to more confidently draw conclusions. Nevertheless, training with interrupted speech seems to be effective, stimulating participants to more actively and efficiently use the top-down restoration. This finding further implies the potential of this training approach as a rehabilitative tool for hearing-impaired/elderly populations. PMID:23469266
Spoken word recognition by Latino children learning Spanish as their first language*

PubMed Central

HURTADO, NEREYDA; MARCHMAN, VIRGINIA A.; FERNALD, ANNE

2010-01-01

Research on the development of efficiency in spoken language understanding has focused largely on middle-class children learning English. Here we extend this research to Spanish-learning children (n=49; M=2;0; range=1;3–3;1) living in the USA in Latino families from primarily low socioeconomic backgrounds. Children looked at pictures of familiar objects while listening to speech naming one of the objects. Analyses of eye movements revealed developmental increases in the efficiency of speech processing. Older children and children with larger vocabularies were more efficient at processing spoken language as it unfolds in real time, as previously documented with English learners. Children whose mothers had less education tended to be slower and less accurate than children of comparable age and vocabulary size whose mothers had more schooling, consistent with previous findings of slower rates of language learning in children from disadvantaged backgrounds. These results add to the cross-linguistic literature on the development of spoken word recognition and to the study of the impact of socioeconomic status (SES) factors on early language development. PMID:17542157
On the recognition of emotional vocal expressions: motivations for a holistic approach.

PubMed

Esposito, Anna; Esposito, Antonietta M

2012-10-01

Human beings seem to be able to recognize emotions from speech very well and information communication technology aims to implement machines and agents that can do the same. However, to be able to automatically recognize affective states from speech signals, it is necessary to solve two main technological problems. The former concerns the identification of effective and efficient processing algorithms capable of capturing emotional acoustic features from speech sentences. The latter focuses on finding computational models able to classify, with an approximation as good as human listeners, a given set of emotional states. This paper will survey these topics and provide some insights for a holistic approach to the automatic analysis, recognition and synthesis of affective states.

Information processing using a single dynamical node as complex system

PubMed Central

Appeltant, L.; Soriano, M.C.; Van der Sande, G.; Danckaert, J.; Massar, S.; Dambre, J.; Schrauwen, B.; Mirasso, C.R.; Fischer, I.

2011-01-01

Novel methods for information processing are highly desired in our information-driven society. Inspired by the brain's ability to process information, the recently introduced paradigm known as 'reservoir computing' shows that complex networks can efficiently perform computation. Here we introduce a novel architecture that reduces the usually required large number of elements to a single nonlinear node with delayed feedback. Through an electronic implementation, we experimentally and numerically demonstrate excellent performance in a speech recognition benchmark. Complementary numerical studies also show excellent performance for a time series prediction benchmark. These results prove that delay-dynamical systems, even in their simplest manifestation, can perform efficient information processing. This finding paves the way to feasible and resource-efficient technological implementations of reservoir computing. PMID:21915110
Impaired auditory temporal selectivity in the inferior colliculus of aged Mongolian gerbils.

PubMed

Khouri, Leila; Lesica, Nicholas A; Grothe, Benedikt

2011-07-06

Aged humans show severe difficulties in temporal auditory processing tasks (e.g., speech recognition in noise, low-frequency sound localization, gap detection). A degradation of auditory function with age is also evident in experimental animals. To investigate age-related changes in temporal processing, we compared extracellular responses to temporally variable pulse trains and human speech in the inferior colliculus of young adult (3 month) and aged (3 years) Mongolian gerbils. We observed a significant decrease of selectivity to the pulse trains in neuronal responses from aged animals. This decrease in selectivity led, on the population level, to an increase in signal correlations and therefore a decrease in heterogeneity of temporal receptive fields and a decreased efficiency in encoding of speech signals. A decrease in selectivity to temporal modulations is consistent with a downregulation of the inhibitory transmitter system in aged animals. These alterations in temporal processing could underlie declines in the aging auditory system, which are unrelated to peripheral hearing loss. These declines cannot be compensated by traditional hearing aids (that rely on amplification of sound) but may rather require pharmacological treatment.
A magnetic resonance imaging study on the articulatory and acoustic speech parameters of Malay vowels

PubMed Central

2014-01-01

The phonetic properties of six Malay vowels are investigated using magnetic resonance imaging (MRI) to visualize the vocal tract in order to obtain dynamic articulatory parameters during speech production. To resolve image blurring due to the tongue movement during the scanning process, a method based on active contour extraction is used to track tongue contours. The proposed method efficiently tracks tongue contours despite the partial blurring of MRI images. Consequently, the articulatory parameters that are effectively measured as tongue movement is observed, and the specific shape of the tongue and its position for all six uttered Malay vowels are determined. Speech rehabilitation procedure demands some kind of visual perceivable prototype of speech articulation. To investigate the validity of the measured articulatory parameters based on acoustic theory of speech production, an acoustic analysis based on the uttered vowels by subjects has been performed. As the acoustic speech and articulatory parameters of uttered speech were examined, a correlation between formant frequencies and articulatory parameters was observed. The experiments reported a positive correlation between the constriction location of the tongue body and the first formant frequency, as well as a negative correlation between the constriction location of the tongue tip and the second formant frequency. The results demonstrate that the proposed method is an effective tool for the dynamic study of speech production. PMID:25060583
A magnetic resonance imaging study on the articulatory and acoustic speech parameters of Malay vowels.

PubMed

Zourmand, Alireza; Mirhassani, Seyed Mostafa; Ting, Hua-Nong; Bux, Shaik Ismail; Ng, Kwan Hoong; Bilgen, Mehmet; Jalaludin, Mohd Amin

2014-07-25

The phonetic properties of six Malay vowels are investigated using magnetic resonance imaging (MRI) to visualize the vocal tract in order to obtain dynamic articulatory parameters during speech production. To resolve image blurring due to the tongue movement during the scanning process, a method based on active contour extraction is used to track tongue contours. The proposed method efficiently tracks tongue contours despite the partial blurring of MRI images. Consequently, the articulatory parameters that are effectively measured as tongue movement is observed, and the specific shape of the tongue and its position for all six uttered Malay vowels are determined.Speech rehabilitation procedure demands some kind of visual perceivable prototype of speech articulation. To investigate the validity of the measured articulatory parameters based on acoustic theory of speech production, an acoustic analysis based on the uttered vowels by subjects has been performed. As the acoustic speech and articulatory parameters of uttered speech were examined, a correlation between formant frequencies and articulatory parameters was observed. The experiments reported a positive correlation between the constriction location of the tongue body and the first formant frequency, as well as a negative correlation between the constriction location of the tongue tip and the second formant frequency. The results demonstrate that the proposed method is an effective tool for the dynamic study of speech production.
Facial Speech Gestures: The Relation between Visual Speech Processing, Phonological Awareness, and Developmental Dyslexia in 10-Year-Olds

ERIC Educational Resources Information Center

Schaadt, Gesa; Männel, Claudia; van der Meer, Elke; Pannekamp, Ann; Friederici, Angela D.

2016-01-01

Successful communication in everyday life crucially involves the processing of auditory and visual components of speech. Viewing our interlocutor and processing visual components of speech facilitates speech processing by triggering auditory processing. Auditory phoneme processing, analyzed by event-related brain potentials (ERP), has been shown…
Speech processing using maximum likelihood continuity mapping

DOEpatents

Hogden, John E.

2000-01-01

Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
Speech processing using maximum likelihood continuity mapping

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hogden, J.E.

Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
Short-Term and Working Memory Impairments in Early-Implanted, Long-Term Cochlear Implant Users Are Independent of Audibility and Speech Production

PubMed Central

AuBuchon, Angela M.; Pisoni, David B.; Kronenberger, William G.

2015-01-01

OBJECTIVES Determine if early-implanted, long-term cochlear implant (CI) users display delays in verbal short-term and working memory capacity when processes related to audibility and speech production are eliminated. DESIGN Twenty-three long-term CI users and 23 normal-hearing controls each completed forward and backward digit span tasks under testing conditions which differed in presentation modality (auditory or visual) and response output (spoken recall or manual pointing). RESULTS Normal-hearing controls reproduced more lists of digits than the CI users, even when the test items were presented visually and the responses were made manually via touchscreen response. CONCLUSIONS Short-term and working memory delays observed in CI users are not due to greater demands from peripheral sensory processes such as audibility or from overt speech-motor planning and response output organization. Instead, CI users are less efficient at encoding and maintaining phonological representations in verbal short-term memory utilizing phonological and linguistic strategies during memory tasks. PMID:26496666
Short-Term and Working Memory Impairments in Early-Implanted, Long-Term Cochlear Implant Users Are Independent of Audibility and Speech Production.

PubMed

AuBuchon, Angela M; Pisoni, David B; Kronenberger, William G

2015-01-01

To determine whether early-implanted, long-term cochlear implant (CI) users display delays in verbal short-term and working memory capacity when processes related to audibility and speech production are eliminated. Twenty-three long-term CI users and 23 normal-hearing controls each completed forward and backward digit span tasks under testing conditions that differed in presentation modality (auditory or visual) and response output (spoken recall or manual pointing). Normal-hearing controls reproduced more lists of digits than the CI users, even when the test items were presented visually and the responses were made manually via touchscreen response. Short-term and working memory delays observed in CI users are not due to greater demands from peripheral sensory processes such as audibility or from overt speech-motor planning and response output organization. Instead, CI users are less efficient at encoding and maintaining phonological representations in verbal short-term memory using phonological and linguistic strategies during memory tasks.
The Impact of Early Bilingualism on Face Recognition Processes.

PubMed

Kandel, Sonia; Burfin, Sabine; Méary, David; Ruiz-Tada, Elisa; Costa, Albert; Pascalis, Olivier

2016-01-01

Early linguistic experience has an impact on the way we decode audiovisual speech in face-to-face communication. The present study examined whether differences in visual speech decoding could be linked to a broader difference in face processing. To identify a phoneme we have to do an analysis of the speaker's face to focus on the relevant cues for speech decoding (e.g., locating the mouth with respect to the eyes). Face recognition processes were investigated through two classic effects in face recognition studies: the Other-Race Effect (ORE) and the Inversion Effect. Bilingual and monolingual participants did a face recognition task with Caucasian faces (own race), Chinese faces (other race), and cars that were presented in an Upright or Inverted position. The results revealed that monolinguals exhibited the classic ORE. Bilinguals did not. Overall, bilinguals were slower than monolinguals. These results suggest that bilinguals' face processing abilities differ from monolinguals'. Early exposure to more than one language may lead to a perceptual organization that goes beyond language processing and could extend to face analysis. We hypothesize that these differences could be due to the fact that bilinguals focus on different parts of the face than monolinguals, making them more efficient in other race face processing but slower. However, more studies using eye-tracking techniques are necessary to confirm this explanation.
Age-Related Differences in Lexical Access Relate to Speech Recognition in Noise

PubMed Central

Carroll, Rebecca; Warzybok, Anna; Kollmeier, Birger; Ruigendijk, Esther

2016-01-01

Vocabulary size has been suggested as a useful measure of “verbal abilities” that correlates with speech recognition scores. Knowing more words is linked to better speech recognition. How vocabulary knowledge translates to general speech recognition mechanisms, how these mechanisms relate to offline speech recognition scores, and how they may be modulated by acoustical distortion or age, is less clear. Age-related differences in linguistic measures may predict age-related differences in speech recognition in noise performance. We hypothesized that speech recognition performance can be predicted by the efficiency of lexical access, which refers to the speed with which a given word can be searched and accessed relative to the size of the mental lexicon. We tested speech recognition in a clinical German sentence-in-noise test at two signal-to-noise ratios (SNRs), in 22 younger (18–35 years) and 22 older (60–78 years) listeners with normal hearing. We also assessed receptive vocabulary, lexical access time, verbal working memory, and hearing thresholds as measures of individual differences. Age group, SNR level, vocabulary size, and lexical access time were significant predictors of individual speech recognition scores, but working memory and hearing threshold were not. Interestingly, longer accessing times were correlated with better speech recognition scores. Hierarchical regression models for each subset of age group and SNR showed very similar patterns: the combination of vocabulary size and lexical access time contributed most to speech recognition performance; only for the younger group at the better SNR (yielding about 85% correct speech recognition) did vocabulary size alone predict performance. Our data suggest that successful speech recognition in noise is mainly modulated by the efficiency of lexical access. This suggests that older adults’ poorer performance in the speech recognition task may have arisen from reduced efficiency in lexical access; with an average vocabulary size similar to that of younger adults, they were still slower in lexical access. PMID:27458400
Age-Related Differences in Lexical Access Relate to Speech Recognition in Noise.

PubMed

Carroll, Rebecca; Warzybok, Anna; Kollmeier, Birger; Ruigendijk, Esther

2016-01-01

Vocabulary size has been suggested as a useful measure of "verbal abilities" that correlates with speech recognition scores. Knowing more words is linked to better speech recognition. How vocabulary knowledge translates to general speech recognition mechanisms, how these mechanisms relate to offline speech recognition scores, and how they may be modulated by acoustical distortion or age, is less clear. Age-related differences in linguistic measures may predict age-related differences in speech recognition in noise performance. We hypothesized that speech recognition performance can be predicted by the efficiency of lexical access, which refers to the speed with which a given word can be searched and accessed relative to the size of the mental lexicon. We tested speech recognition in a clinical German sentence-in-noise test at two signal-to-noise ratios (SNRs), in 22 younger (18-35 years) and 22 older (60-78 years) listeners with normal hearing. We also assessed receptive vocabulary, lexical access time, verbal working memory, and hearing thresholds as measures of individual differences. Age group, SNR level, vocabulary size, and lexical access time were significant predictors of individual speech recognition scores, but working memory and hearing threshold were not. Interestingly, longer accessing times were correlated with better speech recognition scores. Hierarchical regression models for each subset of age group and SNR showed very similar patterns: the combination of vocabulary size and lexical access time contributed most to speech recognition performance; only for the younger group at the better SNR (yielding about 85% correct speech recognition) did vocabulary size alone predict performance. Our data suggest that successful speech recognition in noise is mainly modulated by the efficiency of lexical access. This suggests that older adults' poorer performance in the speech recognition task may have arisen from reduced efficiency in lexical access; with an average vocabulary size similar to that of younger adults, they were still slower in lexical access.
Rapid Release From Listening Effort Resulting From Semantic Context, and Effects of Spectral Degradation and Cochlear Implants

PubMed Central

2016-01-01

People with hearing impairment are thought to rely heavily on context to compensate for reduced audibility. Here, we explore the resulting cost of this compensatory behavior, in terms of effort and the efficiency of ongoing predictive language processing. The listening task featured predictable or unpredictable sentences, and participants included people with cochlear implants as well as people with normal hearing who heard full-spectrum/unprocessed or vocoded speech. The crucial metric was the growth of the pupillary response and the reduction of this response for predictable versus unpredictable sentences, which would suggest reduced cognitive load resulting from predictive processing. Semantic context led to rapid reduction of listening effort for people with normal hearing; the reductions were observed well before the offset of the stimuli. Effort reduction was slightly delayed for people with cochlear implants and considerably more delayed for normal-hearing listeners exposed to spectrally degraded noise-vocoded signals; this pattern of results was maintained even when intelligibility was perfect. Results suggest that speed of sentence processing can still be disrupted, and exertion of effort can be elevated, even when intelligibility remains high. We discuss implications for experimental and clinical assessment of speech recognition, in which good performance can arise because of cognitive processes that occur after a stimulus, during a period of silence. Because silent gaps are not common in continuous flowing speech, the cognitive/linguistic restorative processes observed after sentences in such studies might not be available to listeners in everyday conversations, meaning that speech recognition in conventional tests might overestimate sentence-processing capability. PMID:27698260
Audiovisual speech perception development at varying levels of perceptual processing

PubMed Central

Lalonde, Kaylah; Holt, Rachael Frush

2016-01-01

This study used the auditory evaluation framework [Erber (1982). Auditory Training (Alexander Graham Bell Association, Washington, DC)] to characterize the influence of visual speech on audiovisual (AV) speech perception in adults and children at multiple levels of perceptual processing. Six- to eight-year-old children and adults completed auditory and AV speech perception tasks at three levels of perceptual processing (detection, discrimination, and recognition). The tasks differed in the level of perceptual processing required to complete them. Adults and children demonstrated visual speech influence at all levels of perceptual processing. Whereas children demonstrated the same visual speech influence at each level of perceptual processing, adults demonstrated greater visual speech influence on tasks requiring higher levels of perceptual processing. These results support previous research demonstrating multiple mechanisms of AV speech processing (general perceptual and speech-specific mechanisms) with independent maturational time courses. The results suggest that adults rely on both general perceptual mechanisms that apply to all levels of perceptual processing and speech-specific mechanisms that apply when making phonetic decisions and/or accessing the lexicon. Six- to eight-year-old children seem to rely only on general perceptual mechanisms across levels. As expected, developmental differences in AV benefit on this and other recognition tasks likely reflect immature speech-specific mechanisms and phonetic processing in children. PMID:27106318
Audiovisual speech perception development at varying levels of perceptual processing.

PubMed

Lalonde, Kaylah; Holt, Rachael Frush

2016-04-01

This study used the auditory evaluation framework [Erber (1982). Auditory Training (Alexander Graham Bell Association, Washington, DC)] to characterize the influence of visual speech on audiovisual (AV) speech perception in adults and children at multiple levels of perceptual processing. Six- to eight-year-old children and adults completed auditory and AV speech perception tasks at three levels of perceptual processing (detection, discrimination, and recognition). The tasks differed in the level of perceptual processing required to complete them. Adults and children demonstrated visual speech influence at all levels of perceptual processing. Whereas children demonstrated the same visual speech influence at each level of perceptual processing, adults demonstrated greater visual speech influence on tasks requiring higher levels of perceptual processing. These results support previous research demonstrating multiple mechanisms of AV speech processing (general perceptual and speech-specific mechanisms) with independent maturational time courses. The results suggest that adults rely on both general perceptual mechanisms that apply to all levels of perceptual processing and speech-specific mechanisms that apply when making phonetic decisions and/or accessing the lexicon. Six- to eight-year-old children seem to rely only on general perceptual mechanisms across levels. As expected, developmental differences in AV benefit on this and other recognition tasks likely reflect immature speech-specific mechanisms and phonetic processing in children.
Measures of digit span and verbal rehearsal speed in deaf children after more than 10 years of cochlear implantation.

PubMed

Pisoni, David B; Kronenberger, William G; Roman, Adrienne S; Geers, Ann E

2011-02-01

Conventional assessments of outcomes in deaf children with cochlear implants (CIs) have focused primarily on endpoint or product measures of speech and language. Little attention has been devoted to understanding the basic underlying core neurocognitive factors involved in the development and processing of speech and language. In this study, we examined the development of factors related to the quality of phonological information in immediate verbal memory, including immediate memory capacity and verbal rehearsal speed, in a sample of deaf children after >10 yrs of CI use and assessed the correlations between these two process measures and a set of speech and language outcomes. Of an initial sample of 180 prelingually deaf children with CIs assessed at ages 8 to 9 yrs after 3 to 7 yrs of CI use, 112 returned for testing again in adolescence after 10 more years of CI experience. In addition to completing a battery of conventional speech and language outcome measures, subjects were administered the Wechsler Intelligence Scale for Children-III Digit Span subtest to measure immediate verbal memory capacity. Sentence durations obtained from the McGarr speech intelligibility test were used as a measure of verbal rehearsal speed. Relative to norms for normal-hearing children, Digit Span scores were well below average for children with CIs at both elementary and high school ages. Improvement was observed over the 8-yr period in the mean longest digit span forward score but not in the mean longest digit span backward score. Longest digit span forward scores at ages 8 to 9 yrs were significantly correlated with all speech and language outcomes in adolescence, but backward digit spans correlated significantly only with measures of higher-order language functioning over that time period. While verbal rehearsal speed increased for almost all subjects between elementary grades and high school, it was still slower than the rehearsal speed obtained from a control group of normal-hearing adolescents. Verbal rehearsal speed at ages 8 to 9 yrs was also found to be strongly correlated with speech and language outcomes and Digit Span scores in adolescence. Despite improvement after 8 additional years of CI use, measures of immediate verbal memory capacity and verbal rehearsal speed, which reflect core fundamental information processing skills associated with representational efficiency and information processing capacity, continue to be delayed in children with CIs relative to NH peers. Furthermore, immediate verbal memory capacity and verbal rehearsal speed at 8 to 9 yrs of age were both found to predict speech and language outcomes in adolescence, demonstrating the important contribution of these processing measures for speech-language development in children with CIs. Understanding the relations between these core underlying processes and speech-language outcomes in children with CIs may help researchers to develop new approaches to intervention and treatment of deaf children who perform poorly with their CIs. Moreover, this knowledge could be used for early identification of deaf children who may be at high risk for poor speech and language outcomes after cochlear implantation as well as for the development of novel targeted interventions that focus selectively on these core elementary information processing variables.
Voice-processing technologies--their application in telecommunications.

PubMed Central

Wilpon, J G

1995-01-01

As the telecommunications industry evolves over the next decade to provide the products and services that people will desire, several key technologies will become commonplace. Two of these, automatic speech recognition and text-to-speech synthesis, will provide users with more freedom on when, where, and how they access information. While these technologies are currently in their infancy, their capabilities are rapidly increasing and their deployment in today's telephone network is expanding. The economic impact of just one application, the automation of operator services, is well over $100 million per year. Yet there still are many technical challenges that must be resolved before these technologies can be deployed ubiquitously in products and services throughout the worldwide telephone network. These challenges include: (i) High level of accuracy. The technology must be perceived by the user as highly accurate, robust, and reliable. (ii) Easy to use. Speech is only one of several possible input/output modalities for conveying information between a human and a machine, much like a computer terminal or Touch-Tone pad on a telephone. It is not the final product. Therefore, speech technologies must be hidden from the user. That is, the burden of using the technology must be on the technology itself. (iii) Quick prototyping and development of new products and services. The technology must support the creation of new products and services based on speech in an efficient and timely fashion. In this paper I present a vision of the voice-processing industry with a focus on the areas with the broadest base of user penetration: speech recognition, text-to-speech synthesis, natural language processing, and speaker recognition technologies. The current and future applications of these technologies in the telecommunications industry will be examined in terms of their strengths, limitations, and the degree to which user needs have been or have yet to be met. Although noteworthy gains have been made in areas with potentially small user bases and in the more mature speech-coding technologies, these subjects are outside the scope of this paper. Images Fig. 1 PMID:7479815
Evolution of crossmodal reorganization of the voice area in cochlear-implanted deaf patients.

PubMed

Rouger, Julien; Lagleyre, Sébastien; Démonet, Jean-François; Fraysse, Bernard; Deguine, Olivier; Barone, Pascal

2012-08-01

Psychophysical and neuroimaging studies in both animal and human subjects have clearly demonstrated that cortical plasticity following sensory deprivation leads to a brain functional reorganization that favors the spared modalities. In postlingually deaf patients, the use of a cochlear implant (CI) allows a recovery of the auditory function, which will probably counteract the cortical crossmodal reorganization induced by hearing loss. To study the dynamics of such reversed crossmodal plasticity, we designed a longitudinal neuroimaging study involving the follow-up of 10 postlingually deaf adult CI users engaged in a visual speechreading task. While speechreading activates Broca's area in normally hearing subjects (NHS), the activity level elicited in this region in CI patients is abnormally low and increases progressively with post-implantation time. Furthermore, speechreading in CI patients induces abnormal crossmodal activations in right anterior regions of the superior temporal cortex normally devoted to processing human voice stimuli (temporal voice-sensitive areas-TVA). These abnormal activity levels diminish with post-implantation time and tend towards the levels observed in NHS. First, our study revealed that the neuroplasticity after cochlear implantation involves not only auditory but also visual and audiovisual speech processing networks. Second, our results suggest that during deafness, the functional links between cortical regions specialized in face and voice processing are reallocated to support speech-related visual processing through cross-modal reorganization. Such reorganization allows a more efficient audiovisual integration of speech after cochlear implantation. These compensatory sensory strategies are later completed by the progressive restoration of the visuo-audio-motor speech processing loop, including Broca's area. Copyright © 2011 Wiley Periodicals, Inc.
Effects of musical expertise on oscillatory brain activity in response to emotional sounds.

PubMed

Nolden, Sophie; Rigoulot, Simon; Jolicoeur, Pierre; Armony, Jorge L

2017-08-01

Emotions can be conveyed through a variety of channels in the auditory domain, be it via music, non-linguistic vocalizations, or speech prosody. Moreover, recent studies suggest that expertise in one sound category can impact the processing of emotional sounds in other sound categories as they found that musicians process more efficiently emotional musical and vocal sounds than non-musicians. However, the neural correlates of these modulations, especially their time course, are not very well understood. Consequently, we focused here on how the neural processing of emotional information varies as a function of sound category and expertise of participants. Electroencephalogram (EEG) of 20 non-musicians and 17 musicians was recorded while they listened to vocal (speech and vocalizations) and musical sounds. The amplitude of EEG-oscillatory activity in the theta, alpha, beta, and gamma band was quantified and Independent Component Analysis (ICA) was used to identify underlying components of brain activity in each band. Category differences were found in theta and alpha bands, due to larger responses to music and speech than to vocalizations, and in posterior beta, mainly due to differential processing of speech. In addition, we observed greater activation in frontal theta and alpha for musicians than for non-musicians, as well as an interaction between expertise and emotional content of sounds in frontal alpha. The results reflect musicians' expertise in recognition of emotion-conveying music, which seems to also generalize to emotional expressions conveyed by the human voice, in line with previous accounts of effects of expertise on musical and vocal sounds processing. Copyright © 2017 Elsevier Ltd. All rights reserved.
Syllable Structure in Dysfunctional Portuguese Children's Speech

ERIC Educational Resources Information Center

Candeias, Sara; Perdigao, Fernando

2010-01-01

The goal of this work is to investigate whether children with speech dysfunctions (SD) show a deficit in planning some Portuguese syllable structures (PSS) in continuous speech production. Knowledge of which aspects of speech production are affected by SD is necessary for efficient improvement in the therapy techniques. The case-study is focused…

Impairment of speech production predicted by lesion load of the left arcuate fasciculus.

PubMed

Marchina, Sarah; Zhu, Lin L; Norton, Andrea; Zipse, Lauryn; Wan, Catherine Y; Schlaug, Gottfried

2011-08-01

Previous studies have suggested that patients' potential for poststroke language recovery is related to lesion size; however, lesion location may also be of importance, particularly when fiber tracts that are critical to the sensorimotor mapping of sounds for articulation (eg, the arcuate fasciculus) have been damaged. In this study, we tested the hypothesis that lesion loads of the arcuate fasciculus (ie, volume of arcuate fasciculus that is affected by a patient's lesion) and of 2 other tracts involved in language processing (the extreme capsule and the uncinate fasciculus) are inversely related to the severity of speech production impairments in patients with stroke with aphasia. Thirty patients with chronic stroke with residual impairments in speech production underwent high-resolution anatomic MRI and a battery of cognitive and language tests. Impairment was assessed using 3 functional measures of spontaneous speech (eg, rate, informativeness, and overall efficiency) as well as naming ability. To quantitatively analyze the relationship between impairment scores and lesion load along the 3 fiber tracts, we calculated tract-lesion overlap volumes for each patient using probabilistic maps of the tracts derived from diffusion tensor images of 10 age-matched healthy subjects. Regression analyses showed that arcuate fasciculus lesion load, but not extreme capsule or uncinate fasciculus lesion load or overall lesion size, significantly predicted rate, informativeness, and overall efficiency of speech as well as naming ability. A new variable, arcuate fasciculus lesion load, complements established voxel-based lesion mapping techniques and, in the future, may potentially be used to estimate impairment and recovery potential after stroke and refine inclusion criteria for experimental rehabilitation programs.
Hidden Markov models in automatic speech recognition

NASA Astrophysics Data System (ADS)

Wrzoskowicz, Adam

1993-11-01

This article describes a method for constructing an automatic speech recognition system based on hidden Markov models (HMMs). The author discusses the basic concepts of HMM theory and the application of these models to the analysis and recognition of speech signals. The author provides algorithms which make it possible to train the ASR system and recognize signals on the basis of distinct stochastic models of selected speech sound classes. The author describes the specific components of the system and the procedures used to model and recognize speech. The author discusses problems associated with the choice of optimal signal detection and parameterization characteristics and their effect on the performance of the system. The author presents different options for the choice of speech signal segments and their consequences for the ASR process. The author gives special attention to the use of lexical, syntactic, and semantic information for the purpose of improving the quality and efficiency of the system. The author also describes an ASR system developed by the Speech Acoustics Laboratory of the IBPT PAS. The author discusses the results of experiments on the effect of noise on the performance of the ASR system and describes methods of constructing HMM's designed to operate in a noisy environment. The author also describes a language for human-robot communications which was defined as a complex multilevel network from an HMM model of speech sounds geared towards Polish inflections. The author also added mandatory lexical and syntactic rules to the system for its communications vocabulary.
Neural dynamics of audiovisual speech integration under variable listening conditions: an individual participant analysis

PubMed Central

Altieri, Nicholas; Wenger, Michael J.

2013-01-01

Speech perception engages both auditory and visual modalities. Limitations of traditional accuracy-only approaches in the investigation of audiovisual speech perception have motivated the use of new methodologies. In an audiovisual speech identification task, we utilized capacity (Townsend and Nozawa, 1995), a dynamic measure of efficiency, to quantify audiovisual integration. Capacity was used to compare RT distributions from audiovisual trials to RT distributions from auditory-only and visual-only trials across three listening conditions: clear auditory signal, S/N ratio of −12 dB, and S/N ratio of −18 dB. The purpose was to obtain EEG recordings in conjunction with capacity to investigate how a late ERP co-varies with integration efficiency. Results showed efficient audiovisual integration for low auditory S/N ratios, but inefficient audiovisual integration when the auditory signal was clear. The ERP analyses showed evidence for greater audiovisual amplitude compared to the unisensory signals for lower auditory S/N ratios (higher capacity/efficiency) compared to the high S/N ratio (low capacity/inefficient integration). The data are consistent with an interactive framework of integration, where auditory recognition is influenced by speech-reading as a function of signal clarity. PMID:24058358
Neural dynamics of audiovisual speech integration under variable listening conditions: an individual participant analysis.

PubMed

Altieri, Nicholas; Wenger, Michael J

2013-01-01

Speech perception engages both auditory and visual modalities. Limitations of traditional accuracy-only approaches in the investigation of audiovisual speech perception have motivated the use of new methodologies. In an audiovisual speech identification task, we utilized capacity (Townsend and Nozawa, 1995), a dynamic measure of efficiency, to quantify audiovisual integration. Capacity was used to compare RT distributions from audiovisual trials to RT distributions from auditory-only and visual-only trials across three listening conditions: clear auditory signal, S/N ratio of -12 dB, and S/N ratio of -18 dB. The purpose was to obtain EEG recordings in conjunction with capacity to investigate how a late ERP co-varies with integration efficiency. Results showed efficient audiovisual integration for low auditory S/N ratios, but inefficient audiovisual integration when the auditory signal was clear. The ERP analyses showed evidence for greater audiovisual amplitude compared to the unisensory signals for lower auditory S/N ratios (higher capacity/efficiency) compared to the high S/N ratio (low capacity/inefficient integration). The data are consistent with an interactive framework of integration, where auditory recognition is influenced by speech-reading as a function of signal clarity.
Monitoring Progress in Vocal Development in Young Cochlear Implant Recipients: Relationships between Speech Samples and Scores from the Conditioned Assessment of Speech Production (CASP)

PubMed Central

Ertmer, David J.; Jung, Jongmin

2012-01-01

Background Evidence of auditory-guided speech development can be heard as the prelinguistic vocalizations of young cochlear implant recipients become increasingly complex, phonetically diverse, and speech-like. In research settings, these changes are most often documented by collecting and analyzing speech samples. Sampling, however, may be too time-consuming and impractical for widespread use in clinical settings. The Conditioned Assessment of Speech Production (CASP; Ertmer & Stoel-Gammon, 2008) is an easily administered and time-efficient alternative to speech sample analysis. The current investigation examined the concurrent validity of the CASP and data obtained from speech samples recorded at the same intervals. Methods Nineteen deaf children who received CIs before their third birthdays participated in the study. Speech samples and CASP scores were gathered at 6, 12, 18, and 24 months post-activation. Correlation analyses were conducted to assess the concurrent validity of CASP scores and data from samples. Results CASP scores showed strong concurrent validity with scores from speech samples gathered across all recording sessions (6 – 24 months). Conclusions The CASP was found to be a valid, reliable, and time-efficient tool for assessing progress in vocal development during young CI recipient’s first 2 years of device experience. PMID:22628109
Electrophysiological and hemodynamic mismatch responses in rats listening to human speech syllables.

PubMed

Mahmoudzadeh, Mahdi; Dehaene-Lambertz, Ghislaine; Wallois, Fabrice

2017-01-01

Speech is a complex auditory stimulus which is processed according to several time-scales. Whereas consonant discrimination is required to resolve rapid acoustic events, voice perception relies on slower cues. Humans, right from preterm ages, are particularly efficient to encode temporal cues. To compare the capacities of preterms to those observed in other mammals, we tested anesthetized adult rats by using exactly the same paradigm as that used in preterm neonates. We simultaneously recorded neural (using ECoG) and hemodynamic responses (using fNIRS) to series of human speech syllables and investigated the brain response to a change of consonant (ba vs. ga) and to a change of voice (male vs. female). Both methods revealed concordant results, although ECoG measures were more sensitive than fNIRS. Responses to syllables were bilateral, but with marked right-hemispheric lateralization. Responses to voice changes were observed with both methods, while only ECoG was sensitive to consonant changes. These results suggest that rats more effectively processed the speech envelope than fine temporal cues in contrast with human preterm neonates, in whom the opposite effects were observed. Cross-species comparisons constitute a very valuable tool to define the singularities of the human brain and species-specific bias that may help human infants to learn their native language.
Financial and workflow analysis of radiology reporting processes in the planning phase of implementation of a speech recognition system

NASA Astrophysics Data System (ADS)

Whang, Tom; Ratib, Osman M.; Umamoto, Kathleen; Grant, Edward G.; McCoy, Michael J.

2002-05-01

The goal of this study is to determine the financial value and workflow improvements achievable by replacing traditional transcription services with a speech recognition system in a large, university hospital setting. Workflow metrics were measured at two hospitals, one of which exclusively uses a transcription service (UCLA Medical Center), and the other which exclusively uses speech recognition (West Los Angeles VA Hospital). Workflow metrics include time spent per report (the sum of time spent interpreting, dictating, reviewing, and editing), transcription turnaround, and total report turnaround. Compared to traditional transcription, speech recognition resulted in radiologists spending 13-32% more time per report, but it also resulted in reduction of report turnaround time by 22-62% and reduction of marginal cost per report by 94%. The model developed here helps justify the introduction of a speech recognition system by showing that the benefits of reduced operating costs and decreased turnaround time outweigh the cost of increased time spent per report. Whether the ultimate goal is to achieve a financial objective or to improve operational efficiency, it is important to conduct a thorough analysis of workflow before implementation.
Standardization of pitch-range settings in voice acoustic analysis.

PubMed

Vogel, Adam P; Maruff, Paul; Snyder, Peter J; Mundt, James C

2009-05-01

Voice acoustic analysis is typically a labor-intensive, time-consuming process that requires the application of idiosyncratic parameters tailored to individual aspects of the speech signal. Such processes limit the efficiency and utility of voice analysis in clinical practice as well as in applied research and development. In the present study, we analyzed 1,120 voice files, using standard techniques (case-by-case hand analysis), taking roughly 10 work weeks of personnel time to complete. The results were compared with the analytic output of several automated analysis scripts that made use of preset pitch-range parameters. After pitch windows were selected to appropriately account for sex differences, the automated analysis scripts reduced processing time of the 1,120 speech samples to less than 2.5 h and produced results comparable to those obtained with hand analysis. However, caution should be exercised when applying the suggested preset values to pathological voice populations.
Effects of concurrent cognitive processing on the fluency of word repetition: comparison between persons who do and do not stutter.

PubMed

Bosshardt, Hans-Georg

2002-01-01

This study investigated how silent reading and word memorization may affect the fluency of concurrently repeated words. The words silently read or memorized were phonologically similar or dissimilar to the words of the repetition task. Fourteen adults who stutter and 16 who do not participated in the experiment. The two groups were matched for age, education, sex, forward and backward memory span and vocabulary. It was found that the disfluencies of persons who stutter significantly increased during word repetition when similar words were read or memorized concurrently. In contrast, the disfluencies of persons who do not stutter were not significantly affected by either secondary task. These results indicate that the speech of persons who stutter is more sensitive to interference from concurrently performed cognitive processing than that of nonstuttering persons. It is proposed that the phonological and articulatory systems of persons who stutter are protected less efficiently from interference by attention-demanding processing within the central executive system. Alternative interpretations are also discussed. Readers will learn how modern speech production theories and the concept of modularity can account for stuttering, and will be able to explain the greater vulnerability of stutterer's speech fluency to concurrent cognitive processing.
Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion.

PubMed

Gebru, Israel D; Ba, Sileye; Li, Xiaofei; Horaud, Radu

2018-05-01

Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semi-supervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes in a principled way speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset, that contains audio-visual training data as well as a number of scenarios involving several participants engaged in formal and informal dialogue, is introduced. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the art diarization algorithms.
Speech Research: A Report on the Status and Progress of Studies on the Nature of Speech , Instrumentation for Its Investigation, and Practical Applications, 1 October-31 December 1971.

ERIC Educational Resources Information Center

Turney, Michael T.; And Others

This report on speech research contains papers describing experiments involving both information processing and speech production. The papers concerned with information processing cover such topics as peripheral and central processes in vision, separate speech and nonspeech processing in dichotic listening, and dichotic fusion along an acoustic…
Neurophysiological Influence of Musical Training on Speech Perception

PubMed Central

Shahin, Antoine J.

2011-01-01

Does musical training affect our perception of speech? For example, does learning to play a musical instrument modify the neural circuitry for auditory processing in a way that improves one's ability to perceive speech more clearly in noisy environments? If so, can speech perception in individuals with hearing loss (HL), who struggle in noisy situations, benefit from musical training? While music and speech exhibit some specialization in neural processing, there is evidence suggesting that skills acquired through musical training for specific acoustical processes may transfer to, and thereby improve, speech perception. The neurophysiological mechanisms underlying the influence of musical training on speech processing and the extent of this influence remains a rich area to be explored. A prerequisite for such transfer is the facilitation of greater neurophysiological overlap between speech and music processing following musical training. This review first establishes a neurophysiological link between musical training and speech perception, and subsequently provides further hypotheses on the neurophysiological implications of musical training on speech perception in adverse acoustical environments and in individuals with HL. PMID:21716639
Neurophysiological influence of musical training on speech perception.

PubMed

Shahin, Antoine J

2011-01-01

Does musical training affect our perception of speech? For example, does learning to play a musical instrument modify the neural circuitry for auditory processing in a way that improves one's ability to perceive speech more clearly in noisy environments? If so, can speech perception in individuals with hearing loss (HL), who struggle in noisy situations, benefit from musical training? While music and speech exhibit some specialization in neural processing, there is evidence suggesting that skills acquired through musical training for specific acoustical processes may transfer to, and thereby improve, speech perception. The neurophysiological mechanisms underlying the influence of musical training on speech processing and the extent of this influence remains a rich area to be explored. A prerequisite for such transfer is the facilitation of greater neurophysiological overlap between speech and music processing following musical training. This review first establishes a neurophysiological link between musical training and speech perception, and subsequently provides further hypotheses on the neurophysiological implications of musical training on speech perception in adverse acoustical environments and in individuals with HL.
Speech recognition-based and automaticity programs to help students with severe reading and spelling problems.

PubMed

Higgins, Eleanor L; Raskind, Marshall H

2004-12-01

This study was conducted to assess the effectiveness of two programs developed by the Frostig Center Research Department to improve the reading and spelling of students with learning disabilities (LD): a computer Speech Recognition-based Program (SRBP) and a computer and text-based Automaticity Program (AP). Twenty-eight LD students with reading and spelling difficulties (aged 8 to 18) received each program for 17 weeks and were compared with 16 students in a contrast group who did not receive either program. After adjusting for age and IQ, both the SRBP and AP groups showed significant differences over the contrast group in improving word recognition and reading comprehension. Neither program showed significant differences over contrasts in spelling. The SRBP also improved the performance of the target group when compared with the contrast group on phonological elision and nonword reading efficiency tasks. The AP showed significant differences in all process and reading efficiency measures.
Speech Intelligibility Predicted from Neural Entrainment of the Speech Envelope.

PubMed

Vanthornhout, Jonas; Decruy, Lien; Wouters, Jan; Simon, Jonathan Z; Francart, Tom

2018-04-01

Speech intelligibility is currently measured by scoring how well a person can identify a speech signal. The results of such behavioral measures reflect neural processing of the speech signal, but are also influenced by language processing, motivation, and memory. Very often, electrophysiological measures of hearing give insight in the neural processing of sound. However, in most methods, non-speech stimuli are used, making it hard to relate the results to behavioral measures of speech intelligibility. The use of natural running speech as a stimulus in electrophysiological measures of hearing is a paradigm shift which allows to bridge the gap between behavioral and electrophysiological measures. Here, by decoding the speech envelope from the electroencephalogram, and correlating it with the stimulus envelope, we demonstrate an electrophysiological measure of neural processing of running speech. We show that behaviorally measured speech intelligibility is strongly correlated with our electrophysiological measure. Our results pave the way towards an objective and automatic way of assessing neural processing of speech presented through auditory prostheses, reducing confounds such as attention and cognitive capabilities. We anticipate that our electrophysiological measure will allow better differential diagnosis of the auditory system, and will allow the development of closed-loop auditory prostheses that automatically adapt to individual users.
Phonological processes in the speech of school-age children with hearing loss: Comparisons with children with normal hearing.

PubMed

Asad, Areej Nimer; Purdy, Suzanne C; Ballard, Elaine; Fairgray, Liz; Bowen, Caroline

2018-04-27

In this descriptive study, phonological processes were examined in the speech of children aged 5;0-7;6 (years; months) with mild to profound hearing loss using hearing aids (HAs) and cochlear implants (CIs), in comparison to their peers. A second aim was to compare phonological processes of HA and CI users. Children with hearing loss (CWHL, N = 25) were compared to children with normal hearing (CWNH, N = 30) with similar age, gender, linguistic, and socioeconomic backgrounds. Speech samples obtained from a list of 88 words, derived from three standardized speech tests, were analyzed using the CASALA (Computer Aided Speech and Language Analysis) program to evaluate participants' phonological systems, based on lax (a process appeared at least twice in the speech of at least two children) and strict (a process appeared at least five times in the speech of at least two children) counting criteria. Developmental phonological processes were eliminated in the speech of younger and older CWNH while eleven developmental phonological processes persisted in the speech of both age groups of CWHL. CWHL showed a similar trend of age of elimination to CWNH, but at a slower rate. Children with HAs and CIs produced similar phonological processes. Final consonant deletion, weak syllable deletion, backing, and glottal replacement were present in the speech of HA users, affecting their overall speech intelligibility. Developmental and non-developmental phonological processes persist in the speech of children with mild to profound hearing loss compared to their peers with typical hearing. The findings indicate that it is important for clinicians to consider phonological assessment in pre-school CWHL and the use of evidence-based speech therapy in order to reduce non-developmental and non-age-appropriate developmental processes, thereby enhancing their speech intelligibility. Copyright © 2018 Elsevier Inc. All rights reserved.
An acoustic feature-based similarity scoring system for speech rehabilitation assistance.

PubMed

Syauqy, Dahnial; Wu, Chao-Min; Setyawati, Onny

2016-08-01

The purpose of this study is to develop a tool to assist speech therapy and rehabilitation, which focused on automatic scoring based on the comparison of the patient's speech with another normal speech on several aspects including pitch, vowel, voiced-unvoiced segments, strident fricative and sound intensity. The pitch estimation employed the use of cepstrum-based algorithm for its robustness; the vowel classification used multilayer perceptron (MLP) to classify vowel from pitch and formants; and the strident fricative detection was based on the major peak spectral intensity, location and the pitch existence in the segment. In order to evaluate the performance of the system, this study analyzed eight patient's speech recordings (four males, four females; 4-58-years-old), which had been recorded in previous study in cooperation with Taipei Veterans General Hospital and Taoyuan General Hospital. The experiment result on pitch algorithm showed that the cepstrum method had 5.3% of gross pitch error from a total of 2086 frames. On the vowel classification algorithm, MLP method provided 93% accuracy (men), 87% (women) and 84% (children). In total, the overall results showed that 156 tool's grading results (81%) were consistent compared to 192 audio and visual observations done by four experienced respondents. Implication for Rehabilitation Difficulties in communication may limit the ability of a person to transfer and exchange information. The fact that speech is one of the primary means of communication has encouraged the needs of speech diagnosis and rehabilitation. The advances of technology in computer-assisted speech therapy (CAST) improve the quality, time efficiency of the diagnosis and treatment of the disorders. The present study attempted to develop tool to assist speech therapy and rehabilitation, which provided simple interface to let the assessment be done even by the patient himself without the need of particular knowledge of speech processing while at the same time, also provided further deep analysis of the speech, which can be useful for the speech therapist.
A Diagnostic Marker to Discriminate Childhood Apraxia of Speech From Speech Delay: III. Theoretical Coherence of the Pause Marker with Speech Processing Deficits in Childhood Apraxia of Speech.

PubMed

Shriberg, Lawrence D; Strand, Edythe A; Fourakis, Marios; Jakielski, Kathy J; Hall, Sheryl D; Karlsson, Heather B; Mabie, Heather L; McSweeny, Jane L; Tilkens, Christie M; Wilson, David L

2017-04-14

Previous articles in this supplement described rationale for and development of the pause marker (PM), a diagnostic marker of childhood apraxia of speech (CAS), and studies supporting its validity and reliability. The present article assesses the theoretical coherence of the PM with speech processing deficits in CAS. PM and other scores were obtained for 264 participants in 6 groups: CAS in idiopathic, neurogenetic, and complex neurodevelopmental disorders; adult-onset apraxia of speech (AAS) consequent to stroke and primary progressive apraxia of speech; and idiopathic speech delay. Participants with CAS and AAS had significantly lower scores than typically speaking reference participants and speech delay controls on measures posited to assess representational and transcoding processes. Representational deficits differed between CAS and AAS groups, with support for both underspecified linguistic representations and memory/access deficits in CAS, but for only the latter in AAS. CAS-AAS similarities in the age-sex standardized percentages of occurrence of the most frequent type of inappropriate pauses (abrupt) and significant differences in the standardized occurrence of appropriate pauses were consistent with speech processing findings. Results support the hypotheses of core representational and transcoding speech processing deficits in CAS and theoretical coherence of the PM's pause-speech elements with these deficits.
Integrating speech in time depends on temporal expectancies and attention.

PubMed

Scharinger, Mathias; Steinberg, Johanna; Tavano, Alessandro

2017-08-01

Sensory information that unfolds in time, such as in speech perception, relies on efficient chunking mechanisms in order to yield optimally-sized units for further processing. Whether or not two successive acoustic events receive a one-unit or a two-unit interpretation seems to depend on the fit between their temporal extent and a stipulated temporal window of integration. However, there is ongoing debate on how flexible this temporal window of integration should be, especially for the processing of speech sounds. Furthermore, there is no direct evidence of whether attention may modulate the temporal constraints on the integration window. For this reason, we here examine how different word durations, which lead to different temporal separations of sound onsets, interact with attention. In an Electroencephalography (EEG) study, participants actively and passively listened to words where word-final consonants were occasionally omitted. Words had either a natural duration or were artificially prolonged in order to increase the separation of speech sound onsets. Omission responses to incomplete speech input, originating in left temporal cortex, decreased when the critical speech sound was separated from previous sounds by more than 250 msec, i.e., when the separation was larger than the stipulated temporal window of integration (125-150 msec). Attention, on the other hand, only increased omission responses for stimuli with natural durations. We complemented the event-related potential (ERP) analyses by a frequency-domain analysis on the stimulus presentation rate. Notably, the power of stimulation frequency showed the same duration and attention effects than the omission responses. We interpret these findings on the background of existing research on temporal integration windows and further suggest that our findings may be accounted for within the framework of predictive coding. Copyright © 2017 Elsevier Ltd. All rights reserved.
The Downside of Greater Lexical Influences: Selectively Poorer Speech Perception in Noise

PubMed Central

Xie, Zilong; Tessmer, Rachel; Chandrasekaran, Bharath

2017-01-01

Purpose Although lexical information influences phoneme perception, the extent to which reliance on lexical information enhances speech processing in challenging listening environments is unclear. We examined the extent to which individual differences in lexical influences on phonemic processing impact speech processing in maskers containing varying degrees of linguistic information (2-talker babble or pink noise). Method Twenty-nine monolingual English speakers were instructed to ignore the lexical status of spoken syllables (e.g., gift vs. kift) and to only categorize the initial phonemes (/g/ vs. /k/). The same participants then performed speech recognition tasks in the presence of 2-talker babble or pink noise in audio-only and audiovisual conditions. Results Individuals who demonstrated greater lexical influences on phonemic processing experienced greater speech processing difficulties in 2-talker babble than in pink noise. These selective difficulties were present across audio-only and audiovisual conditions. Conclusion Individuals with greater reliance on lexical processes during speech perception exhibit impaired speech recognition in listening conditions in which competing talkers introduce audible linguistic interferences. Future studies should examine the locus of lexical influences/interferences on phonemic processing and speech-in-speech processing. PMID:28586824

A causal test of the motor theory of speech perception: A case of impaired speech production and spared speech perception

PubMed Central

Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E.; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z.

2015-01-01

In the last decade, the debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. However, the exact role of the motor system in auditory speech processing remains elusive. Here we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. The patient’s spontaneous speech was marked by frequent phonological/articulatory errors, and those errors were caused, at least in part, by motor-level impairments with speech production. We found that the patient showed a normal phonemic categorical boundary when discriminating two nonwords that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the nonword stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labeling impairment. These data suggest that the identification (i.e. labeling) of nonword speech sounds may involve the speech motor system, but that the perception of speech sounds (i.e., discrimination) does not require the motor system. This means that motor processes are not causally involved in perception of the speech signal, and suggest that the motor system may be used when other cues (e.g., meaning, context) are not available. PMID:25951749
Ageing without hearing loss or cognitive impairment causes a decrease in speech intelligibility only in informational maskers.

PubMed

Rajan, R; Cainer, K E

2008-06-23

In most everyday settings, speech is heard in the presence of competing sounds and understanding speech requires skills in auditory streaming and segregation, followed by identification and recognition, of the attended signals. Ageing leads to difficulties in understanding speech in noisy backgrounds. In addition to age-related changes in hearing-related factors, cognitive factors also play a role but it is unclear to what extent these are generalized or modality-specific cognitive factors. We examined how ageing in normal-hearing decade age cohorts from 20 to 69 years affected discrimination of open-set speech in background noise. We used two types of sentences of similar structural and linguistic characteristics but different masking levels (i.e. differences in signal-to-noise ratios required for detection of sentences in a standard masker) so as to vary sentence demand, and two background maskers (one causing purely energetic masking effects and the other causing energetic and informational masking) to vary load conditions. There was a decline in performance (measured as speech reception thresholds for perception of sentences in noise) in the oldest cohort for both types of sentences, but only in the presence of the more demanding informational masker. We interpret these results to indicate a modality-specific decline in cognitive processing, likely a decrease in the ability to use acoustic and phonetic cues efficiently to segregate speech from background noise, in subjects aged >60.
Normative Topographic ERP Analyses of Speed of Speech Processing and Grammar Before and After Grammatical Treatment

PubMed Central

Yoder, Paul J.; Molfese, Dennis; Murray, Micah M.; Key, Alexandra P. F.

2013-01-01

Typically developing (TD) preschoolers and age-matched preschoolers with specific language impairment (SLI) received event-related potentials (ERPs) to four monosyllabic speech sounds prior to treatment and, in the SLI group, after 6 months of grammatical treatment. Before treatment, the TD group processed speech sounds faster than the SLI group. The SLI group increased the speed of their speech processing after treatment. Post-treatment speed of speech processing predicted later impairment in comprehending phrase elaboration in the SLI group. During the treatment phase, change in speed of speech processing predicted growth rate of grammar in the SLI group. PMID:24219693
The development of sentence interpretation: effects of perceptual, attentional and semantic interference.

PubMed

Leech, Robert; Aydelott, Jennifer; Symons, Germaine; Carnevale, Julia; Dick, Frederic

2007-11-01

How does the development and consolidation of perceptual, attentional, and higher cognitive abilities interact with language acquisition and processing? We explored children's (ages 5-17) and adults' (ages 18-51) comprehension of morphosyntactically varied sentences under several competing speech conditions that varied in the degree of attentional demands, auditory masking, and semantic interference. We also evaluated the relationship between subjects' syntactic comprehension and their word reading efficiency and general 'speed of processing'. We found that the interactions between perceptual and attentional processes and complex sentence interpretation changed considerably over the course of development. Perceptual masking of the speech signal had an early and lasting impact on comprehension, particularly for more complex sentence structures. In contrast, increased attentional demand in the absence of energetic auditory masking primarily affected younger children's comprehension of difficult sentence types. Finally, the predictability of syntactic comprehension abilities by external measures of development and expertise is contingent upon the perceptual, attentional, and semantic milieu in which language processing takes place.
Multiple Stakeholder Perspectives on Teletherapy Delivery of Speech Pathology Services in Rural Schools: A Preliminary, Qualitative Investigation

PubMed Central

LINCOLN, MICHELLE; HINES, MONIQUE; FAIRWEATHER, CRAIG; RAMSDEN, ROBYN; MARTINOVICH, JULIA

2015-01-01

The objective of this study was to investigate stakeholders’ views on the feasibility and acceptability of a pilot speech pathology teletherapy program for children attending schools in rural New South Wales, Australia. Nine children received speech pathology sessions delivered via Adobe Connect® web-conferencing software. During semi-structured interviews, school principals (n = 3), therapy facilitators (n = 7), and parents (n = 6) described factors that promoted or threatened the program’s feasibility and acceptability. Themes were categorized according to whether they related to (a) the use of technology; (b) the school-based nature of the program; or (c) the combination of using technology with a school-based program. Despite frequent reports of difficulties with technology, teletherapy delivery of speech pathology services in schools was highly acceptable to stakeholders. However, the use of technology within a school environment increased the complexities of service delivery. Service providers should pay careful attention to planning processes and lines of communication in order to promote efficiency and acceptability of teletherapy programs. PMID:25945230
Two-microphone spatial filtering provides speech reception benefits for cochlear implant users in difficult acoustic environments

PubMed Central

Goldsworthy, Raymond L.; Delhorne, Lorraine A.; Desloge, Joseph G.; Braida, Louis D.

2014-01-01

This article introduces and provides an assessment of a spatial-filtering algorithm based on two closely-spaced (∼1 cm) microphones in a behind-the-ear shell. The evaluated spatial-filtering algorithm used fast (∼10 ms) temporal-spectral analysis to determine the location of incoming sounds and to enhance sounds arriving from straight ahead of the listener. Speech reception thresholds (SRTs) were measured for eight cochlear implant (CI) users using consonant and vowel materials under three processing conditions: An omni-directional response, a dipole-directional response, and the spatial-filtering algorithm. The background noise condition used three simultaneous time-reversed speech signals as interferers located at 90°, 180°, and 270°. Results indicated that the spatial-filtering algorithm can provide speech reception benefits of 5.8 to 10.7 dB SRT compared to an omni-directional response in a reverberant room with multiple noise sources. Given the observed SRT benefits, coupled with an efficient design, the proposed algorithm is promising as a CI noise-reduction solution. PMID:25096120
Reaction times of normal listeners to laryngeal, alaryngeal, and synthetic speech.

PubMed

Evitts, Paul M; Searl, Jeff

2006-12-01

The purpose of this study was to compare listener processing demands when decoding alaryngeal compared to laryngeal speech. Fifty-six listeners were presented with single words produced by 1 proficient speaker from 5 different modes of speech: normal, tracheosophageal (TE), esophageal (ES), electrolaryngeal (EL), and synthetic speech (SS). Cognitive processing load was indexed by listener reaction time (RT). To account for significant durational differences among the modes of speech, an RT ratio was calculated (stimulus duration divided by RT). Results indicated that the cognitive processing load was greater for ES and EL relative to normal speech. TE and normal speech did not differ in terms of RT ratio, suggesting fairly comparable cognitive demands placed on the listener. SS required greater cognitive processing load than normal and alaryngeal speech. The results are discussed relative to alaryngeal speech intelligibility and the role of the listener. Potential clinical applications and directions for future research are also presented.
Sensory-motor relationships in speech production in post-lingually deaf cochlear-implanted adults and normal-hearing seniors: Evidence from phonetic convergence and speech imitation.

PubMed

Scarbel, Lucie; Beautemps, Denis; Schwartz, Jean-Luc; Sato, Marc

2017-07-01

Speech communication can be viewed as an interactive process involving a functional coupling between sensory and motor systems. One striking example comes from phonetic convergence, when speakers automatically tend to mimic their interlocutor's speech during communicative interaction. The goal of this study was to investigate sensory-motor linkage in speech production in postlingually deaf cochlear implanted participants and normal hearing elderly adults through phonetic convergence and imitation. To this aim, two vowel production tasks, with or without instruction to imitate an acoustic vowel, were proposed to three groups of young adults with normal hearing, elderly adults with normal hearing and post-lingually deaf cochlear-implanted patients. Measure of the deviation of each participant's f 0 from their own mean f 0 was measured to evaluate the ability to converge to each acoustic target. showed that cochlear-implanted participants have the ability to converge to an acoustic target, both intentionally and unintentionally, albeit with a lower degree than young and elderly participants with normal hearing. By providing evidence for phonetic convergence and speech imitation, these results suggest that, as in young adults, perceptuo-motor relationships are efficient in elderly adults with normal hearing and that cochlear-implanted adults recovered significant perceptuo-motor abilities following cochlear implantation. Copyright © 2017 Elsevier Ltd. All rights reserved.
A Diagnostic Marker to Discriminate Childhood Apraxia of Speech From Speech Delay: III. Theoretical Coherence of the Pause Marker with Speech Processing Deficits in Childhood Apraxia of Speech

PubMed Central

Strand, Edythe A.; Fourakis, Marios; Jakielski, Kathy J.; Hall, Sheryl D.; Karlsson, Heather B.; Mabie, Heather L.; McSweeny, Jane L.; Tilkens, Christie M.; Wilson, David L.

2017-01-01

Purpose Previous articles in this supplement described rationale for and development of the pause marker (PM), a diagnostic marker of childhood apraxia of speech (CAS), and studies supporting its validity and reliability. The present article assesses the theoretical coherence of the PM with speech processing deficits in CAS. Method PM and other scores were obtained for 264 participants in 6 groups: CAS in idiopathic, neurogenetic, and complex neurodevelopmental disorders; adult-onset apraxia of speech (AAS) consequent to stroke and primary progressive apraxia of speech; and idiopathic speech delay. Results Participants with CAS and AAS had significantly lower scores than typically speaking reference participants and speech delay controls on measures posited to assess representational and transcoding processes. Representational deficits differed between CAS and AAS groups, with support for both underspecified linguistic representations and memory/access deficits in CAS, but for only the latter in AAS. CAS–AAS similarities in the age–sex standardized percentages of occurrence of the most frequent type of inappropriate pauses (abrupt) and significant differences in the standardized occurrence of appropriate pauses were consistent with speech processing findings. Conclusions Results support the hypotheses of core representational and transcoding speech processing deficits in CAS and theoretical coherence of the PM's pause-speech elements with these deficits. PMID:28384751
Linguistic Profiles of Children with CI as Compared with Children with Hearing or Specific Language Impairment

ERIC Educational Resources Information Center

Hoog, Brigitte E.; Langereis, Margreet C.; Weerdenburg, Marjolijn; Knoors, Harry E. T.; Verhoeven, Ludo

2016-01-01

Background: The spoken language difficulties of children with moderate or severe to profound hearing loss are mainly related to limited auditory speech perception. However, degraded or filtered auditory input as evidenced in children with cochlear implants (CIs) may result in less efficient or slower language processing as well. To provide insight…
The Cortical Organization of Speech Processing: Feedback Control and Predictive Coding the Context of a Dual-Stream Model

ERIC Educational Resources Information Center

Hickok, Gregory

2012-01-01

Speech recognition is an active process that involves some form of predictive coding. This statement is relatively uncontroversial. What is less clear is the source of the prediction. The dual-stream model of speech processing suggests that there are two possible sources of predictive coding in speech perception: the motor speech system and the…
Contribution of auditory working memory to speech understanding in mandarin-speaking cochlear implant users.

PubMed

Tao, Duoduo; Deng, Rui; Jiang, Ye; Galvin, John J; Fu, Qian-Jie; Chen, Bing

2014-01-01

To investigate how auditory working memory relates to speech perception performance by Mandarin-speaking cochlear implant (CI) users. Auditory working memory and speech perception was measured in Mandarin-speaking CI and normal-hearing (NH) participants. Working memory capacity was measured using forward digit span and backward digit span; working memory efficiency was measured using articulation rate. Speech perception was assessed with: (a) word-in-sentence recognition in quiet, (b) word-in-sentence recognition in speech-shaped steady noise at +5 dB signal-to-noise ratio, (c) Chinese disyllable recognition in quiet, (d) Chinese lexical tone recognition in quiet. Self-reported school rank was also collected regarding performance in schoolwork. There was large inter-subject variability in auditory working memory and speech performance for CI participants. Working memory and speech performance were significantly poorer for CI than for NH participants. All three working memory measures were strongly correlated with each other for both CI and NH participants. Partial correlation analyses were performed on the CI data while controlling for demographic variables. Working memory efficiency was significantly correlated only with sentence recognition in quiet when working memory capacity was partialled out. Working memory capacity was correlated with disyllable recognition and school rank when efficiency was partialled out. There was no correlation between working memory and lexical tone recognition in the present CI participants. Mandarin-speaking CI users experience significant deficits in auditory working memory and speech performance compared with NH listeners. The present data suggest that auditory working memory may contribute to CI users' difficulties in speech understanding. The present pattern of results with Mandarin-speaking CI users is consistent with previous auditory working memory studies with English-speaking CI users, suggesting that the lexical importance of voice pitch cues (albeit poorly coded by the CI) did not influence the relationship between working memory and speech perception.
The Timing and Effort of Lexical Access in Natural and Degraded Speech

PubMed Central

Wagner, Anita E.; Toffanin, Paolo; Başkent, Deniz

2016-01-01

Understanding speech is effortless in ideal situations, and although adverse conditions, such as caused by hearing impairment, often render it an effortful task, they do not necessarily suspend speech comprehension. A prime example of this is speech perception by cochlear implant users, whose hearing prostheses transmit speech as a significantly degraded signal. It is yet unknown how mechanisms of speech processing deal with such degraded signals, and whether they are affected by effortful processing of speech. This paper compares the automatic process of lexical competition between natural and degraded speech, and combines gaze fixations, which capture the course of lexical disambiguation, with pupillometry, which quantifies the mental effort involved in processing speech. Listeners’ ocular responses were recorded during disambiguation of lexical embeddings with matching and mismatching durational cues. Durational cues were selected due to their substantial role in listeners’ quick limitation of the number of lexical candidates for lexical access in natural speech. Results showed that lexical competition increased mental effort in processing natural stimuli in particular in presence of mismatching cues. Signal degradation reduced listeners’ ability to quickly integrate durational cues in lexical selection, and delayed and prolonged lexical competition. The effort of processing degraded speech was increased overall, and because it had its sources at the pre-lexical level this effect can be attributed to listening to degraded speech rather than to lexical disambiguation. In sum, the course of lexical competition was largely comparable for natural and degraded speech, but showed crucial shifts in timing, and different sources of increased mental effort. We argue that well-timed progress of information from sensory to pre-lexical and lexical stages of processing, which is the result of perceptual adaptation during speech development, is the reason why in ideal situations speech is perceived as an undemanding task. Degradation of the signal or the receiver channel can quickly bring this well-adjusted timing out of balance and lead to increase in mental effort. Incomplete and effortful processing at the early pre-lexical stages has its consequences on lexical processing as it adds uncertainty to the forming and revising of lexical hypotheses. PMID:27065901
Speech systems research at Texas Instruments

NASA Technical Reports Server (NTRS)

Doddington, George R.

1977-01-01

An assessment of automatic speech processing technology is presented. Fundamental problems in the development and the deployment of automatic speech processing systems are defined and a technology forecast for speech systems is presented.
Did you or I say pretty, rude or brief? An ERP study of the effects of speaker's identity on emotional word processing.

PubMed

Pinheiro, Ana P; Rezaii, Neguine; Nestor, Paul G; Rauber, Andréia; Spencer, Kevin M; Niznikiewicz, Margaret

2016-02-01

During speech comprehension, multiple cues need to be integrated at a millisecond speed, including semantic information, as well as voice identity and affect cues. A processing advantage has been demonstrated for self-related stimuli when compared with non-self stimuli, and for emotional relative to neutral stimuli. However, very few studies investigated self-other speech discrimination and, in particular, how emotional valence and voice identity interactively modulate speech processing. In the present study we probed how the processing of words' semantic valence is modulated by speaker's identity (self vs. non-self voice). Sixteen healthy subjects listened to 420 prerecorded adjectives differing in voice identity (self vs. non-self) and semantic valence (neutral, positive and negative), while electroencephalographic data were recorded. Participants were instructed to decide whether the speech they heard was their own (self-speech condition), someone else's (non-self speech), or if they were unsure. The ERP results demonstrated interactive effects of speaker's identity and emotional valence on both early (N1, P2) and late (Late Positive Potential - LPP) processing stages: compared with non-self speech, self-speech with neutral valence elicited more negative N1 amplitude, self-speech with positive valence elicited more positive P2 amplitude, and self-speech with both positive and negative valence elicited more positive LPP. ERP differences between self and non-self speech occurred in spite of similar accuracy in the recognition of both types of stimuli. Together, these findings suggest that emotion and speaker's identity interact during speech processing, in line with observations of partially dependent processing of speech and speaker information. Copyright © 2016. Published by Elsevier Inc.
Speech perception as an active cognitive process

PubMed Central

Heald, Shannon L. M.; Nusbaum, Howard C.

2014-01-01

One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming realtively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing by masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine cognitive resources recruited during perception including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augementation or therapy. PMID:24672438
A Proposed Process for Managing the First Amendment Aspects of Campus Hate Speech.

ERIC Educational Resources Information Center

Kaplan, William A.

1992-01-01

A carefully structured process for campus administrative decision making concerning hate speech is proposed and suggestions for implementation are offered. In addition, criteria for evaluating hate speech processes are outlined, and First Amendment principles circumscribing the institution's discretion to regulate hate speech are discussed.…
Left Lateralized Enhancement of Orofacial Somatosensory Processing Due to Speech Sounds

ERIC Educational Resources Information Center

Ito, Takayuki; Johns, Alexis R.; Ostry, David J.

2013-01-01

Purpose: Somatosensory information associated with speech articulatory movements affects the perception of speech sounds and vice versa, suggesting an intimate linkage between speech production and perception systems. However, it is unclear which cortical processes are involved in the interaction between speech sounds and orofacial somatosensory…
Incorporating Auditory Models in Speech/Audio Applications

NASA Astrophysics Data System (ADS)

Krishnamoorthi, Harish

2011-12-01

Following the success in incorporating perceptual models in audio coding algorithms, their application in other speech/audio processing systems is expanding. In general, all perceptual speech/audio processing algorithms involve minimization of an objective function that directly/indirectly incorporates properties of human perception. This dissertation primarily investigates the problems associated with directly embedding an auditory model in the objective function formulation and proposes possible solutions to overcome high complexity issues for use in real-time speech/audio algorithms. Specific problems addressed in this dissertation include: 1) the development of approximate but computationally efficient auditory model implementations that are consistent with the principles of psychoacoustics, 2) the development of a mapping scheme that allows synthesizing a time/frequency domain representation from its equivalent auditory model output. The first problem is aimed at addressing the high computational complexity involved in solving perceptual objective functions that require repeated application of auditory model for evaluation of different candidate solutions. In this dissertation, a frequency pruning and a detector pruning algorithm is developed that efficiently implements the various auditory model stages. The performance of the pruned model is compared to that of the original auditory model for different types of test signals in the SQAM database. Experimental results indicate only a 4-7% relative error in loudness while attaining up to 80-90 % reduction in computational complexity. Similarly, a hybrid algorithm is developed specifically for use with sinusoidal signals and employs the proposed auditory pattern combining technique together with a look-up table to store representative auditory patterns. The second problem obtains an estimate of the auditory representation that minimizes a perceptual objective function and transforms the auditory pattern back to its equivalent time/frequency representation. This avoids the repeated application of auditory model stages to test different candidate time/frequency vectors in minimizing perceptual objective functions. In this dissertation, a constrained mapping scheme is developed by linearizing certain auditory model stages that ensures obtaining a time/frequency mapping corresponding to the estimated auditory representation. This paradigm was successfully incorporated in a perceptual speech enhancement algorithm and a sinusoidal component selection task.
Speech Processing and Recognition (SPaRe)

DTIC Science & Technology

2011-01-01

results in the areas of automatic speech recognition (ASR), speech processing, machine translation (MT), natural language processing ( NLP ), and...Processing ( NLP ), Information Retrieval (IR) 16. SECURITY CLASSIFICATION OF: UNCLASSIFED 17. LIMITATION OF ABSTRACT 18. NUMBER OF PAGES 19a. NAME...Figure 9, the IOC was only expected to provide document submission and search; automatic speech recognition (ASR) for English, Spanish, Arabic , and

Determining stability in connected speech in primary progressive aphasia and Alzheimer's disease.

PubMed

Beales, Ashleigh; Whitworth, Anne; Cartwright, Jade; Panegyres, Peter K; Kane, Robert T

2018-03-08

Using connected speech to assess progressive language disorders is confounded by uncertainty around whether connected speech is stable over successive sampling, and therefore representative of an individual's performance, and whether some contexts and/or language behaviours show greater stability than others. A repeated measure, within groups, research design was used to investigate stability of a range of behaviours in the connected speech of six individuals with primary progressive aphasia and three individuals with Alzheimer's disease. Stability was evaluated, at a group and individual level, across three samples, collected over 3 weeks, involving everyday monologue, narrative and picture description, and analysed for lexical content, fluency and communicative informativeness and efficiency. Excellent and significant stability was found on the majority of measures, at a group and individual level, across all genres, with isolated measures (e.g. nouns use, communicative efficiency) showing good, but greater variability, within one of the three genres. Findings provide evidence of stability on measures of lexical content, fluency and communicative informativeness and efficiency. While preliminary evidence suggests that task selection is influential when considering stability of particular connected speech measures, replication over a larger sample is necessary to reproduce findings.
Speech endpoint detection with non-language speech sounds for generic speech processing applications

NASA Astrophysics Data System (ADS)

McClain, Matthew; Romanowski, Brian

2009-05-01

Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known apriori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden-Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detection certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS such as filled pauses will require future research.
Sadness is unique: neural processing of emotions in speech prosody in musicians and non-musicians.

PubMed

Park, Mona; Gutyrchik, Evgeny; Welker, Lorenz; Carl, Petra; Pöppel, Ernst; Zaytseva, Yuliya; Meindl, Thomas; Blautzik, Janusch; Reiser, Maximilian; Bao, Yan

2014-01-01

Musical training has been shown to have positive effects on several aspects of speech processing, however, the effects of musical training on the neural processing of speech prosody conveying distinct emotions are yet to be better understood. We used functional magnetic resonance imaging (fMRI) to investigate whether the neural responses to speech prosody conveying happiness, sadness, and fear differ between musicians and non-musicians. Differences in processing of emotional speech prosody between the two groups were only observed when sadness was expressed. Musicians showed increased activation in the middle frontal gyrus, the anterior medial prefrontal cortex, the posterior cingulate cortex and the retrosplenial cortex. Our results suggest an increased sensitivity of emotional processing in musicians with respect to sadness expressed in speech, possibly reflecting empathic processes.
The influence of speech rate and accent on access and use of semantic information.

PubMed

Sajin, Stanislav M; Connine, Cynthia M

2017-04-01

Circumstances in which the speech input is presented in sub-optimal conditions generally lead to processing costs affecting spoken word recognition. The current study indicates that some processing demands imposed by listening to difficult speech can be mitigated by feedback from semantic knowledge. A set of lexical decision experiments examined how foreign accented speech and word duration impact access to semantic knowledge in spoken word recognition. Results indicate that when listeners process accented speech, the reliance on semantic information increases. Speech rate was not observed to influence semantic access, except in the setting in which unusually slow accented speech was presented. These findings support interactive activation models of spoken word recognition in which attention is modulated based on speech demands.
Comparison of formant detection methods used in speech processing applications

NASA Astrophysics Data System (ADS)

Belean, Bogdan

2013-11-01

The paper describes time frequency representations of speech signal together with the formant significance in speech processing applications. Speech formants can be used in emotion recognition, sex discrimination or diagnosing different neurological diseases. Taking into account the various applications of formant detection in speech signal, two methods for detecting formants are presented. First, the poles resulted after a complex analysis of LPC coefficients are used for formants detection. The second approach uses the Kalman filter for formant prediction along the speech signal. Results are presented for both approaches on real life speech spectrograms. A comparison regarding the features of the proposed methods is also performed, in order to establish which method is more suitable in case of different speech processing applications.
Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.

PubMed

Borrie, Stephanie A

2015-03-01

This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal-the AV advantage-has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.
Rhythmic Priming Enhances the Phonological Processing of Speech

ERIC Educational Resources Information Center

Cason, Nia; Schon, Daniele

2012-01-01

While natural speech does not possess the same degree of temporal regularity found in music, there is recent evidence to suggest that temporal regularity enhances speech processing. The aim of this experiment was to examine whether speech processing would be enhanced by the prior presentation of a rhythmical prime. We recorded electrophysiological…
Visual and Auditory Input in Second-Language Speech Processing

ERIC Educational Resources Information Center

Hardison, Debra M.

2010-01-01

The majority of studies in second-language (L2) speech processing have involved unimodal (i.e., auditory) input; however, in many instances, speech communication involves both visual and auditory sources of information. Some researchers have argued that multimodal speech is the primary mode of speech perception (e.g., Rosenblum 2005). Research on…
Noise and pitch interact during the cortical segregation of concurrent speech.

PubMed

Bidelman, Gavin M; Yellamsetty, Anusha

2017-08-01

Behavioral studies reveal listeners exploit intrinsic differences in voice fundamental frequency (F0) to segregate concurrent speech sounds-the so-called "F0-benefit." More favorable signal-to-noise ratio (SNR) in the environment, an extrinsic acoustic factor, similarly benefits the parsing of simultaneous speech. Here, we examined the neurobiological substrates of these two cues in the perceptual segregation of concurrent speech mixtures. We recorded event-related brain potentials (ERPs) while listeners performed a speeded double-vowel identification task. Listeners heard two concurrent vowels whose F0 differed by zero or four semitones presented in either clean (no noise) or noise-degraded (+5 dB SNR) conditions. Behaviorally, listeners were more accurate in correctly identifying both vowels for larger F0 separations but F0-benefit was more pronounced at more favorable SNRs (i.e., pitch × SNR interaction). Analysis of the ERPs revealed that only the P2 wave (∼200 ms) showed a similar F0 x SNR interaction as behavior and was correlated with listeners' perceptual F0-benefit. Neural classifiers applied to the ERPs further suggested that speech sounds are segregated neurally within 200 ms based on SNR whereas segregation based on pitch occurs later in time (400-700 ms). The earlier timing of extrinsic SNR compared to intrinsic F0-based segregation implies that the cortical extraction of speech from noise is more efficient than differentiating speech based on pitch cues alone, which may recruit additional cortical processes. Findings indicate that noise and pitch differences interact relatively early in cerebral cortex and that the brain arrives at the identities of concurrent speech mixtures as early as ∼200 ms. Copyright © 2017 Elsevier B.V. All rights reserved.
The effects of speech output technology in the learning of graphic symbols.

PubMed Central

Schlosser, R W; Belfiore, P J; Nigam, R; Blischak, D; Hetzroni, O

1995-01-01

The effects of auditory stimuli in the form of synthetic speech output on the learning of graphic symbols were evaluated. Three adults with severe to profound mental retardation and communication impairments were taught to point to lexigrams when presented with words under two conditions. In the first condition, participants used a voice output communication aid to receive synthetic speech as antecedent and consequent stimuli. In the second condition, with a nonelectronic communications board, participants did not receive synthetic speech. A parallel treatments design was used to evaluate the effects of the synthetic speech output as an added component of the augmentative and alternative communication system. The 3 participants reached criterion when not provided with the auditory stimuli. Although 2 participants also reached criterion when not provided with the auditory stimuli, the addition of auditory stimuli resulted in more efficient learning and a decreased error rate. Maintenance results, however, indicated no differences between conditions. Finding suggest that auditory stimuli in the form of synthetic speech contribute to the efficient acquisition of graphic communication symbols. PMID:14743828
Do not throw out the baby with the bath water: choosing an effective baseline for a functional localizer of speech processing.

PubMed

Stoppelman, Nadav; Harpaz, Tamar; Ben-Shachar, Michal

2013-05-01

Speech processing engages multiple cortical regions in the temporal, parietal, and frontal lobes. Isolating speech-sensitive cortex in individual participants is of major clinical and scientific importance. This task is complicated by the fact that responses to sensory and linguistic aspects of speech are tightly packed within the posterior superior temporal cortex. In functional magnetic resonance imaging (fMRI), various baseline conditions are typically used in order to isolate speech-specific from basic auditory responses. Using a short, continuous sampling paradigm, we show that reversed ("backward") speech, a commonly used auditory baseline for speech processing, removes much of the speech responses in frontal and temporal language regions of adult individuals. On the other hand, signal correlated noise (SCN) serves as an effective baseline for removing primary auditory responses while maintaining strong signals in the same language regions. We show that the response to reversed speech in left inferior frontal gyrus decays significantly faster than the response to speech, thus suggesting that this response reflects bottom-up activation of speech analysis followed up by top-down attenuation once the signal is classified as nonspeech. The results overall favor SCN as an auditory baseline for speech processing.
Sadness is unique: neural processing of emotions in speech prosody in musicians and non-musicians

PubMed Central

Park, Mona; Gutyrchik, Evgeny; Welker, Lorenz; Carl, Petra; Pöppel, Ernst; Zaytseva, Yuliya; Meindl, Thomas; Blautzik, Janusch; Reiser, Maximilian; Bao, Yan

2015-01-01

Musical training has been shown to have positive effects on several aspects of speech processing, however, the effects of musical training on the neural processing of speech prosody conveying distinct emotions are yet to be better understood. We used functional magnetic resonance imaging (fMRI) to investigate whether the neural responses to speech prosody conveying happiness, sadness, and fear differ between musicians and non-musicians. Differences in processing of emotional speech prosody between the two groups were only observed when sadness was expressed. Musicians showed increased activation in the middle frontal gyrus, the anterior medial prefrontal cortex, the posterior cingulate cortex and the retrosplenial cortex. Our results suggest an increased sensitivity of emotional processing in musicians with respect to sadness expressed in speech, possibly reflecting empathic processes. PMID:25688196
The Wildcat Corpus of Native- and Foreign-Accented English: Communicative Efficiency across Conversational Dyads with Varying Language Alignment Profiles

PubMed Central

Van Engen, Kristin J.; Baese-Berk, Melissa; Baker, Rachel E.; Choi, Arim; Kim, Midam; Bradlow, Ann R.

2012-01-01

This paper describes the development of the Wildcat Corpus of native- and foreign-accented English, a corpus containing scripted and spontaneous speech recordings from 24 native speakers of American English and 52 non-native speakers of English. The core element of this corpus is a set of spontaneous speech recordings, for which a new method of eliciting dialogue-based, laboratory-quality speech recordings was developed (the Diapix task). Dialogues between two native speakers of English, between two non-native speakers of English (with either shared or different L1s), and between one native and one non-native speaker of English are included and analyzed in terms of general measures of communicative efficiency. The overall finding was that pairs of native talkers were most efficient, followed by mixed native/non-native pairs and non-native pairs with shared L1. Non-native pairs with different L1s were least efficient. These results support the hypothesis that successful speech communication depends both on the alignment of talkers to the target language and on the alignment of talkers to one another in terms of native language background. PMID:21313992
Cortical oscillations and entrainment in speech processing during working memory load.

PubMed

Hjortkjaer, Jens; Märcher-Rørsted, Jonatan; Fuglsang, Søren A; Dau, Torsten

2018-02-02

Neuronal oscillations are thought to play an important role in working memory (WM) and speech processing. Listening to speech in real-life situations is often cognitively demanding but it is unknown whether WM load influences how auditory cortical activity synchronizes to speech features. Here, we developed an auditory n-back paradigm to investigate cortical entrainment to speech envelope fluctuations under different degrees of WM load. We measured the electroencephalogram, pupil dilations and behavioural performance from 22 subjects listening to continuous speech with an embedded n-back task. The speech stimuli consisted of long spoken number sequences created to match natural speech in terms of sentence intonation, syllabic rate and phonetic content. To burden different WM functions during speech processing, listeners performed an n-back task on the speech sequences in different levels of background noise. Increasing WM load at higher n-back levels was associated with a decrease in posterior alpha power as well as increased pupil dilations. Frontal theta power increased at the start of the trial and increased additionally with higher n-back level. The observed alpha-theta power changes are consistent with visual n-back paradigms suggesting general oscillatory correlates of WM processing load. Speech entrainment was measured as a linear mapping between the envelope of the speech signal and low-frequency cortical activity (< 13 Hz). We found that increases in both types of WM load (background noise and n-back level) decreased cortical speech envelope entrainment. Although entrainment persisted under high load, our results suggest a top-down influence of WM processing on cortical speech entrainment. © 2018 The Authors. European Journal of Neuroscience published by Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
The influence of (central) auditory processing disorder in speech sound disorders.

PubMed

Barrozo, Tatiane Faria; Pagan-Neves, Luciana de Oliveira; Vilela, Nadia; Carvallo, Renata Mota Mamede; Wertzner, Haydée Fiszbein

2016-01-01

Considering the importance of auditory information for the acquisition and organization of phonological rules, the assessment of (central) auditory processing contributes to both the diagnosis and targeting of speech therapy in children with speech sound disorders. To study phonological measures and (central) auditory processing of children with speech sound disorder. Clinical and experimental study, with 21 subjects with speech sound disorder aged between 7.0 and 9.11 years, divided into two groups according to their (central) auditory processing disorder. The assessment comprised tests of phonology, speech inconsistency, and metalinguistic abilities. The group with (central) auditory processing disorder demonstrated greater severity of speech sound disorder. The cutoff value obtained for the process density index was the one that best characterized the occurrence of phonological processes for children above 7 years of age. The comparison among the tests evaluated between the two groups showed differences in some phonological and metalinguistic abilities. Children with an index value above 0.54 demonstrated strong tendencies towards presenting a (central) auditory processing disorder, and this measure was effective to indicate the need for evaluation in children with speech sound disorder. Copyright © 2015 Associação Brasileira de Otorrinolaringologia e Cirurgia Cérvico-Facial. Published by Elsevier Editora Ltda. All rights reserved.
Text as a Supplement to Speech in Young and Older Adults a)

PubMed Central

Krull, Vidya; Humes, Larry E.

2015-01-01

Objective The purpose of this experiment was to quantify the contribution of visual text to auditory speech recognition in background noise. Specifically, we tested the hypothesis that partially accurate visual text from an automatic speech recognizer could be used successfully to supplement speech understanding in difficult listening conditions in older adults, with normal or impaired hearing. Our working hypotheses were based on what is known regarding audiovisual speech perception in the elderly from speechreading literature. We hypothesized that: 1) combining auditory and visual text information will result in improved recognition accuracy compared to auditory or visual text information alone; 2) benefit from supplementing speech with visual text (auditory and visual enhancement) in young adults will be greater than that in older adults; and 3) individual differences in performance on perceptual measures would be associated with cognitive abilities. Design Fifteen young adults with normal hearing, fifteen older adults with normal hearing, and fifteen older adults with hearing loss participated in this study. All participants completed sentence recognition tasks in auditory-only, text-only, and combined auditory-text conditions. The auditory sentence stimuli were spectrally shaped to restore audibility for the older participants with impaired hearing. All participants also completed various cognitive measures, including measures of working memory, processing speed, verbal comprehension, perceptual and cognitive speed, processing efficiency, inhibition, and the ability to form wholes from parts. Group effects were examined for each of the perceptual and cognitive measures. Audiovisual benefit was calculated relative to performance on auditory-only and visual-text only conditions. Finally, the relationship between perceptual measures and other independent measures were examined using principal-component factor analyses, followed by regression analyses. Results Both young and older adults performed similarly on nine out of ten perceptual measures (auditory, visual, and combined measures). Combining degraded speech with partially correct text from an automatic speech recognizer improved the understanding of speech in both young and older adults, relative to both auditory- and text-only performance. In all subjects, cognition emerged as a key predictor for a general speech-text integration ability. Conclusions These results suggest that neither age nor hearing loss affected the ability of subjects to benefit from text when used to support speech, after ensuring audibility through spectral shaping. These results also suggest that the benefit obtained by supplementing auditory input with partially accurate text is modulated by cognitive ability, specifically lexical and verbal skills. PMID:26458131
A Diagnostic Marker to Discriminate Childhood Apraxia of Speech from Speech Delay: III. Theoretical Coherence of the Pause Marker with Speech Processing Deficits in Childhood Apraxia of Speech

ERIC Educational Resources Information Center

Shriberg, Lawrence D.; Strand, Edythe A.; Fourakis, Marios; Jakielski, Kathy J.; Hall, Sheryl D.; Karlsson, Heather B.; Mabie, Heather L.; McSweeny, Jane L.; Tilkens, Christie M.; Wilson, David L.

2017-01-01

Purpose: Previous articles in this supplement described rationale for and development of the pause marker (PM), a diagnostic marker of childhood apraxia of speech (CAS), and studies supporting its validity and reliability. The present article assesses the theoretical coherence of the PM with speech processing deficits in CAS. Method: PM and other…
Objective speech quality assessment and the RPE-LTP coding algorithm in different noise and language conditions.

PubMed

Hansen, J H; Nandkumar, S

1995-01-01

The formulation of reliable signal processing algorithms for speech coding and synthesis require the selection of a prior criterion of performance. Though coding efficiency (bits/second) or computational requirements can be used, a final performance measure must always include speech quality. In this paper, three objective speech quality measures are considered with respect to quality assessment for American English, noisy American English, and noise-free versions of seven languages. The purpose is to determine whether objective quality measures can be used to quantify changes in quality for a given voice coding method, with a known subjective performance level, as background noise or language conditions are changed. The speech coding algorithm chosen is regular-pulse excitation with long-term prediction (RPE-LTP), which has been chosen as the standard voice compression algorithm for the European Digital Mobile Radio system. Three areas are considered for objective quality assessment which include: (i) vocoder performance for American English in a noise-free environment, (ii) speech quality variation for three additive background noise sources, and (iii) noise-free performance for seven languages which include English, Japanese, Finnish, German, Hindi, Spanish, and French. It is suggested that although existing objective quality measures will never replace subjective testing, they can be a useful means of assessing changes in performance, identifying areas for improvement in algorithm design, and augmenting subjective quality tests for voice coding/compression algorithms in noise-free, noisy, and/or non-English applications.
Effects of Semantic Context and Fundamental Frequency Contours on Mandarin Speech Recognition by Second Language Learners.

PubMed

Zhang, Linjun; Li, Yu; Wu, Han; Li, Xin; Shu, Hua; Zhang, Yang; Li, Ping

2016-01-01

Speech recognition by second language (L2) learners in optimal and suboptimal conditions has been examined extensively with English as the target language in most previous studies. This study extended existing experimental protocols (Wang et al., 2013) to investigate Mandarin speech recognition by Japanese learners of Mandarin at two different levels (elementary vs. intermediate) of proficiency. The overall results showed that in addition to L2 proficiency, semantic context, F0 contours, and listening condition all affected the recognition performance on the Mandarin sentences. However, the effects of semantic context and F0 contours on L2 speech recognition diverged to some extent. Specifically, there was significant modulation effect of listening condition on semantic context, indicating that L2 learners made use of semantic context less efficiently in the interfering background than in quiet. In contrast, no significant modulation effect of listening condition on F0 contours was found. Furthermore, there was significant interaction between semantic context and F0 contours, indicating that semantic context becomes more important for L2 speech recognition when F0 information is degraded. None of these effects were found to be modulated by L2 proficiency. The discrepancy in the effects of semantic context and F0 contours on L2 speech recognition in the interfering background might be related to differences in processing capacities required by the two types of information in adverse listening conditions.
Atypical speech versus non-speech detection and discrimination in 4- to 6- yr old children with autism spectrum disorder: An ERP study.

PubMed

Galilee, Alena; Stefanidou, Chrysi; McCleery, Joseph P

2017-01-01

Previous event-related potential (ERP) research utilizing oddball stimulus paradigms suggests diminished processing of speech versus non-speech sounds in children with an Autism Spectrum Disorder (ASD). However, brain mechanisms underlying these speech processing abnormalities, and to what extent they are related to poor language abilities in this population remain unknown. In the current study, we utilized a novel paired repetition paradigm in order to investigate ERP responses associated with the detection and discrimination of speech and non-speech sounds in 4- to 6-year old children with ASD, compared with gender and verbal age matched controls. ERPs were recorded while children passively listened to pairs of stimuli that were either both speech sounds, both non-speech sounds, speech followed by non-speech, or non-speech followed by speech. Control participants exhibited N330 match/mismatch responses measured from temporal electrodes, reflecting speech versus non-speech detection, bilaterally, whereas children with ASD exhibited this effect only over temporal electrodes in the left hemisphere. Furthermore, while the control groups exhibited match/mismatch effects at approximately 600 ms (central N600, temporal P600) when a non-speech sound was followed by a speech sound, these effects were absent in the ASD group. These findings suggest that children with ASD fail to activate right hemisphere mechanisms, likely associated with social or emotional aspects of speech detection, when distinguishing non-speech from speech stimuli. Together, these results demonstrate the presence of atypical speech versus non-speech processing in children with ASD when compared with typically developing children matched on verbal age.

Atypical speech versus non-speech detection and discrimination in 4- to 6- yr old children with autism spectrum disorder: An ERP study

PubMed Central

Stefanidou, Chrysi; McCleery, Joseph P.

2017-01-01

Previous event-related potential (ERP) research utilizing oddball stimulus paradigms suggests diminished processing of speech versus non-speech sounds in children with an Autism Spectrum Disorder (ASD). However, brain mechanisms underlying these speech processing abnormalities, and to what extent they are related to poor language abilities in this population remain unknown. In the current study, we utilized a novel paired repetition paradigm in order to investigate ERP responses associated with the detection and discrimination of speech and non-speech sounds in 4- to 6—year old children with ASD, compared with gender and verbal age matched controls. ERPs were recorded while children passively listened to pairs of stimuli that were either both speech sounds, both non-speech sounds, speech followed by non-speech, or non-speech followed by speech. Control participants exhibited N330 match/mismatch responses measured from temporal electrodes, reflecting speech versus non-speech detection, bilaterally, whereas children with ASD exhibited this effect only over temporal electrodes in the left hemisphere. Furthermore, while the control groups exhibited match/mismatch effects at approximately 600 ms (central N600, temporal P600) when a non-speech sound was followed by a speech sound, these effects were absent in the ASD group. These findings suggest that children with ASD fail to activate right hemisphere mechanisms, likely associated with social or emotional aspects of speech detection, when distinguishing non-speech from speech stimuli. Together, these results demonstrate the presence of atypical speech versus non-speech processing in children with ASD when compared with typically developing children matched on verbal age. PMID:28738063
Automated recognition of helium speech. Phase I: Investigation of microprocessor based analysis/synthesis system

NASA Astrophysics Data System (ADS)

Jelinek, H. J.

1986-01-01

This is the Final Report of Electronic Design Associates on its Phase I SBIR project. The purpose of this project is to develop a method for correcting helium speech, as experienced in diver-surface communication. The goal of the Phase I study was to design, prototype, and evaluate a real time helium speech corrector system based upon digital signal processing techniques. The general approach was to develop hardware (an IBM PC board) to digitize helium speech and software (a LAMBDA computer based simulation) to translate the speech. As planned in the study proposal, this initial prototype may now be used to assess expected performance from a self contained real time system which uses an identical algorithm. The Final Report details the work carried out to produce the prototype system. Four major project tasks were: a signal processing scheme for converting helium speech to normal sounding speech was generated. The signal processing scheme was simulated on a general purpose (LAMDA) computer. Actual helium speech was supplied to the simulation and the converted speech was generated. An IBM-PC based 14 bit data Input/Output board was designed and built. A bibliography of references on speech processing was generated.
How vocabulary size in two languages relates to efficiency in spoken word recognition by young Spanish-English bilinguals

PubMed Central

Marchman, Virginia A.; Fernald, Anne; Hurtado, Nereyda

2010-01-01

Research using online comprehension measures with monolingual children shows that speed and accuracy of spoken word recognition are correlated with lexical development. Here we examined speech processing efficiency in relation to vocabulary development in bilingual children learning both Spanish and English (n=26; 2;6 yrs). Between-language associations were weak: vocabulary size in Spanish was uncorrelated with vocabulary in English, and children’s facility in online comprehension in Spanish was unrelated to their facility in English. Instead, efficiency of online processing in one language was significantly related to vocabulary size in that language, after controlling for processing speed and vocabulary size in the other language. These links between efficiency of lexical access and vocabulary knowledge in bilinguals parallel those previously reported for Spanish and English monolinguals, suggesting that children’s ability to abstract information from the input in building a working lexicon relates fundamentally to mechanisms underlying the construction of language. PMID:19726000
Processing changes when listening to foreign-accented speech

PubMed Central

Romero-Rivas, Carlos; Martin, Clara D.; Costa, Albert

2015-01-01

This study investigates the mechanisms responsible for fast changes in processing foreign-accented speech. Event Related brain Potentials (ERPs) were obtained while native speakers of Spanish listened to native and foreign-accented speakers of Spanish. We observed a less positive P200 component for foreign-accented speech relative to native speech comprehension. This suggests that the extraction of spectral information and other important acoustic features was hampered during foreign-accented speech comprehension. However, the amplitude of the N400 component for foreign-accented speech comprehension decreased across the experiment, suggesting the use of a higher level, lexical mechanism. Furthermore, during native speech comprehension, semantic violations in the critical words elicited an N400 effect followed by a late positivity. During foreign-accented speech comprehension, semantic violations only elicited an N400 effect. Overall, our results suggest that, despite a lack of improvement in phonetic discrimination, native listeners experience changes at lexical-semantic levels of processing after brief exposure to foreign-accented speech. Moreover, these results suggest that lexical access, semantic integration and linguistic re-analysis processes are permeable to external factors, such as the accent of the speaker. PMID:25859209
Barista: A Framework for Concurrent Speech Processing by USC-SAIL

PubMed Central

Can, Doğan; Gibson, James; Vaz, Colin; Georgiou, Panayiotis G.; Narayanan, Shrikanth S.

2016-01-01

We present Barista, an open-source framework for concurrent speech processing based on the Kaldi speech recognition toolkit and the libcppa actor library. With Barista, we aim to provide an easy-to-use, extensible framework for constructing highly customizable concurrent (and/or distributed) networks for a variety of speech processing tasks. Each Barista network specifies a flow of data between simple actors, concurrent entities communicating by message passing, modeled after Kaldi tools. Leveraging the fast and reliable concurrency and distribution mechanisms provided by libcppa, Barista lets demanding speech processing tasks, such as real-time speech recognizers and complex training workflows, to be scheduled and executed on parallel (and/or distributed) hardware. Barista is released under the Apache License v2.0. PMID:27610047
Barista: A Framework for Concurrent Speech Processing by USC-SAIL.

PubMed

Can, Doğan; Gibson, James; Vaz, Colin; Georgiou, Panayiotis G; Narayanan, Shrikanth S

2014-05-01

We present Barista, an open-source framework for concurrent speech processing based on the Kaldi speech recognition toolkit and the libcppa actor library. With Barista, we aim to provide an easy-to-use, extensible framework for constructing highly customizable concurrent (and/or distributed) networks for a variety of speech processing tasks. Each Barista network specifies a flow of data between simple actors, concurrent entities communicating by message passing, modeled after Kaldi tools. Leveraging the fast and reliable concurrency and distribution mechanisms provided by libcppa, Barista lets demanding speech processing tasks, such as real-time speech recognizers and complex training workflows, to be scheduled and executed on parallel (and/or distributed) hardware. Barista is released under the Apache License v2.0.
Computationally Efficient Clustering of Audio-Visual Meeting Data

NASA Astrophysics Data System (ADS)

Hung, Hayley; Friedland, Gerald; Yeo, Chuohao

This chapter presents novel computationally efficient algorithms to extract semantically meaningful acoustic and visual events related to each of the participants in a group discussion using the example of business meeting recordings. The recording setup involves relatively few audio-visual sensors, comprising a limited number of cameras and microphones. We first demonstrate computationally efficient algorithms that can identify who spoke and when, a problem in speech processing known as speaker diarization. We also extract visual activity features efficiently from MPEG4 video by taking advantage of the processing that was already done for video compression. Then, we present a method of associating the audio-visual data together so that the content of each participant can be managed individually. The methods presented in this article can be used as a principal component that enables many higher-level semantic analysis tasks needed in search, retrieval, and navigation.
Neural pathways for visual speech perception

PubMed Central

Bernstein, Lynne E.; Liebenthal, Einat

2014-01-01

This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA) has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA. PMID:25520611
Eye’m talking to you: speakers’ gaze direction modulates co-speech gesture processing in the right MTG

PubMed Central

Toni, Ivan; Hagoort, Peter; Kelly, Spencer D.; Özyürek, Aslı

2015-01-01

Recipients process information from speech and co-speech gestures, but it is currently unknown how this processing is influenced by the presence of other important social cues, especially gaze direction, a marker of communicative intent. Such cues may modulate neural activity in regions associated either with the processing of ostensive cues, such as eye gaze, or with the processing of semantic information, provided by speech and gesture. Participants were scanned (fMRI) while taking part in triadic communication involving two recipients and a speaker. The speaker uttered sentences that were and were not accompanied by complementary iconic gestures. Crucially, the speaker alternated her gaze direction, thus creating two recipient roles: addressed (direct gaze) vs unaddressed (averted gaze) recipient. The comprehension of Speech&Gesture relative to SpeechOnly utterances recruited middle occipital, middle temporal and inferior frontal gyri, bilaterally. The calcarine sulcus and posterior cingulate cortex were sensitive to differences between direct and averted gaze. Most importantly, Speech&Gesture utterances, but not SpeechOnly utterances, produced additional activity in the right middle temporal gyrus when participants were addressed. Marking communicative intent with gaze direction modulates the processing of speech–gesture utterances in cerebral areas typically associated with the semantic processing of multi-modal communicative acts. PMID:24652857
Auditory-motor interactions in pediatric motor speech disorders: neurocomputational modeling of disordered development.

PubMed

Terband, H; Maassen, B; Guenther, F H; Brumberg, J

2014-01-01

Differentiating the symptom complex due to phonological-level disorders, speech delay and pediatric motor speech disorders is a controversial issue in the field of pediatric speech and language pathology. The present study investigated the developmental interaction between neurological deficits in auditory and motor processes using computational modeling with the DIVA model. In a series of computer simulations, we investigated the effect of a motor processing deficit alone (MPD), and the effect of a motor processing deficit in combination with an auditory processing deficit (MPD+APD) on the trajectory and endpoint of speech motor development in the DIVA model. Simulation results showed that a motor programming deficit predominantly leads to deterioration on the phonological level (phonemic mappings) when auditory self-monitoring is intact, and on the systemic level (systemic mapping) if auditory self-monitoring is impaired. These findings suggest a close relation between quality of auditory self-monitoring and the involvement of phonological vs. motor processes in children with pediatric motor speech disorders. It is suggested that MPD+APD might be involved in typically apraxic speech output disorders and MPD in pediatric motor speech disorders that also have a phonological component. Possibilities to verify these hypotheses using empirical data collected from human subjects are discussed. The reader will be able to: (1) identify the difficulties in studying disordered speech motor development; (2) describe the differences in speech motor characteristics between SSD and subtype CAS; (3) describe the different types of learning that occur in the sensory-motor system during babbling and early speech acquisition; (4) identify the neural control subsystems involved in speech production; (5) describe the potential role of auditory self-monitoring in developmental speech disorders. Copyright © 2014 Elsevier Inc. All rights reserved.
All words are not created equal: Expectations about word length guide infant statistical learning

PubMed Central

Lew-Williams, Casey; Saffran, Jenny R.

2011-01-01

Infants have been described as ‘statistical learners’ capable of extracting structure (such as words) from patterned input (such as language). Here, we investigated whether prior knowledge influences how infants track transitional probabilities in word segmentation tasks. Are infants biased by prior experience when engaging in sequential statistical learning? In a laboratory simulation of learning across time, we exposed 9- and 10-month-old infants to a list of either bisyllabic or trisyllabic nonsense words, followed by a pause-free speech stream composed of a different set of bisyllabic or trisyllabic nonsense words. Listening times revealed successful segmentation of words from fluent speech only when words were uniformly bisyllabic or trisyllabic throughout both phases of the experiment. Hearing trisyllabic words during the pre-exposure phase derailed infants’ abilities to segment speech into bisyllabic words, and vice versa. We conclude that prior knowledge about word length equips infants with perceptual expectations that facilitate efficient processing of subsequent language input. PMID:22088408
Speech recognition in advanced rotorcraft - Using speech controls to reduce manual control overload

NASA Technical Reports Server (NTRS)

Vidulich, Michael A.; Bortolussi, Michael R.

1988-01-01

An experiment has been conducted to ascertain the usefulness of helicopter pilot speech controls and their effect on time-sharing performance, under the impetus of multiple-resource theories of attention which predict that time-sharing should be more efficient with mixed manual and speech controls than with all-manual ones. The test simulation involved an advanced, single-pilot scout/attack helicopter. Performance and subjective workload levels obtained supported the claimed utility of speech recognition-based controls; specifically, time-sharing performance was improved while preparing a data-burst transmission of information during helicopter hover.
A Hierarchical multi-input and output Bi-GRU Model for Sentiment Analysis on Customer Reviews

NASA Astrophysics Data System (ADS)

Zhang, Liujie; Zhou, Yanquan; Duan, Xiuyu; Chen, Ruiqi

2018-03-01

Multi-label sentiment classification on customer reviews is a practical challenging task in Natural Language Processing. In this paper, we propose a hierarchical multi-input and output model based bi-directional recurrent neural network, which both considers the semantic and lexical information of emotional expression. Our model applies two independent Bi-GRU layer to generate part of speech and sentence representation. Then the lexical information is considered via attention over output of softmax activation on part of speech representation. In addition, we combine probability of auxiliary labels as feature with hidden layer to capturing crucial correlation between output labels. The experimental result shows that our model is computationally efficient and achieves breakthrough improvements on customer reviews dataset.
Auditory processing and speech perception in children with specific language impairment: relations with oral language and literacy skills.

PubMed

Vandewalle, Ellen; Boets, Bart; Ghesquière, Pol; Zink, Inge

2012-01-01

This longitudinal study investigated temporal auditory processing (frequency modulation and between-channel gap detection) and speech perception (speech-in-noise and categorical perception) in three groups of 6 years 3 months to 6 years 8 months-old children attending grade 1: (1) children with specific language impairment (SLI) and literacy delay (n = 8), (2) children with SLI and normal literacy (n = 10) and (3) typically developing children (n = 14). Moreover, the relations between these auditory processing and speech perception skills and oral language and literacy skills in grade 1 and grade 3 were analyzed. The SLI group with literacy delay scored significantly lower than both other groups on speech perception, but not on temporal auditory processing. Both normal reading groups did not differ in terms of speech perception or auditory processing. Speech perception was significantly related to reading and spelling in grades 1 and 3 and had a unique predictive contribution to reading growth in grade 3, even after controlling reading level, phonological ability, auditory processing and oral language skills in grade 1. These findings indicated that speech perception also had a unique direct impact upon reading development and not only through its relation with phonological awareness. Moreover, speech perception seemed to be more associated with the development of literacy skills and less with oral language ability. Copyright © 2011 Elsevier Ltd. All rights reserved.
Speech Perception as a Cognitive Process: The Interactive Activation Model.

ERIC Educational Resources Information Center

Elman, Jeffrey L.; McClelland, James L.

Research efforts to model speech perception in terms of a processing system in which knowledge and processing are distributed over large numbers of highly interactive--but computationally primative--elements are described in this report. After discussing the properties of speech that demand a parallel interactive processing system, the report…
Nursing acceptance of a speech-input interface: a preliminary investigation.

PubMed

Dillon, T W; McDowell, D; Norcio, A F; DeHaemer, M J

1994-01-01

Many new technologies are being developed to improve the efficiency and productivity of nursing staffs. User acceptance is a key to the success of these technologies. In this article, the authors present a discussion of nursing acceptance of computer systems, review the basic design issues for creating a speech-input interface, and report preliminary findings of a study of nursing acceptance of a prototype speech-input interface. Results of the study showed that the 19 nursing subjects expressed acceptance of the prototype speech-input interface.
Status Report on Speech Research. A Report on the Status and Progress of Studies on the Nature of Speech, Instrumentation for Its Investigation, and Practical Applications.

DTIC Science & Technology

1983-09-30

determines, in part, what the infant says; and if perception is to guide production, the two processes must be, in some sense, isomorphic. An artificial speech ...influences on speech perception processes . Perception & Psychophysics, 24, 253-257. MacKain, K. S., Studdert-Kennedy, M., Spieker, S., & Stern, D. (1983...sentence contexts. In A. Cohen & S. E. G. Nooteboom (Eds.), Structure and process in speech perception (pp. 69-89). New York: Springer- Verlag. Larkey
Action planning and predictive coding when speaking

PubMed Central

Wang, Jun; Mathalon, Daniel H.; Roach, Brian J.; Reilly, James; Keedy, Sarah; Sweeney, John A.; Ford, Judith M.

2014-01-01

Across the animal kingdom, sensations resulting from an animal's own actions are processed differently from sensations resulting from external sources, with self-generated sensations being suppressed. A forward model has been proposed to explain this process across sensorimotor domains. During vocalization, reduced processing of one's own speech is believed to result from a comparison of speech sounds to corollary discharges of intended speech production generated from efference copies of commands to speak. Until now, anatomical and functional evidence validating this model in humans has been indirect. Using EEG with anatomical MRI to facilitate source localization, we demonstrate that inferior frontal gyrus activity during the 300ms before speaking was associated with suppressed processing of speech sounds in auditory cortex around 100ms after speech onset (N1). These findings indicate that an efference copy from speech areas in prefrontal cortex is transmitted to auditory cortex, where it is used to suppress processing of anticipated speech sounds. About 100ms after N1, a subsequent auditory cortical component (P2) was not suppressed during talking. The combined N1 and P2 effects suggest that although sensory processing is suppressed as reflected in N1, perceptual gaps are filled as reflected in the lack of P2 suppression, explaining the discrepancy between sensory suppression and preserved sensory experiences. These findings, coupled with the coherence between relevant brain regions before and during speech, provide new mechanistic understanding of the complex interactions between action planning and sensory processing that provide for differentiated tagging and monitoring of one's own speech, processes disrupted in neuropsychiatric disorders. PMID:24423729
A causal test of the motor theory of speech perception: a case of impaired speech production and spared speech perception.

PubMed

Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z

2015-01-01

The debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. Here, we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. We found that the patient showed a normal phonemic categorical boundary when discriminating two non-words that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the non-word stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labelling impairment. These data suggest that while the motor system is not causally involved in perception of the speech signal, it may be used when other cues (e.g., meaning, context) are not available.
Visual Feedback of Tongue Movement for Novel Speech Sound Learning

PubMed Central

Katz, William F.; Mehta, Sonya

2015-01-01

Pronunciation training studies have yielded important information concerning the processing of audiovisual (AV) information. Second language (L2) learners show increased reliance on bottom-up, multimodal input for speech perception (compared to monolingual individuals). However, little is known about the role of viewing one's own speech articulation processes during speech training. The current study investigated whether real-time, visual feedback for tongue movement can improve a speaker's learning of non-native speech sounds. An interactive 3D tongue visualization system based on electromagnetic articulography (EMA) was used in a speech training experiment. Native speakers of American English produced a novel speech sound (/ɖ/; a voiced, coronal, palatal stop) before, during, and after trials in which they viewed their own speech movements using the 3D model. Talkers' productions were evaluated using kinematic (tongue-tip spatial positioning) and acoustic (burst spectra) measures. The results indicated a rapid gain in accuracy associated with visual feedback training. The findings are discussed with respect to neural models for multimodal speech processing. PMID:26635571

Is complex signal processing for bone conduction hearing aids useful?

PubMed

Kompis, Martin; Kurz, Anja; Pfiffner, Flurin; Senn, Pascal; Arnold, Andreas; Caversaccio, Marco

2014-05-01

To establish whether complex signal processing is beneficial for users of bone anchored hearing aids. Review and analysis of two studies from our own group, each comparing a speech processor with basic digital signal processing (either Baha Divino or Baha Intenso) and a processor with complex digital signal processing (either Baha BP100 or Baha BP110 power). The main differences between basic and complex signal processing are the number of audiologist accessible frequency channels and the availability and complexity of the directional multi-microphone noise reduction and loudness compression systems. Both studies show a small, statistically non-significant improvement of speech understanding in quiet with the complex digital signal processing. The average improvement for speech in noise is +0.9 dB, if speech and noise are emitted both from the front of the listener. If noise is emitted from the rear and speech from the front of the listener, the advantage of the devices with complex digital signal processing as opposed to those with basic signal processing increases, on average, to +3.2 dB (range +2.3 … +5.1 dB, p ≤ 0.0032). Complex digital signal processing does indeed improve speech understanding, especially in noise coming from the rear. This finding has been supported by another study, which has been published recently by a different research group. When compared to basic digital signal processing, complex digital signal processing can increase speech understanding of users of bone anchored hearing aids. The benefit is most significant for speech understanding in noise.
Alternative Speech Communication System for Persons with Severe Speech Disorders

NASA Astrophysics Data System (ADS)

Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas

2009-12-01

Assistive speech-enabled systems are proposed to help both French and English speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. An improvement of the Perceptual Evaluation of the Speech Quality (PESQ) value of 5% and more than 20% is achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.
Speech Research: A Report on the Status and Progress of Studies on the Nature of Speech, Instrumentation for Its Investigation, and Practical Applications, 1 July-31 December 1972.

ERIC Educational Resources Information Center

Haskins Labs., New Haven, CT.

This report on speech research contains 21 papers describing research conducted on a variety of topics concerning speech perception, processing, and production. The initial two reports deal with brain function in speech; several others concern ear function, both in terms of perception and information processing. A number of reports describe…
Perception of temporally modified speech in auditory neuropathy.

PubMed

Hassan, Dalia Mohamed

2011-01-01

Disrupted auditory nerve activity in auditory neuropathy (AN) significantly impairs the sequential processing of auditory information, resulting in poor speech perception. This study investigated the ability of AN subjects to perceive temporally modified consonant-vowel (CV) pairs and shed light on their phonological awareness skills. Four Arabic CV pairs were selected: /ki/-/gi/, /to/-/do/, /si/-/sti/ and /so/-/zo/. The formant transitions in consonants and the pauses between CV pairs were prolonged. Rhyming, segmentation and blending skills were tested using words at a natural rate of speech and with prolongation of the speech stream. Fourteen adult AN subjects were compared to a matched group of cochlear-impaired patients in their perception of acoustically processed speech. The AN group distinguished the CV pairs at a low speech rate, in particular with modification of the consonant duration. Phonological awareness skills deteriorated in adult AN subjects but improved with prolongation of the speech inter-syllabic time interval. A rehabilitation program for AN should consider temporal modification of speech, training for auditory temporal processing and the use of devices with innovative signal processing schemes. Verbal modifications as well as visual imaging appear to be promising compensatory strategies for remediating the affected phonological processing skills.
Slowed Speech Input has a Differential Impact on On-line and Off-line Processing in Children’s Comprehension of Pronouns

PubMed Central

Walenski, Matthew; Swinney, David

2009-01-01

The central question underlying this study revolves around how children process co-reference relationships—such as those evidenced by pronouns (him) and reflexives (himself)—and how a slowed rate of speech input may critically affect this process. Previous studies of child language processing have demonstrated that typical language developing (TLD) children as young as 4 years of age process co-reference relations in a manner similar to adults on-line. In contrast, off-line measures of pronoun comprehension suggest a developmental delay for pronouns (relative to reflexives). The present study examines dependency relations in TLD children (ages 5–13) and investigates how a slowed rate of speech input affects the unconscious (on-line) and conscious (off-line) parsing of these constructions. For the on-line investigations (using a cross-modal picture priming paradigm), results indicate that at a normal rate of speech TLD children demonstrate adult-like syntactic reflexes. At a slowed rate of speech the typical language developing children displayed a breakdown in automatic syntactic parsing (again, similar to the pattern seen in unimpaired adults). As demonstrated in the literature, our off-line investigations (sentence/picture matching task) revealed that these children performed much better on reflexives than on pronouns at a regular speech rate. However, at the slow speech rate, performance on pronouns was substantially improved, whereas performance on reflexives was not different than at the regular speech rate. We interpret these results in light of a distinction between fast automatic processes (relied upon for on-line processing in real time) and conscious reflective processes (relied upon for off-line processing), such that slowed speech input disrupts the former, yet improves the latter. PMID:19343495
Statistical properties of Chinese phonemic networks

NASA Astrophysics Data System (ADS)

Yu, Shuiyuan; Liu, Haitao; Xu, Chunshan

2011-04-01

The study of properties of speech sound systems is of great significance in understanding the human cognitive mechanism and the working principles of speech sound systems. Some properties of speech sound systems, such as the listener-oriented feature and the talker-oriented feature, have been unveiled with the statistical study of phonemes in human languages and the research of the interrelations between human articulatory gestures and the corresponding acoustic parameters. With all the phonemes of speech sound systems treated as a coherent whole, our research, which focuses on the dynamic properties of speech sound systems in operation, investigates some statistical parameters of Chinese phoneme networks based on real text and dictionaries. The findings are as follows: phonemic networks have high connectivity degrees and short average distances; the degrees obey normal distribution and the weighted degrees obey power law distribution; vowels enjoy higher priority than consonants in the actual operation of speech sound systems; the phonemic networks have high robustness against targeted attacks and random errors. In addition, for investigating the structural properties of a speech sound system, a statistical study of dictionaries is conducted, which shows the higher frequency of shorter words and syllables and the tendency that the longer a word is, the shorter the syllables composing it are. From these structural properties and dynamic properties one can derive the following conclusion: the static structure of a speech sound system tends to promote communication efficiency and save articulation effort while the dynamic operation of this system gives preference to reliable transmission and easy recognition. In short, a speech sound system is an effective, efficient and reliable communication system optimized in many aspects.
Engaged listeners: shared neural processing of powerful political speeches

PubMed Central

Häcker, Frank E. K.; Honey, Christopher J.; Hasson, Uri

2015-01-01

Powerful speeches can captivate audiences, whereas weaker speeches fail to engage their listeners. What is happening in the brains of a captivated audience? Here, we assess audience-wide functional brain dynamics during listening to speeches of varying rhetorical quality. The speeches were given by German politicians and evaluated as rhetorically powerful or weak. Listening to each of the speeches induced similar neural response time courses, as measured by inter-subject correlation analysis, in widespread brain regions involved in spoken language processing. Crucially, alignment of the time course across listeners was stronger for rhetorically powerful speeches, especially for bilateral regions of the superior temporal gyri and medial prefrontal cortex. Thus, during powerful speeches, listeners as a group are more coupled to each other, suggesting that powerful speeches are more potent in taking control of the listeners’ brain responses. Weaker speeches were processed more heterogeneously, although they still prompted substantially correlated responses. These patterns of coupled neural responses bear resemblance to metaphors of resonance, which are often invoked in discussions of speech impact, and contribute to the literature on auditory attention under natural circumstances. Overall, this approach opens up possibilities for research on the neural mechanisms mediating the reception of entertaining or persuasive messages. PMID:25653012
Processing of speech signals for physical and sensory disabilities.

PubMed Central

Levitt, H

1995-01-01

Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities. Images Fig. 4 PMID:7479816
Loss tolerant speech decoder for telecommunications

NASA Technical Reports Server (NTRS)

Prieto, Jr., Jaime L. (Inventor)

1999-01-01

A method and device for extrapolating past signal-history data for insertion into missing data segments in order to conceal digital speech frame errors. The extrapolation method uses past-signal history that is stored in a buffer. The method is implemented with a device that utilizes a finite-impulse response (FIR) multi-layer feed-forward artificial neural network that is trained by back-propagation for one-step extrapolation of speech compression algorithm (SCA) parameters. Once a speech connection has been established, the speech compression algorithm device begins sending encoded speech frames. As the speech frames are received, they are decoded and converted back into speech signal voltages. During the normal decoding process, pre-processing of the required SCA parameters will occur and the results stored in the past-history buffer. If a speech frame is detected to be lost or in error, then extrapolation modules are executed and replacement SCA parameters are generated and sent as the parameters required by the SCA. In this way, the information transfer to the SCA is transparent, and the SCA processing continues as usual. The listener will not normally notice that a speech frame has been lost because of the smooth transition between the last-received, lost, and next-received speech frames.
Processing of Speech Signals for Physical and Sensory Disabilities

NASA Astrophysics Data System (ADS)

Levitt, Harry

1995-10-01

Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities.
Modeling Driving Performance Using In-Vehicle Speech Data From a Naturalistic Driving Study.

PubMed

Kuo, Jonny; Charlton, Judith L; Koppel, Sjaan; Rudin-Brown, Christina M; Cross, Suzanne

2016-09-01

We aimed to (a) describe the development and application of an automated approach for processing in-vehicle speech data from a naturalistic driving study (NDS), (b) examine the influence of child passenger presence on driving performance, and (c) model this relationship using in-vehicle speech data. Parent drivers frequently engage in child-related secondary behaviors, but the impact on driving performance is unknown. Applying automated speech-processing techniques to NDS audio data would facilitate the analysis of in-vehicle driver-child interactions and their influence on driving performance. Speech activity detection and speaker diarization algorithms were applied to audio data from a Melbourne-based NDS involving 42 families. Multilevel models were developed to evaluate the effect of speech activity and the presence of child passengers on driving performance. Speech activity was significantly associated with velocity and steering angle variability. Child passenger presence alone was not associated with changes in driving performance. However, speech activity in the presence of two child passengers was associated with the most variability in driving performance. The effects of in-vehicle speech on driving performance in the presence of child passengers appear to be heterogeneous, and multiple factors may need to be considered in evaluating their impact. This goal can potentially be achieved within large-scale NDS through the automated processing of observational data, including speech. Speech-processing algorithms enable new perspectives on driving performance to be gained from existing NDS data, and variables that were once labor-intensive to process can be readily utilized in future research. © 2016, Human Factors and Ergonomics Society.
Visemic Processing in Audiovisual Discrimination of Natural Speech: A Simultaneous fMRI-EEG Study

ERIC Educational Resources Information Center

Dubois, Cyril; Otzenberger, Helene; Gounot, Daniel; Sock, Rudolph; Metz-Lutz, Marie-Noelle

2012-01-01

In a noisy environment, visual perception of articulatory movements improves natural speech intelligibility. Parallel to phonemic processing based on auditory signal, visemic processing constitutes a counterpart based on "visemes", the distinctive visual units of speech. Aiming at investigating the neural substrates of visemic processing in a…
The influence of (central) auditory processing disorder on the severity of speech-sound disorders in children.

PubMed

Vilela, Nadia; Barrozo, Tatiane Faria; Pagan-Neves, Luciana de Oliveira; Sanches, Seisse Gabriela Gandolfi; Wertzner, Haydée Fiszbein; Carvallo, Renata Mota Mamede

2016-02-01

To identify a cutoff value based on the Percentage of Consonants Correct-Revised index that could indicate the likelihood of a child with a speech-sound disorder also having a (central) auditory processing disorder . Language, audiological and (central) auditory processing evaluations were administered. The participants were 27 subjects with speech-sound disorders aged 7 to 10 years and 11 months who were divided into two different groups according to their (central) auditory processing evaluation results. When a (central) auditory processing disorder was present in association with a speech disorder, the children tended to have lower scores on phonological assessments. A greater severity of speech disorder was related to a greater probability of the child having a (central) auditory processing disorder. The use of a cutoff value for the Percentage of Consonants Correct-Revised index successfully distinguished between children with and without a (central) auditory processing disorder. The severity of speech-sound disorder in children was influenced by the presence of (central) auditory processing disorder. The attempt to identify a cutoff value based on a severity index was successful.
Issues in Perceptual Speech Analysis in Cleft Palate and Related Disorders: A Review

ERIC Educational Resources Information Center

Sell, Debbie

2005-01-01

Perceptual speech assessment is central to the evaluation of speech outcomes associated with cleft palate and velopharyngeal dysfunction. However, the complexity of this process is perhaps sometimes underestimated. To draw together the many different strands in the complex process of perceptual speech assessment and analysis, and make…
The right hemisphere is highlighted in connected natural speech production and perception.

PubMed

Alexandrou, Anna Maria; Saarinen, Timo; Mäkelä, Sasu; Kujala, Jan; Salmelin, Riitta

2017-05-15

Current understanding of the cortical mechanisms of speech perception and production stems mostly from studies that focus on single words or sentences. However, it has been suggested that processing of real-life connected speech may rely on additional cortical mechanisms. In the present study, we examined the neural substrates of natural speech production and perception with magnetoencephalography by modulating three central features related to speech: amount of linguistic content, speaking rate and social relevance. The amount of linguistic content was modulated by contrasting natural speech production and perception to speech-like non-linguistic tasks. Meaningful speech was produced and perceived at three speaking rates: normal, slow and fast. Social relevance was probed by having participants attend to speech produced by themselves and an unknown person. These speech-related features were each associated with distinct spatiospectral modulation patterns that involved cortical regions in both hemispheres. Natural speech processing markedly engaged the right hemisphere in addition to the left. In particular, the right temporo-parietal junction, previously linked to attentional processes and social cognition, was highlighted in the task modulations. The present findings suggest that its functional role extends to active generation and perception of meaningful, socially relevant speech. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Acoustic Processing of Temporally Modulated Sounds in Infants: Evidence from a Combined Near-Infrared Spectroscopy and EEG Study

PubMed Central

Telkemeyer, Silke; Rossi, Sonja; Nierhaus, Till; Steinbrink, Jens; Obrig, Hellmuth; Wartenburger, Isabell

2010-01-01

Speech perception requires rapid extraction of the linguistic content from the acoustic signal. The ability to efficiently process rapid changes in auditory information is important for decoding speech and thereby crucial during language acquisition. Investigating functional networks of speech perception in infancy might elucidate neuronal ensembles supporting perceptual abilities that gate language acquisition. Interhemispheric specializations for language have been demonstrated in infants. How these asymmetries are shaped by basic temporal acoustic properties is under debate. We recently provided evidence that newborns process non-linguistic sounds sharing temporal features with language in a differential and lateralized fashion. The present study used the same material while measuring brain responses of 6 and 3 month old infants using simultaneous recordings of electroencephalography (EEG) and near-infrared spectroscopy (NIRS). NIRS reveals that the lateralization observed in newborns remains constant over the first months of life. While fast acoustic modulations elicit bilateral neuronal activations, slow modulations lead to right-lateralized responses. Additionally, auditory-evoked potentials and oscillatory EEG responses show differential responses for fast and slow modulations indicating a sensitivity for temporal acoustic variations. Oscillatory responses reveal an effect of development, that is, 6 but not 3 month old infants show stronger theta-band desynchronization for slowly modulated sounds. Whether this developmental effect is due to increasing fine-grained perception for spectrotemporal sounds in general remains speculative. Our findings support the notion that a more general specialization for acoustic properties can be considered the basis for lateralization of speech perception. The results show that concurrent assessment of vascular based imaging and electrophysiological responses have great potential in the research on language acquisition. PMID:21716574
Review of Visual Speech Perception by Hearing and Hearing-Impaired People: Clinical Implications

ERIC Educational Resources Information Center

Woodhouse, Lynn; Hickson, Louise; Dodd, Barbara

2009-01-01

Background: Speech perception is often considered specific to the auditory modality, despite convincing evidence that speech processing is bimodal. The theoretical and clinical roles of speech-reading for speech perception, however, have received little attention in speech-language therapy. Aims: The role of speech-read information for speech…
Model-Based Speech Signal Coding Using Optimized Temporal Decomposition for Storage and Broadcasting Applications

NASA Astrophysics Data System (ADS)

Athaudage, Chandranath R. N.; Bradley, Alan B.; Lech, Margaret

2003-12-01

A dynamic programming-based optimization strategy for a temporal decomposition (TD) model of speech and its application to low-rate speech coding in storage and broadcasting is presented. In previous work with the spectral stability-based event localizing (SBEL) TD algorithm, the event localization was performed based on a spectral stability criterion. Although this approach gave reasonably good results, there was no assurance on the optimality of the event locations. In the present work, we have optimized the event localizing task using a dynamic programming-based optimization strategy. Simulation results show that an improved TD model accuracy can be achieved. A methodology of incorporating the optimized TD algorithm within the standard MELP speech coder for the efficient compression of speech spectral information is also presented. The performance evaluation results revealed that the proposed speech coding scheme achieves 50%-60% compression of speech spectral information with negligible degradation in the decoded speech quality.
Method and apparatus for obtaining complete speech signals for speech recognition applications

NASA Technical Reports Server (NTRS)

Abrash, Victor (Inventor); Cesari, Federico (Inventor); Franco, Horacio (Inventor); George, Christopher (Inventor); Zheng, Jing (Inventor)

2009-01-01

The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
Varying acoustic-phonemic ambiguity reveals that talker normalization is obligatory in speech processing.

PubMed

Choi, Ja Young; Hu, Elly R; Perrachione, Tyler K

2018-04-01

The nondeterministic relationship between speech acoustics and abstract phonemic representations imposes a challenge for listeners to maintain perceptual constancy despite the highly variable acoustic realization of speech. Talker normalization facilitates speech processing by reducing the degrees of freedom for mapping between encountered speech and phonemic representations. While this process has been proposed to facilitate the perception of ambiguous speech sounds, it is currently unknown whether talker normalization is affected by the degree of potential ambiguity in acoustic-phonemic mapping. We explored the effects of talker normalization on speech processing in a series of speeded classification paradigms, parametrically manipulating the potential for inconsistent acoustic-phonemic relationships across talkers for both consonants and vowels. Listeners identified words with varying potential acoustic-phonemic ambiguity across talkers (e.g., beet/boat vs. boot/boat) spoken by single or mixed talkers. Auditory categorization of words was always slower when listening to mixed talkers compared to a single talker, even when there was no potential acoustic ambiguity between target sounds. Moreover, the processing cost imposed by mixed talkers was greatest when words had the most potential acoustic-phonemic overlap across talkers. Models of acoustic dissimilarity between target speech sounds did not account for the pattern of results. These results suggest (a) that talker normalization incurs the greatest processing cost when disambiguating highly confusable sounds and (b) that talker normalization appears to be an obligatory component of speech perception, taking place even when the acoustic-phonemic relationships across sounds are unambiguous.

EEG oscillations entrain their phase to high-level features of speech sound.

PubMed

Zoefel, Benedikt; VanRullen, Rufin

2016-01-01

Phase entrainment of neural oscillations, the brain's adjustment to rhythmic stimulation, is a central component in recent theories of speech comprehension: the alignment between brain oscillations and speech sound improves speech intelligibility. However, phase entrainment to everyday speech sound could also be explained by oscillations passively following the low-level periodicities (e.g., in sound amplitude and spectral content) of auditory stimulation-and not by an adjustment to the speech rhythm per se. Recently, using novel speech/noise mixture stimuli, we have shown that behavioral performance can entrain to speech sound even when high-level features (including phonetic information) are not accompanied by fluctuations in sound amplitude and spectral content. In the present study, we report that neural phase entrainment might underlie our behavioral findings. We observed phase-locking between electroencephalogram (EEG) and speech sound in response not only to original (unprocessed) speech but also to our constructed "high-level" speech/noise mixture stimuli. Phase entrainment to original speech and speech/noise sound did not differ in the degree of entrainment, but rather in the actual phase difference between EEG signal and sound. Phase entrainment was not abolished when speech/noise stimuli were presented in reverse (which disrupts semantic processing), indicating that acoustic (rather than linguistic) high-level features play a major role in the observed neural entrainment. Our results provide further evidence for phase entrainment as a potential mechanism underlying speech processing and segmentation, and for the involvement of high-level processes in the adjustment to the rhythm of speech. Copyright © 2015 Elsevier Inc. All rights reserved.
Scaling and universality in the human voice.

PubMed

Luque, Jordi; Luque, Bartolo; Lacasa, Lucas

2015-04-06

Speech is a distinctive complex feature of human capabilities. In order to understand the physics underlying speech production, in this work, we empirically analyse the statistics of large human speech datasets ranging several languages. We first show that during speech, the energy is unevenly released and power-law distributed, reporting a universal robust Gutenberg-Richter-like law in speech. We further show that such 'earthquakes in speech' show temporal correlations, as the interevent statistics are again power-law distributed. As this feature takes place in the intraphoneme range, we conjecture that the process responsible for this complex phenomenon is not cognitive, but it resides in the physiological (mechanical) mechanisms of speech production. Moreover, we show that these waiting time distributions are scale invariant under a renormalization group transformation, suggesting that the process of speech generation is indeed operating close to a critical point. These results are put in contrast with current paradigms in speech processing, which point towards low dimensional deterministic chaos as the origin of nonlinear traits in speech fluctuations. As these latter fluctuations are indeed the aspects that humanize synthetic speech, these findings may have an impact in future speech synthesis technologies. Results are robust and independent of the communication language or the number of speakers, pointing towards a universal pattern and yet another hint of complexity in human speech. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
Language Comprehension in Language-Learning Impaired Children Improved with Acoustically Modified Speech

NASA Astrophysics Data System (ADS)

Tallal, Paula; Miller, Steve L.; Bedi, Gail; Byma, Gary; Wang, Xiaoqin; Nagarajan, Srikantan S.; Schreiner, Christoph; Jenkins, William M.; Merzenich, Michael M.

1996-01-01

A speech processing algorithm was developed to create more salient versions of the rapidly changing elements in the acoustic waveform of speech that have been shown to be deficiently processed by language-learning impaired (LLI) children. LLI children received extensive daily training, over a 4-week period, with listening exercises in which all speech was translated into this synthetic form. They also received daily training with computer "games" designed to adaptively drive improvements in temporal processing thresholds. Significant improvements in speech discrimination and language comprehension abilities were demonstrated in two independent groups of LLI children.
A PILOT STUDY COMPARING THE BLOCK SYSTEM AND THE INTERMITTENT SYSTEM OF SCHEDULING SPEECH CORRECTION CASES IN THE PUBLIC SCHOOLS.

ERIC Educational Resources Information Center

WEAVER, JOHN B.; WOLLERSHEIM, JANET P.

TO DETERMINE THE MOST EFFICIENT USES OF THE PUBLIC SCHOOL SPEECH CORRECTIONIST'S SKILLS AND TIME, A STUDY WAS UNDERTAKEN TO INVESTIGATE THE EFFECTIVENESS OF THE INTERMITTENT SYSTEM AND THE BLOCK SYSTEM OF SCHEDULING SPEECH CASES. WITH THE INTERMITTENT SYSTEM THE CORRECTIONIST IS ASSIGNED TO A NUMBER OF SCHOOLS AND GENERALLY SEES CHILDREN TWICE A…
Don’t speak too fast! Processing of fast rate speech in children with specific language impairment

PubMed Central

Bedoin, Nathalie; Krifi-Papoz, Sonia; Herbillon, Vania; Caillot-Bascoul, Aurélia; Gonzalez-Monge, Sibylle; Boulenger, Véronique

2018-01-01

Background Perception of speech rhythm requires the auditory system to track temporal envelope fluctuations, which carry syllabic and stress information. Reduced sensitivity to rhythmic acoustic cues has been evidenced in children with Specific Language Impairment (SLI), impeding syllabic parsing and speech decoding. Our study investigated whether these children experience specific difficulties processing fast rate speech as compared with typically developing (TD) children. Method Sixteen French children with SLI (8–13 years old) with mainly expressive phonological disorders and with preserved comprehension and 16 age-matched TD children performed a judgment task on sentences produced 1) at normal rate, 2) at fast rate or 3) time-compressed. Sensitivity index (d′) to semantically incongruent sentence-final words was measured. Results Overall children with SLI perform significantly worse than TD children. Importantly, as revealed by the significant Group × Speech Rate interaction, children with SLI find it more challenging than TD children to process both naturally or artificially accelerated speech. The two groups do not significantly differ in normal rate speech processing. Conclusion In agreement with rhythm-processing deficits in atypical language development, our results suggest that children with SLI face difficulties adjusting to rapid speech rate. These findings are interpreted in light of temporal sampling and prosodic phrasing frameworks and of oscillatory mechanisms underlying speech perception. PMID:29373610
Mental imagery and post-event processing in anticipation of a speech performance among socially anxious individuals.

PubMed

Brozovich, Faith A; Heimberg, Richard G

2013-12-01

The present study investigated whether post-event processing (PEP) involving mental imagery about a past speech is particularly detrimental for socially anxious individuals who are currently anticipating giving a speech. One hundred fourteen high and low socially anxious participants were told they would give a 5 min impromptu speech at the end of the experimental session. They were randomly assigned to one of three manipulation conditions: post-event processing about a past speech incorporating imagery (PEP-Imagery), semantic post-event processing about a past speech (PEP-Semantic), or a control condition, (n=19 per experimental group, per condition [high vs low socially anxious]). After the condition inductions, individuals' anxiety, their predictions of performance in the anticipated speech, and their interpretations of other ambiguous social events were measured. Consistent with predictions, high socially anxious individuals in the PEP-Imagery condition displayed greater anxiety than individuals in the other conditions immediately following the induction and before the anticipated speech task. They also interpreted ambiguous social scenarios in a more socially anxious manner than socially anxious individuals in the control condition. High socially anxious individuals made more negative predictions about their upcoming speech performance than low anxious participants in all conditions. The impact of imagery during post-event processing in social anxiety and its implications are discussed. © 2013.
Central Presbycusis: A Review and Evaluation of the Evidence

PubMed Central

Humes, Larry E.; Dubno, Judy R.; Gordon-Salant, Sandra; Lister, Jennifer J.; Cacace, Anthony T.; Cruickshanks, Karen J.; Gates, George A.; Wilson, Richard H.; Wingfield, Arthur

2018-01-01

Background The authors reviewed the evidence regarding the existence of age-related declines in central auditory processes and the consequences of any such declines for everyday communication. Purpose This report summarizes the review process and presents its findings. Data Collection and Analysis The authors reviewed 165 articles germane to central presbycusis. Of the 165 articles, 132 articles with a focus on human behavioral measures for either speech or nonspeech stimuli were selected for further analysis. Results For 76 smaller-scale studies of speech understanding in older adults reviewed, the following findings emerged: (1) the three most commonly studied behavioral measures were speech in competition, temporally distorted speech, and binaural speech perception (especially dichotic listening); (2) for speech in competition and temporally degraded speech, hearing loss proved to have a significant negative effect on performance in most of the laboratory studies; (3) significant negative effects of age, unconfounded by hearing loss, were observed in most of the studies of speech in competing speech, time-compressed speech, and binaural speech perception; and (4) the influence of cognitive processing on speech understanding has been examined much less frequently, but when included, significant positive associations with speech understanding were observed. For 36 smaller-scale studies of the perception of nonspeech stimuli by older adults reviewed, the following findings emerged: (1) the three most frequently studied behavioral measures were gap detection, temporal discrimination, and temporal-order discrimination or identification; (2) hearing loss was seldom a significant factor; and (3) negative effects of age were almost always observed. For 18 studies reviewed that made use of test batteries and medium-to-large sample sizes, the following findings emerged: (1) all studies included speech-based measures of auditory processing; (2) 4 of the 18 studies included nonspeech stimuli; (3) for the speech-based measures, monaural speech in a competing-speech background, dichotic speech, and monaural time-compressed speech were investigated most frequently; (4) the most frequently used tests were the Synthetic Sentence Identification (SSI) test with Ipsilateral Competing Message (ICM), the Dichotic Sentence Identification (DSI) test, and time-compressed speech; (5) many of these studies using speech-based measures reported significant effects of age, but most of these studies were confounded by declines in hearing, cognition, or both; (6) for nonspeech auditory-processing measures, the focus was on measures of temporal processing in all four studies; (7) effects of cognition on nonspeech measures of auditory processing have been studied less frequently, with mixed results, whereas the effects of hearing loss on performance were minimal due to judicious selection of stimuli; and (8) there is a paucity of observational studies using test batteries and longitudinal designs. Conclusions Based on this review of the scientific literature, there is insufficient evidence to confirm the existence of central presbycusis as an isolated entity. On the other hand, recent evidence has been accumulating in support of the existence of central presbycusis as a multifactorial condition that involves age- and/or disease-related changes in the auditory system and in the brain. Moreover, there is a clear need for additional research in this area. PMID:22967738
Automatic Speech Recognition from Neural Signals: A Focused Review.

PubMed

Herff, Christian; Schultz, Tanja

2016-01-01

Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to either loud environments, bothering bystanders or incapabilities to produce speech (i.e., patients suffering from locked-in syndrome). For these reasons it would be highly desirable to not speak but to simply envision oneself to say words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefor better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques used from neural signals, we discuss the Brain-to-text system.
The role of auditory and cognitive factors in understanding speech in noise by normal-hearing older listeners

PubMed Central

Schoof, Tim; Rosen, Stuart

2014-01-01

Normal-hearing older adults often experience increased difficulties understanding speech in noise. In addition, they benefit less from amplitude fluctuations in the masker. These difficulties may be attributed to an age-related auditory temporal processing deficit. However, a decline in cognitive processing likely also plays an important role. This study examined the relative contribution of declines in both auditory and cognitive processing to the speech in noise performance in older adults. Participants included older (60–72 years) and younger (19–29 years) adults with normal hearing. Speech reception thresholds (SRTs) were measured for sentences in steady-state speech-shaped noise (SS), 10-Hz sinusoidally amplitude-modulated speech-shaped noise (AM), and two-talker babble. In addition, auditory temporal processing abilities were assessed by measuring thresholds for gap, amplitude-modulation, and frequency-modulation detection. Measures of processing speed, attention, working memory, Text Reception Threshold (a visual analog of the SRT), and reading ability were also obtained. Of primary interest was the extent to which the various measures correlate with listeners' abilities to perceive speech in noise. SRTs were significantly worse for older adults in the presence of two-talker babble but not SS and AM noise. In addition, older adults showed some cognitive processing declines (working memory and processing speed) although no declines in auditory temporal processing. However, working memory and processing speed did not correlate significantly with SRTs in babble. Despite declines in cognitive processing, normal-hearing older adults do not necessarily have problems understanding speech in noise as SRTs in SS and AM noise did not differ significantly between the two groups. Moreover, while older adults had higher SRTs in two-talker babble, this could not be explained by age-related cognitive declines in working memory or processing speed. PMID:25429266
Near-Term Fetuses Process Temporal Features of Speech

ERIC Educational Resources Information Center

Granier-Deferre, Carolyn; Ribeiro, Aurelie; Jacquet, Anne-Yvonne; Bassereau, Sophie

2011-01-01

The perception of speech and music requires processing of variations in spectra and amplitude over different time intervals. Near-term fetuses can discriminate acoustic features, such as frequencies and spectra, but whether they can process complex auditory streams, such as speech sequences and more specifically their temporal variations, fast or…
The Downside of Greater Lexical Influences: Selectively Poorer Speech Perception in Noise

ERIC Educational Resources Information Center

Lam, Boji P. W.; Xie, Zilong; Tessmer, Rachel; Chandrasekaran, Bharath

2017-01-01

Purpose: Although lexical information influences phoneme perception, the extent to which reliance on lexical information enhances speech processing in challenging listening environments is unclear. We examined the extent to which individual differences in lexical influences on phonemic processing impact speech processing in maskers containing…
Sleep Disrupts High-Level Speech Parsing Despite Significant Basic Auditory Processing.

PubMed

Makov, Shiri; Sharon, Omer; Ding, Nai; Ben-Shachar, Michal; Nir, Yuval; Zion Golumbic, Elana

2017-08-09

The extent to which the sleeping brain processes sensory information remains unclear. This is particularly true for continuous and complex stimuli such as speech, in which information is organized into hierarchically embedded structures. Recently, novel metrics for assessing the neural representation of continuous speech have been developed using noninvasive brain recordings that have thus far only been tested during wakefulness. Here we investigated, for the first time, the sleeping brain's capacity to process continuous speech at different hierarchical levels using a newly developed Concurrent Hierarchical Tracking (CHT) approach that allows monitoring the neural representation and processing-depth of continuous speech online. Speech sequences were compiled with syllables, words, phrases, and sentences occurring at fixed time intervals such that different linguistic levels correspond to distinct frequencies. This enabled us to distinguish their neural signatures in brain activity. We compared the neural tracking of intelligible versus unintelligible (scrambled and foreign) speech across states of wakefulness and sleep using high-density EEG in humans. We found that neural tracking of stimulus acoustics was comparable across wakefulness and sleep and similar across all conditions regardless of speech intelligibility. In contrast, neural tracking of higher-order linguistic constructs (words, phrases, and sentences) was only observed for intelligible speech during wakefulness and could not be detected at all during nonrapid eye movement or rapid eye movement sleep. These results suggest that, whereas low-level auditory processing is relatively preserved during sleep, higher-level hierarchical linguistic parsing is severely disrupted, thereby revealing the capacity and limits of language processing during sleep. SIGNIFICANCE STATEMENT Despite the persistence of some sensory processing during sleep, it is unclear whether high-level cognitive processes such as speech parsing are also preserved. We used a novel approach for studying the depth of speech processing across wakefulness and sleep while tracking neuronal activity with EEG. We found that responses to the auditory sound stream remained intact; however, the sleeping brain did not show signs of hierarchical parsing of the continuous stream of syllables into words, phrases, and sentences. The results suggest that sleep imposes a functional barrier between basic sensory processing and high-level cognitive processing. This paradigm also holds promise for studying residual cognitive abilities in a wide array of unresponsive states. Copyright © 2017 the authors 0270-6474/17/377772-10$15.00/0.
Visual Speech Fills in Both Discrimination and Identification of Non-Intact Auditory Speech in Children

ERIC Educational Resources Information Center

Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Herve

2018-01-01

To communicate, children must discriminate and identify speech sounds. Because visual speech plays an important role in this process, we explored how visual speech influences phoneme discrimination and identification by children. Critical items had intact visual speech (e.g. baez) coupled to non-intact (excised onsets) auditory speech (signified…
Temporal processing of speech in a time-feature space

NASA Astrophysics Data System (ADS)

Avendano, Carlos

1997-09-01

The performance of speech communication systems often degrades under realistic environmental conditions. Adverse environmental factors include additive noise sources, room reverberation, and transmission channel distortions. This work studies the processing of speech in the temporal-feature or modulation spectrum domain, aiming for alleviation of the effects of such disturbances. Speech reflects the geometry of the vocal organs, and the linguistically dominant component is in the shape of the vocal tract. At any given point in time, the shape of the vocal tract is reflected in the short-time spectral envelope of the speech signal. The rate of change of the vocal tract shape appears to be important for the identification of linguistic components. This rate of change, or the rate of change of the short-time spectral envelope can be described by the modulation spectrum, i.e. the spectrum of the time trajectories described by the short-time spectral envelope. For a wide range of frequency bands, the modulation spectrum of speech exhibits a maximum at about 4 Hz, the average syllabic rate. Disturbances often have modulation frequency components outside the speech range, and could in principle be attenuated without significantly affecting the range with relevant linguistic information. Early efforts for exploiting the modulation spectrum domain (temporal processing), such as the dynamic cepstrum or the RASTA processing, used ad hoc designed processing and appear to be suboptimal. As a major contribution, in this dissertation we aim for a systematic data-driven design of temporal processing. First we analytically derive and discuss some properties and merits of temporal processing for speech signals. We attempt to formalize the concept and provide a theoretical background which has been lacking in the field. In the experimental part we apply temporal processing to a number of problems including adaptive noise reduction in cellular telephone environments, reduction of reverberation for speech enhancement, and improvements on automatic recognition of speech degraded by linear distortions and reverberation.
A multimodal spectral approach to characterize rhythm in natural speech.

PubMed

Alexandrou, Anna Maria; Saarinen, Timo; Kujala, Jan; Salmelin, Riitta

2016-01-01

Human utterances demonstrate temporal patterning, also referred to as rhythm. While simple oromotor behaviors (e.g., chewing) feature a salient periodical structure, conversational speech displays a time-varying quasi-rhythmic pattern. Quantification of periodicity in speech is challenging. Unimodal spectral approaches have highlighted rhythmic aspects of speech. However, speech is a complex multimodal phenomenon that arises from the interplay of articulatory, respiratory, and vocal systems. The present study addressed the question of whether a multimodal spectral approach, in the form of coherence analysis between electromyographic (EMG) and acoustic signals, would allow one to characterize rhythm in natural speech more efficiently than a unimodal analysis. The main experimental task consisted of speech production at three speaking rates; a simple oromotor task served as control. The EMG-acoustic coherence emerged as a sensitive means of tracking speech rhythm, whereas spectral analysis of either EMG or acoustic amplitude envelope alone was less informative. Coherence metrics seem to distinguish and highlight rhythmic structure in natural speech.
A hybrid technique for speech segregation and classification using a sophisticated deep neural network

PubMed Central

Nawaz, Tabassam; Mehmood, Zahid; Rashid, Muhammad; Habib, Hafiz Adnan

2018-01-01

Recent research on speech segregation and music fingerprinting has led to improvements in speech segregation and music identification algorithms. Speech and music segregation generally involves the identification of music followed by speech segregation. However, music segregation becomes a challenging task in the presence of noise. This paper proposes a novel method of speech segregation for unlabelled stationary noisy audio signals using the deep belief network (DBN) model. The proposed method successfully segregates a music signal from noisy audio streams. A recurrent neural network (RNN)-based hidden layer segregation model is applied to remove stationary noise. Dictionary-based fisher algorithms are employed for speech classification. The proposed method is tested on three datasets (TIMIT, MIR-1K, and MusicBrainz), and the results indicate the robustness of proposed method for speech segregation. The qualitative and quantitative analysis carried out on three datasets demonstrate the efficiency of the proposed method compared to the state-of-the-art speech segregation and classification-based methods. PMID:29558485
Multi-sensory learning and learning to read.

PubMed

Blomert, Leo; Froyen, Dries

2010-09-01

The basis of literacy acquisition in alphabetic orthographies is the learning of the associations between the letters and the corresponding speech sounds. In spite of this primacy in learning to read, there is only scarce knowledge on how this audiovisual integration process works and which mechanisms are involved. Recent electrophysiological studies of letter-speech sound processing have revealed that normally developing readers take years to automate these associations and dyslexic readers hardly exhibit automation of these associations. It is argued that the reason for this effortful learning may reside in the nature of the audiovisual process that is recruited for the integration of in principle arbitrarily linked elements. It is shown that letter-speech sound integration does not resemble the processes involved in the integration of natural audiovisual objects such as audiovisual speech. The automatic symmetrical recruitment of the assumedly uni-sensory visual and auditory cortices in audiovisual speech integration does not occur for letter and speech sound integration. It is also argued that letter-speech sound integration only partly resembles the integration of arbitrarily linked unfamiliar audiovisual objects. Letter-sound integration and artificial audiovisual objects share the necessity of a narrow time window for integration to occur. However, they differ from these artificial objects, because they constitute an integration of partly familiar elements which acquire meaning through the learning of an orthography. Although letter-speech sound pairs share similarities with audiovisual speech processing as well as with unfamiliar, arbitrary objects, it seems that letter-speech sound pairs develop into unique audiovisual objects that furthermore have to be processed in a unique way in order to enable fluent reading and thus very likely recruit other neurobiological learning mechanisms than the ones involved in learning natural or arbitrary unfamiliar audiovisual associations. Copyright 2010 Elsevier B.V. All rights reserved.
Temporal factors affecting somatosensory–auditory interactions in speech processing

PubMed Central

Ito, Takayuki; Gracco, Vincent L.; Ostry, David J.

2014-01-01

Speech perception is known to rely on both auditory and visual information. However, sound-specific somatosensory input has been shown also to influence speech perceptual processing (Ito et al., 2009). In the present study, we addressed further the relationship between somatosensory information and speech perceptual processing by addressing the hypothesis that the temporal relationship between orofacial movement and sound processing contributes to somatosensory–auditory interaction in speech perception. We examined the changes in event-related potentials (ERPs) in response to multisensory synchronous (simultaneous) and asynchronous (90 ms lag and lead) somatosensory and auditory stimulation compared to individual unisensory auditory and somatosensory stimulation alone. We used a robotic device to apply facial skin somatosensory deformations that were similar in timing and duration to those experienced in speech production. Following synchronous multisensory stimulation the amplitude of the ERP was reliably different from the two unisensory potentials. More importantly, the magnitude of the ERP difference varied as a function of the relative timing of the somatosensory–auditory stimulation. Event-related activity change due to stimulus timing was seen between 160 and 220 ms following somatosensory onset, mostly around the parietal area. The results demonstrate a dynamic modulation of somatosensory–auditory convergence and suggest the contribution of somatosensory information for speech processing process is dependent on the specific temporal order of sensory inputs in speech production. PMID:25452733
Neural Tuning to Low-Level Features of Speech throughout the Perisylvian Cortex.

PubMed

Berezutskaya, Julia; Freudenburg, Zachary V; Güçlü, Umut; van Gerven, Marcel A J; Ramsey, Nick F

2017-08-16

Despite a large body of research, we continue to lack a detailed account of how auditory processing of continuous speech unfolds in the human brain. Previous research showed the propagation of low-level acoustic features of speech from posterior superior temporal gyrus toward anterior superior temporal gyrus in the human brain (Hullett et al., 2016). In this study, we investigate what happens to these neural representations past the superior temporal gyrus and how they engage higher-level language processing areas such as inferior frontal gyrus. We used low-level sound features to model neural responses to speech outside of the primary auditory cortex. Two complementary imaging techniques were used with human participants (both males and females): electrocorticography (ECoG) and fMRI. Both imaging techniques showed tuning of the perisylvian cortex to low-level speech features. With ECoG, we found evidence of propagation of the temporal features of speech sounds along the ventral pathway of language processing in the brain toward inferior frontal gyrus. Increasingly coarse temporal features of speech spreading from posterior superior temporal cortex toward inferior frontal gyrus were associated with linguistic features such as voice onset time, duration of the formant transitions, and phoneme, syllable, and word boundaries. The present findings provide the groundwork for a comprehensive bottom-up account of speech comprehension in the human brain. SIGNIFICANCE STATEMENT We know that, during natural speech comprehension, a broad network of perisylvian cortical regions is involved in sound and language processing. Here, we investigated the tuning to low-level sound features within these regions using neural responses to a short feature film. We also looked at whether the tuning organization along these brain regions showed any parallel to the hierarchy of language structures in continuous speech. Our results show that low-level speech features propagate throughout the perisylvian cortex and potentially contribute to the emergence of "coarse" speech representations in inferior frontal gyrus typically associated with high-level language processing. These findings add to the previous work on auditory processing and underline a distinctive role of inferior frontal gyrus in natural speech comprehension. Copyright © 2017 the authors 0270-6474/17/377906-15$15.00/0.
Auditory-Motor Interactions in Pediatric Motor Speech Disorders: Neurocomputational Modeling of Disordered Development

PubMed Central

Terband, H.; Maassen, B.; Guenther, F.H.; Brumberg, J.

2014-01-01

Background/Purpose Differentiating the symptom complex due to phonological-level disorders, speech delay and pediatric motor speech disorders is a controversial issue in the field of pediatric speech and language pathology. The present study investigated the developmental interaction between neurological deficits in auditory and motor processes using computational modeling with the DIVA model. Method In a series of computer simulations, we investigated the effect of a motor processing deficit alone (MPD), and the effect of a motor processing deficit in combination with an auditory processing deficit (MPD+APD) on the trajectory and endpoint of speech motor development in the DIVA model. Results Simulation results showed that a motor programming deficit predominantly leads to deterioration on the phonological level (phonemic mappings) when auditory self-monitoring is intact, and on the systemic level (systemic mapping) if auditory self-monitoring is impaired. Conclusions These findings suggest a close relation between quality of auditory self-monitoring and the involvement of phonological vs. motor processes in children with pediatric motor speech disorders. It is suggested that MPD+APD might be involved in typically apraxic speech output disorders and MPD in pediatric motor speech disorders that also have a phonological component. Possibilities to verify these hypotheses using empirical data collected from human subjects are discussed. PMID:24491630

Comparing auditory filter bandwidths, spectral ripple modulation detection, spectral ripple discrimination, and speech recognition: Normal and impaired hearinga)

PubMed Central

Davies-Venn, Evelyn; Nelson, Peggy; Souza, Pamela

2015-01-01

Some listeners with hearing loss show poor speech recognition scores in spite of using amplification that optimizes audibility. Beyond audibility, studies have suggested that suprathreshold abilities such as spectral and temporal processing may explain differences in amplified speech recognition scores. A variety of different methods has been used to measure spectral processing. However, the relationship between spectral processing and speech recognition is still inconclusive. This study evaluated the relationship between spectral processing and speech recognition in listeners with normal hearing and with hearing loss. Narrowband spectral resolution was assessed using auditory filter bandwidths estimated from simultaneous notched-noise masking. Broadband spectral processing was measured using the spectral ripple discrimination (SRD) task and the spectral ripple depth detection (SMD) task. Three different measures were used to assess unamplified and amplified speech recognition in quiet and noise. Stepwise multiple linear regression revealed that SMD at 2.0 cycles per octave (cpo) significantly predicted speech scores for amplified and unamplified speech in quiet and noise. Commonality analyses revealed that SMD at 2.0 cpo combined with SRD and equivalent rectangular bandwidth measures to explain most of the variance captured by the regression model. Results suggest that SMD and SRD may be promising clinical tools for diagnostic evaluation and predicting amplification outcomes. PMID:26233047
Comparing auditory filter bandwidths, spectral ripple modulation detection, spectral ripple discrimination, and speech recognition: Normal and impaired hearing.

PubMed

Davies-Venn, Evelyn; Nelson, Peggy; Souza, Pamela

2015-07-01

Some listeners with hearing loss show poor speech recognition scores in spite of using amplification that optimizes audibility. Beyond audibility, studies have suggested that suprathreshold abilities such as spectral and temporal processing may explain differences in amplified speech recognition scores. A variety of different methods has been used to measure spectral processing. However, the relationship between spectral processing and speech recognition is still inconclusive. This study evaluated the relationship between spectral processing and speech recognition in listeners with normal hearing and with hearing loss. Narrowband spectral resolution was assessed using auditory filter bandwidths estimated from simultaneous notched-noise masking. Broadband spectral processing was measured using the spectral ripple discrimination (SRD) task and the spectral ripple depth detection (SMD) task. Three different measures were used to assess unamplified and amplified speech recognition in quiet and noise. Stepwise multiple linear regression revealed that SMD at 2.0 cycles per octave (cpo) significantly predicted speech scores for amplified and unamplified speech in quiet and noise. Commonality analyses revealed that SMD at 2.0 cpo combined with SRD and equivalent rectangular bandwidth measures to explain most of the variance captured by the regression model. Results suggest that SMD and SRD may be promising clinical tools for diagnostic evaluation and predicting amplification outcomes.
The Role of Early Language Experience in the Development of Speech Perception and Phonological Processing Abilities: Evidence from 5-Year-Olds with Histories of Otitis Media with Effusion and Low Socioeconomic Status

ERIC Educational Resources Information Center

Nittrouer, Susan; Burton, Lisa Thuente

2005-01-01

This study tested the hypothesis that early language experience facilitates the development of language-specific perceptual weighting strategies believed to be critical for accessing phonetic structure. In turn, that structure allows for efficient storage and retrieval of words in verbal working memory, which is necessary for sentence…
Natural Communication with Computers. Volume 1. Speech Understanding Research at BBN

DTIC Science & Technology

1974-12-01

Signal Processing Concurrently with the incremental simulation experiments used to develop insights into the organization of the control component and...us to seek an alternative organization for our phonemic dictionary. There is also the potential problem of new words being used to name new places... organize the lexicon for maximization of efficient retrieval by taking advantage of phonetic, syntactic and semantic relationsnips. Work has already
[Telemonitoring of swallowing function: technologies in speech therapy practice.

PubMed

Tedesco, Angela; Lavermicocca, Valentina; Notarnicola, Marilina; De Francesco, Luca; Dellomonaco, Anna Rita

2018-02-01

The process of medical-healthcare technological revolution represents an advantage for the patient and for the care provider, in terms of costs and distances reduction. The telehomecare approach could be useful for monitoring the swallowing disorder in neurodegenerative diseases, preventing complications. In this study the applicability of telemedicine techniques for the monitoring of swallowing function, in patients affected by Huntington's disease (HD), was evaluated through the acquisition and analysis of the sound of swallowing. Two patients with HD were outpatient screened for dysphagia through the Bedside Swallowing Assessment Scale (BSAS) sensitized with pulse oximetry and cervical auscultation. Subsequently, the swallowing functionality was telemonitored for three months with Skype. The swallowing sounds were acquired with a detection microphone attached to the lateral edge of the trachea during fluid intake. The sounds were instantly processed and graphically represented through the Praat software. The analysis of the acoustic signal acquired remotely has made it possible to identify the situations that required immediate speech therapy intervention, suggesting to the patients further modifications of food consistencies, and saving frequent moving to the hospital even in the absence of critical situations. Remote assistance applied to speech therapy could represent a benefit for patients and their carers and a more efficient use of medical and health resources.
Auditory detection of non-speech and speech stimuli in noise: Effects of listeners' native language background.

PubMed

Liu, Chang; Jin, Su-Hyun

2015-11-01

This study investigated whether native listeners processed speech differently from non-native listeners in a speech detection task. Detection thresholds of Mandarin Chinese and Korean vowels and non-speech sounds in noise, frequency selectivity, and the nativeness of Mandarin Chinese and Korean vowels were measured for Mandarin Chinese- and Korean-native listeners. The two groups of listeners exhibited similar non-speech sound detection and frequency selectivity; however, the Korean listeners had better detection thresholds of Korean vowels than Chinese listeners, while the Chinese listeners performed no better at Chinese vowel detection than the Korean listeners. Moreover, thresholds predicted from an auditory model highly correlated with behavioral thresholds of the two groups of listeners, suggesting that detection of speech sounds not only depended on listeners' frequency selectivity, but also might be affected by their native language experience. Listeners evaluated their native vowels with higher nativeness scores than non-native listeners. Native listeners may have advantages over non-native listeners when processing speech sounds in noise, even without the required phonetic processing; however, such native speech advantages might be offset by Chinese listeners' lower sensitivity to vowel sounds, a characteristic possibly resulting from their sparse vowel system and their greater cognitive and attentional demands for vowel processing.
Why would Musical Training Benefit the Neural Encoding of Speech? The OPERA Hypothesis.

PubMed

Patel, Aniruddh D

2011-01-01

Mounting evidence suggests that musical training benefits the neural encoding of speech. This paper offers a hypothesis specifying why such benefits occur. The "OPERA" hypothesis proposes that such benefits are driven by adaptive plasticity in speech-processing networks, and that this plasticity occurs when five conditions are met. These are: (1) Overlap: there is anatomical overlap in the brain networks that process an acoustic feature used in both music and speech (e.g., waveform periodicity, amplitude envelope), (2) Precision: music places higher demands on these shared networks than does speech, in terms of the precision of processing, (3) Emotion: the musical activities that engage this network elicit strong positive emotion, (4) Repetition: the musical activities that engage this network are frequently repeated, and (5) Attention: the musical activities that engage this network are associated with focused attention. According to the OPERA hypothesis, when these conditions are met neural plasticity drives the networks in question to function with higher precision than needed for ordinary speech communication. Yet since speech shares these networks with music, speech processing benefits. The OPERA hypothesis is used to account for the observed superior subcortical encoding of speech in musically trained individuals, and to suggest mechanisms by which musical training might improve linguistic reading abilities.
Speech Recognition as a Transcription Aid: A Randomized Comparison With Standard Transcription

PubMed Central

Mohr, David N.; Turner, David W.; Pond, Gregory R.; Kamath, Joseph S.; De Vos, Cathy B.; Carpenter, Paul C.

2003-01-01

Objective. Speech recognition promises to reduce information entry costs for clinical information systems. It is most likely to be accepted across an organization if physicians can dictate without concerning themselves with real-time recognition and editing; assistants can then edit and process the computer-generated document. Our objective was to evaluate the use of speech-recognition technology in a randomized controlled trial using our institutional infrastructure. Design. Clinical note dictations from physicians in two specialty divisions were randomized to either a standard transcription process or a speech-recognition process. Secretaries and transcriptionists also were assigned randomly to each of these processes. Measurements. The duration of each dictation was measured. The amount of time spent processing a dictation to yield a finished document also was measured. Secretarial and transcriptionist productivity, defined as hours of secretary work per minute of dictation processed, was determined for speech recognition and standard transcription. Results. Secretaries in the endocrinology division were 87.3% (confidence interval, 83.3%, 92.3%) as productive with the speech-recognition technology as implemented in this study as they were using standard transcription. Psychiatry transcriptionists and secretaries were similarly less productive. Author, secretary, and type of clinical note were significant (p < 0.05) predictors of productivity. Conclusion. When implemented in an organization with an existing document-processing infrastructure (which included training and interfaces of the speech-recognition editor with the existing document entry application), speech recognition did not improve the productivity of secretaries or transcriptionists. PMID:12509359
Engaged listeners: shared neural processing of powerful political speeches.

PubMed

Schmälzle, Ralf; Häcker, Frank E K; Honey, Christopher J; Hasson, Uri

2015-08-01

Powerful speeches can captivate audiences, whereas weaker speeches fail to engage their listeners. What is happening in the brains of a captivated audience? Here, we assess audience-wide functional brain dynamics during listening to speeches of varying rhetorical quality. The speeches were given by German politicians and evaluated as rhetorically powerful or weak. Listening to each of the speeches induced similar neural response time courses, as measured by inter-subject correlation analysis, in widespread brain regions involved in spoken language processing. Crucially, alignment of the time course across listeners was stronger for rhetorically powerful speeches, especially for bilateral regions of the superior temporal gyri and medial prefrontal cortex. Thus, during powerful speeches, listeners as a group are more coupled to each other, suggesting that powerful speeches are more potent in taking control of the listeners' brain responses. Weaker speeches were processed more heterogeneously, although they still prompted substantially correlated responses. These patterns of coupled neural responses bear resemblance to metaphors of resonance, which are often invoked in discussions of speech impact, and contribute to the literature on auditory attention under natural circumstances. Overall, this approach opens up possibilities for research on the neural mechanisms mediating the reception of entertaining or persuasive messages. © The Author (2015). Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing.

PubMed

Jørgensen, Søren; Dau, Torsten

2011-09-01

A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a similar structure as the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNR(env), at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise. The model was further tested in conditions with noisy speech subjected to reverberation and spectral subtraction. Good agreement between predictions and data was found in all cases. For spectral subtraction, an analysis of the model's internal representation of the stimuli revealed that the predicted decrease of intelligibility was caused by the estimated noise envelope power exceeding that of the speech. The classical concept of the speech transmission index fails in this condition. The results strongly suggest that the signal-to-noise ratio at the output of a modulation frequency selective process provides a key measure of speech intelligibility. © 2011 Acoustical Society of America
Speech-Language Dissociations, Distractibility, and Childhood Stuttering

PubMed Central

Conture, Edward G.; Walden, Tedra A.; Lambert, Warren E.

2015-01-01

Purpose This study investigated the relation among speech-language dissociations, attentional distractibility, and childhood stuttering. Method Participants were 82 preschool-age children who stutter (CWS) and 120 who do not stutter (CWNS). Correlation-based statistics (Bates, Appelbaum, Salcedo, Saygin, & Pizzamiglio, 2003) identified dissociations across 5 norm-based speech-language subtests. The Behavioral Style Questionnaire Distractibility subscale measured attentional distractibility. Analyses addressed (a) between-groups differences in the number of children exhibiting speech-language dissociations; (b) between-groups distractibility differences; (c) the relation between distractibility and speech-language dissociations; and (d) whether interactions between distractibility and dissociations predicted the frequency of total, stuttered, and nonstuttered disfluencies. Results More preschool-age CWS exhibited speech-language dissociations compared with CWNS, and more boys exhibited dissociations compared with girls. In addition, male CWS were less distractible than female CWS and female CWNS. For CWS, but not CWNS, less distractibility (i.e., greater attention) was associated with more speech-language dissociations. Last, interactions between distractibility and dissociations did not predict speech disfluencies in CWS or CWNS. Conclusions The present findings suggest that for preschool-age CWS, attentional processes are associated with speech-language dissociations. Future investigations are warranted to better understand the directionality of effect of this association (e.g., inefficient attentional processes → speech-language dissociations vs. inefficient attentional processes ← speech-language dissociations). PMID:26126203
Mismatch Negativity with Visual-only and Audiovisual Speech

PubMed Central

Ponton, Curtis W.; Bernstein, Lynne E.; Auer, Edward T.

2009-01-01

The functional organization of cortical speech processing is thought to be hierarchical, increasing in complexity and proceeding from primary sensory areas centrifugally. The current study used the mismatch negativity (MMN) obtained with electrophysiology (EEG) to investigate the early latency period of visual speech processing under both visual-only (VO) and audiovisual (AV) conditions. Current density reconstruction (CDR) methods were used to model the cortical MMN generator locations. MMNs were obtained with VO and AV speech stimuli at early latencies (approximately 82-87 ms peak in time waveforms relative to the acoustic onset) and in regions of the right lateral temporal and parietal cortices. Latencies were consistent with bottom-up processing of the visible stimuli. We suggest that a visual pathway extracts phonetic cues from visible speech, and that previously reported effects of AV speech in classical early auditory areas, given later reported latencies, could be attributable to modulatory feedback from visual phonetic processing. PMID:19404730
How visual timing and form information affect speech and non-speech processing.

PubMed

Kim, Jeesun; Davis, Chris

2014-10-01

Auditory speech processing is facilitated when the talker's face/head movements are seen. This effect is typically explained in terms of visual speech providing form and/or timing information. We determined the effect of both types of information on a speech/non-speech task (non-speech stimuli were spectrally rotated speech). All stimuli were presented paired with the talker's static or moving face. Two types of moving face stimuli were used: full-face versions (both spoken form and timing information available) and modified face versions (only timing information provided by peri-oral motion available). The results showed that the peri-oral timing information facilitated response time for speech and non-speech stimuli compared to a static face. An additional facilitatory effect was found for full-face versions compared to the timing condition; this effect only occurred for speech stimuli. We propose the timing effect was due to cross-modal phase resetting; the form effect to cross-modal priming. Copyright © 2014 Elsevier Inc. All rights reserved.
Robust estimators for speech enhancement in real environments

NASA Astrophysics Data System (ADS)

Sandoval-Ibarra, Yuma; Diaz-Ramirez, Victor H.; Kober, Vitaly

2015-09-01

Common statistical estimators for speech enhancement rely on several assumptions about stationarity of speech signals and noise. These assumptions may not always valid in real-life due to nonstationary characteristics of speech and noise processes. We propose new estimators based on existing estimators by incorporation of computation of rank-order statistics. The proposed estimators are better adapted to non-stationary characteristics of speech signals and noise processes. Through computer simulations we show that the proposed estimators yield a better performance in terms of objective metrics than that of known estimators when speech signals are contaminated with airport, babble, restaurant, and train-station noise.
Differentiation of Speech Delay and Global Developmental Delay in Children Using DTI Tractography-Based Connectome.

PubMed

Jeong, J-W; Sundaram, S; Behen, M E; Chugani, H T

2016-06-01

Pure speech delay is a common developmental disorder which, according to some estimates, affects 5%-8% of the population. Speech delay may not only be an isolated condition but also can be part of a broader condition such as global developmental delay. The present study investigated whether diffusion tensor imaging tractography-based connectome can differentiate global developmental delay from speech delay in young children. Twelve children with pure speech delay (39.1 ± 20.9 months of age, 9 boys), 14 children with global developmental delay (39.3 ± 18.2 months of age, 12 boys), and 10 children with typical development (38.5 ± 20.5 months of age, 7 boys) underwent 3T DTI. For each subject, whole-brain connectome analysis was performed by using 116 cortical ROIs. The following network metrics were measured at individual regions: strength (number of the shortest paths), efficiency (measures of global and local integration), cluster coefficient (a measure of local aggregation), and betweeness (a measure of centrality). Compared with typical development, global and local efficiency were significantly reduced in both global developmental delay and speech delay (P < .0001). The nodal strength of the cognitive network is reduced in global developmental delay, whereas the nodal strength of the language network is reduced in speech delay. This finding resulted in a high accuracy of >83% ± 4% to discriminate global developmental delay from speech delay. The network abnormalities identified in the present study may underlie the neurocognitive and behavioral consequences commonly identified in children with global developmental delay and speech delay. Further validation studies in larger samples are required. © 2016 by American Journal of Neuroradiology.
Speeches Presented at the Annual National Leadership Development Seminar for State Directors of Vocational Education. (6th, Nashville, Tennessee, September 18-21, 1973).

ERIC Educational Resources Information Center

Ohio State Univ., Columbus. Center for Vocational and Technical Education.

Following a welcoming address by the Governor of Tennessee, fifteen speeches were presented at the seminar. W. H. Pierce examined approaches to educational governance at the State level. Management, its efficiency, effectiveness, and organization, was the topic of several speeches (J. D. Mills, D. K. Gentry and C. F. Lamar). The development of…
Status Report on Speech Research. A Report on the Status and Progress of Studies on the Nature of Speech, Instrumentation for Its Investigation, and Practical Applications.

DTIC Science & Technology

1985-10-01

speech errors. References Anderson, V. A. (1942). Training the speaking voice. New York: Oxford University Press. 50...is only about speech perception , in contrast to some t.at deal with other perceptual processes (e.g., Berkeley, 1709; Fest- inger, Burnham, Ono...there a process of learned equivalence. An example is the claim that the 66 * ° - . . Liberman & Mattingly: The Motor Theory of Speech Perception Revised
Distributed Processing and Cortical Specialization for Speech and Environmental Sounds in Human Temporal Cortex

ERIC Educational Resources Information Center

Leech, Robert; Saygin, Ayse Pinar

2011-01-01

Using functional MRI, we investigated whether auditory processing of both speech and meaningful non-linguistic environmental sounds in superior and middle temporal cortex relies on a complex and spatially distributed neural system. We found that evidence for spatially distributed processing of speech and environmental sounds in a substantial…
Protocols for second-generation business satellites systems

NASA Astrophysics Data System (ADS)

Evans, B. G.; Coakley, F. P.; El Amin, M. H. M.

The paper discusses the nature and mix of traffic in business satellite systems and describes the limitations on the protocol imposed by the differing impairments of speech, video, and data. A simple TDMA system protocol is presented which meets the requirements of mixed-service operation. The efficiency of the protocol together with implications for allocation, scheduling and synchronisation are discussed. Future-generation satellites will probably use on-board processing. Some initial work on protocols that make use of on-board processing and the implications for satellite and earth-station equipment are presented.
Children and adults integrate talker and verb information in online processing.

PubMed Central

Borovsky, Arielle; Creel, Sarah

2015-01-01

Children seem able to efficiently interpret a variety of linguistic cues during speech comprehension, yet have difficulty interpreting sources of non-linguistic and paralinguistic information that accompany speech. The current study asked whether (paralinguistic) voice-activated role knowledge is rapidly interpreted in coordination with a linguistic cue (a sentential action) during speech comprehension in an eye-tracked sentence comprehension task with children (aged 3-10) and college-aged adults. Participants were initially familiarized with two talkers who identified their respective roles (e.g. PRINCESS and PIRATE) before hearing a previously-introduced talker name an action and object (“I want to hold the sword,” in the pirate's voice). As the sentence was spoken, eye-movements were recorded to four objects that varied in relationship to the sentential talker and action (Target: SWORD, Talker-Related: SHIP, Action-Related: WAND, and Unrelated: CARRIAGE). The task was to select the named image. Even young child listeners rapidly combined inferences about talker identity with the action, allowing them to fixate on the Target before it was mentioned, although there were developmental and vocabulary differences on this task. Results suggest that children, like adults, store real-world knowledge of a talker's role and actively use this information to interpret speech. PMID:24611671

Children with dyslexia show a reduced processing benefit from bimodal speech information compared to their typically developing peers.

PubMed

Schaadt, Gesa; van der Meer, Elke; Pannekamp, Ann; Oberecker, Regine; Männel, Claudia

2018-01-17

During information processing, individuals benefit from bimodally presented input, as has been demonstrated for speech perception (i.e., printed letters and speech sounds) or the perception of emotional expressions (i.e., facial expression and voice tuning). While typically developing individuals show this bimodal benefit, school children with dyslexia do not. Currently, it is unknown whether the bimodal processing deficit in dyslexia also occurs for visual-auditory speech processing that is independent of reading and spelling acquisition (i.e., no letter-sound knowledge is required). Here, we tested school children with and without spelling problems on their bimodal perception of video-recorded mouth movements pronouncing syllables. We analyzed the event-related potential Mismatch Response (MMR) to visual-auditory speech information and compared this response to the MMR to monomodal speech information (i.e., auditory-only, visual-only). We found a reduced MMR with later onset to visual-auditory speech information in children with spelling problems compared to children without spelling problems. Moreover, when comparing bimodal and monomodal speech perception, we found that children without spelling problems showed significantly larger responses in the visual-auditory experiment compared to the visual-only response, whereas children with spelling problems did not. Our results suggest that children with dyslexia exhibit general difficulties in bimodal speech perception independently of letter-speech sound knowledge, as apparent in altered bimodal speech perception and lacking benefit from bimodal information. This general deficit in children with dyslexia may underlie the previously reported reduced bimodal benefit for letter-speech sound combinations and similar findings in emotion perception. Copyright © 2018 Elsevier Ltd. All rights reserved.
Electrophysiological Correlates of Semantic Dissimilarity Reflect the Comprehension of Natural, Narrative Speech.

PubMed

Broderick, Michael P; Anderson, Andrew J; Di Liberto, Giovanni M; Crosse, Michael J; Lalor, Edmund C

2018-03-05

People routinely hear and understand speech at rates of 120-200 words per minute [1, 2]. Thus, speech comprehension must involve rapid, online neural mechanisms that process words' meanings in an approximately time-locked fashion. However, electrophysiological evidence for such time-locked processing has been lacking for continuous speech. Although valuable insights into semantic processing have been provided by the "N400 component" of the event-related potential [3-6], this literature has been dominated by paradigms using incongruous words within specially constructed sentences, with less emphasis on natural, narrative speech comprehension. Building on the discovery that cortical activity "tracks" the dynamics of running speech [7-9] and psycholinguistic work demonstrating [10-12] and modeling [13-15] how context impacts on word processing, we describe a new approach for deriving an electrophysiological correlate of natural speech comprehension. We used a computational model [16] to quantify the meaning carried by words based on how semantically dissimilar they were to their preceding context and then regressed this measure against electroencephalographic (EEG) data recorded from subjects as they listened to narrative speech. This produced a prominent negativity at a time lag of 200-600 ms on centro-parietal EEG channels, characteristics common to the N400. Applying this approach to EEG datasets involving time-reversed speech, cocktail party attention, and audiovisual speech-in-noise demonstrated that this response was very sensitive to whether or not subjects understood the speech they heard. These findings demonstrate that, when successfully comprehending natural speech, the human brain responds to the contextual semantic content of each word in a relatively time-locked fashion. Copyright © 2018 Elsevier Ltd. All rights reserved.
The influence of visual and auditory information on the perception of speech and non-speech oral movements in patients with left hemisphere lesions.

PubMed

Schmid, Gabriele; Thielmann, Anke; Ziegler, Wolfram

2009-03-01

Patients with lesions of the left hemisphere often suffer from oral-facial apraxia, apraxia of speech, and aphasia. In these patients, visual features often play a critical role in speech and language therapy, when pictured lip shapes or the therapist's visible mouth movements are used to facilitate speech production and articulation. This demands audiovisual processing both in speech and language treatment and in the diagnosis of oral-facial apraxia. The purpose of this study was to investigate differences in audiovisual perception of speech as compared to non-speech oral gestures. Bimodal and unimodal speech and non-speech items were used and additionally discordant stimuli constructed, which were presented for imitation. This study examined a group of healthy volunteers and a group of patients with lesions of the left hemisphere. Patients made substantially more errors than controls, but the factors influencing imitation accuracy were more or less the same in both groups. Error analyses in both groups suggested different types of representations for speech as compared to the non-speech domain, with speech having a stronger weight on the auditory modality and non-speech processing on the visual modality. Additionally, this study was able to show that the McGurk effect is not limited to speech.
Behavioral and electrophysiological evidence for early and automatic detection of phonological equivalence in variable speech inputs.

PubMed

Kharlamov, Viktor; Campbell, Kenneth; Kazanina, Nina

2011-11-01

Speech sounds are not always perceived in accordance with their acoustic-phonetic content. For example, an early and automatic process of perceptual repair, which ensures conformity of speech inputs to the listener's native language phonology, applies to individual input segments that do not exist in the native inventory or to sound sequences that are illicit according to the native phonotactic restrictions on sound co-occurrences. The present study with Russian and Canadian English speakers shows that listeners may perceive phonetically distinct and licit sound sequences as equivalent when the native language system provides robust evidence for mapping multiple phonetic forms onto a single phonological representation. In Russian, due to an optional but productive t-deletion process that affects /stn/ clusters, the surface forms [sn] and [stn] may be phonologically equivalent and map to a single phonological form /stn/. In contrast, [sn] and [stn] clusters are usually phonologically distinct in (Canadian) English. Behavioral data from identification and discrimination tasks indicated that [sn] and [stn] clusters were more confusable for Russian than for English speakers. The EEG experiment employed an oddball paradigm with nonwords [asna] and [astna] used as the standard and deviant stimuli. A reliable mismatch negativity response was elicited approximately 100 msec postchange in the English group but not in the Russian group. These findings point to a perceptual repair mechanism that is engaged automatically at a prelexical level to ensure immediate encoding of speech inputs in phonological terms, which in turn enables efficient access to the meaning of a spoken utterance.
Distributed Fusion in Sensor Networks with Information Genealogy

DTIC Science & Technology

2011-06-28

image processing [2], acoustic and speech recognition [3], multitarget tracking [4], distributed fusion [5], and Bayesian inference [6-7]. For...Adaptation for Distant-Talking Speech Recognition." in Proc Acoustics. Speech , and Signal Processing, 2004 |4| Y Bar-Shalom and T 1-. Fortmann...used in speech recognition and other classification applications [8]. But their use in underwater mine classification is limited. In this paper, we
Schizophrenia alters intra-network functional connectivity in the caudate for detecting speech under informational speech masking conditions.

PubMed

Zheng, Yingjun; Wu, Chao; Li, Juanhua; Li, Ruikeng; Peng, Hongjun; She, Shenglin; Ning, Yuping; Li, Liang

2018-04-04

Speech recognition under noisy "cocktail-party" environments involves multiple perceptual/cognitive processes, including target detection, selective attention, irrelevant signal inhibition, sensory/working memory, and speech production. Compared to health listeners, people with schizophrenia are more vulnerable to masking stimuli and perform worse in speech recognition under speech-on-speech masking conditions. Although the schizophrenia-related speech-recognition impairment under "cocktail-party" conditions is associated with deficits of various perceptual/cognitive processes, it is crucial to know whether the brain substrates critically underlying speech detection against informational speech masking are impaired in people with schizophrenia. Using functional magnetic resonance imaging (fMRI), this study investigated differences between people with schizophrenia (n = 19, mean age = 33 ± 10 years) and their matched healthy controls (n = 15, mean age = 30 ± 9 years) in intra-network functional connectivity (FC) specifically associated with target-speech detection under speech-on-speech-masking conditions. The target-speech detection performance under the speech-on-speech-masking condition in participants with schizophrenia was significantly worse than that in matched healthy participants (healthy controls). Moreover, in healthy controls, but not participants with schizophrenia, the strength of intra-network FC within the bilateral caudate was positively correlated with the speech-detection performance under the speech-masking conditions. Compared to controls, patients showed altered spatial activity pattern and decreased intra-network FC in the caudate. In people with schizophrenia, the declined speech-detection performance under speech-on-speech masking conditions is associated with reduced intra-caudate functional connectivity, which normally contributes to detecting target speech against speech masking via its functions of suppressing masking-speech signals.
Relationship Among Signal Fidelity, Hearing Loss, and Working Memory for Digital Noise Suppression.

PubMed

Arehart, Kathryn; Souza, Pamela; Kates, James; Lunner, Thomas; Pedersen, Michael Syskind

2015-01-01

This study considered speech modified by additive babble combined with noise-suppression processing. The purpose was to determine the relative importance of the signal modifications, individual peripheral hearing loss, and individual cognitive capacity on speech intelligibility and speech quality. The participant group consisted of 31 individuals with moderate high-frequency hearing loss ranging in age from 51 to 89 years (mean = 69.6 years). Speech intelligibility and speech quality were measured using low-context sentences presented in babble at several signal-to-noise ratios. Speech stimuli were processed with a binary mask noise-suppression strategy with systematic manipulations of two parameters (error rate and attenuation values). The cumulative effects of signal modification produced by babble and signal processing were quantified using an envelope-distortion metric. Working memory capacity was assessed with a reading span test. Analysis of variance was used to determine the effects of signal processing parameters on perceptual scores. Hierarchical linear modeling was used to determine the role of degree of hearing loss and working memory capacity in individual listener response to the processed noisy speech. The model also considered improvements in envelope fidelity caused by the binary mask and the degradations to envelope caused by error and noise. The participants showed significant benefits in terms of intelligibility scores and quality ratings for noisy speech processed by the ideal binary mask noise-suppression strategy. This benefit was observed across a range of signal-to-noise ratios and persisted when up to a 30% error rate was introduced into the processing. Average intelligibility scores and average quality ratings were well predicted by an objective metric of envelope fidelity. Degree of hearing loss and working memory capacity were significant factors in explaining individual listener's intelligibility scores for binary mask processing applied to speech in babble. Degree of hearing loss and working memory capacity did not predict listeners' quality ratings. The results indicate that envelope fidelity is a primary factor in determining the combined effects of noise and binary mask processing for intelligibility and quality of speech presented in babble noise. Degree of hearing loss and working memory capacity are significant factors in explaining variability in listeners' speech intelligibility scores but not in quality ratings.
Iconic gestures prime words: comparison of priming effects when gestures are presented alone and when they are accompanying speech

PubMed Central

So, Wing-Chee; Yi-Feng, Alvan Low; Yap, De-Fu; Kheng, Eugene; Yap, Ju-Min Melvin

2013-01-01

Previous studies have shown that iconic gestures presented in an isolated manner prime visually presented semantically related words. Since gestures and speech are almost always produced together, this study examined whether iconic gestures accompanying speech would prime words and compared the priming effect of iconic gestures with speech to that of iconic gestures presented alone. Adult participants (N = 180) were randomly assigned to one of three conditions in a lexical decision task: Gestures-Only (the primes were iconic gestures presented alone); Speech-Only (the primes were auditory tokens conveying the same meaning as the iconic gestures); Gestures-Accompanying-Speech (the primes were the simultaneous coupling of iconic gestures and their corresponding auditory tokens). Our findings revealed significant priming effects in all three conditions. However, the priming effect in the Gestures-Accompanying-Speech condition was comparable to that in the Speech-Only condition and was significantly weaker than that in the Gestures-Only condition, suggesting that the facilitatory effect of iconic gestures accompanying speech may be constrained by the level of language processing required in the lexical decision task where linguistic processing of words forms is more dominant than semantic processing. Hence, the priming effect afforded by the co-speech iconic gestures was weakened. PMID:24155738
Potential interactions among linguistic, autonomic, and motor factors in speech.

PubMed

Kleinow, Jennifer; Smith, Anne

2006-05-01

Though anecdotal reports link certain speech disorders to increases in autonomic arousal, few studies have described the relationship between arousal and speech processes. Additionally, it is unclear how increases in arousal may interact with other cognitive-linguistic processes to affect speech motor control. In this experiment we examine potential interactions between autonomic arousal, linguistic processing, and speech motor coordination in adults and children. Autonomic responses (heart rate, finger pulse volume, tonic skin conductance, and phasic skin conductance) were recorded simultaneously with upper and lower lip movements during speech. The lip aperture variability (LA variability index) across multiple repetitions of sentences that varied in length and syntactic complexity was calculated under low- and high-arousal conditions. High arousal conditions were elicited by performance of the Stroop color word task. Children had significantly higher lip aperture variability index values across all speaking tasks, indicating more variable speech motor coordination. Increases in syntactic complexity and utterance length were associated with increases in speech motor coordination variability in both speaker groups. There was a significant effect of Stroop task, which produced increases in autonomic arousal and increased speech motor variability in both adults and children. These results provide novel evidence that high arousal levels can influence speech motor control in both adults and children. (c) 2006 Wiley Periodicals, Inc.
Role of Visual Speech in Phonological Processing by Children With Hearing Loss

PubMed Central

Jerger, Susan; Tye-Murray, Nancy; Abdi, Hervé

2011-01-01

Purpose This research assessed the influence of visual speech on phonological processing by children with hearing loss (HL). Method Children with HL and children with normal hearing (NH) named pictures while attempting to ignore auditory or audiovisual speech distractors whose onsets relative to the pictures were either congruent, conflicting in place of articulation, or conflicting in voicing—for example, the picture “pizza” coupled with the distractors “peach,” “teacher,” or “beast,” respectively. Speed of picture naming was measured. Results The conflicting conditions slowed naming, and phonological processing by children with HL displayed the age-related shift in sensitivity to visual speech seen in children with NH, although with developmental delay. Younger children with HL exhibited a disproportionately large influence of visual speech and a negligible influence of auditory speech, whereas older children with HL showed a robust influence of auditory speech with no benefit to performance from adding visual speech. The congruent conditions did not speed naming in children with HL, nor did the addition of visual speech influence performance. Unexpectedly, the /∧/-vowel congruent distractors slowed naming in children with HL and decreased articulatory proficiency. Conclusions Results for the conflicting conditions are consistent with the hypothesis that speech representations in children with HL (a) are initially disproportionally structured in terms of visual speech and (b) become better specified with age in terms of auditorily encoded information. PMID:19339701
Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar

PubMed Central

Shin, Young Hoon; Seo, Jiwon

2016-01-01

People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker’s vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing. PMID:27801867
Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar.

PubMed

Shin, Young Hoon; Seo, Jiwon

2016-10-29

People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker's vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing.
Research in speech communication.

PubMed

Flanagan, J

1995-10-24

Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker.
Poor Speech Perception Is Not a Core Deficit of Childhood Apraxia of Speech: Preliminary Findings

ERIC Educational Resources Information Center

Zuk, Jennifer; Iuzzini-Seigel, Jenya; Cabbage, Kathryn; Green, Jordan R.; Hogan, Tiffany P.

2018-01-01

Purpose: Childhood apraxia of speech (CAS) is hypothesized to arise from deficits in speech motor planning and programming, but the influence of abnormal speech perception in CAS on these processes is debated. This study examined speech perception abilities among children with CAS with and without language impairment compared to those with…
Spatio-Temporal Progression of Cortical Activity Related to Continuous Overt and Covert Speech Production in a Reading Task.

PubMed

Brumberg, Jonathan S; Krusienski, Dean J; Chakrabarti, Shreya; Gunduz, Aysegul; Brunner, Peter; Ritaccio, Anthony L; Schalk, Gerwin

2016-01-01

How the human brain plans, executes, and monitors continuous and fluent speech has remained largely elusive. For example, previous research has defined the cortical locations most important for different aspects of speech function, but has not yet yielded a definition of the temporal progression of involvement of those locations as speech progresses either overtly or covertly. In this paper, we uncovered the spatio-temporal evolution of neuronal population-level activity related to continuous overt speech, and identified those locations that shared activity characteristics across overt and covert speech. Specifically, we asked subjects to repeat continuous sentences aloud or silently while we recorded electrical signals directly from the surface of the brain (electrocorticography (ECoG)). We then determined the relationship between cortical activity and speech output across different areas of cortex and at sub-second timescales. The results highlight a spatio-temporal progression of cortical involvement in the continuous speech process that initiates utterances in frontal-motor areas and ends with the monitoring of auditory feedback in superior temporal gyrus. Direct comparison of cortical activity related to overt versus covert conditions revealed a common network of brain regions involved in speech that may implement orthographic and phonological processing. Our results provide one of the first characterizations of the spatiotemporal electrophysiological representations of the continuous speech process, and also highlight the common neural substrate of overt and covert speech. These results thereby contribute to a refined understanding of speech functions in the human brain.
Spatio-Temporal Progression of Cortical Activity Related to Continuous Overt and Covert Speech Production in a Reading Task

PubMed Central

Brumberg, Jonathan S.; Krusienski, Dean J.; Chakrabarti, Shreya; Gunduz, Aysegul; Brunner, Peter; Ritaccio, Anthony L.; Schalk, Gerwin

2016-01-01

How the human brain plans, executes, and monitors continuous and fluent speech has remained largely elusive. For example, previous research has defined the cortical locations most important for different aspects of speech function, but has not yet yielded a definition of the temporal progression of involvement of those locations as speech progresses either overtly or covertly. In this paper, we uncovered the spatio-temporal evolution of neuronal population-level activity related to continuous overt speech, and identified those locations that shared activity characteristics across overt and covert speech. Specifically, we asked subjects to repeat continuous sentences aloud or silently while we recorded electrical signals directly from the surface of the brain (electrocorticography (ECoG)). We then determined the relationship between cortical activity and speech output across different areas of cortex and at sub-second timescales. The results highlight a spatio-temporal progression of cortical involvement in the continuous speech process that initiates utterances in frontal-motor areas and ends with the monitoring of auditory feedback in superior temporal gyrus. Direct comparison of cortical activity related to overt versus covert conditions revealed a common network of brain regions involved in speech that may implement orthographic and phonological processing. Our results provide one of the first characterizations of the spatiotemporal electrophysiological representations of the continuous speech process, and also highlight the common neural substrate of overt and covert speech. These results thereby contribute to a refined understanding of speech functions in the human brain. PMID:27875590
[Vocal effectiveness in speech and singing: acoustical, physiological and perceptive aspects. applications in speech therapy].

PubMed

Pillot, C; Vaissière, J

2006-01-01

What is vocal effectiveness in lyrical singing in comparison to speech? Our study tries to answer this question, using vocal efficiency and spectral vocal effectiveness. Vocal efficiency was mesured for a trained and untrained subject. According to these invasive measures, it appears that the trained singer uses her larynx less efficiently. Efficiency of the larynx in terms of energy then appears to be secondary to the desired voice quality. The acoustic measures of spectral vocal effectiveness of vowels and sentences, spoken and sung by 23 singers, reveal two complementary markers: The "singing power ratio" and the difference in amplitude between the singing formant and the spectral minimum that follows it. Magnetic resonance imaging and simulations of [a], [i] and [o] spoken and sung show laryngeal lowering and the role of the piriform sinuses as the physiological foundations of spectral vocal effectiveness, perceptively related to carrying power. These scientifical aspects allow applications in voice therapy, such as physiological and perceptual foundations allowing patients to recuperate voice carrying power with or without background noise.
The effect of simultaneous text on the recall of noise-degraded speech.

PubMed

Grossman, Irina; Rajan, Ramesh

2017-05-01

Written and spoken language utilize the same processing system, enabling text to modulate speech processing. We investigated how simultaneously presented text affected speech recall in babble noise using a retrospective recall task. Participants were presented with text-speech sentence pairs in multitalker babble noise and then prompted to recall what they heard or what they read. In Experiment 1, sentence pairs were either congruent or incongruent and they were presented in silence or at 1 of 4 noise levels. Audio and Visual control groups were also tested with sentences presented in only 1 modality. Congruent text facilitated accurate recall of degraded speech; incongruent text had no effect. Text and speech were seldom confused for each other. A consideration of the effects of the language background found that monolingual English speakers outperformed early multilinguals at recalling degraded speech; however the effects of text on speech processing were analogous. Experiment 2 considered if the benefit provided by matching text was maintained when the congruency of the text and speech becomes more ambiguous because of the addition of partially mismatching text-speech sentence pairs that differed only on their final keyword and because of the use of low signal-to-noise ratios. The experiment focused on monolingual English speakers; the results showed that even though participants commonly confused text-for-speech during incongruent text-speech pairings, these confusions could not fully account for the benefit provided by matching text. Thus, we uniquely demonstrate that congruent text benefits the recall of noise-degraded speech. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Speech recognition technology: an outlook for human-to-machine interaction.

PubMed

Erdel, T; Crooks, S

2000-01-01

Speech recognition, as an enabling technology in healthcare-systems computing, is a topic that has been discussed for quite some time, but is just now coming to fruition. Traditionally, speech-recognition software has been constrained by hardware, but improved processors and increased memory capacities are starting to remove some of these limitations. With these barriers removed, companies that create software for the healthcare setting have the opportunity to write more successful applications. Among the criticisms of speech-recognition applications are the high rates of error and steep training curves. However, even in the face of such negative perceptions, there remains significant opportunities for speech recognition to allow healthcare providers and, more specifically, physicians, to work more efficiently and ultimately spend more time with their patients and less time completing necessary documentation. This article will identify opportunities for inclusion of speech-recognition technology in the healthcare setting and examine major categories of speech-recognition software--continuous speech recognition, command and control, and text-to-speech. We will discuss the advantages and disadvantages of each area, the limitations of the software today, and how future trends might affect them.
Selective spatial attention modulates bottom-up informational masking of speech

PubMed Central

Carlile, Simon; Corkhill, Caitlin

2015-01-01

To hear out a conversation against other talkers listeners overcome energetic and informational masking. Largely attributed to top-down processes, information masking has also been demonstrated using unintelligible speech and amplitude-modulated maskers suggesting bottom-up processes. We examined the role of speech-like amplitude modulations in information masking using a spatial masking release paradigm. Separating a target talker from two masker talkers produced a 20 dB improvement in speech reception threshold; 40% of which was attributed to a release from informational masking. When across frequency temporal modulations in the masker talkers are decorrelated the speech is unintelligible, although the within frequency modulation characteristics remains identical. Used as a masker as above, the information masking accounted for 37% of the spatial unmasking seen with this masker. This unintelligible and highly differentiable masker is unlikely to involve top-down processes. These data provides strong evidence of bottom-up masking involving speech-like, within-frequency modulations and that this, presumably low level process, can be modulated by selective spatial attention. PMID:25727100

Selective spatial attention modulates bottom-up informational masking of speech.

PubMed

Carlile, Simon; Corkhill, Caitlin

2015-03-02

To hear out a conversation against other talkers listeners overcome energetic and informational masking. Largely attributed to top-down processes, information masking has also been demonstrated using unintelligible speech and amplitude-modulated maskers suggesting bottom-up processes. We examined the role of speech-like amplitude modulations in information masking using a spatial masking release paradigm. Separating a target talker from two masker talkers produced a 20 dB improvement in speech reception threshold; 40% of which was attributed to a release from informational masking. When across frequency temporal modulations in the masker talkers are decorrelated the speech is unintelligible, although the within frequency modulation characteristics remains identical. Used as a masker as above, the information masking accounted for 37% of the spatial unmasking seen with this masker. This unintelligible and highly differentiable masker is unlikely to involve top-down processes. These data provides strong evidence of bottom-up masking involving speech-like, within-frequency modulations and that this, presumably low level process, can be modulated by selective spatial attention.
Noise suppression methods for robust speech processing

NASA Astrophysics Data System (ADS)

Boll, S. F.; Ravindra, H.; Randall, G.; Armantrout, R.; Power, R.

1980-05-01

Robust speech processing in practical operating environments requires effective environmental and processor noise suppression. This report describes the technical findings and accomplishments during this reporting period for the research program funded to develop real time, compressed speech analysis synthesis algorithms whose performance in invariant under signal contamination. Fulfillment of this requirement is necessary to insure reliable secure compressed speech transmission within realistic military command and control environments. Overall contributions resulting from this research program include the understanding of how environmental noise degrades narrow band, coded speech, development of appropriate real time noise suppression algorithms, and development of speech parameter identification methods that consider signal contamination as a fundamental element in the estimation process. This report describes the current research and results in the areas of noise suppression using the dual input adaptive noise cancellation using the short time Fourier transform algorithms, articulation rate change techniques, and a description of an experiment which demonstrated that the spectral subtraction noise suppression algorithm can improve the intelligibility of 2400 bps, LPC 10 coded, helicopter speech by 10.6 point.
Phonemic Characteristics of Apraxia of Speech Resulting from Subcortical Hemorrhage

ERIC Educational Resources Information Center

Peach, Richard K.; Tonkovich, John D.

2004-01-01

Reports describing subcortical apraxia of speech (AOS) have received little consideration in the development of recent speech processing models because the speech characteristics of patients with this diagnosis have not been described precisely. We describe a case of AOS with aphasia secondary to basal ganglia hemorrhage. Speech-language symptoms…
The Effectiveness of Clear Speech as a Masker

ERIC Educational Resources Information Center

Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.

2010-01-01

Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…
ON THE NATURE OF SPEECH SCIENCE.

ERIC Educational Resources Information Center

PETERSON, GORDON E.

IN THIS ARTICLE THE NATURE OF THE DISCIPLINE OF SPEECH SCIENCE IS CONSIDERED AND THE VARIOUS BASIC AND APPLIED AREAS OF THE DISCIPLINE ARE DISCUSSED. THE BASIC AREAS ENCOMPASS THE VARIOUS PROCESSES OF THE PHYSIOLOGY OF SPEECH PRODUCTION, THE ACOUSTICAL CHARACTERISTICS OF SPEECH, INCLUDING THE SPEECH WAVE TYPES AND THE INFORMATION-BEARING ACOUSTIC…
Attention Is Required for Knowledge-Based Sequential Grouping: Insights from the Integration of Syllables into Words.

PubMed

Ding, Nai; Pan, Xunyi; Luo, Cheng; Su, Naifei; Zhang, Wen; Zhang, Jianfeng

2018-01-31

How the brain groups sequential sensory events into chunks is a fundamental question in cognitive neuroscience. This study investigates whether top-down attention or specific tasks are required for the brain to apply lexical knowledge to group syllables into words. Neural responses tracking the syllabic and word rhythms of a rhythmic speech sequence were concurrently monitored using electroencephalography (EEG). The participants performed different tasks, attending to either the rhythmic speech sequence or a distractor, which was another speech stream or a nonlinguistic auditory/visual stimulus. Attention to speech, but not a lexical-meaning-related task, was required for reliable neural tracking of words, even when the distractor was a nonlinguistic stimulus presented cross-modally. Neural tracking of syllables, however, was reliably observed in all tested conditions. These results strongly suggest that neural encoding of individual auditory events (i.e., syllables) is automatic, while knowledge-based construction of temporal chunks (i.e., words) crucially relies on top-down attention. SIGNIFICANCE STATEMENT Why we cannot understand speech when not paying attention is an old question in psychology and cognitive neuroscience. Speech processing is a complex process that involves multiple stages, e.g., hearing and analyzing the speech sound, recognizing words, and combining words into phrases and sentences. The current study investigates which speech-processing stage is blocked when we do not listen carefully. We show that the brain can reliably encode syllables, basic units of speech sounds, even when we do not pay attention. Nevertheless, when distracted, the brain cannot group syllables into multisyllabic words, which are basic units for speech meaning. Therefore, the process of converting speech sound into meaning crucially relies on attention. Copyright © 2018 the authors 0270-6474/18/381178-11$15.00/0.
Voice technology and BBN

NASA Technical Reports Server (NTRS)

Wolf, Jared J.

1977-01-01

The following research was discussed: (1) speech signal processing; (2) automatic speech recognition; (3) continuous speech understanding; (4) speaker recognition; (5) speech compression; (6) subjective and objective evaluation of speech communication system; (7) measurement of the intelligibility and quality of speech when degraded by noise or other masking stimuli; (8) speech synthesis; (9) instructional aids for second-language learning and for training of the deaf; and (10) investigation of speech correlates of psychological stress. Experimental psychology, control systems, and human factors engineering, which are often relevant to the proper design and operation of speech systems are described.
Speech processing: from peripheral to hemispheric asymmetry of the auditory system.

PubMed

Lazard, Diane S; Collette, Jean-Louis; Perrot, Xavier

2012-01-01

Language processing from the cochlea to auditory association cortices shows side-dependent specificities with an apparent left hemispheric dominance. The aim of this article was to propose to nonspeech specialists a didactic review of two complementary theories about hemispheric asymmetry in speech processing. Starting from anatomico-physiological and clinical observations of auditory asymmetry and interhemispheric connections, this review then exposes behavioral (dichotic listening paradigm) as well as functional (functional magnetic resonance imaging and positron emission tomography) experiments that assessed hemispheric specialization for speech processing. Even though speech at an early phonological level is regarded as being processed bilaterally, a left-hemispheric dominance exists for higher-level processing. This asymmetry may arise from a segregation of the speech signal, broken apart within nonprimary auditory areas in two distinct temporal integration windows--a fast one on the left and a slower one on the right--modeled through the asymmetric sampling in time theory or a spectro-temporal trade-off, with a higher temporal resolution in the left hemisphere and a higher spectral resolution in the right hemisphere, modeled through the spectral/temporal resolution trade-off theory. Both theories deal with the concept that lower-order tuning principles for acoustic signal might drive higher-order organization for speech processing. However, the precise nature, mechanisms, and origin of speech processing asymmetry are still being debated. Finally, an example of hemispheric asymmetry alteration, which has direct clinical implications, is given through the case of auditory aging that mixes peripheral disorder and modifications of central processing. Copyright © 2011 The American Laryngological, Rhinological, and Otological Society, Inc.
Cohesive and coherent connected speech deficits in mild stroke.

PubMed

Barker, Megan S; Young, Breanne; Robinson, Gail A

2017-05-01

Spoken language production theories and lesion studies highlight several important prelinguistic conceptual preparation processes involved in the production of cohesive and coherent connected speech. Cohesion and coherence broadly connect sentences with preceding ideas and the overall topic. Broader cognitive mechanisms may mediate these processes. This study aims to investigate (1) whether stroke patients without aphasia exhibit impairments in cohesion and coherence in connected speech, and (2) the role of attention and executive functions in the production of connected speech. Eighteen stroke patients (8 right hemisphere stroke [RHS]; 6 left [LHS]) and 21 healthy controls completed two self-generated narrative tasks to elicit connected speech. A multi-level analysis of within and between-sentence processing ability was conducted. Cohesion and coherence impairments were found in the stroke group, particularly RHS patients, relative to controls. In the whole stroke group, better performance on the Hayling Test of executive function, which taps verbal initiation/suppression, was related to fewer propositional repetitions and global coherence errors. Better performance on attention tasks was related to fewer propositional repetitions, and decreased global coherence errors. In the RHS group, aspects of cohesive and coherent speech were associated with better performance on attention tasks. Better Hayling Test scores were related to more cohesive and coherent speech in RHS patients, and more coherent speech in LHS patients. Thus, we documented connected speech deficits in a heterogeneous stroke group without prominent aphasia. Our results suggest that broader cognitive processes may play a role in producing connected speech at the early conceptual preparation stage. Copyright © 2017 Elsevier Inc. All rights reserved.
Preschoolers Benefit From Visually Salient Speech Cues

PubMed Central

Holt, Rachael Frush

2015-01-01

Purpose This study explored visual speech influence in preschoolers using 3 developmentally appropriate tasks that vary in perceptual difficulty and task demands. They also examined developmental differences in the ability to use visually salient speech cues and visual phonological knowledge. Method Twelve adults and 27 typically developing 3- and 4-year-old children completed 3 audiovisual (AV) speech integration tasks: matching, discrimination, and recognition. The authors compared AV benefit for visually salient and less visually salient speech discrimination contrasts and assessed the visual saliency of consonant confusions in auditory-only and AV word recognition. Results Four-year-olds and adults demonstrated visual influence on all measures. Three-year-olds demonstrated visual influence on speech discrimination and recognition measures. All groups demonstrated greater AV benefit for the visually salient discrimination contrasts. AV recognition benefit in 4-year-olds and adults depended on the visual saliency of speech sounds. Conclusions Preschoolers can demonstrate AV speech integration. Their AV benefit results from efficient use of visually salient speech cues. Four-year-olds, but not 3-year-olds, used visual phonological knowledge to take advantage of visually salient speech cues, suggesting possible developmental differences in the mechanisms of AV benefit. PMID:25322336
Cortical Integration of Audio-Visual Information

PubMed Central

Vander Wyk, Brent C.; Ramsay, Gordon J.; Hudac, Caitlin M.; Jones, Warren; Lin, David; Klin, Ami; Lee, Su Mei; Pelphrey, Kevin A.

2013-01-01

We investigated the neural basis of audio-visual processing in speech and non-speech stimuli. Physically identical auditory stimuli (speech and sinusoidal tones) and visual stimuli (animated circles and ellipses) were used in this fMRI experiment. Relative to unimodal stimuli, each of the multimodal conjunctions showed increased activation in largely non-overlapping areas. The conjunction of Ellipse and Speech, which most resembles naturalistic audiovisual speech, showed higher activation in the right inferior frontal gyrus, fusiform gyri, left posterior superior temporal sulcus, and lateral occipital cortex. The conjunction of Circle and Tone, an arbitrary audio-visual pairing with no speech association, activated middle temporal gyri and lateral occipital cortex. The conjunction of Circle and Speech showed activation in lateral occipital cortex, and the conjunction of Ellipse and Tone did not show increased activation relative to unimodal stimuli. Further analysis revealed that middle temporal regions, although identified as multimodal only in the Circle-Tone condition, were more strongly active to Ellipse-Speech or Circle-Speech, but regions that were identified as multimodal for Ellipse-Speech were always strongest for Ellipse-Speech. Our results suggest that combinations of auditory and visual stimuli may together be processed by different cortical networks, depending on the extent to which speech or non-speech percepts are evoked. PMID:20709442
Speech perception in individuals with auditory dys-synchrony.

PubMed

Kumar, U A; Jayaram, M

2011-03-01

This study aimed to evaluate the effect of lengthening the transition duration of selected speech segments upon the perception of those segments in individuals with auditory dys-synchrony. Thirty individuals with auditory dys-synchrony participated in the study, along with 30 age-matched normal hearing listeners. Eight consonant-vowel syllables were used as auditory stimuli. Two experiments were conducted. Experiment one measured the 'just noticeable difference' time: the smallest prolongation of the speech sound transition duration which was noticeable by the subject. In experiment two, speech sounds were modified by lengthening the transition duration by multiples of the just noticeable difference time, and subjects' speech identification scores for the modified speech sounds were assessed. Subjects with auditory dys-synchrony demonstrated poor processing of temporal auditory information. Lengthening of speech sound transition duration improved these subjects' perception of both the placement and voicing features of the speech syllables used. These results suggest that innovative speech processing strategies which enhance temporal cues may benefit individuals with auditory dys-synchrony.
How does cognitive load influence speech perception? An encoding hypothesis.

PubMed

Mitterer, Holger; Mattys, Sven L

2017-01-01

Two experiments investigated the conditions under which cognitive load exerts an effect on the acuity of speech perception. These experiments extend earlier research by using a different speech perception task (four-interval oddity task) and by implementing cognitive load through a task often thought to be modular, namely, face processing. In the cognitive-load conditions, participants were required to remember two faces presented before the speech stimuli. In Experiment 1, performance in the speech-perception task under cognitive load was not impaired in comparison to a no-load baseline condition. In Experiment 2, we modified the load condition minimally such that it required encoding of the two faces simultaneously with the speech stimuli. As a reference condition, we also used a visual search task that in earlier experiments had led to poorer speech perception. Both concurrent tasks led to decrements in the speech task. The results suggest that speech perception is affected even by loads thought to be processed modularly, and that, critically, encoding in working memory might be the locus of interference.
Neuronal populations in the occipital cortex of the blind synchronize to the temporal dynamics of speech

PubMed Central

Van Ackeren, Markus Johannes; Barbero, Francesca M; Mattioni, Stefania; Bottini, Roberto

2018-01-01

The occipital cortex of early blind individuals (EB) activates during speech processing, challenging the notion of a hard-wired neurobiology of language. But, at what stage of speech processing do occipital regions participate in EB? Here we demonstrate that parieto-occipital regions in EB enhance their synchronization to acoustic fluctuations in human speech in the theta-range (corresponding to syllabic rate), irrespective of speech intelligibility. Crucially, enhanced synchronization to the intelligibility of speech was selectively observed in primary visual cortex in EB, suggesting that this region is at the interface between speech perception and comprehension. Moreover, EB showed overall enhanced functional connectivity between temporal and occipital cortices that are sensitive to speech intelligibility and altered directionality when compared to the sighted group. These findings suggest that the occipital cortex of the blind adopts an architecture that allows the tracking of speech material, and therefore does not fully abstract from the reorganized sensory inputs it receives. PMID:29338838
Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing

PubMed Central

Rauschecker, Josef P; Scott, Sophie K

2010-01-01

Speech and language are considered uniquely human abilities: animals have communication systems, but they do not match human linguistic skills in terms of recursive structure and combinatorial power. Yet, in evolution, spoken language must have emerged from neural mechanisms at least partially available in animals. In this paper, we will demonstrate how our understanding of speech perception, one important facet of language, has profited from findings and theory in nonhuman primate studies. Chief among these are physiological and anatomical studies showing that primate auditory cortex, across species, shows patterns of hierarchical structure, topographic mapping and streams of functional processing. We will identify roles for different cortical areas in the perceptual processing of speech and review functional imaging work in humans that bears on our understanding of how the brain decodes and monitors speech. A new model connects structures in the temporal, frontal and parietal lobes linking speech perception and production. PMID:19471271
Processing load induced by informational masking is related to linguistic abilities.

PubMed

Koelewijn, Thomas; Zekveld, Adriana A; Festen, Joost M; Rönnberg, Jerker; Kramer, Sophia E

2012-01-01

It is often assumed that the benefit of hearing aids is not primarily reflected in better speech performance, but that it is reflected in less effortful listening in the aided than in the unaided condition. Before being able to assess such a hearing aid benefit the present study examined how processing load while listening to masked speech relates to inter-individual differences in cognitive abilities relevant for language processing. Pupil dilation was measured in thirty-two normal hearing participants while listening to sentences masked by fluctuating noise or interfering speech at either 50% and 84% intelligibility. Additionally, working memory capacity, inhibition of irrelevant information, and written text reception was tested. Pupil responses were larger during interfering speech as compared to fluctuating noise. This effect was independent of intelligibility level. Regression analysis revealed that high working memory capacity, better inhibition, and better text reception were related to better speech reception thresholds. Apart from a positive relation to speech recognition, better inhibition and better text reception are also positively related to larger pupil dilation in the single-talker masker conditions. We conclude that better cognitive abilities not only relate to better speech perception, but also partly explain higher processing load in complex listening conditions.
An Intrinsically Digital Amplification Scheme for Hearing Aids

NASA Astrophysics Data System (ADS)

Blamey, Peter J.; Macfarlane, David S.; Steele, Brenton R.

2005-12-01

Results for linear and wide-dynamic range compression were compared with a new 64-channel digital amplification strategy in three separate studies. The new strategy addresses the requirements of the hearing aid user with efficient computations on an open-platform digital signal processor (DSP). The new amplification strategy is not modeled on prior analog strategies like compression and linear amplification, but uses statistical analysis of the signal to optimize the output dynamic range in each frequency band independently. Using the open-platform DSP processor also provided the opportunity for blind trial comparisons of the different processing schemes in BTE and ITE devices of a high commercial standard. The speech perception scores and questionnaire results show that it is possible to provide improved audibility for sound in many narrow frequency bands while simultaneously improving comfort, speech intelligibility in noise, and sound quality.
Emotional Speech Perception Unfolding in Time: The Role of the Basal Ganglia

PubMed Central

Paulmann, Silke; Ott, Derek V. M.; Kotz, Sonja A.

2011-01-01

The basal ganglia (BG) have repeatedly been linked to emotional speech processing in studies involving patients with neurodegenerative and structural changes of the BG. However, the majority of previous studies did not consider that (i) emotional speech processing entails multiple processing steps, and the possibility that (ii) the BG may engage in one rather than the other of these processing steps. In the present study we investigate three different stages of emotional speech processing (emotional salience detection, meaning-related processing, and identification) in the same patient group to verify whether lesions to the BG affect these stages in a qualitatively different manner. Specifically, we explore early implicit emotional speech processing (probe verification) in an ERP experiment followed by an explicit behavioral emotional recognition task. In both experiments, participants listened to emotional sentences expressing one of four emotions (anger, fear, disgust, happiness) or neutral sentences. In line with previous evidence patients and healthy controls show differentiation of emotional and neutral sentences in the P200 component (emotional salience detection) and a following negative-going brain wave (meaning-related processing). However, the behavioral recognition (identification stage) of emotional sentences was impaired in BG patients, but not in healthy controls. The current data provide further support that the BG are involved in late, explicit rather than early emotional speech processing stages. PMID:21437277
Perception of Melodic Contour and Intonation in Autism Spectrum Disorder: Evidence From Mandarin Speakers.

PubMed

Jiang, Jun; Liu, Fang; Wan, Xuan; Jiang, Cunmei

2015-07-01

Tone language experience benefits pitch processing in music and speech for typically developing individuals. No known studies have examined pitch processing in individuals with autism who speak a tone language. This study investigated discrimination and identification of melodic contour and speech intonation in a group of Mandarin-speaking individuals with high-functioning autism. Individuals with autism showed superior melodic contour identification but comparable contour discrimination relative to controls. In contrast, these individuals performed worse than controls on both discrimination and identification of speech intonation. These findings provide the first evidence for differential pitch processing in music and speech in tone language speakers with autism, suggesting that tone language experience may not compensate for speech intonation perception deficits in individuals with autism.
Brainstem transcription of speech is disrupted in children with autism spectrum disorders

PubMed Central

Russo, Nicole; Nicol, Trent; Trommer, Barbara; Zecker, Steve; Kraus, Nina

2009-01-01

Language impairment is a hallmark of autism spectrum disorders (ASD). The origin of the deficit is poorly understood although deficiencies in auditory processing have been detected in both perception and cortical encoding of speech sounds. Little is known about the processing and transcription of speech sounds at earlier (brainstem) levels or about how background noise may impact this transcription process. Unlike cortical encoding of sounds, brainstem representation preserves stimulus features with a degree of fidelity that enables a direct link between acoustic components of the speech syllable (e.g., onsets) to specific aspects of neural encoding (e.g., waves V and A). We measured brainstem responses to the syllable /da/, in quiet and background noise, in children with and without ASD. Children with ASD exhibited deficits in both the neural synchrony (timing) and phase locking (frequency encoding) of speech sounds, despite normal click-evoked brainstem responses. They also exhibited reduced magnitude and fidelity of speech-evoked responses and inordinate degradation of responses by background noise in comparison to typically developing controls. Neural synchrony in noise was significantly related to measures of core and receptive language ability. These data support the idea that abnormalities in the brainstem processing of speech contribute to the language impairment in ASD. Because it is both passively-elicited and malleable, the speech-evoked brainstem response may serve as a clinical tool to assess auditory processing as well as the effects of auditory training in the ASD population. PMID:19635083

Speech perception in autism spectrum disorder: An activation likelihood estimation meta-analysis.

PubMed

Tryfon, Ana; Foster, Nicholas E V; Sharda, Megha; Hyde, Krista L

2018-02-15

Autism spectrum disorder (ASD) is often characterized by atypical language profiles and auditory and speech processing. These can contribute to aberrant language and social communication skills in ASD. The study of the neural basis of speech perception in ASD can serve as a potential neurobiological marker of ASD early on, but mixed results across studies renders it difficult to find a reliable neural characterization of speech processing in ASD. To this aim, the present study examined the functional neural basis of speech perception in ASD versus typical development (TD) using an activation likelihood estimation (ALE) meta-analysis of 18 qualifying studies. The present study included separate analyses for TD and ASD, which allowed us to examine patterns of within-group brain activation as well as both common and distinct patterns of brain activation across the ASD and TD groups. Overall, ASD and TD showed mostly common brain activation of speech processing in bilateral superior temporal gyrus (STG) and left inferior frontal gyrus (IFG). However, the results revealed trends for some distinct activation in the TD group showing additional activation in higher-order brain areas including left superior frontal gyrus (SFG), left medial frontal gyrus (MFG), and right IFG. These results provide a more reliable neural characterization of speech processing in ASD relative to previous single neuroimaging studies and motivate future work to investigate how these brain signatures relate to behavioral measures of speech processing in ASD. Copyright © 2017 Elsevier B.V. All rights reserved.
Spatial Release From Masking in Simulated Cochlear Implant Users With and Without Access to Low-Frequency Acoustic Hearing

PubMed Central

Dietz, Mathias; Hohmann, Volker; Jürgens, Tim

2015-01-01

For normal-hearing listeners, speech intelligibility improves if speech and noise are spatially separated. While this spatial release from masking has already been quantified in normal-hearing listeners in many studies, it is less clear how spatial release from masking changes in cochlear implant listeners with and without access to low-frequency acoustic hearing. Spatial release from masking depends on differences in access to speech cues due to hearing status and hearing device. To investigate the influence of these factors on speech intelligibility, the present study measured speech reception thresholds in spatially separated speech and noise for 10 different listener types. A vocoder was used to simulate cochlear implant processing and low-frequency filtering was used to simulate residual low-frequency hearing. These forms of processing were combined to simulate cochlear implant listening, listening based on low-frequency residual hearing, and combinations thereof. Simulated cochlear implant users with additional low-frequency acoustic hearing showed better speech intelligibility in noise than simulated cochlear implant users without acoustic hearing and had access to more spatial speech cues (e.g., higher binaural squelch). Cochlear implant listener types showed higher spatial release from masking with bilateral access to low-frequency acoustic hearing than without. A binaural speech intelligibility model with normal binaural processing showed overall good agreement with measured speech reception thresholds, spatial release from masking, and spatial speech cues. This indicates that differences in speech cues available to listener types are sufficient to explain the changes of spatial release from masking across these simulated listener types. PMID:26721918
The influence of cochlear spectral processing on the timing and amplitude of the speech-evoked auditory brain stem response

PubMed Central

Nuttall, Helen E.; Moore, David R.; Barry, Johanna G.; Krumbholz, Katrin

2015-01-01

The speech-evoked auditory brain stem response (speech ABR) is widely considered to provide an index of the quality of neural temporal encoding in the central auditory pathway. The aim of the present study was to evaluate the extent to which the speech ABR is shaped by spectral processing in the cochlea. High-pass noise masking was used to record speech ABRs from delimited octave-wide frequency bands between 0.5 and 8 kHz in normal-hearing young adults. The latency of the frequency-delimited responses decreased from the lowest to the highest frequency band by up to 3.6 ms. The observed frequency-latency function was compatible with model predictions based on wave V of the click ABR. The frequency-delimited speech ABR amplitude was largest in the 2- to 4-kHz frequency band and decreased toward both higher and lower frequency bands despite the predominance of low-frequency energy in the speech stimulus. We argue that the frequency dependence of speech ABR latency and amplitude results from the decrease in cochlear filter width with decreasing frequency. The results suggest that the amplitude and latency of the speech ABR may reflect interindividual differences in cochlear, as well as central, processing. The high-pass noise-masking technique provides a useful tool for differentiating between peripheral and central effects on the speech ABR. It can be used for further elucidating the neural basis of the perceptual speech deficits that have been associated with individual differences in speech ABR characteristics. PMID:25787954
Prosody and Semantics Are Separate but Not Separable Channels in the Perception of Emotional Speech: Test for Rating of Emotions in Speech.

PubMed

Ben-David, Boaz M; Multani, Namita; Shakuf, Vered; Rudzicz, Frank; van Lieshout, Pascal H H M

2016-02-01

Our aim is to explore the complex interplay of prosody (tone of speech) and semantics (verbal content) in the perception of discrete emotions in speech. We implement a novel tool, the Test for Rating of Emotions in Speech. Eighty native English speakers were presented with spoken sentences made of different combinations of 5 discrete emotions (anger, fear, happiness, sadness, and neutral) presented in prosody and semantics. Listeners were asked to rate the sentence as a whole, integrating both speech channels, or to focus on one channel only (prosody or semantics). We observed supremacy of congruency, failure of selective attention, and prosodic dominance. Supremacy of congruency means that a sentence that presents the same emotion in both speech channels was rated highest; failure of selective attention means that listeners were unable to selectively attend to one channel when instructed; and prosodic dominance means that prosodic information plays a larger role than semantics in processing emotional speech. Emotional prosody and semantics are separate but not separable channels, and it is difficult to perceive one without the influence of the other. Our findings indicate that the Test for Rating of Emotions in Speech can reveal specific aspects in the processing of emotional speech and may in the future prove useful for understanding emotion-processing deficits in individuals with pathologies.
Effect of technological advances on cochlear implant performance in adults.

PubMed

Lenarz, Minoo; Joseph, Gert; Sönmez, Hasibe; Büchner, Andreas; Lenarz, Thomas

2011-12-01

To evaluate the effect of technological advances in the past 20 years on the hearing performance of a large cohort of adult cochlear implant (CI) patients. Individual, retrospective, cohort study. According to technological developments in electrode design and speech-processing strategies, we defined five virtual intervals on the time scale between 1984 and 2008. A cohort of 1,005 postlingually deafened adults was selected for this study, and their hearing performance with a CI was evaluated retrospectively according to these five technological intervals. The test battery was composed of four standard German speech tests: Freiburger monosyllabic test, speech tracking test, Hochmair-Schulz-Moser (HSM) sentence test in quiet, and HSM sentence test in 10 dB noise. The direct comparison of the speech perception in postlingually deafened adults, who were implanted during different technological periods, reveals an obvious improvement in the speech perception in patients who benefited from the recent electrode designs and speech-processing strategies. The major influence of technological advances on CI performance seems to be on speech perception in noise. Better speech perception in noisy surroundings is strong proof for demonstrating the success rate of new electrode designs and speech-processing strategies. Standard (internationally comparable) speech tests in noise should become an obligatory part of the postoperative test battery for adult CI patients. Copyright © 2011 The American Laryngological, Rhinological, and Otological Society, Inc.
Getting the cocktail party started: masking effects in speech perception

PubMed Central

Evans, S; McGettigan, C; Agnew, ZK; Rosen, S; Scott, SK

2016-01-01

Spoken conversations typically take place in noisy environments and different kinds of masking sounds place differing demands on cognitive resources. Previous studies, examining the modulation of neural activity associated with the properties of competing sounds, have shown that additional speech streams engage the superior temporal gyrus. However, the absence of a condition in which target speech was heard without additional masking made it difficult to identify brain networks specific to masking and to ascertain the extent to which competing speech was processed equivalently to target speech. In this study, we scanned young healthy adults with continuous functional Magnetic Resonance Imaging (fMRI), whilst they listened to stories masked by sounds that differed in their similarity to speech. We show that auditory attention and control networks are activated during attentive listening to masked speech in the absence of an overt behavioural task. We demonstrate that competing speech is processed predominantly in the left hemisphere within the same pathway as target speech but is not treated equivalently within that stream, and that individuals who perform better in speech in noise tasks activate the left mid-posterior superior temporal gyrus more. Finally, we identify neural responses associated with the onset of sounds in the auditory environment, activity was found within right lateralised frontal regions consistent with a phasic alerting response. Taken together, these results provide a comprehensive account of the neural processes involved in listening in noise. PMID:26696297
Functional overlap between regions involved in speech perception and in monitoring one's own voice during speech production.

PubMed

Zheng, Zane Z; Munhall, Kevin G; Johnsrude, Ingrid S

2010-08-01

The fluency and the reliability of speech production suggest a mechanism that links motor commands and sensory feedback. Here, we examined the neural organization supporting such links by using fMRI to identify regions in which activity during speech production is modulated according to whether auditory feedback matches the predicted outcome or not and by examining the overlap with the network recruited during passive listening to speech sounds. We used real-time signal processing to compare brain activity when participants whispered a consonant-vowel-consonant word ("Ted") and either heard this clearly or heard voice-gated masking noise. We compared this to when they listened to yoked stimuli (identical recordings of "Ted" or noise) without speaking. Activity along the STS and superior temporal gyrus bilaterally was significantly greater if the auditory stimulus was (a) processed as the auditory concomitant of speaking and (b) did not match the predicted outcome (noise). The network exhibiting this Feedback Type x Production/Perception interaction includes a superior temporal gyrus/middle temporal gyrus region that is activated more when listening to speech than to noise. This is consistent with speech production and speech perception being linked in a control system that predicts the sensory outcome of speech acts and that processes an error signal in speech-sensitive regions when this and the sensory data do not match.
Functional overlap between regions involved in speech perception and in monitoring one’s own voice during speech production

PubMed Central

Zheng, Zane Z.; Munhall, Kevin G; Johnsrude, Ingrid S

2009-01-01

The fluency and reliability of speech production suggests a mechanism that links motor commands and sensory feedback. Here, we examine the neural organization supporting such links by using fMRI to identify regions in which activity during speech production is modulated according to whether auditory feedback matches the predicted outcome or not, and examining the overlap with the network recruited during passive listening to speech sounds. We use real-time signal processing to compare brain activity when participants whispered a consonant-vowel-consonant word (‘Ted’) and either heard this clearly, or heard voice-gated masking noise. We compare this to when they listened to yoked stimuli (identical recordings of ‘Ted’ or noise) without speaking. Activity along the superior temporal sulcus (STS) and superior temporal gyrus (STG) bilaterally was significantly greater if the auditory stimulus was a) processed as the auditory concomitant of speaking and b) did not match the predicted outcome (noise). The network exhibiting this Feedback type by Production/Perception interaction includes an STG/MTG region that is activated more when listening to speech than to noise. This is consistent with speech production and speech perception being linked in a control system that predicts the sensory outcome of speech acts, and that processes an error signal in speech-sensitive regions when this and the sensory data do not match. PMID:19642886
An efficient robust sound classification algorithm for hearing aids.

PubMed

Nordqvist, Peter; Leijon, Arne

2004-06-01

An efficient robust sound classification algorithm based on hidden Markov models is presented. The system would enable a hearing aid to automatically change its behavior for differing listening environments according to the user's preferences. This work attempts to distinguish between three listening environment categories: speech in traffic noise, speech in babble, and clean speech, regardless of the signal-to-noise ratio. The classifier uses only the modulation characteristics of the signal. The classifier ignores the absolute sound pressure level and the absolute spectrum shape, resulting in an algorithm that is robust against irrelevant acoustic variations. The measured classification hit rate was 96.7%-99.5% when the classifier was tested with sounds representing one of the three environment categories included in the classifier. False-alarm rates were 0.2%-1.7% in these tests. The algorithm is robust and efficient and consumes a small amount of instructions and memory. It is fully possible to implement the classifier in a DSP-based hearing instrument.
Characterizing Speech Intelligibility in Noise After Wide Dynamic Range Compression.

PubMed

Rhebergen, Koenraad S; Maalderink, Thijs H; Dreschler, Wouter A

The effects of nonlinear signal processing on speech intelligibility in noise are difficult to evaluate. Often, the effects are examined by comparing speech intelligibility scores with and without processing measured at fixed signal to noise ratios (SNRs) or by comparing the adaptive measured speech reception thresholds corresponding to 50% intelligibility (SRT50) with and without processing. These outcome measures might not be optimal. Measuring at fixed SNRs can be affected by ceiling or floor effects, because the range of relevant SNRs is not know in advance. The SRT50 is less time consuming, has a fixed performance level (i.e., 50% correct), but the SRT50 could give a limited view, because we hypothesize that the effect of most nonlinear signal processing algorithms at the SRT50 cannot be generalized to other points of the psychometric function. In this article, we tested the value of estimating the entire psychometric function. We studied the effect of wide dynamic range compression (WDRC) on speech intelligibility in stationary, and interrupted speech-shaped noise in normal-hearing subjects, using a fast method-based local linear fitting approach and by two adaptive procedures. The measured performance differences for conditions with and without WDRC for the psychometric functions in stationary noise and interrupted speech-shaped noise show that the effects of WDRC on speech intelligibility are SNR dependent. We conclude that favorable and unfavorable effects of WDRC on speech intelligibility can be missed if the results are presented in terms of SRT50 values only.
Research in speech communication.

PubMed Central

Flanagan, J

1995-01-01

Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker. Images Fig. 1 Fig. 2 Fig. 5 Fig. 8 Fig. 11 Fig. 12 Fig. 13 PMID:7479806
Common features of fluency-evoking conditions studied in stuttering subjects and controls: an H(2)15O PET study.

PubMed

Stager, Sheila V; Jeffries, Keith J; Braun, Allen R

2003-01-01

We used H(2)15O PET to characterize the common features of two successful but markedly different fluency-evoking conditions -- paced speech and singing -- in order to identify brain mechanisms that enable fluent speech in people who stutter. To do so, we compared responses under fluency-evoking conditions with responses elicited by tasks that typically elicit dysfluent speech (quantifying the degree of stuttering and using this measure as a confounding covariate in our analyses). We evaluated task-related activations in both stuttering subjects and age- and gender-matched controls. Areas that were either uniquely activated during fluency-evoking conditions, or in which the magnitude of activation was significantly greater during fluency-evoking than dysfluency-evoking tasks included auditory association areas that process speech and voice and motor regions related to control of the larynx and oral articulators. This suggests that a common fluency-evoking mechanism might relate to more effective coupling of auditory and motor systems -- that is, more efficient self-monitoring, allowing motor areas to more effectively modify speech. These effects were seen in both PWS and controls, suggesting that they are due to the sensorimotor or cognitive demands of the fluency-evoking tasks themselves. While responses seen in both groups were bilateral, however, the fluency-evoking tasks elicited more robust activation of auditory and motor regions within the left hemisphere of stuttering subjects, suggesting a role for the left hemisphere in compensatory processes that enable fluency. The reader will learn about and be able to: (1) compare brain activation patterns under fluency- and dysfluency-evoking conditions in stuttering and control subjects; (2) appraise the common features, both central and peripheral, of fluency-evoking conditions; and (3) discuss ways in which neuroimaging methods can be used to understand the pathophysiology of stuttering.
A Model for Speech Processing in Second Language Listening Activities

ERIC Educational Resources Information Center

Zoghbor, Wafa Shahada

2016-01-01

Teachers' understanding of the process of speech perception could inform practice in listening classrooms. Catford (1950) developed a model for speech perception taking into account the influence of the acoustic features of the linguistic forms used by the speaker, whereby the listener "identifies" and "interprets" these…
Positron Emission Tomography in Cochlear Implant and Auditory Brainstem Implant Recipients.

ERIC Educational Resources Information Center

Miyamoto, Richard T.; Wong, Donald

2001-01-01

Positron emission tomography imaging was used to evaluate the brain's response to auditory stimulation, including speech, in deaf adults (five with cochlear implants and one with an auditory brainstem implant). Functional speech processing was associated with activation in areas classically associated with speech processing. (Contains five…
Differences in Talker Recognition by Preschoolers and Adults

ERIC Educational Resources Information Center

Creel, Sarah C.; Jimenez, Sofia R.

2012-01-01

Talker variability in speech influences language processing from infancy through adulthood and is inextricably embedded in the very cues that identify speech sounds. Yet little is known about developmental changes in the processing of talker information. On one account, children have not yet learned to separate speech sound variability from…
Masked Speech Recognition and Reading Ability in School-Age Children: Is There a Relationship?

ERIC Educational Resources Information Center

Miller, Gabrielle; Lewis, Barbara; Benchek, Penelope; Buss, Emily; Calandruccio, Lauren

2018-01-01

Purpose: The relationship between reading (decoding) skills, phonological processing abilities, and masked speech recognition in typically developing children was explored. This experiment was designed to evaluate the relationship between phonological processing and decoding abilities and 2 aspects of masked speech recognition in typically…
Status Report on Speech Research, No. 27, July-September 1971.

ERIC Educational Resources Information Center

Haskins Labs., New Haven, CT.

This report contains fourteen papers on a wide range of current topics and experiments in speech research, ranging from the relationship between speech and reading to questions of memory and perception of speech sounds. The following papers are included: "How Is Language Conveyed by Speech?;""Reading, the Linguistic Process, and Linguistic…
Speech and Language Skills of Parents of Children with Speech Sound Disorders

ERIC Educational Resources Information Center

Lewis, Barbara A.; Freebairn, Lisa A.; Hansen, Amy J.; Miscimarra, Lara; Iyengar, Sudha K.; Taylor, H. Gerry

2007-01-01

Purpose: This study compared parents with histories of speech sound disorders (SSD) to parents without known histories on measures of speech sound production, phonological processing, language, reading, and spelling. Familial aggregation for speech and language disorders was also examined. Method: The participants were 147 parents of children with…
Musical intervention enhances infants’ neural processing of temporal structure in music and speech

PubMed Central

Zhao, T. Christina; Kuhl, Patricia K.

2016-01-01

Individuals with music training in early childhood show enhanced processing of musical sounds, an effect that generalizes to speech processing. However, the conclusions drawn from previous studies are limited due to the possible confounds of predisposition and other factors affecting musicians and nonmusicians. We used a randomized design to test the effects of a laboratory-controlled music intervention on young infants’ neural processing of music and speech. Nine-month-old infants were randomly assigned to music (intervention) or play (control) activities for 12 sessions. The intervention targeted temporal structure learning using triple meter in music (e.g., waltz), which is difficult for infants, and it incorporated key characteristics of typical infant music classes to maximize learning (e.g., multimodal, social, and repetitive experiences). Controls had similar multimodal, social, repetitive play, but without music. Upon completion, infants’ neural processing of temporal structure was tested in both music (tones in triple meter) and speech (foreign syllable structure). Infants’ neural processing was quantified by the mismatch response (MMR) measured with a traditional oddball paradigm using magnetoencephalography (MEG). The intervention group exhibited significantly larger MMRs in response to music temporal structure violations in both auditory and prefrontal cortical regions. Identical results were obtained for temporal structure changes in speech. The intervention thus enhanced temporal structure processing not only in music, but also in speech, at 9 mo of age. We argue that the intervention enhanced infants’ ability to extract temporal structure information and to predict future events in time, a skill affecting both music and speech processing. PMID:27114512
Musical intervention enhances infants' neural processing of temporal structure in music and speech.

PubMed

Zhao, T Christina; Kuhl, Patricia K

2016-05-10

Individuals with music training in early childhood show enhanced processing of musical sounds, an effect that generalizes to speech processing. However, the conclusions drawn from previous studies are limited due to the possible confounds of predisposition and other factors affecting musicians and nonmusicians. We used a randomized design to test the effects of a laboratory-controlled music intervention on young infants' neural processing of music and speech. Nine-month-old infants were randomly assigned to music (intervention) or play (control) activities for 12 sessions. The intervention targeted temporal structure learning using triple meter in music (e.g., waltz), which is difficult for infants, and it incorporated key characteristics of typical infant music classes to maximize learning (e.g., multimodal, social, and repetitive experiences). Controls had similar multimodal, social, repetitive play, but without music. Upon completion, infants' neural processing of temporal structure was tested in both music (tones in triple meter) and speech (foreign syllable structure). Infants' neural processing was quantified by the mismatch response (MMR) measured with a traditional oddball paradigm using magnetoencephalography (MEG). The intervention group exhibited significantly larger MMRs in response to music temporal structure violations in both auditory and prefrontal cortical regions. Identical results were obtained for temporal structure changes in speech. The intervention thus enhanced temporal structure processing not only in music, but also in speech, at 9 mo of age. We argue that the intervention enhanced infants' ability to extract temporal structure information and to predict future events in time, a skill affecting both music and speech processing.

How Does Reading Performance Modulate the Impact of Orthographic Knowledge on Speech Processing? A Comparison of Normal Readers and Dyslexic Adults

ERIC Educational Resources Information Center

Pattamadilok, Chotiga; Nelis, Aubéline; Kolinsky, Régine

2014-01-01

Studies on proficient readers showed that speech processing is affected by knowledge of the orthographic code. Yet, the automaticity of the orthographic influence depends on task demand. Here, we addressed this automaticity issue in normal and dyslexic adult readers by comparing the orthographic effects obtained in two speech processing tasks that…
The Relationship between Spoken Language and Speech and Nonspeech Processing in Children with Autism: A Magnetic Event-Related Field Study

ERIC Educational Resources Information Center

Yau, Shu Hui; Brock, Jon; McArthur, Genevieve

2016-01-01

It has been proposed that language impairments in children with Autism Spectrum Disorders (ASD) stem from atypical neural processing of speech and/or nonspeech sounds. However, the strength of this proposal is compromised by the unreliable outcomes of previous studies of speech and nonspeech processing in ASD. The aim of this study was to…
The Role of Audiovisual Speech in the Early Stages of Lexical Processing as Revealed by the ERP Word Repetition Effect

ERIC Educational Resources Information Center

Basirat, Anahita; Brunellière, Angèle; Hartsuiker, Robert

2018-01-01

Numerous studies suggest that audiovisual speech influences lexical processing. However, it is not clear which stages of lexical processing are modulated by audiovisual speech. In this study, we examined the time course of the access to word representations in long-term memory when they were presented in auditory-only and audiovisual modalities.…
The prevalence of stuttering, voice, and speech-sound disorders in primary school students in Australia.

PubMed

McKinnon, David H; McLeod, Sharynne; Reilly, Sheena

2007-01-01

The aims of this study were threefold: to report teachers' estimates of the prevalence of speech disorders (specifically, stuttering, voice, and speech-sound disorders); to consider correspondence between the prevalence of speech disorders and gender, grade level, and socioeconomic status; and to describe the level of support provided to schoolchildren with speech disorders. Students with speech disorders were identified from 10,425 students in Australia using a 4-stage process: training in the data collection process, teacher identification, confirmation by a speech-language pathologist, and consultation with district special needs advisors. The prevalence of students with speech disorders was estimated; specifically, 0.33% of students were identified as stuttering, 0.12% as having a voice disorder, and 1.06% as having a speech-sound disorder. There was a higher prevalence of speech disorders in males than in females. As grade level increased, the prevalence of speech disorders decreased. There was no significant difference in the pattern of prevalence across the three speech disorders and four socioeconomic groups; however, students who were identified with a speech disorder were more likely to be in the higher socioeconomic groups. Finally, there was a difference between the perceived and actual level of support that was provided to these students. These prevalence figures are lower than those using initial identification by speech-language pathologists and similar to those using parent report.
Role of contextual cues on the perception of spectrally reduced interrupted speech.

PubMed

Patro, Chhayakanta; Mendel, Lisa Lucks

2016-08-01

Understanding speech within an auditory scene is constantly challenged by interfering noise in suboptimal listening environments when noise hinders the continuity of the speech stream. In such instances, a typical auditory-cognitive system perceptually integrates available speech information and "fills in" missing information in the light of semantic context. However, individuals with cochlear implants (CIs) find it difficult and effortful to understand interrupted speech compared to their normal hearing counterparts. This inefficiency in perceptual integration of speech could be attributed to further degradations in the spectral-temporal domain imposed by CIs making it difficult to utilize the contextual evidence effectively. To address these issues, 20 normal hearing adults listened to speech that was spectrally reduced and spectrally reduced interrupted in a manner similar to CI processing. The Revised Speech Perception in Noise test, which includes contextually rich and contextually poor sentences, was used to evaluate the influence of semantic context on speech perception. Results indicated that listeners benefited more from semantic context when they listened to spectrally reduced speech alone. For the spectrally reduced interrupted speech, contextual information was not as helpful under significant spectral reductions, but became beneficial as the spectral resolution improved. These results suggest top-down processing facilitates speech perception up to a point, and it fails to facilitate speech understanding when the speech signals are significantly degraded.
The Auditory-Brainstem Response to Continuous, Non-repetitive Speech Is Modulated by the Speech Envelope and Reflects Speech Processing

PubMed Central

Reichenbach, Chagit S.; Braiman, Chananel; Schiff, Nicholas D.; Hudspeth, A. J.; Reichenbach, Tobias

2016-01-01

The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the ABR is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function. PMID:27303286
Cognitive Spare Capacity and Speech Communication: A Narrative Overview

PubMed Central

2014-01-01

Background noise can make speech communication tiring and cognitively taxing, especially for individuals with hearing impairment. It is now well established that better working memory capacity is associated with better ability to understand speech under adverse conditions as well as better ability to benefit from the advanced signal processing in modern hearing aids. Recent work has shown that although such processing cannot overcome hearing handicap, it can increase cognitive spare capacity, that is, the ability to engage in higher level processing of speech. This paper surveys recent work on cognitive spare capacity and suggests new avenues of investigation. PMID:24971355
Processing of speech and non-speech stimuli in children with specific language impairment

NASA Astrophysics Data System (ADS)

Basu, Madhavi L.; Surprenant, Aimee M.

2003-10-01

Specific Language Impairment (SLI) is a developmental language disorder in which children demonstrate varying degrees of difficulties in acquiring a spoken language. One possible underlying cause is that children with SLI have deficits in processing sounds that are of short duration or when they are presented rapidly. Studies so far have compared their performance on speech and nonspeech sounds of unequal complexity. Hence, it is still unclear whether the deficit is specific to the perception of speech sounds or whether it more generally affects the auditory function. The current study aims to answer this question by comparing the performance of children with SLI on speech and nonspeech sounds synthesized from sine-wave stimuli. The children will be tested using the classic categorical perception paradigm that includes both the identification and discrimination of stimuli along a continuum. If there is a deficit in the performance on both speech and nonspeech tasks, it will show that these children have a deficit in processing complex sounds. Poor performance on only the speech sounds will indicate that the deficit is more related to language. The findings will offer insights into the exact nature of the speech perception deficits in children with SLI. [Work supported by ASHF.
The pupil response is sensitive to divided attention during speech processing.

PubMed

Koelewijn, Thomas; Shinn-Cunningham, Barbara G; Zekveld, Adriana A; Kramer, Sophia E

2014-06-01

Dividing attention over two streams of speech strongly decreases performance compared to focusing on only one. How divided attention affects cognitive processing load as indexed with pupillometry during speech recognition has so far not been investigated. In 12 young adults the pupil response was recorded while they focused on either one or both of two sentences that were presented dichotically and masked by fluctuating noise across a range of signal-to-noise ratios. In line with previous studies, the performance decreases when processing two target sentences instead of one. Additionally, dividing attention to process two sentences caused larger pupil dilation and later peak pupil latency than processing only one. This suggests an effect of attention on cognitive processing load (pupil dilation) during speech processing in noise. Copyright © 2014 The Authors. Published by Elsevier B.V. All rights reserved.
Cognitive processing specificity of anxious apprehension: impact on distress and performance during speech exposure.

PubMed

Philippot, Pierre; Vrielynck, Nathalie; Muller, Valérie

2010-12-01

The present study examined the impact of different modes of processing anxious apprehension on subsequent anxiety and performance in a stressful speech task. Participants were informed that they would have to give a speech on a difficult topic while being videotaped and evaluated on their performance. They were then randomly assigned to one of three conditions. In a specific processing condition, they were encouraged to explore in detail all the specific aspects (thoughts, emotions, sensations) they experienced while anticipating giving the speech; in a general processing condition, they had to focus on the generic aspects that they would typically experience during anxious anticipation; and in a control, no-processing condition, participants were distracted. Results revealed that at the end of the speech, participants in the specific processing condition reported less anxiety than those in the two other conditions. They were also evaluated by judges to have performed better than those in the control condition, who in turn did better than those in the general processing condition. Copyright © 2010. Published by Elsevier Ltd.
Neural integration of iconic and unrelated coverbal gestures: a functional MRI study.

PubMed

Green, Antonia; Straube, Benjamin; Weis, Susanne; Jansen, Andreas; Willmes, Klaus; Konrad, Kerstin; Kircher, Tilo

2009-10-01

Gestures are an important part of interpersonal communication, for example by illustrating physical properties of speech contents (e.g., "the ball is round"). The meaning of these so-called iconic gestures is strongly intertwined with speech. We investigated the neural correlates of the semantic integration for verbal and gestural information. Participants watched short videos of five speech and gesture conditions performed by an actor, including variation of language (familiar German vs. unfamiliar Russian), variation of gesture (iconic vs. unrelated), as well as isolated familiar language, while brain activation was measured using functional magnetic resonance imaging. For familiar speech with either of both gesture types contrasted to Russian speech-gesture pairs, activation increases were observed at the left temporo-occipital junction. Apart from this shared location, speech with iconic gestures exclusively engaged left occipital areas, whereas speech with unrelated gestures activated bilateral parietal and posterior temporal regions. Our results demonstrate that the processing of speech with speech-related versus speech-unrelated gestures occurs in two distinct but partly overlapping networks. The distinct processing streams (visual versus linguistic/spatial) are interpreted in terms of "auxiliary systems" allowing the integration of speech and gesture in the left temporo-occipital region.
Vocal Pitch Discrimination in the Motor System

ERIC Educational Resources Information Center

D'Ausilio, Alessandro; Bufalari, Ilaria; Salmas, Paola; Busan, Pierpaolo; Fadiga, Luciano

2011-01-01

Speech production can be broadly separated into two distinct components: Phonation and Articulation. These two aspects require the efficient control of several phono-articulatory effectors. Speech is indeed generated by the vibration of the vocal-folds in the larynx (F0) followed by "filtering" by articulators, to select certain resonant…
The effect of hearing aid technologies on listening in an automobile.

PubMed

Wu, Yu-Hsiang; Stangl, Elizabeth; Bentler, Ruth A; Stanziola, Rachel W

2013-06-01

Communication while traveling in an automobile often is very difficult for hearing aid users. This is because the automobile/road noise level is usually high, and listeners/drivers often do not have access to visual cues. Since the talker of interest usually is not located in front of the listener/driver, conventional directional processing that places the directivity beam toward the listener's front may not be helpful and, in fact, could have a negative impact on speech recognition (when compared to omnidirectional processing). Recently, technologies have become available in commercial hearing aids that are designed to improve speech recognition and/or listening effort in noisy conditions where talkers are located behind or beside the listener. These technologies include (1) a directional microphone system that uses a backward-facing directivity pattern (Back-DIR processing), (2) a technology that transmits audio signals from the ear with the better signal-to-noise ratio (SNR) to the ear with the poorer SNR (Side-Transmission processing), and (3) a signal processing scheme that suppresses the noise at the ear with the poorer SNR (Side-Suppression processing). The purpose of the current study was to determine the effect of (1) conventional directional microphones and (2) newer signal processing schemes (Back-DIR, Side-Transmission, and Side-Suppression) on listener's speech recognition performance and preference for communication in a traveling automobile. A single-blinded, repeated-measures design was used. Twenty-five adults with bilateral symmetrical sensorineural hearing loss aged 44 through 84 yr participated in the study. The automobile/road noise and sentences of the Connected Speech Test (CST) were recorded through hearing aids in a standard van moving at a speed of 70 mph on a paved highway. The hearing aids were programmed to omnidirectional microphone, conventional adaptive directional microphone, and the three newer schemes. CST sentences were presented from the side and back of the hearing aids, which were placed on the ears of a manikin. The recorded stimuli were presented to listeners via earphones in a sound-treated booth to assess speech recognition performance and preference with each programmed condition. Compared to omnidirectional microphones, conventional adaptive directional processing had a detrimental effect on speech recognition when speech was presented from the back or side of the listener. Back-DIR and Side-Transmission processing improved speech recognition performance (relative to both omnidirectional and adaptive directional processing) when speech was from the back and side, respectively. The performance with Side-Suppression processing was better than with adaptive directional processing when speech was from the side. The participants' preferences for a given processing scheme were generally consistent with speech recognition results. The finding that performance with adaptive directional processing was poorer than with omnidirectional microphones demonstrates the importance of selecting the correct microphone technology for different listening situations. The results also suggest the feasibility of using hearing aid technologies to provide a better listening experience for hearing aid users in automobiles. American Academy of Audiology.
V2S: Voice to Sign Language Translation System for Malaysian Deaf People

NASA Astrophysics Data System (ADS)

Mean Foong, Oi; Low, Tang Jung; La, Wai Wan

The process of learning and understand the sign language may be cumbersome to some, and therefore, this paper proposes a solution to this problem by providing a voice (English Language) to sign language translation system using Speech and Image processing technique. Speech processing which includes Speech Recognition is the study of recognizing the words being spoken, regardless of whom the speaker is. This project uses template-based recognition as the main approach in which the V2S system first needs to be trained with speech pattern based on some generic spectral parameter set. These spectral parameter set will then be stored as template in a database. The system will perform the recognition process through matching the parameter set of the input speech with the stored templates to finally display the sign language in video format. Empirical results show that the system has 80.3% recognition rate.
A dissociation of objective and subjective workload measures in assessing the impact of speech controls in advanced helicopters

NASA Technical Reports Server (NTRS)

Vidulich, Michael A.; Bortolussi, Michael R.

1988-01-01

Among the new technologies that are expected to aid helicopter designers are speech controls. Proponents suggest that speech controls could reduce the potential for manual control overloads and improve time-sharing performance in environments that have heavy demands for manual control. This was tested in a simulation of an advanced single-pilot, scout/attack helicopter. Objective performance indicated that the speech controls were effective in decreasing the interference of discrete responses during moments of heavy flight control activity. However, subjective ratings indicated that the use of speech controls required extra effort to speak precisely and to attend to feedback. Although the operational reliability of speech controls must be improved, the present results indicate that reliable speech controls could enhance the time-sharing efficiency of helicopter pilots. Furthermore, the results demonstrated the importance of using multiple assessment techniques to completely assess a task. Neither the objective nor the subjective measures alone provided complete information. It was the contrast between the measures that was most informative.
A special purpose knowledge-based face localization method

NASA Astrophysics Data System (ADS)

Hassanat, Ahmad; Jassim, Sabah

2008-04-01

This paper is concerned with face localization for visual speech recognition (VSR) system. Face detection and localization have got a great deal of attention in the last few years, because it is an essential pre-processing step in many techniques that handle or deal with faces, (e.g. age, face, gender, race and visual speech recognition). We shall present an efficient method for localization human's faces in video images captured on mobile constrained devices, under a wide variation in lighting conditions. We use a multiphase method that may include all or some of the following steps starting with image pre-processing, followed by a special purpose edge detection, then an image refinement step. The output image will be passed through a discrete wavelet decomposition procedure, and the computed LL sub-band at a certain level will be transformed into a binary image that will be scanned by using a special template to select a number of possible candidate locations. Finally, we fuse the scores from the wavelet step with scores determined by color information for the candidate location and employ a form of fuzzy logic to distinguish face from non-face locations. We shall present results of large number of experiments to demonstrate that the proposed face localization method is efficient and achieve high level of accuracy that outperforms existing general-purpose face detection methods.
The Mechanism of Speech Processing in Congenital Amusia: Evidence from Mandarin Speakers

PubMed Central

Liu, Fang; Jiang, Cunmei; Thompson, William Forde; Xu, Yi; Yang, Yufang; Stewart, Lauren

2012-01-01

Congenital amusia is a neuro-developmental disorder of pitch perception that causes severe problems with music processing but only subtle difficulties in speech processing. This study investigated speech processing in a group of Mandarin speakers with congenital amusia. Thirteen Mandarin amusics and thirteen matched controls participated in a set of tone and intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance on word discrimination in natural speech and their gliding tone analogs. They also performed worse than controls on discriminating gliding tone sequences derived from statements and questions, and showed elevated thresholds for pitch change detection and pitch direction discrimination. However, they performed as well as controls on word identification, and on statement-question identification and discrimination in natural speech. Overall, tasks that involved multiple acoustic cues to communicative meaning were not impacted by amusia. Only when the tasks relied mainly on pitch sensitivity did amusics show impaired performance compared to controls. These findings help explain why amusia only affects speech processing in subtle ways. Further studies on a larger sample of Mandarin amusics and on amusics of other language backgrounds are needed to consolidate these results. PMID:22347374
Relationship between individual differences in speech processing and cognitive functions.

PubMed

Ou, Jinghua; Law, Sam-Po; Fung, Roxana

2015-12-01

A growing body of research has suggested that cognitive abilities may play a role in individual differences in speech processing. The present study took advantage of a widespread linguistic phenomenon of sound change to systematically assess the relationships between speech processing and various components of attention and working memory in the auditory and visual modalities among typically developed Cantonese-speaking individuals. The individual variations in speech processing are captured in an ongoing sound change-tone merging in Hong Kong Cantonese, in which typically developed native speakers are reported to lose the distinctions between some tonal contrasts in perception and/or production. Three groups of participants were recruited, with a first group of good perception and production, a second group of good perception but poor production, and a third group of good production but poor perception. Our findings revealed that modality-independent abilities of attentional switching/control and working memory might contribute to individual differences in patterns of speech perception and production as well as discrimination latencies among typically developed speakers. The findings not only have the potential to generalize to speech processing in other languages, but also broaden our understanding of the omnipresent phenomenon of language change in all languages.
The mechanism of speech processing in congenital amusia: evidence from Mandarin speakers.

PubMed

Liu, Fang; Jiang, Cunmei; Thompson, William Forde; Xu, Yi; Yang, Yufang; Stewart, Lauren

2012-01-01

Congenital amusia is a neuro-developmental disorder of pitch perception that causes severe problems with music processing but only subtle difficulties in speech processing. This study investigated speech processing in a group of Mandarin speakers with congenital amusia. Thirteen Mandarin amusics and thirteen matched controls participated in a set of tone and intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance on word discrimination in natural speech and their gliding tone analogs. They also performed worse than controls on discriminating gliding tone sequences derived from statements and questions, and showed elevated thresholds for pitch change detection and pitch direction discrimination. However, they performed as well as controls on word identification, and on statement-question identification and discrimination in natural speech. Overall, tasks that involved multiple acoustic cues to communicative meaning were not impacted by amusia. Only when the tasks relied mainly on pitch sensitivity did amusics show impaired performance compared to controls. These findings help explain why amusia only affects speech processing in subtle ways. Further studies on a larger sample of Mandarin amusics and on amusics of other language backgrounds are needed to consolidate these results.
The Effect of Lexical Content on Dichotic Speech Recognition in Older Adults.

PubMed

Findlen, Ursula M; Roup, Christina M

2016-01-01

Age-related auditory processing deficits have been shown to negatively affect speech recognition for older adult listeners. In contrast, older adults gain benefit from their ability to make use of semantic and lexical content of the speech signal (i.e., top-down processing), particularly in complex listening situations. Assessment of auditory processing abilities among aging adults should take into consideration semantic and lexical content of the speech signal. The purpose of this study was to examine the effects of lexical and attentional factors on dichotic speech recognition performance characteristics for older adult listeners. A repeated measures design was used to examine differences in dichotic word recognition as a function of lexical and attentional factors. Thirty-five older adults (61-85 yr) with sensorineural hearing loss participated in this study. Dichotic speech recognition was evaluated using consonant-vowel-consonant (CVC) word and nonsense CVC syllable stimuli administered in the free recall, directed recall right, and directed recall left response conditions. Dichotic speech recognition performance for nonsense CVC syllables was significantly poorer than performance for CVC words. Dichotic recognition performance varied across response condition for both stimulus types, which is consistent with previous studies on dichotic speech recognition. Inspection of individual results revealed that five listeners demonstrated an auditory-based left ear deficit for one or both stimulus types. Lexical content of stimulus materials affects performance characteristics for dichotic speech recognition tasks in the older adult population. The use of nonsense CVC syllable material may provide a way to assess dichotic speech recognition performance while potentially lessening the effects of lexical content on performance (i.e., measuring bottom-up auditory function both with and without top-down processing). American Academy of Audiology.

Influence of musical training on understanding voiced and whispered speech in noise.

PubMed

Ruggles, Dorea R; Freyman, Richard L; Oxenham, Andrew J

2014-01-01

This study tested the hypothesis that the previously reported advantage of musicians over non-musicians in understanding speech in noise arises from more efficient or robust coding of periodic voiced speech, particularly in fluctuating backgrounds. Speech intelligibility was measured in listeners with extensive musical training, and in those with very little musical training or experience, using normal (voiced) or whispered (unvoiced) grammatically correct nonsense sentences in noise that was spectrally shaped to match the long-term spectrum of the speech, and was either continuous or gated with a 16-Hz square wave. Performance was also measured in clinical speech-in-noise tests and in pitch discrimination. Musicians exhibited enhanced pitch discrimination, as expected. However, no systematic or statistically significant advantage for musicians over non-musicians was found in understanding either voiced or whispered sentences in either continuous or gated noise. Musicians also showed no statistically significant advantage in the clinical speech-in-noise tests. Overall, the results provide no evidence for a significant difference between young adult musicians and non-musicians in their ability to understand speech in noise.
Cleft Audit Protocol for Speech (CAPS-A): A Comprehensive Training Package for Speech Analysis

ERIC Educational Resources Information Center

Sell, D.; John, A.; Harding-Bell, A.; Sweeney, T.; Hegarty, F.; Freeman, J.

2009-01-01

Background: The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been…
Speech processing and production in two-year-old children acquiring isiXhosa: A tale of two children

PubMed Central

Rossouw, Kate; Fish, Laura; Jansen, Charne; Manley, Natalie; Powell, Michelle; Rosen, Loren

2016-01-01

We investigated the speech processing and production of 2-year-old children acquiring isiXhosa in South Africa. Two children (2 years, 5 months; 2 years, 8 months) are presented as single cases. Speech input processing, stored phonological knowledge and speech output are described, based on data from auditory discrimination, naming, and repetition tasks. Both children were approximating adult levels of accuracy in their speech output, although naming was constrained by vocabulary. Performance across tasks was variable: One child showed a relative strength with repetition, and experienced most difficulties with auditory discrimination. The other performed equally well in naming and repetition, and obtained 100% for her auditory task. There is limited data regarding typical development of isiXhosa, and the focus has mainly been on speech production. This exploratory study describes typical development of isiXhosa using a variety of tasks understood within a psycholinguistic framework. We describe some ways in which speech and language therapists can devise and carry out assessment with children in situations where few formal assessments exist, and also detail the challenges of such work. PMID:27245131
Perceptual learning for speech in noise after application of binary time-frequency masks

PubMed Central

Ahmadi, Mahnaz; Gross, Vauna L.; Sinex, Donal G.

2013-01-01

Ideal time-frequency (TF) masks can reject noise and improve the recognition of speech-noise mixtures. An ideal TF mask is constructed with prior knowledge of the target speech signal. The intelligibility of a processed speech-noise mixture depends upon the threshold criterion used to define the TF mask. The study reported here assessed the effect of training on the recognition of speech in noise after processing by ideal TF masks that did not restore perfect speech intelligibility. Two groups of listeners with normal hearing listened to speech-noise mixtures processed by TF masks calculated with different threshold criteria. For each group, a threshold criterion that initially produced word recognition scores between 0.56–0.69 was chosen for training. Listeners practiced with one set of TF-masked sentences until their word recognition performance approached asymptote. Perceptual learning was quantified by comparing word-recognition scores in the first and last training sessions. Word recognition scores improved with practice for all listeners with the greatest improvement observed for the same materials used in training. PMID:23464038
Early infant diet and ERP Correlates of Speech Stimuli Discrimination in 9 month old infants

USDA-ARS?s Scientific Manuscript database

Processing and discrimination of speech stimuli were examined during the initial period of weaning in infants enrolled in a longitudinal study of infant diet and development (the Beginnings Study). Event-related potential measures (ERP; 128 sites) were used to compare the processing of speech stimul...
Brainstem Transcription of Speech Is Disrupted in Children with Autism Spectrum Disorders

ERIC Educational Resources Information Center

Russo, Nicole; Nicol, Trent; Trommer, Barbara; Zecker, Steve; Kraus, Nina

2009-01-01

Language impairment is a hallmark of autism spectrum disorders (ASD). The origin of the deficit is poorly understood although deficiencies in auditory processing have been detected in both perception and cortical encoding of speech sounds. Little is known about the processing and transcription of speech sounds at earlier (brainstem) levels or…
Role of Visual Speech in Phonological Processing by Children with Hearing Loss

ERIC Educational Resources Information Center

Jerger, Susan; Tye-Murray, Nancy; Abdi, Herve

2009-01-01

Purpose: This research assessed the influence of visual speech on phonological processing by children with hearing loss (HL). Method: Children with HL and children with normal hearing (NH) named pictures while attempting to ignore auditory or audiovisual speech distractors whose onsets relative to the pictures were either congruent, conflicting in…
Hemispheric Differences in Processing Dichotic Meaningful and Non-Meaningful Words

ERIC Educational Resources Information Center

Yasin, Ifat

2007-01-01

Classic dichotic-listening paradigms reveal a right-ear advantage (REA) for speech sounds as compared to non-speech sounds. This REA is assumed to be associated with a left-hemisphere dominance for meaningful speech processing. This study objectively probed the relationship between ear advantage and hemispheric dominance in a dichotic-listening…
Application of an auditory model to speech recognition.

PubMed

Cohen, J R

1989-06-01

Some aspects of auditory processing are incorporated in a front end for the IBM speech-recognition system [F. Jelinek, "Continuous speech recognition by statistical methods," Proc. IEEE 64 (4), 532-556 (1976)]. This new process includes adaptation, loudness scaling, and mel warping. Tests show that the design is an improvement over previous algorithms.
The Functional Neuroanatomy of Prelexical Processing in Speech Perception

ERIC Educational Resources Information Center

Scott, Sophie K.; Wise, Richard J. S.

2004-01-01

In this paper we attempt to relate the prelexical processing of speech, with particular emphasis on functional neuroimaging studies, to the study of auditory perceptual systems by disciplines in the speech and hearing sciences. The elaboration of the sound-to-meaning pathways in the human brain enables their integration into models of the human…
L'indirection: Procede d'expression et de persuasion en communication publique (Indirection: Process of Expression and Persuasion in Public Communication).

ERIC Educational Resources Information Center

Gauthier, Gilles

2001-01-01

Focuses on the indirection process presented in Searle's and Vanderveken's theory of speech acts: the performance of a primary speech act by means of the accomplishment of a secondary speech act. Discusses indirection mechanisms used in advertising and in political communication. (Author/VWL)
Audiovisual Matching in Speech and Nonspeech Sounds: A Neurodynamical Model

ERIC Educational Resources Information Center

Loh, Marco; Schmid, Gabriele; Deco, Gustavo; Ziegler, Wolfram

2010-01-01

Audiovisual speech perception provides an opportunity to investigate the mechanisms underlying multimodal processing. By using nonspeech stimuli, it is possible to investigate the degree to which audiovisual processing is specific to the speech domain. It has been shown in a match-to-sample design that matching across modalities is more difficult…
Bridging music and speech rhythm: rhythmic priming and audio-motor training affect speech perception.

PubMed

Cason, Nia; Astésano, Corine; Schön, Daniele

2015-02-01

Following findings that musical rhythmic priming enhances subsequent speech perception, we investigated whether rhythmic priming for spoken sentences can enhance phonological processing - the building blocks of speech - and whether audio-motor training enhances this effect. Participants heard a metrical prime followed by a sentence (with a matching/mismatching prosodic structure), for which they performed a phoneme detection task. Behavioural (RT) data was collected from two groups: one who received audio-motor training, and one who did not. We hypothesised that 1) phonological processing would be enhanced in matching conditions, and 2) audio-motor training with the musical rhythms would enhance this effect. Indeed, providing a matching rhythmic prime context resulted in faster phoneme detection, thus revealing a cross-domain effect of musical rhythm on phonological processing. In addition, our results indicate that rhythmic audio-motor training enhances this priming effect. These results have important implications for rhythm-based speech therapies, and suggest that metrical rhythm in music and speech may rely on shared temporal processing brain resources. Copyright © 2015 Elsevier B.V. All rights reserved.
Six characteristics of effective structured reporting and the inevitable integration with speech recognition.

PubMed

Liu, David; Zucherman, Mark; Tulloss, William B

2006-03-01

The reporting of radiological images is undergoing dramatic changes due to the introduction of two new technologies: structured reporting and speech recognition. Each technology has its own unique advantages. The highly organized content of structured reporting facilitates data mining and billing, whereas speech recognition offers a natural succession from the traditional dictation-transcription process. This article clarifies the distinction between the process and outcome of structured reporting, describes fundamental requirements for any effective structured reporting system, and describes the potential development of a novel, easy-to-use, customizable structured reporting system that incorporates speech recognition. This system should have all the advantages derived from structured reporting, accommodate a wide variety of user needs, and incorporate speech recognition as a natural component and extension of the overall reporting process.
Cognitive processing load during listening is reduced more by decreasing voice similarity than by increasing spatial separation between target and masker speech.

PubMed

Zekveld, Adriana A; Rudner, Mary; Kramer, Sophia E; Lyzenga, Johannes; Rönnberg, Jerker

2014-01-01

We investigated changes in speech recognition and cognitive processing load due to the masking release attributable to decreasing similarity between target and masker speech. This was achieved by using masker voices with either the same (female) gender as the target speech or different gender (male) and/or by spatially separating the target and masker speech using HRTFs. We assessed the relation between the signal-to-noise ratio required for 50% sentence intelligibility, the pupil response and cognitive abilities. We hypothesized that the pupil response, a measure of cognitive processing load, would be larger for co-located maskers and for same-gender compared to different-gender maskers. We further expected that better cognitive abilities would be associated with better speech perception and larger pupil responses as the allocation of larger capacity may result in more intense mental processing. In line with previous studies, the performance benefit from different-gender compared to same-gender maskers was larger for co-located masker signals. The performance benefit of spatially-separated maskers was larger for same-gender maskers. The pupil response was larger for same-gender than for different-gender maskers, but was not reduced by spatial separation. We observed associations between better perception performance and better working memory, better information updating, and better executive abilities when applying no corrections for multiple comparisons. The pupil response was not associated with cognitive abilities. Thus, although both gender and location differences between target and masker facilitate speech perception, only gender differences lower cognitive processing load. Presenting a more dissimilar masker may facilitate target-masker separation at a later (cognitive) processing stage than increasing the spatial separation between the target and masker. The pupil response provides information about speech perception that complements intelligibility data.
Cognitive processing load during listening is reduced more by decreasing voice similarity than by increasing spatial separation between target and masker speech

PubMed Central

Zekveld, Adriana A.; Rudner, Mary; Kramer, Sophia E.; Lyzenga, Johannes; Rönnberg, Jerker

2014-01-01

We investigated changes in speech recognition and cognitive processing load due to the masking release attributable to decreasing similarity between target and masker speech. This was achieved by using masker voices with either the same (female) gender as the target speech or different gender (male) and/or by spatially separating the target and masker speech using HRTFs. We assessed the relation between the signal-to-noise ratio required for 50% sentence intelligibility, the pupil response and cognitive abilities. We hypothesized that the pupil response, a measure of cognitive processing load, would be larger for co-located maskers and for same-gender compared to different-gender maskers. We further expected that better cognitive abilities would be associated with better speech perception and larger pupil responses as the allocation of larger capacity may result in more intense mental processing. In line with previous studies, the performance benefit from different-gender compared to same-gender maskers was larger for co-located masker signals. The performance benefit of spatially-separated maskers was larger for same-gender maskers. The pupil response was larger for same-gender than for different-gender maskers, but was not reduced by spatial separation. We observed associations between better perception performance and better working memory, better information updating, and better executive abilities when applying no corrections for multiple comparisons. The pupil response was not associated with cognitive abilities. Thus, although both gender and location differences between target and masker facilitate speech perception, only gender differences lower cognitive processing load. Presenting a more dissimilar masker may facilitate target-masker separation at a later (cognitive) processing stage than increasing the spatial separation between the target and masker. The pupil response provides information about speech perception that complements intelligibility data. PMID:24808818
The Effect of Early Visual Deprivation on the Neural Bases of Auditory Processing.

PubMed

Guerreiro, Maria J S; Putzar, Lisa; Röder, Brigitte

2016-02-03

Transient congenital visual deprivation affects visual and multisensory processing. In contrast, the extent to which it affects auditory processing has not been investigated systematically. Research in permanently blind individuals has revealed brain reorganization during auditory processing, involving both intramodal and crossmodal plasticity. The present study investigated the effect of transient congenital visual deprivation on the neural bases of auditory processing in humans. Cataract-reversal individuals and normally sighted controls performed a speech-in-noise task while undergoing functional magnetic resonance imaging. Although there were no behavioral group differences, groups differed in auditory cortical responses: in the normally sighted group, auditory cortex activation increased with increasing noise level, whereas in the cataract-reversal group, no activation difference was observed across noise levels. An auditory activation of visual cortex was not observed at the group level in cataract-reversal individuals. The present data suggest prevailing auditory processing advantages after transient congenital visual deprivation, even many years after sight restoration. The present study demonstrates that people whose sight was restored after a transient period of congenital blindness show more efficient cortical processing of auditory stimuli (here speech), similarly to what has been observed in congenitally permanently blind individuals. These results underscore the importance of early sensory experience in permanently shaping brain function. Copyright © 2016 the authors 0270-6474/16/361620-11$15.00/0.
Speech training alters consonant and vowel responses in multiple auditory cortex fields

PubMed Central

Engineer, Crystal T.; Rahebi, Kimiya C.; Buell, Elizabeth P.; Fink, Melyssa K.; Kilgard, Michael P.

2015-01-01

Speech sounds evoke unique neural activity patterns in primary auditory cortex (A1). Extensive speech sound discrimination training alters A1 responses. While the neighboring auditory cortical fields each contain information about speech sound identity, each field processes speech sounds differently. We hypothesized that while all fields would exhibit training-induced plasticity following speech training, there would be unique differences in how each field changes. In this study, rats were trained to discriminate speech sounds by consonant or vowel in quiet and in varying levels of background speech-shaped noise. Local field potential and multiunit responses were recorded from four auditory cortex fields in rats that had received 10 weeks of speech discrimination training. Our results reveal that training alters speech evoked responses in each of the auditory fields tested. The neural response to consonants was significantly stronger in anterior auditory field (AAF) and A1 following speech training. The neural response to vowels following speech training was significantly weaker in ventral auditory field (VAF) and posterior auditory field (PAF). This differential plasticity of consonant and vowel sound responses may result from the greater paired pulse depression, expanded low frequency tuning, reduced frequency selectivity, and lower tone thresholds, which occurred across the four auditory fields. These findings suggest that alterations in the distributed processing of behaviorally relevant sounds may contribute to robust speech discrimination. PMID:25827927
Speech processing in children with functional articulation disorders.

PubMed

Gósy, Mária; Horváth, Viktória

2015-03-01

This study explored auditory speech processing and comprehension abilities in 5-8-year-old monolingual Hungarian children with functional articulation disorders (FADs) and their typically developing peers. Our main hypothesis was that children with FAD would show co-existing auditory speech processing disorders, with different levels of these skills depending on the nature of the receptive processes. The tasks included (i) sentence and non-word repetitions, (ii) non-word discrimination and (iii) sentence and story comprehension. Results suggest that the auditory speech processing of children with FAD is underdeveloped compared with that of typically developing children, and largely varies across task types. In addition, there are differences between children with FAD and controls in all age groups from 5 to 8 years. Our results have several clinical implications.
Musical ability and non-native speech-sound processing are linked through sensitivity to pitch and spectral information.

PubMed

Kempe, Vera; Bublitz, Dennis; Brooks, Patricia J

2015-05-01

Is the observed link between musical ability and non-native speech-sound processing due to enhanced sensitivity to acoustic features underlying both musical and linguistic processing? To address this question, native English speakers (N = 118) discriminated Norwegian tonal contrasts and Norwegian vowels. Short tones differing in temporal, pitch, and spectral characteristics were used to measure sensitivity to the various acoustic features implicated in musical and speech processing. Musical ability was measured using Gordon's Advanced Measures of Musical Audiation. Results showed that sensitivity to specific acoustic features played a role in non-native speech-sound processing: Controlling for non-verbal intelligence, prior foreign language-learning experience, and sex, sensitivity to pitch and spectral information partially mediated the link between musical ability and discrimination of non-native vowels and lexical tones. The findings suggest that while sensitivity to certain acoustic features partially mediates the relationship between musical ability and non-native speech-sound processing, complex tests of musical ability also tap into other shared mechanisms. © 2014 The British Psychological Society.

Simulation of talking faces in the human brain improves auditory speech recognition

PubMed Central

von Kriegstein, Katharina; Dogan, Özgür; Grüter, Martina; Giraud, Anne-Lise; Kell, Christian A.; Grüter, Thomas; Kleinschmidt, Andreas; Kiebel, Stefan J.

2008-01-01

Human face-to-face communication is essentially audiovisual. Typically, people talk to us face-to-face, providing concurrent auditory and visual input. Understanding someone is easier when there is visual input, because visual cues like mouth and tongue movements provide complementary information about speech content. Here, we hypothesized that, even in the absence of visual input, the brain optimizes both auditory-only speech and speaker recognition by harvesting speaker-specific predictions and constraints from distinct visual face-processing areas. To test this hypothesis, we performed behavioral and neuroimaging experiments in two groups: subjects with a face recognition deficit (prosopagnosia) and matched controls. The results show that observing a specific person talking for 2 min improves subsequent auditory-only speech and speaker recognition for this person. In both prosopagnosics and controls, behavioral improvement in auditory-only speech recognition was based on an area typically involved in face-movement processing. Improvement in speaker recognition was only present in controls and was based on an area involved in face-identity processing. These findings challenge current unisensory models of speech processing, because they show that, in auditory-only speech, the brain exploits previously encoded audiovisual correlations to optimize communication. We suggest that this optimization is based on speaker-specific audiovisual internal models, which are used to simulate a talking face. PMID:18436648
Communicating by Language: The Speech Process.

ERIC Educational Resources Information Center

House, Arthur S., Ed.

This document reports on a conference focused on speech problems. The main objective of these discussions was to facilitate a deeper understanding of human communication through interaction of conference participants with colleagues in other disciplines. Topics discussed included speech production, feedback, speech perception, and development of…
Speech Databases of Typical Children and Children with SLI

PubMed Central

Grill, Pavel; Tučková, Jana

2016-01-01

The extent of research on children’s speech in general and on disordered speech specifically is very limited. In this article, we describe the process of creating databases of children’s speech and the possibilities for using such databases, which have been created by the LANNA research group in the Faculty of Electrical Engineering at Czech Technical University in Prague. These databases have been principally compiled for medical research but also for use in other areas, such as linguistics. Two databases were recorded: one for healthy children’s speech (recorded in kindergarten and in the first level of elementary school) and the other for pathological speech of children with a Specific Language Impairment (recorded at a surgery of speech and language therapists and at the hospital). Both databases were sub-divided according to specific demands of medical research. Their utilization can be exoteric, specifically for linguistic research and pedagogical use as well as for studies of speech-signal processing. PMID:26963508
Speech Enhancement of Mobile Devices Based on the Integration of a Dual Microphone Array and a Background Noise Elimination Algorithm.

PubMed

Chen, Yung-Yue

2018-05-08

Mobile devices are often used in our daily lives for the purposes of speech and communication. The speech quality of mobile devices is always degraded due to the environmental noises surrounding mobile device users. Regretfully, an effective background noise reduction solution cannot easily be developed for this speech enhancement problem. Due to these depicted reasons, a methodology is systematically proposed to eliminate the effects of background noises for the speech communication of mobile devices. This methodology integrates a dual microphone array with a background noise elimination algorithm. The proposed background noise elimination algorithm includes a whitening process, a speech modelling method and an H ₂ estimator. Due to the adoption of the dual microphone array, a low-cost design can be obtained for the speech enhancement of mobile devices. Practical tests have proven that this proposed method is immune to random background noises, and noiseless speech can be obtained after executing this denoise process.
Consequences of Stimulus Type on Higher-Order Processing in Single-Sided Deaf Cochlear Implant Users.

PubMed

Finke, Mareike; Sandmann, Pascale; Bönitz, Hanna; Kral, Andrej; Büchner, Andreas

2016-01-01

Single-sided deaf subjects with a cochlear implant (CI) provide the unique opportunity to compare central auditory processing of the electrical input (CI ear) and the acoustic input (normal-hearing, NH, ear) within the same individual. In these individuals, sensory processing differs between their two ears, while cognitive abilities are the same irrespectively of the sensory input. To better understand perceptual-cognitive factors modulating speech intelligibility with a CI, this electroencephalography study examined the central-auditory processing of words, the cognitive abilities, and the speech intelligibility in 10 postlingually single-sided deaf CI users. We found lower hit rates and prolonged response times for word classification during an oddball task for the CI ear when compared with the NH ear. Also, event-related potentials reflecting sensory (N1) and higher-order processing (N2/N4) were prolonged for word classification (targets versus nontargets) with the CI ear compared with the NH ear. Our results suggest that speech processing via the CI ear and the NH ear differs both at sensory (N1) and cognitive (N2/N4) processing stages, thereby affecting the behavioral performance for speech discrimination. These results provide objective evidence for cognition to be a key factor for speech perception under adverse listening conditions, such as the degraded speech signal provided from the CI. © 2016 S. Karger AG, Basel.
DETECTION AND IDENTIFICATION OF SPEECH SOUNDS USING CORTICAL ACTIVITY PATTERNS

PubMed Central

Centanni, T.M.; Sloan, A.M.; Reed, A.C.; Engineer, C.T.; Rennaker, R.; Kilgard, M.P.

2014-01-01

We have developed a classifier capable of locating and identifying speech sounds using activity from rat auditory cortex with an accuracy equivalent to behavioral performance without the need to specify the onset time of the speech sounds. This classifier can identify speech sounds from a large speech set within 40 ms of stimulus presentation. To compare the temporal limits of the classifier to behavior, we developed a novel task that requires rats to identify individual consonant sounds from a stream of distracter consonants. The classifier successfully predicted the ability of rats to accurately identify speech sounds for syllable presentation rates up to 10 syllables per second (up to 17.9 ± 1.5 bits/sec), which is comparable to human performance. Our results demonstrate that the spatiotemporal patterns generated in primary auditory cortex can be used to quickly and accurately identify consonant sounds from a continuous speech stream without prior knowledge of the stimulus onset times. Improved understanding of the neural mechanisms that support robust speech processing in difficult listening conditions could improve the identification and treatment of a variety of speech processing disorders. PMID:24286757
Multisensory integration of speech sounds with letters vs. visual speech: only visual speech induces the mismatch negativity.

PubMed

Stekelenburg, Jeroen J; Keetels, Mirjam; Vroomen, Jean

2018-05-01

Numerous studies have demonstrated that the vision of lip movements can alter the perception of auditory speech syllables (McGurk effect). While there is ample evidence for integration of text and auditory speech, there are only a few studies on the orthographic equivalent of the McGurk effect. Here, we examined whether written text, like visual speech, can induce an illusory change in the perception of speech sounds on both the behavioural and neural levels. In a sound categorization task, we found that both text and visual speech changed the identity of speech sounds from an /aba/-/ada/ continuum, but the size of this audiovisual effect was considerably smaller for text than visual speech. To examine at which level in the information processing hierarchy these multisensory interactions occur, we recorded electroencephalography in an audiovisual mismatch negativity (MMN, a component of the event-related potential reflecting preattentive auditory change detection) paradigm in which deviant text or visual speech was used to induce an illusory change in a sequence of ambiguous sounds halfway between /aba/ and /ada/. We found that only deviant visual speech induced an MMN, but not deviant text, which induced a late P3-like positive potential. These results demonstrate that text has much weaker effects on sound processing than visual speech does, possibly because text has different biological roots than visual speech. © 2018 The Authors. European Journal of Neuroscience published by Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Speech Sound Processing Deficits and Training-Induced Neural Plasticity in Rats with Dyslexia Gene Knockdown

PubMed Central

Centanni, Tracy M.; Chen, Fuyi; Booker, Anne M.; Engineer, Crystal T.; Sloan, Andrew M.; Rennaker, Robert L.; LoTurco, Joseph J.; Kilgard, Michael P.

2014-01-01

In utero RNAi of the dyslexia-associated gene Kiaa0319 in rats (KIA-) degrades cortical responses to speech sounds and increases trial-by-trial variability in onset latency. We tested the hypothesis that KIA- rats would be impaired at speech sound discrimination. KIA- rats needed twice as much training in quiet conditions to perform at control levels and remained impaired at several speech tasks. Focused training using truncated speech sounds was able to normalize speech discrimination in quiet and background noise conditions. Training also normalized trial-by-trial neural variability and temporal phase locking. Cortical activity from speech trained KIA- rats was sufficient to accurately discriminate between similar consonant sounds. These results provide the first direct evidence that assumed reduced expression of the dyslexia-associated gene KIAA0319 can cause phoneme processing impairments similar to those seen in dyslexia and that intensive behavioral therapy can eliminate these impairments. PMID:24871331
Discharge experiences of speech-language pathologists working in Cyprus and Greece.

PubMed

Kambanaros, Maria

2010-08-01

Post-termination relationships are complex because the client may need additional services and it may be difficult to determine when the speech-language pathologist-client relationship is truly terminated. In my contribution to this scientific forum, discharge experiences from speech-language pathologists working in Cyprus and Greece will be explored in search of commonalities and differences in the way in which pathologists end therapy from different cultural perspectives. Within this context the personal impact on speech-language pathologists of the discharge process will be highlighted. Inherent in this process is how speech-language pathologists learn to hold their feelings, anxieties and reactions when communicating discharge to clients. Overall speech-language pathologists working in Cyprus and Greece experience similar emotional responses to positive and negative therapy endings as speech-language pathologists working in Australia. The major difference is that Cypriot and Greek therapists face serious limitations in moving their clients on after therapy has ended.
Elevated depressive symptoms enhance reflexive but not reflective auditory category learning.

PubMed

Maddox, W Todd; Chandrasekaran, Bharath; Smayda, Kirsten; Yi, Han-Gyol; Koslov, Seth; Beevers, Christopher G

2014-09-01

In vision an extensive literature supports the existence of competitive dual-processing systems of category learning that are grounded in neuroscience and are partially-dissociable. The reflective system is prefrontally-mediated and uses working memory and executive attention to develop and test rules for classifying in an explicit fashion. The reflexive system is striatally-mediated and operates by implicitly associating perception with actions that lead to reinforcement. Although categorization is fundamental to auditory processing, little is known about the learning systems that mediate auditory categorization and even less is known about the effects of individual difference in the relative efficiency of the two learning systems. Previous studies have shown that individuals with elevated depressive symptoms show deficits in reflective processing. We exploit this finding to test critical predictions of the dual-learning systems model in audition. Specifically, we examine the extent to which the two systems are dissociable and competitive. We predicted that elevated depressive symptoms would lead to reflective-optimal learning deficits but reflexive-optimal learning advantages. Because natural speech category learning is reflexive in nature, we made the prediction that elevated depressive symptoms would lead to superior speech learning. In support of our predictions, individuals with elevated depressive symptoms showed a deficit in reflective-optimal auditory category learning, but an advantage in reflexive-optimal auditory category learning. In addition, individuals with elevated depressive symptoms showed an advantage in learning a non-native speech category structure. Computational modeling suggested that the elevated depressive symptom advantage was due to faster, more accurate, and more frequent use of reflexive category learning strategies in individuals with elevated depressive symptoms. The implications of this work for dual-process approach to auditory learning and depression are discussed. Copyright © 2014 Elsevier Ltd. All rights reserved.
Elevated Depressive Symptoms Enhance Reflexive but not Reflective Auditory Category Learning

PubMed Central

Maddox, W. Todd; Chandrasekaran, Bharath; Smayda, Kirsten; Yi, Han-Gyol; Koslov, Seth; Beevers, Christopher G.

2014-01-01

In vision an extensive literature supports the existence of competitive dual-processing systems of category learning that are grounded in neuroscience and are partially-dissociable. The reflective system is prefrontally-mediated and uses working memory and executive attention to develop and test rules for classifying in an explicit fashion. The reflexive system is striatally-mediated and operates by implicitly associating perception with actions that lead to reinforcement. Although categorization is fundamental to auditory processing, little is known about the learning systems that mediate auditory categorization and even less is known about the effects of individual difference in the relative efficiency of the two learning systems. Previous studies have shown that individuals with elevated depressive symptoms show deficits in reflective processing. We exploit this finding to test critical predictions of the dual-learning systems model in audition. Specifically, we examine the extent to which the two systems are dissociable and competitive. We predicted that elevated depressive symptoms would lead to reflective-optimal learning deficits but reflexive-optimal learning advantages. Because natural speech category learning is reflexive in nature, we made the prediction that elevated depressive symptoms would lead to superior speech learning. In support of our predictions, individuals with elevated depressive symptoms showed a deficit in reflective-optimal auditory category learning, but an advantage in reflexive-optimal auditory category learning. In addition, individuals with elevated depressive symptoms showed an advantage in learning a non-native speech category structure. Computational modeling suggested that the elevated depressive symptom advantage was due to faster, more accurate, and more frequent use of reflexive category learning strategies in individuals with elevated depressive symptoms. The implications of this work for dual-process approach to auditory learning and depression are discussed. PMID:25041936
Language familiarity modulates relative attention to the eyes and mouth of a talker.

PubMed

Barenholtz, Elan; Mavica, Lauren; Lewkowicz, David J

2016-02-01

We investigated whether the audiovisual speech cues available in a talker's mouth elicit greater attention when adults have to process speech in an unfamiliar language vs. a familiar language. Participants performed a speech-encoding task while watching and listening to videos of a talker in a familiar language (English) or an unfamiliar language (Spanish or Icelandic). Attention to the mouth increased in monolingual subjects in response to an unfamiliar language condition but did not in bilingual subjects when the task required speech processing. In the absence of an explicit speech-processing task, subjects attended equally to the eyes and mouth in response to both familiar and unfamiliar languages. Overall, these results demonstrate that language familiarity modulates selective attention to the redundant audiovisual speech cues in a talker's mouth in adults. When our findings are considered together with similar findings from infants, they suggest that this attentional strategy emerges very early in life. Copyright © 2015 Elsevier B.V. All rights reserved.
Using Compressed Speech to Measure Simultaneous Processing in Persons with and without Visual Impairment

ERIC Educational Resources Information Center

Marks, William J.; Jones, W. Paul; Loe, Scott A.

2013-01-01

This study investigated the use of compressed speech as a modality for assessment of the simultaneous processing function for participants with visual impairment. A 24-item compressed speech test was created using a sound editing program to randomly remove sound elements from aural stimuli, holding pitch constant, with the objective to emulate the…
Temporal Context in Speech Processing and Attentional Stream Selection: A Behavioral and Neural Perspective

ERIC Educational Resources Information Center

Golumbic, Elana M. Zion; Poeppel, David; Schroeder, Charles E.

2012-01-01

The human capacity for processing speech is remarkable, especially given that information in speech unfolds over multiple time scales concurrently. Similarly notable is our ability to filter out of extraneous sounds and focus our attention on one conversation, epitomized by the "Cocktail Party" effect. Yet, the neural mechanisms underlying on-line…
Connecting Stuttering Management and Measurement: I. Core Speech Measures of Clinical Process and Outcome

ERIC Educational Resources Information Center

Shenker, Rosalee C.

2006-01-01

Background: There will always be a place for stuttering treatments designed to eliminate or reduce stuttered speech. When those treatments are required, direct speech measures of treatment process and outcome are needed in clinical practice. Aims: Based on the contents of published clinical trials of such treatments, three "core" measures of…
Little Houses and Casas Pequenas: Message Formulation and Syntactic Form in Unscripted Speech with Speakers of English and Spanish

ERIC Educational Resources Information Center

Brown-Schmidt, Sarah; Konopka, Agnieszka E.

2008-01-01

During unscripted speech, speakers coordinate the formulation of pre-linguistic messages with the linguistic processes that implement those messages into speech. We examine the process of constructing a contextually appropriate message and interfacing that message with utterance planning in English ("the small butterfly") and Spanish ("la mariposa…
Teaching Turkish as a Foreign Language: Extrapolating from Experimental Psychology

ERIC Educational Resources Information Center

Erdener, Dogu

2017-01-01

Speech perception is beyond the auditory domain and a multimodal process, specifically, an auditory-visual one--we process lip and face movements during speech. In this paper, the findings in cross-language studies of auditory-visual speech perception in the past two decades are interpreted to the applied domain of second language (L2)…
Dynamic Processes of Speech Development by Seven Adult Learners of Japanese in a Domestic Immersion Context

ERIC Educational Resources Information Center

Fukuda, Makiko

2014-01-01

The present study revealed the dynamic process of speech development in a domestic immersion program by seven adult beginning learners of Japanese. The speech data were analyzed with fluency, accuracy, and complexity measurements at group, interindividual, and intraindividual levels. The results revealed the complex nature of language development…
A Study of Job Satisfaction Correlates among Urban School Speech Language Pathologists

ERIC Educational Resources Information Center

Maxie-Brown, Gwendolyn J.

2011-01-01

The purpose of this study was to examine relationships between the job satisfaction of speech language pathologists (SLPs) and self-efficacy, work relationships and two components of job performance: teacher judgments of student improvement and supervisor ratings of teacher efficiency. It was hypothesized that each of the variables would be…
Lessons Learned in Part-of-Speech Tagging of Conversational Speech

DTIC Science & Technology

2010-10-01

for conversational speech recognition. In Plenary Meeting and Symposium on Prosody and Speech Processing. Slav Petrov and Dan Klein. 2007. Improved...inference for unlexicalized parsing. In HLT-NAACL. Slav Petrov. 2010. Products of random latent variable grammars. In HLT-NAACL. Brian Roark, Yang Liu

Hemispheric lateralization of linguistic prosody recognition in comparison to speech and speaker recognition.

PubMed

Kreitewolf, Jens; Friederici, Angela D; von Kriegstein, Katharina

2014-11-15

Hemispheric specialization for linguistic prosody is a controversial issue. While it is commonly assumed that linguistic prosody and emotional prosody are preferentially processed in the right hemisphere, neuropsychological work directly comparing processes of linguistic prosody and emotional prosody suggests a predominant role of the left hemisphere for linguistic prosody processing. Here, we used two functional magnetic resonance imaging (fMRI) experiments to clarify the role of left and right hemispheres in the neural processing of linguistic prosody. In the first experiment, we sought to confirm previous findings showing that linguistic prosody processing compared to other speech-related processes predominantly involves the right hemisphere. Unlike previous studies, we controlled for stimulus influences by employing a prosody and speech task using the same speech material. The second experiment was designed to investigate whether a left-hemispheric involvement in linguistic prosody processing is specific to contrasts between linguistic prosody and emotional prosody or whether it also occurs when linguistic prosody is contrasted against other non-linguistic processes (i.e., speaker recognition). Prosody and speaker tasks were performed on the same stimulus material. In both experiments, linguistic prosody processing was associated with activity in temporal, frontal, parietal and cerebellar regions. Activation in temporo-frontal regions showed differential lateralization depending on whether the control task required recognition of speech or speaker: recognition of linguistic prosody predominantly involved right temporo-frontal areas when it was contrasted against speech recognition; when contrasted against speaker recognition, recognition of linguistic prosody predominantly involved left temporo-frontal areas. The results show that linguistic prosody processing involves functions of both hemispheres and suggest that recognition of linguistic prosody is based on an inter-hemispheric mechanism which exploits both a right-hemispheric sensitivity to pitch information and a left-hemispheric dominance in speech processing. Copyright © 2014 Elsevier Inc. All rights reserved.
Letting go of yesterday: Effect of distraction on post-event processing and anticipatory anxiety in a socially anxious sample.

PubMed

Blackie, Rebecca A; Kocovski, Nancy L

2016-01-01

According to cognitive models, post-event processing (PEP) is a key factor in the maintenance of social anxiety. Given that decreasing PEP can be challenging for socially anxious individuals, it is important to identify potentially useful strategies. Although distraction may help to decrease PEP, the findings have been equivocal. The primary purpose of this study was to examine whether a brief distraction period immediately following a speech would lead to less PEP the next day. The secondary aim was to examine the effect of distraction following an initial speech on anticipatory anxiety for a second speech, via reductions in PEP. Participants (N = 77 undergraduates with elevated social anxiety; 67.53% female) delivered a speech and were randomly assigned to a distraction, rumination, or control condition. The following day, participants reported levels of PEP in relation to the first speech, as well as anxiety regarding a second, upcoming speech. As expected, those in the distraction condition reported less PEP than those in the rumination and control conditions. Additionally, distraction following the first speech was indirectly related to anticipatory anxiety for the second speech, via PEP. Distraction may represent a potentially useful strategy for reducing PEP and other maladaptive processes that may maintain social anxiety.
Dissociating Cortical Activity during Processing of Native and Non-Native Audiovisual Speech from Early to Late Infancy

PubMed Central

Fava, Eswen; Hull, Rachel; Bortfeld, Heather

2014-01-01

Initially, infants are capable of discriminating phonetic contrasts across the world’s languages. Starting between seven and ten months of age, they gradually lose this ability through a process of perceptual narrowing. Although traditionally investigated with isolated speech sounds, such narrowing occurs in a variety of perceptual domains (e.g., faces, visual speech). Thus far, tracking the developmental trajectory of this tuning process has been focused primarily on auditory speech alone, and generally using isolated sounds. But infants learn from speech produced by people talking to them, meaning they learn from a complex audiovisual signal. Here, we use near-infrared spectroscopy to measure blood concentration changes in the bilateral temporal cortices of infants in three different age groups: 3-to-6 months, 7-to-10 months, and 11-to-14-months. Critically, all three groups of infants were tested with continuous audiovisual speech in both their native and another, unfamiliar language. We found that at each age range, infants showed different patterns of cortical activity in response to the native and non-native stimuli. Infants in the youngest group showed bilateral cortical activity that was greater overall in response to non-native relative to native speech; the oldest group showed left lateralized activity in response to native relative to non-native speech. These results highlight perceptual tuning as a dynamic process that happens across modalities and at different levels of stimulus complexity. PMID:25116572
The lateralized arcuate fasciculus in developmental pitch disorders among mandarin amusics: left for speech and right for music.

PubMed

Chen, Xizhuo; Zhao, Yanxin; Zhong, Suyu; Cui, Zaixu; Li, Jiaqi; Gong, Gaolang; Dong, Qi; Nan, Yun

2018-05-01

The arcuate fasciculus (AF) is a neural fiber tract that is critical to speech and music development. Although the predominant role of the left AF in speech development is relatively clear, how the AF engages in music development is not understood. Congenital amusia is a special neurodevelopmental condition, which not only affects musical pitch but also speech tone processing. Using diffusion tensor tractography, we aimed at understanding the role of AF in music and speech processing by examining the neural connectivity characteristics of the bilateral AF among thirty Mandarin amusics. Compared to age- and intelligence quotient (IQ)-matched controls, amusics demonstrated increased connectivity as reflected by the increased fractional anisotropy in the right posterior AF but decreased connectivity as reflected by the decreased volume in the right anterior AF. Moreover, greater fractional anisotropy in the left direct AF was correlated with worse performance in speech tone perception among amusics. This study is the first to examine the neural connectivity of AF in the neurodevelopmental condition of amusia as a result of disrupted music pitch and speech tone processing. We found abnormal white matter structural connectivity in the right AF for the amusic individuals. Moreover, we demonstrated that the white matter microstructural properties of the left direct AF is modulated by lexical tone deficits among the amusic individuals. These data support the notion of distinctive pitch processing systems between music and speech.
The functional neuroanatomy of language

NASA Astrophysics Data System (ADS)

Hickok, Gregory

2009-09-01

There has been substantial progress over the last several years in understanding aspects of the functional neuroanatomy of language. Some of these advances are summarized in this review. It will be argued that recognizing speech sounds is carried out in the superior temporal lobe bilaterally, that the superior temporal sulcus bilaterally is involved in phonological-level aspects of this process, that the frontal/motor system is not central to speech recognition although it may modulate auditory perception of speech, that conceptual access mechanisms are likely located in the lateral posterior temporal lobe (middle and inferior temporal gyri), that speech production involves sensory-related systems in the posterior superior temporal lobe in the left hemisphere, that the interface between perceptual and motor systems is supported by a sensory-motor circuit for vocal tract actions (not dedicated to speech) that is very similar to sensory-motor circuits found in primate parietal lobe, and that verbal short-term memory can be understood as an emergent property of this sensory-motor circuit. These observations are considered within the context of a dual stream model of speech processing in which one pathway supports speech comprehension and the other supports sensory-motor integration. Additional topics of discussion include the functional organization of the planum temporale for spatial hearing and speech-related sensory-motor processes, the anatomical and functional basis of a form of acquired language disorder, conduction aphasia, the neural basis of vocabulary development, and sentence-level/grammatical processing.
Degraded neural and behavioral processing of speech sounds in a rat model of Rett syndrome

PubMed Central

Engineer, Crystal T.; Rahebi, Kimiya C.; Borland, Michael S.; Buell, Elizabeth P.; Centanni, Tracy M.; Fink, Melyssa K.; Im, Kwok W.; Wilson, Linda G.; Kilgard, Michael P.

2015-01-01

Individuals with Rett syndrome have greatly impaired speech and language abilities. Auditory brainstem responses to sounds are normal, but cortical responses are highly abnormal. In this study, we used the novel rat Mecp2 knockout model of Rett syndrome to document the neural and behavioral processing of speech sounds. We hypothesized that both speech discrimination ability and the neural response to speech sounds would be impaired in Mecp2 rats. We expected that extensive speech training would improve speech discrimination ability and the cortical response to speech sounds. Our results reveal that speech responses across all four auditory cortex fields of Mecp2 rats were hyperexcitable, responded slower, and were less able to follow rapidly presented sounds. While Mecp2 rats could accurately perform consonant and vowel discrimination tasks in quiet, they were significantly impaired at speech sound discrimination in background noise. Extensive speech training improved discrimination ability. Training shifted cortical responses in both Mecp2 and control rats to favor the onset of speech sounds. While training increased the response to low frequency sounds in control rats, the opposite occurred in Mecp2 rats. Although neural coding and plasticity are abnormal in the rat model of Rett syndrome, extensive therapy appears to be effective. These findings may help to explain some aspects of communication deficits in Rett syndrome and suggest that extensive rehabilitation therapy might prove beneficial. PMID:26321676
The effect of hearing aid technologies on listening in an automobile

PubMed Central

Wu, Yu-Hsiang; Stangl, Elizabeth; Bentler, Ruth A.; Stanziola, Rachel W.

2014-01-01

Background Communication while traveling in an automobile often is very difficult for hearing aid users. This is because the automobile /road noise level is usually high, and listeners/drivers often do not have access to visual cues. Since the talker of interest usually is not located in front of the driver/listener, conventional directional processing that places the directivity beam toward the listener’s front may not be helpful, and in fact, could have a negative impact on speech recognition (when compared to omnidirectional processing). Recently, technologies have become available in commercial hearing aids that are designed to improve speech recognition and/or listening effort in noisy conditions where talkers are located behind or beside the listener. These technologies include (1) a directional microphone system that uses a backward-facing directivity pattern (Back-DIR processing), (2) a technology that transmits audio signals from the ear with the better signal-to-noise ratio (SNR) to the ear with the poorer SNR (Side-Transmission processing), and (3) a signal processing scheme that suppresses the noise at the ear with the poorer SNR (Side-Suppression processing). Purpose The purpose of the current study was to determine the effect of (1) conventional directional microphones and (2) newer signal processing schemes (Back-DIR, Side-Transmission, and Side-Suppression) on listener’s speech recognition performance and preference for communication in a traveling automobile. Research design A single-blinded, repeated-measures design was used. Study Sample Twenty-five adults with bilateral symmetrical sensorineural hearing loss aged 44 through 84 years participated in the study. Data Collection and Analysis The automobile/road noise and sentences of the Connected Speech Test (CST) were recorded through hearing aids in a standard van moving at a speed of 70 miles/hour on a paved highway. The hearing aids were programmed to omnidirectional microphone, conventional adaptive directional microphone, and the three newer schemes. CST sentences were presented from the side and back of the hearing aids, which were placed on the ears of a manikin. The recorded stimuli were presented to listeners via earphones in a sound treated booth to assess speech recognition performance and preference with each programmed condition. Results Compared to omnidirectional microphones, conventional adaptive directional processing had a detrimental effect on speech recognition when speech was presented from the back or side of the listener. Back-DIR and Side-Transmission processing improved speech recognition performance (relative to both omnidirectional and adaptive directional processing) when speech was from the back and side, respectively. The performance with Side-Suppression processing was better than with adaptive directional processing when speech was from the side. The participants’ preferences for a given processing scheme were generally consistent with speech recognition results. Conclusions The finding that performance with adaptive directional processing was poorer than with omnidirectional microphones demonstrates the importance of selecting the correct microphone technology for different listening situations. The results also suggest the feasibility of using hearing aid technologies to provide a better listening experience for hearing aid users in automobiles. PMID:23886425
[Intranet applications in radiology].

PubMed

Knopp, M V; von Hippel, G M; Koch, T; Knopp, M A

2000-01-01

The aim of the paper is to present the conceptual basis and capabilities of intranet applications in radiology. The intranet, which is the local brother of the internet can be readily realized using existing computer components and a network. All current computer operating systems support intranet applications which allow hard and software independent communication of text, images, video and sound with the use of browser software without dedicated programs on the individual personal computers. Radiological applications for text communication e.g. department specific bulletin boards and access to examination protocols; use of image communication for viewing and limited processing and documentation of radiological images can be achieved on decentralized PCs as well as speech communication for dictation, distribution of dictation and speech recognition. The intranet helps to optimize the organizational efficiency and cost effectiveness in the daily work of radiological departments in outpatients and hospital settings. The general interest in internet and intranet technology will guarantee its continuous development.
Development of a Voice Activity Controlled Noise Canceller

PubMed Central

Abid Noor, Ali O.; Samad, Salina Abdul; Hussain, Aini

2012-01-01

In this paper, a variable threshold voice activity detector (VAD) is developed to control the operation of a two-sensor adaptive noise canceller (ANC). The VAD prohibits the reference input of the ANC from containing some strength of actual speech signal during adaptation periods. The novelty of this approach resides in using the residual output from the noise canceller to control the decisions made by the VAD. Thresholds of full-band energy and zero-crossing features are adjusted according to the residual output of the adaptive filter. Performance evaluation of the proposed approach is quoted in terms of signal to noise ratio improvements as well mean square error (MSE) convergence of the ANC. The new approach showed an improved noise cancellation performance when tested under several types of environmental noise. Furthermore, the computational power of the adaptive process is reduced since the output of the adaptive filter is efficiently calculated only during non-speech periods. PMID:22778667
Joint Dictionary Learning-Based Non-Negative Matrix Factorization for Voice Conversion to Improve Speech Intelligibility After Oral Surgery.

PubMed

Fu, Szu-Wei; Li, Pei-Chun; Lai, Ying-Hui; Yang, Cheng-Chien; Hsieh, Li-Chun; Tsao, Yu

2017-11-01

Objective: This paper focuses on machine learning based voice conversion (VC) techniques for improving the speech intelligibility of surgical patients who have had parts of their articulators removed. Because of the removal of parts of the articulator, a patient's speech may be distorted and difficult to understand. To overcome this problem, VC methods can be applied to convert the distorted speech such that it is clear and more intelligible. To design an effective VC method, two key points must be considered: 1) the amount of training data may be limited (because speaking for a long time is usually difficult for postoperative patients); 2) rapid conversion is desirable (for better communication). Methods: We propose a novel joint dictionary learning based non-negative matrix factorization (JD-NMF) algorithm. Compared to conventional VC techniques, JD-NMF can perform VC efficiently and effectively with only a small amount of training data. Results: The experimental results demonstrate that the proposed JD-NMF method not only achieves notably higher short-time objective intelligibility (STOI) scores (a standardized objective intelligibility evaluation metric) than those obtained using the original unconverted speech but is also significantly more efficient and effective than a conventional exemplar-based NMF VC method. Conclusion: The proposed JD-NMF method may outperform the state-of-the-art exemplar-based NMF VC method in terms of STOI scores under the desired scenario. Significance: We confirmed the advantages of the proposed joint training criterion for the NMF-based VC. Moreover, we verified that the proposed JD-NMF can effectively improve the speech intelligibility scores of oral surgery patients. Objective: This paper focuses on machine learning based voice conversion (VC) techniques for improving the speech intelligibility of surgical patients who have had parts of their articulators removed. Because of the removal of parts of the articulator, a patient's speech may be distorted and difficult to understand. To overcome this problem, VC methods can be applied to convert the distorted speech such that it is clear and more intelligible. To design an effective VC method, two key points must be considered: 1) the amount of training data may be limited (because speaking for a long time is usually difficult for postoperative patients); 2) rapid conversion is desirable (for better communication). Methods: We propose a novel joint dictionary learning based non-negative matrix factorization (JD-NMF) algorithm. Compared to conventional VC techniques, JD-NMF can perform VC efficiently and effectively with only a small amount of training data. Results: The experimental results demonstrate that the proposed JD-NMF method not only achieves notably higher short-time objective intelligibility (STOI) scores (a standardized objective intelligibility evaluation metric) than those obtained using the original unconverted speech but is also significantly more efficient and effective than a conventional exemplar-based NMF VC method. Conclusion: The proposed JD-NMF method may outperform the state-of-the-art exemplar-based NMF VC method in terms of STOI scores under the desired scenario. Significance: We confirmed the advantages of the proposed joint training criterion for the NMF-based VC. Moreover, we verified that the proposed JD-NMF can effectively improve the speech intelligibility scores of oral surgery patients.
Speech Processing to Improve the Perception of Speech in Background Noise for Children With Auditory Processing Disorder and Typically Developing Peers.

PubMed

Flanagan, Sheila; Zorilă, Tudor-Cătălin; Stylianou, Yannis; Moore, Brian C J

2018-01-01

Auditory processing disorder (APD) may be diagnosed when a child has listening difficulties but has normal audiometric thresholds. For adults with normal hearing and with mild-to-moderate hearing impairment, an algorithm called spectral shaping with dynamic range compression (SSDRC) has been shown to increase the intelligibility of speech when background noise is added after the processing. Here, we assessed the effect of such processing using 8 children with APD and 10 age-matched control children. The loudness of the processed and unprocessed sentences was matched using a loudness model. The task was to repeat back sentences produced by a female speaker when presented with either speech-shaped noise (SSN) or a male competing speaker (CS) at two signal-to-background ratios (SBRs). Speech identification was significantly better with SSDRC processing than without, for both groups. The benefit of SSDRC processing was greater for the SSN than for the CS background. For the SSN, scores were similar for the two groups at both SBRs. For the CS, the APD group performed significantly more poorly than the control group. The overall improvement produced by SSDRC processing could be useful for enhancing communication in a classroom where the teacher's voice is broadcast using a wireless system.
Normal Aspects of Speech, Hearing, and Language.

ERIC Educational Resources Information Center

Minifie, Fred. D., Ed.; And Others

This book is written as a guide to the understanding of the processes involved in human speech communication. Ten authorities contributed material to provide an introduction to the physiological aspects of speech production and reception, the acoustical aspects of speech production and transmission, the psychophysics of sound reception, the nature…
Hemispheric Differences in the Effects of Context on Vowel Perception

ERIC Educational Resources Information Center

Sjerps, Matthias J.; Mitterer, Holger; McQueen, James M.

2012-01-01

Listeners perceive speech sounds relative to context. Contextual influences might differ over hemispheres if different types of auditory processing are lateralized. Hemispheric differences in contextual influences on vowel perception were investigated by presenting speech targets and both speech and non-speech contexts to listeners' right or left…
Inferring Speaker Affect in Spoken Natural Language Communication

ERIC Educational Resources Information Center

Pon-Barry, Heather Roberta

2013-01-01

The field of spoken language processing is concerned with creating computer programs that can understand human speech and produce human-like speech. Regarding the problem of understanding human speech, there is currently growing interest in moving beyond speech recognition (the task of transcribing the words in an audio stream) and towards…
Perceptual Learning of Noise Vocoded Words: Effects of Feedback and Lexicality

ERIC Educational Resources Information Center

Hervais-Adelman, Alexis; Davis, Matthew H.; Johnsrude, Ingrid S.; Carlyon, Robert P.

2008-01-01

Speech comprehension is resistant to acoustic distortion in the input, reflecting listeners' ability to adjust perceptual processes to match the speech input. This adjustment is reflected in improved comprehension of distorted speech with experience. For noise vocoding, a manipulation that removes spectral detail from speech, listeners' word…
Deficits in sequential processing manifest in motor and linguistic tasks in a multigenerational family with childhood apraxia of speech

PubMed Central

PETER, BEATE; BUTTON, LE; STOEL-GAMMON, CAROL; CHAPMAN, KATHY; RASKIND, WENDY H.

2013-01-01

The purpose of this study was to evaluate a global deficit in sequential processing as candidate endophenotypein a family with familial childhood apraxia of speech (CAS). Of 10 adults and 13 children in a three-generational family with speech sound disorder (SSD) consistent with CAS, 3 adults and 6 children had past or present SSD diagnoses. Two preschoolers with unremediated CAS showed a high number of sequencing errors during single-word production. Performance on tasks with high sequential processing loads differentiated between the affected and unaffected family members, whereas there were no group differences in tasks with low processing loads. Adults with a history of SSD produced more sequencing errors during nonword and multisyllabic real word imitation, compared to those without such a history. Results are consistent with a global deficit in sequential processing that influences speech development as well as cognitive and linguistic processing. PMID:23339324
The Bilingual Language Interaction Network for Comprehension of Speech*

PubMed Central

Marian, Viorica

2013-01-01

During speech comprehension, bilinguals co-activate both of their languages, resulting in cross-linguistic interaction at various levels of processing. This interaction has important consequences for both the structure of the language system and the mechanisms by which the system processes spoken language. Using computational modeling, we can examine how cross-linguistic interaction affects language processing in a controlled, simulated environment. Here we present a connectionist model of bilingual language processing, the Bilingual Language Interaction Network for Comprehension of Speech (BLINCS), wherein interconnected levels of processing are created using dynamic, self-organizing maps. BLINCS can account for a variety of psycholinguistic phenomena, including cross-linguistic interaction at and across multiple levels of processing, cognate facilitation effects, and audio-visual integration during speech comprehension. The model also provides a way to separate two languages without requiring a global language-identification system. We conclude that BLINCS serves as a promising new model of bilingual spoken language comprehension. PMID:24363602
Effects of reverberation time on the cognitive load in speech communication: theoretical considerations.

PubMed

Kjellberg, A

2004-01-01

The paper presents a theoretical analysis of possible effects of reverberation time on the cognitive load in speech communication. Speech comprehension requires not only phonological processing of the spoken words. Simultaneously, this information must be further processed and stored. All this processing takes place in the working memory, which has a limited processing capacity. The more resources that are allocated to word identification, the fewer resources are therefore left for the further processing and storing of the information. Reverberation conditions that allow the identification of almost all words may therefore still interfere with speech comprehension and memory storing. These problems are likely to be especially serious in situations where speech has to be followed continuously for a long time. An unfavourable reverberation time (RT) then could contribute to the development of cognitive fatigue, which means that working memory resources are gradually reduced. RT may also affect the cognitive load in two other ways: RT may change the distracting effects of a sound and a person's mood. Both effects could influence the cognitive load of a listener. It is argued that we need studies of RT effects in realistic long-lasting listening situations to better understand the effect of RT on speech communication. Furthermore, the effect of RT on distraction and mood need to be better understood.
Temporal and speech processing skills in normal hearing individuals exposed to occupational noise.

PubMed

Kumar, U Ajith; Ameenudin, Syed; Sangamanatha, A V

2012-01-01

Prolonged exposure to high levels of occupational noise can cause damage to hair cells in the cochlea and result in permanent noise-induced cochlear hearing loss. Consequences of cochlear hearing loss on speech perception and psychophysical abilities have been well documented. Primary goal of this research was to explore temporal processing and speech perception Skills in individuals who are exposed to occupational noise of more than 80 dBA and not yet incurred clinically significant threshold shifts. Contribution of temporal processing skills to speech perception in adverse listening situation was also evaluated. A total of 118 participants took part in this research. Participants comprised three groups of train drivers in the age range of 30-40 (n= 13), 41 50 ( = 13), 41-50 (n = 9), and 51-60 (n = 6) years and their non-noise-exposed counterparts (n = 30 in each age group). Participants of all the groups including the train drivers had hearing sensitivity within 25 dB HL in the octave frequencies between 250 and 8 kHz. Temporal processing was evaluated using gap detection, modulation detection, and duration pattern tests. Speech recognition was tested in presence multi-talker babble at -5dB SNR. Differences between experimental and control groups were analyzed using ANOVA and independent sample t-tests. Results showed a trend of reduced temporal processing skills in individuals with noise exposure. These deficits were observed despite normal peripheral hearing sensitivity. Speech recognition scores in the presence of noise were also significantly poor in noise-exposed group. Furthermore, poor temporal processing skills partially accounted for the speech recognition difficulties exhibited by the noise-exposed individuals. These results suggest that noise can cause significant distortions in the processing of suprathreshold temporal cues which may add to difficulties in hearing in adverse listening conditions.
A comparative analysis of whispered and normally phonated speech using an LPC-10 vocoder

NASA Astrophysics Data System (ADS)

Wilson, J. B.; Mosko, J. D.

1985-12-01

The determination of the performance of an LPC-10 vocoder in the processing of adult male and female whispered and normally phonated connected speech was the focus of this study. The LPC-10 vocoder's analysis of whispered speech compared quite favorably with similar studies which used sound spectrographic processing techniques. Shifting from phonated speech to whispered speech caused a substantial increase in the phonomic formant frequencies and formant bandwidths for both male and female speakers. The data from this study showed no evidence that the LPC-10 vocoder's ability to process voices with pitch extremes and quality extremes was limited in any significant manner. A comparison of the unprocessed natural vowel waveforms and qualities with the synthesized vowel waveforms and qualities revealed almost imperceptible differences. An LPC-10 vocoder's ability to process linguistic and dialectical suprasegmental features such as intonation, rate and stress at low bit rates should be a critical issue of concern for future research.

The role of the supplementary motor area for speech and language processing.

PubMed

Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

2016-09-01

Apart from its function in speech motor control, the supplementary motor area (SMA) has largely been neglected in models of speech and language processing in the brain. The aim of this review paper is to summarize more recent work, suggesting that the SMA has various superordinate control functions during speech communication and language reception, which is particularly relevant in case of increased task demands. The SMA is subdivided into a posterior region serving predominantly motor-related functions (SMA proper) whereas the anterior part (pre-SMA) is involved in higher-order cognitive control mechanisms. In analogy to motor triggering functions of the SMA proper, the pre-SMA seems to manage procedural aspects of cognitive processing. These latter functions, among others, comprise attentional switching, ambiguity resolution, context integration, and coordination between procedural and declarative memory structures. Regarding language processing, this refers, for example, to the use of inner speech mechanisms during language encoding, but also to lexical disambiguation, syntax and prosody integration, and context-tracking. Copyright © 2016 Elsevier Ltd. All rights reserved.
Subcortical processing of speech regularities underlies reading and music aptitude in children.

PubMed

Strait, Dana L; Hornickel, Jane; Kraus, Nina

2011-10-17

Neural sensitivity to acoustic regularities supports fundamental human behaviors such as hearing in noise and reading. Although the failure to encode acoustic regularities in ongoing speech has been associated with language and literacy deficits, how auditory expertise, such as the expertise that is associated with musical skill, relates to the brainstem processing of speech regularities is unknown. An association between musical skill and neural sensitivity to acoustic regularities would not be surprising given the importance of repetition and regularity in music. Here, we aimed to define relationships between the subcortical processing of speech regularities, music aptitude, and reading abilities in children with and without reading impairment. We hypothesized that, in combination with auditory cognitive abilities, neural sensitivity to regularities in ongoing speech provides a common biological mechanism underlying the development of music and reading abilities. We assessed auditory working memory and attention, music aptitude, reading ability, and neural sensitivity to acoustic regularities in 42 school-aged children with a wide range of reading ability. Neural sensitivity to acoustic regularities was assessed by recording brainstem responses to the same speech sound presented in predictable and variable speech streams. Through correlation analyses and structural equation modeling, we reveal that music aptitude and literacy both relate to the extent of subcortical adaptation to regularities in ongoing speech as well as with auditory working memory and attention. Relationships between music and speech processing are specifically driven by performance on a musical rhythm task, underscoring the importance of rhythmic regularity for both language and music. These data indicate common brain mechanisms underlying reading and music abilities that relate to how the nervous system responds to regularities in auditory input. Definition of common biological underpinnings for music and reading supports the usefulness of music for promoting child literacy, with the potential to improve reading remediation.
Visual activity predicts auditory recovery from deafness after adult cochlear implantation.

PubMed

Strelnikov, Kuzma; Rouger, Julien; Demonet, Jean-François; Lagleyre, Sebastien; Fraysse, Bernard; Deguine, Olivier; Barone, Pascal

2013-12-01

Modern cochlear implantation technologies allow deaf patients to understand auditory speech; however, the implants deliver only a coarse auditory input and patients must use long-term adaptive processes to achieve coherent percepts. In adults with post-lingual deafness, the high progress of speech recovery is observed during the first year after cochlear implantation, but there is a large range of variability in the level of cochlear implant outcomes and the temporal evolution of recovery. It has been proposed that when profoundly deaf subjects receive a cochlear implant, the visual cross-modal reorganization of the brain is deleterious for auditory speech recovery. We tested this hypothesis in post-lingually deaf adults by analysing whether brain activity shortly after implantation correlated with the level of auditory recovery 6 months later. Based on brain activity induced by a speech-processing task, we found strong positive correlations in areas outside the auditory cortex. The highest positive correlations were found in the occipital cortex involved in visual processing, as well as in the posterior-temporal cortex known for audio-visual integration. The other area, which positively correlated with auditory speech recovery, was localized in the left inferior frontal area known for speech processing. Our results demonstrate that the visual modality's functional level is related to the proficiency level of auditory recovery. Based on the positive correlation of visual activity with auditory speech recovery, we suggest that visual modality may facilitate the perception of the word's auditory counterpart in communicative situations. The link demonstrated between visual activity and auditory speech perception indicates that visuoauditory synergy is crucial for cross-modal plasticity and fostering speech-comprehension recovery in adult cochlear-implanted deaf patients.
Functional Brain Activation Differences in School-Age Children with Speech Sound Errors: Speech and Print Processing

ERIC Educational Resources Information Center

Preston, Jonathan L.; Felsenfeld, Susan; Frost, Stephen J.; Mencl, W. Einar; Fulbright, Robert K.; Grigorenko, Elena L.; Landi, Nicole; Seki, Ayumi; Pugh, Kenneth R.

2012-01-01

Purpose: To examine neural response to spoken and printed language in children with speech sound errors (SSE). Method: Functional magnetic resonance imaging was used to compare processing of auditorily and visually presented words and pseudowords in 17 children with SSE, ages 8;6[years;months] through 10;10, with 17 matched controls. Results: When…
Assessing Children's Home Language Environments Using Automatic Speech Recognition Technology

ERIC Educational Resources Information Center

Greenwood, Charles R.; Thiemann-Bourque, Kathy; Walker, Dale; Buzhardt, Jay; Gilkerson, Jill

2011-01-01

The purpose of this research was to replicate and extend some of the findings of Hart and Risley using automatic speech processing instead of human transcription of language samples. The long-term goal of this work is to make the current approach to speech processing possible by researchers and clinicians working on a daily basis with families and…
Auditory Processing and Speech Perception in Children with Specific Language Impairment: Relations with Oral Language and Literacy Skills

ERIC Educational Resources Information Center

Vandewalle, Ellen; Boets, Bart; Ghesquiere, Pol; Zink, Inge

2012-01-01

This longitudinal study investigated temporal auditory processing (frequency modulation and between-channel gap detection) and speech perception (speech-in-noise and categorical perception) in three groups of 6 years 3 months to 6 years 8 months-old children attending grade 1: (1) children with specific language impairment (SLI) and literacy delay…
Multilingual Phoneme Models for Rapid Speech Processing System Development

DTIC Science & Technology

2006-09-01

processes are used to develop an Arabic speech recognition system starting from monolingual English models, In- ternational Phonetic Association (IPA...clusters. It was found that multilingual bootstrapping methods out- perform monolingual English bootstrapping methods on the Arabic evaluation data initially...International Phonetic Alphabet . . . . . . . . . 7 2.3.2 Multilingual vs. Monolingual Speech Recognition 7 2.3.3 Data-Driven Approaches
When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech.

PubMed

Tuninetti, Alba; Chládková, Kateřina; Peter, Varghese; Schiller, Niels O; Escudero, Paola

2017-11-01

Speech sound acoustic properties vary largely across speakers and accents. When perceiving speech, adult listeners normally disregard non-linguistic variation caused by speaker or accent differences, in order to comprehend the linguistic message, e.g. to correctly identify a speech sound or a word. Here we tested whether the process of normalizing speaker and accent differences, facilitating the recognition of linguistic information, is found at the level of neural processing, and whether it is modulated by the listeners' native language. In a multi-deviant oddball paradigm, native and nonnative speakers of Dutch were exposed to naturally-produced Dutch vowels varying in speaker, sex, accent, and phoneme identity. Unexpectedly, the analysis of mismatch negativity (MMN) amplitudes elicited by each type of change shows a large degree of early perceptual sensitivity to non-linguistic cues. This finding on perception of naturally-produced stimuli contrasts with previous studies examining the perception of synthetic stimuli wherein adult listeners automatically disregard acoustic cues to speaker identity. The present finding bears relevance to speech normalization theories, suggesting that at an unattended level of processing, listeners are indeed sensitive to changes in fundamental frequency in natural speech tokens. Copyright © 2017 Elsevier Inc. All rights reserved.
Transfer of Training between Music and Speech: Common Processing, Attention, and Memory.

PubMed

Besson, Mireille; Chobert, Julie; Marie, Céline

2011-01-01

After a brief historical perspective of the relationship between language and music, we review our work on transfer of training from music to speech that aimed at testing the general hypothesis that musicians should be more sensitive than non-musicians to speech sounds. In light of recent results in the literature, we argue that when long-term experience in one domain influences acoustic processing in the other domain, results can be interpreted as common acoustic processing. But when long-term experience in one domain influences the building-up of abstract and specific percepts in another domain, results are taken as evidence for transfer of training effects. Moreover, we also discuss the influence of attention and working memory on transfer effects and we highlight the usefulness of the event-related potentials method to disentangle the different processes that unfold in the course of music and speech perception. Finally, we give an overview of an on-going longitudinal project with children aimed at testing transfer effects from music to different levels and aspects of speech processing.
Transfer of Training between Music and Speech: Common Processing, Attention, and Memory

PubMed Central

Besson, Mireille; Chobert, Julie; Marie, Céline

2011-01-01

After a brief historical perspective of the relationship between language and music, we review our work on transfer of training from music to speech that aimed at testing the general hypothesis that musicians should be more sensitive than non-musicians to speech sounds. In light of recent results in the literature, we argue that when long-term experience in one domain influences acoustic processing in the other domain, results can be interpreted as common acoustic processing. But when long-term experience in one domain influences the building-up of abstract and specific percepts in another domain, results are taken as evidence for transfer of training effects. Moreover, we also discuss the influence of attention and working memory on transfer effects and we highlight the usefulness of the event-related potentials method to disentangle the different processes that unfold in the course of music and speech perception. Finally, we give an overview of an on-going longitudinal project with children aimed at testing transfer effects from music to different levels and aspects of speech processing. PMID:21738519
Processing Electromyographic Signals to Recognize Words

NASA Technical Reports Server (NTRS)

Jorgensen, C. C.; Lee, D. D.

2009-01-01

A recently invented speech-recognition method applies to words that are articulated by means of the tongue and throat muscles but are otherwise not voiced or, at most, are spoken sotto voce. This method could satisfy a need for speech recognition under circumstances in which normal audible speech is difficult, poses a hazard, is disturbing to listeners, or compromises privacy. The method could also be used to augment traditional speech recognition by providing an additional source of information about articulator activity. The method can be characterized as intermediate between (1) conventional speech recognition through processing of voice sounds and (2) a method, not yet developed, of processing electroencephalographic signals to extract unspoken words directly from thoughts. This method involves computational processing of digitized electromyographic (EMG) signals from muscle innervation acquired by surface electrodes under a subject's chin near the tongue and on the side of the subject s throat near the larynx. After preprocessing, digitization, and feature extraction, EMG signals are processed by a neural-network pattern classifier, implemented in software, that performs the bulk of the recognition task as described.
Is infant-directed speech interesting because it is surprising? - Linking properties of IDS to statistical learning and attention at the prosodic level.

PubMed

Räsänen, Okko; Kakouros, Sofoklis; Soderstrom, Melanie

2018-06-06

The exaggerated intonation and special rhythmic properties of infant-directed speech (IDS) have been hypothesized to attract infants' attention to the speech stream. However, there has been little work actually connecting the properties of IDS to models of attentional processing or perceptual learning. A number of such attention models suggest that surprising or novel perceptual inputs attract attention, where novelty can be operationalized as the statistical (un)predictability of the stimulus in the given context. Since prosodic patterns such as F0 contours are accessible to young infants who are also known to be adept statistical learners, the present paper investigates a hypothesis that F0 contours in IDS are less predictable than those in adult-directed speech (ADS), given previous exposure to both speaking styles, thereby potentially tapping into basic attentional mechanisms of the listeners in a similar manner that relative probabilities of other linguistic patterns are known to modulate attentional processing in infants and adults. Computational modeling analyses with naturalistic IDS and ADS speech from matched speakers and contexts show that IDS intonation has lower overall temporal predictability even when the F0 contours of both speaking styles are normalized to have equal means and variances. A closer analysis reveals that there is a tendency of IDS intonation to be less predictable at the end of short utterances, whereas ADS exhibits more stable average predictability patterns across the full extent of the utterances. The difference between IDS and ADS persists even when the proportion of IDS and ADS exposure is varied substantially, simulating different relative amounts of IDS heard in different family and cultural environments. Exposure to IDS is also found to be more efficient for predicting ADS intonation contours in new utterances than exposure to the equal amount of ADS speech. This indicates that the more variable prosodic contours of IDS also generalize to ADS, and may therefore enhance prosodic learning in infancy. Overall, the study suggests that one reason behind infant preference for IDS could be its higher information value at the prosodic level, as measured by the amount of surprisal in the F0 contours. This provides the first formal link between the properties of IDS and the models of attentional processing and statistical learning in the brain. However, this finding does not rule out the possibility that other differences between the IDS and ADS also play a role. Copyright © 2018 Elsevier B.V. All rights reserved.
A Mis-recognized Medical Vocabulary Correction System for Speech-based Electronic Medical Record

PubMed Central

Seo, Hwa Jeong; Kim, Ju Han; Sakabe, Nagamasa

2002-01-01

Speech recognition as an input tool for electronic medical record (EMR) enables efficient data entry at the point of care. However, the recognition accuracy for medical vocabulary is much poorer than that for doctor-patient dialogue. We developed a mis-recognized medical vocabulary correction system based on syllable-by-syllable comparison of speech text against medical vocabulary database. Using specialty medical vocabulary, the algorithm detects and corrects mis-recognized medical vocabularies in narrative text. Our preliminary evaluation showed 94% of accuracy in mis-recognized medical vocabulary correction.
Linguistic Processing of Accented Speech Across the Lifespan

PubMed Central

Cristia, Alejandrina; Seidl, Amanda; Vaughn, Charlotte; Schmale, Rachel; Bradlow, Ann; Floccia, Caroline

2012-01-01

In most of the world, people have regular exposure to multiple accents. Therefore, learning to quickly process accented speech is a prerequisite to successful communication. In this paper, we examine work on the perception of accented speech across the lifespan, from early infancy to late adulthood. Unfamiliar accents initially impair linguistic processing by infants, children, younger adults, and older adults, but listeners of all ages come to adapt to accented speech. Emergent research also goes beyond these perceptual abilities, by assessing links with production and the relative contributions of linguistic knowledge and general cognitive skills. We conclude by underlining points of convergence across ages, and the gaps left to face in future work. PMID:23162513
Auditory-neurophysiological responses to speech during early childhood: Effects of background noise

PubMed Central

White-Schwoch, Travis; Davies, Evan C.; Thompson, Elaine C.; Carr, Kali Woodruff; Nicol, Trent; Bradlow, Ann R.; Kraus, Nina

2015-01-01

Early childhood is a critical period of auditory learning, during which children are constantly mapping sounds to meaning. But learning rarely occurs under ideal listening conditions—children are forced to listen against a relentless din. This background noise degrades the neural coding of these critical sounds, in turn interfering with auditory learning. Despite the importance of robust and reliable auditory processing during early childhood, little is known about the neurophysiology underlying speech processing in children so young. To better understand the physiological constraints these adverse listening scenarios impose on speech sound coding during early childhood, auditory-neurophysiological responses were elicited to a consonant-vowel syllable in quiet and background noise in a cohort of typically-developing preschoolers (ages 3–5 yr). Overall, responses were degraded in noise: they were smaller, less stable across trials, slower, and there was poorer coding of spectral content and the temporal envelope. These effects were exacerbated in response to the consonant transition relative to the vowel, suggesting that the neural coding of spectrotemporally-dynamic speech features is more tenuous in noise than the coding of static features—even in children this young. Neural coding of speech temporal fine structure, however, was more resilient to the addition of background noise than coding of temporal envelope information. Taken together, these results demonstrate that noise places a neurophysiological constraint on speech processing during early childhood by causing a breakdown in neural processing of speech acoustics. These results may explain why some listeners have inordinate difficulties understanding speech in noise. Speech-elicited auditory-neurophysiological responses offer objective insight into listening skills during early childhood by reflecting the integrity of neural coding in quiet and noise; this paper documents typical response properties in this age group. These normative metrics may be useful clinically to evaluate auditory processing difficulties during early childhood. PMID:26113025
Audio-visual speech processing in age-related hearing loss: Stronger integration and increased frontal lobe recruitment.

PubMed

Rosemann, Stephanie; Thiel, Christiane M

2018-07-15

Hearing loss is associated with difficulties in understanding speech, especially under adverse listening conditions. In these situations, seeing the speaker improves speech intelligibility in hearing-impaired participants. On the neuronal level, previous research has shown cross-modal plastic reorganization in the auditory cortex following hearing loss leading to altered processing of auditory, visual and audio-visual information. However, how reduced auditory input effects audio-visual speech perception in hearing-impaired subjects is largely unknown. We here investigated the impact of mild to moderate age-related hearing loss on processing audio-visual speech using functional magnetic resonance imaging. Normal-hearing and hearing-impaired participants performed two audio-visual speech integration tasks: a sentence detection task inside the scanner and the McGurk illusion outside the scanner. Both tasks consisted of congruent and incongruent audio-visual conditions, as well as auditory-only and visual-only conditions. We found a significantly stronger McGurk illusion in the hearing-impaired participants, which indicates stronger audio-visual integration. Neurally, hearing loss was associated with an increased recruitment of frontal brain areas when processing incongruent audio-visual, auditory and also visual speech stimuli, which may reflect the increased effort to perform the task. Hearing loss modulated both the audio-visual integration strength measured with the McGurk illusion and brain activation in frontal areas in the sentence task, showing stronger integration and higher brain activation with increasing hearing loss. Incongruent compared to congruent audio-visual speech revealed an opposite brain activation pattern in left ventral postcentral gyrus in both groups, with higher activation in hearing-impaired participants in the incongruent condition. Our results indicate that already mild to moderate hearing loss impacts audio-visual speech processing accompanied by changes in brain activation particularly involving frontal areas. These changes are modulated by the extent of hearing loss. Copyright © 2018 Elsevier Inc. All rights reserved.
Speech enhancement based on modified phase-opponency detectors

NASA Astrophysics Data System (ADS)

Deshmukh, Om D.; Espy-Wilson, Carol Y.

2005-09-01

A speech enhancement algorithm based on a neural model was presented by Deshmukh et al., [149th meeting of the Acoustical Society America, 2005]. The algorithm consists of a bank of Modified Phase Opponency (MPO) filter pairs tuned to different center frequencies. This algorithm is able to enhance salient spectral features in speech signals even at low signal-to-noise ratios. However, the algorithm introduces musical noise and sometimes misses a spectral peak that is close in frequency to a stronger spectral peak. Refinement in the design of the MPO filters was recently made that takes advantage of the falling spectrum of the speech signal in sonorant regions. The modified set of filters leads to better separation of the noise and speech signals, and more accurate enhancement of spectral peaks. The improvements also lead to a significant reduction in musical noise. Continuity algorithms based on the properties of speech signals are used to further reduce the musical noise effect. The efficiency of the proposed method in enhancing the speech signal when the level of the background noise is fluctuating will be demonstrated. The performance of the improved speech enhancement method will be compared with various spectral subtraction-based methods. [Work supported by NSF BCS0236707.
Twelve-month-old infants recognize that speech can communicate unobservable intentions.

PubMed

Vouloumanos, Athena; Onishi, Kristine H; Pogue, Amanda

2012-08-07

Much of our knowledge is acquired not from direct experience but through the speech of others. Speech allows rapid and efficient transfer of information that is otherwise not directly observable. Do infants recognize that speech, even if unfamiliar, can communicate about an important aspect of the world that cannot be directly observed: a person's intentions? Twelve-month-olds saw a person (the Communicator) attempt but fail to achieve a target action (stacking a ring on a funnel). The Communicator subsequently directed either speech or a nonspeech vocalization to another person (the Recipient) who had not observed the attempts. The Recipient either successfully stacked the ring (Intended outcome), attempted but failed to stack the ring (Observable outcome), or performed a different stacking action (Related outcome). Infants recognized that speech could communicate about unobservable intentions, looking longer at Observable and Related outcomes than the Intended outcome when the Communicator used speech. However, when the Communicator used nonspeech, infants looked equally at the three outcomes. Thus, for 12-month-olds, speech can transfer information about unobservable aspects of the world such as internal mental states, which provides preverbal infants with a tool for acquiring information beyond their immediate experience.
Twelve-month-old infants recognize that speech can communicate unobservable intentions

PubMed Central

Vouloumanos, Athena; Onishi, Kristine H.; Pogue, Amanda

2012-01-01

Much of our knowledge is acquired not from direct experience but through the speech of others. Speech allows rapid and efficient transfer of information that is otherwise not directly observable. Do infants recognize that speech, even if unfamiliar, can communicate about an important aspect of the world that cannot be directly observed: a person’s intentions? Twelve-month-olds saw a person (the Communicator) attempt but fail to achieve a target action (stacking a ring on a funnel). The Communicator subsequently directed either speech or a nonspeech vocalization to another person (the Recipient) who had not observed the attempts. The Recipient either successfully stacked the ring (Intended outcome), attempted but failed to stack the ring (Observable outcome), or performed a different stacking action (Related outcome). Infants recognized that speech could communicate about unobservable intentions, looking longer at Observable and Related outcomes than the Intended outcome when the Communicator used speech. However, when the Communicator used nonspeech, infants looked equally at the three outcomes. Thus, for 12-month-olds, speech can transfer information about unobservable aspects of the world such as internal mental states, which provides preverbal infants with a tool for acquiring information beyond their immediate experience. PMID:22826217
The a priori SDR Estimation Techniques with Reduced Speech Distortion for Acoustic Echo and Noise Suppression

NASA Astrophysics Data System (ADS)

Thoonsaengngam, Rattapol; Tangsangiumvisai, Nisachon

This paper proposes an enhanced method for estimating the a priori Signal-to-Disturbance Ratio (SDR) to be employed in the Acoustic Echo and Noise Suppression (AENS) system for full-duplex hands-free communications. The proposed a priori SDR estimation technique is modified based upon the Two-Step Noise Reduction (TSNR) algorithm to suppress the background noise while preserving speech spectral components. In addition, a practical approach to determine accurately the Echo Spectrum Variance (ESV) is presented based upon the linear relationship assumption between the power spectrum of far-end speech and acoustic echo signals. The ESV estimation technique is then employed to alleviate the acoustic echo problem. The performance of the AENS system that employs these two proposed estimation techniques is evaluated through the Echo Attenuation (EA), Noise Attenuation (NA), and two speech distortion measures. Simulation results based upon real speech signals guarantee that our improved AENS system is able to mitigate efficiently the problem of acoustic echo and background noise, while preserving the speech quality and speech intelligibility.

Three speech sounds, one motor action: evidence for speech-motor disparity from English flap production.

PubMed

Derrick, Donald; Stavness, Ian; Gick, Bryan

2015-03-01

The assumption that units of speech production bear a one-to-one relationship to speech motor actions pervades otherwise widely varying theories of speech motor behavior. This speech production and simulation study demonstrates that commonly occurring flap sequences may violate this assumption. In the word "Saturday," a sequence of three sounds may be produced using a single, cyclic motor action. Under this view, the initial upward tongue tip motion, starting with the first vowel and moving to contact the hard palate on the way to a retroflex position, is under active muscular control, while the downward movement of the tongue tip, including the second contact with the hard palate, results from gravity and elasticity during tongue muscle relaxation. This sequence is reproduced using a three-dimensional computer simulation of human vocal tract biomechanics and differs greatly from other observed sequences for the same word, which employ multiple targeted speech motor actions. This outcome suggests that a goal of a speaker is to produce an entire sequence in a biomechanically efficient way at the expense of maintaining parity within the individual parts of the sequence.
An attention-gating recurrent working memory architecture for emergent speech representation

NASA Astrophysics Data System (ADS)

Elshaw, Mark; Moore, Roger K.; Klein, Michael

2010-06-01

This paper describes an attention-gating recurrent self-organising map approach for emergent speech representation. Inspired by evidence from human cognitive processing, the architecture combines two main neural components. The first component, the attention-gating mechanism, uses actor-critic learning to perform selective attention towards speech. Through this selective attention approach, the attention-gating mechanism controls access to working memory processing. The second component, the recurrent self-organising map memory, develops a temporal-distributed representation of speech using phone-like structures. Representing speech in terms of phonetic features in an emergent self-organised fashion, according to research on child cognitive development, recreates the approach found in infants. Using this representational approach, in a fashion similar to infants, should improve the performance of automatic recognition systems through aiding speech segmentation and fast word learning.
A Framework for Speech Activity Detection Using Adaptive Auditory Receptive Fields.

PubMed

Carlin, Michael A; Elhilali, Mounya

2015-12-01

One of the hallmarks of sound processing in the brain is the ability of the nervous system to adapt to changing behavioral demands and surrounding soundscapes. It can dynamically shift sensory and cognitive resources to focus on relevant sounds. Neurophysiological studies indicate that this ability is supported by adaptively retuning the shapes of cortical spectro-temporal receptive fields (STRFs) to enhance features of target sounds while suppressing those of task-irrelevant distractors. Because an important component of human communication is the ability of a listener to dynamically track speech in noisy environments, the solution obtained by auditory neurophysiology implies a useful adaptation strategy for speech activity detection (SAD). SAD is an important first step in a number of automated speech processing systems, and performance is often reduced in highly noisy environments. In this paper, we describe how task-driven adaptation is induced in an ensemble of neurophysiological STRFs, and show how speech-adapted STRFs reorient themselves to enhance spectro-temporal modulations of speech while suppressing those associated with a variety of nonspeech sounds. We then show how an adapted ensemble of STRFs can better detect speech in unseen noisy environments compared to an unadapted ensemble and a noise-robust baseline. Finally, we use a stimulus reconstruction task to demonstrate how the adapted STRF ensemble better captures the spectrotemporal modulations of attended speech in clean and noisy conditions. Our results suggest that a biologically plausible adaptation framework can be applied to speech processing systems to dynamically adapt feature representations for improving noise robustness.
Categorical speech processing in Broca's area: an fMRI study using multivariate pattern-based analysis.

PubMed

Lee, Yune-Sang; Turkeltaub, Peter; Granger, Richard; Raizada, Rajeev D S

2012-03-14

Although much effort has been directed toward understanding the neural basis of speech processing, the neural processes involved in the categorical perception of speech have been relatively less studied, and many questions remain open. In this functional magnetic resonance imaging (fMRI) study, we probed the cortical regions mediating categorical speech perception using an advanced brain-mapping technique, whole-brain multivariate pattern-based analysis (MVPA). Normal healthy human subjects (native English speakers) were scanned while they listened to 10 consonant-vowel syllables along the /ba/-/da/ continuum. Outside of the scanner, individuals' own category boundaries were measured to divide the fMRI data into /ba/ and /da/ conditions per subject. The whole-brain MVPA revealed that Broca's area and the left pre-supplementary motor area evoked distinct neural activity patterns between the two perceptual categories (/ba/ vs /da/). Broca's area was also found when the same analysis was applied to another dataset (Raizada and Poldrack, 2007), which previously yielded the supramarginal gyrus using a univariate adaptation-fMRI paradigm. The consistent MVPA findings from two independent datasets strongly indicate that Broca's area participates in categorical speech perception, with a possible role of translating speech signals into articulatory codes. The difference in results between univariate and multivariate pattern-based analyses of the same data suggest that processes in different cortical areas along the dorsal speech perception stream are distributed on different spatial scales.
SPEECH EVALUATION WITH AND WITHOUT PALATAL OBTURATOR IN PATIENTS SUBMITTED TO MAXILLECTOMY

PubMed Central

de Carvalho-Teles, Viviane; Pegoraro-Krook, Maria Inês; Lauris, José Roberto Pereira

2006-01-01

Most patients who have undergone resection of the maxillae due to benign or malignant tumors in the palatomaxillary region present with speech and swallowing disorders. Coupling of the oral and nasal cavities increases nasal resonance, resulting in hypernasality and unintelligible speech. Prosthodontic rehabilitation of maxillary resections with effective separation of the oral and nasal cavities can improve speech and esthetics, and assist the psychosocial adjustment of the patient as well. The objective of this study was to evaluate the efficacy of the palatal obturator prosthesis on speech intelligibility and resonance of 23 patients with age ranging from 18 to 83 years (Mean = 49.5 years), who had undergone inframedial-structural maxillectomy. The patients were requested to count from 1 to 20, to repeat 21 words and to spontaneously speak for 15 seconds, once with and again without the prosthesis, for tape recording purposes. The resonance and speech intelligibility were judged by 5 speech language pathologists from the tape recordings samples. The results have shown that the majority of patients (82.6%) significantly improved their speech intelligibility, and 16 patients (69.9%) exhibited a significant hypernasality reduction with the obturator in place. The results of this study indicated that maxillary obturator prosthesis was efficient to improve the speech intelligibility and resonance in patients who had undergone maxillectomy. PMID:19089242
Effects of Attention Cueing on Learning Speech Organ Operation through Mobile Phones

ERIC Educational Resources Information Center

Yang, Hui-Yu

2017-01-01

The studies regarding using a cross sectional view of speech organs enriched with attention cueing and written text to probe learners' learning efficiency and behavior through mobile phones is scant. The purpose of this study was to examine whether the presence of attention cueing can benefit learners with different amounts of prior knowledge in…
Bimodal bilingualism as multisensory training?: Evidence for improved audiovisual speech perception after sign language exposure.

PubMed

Williams, Joshua T; Darcy, Isabelle; Newman, Sharlene D

2016-02-15

The aim of the present study was to characterize effects of learning a sign language on the processing of a spoken language. Specifically, audiovisual phoneme comprehension was assessed before and after 13 weeks of sign language exposure. L2 ASL learners performed this task in the fMRI scanner. Results indicated that L2 American Sign Language (ASL) learners' behavioral classification of the speech sounds improved with time compared to hearing nonsigners. Results indicated increased activation in the supramarginal gyrus (SMG) after sign language exposure, which suggests concomitant increased phonological processing of speech. A multiple regression analysis indicated that learner's rating on co-sign speech use and lipreading ability was correlated with SMG activation. This pattern of results indicates that the increased use of mouthing and possibly lipreading during sign language acquisition may concurrently improve audiovisual speech processing in budding hearing bimodal bilinguals. Copyright © 2015 Elsevier B.V. All rights reserved.
Electrocorticographic representations of segmental features in continuous speech

PubMed Central

Lotte, Fabien; Brumberg, Jonathan S.; Brunner, Peter; Gunduz, Aysegul; Ritaccio, Anthony L.; Guan, Cuntai; Schalk, Gerwin

2015-01-01

Acoustic speech output results from coordinated articulation of dozens of muscles, bones and cartilages of the vocal mechanism. While we commonly take the fluency and speed of our speech productions for granted, the neural mechanisms facilitating the requisite muscular control are not completely understood. Previous neuroimaging and electrophysiology studies of speech sensorimotor control has typically concentrated on speech sounds (i.e., phonemes, syllables and words) in isolation; sentence-length investigations have largely been used to inform coincident linguistic processing. In this study, we examined the neural representations of segmental features (place and manner of articulation, and voicing status) in the context of fluent, continuous speech production. We used recordings from the cortical surface [electrocorticography (ECoG)] to simultaneously evaluate the spatial topography and temporal dynamics of the neural correlates of speech articulation that may mediate the generation of hypothesized gestural or articulatory scores. We found that the representation of place of articulation involved broad networks of brain regions during all phases of speech production: preparation, execution and monitoring. In contrast, manner of articulation and voicing status were dominated by auditory cortical responses after speech had been initiated. These results provide a new insight into the articulatory and auditory processes underlying speech production in terms of their motor requirements and acoustic correlates. PMID:25759647
The influence of informational masking on speech perception and pupil response in adults with hearing impairment.

PubMed

Koelewijn, Thomas; Zekveld, Adriana A; Festen, Joost M; Kramer, Sophia E

2014-03-01

A recent pupillometry study on adults with normal hearing indicates that the pupil response during speech perception (cognitive processing load) is strongly affected by the type of speech masker. The current study extends these results by recording the pupil response in 32 participants with hearing impairment (mean age 59 yr) while they were listening to sentences masked by fluctuating noise or a single-talker. Efforts were made to improve audibility of all sounds by means of spectral shaping. Additionally, participants performed tests measuring verbal working memory capacity, inhibition of interfering information in working memory, and linguistic closure. The results showed worse speech reception thresholds for speech masked by single-talker speech compared to fluctuating noise. In line with previous results for participants with normal hearing, the pupil response was larger when listening to speech masked by a single-talker compared to fluctuating noise. Regression analysis revealed that larger working memory capacity and better inhibition of interfering information related to better speech reception thresholds, but these variables did not account for inter-individual differences in the pupil response. In conclusion, people with hearing impairment show more cognitive load during speech processing when there is interfering speech compared to fluctuating noise.
Consonant identification in noise using Hilbert-transform temporal fine-structure speech and recovered-envelope speech for listeners with normal and impaired hearinga)

PubMed Central

Léger, Agnès C.; Reed, Charlotte M.; Desloge, Joseph G.; Swaminathan, Jayaganesh; Braida, Louis D.

2015-01-01

Consonant-identification ability was examined in normal-hearing (NH) and hearing-impaired (HI) listeners in the presence of steady-state and 10-Hz square-wave interrupted speech-shaped noise. The Hilbert transform was used to process speech stimuli (16 consonants in a-C-a syllables) to present envelope cues, temporal fine-structure (TFS) cues, or envelope cues recovered from TFS speech. The performance of the HI listeners was inferior to that of the NH listeners both in terms of lower levels of performance in the baseline condition and in the need for higher signal-to-noise ratio to yield a given level of performance. For NH listeners, scores were higher in interrupted noise than in steady-state noise for all speech types (indicating substantial masking release). For HI listeners, masking release was typically observed for TFS and recovered-envelope speech but not for unprocessed and envelope speech. For both groups of listeners, TFS and recovered-envelope speech yielded similar levels of performance and consonant confusion patterns. The masking release observed for TFS and recovered-envelope speech may be related to level effects associated with the manner in which the TFS processing interacts with the interrupted noise signal, rather than to the contributions of TFS cues per se. PMID:26233038
Involvement of Right STS in Audio-Visual Integration for Affective Speech Demonstrated Using MEG

PubMed Central

Hagan, Cindy C.; Woods, Will; Johnson, Sam; Green, Gary G. R.; Young, Andrew W.

2013-01-01

Speech and emotion perception are dynamic processes in which it may be optimal to integrate synchronous signals emitted from different sources. Studies of audio-visual (AV) perception of neutrally expressed speech demonstrate supra-additive (i.e., where AV>[unimodal auditory+unimodal visual]) responses in left STS to crossmodal speech stimuli. However, emotions are often conveyed simultaneously with speech; through the voice in the form of speech prosody and through the face in the form of facial expression. Previous studies of AV nonverbal emotion integration showed a role for right (rather than left) STS. The current study therefore examined whether the integration of facial and prosodic signals of emotional speech is associated with supra-additive responses in left (cf. results for speech integration) or right (due to emotional content) STS. As emotional displays are sometimes difficult to interpret, we also examined whether supra-additive responses were affected by emotional incongruence (i.e., ambiguity). Using magnetoencephalography, we continuously recorded eighteen participants as they viewed and heard AV congruent emotional and AV incongruent emotional speech stimuli. Significant supra-additive responses were observed in right STS within the first 250 ms for emotionally incongruent and emotionally congruent AV speech stimuli, which further underscores the role of right STS in processing crossmodal emotive signals. PMID:23950977
Involvement of right STS in audio-visual integration for affective speech demonstrated using MEG.

PubMed

Hagan, Cindy C; Woods, Will; Johnson, Sam; Green, Gary G R; Young, Andrew W

2013-01-01

Speech and emotion perception are dynamic processes in which it may be optimal to integrate synchronous signals emitted from different sources. Studies of audio-visual (AV) perception of neutrally expressed speech demonstrate supra-additive (i.e., where AV>[unimodal auditory+unimodal visual]) responses in left STS to crossmodal speech stimuli. However, emotions are often conveyed simultaneously with speech; through the voice in the form of speech prosody and through the face in the form of facial expression. Previous studies of AV nonverbal emotion integration showed a role for right (rather than left) STS. The current study therefore examined whether the integration of facial and prosodic signals of emotional speech is associated with supra-additive responses in left (cf. results for speech integration) or right (due to emotional content) STS. As emotional displays are sometimes difficult to interpret, we also examined whether supra-additive responses were affected by emotional incongruence (i.e., ambiguity). Using magnetoencephalography, we continuously recorded eighteen participants as they viewed and heard AV congruent emotional and AV incongruent emotional speech stimuli. Significant supra-additive responses were observed in right STS within the first 250 ms for emotionally incongruent and emotionally congruent AV speech stimuli, which further underscores the role of right STS in processing crossmodal emotive signals.
Task-dependent modulation of the visual sensory thalamus assists visual-speech recognition.

PubMed

Díaz, Begoña; Blank, Helen; von Kriegstein, Katharina

2018-05-14

The cerebral cortex modulates early sensory processing via feed-back connections to sensory pathway nuclei. The functions of this top-down modulation for human behavior are poorly understood. Here, we show that top-down modulation of the visual sensory thalamus (the lateral geniculate body, LGN) is involved in visual-speech recognition. In two independent functional magnetic resonance imaging (fMRI) studies, LGN response increased when participants processed fast-varying features of articulatory movements required for visual-speech recognition, as compared to temporally more stable features required for face identification with the same stimulus material. The LGN response during the visual-speech task correlated positively with the visual-speech recognition scores across participants. In addition, the task-dependent modulation was present for speech movements and did not occur for control conditions involving non-speech biological movements. In face-to-face communication, visual speech recognition is used to enhance or even enable understanding what is said. Speech recognition is commonly explained in frameworks focusing on cerebral cortex areas. Our findings suggest that task-dependent modulation at subcortical sensory stages has an important role for communication: Together with similar findings in the auditory modality the findings imply that task-dependent modulation of the sensory thalami is a general mechanism to optimize speech recognition. Copyright © 2018. Published by Elsevier Inc.
Reaction Times of Normal Listeners to Laryngeal, Alaryngeal, and Synthetic Speech

ERIC Educational Resources Information Center

Evitts, Paul M.; Searl, Jeff

2006-01-01

The purpose of this study was to compare listener processing demands when decoding alaryngeal compared to laryngeal speech. Fifty-six listeners were presented with single words produced by 1 proficient speaker from 5 different modes of speech: normal, tracheosophageal (TE), esophageal (ES), electrolaryngeal (EL), and synthetic speech (SS).…
A Procedure for the Computerized Analysis of Cleft Palate Speech Transcription

ERIC Educational Resources Information Center

Fitzsimons, David A.; Jones, David L.; Barton, Belinda; North, Kathryn N.

2012-01-01

The phonetic symbols used by speech-language pathologists to transcribe speech contain underlying hexadecimal values used by computers to correctly display and process transcription data. This study aimed to develop a procedure to utilise these values as the basis for subsequent computerized analysis of cleft palate speech. A computer keyboard…
The role of left inferior frontal cortex during audiovisual speech perception in infants.

PubMed

Altvater-Mackensen, Nicole; Grossmann, Tobias

2016-06-01

In the first year of life, infants' speech perception attunes to their native language. While the behavioral changes associated with native language attunement are fairly well mapped, the underlying mechanisms and neural processes are still only poorly understood. Using fNIRS and eye tracking, the current study investigated 6-month-old infants' processing of audiovisual speech that contained matching or mismatching auditory and visual speech cues. Our results revealed that infants' speech-sensitive brain responses in inferior frontal brain regions were lateralized to the left hemisphere. Critically, our results further revealed that speech-sensitive left inferior frontal regions showed enhanced responses to matching when compared to mismatching audiovisual speech, and that infants with a preference to look at the speaker's mouth showed an enhanced left inferior frontal response to speech compared to infants with a preference to look at the speaker's eyes. These results suggest that left inferior frontal regions play a crucial role in associating information from different modalities during native language attunement, fostering the formation of multimodal phonological categories. Copyright © 2016 Elsevier Inc. All rights reserved.
Speech recognition systems on the Cell Broadband Engine

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, Y; Jones, H; Vaidya, S

In this paper we describe our design, implementation, and first results of a prototype connected-phoneme-based speech recognition system on the Cell Broadband Engine{trademark} (Cell/B.E.). Automatic speech recognition decodes speech samples into plain text (other representations are possible) and must process samples at real-time rates. Fortunately, the computational tasks involved in this pipeline are highly data-parallel and can receive significant hardware acceleration from vector-streaming architectures such as the Cell/B.E. Identifying and exploiting these parallelism opportunities is challenging, but also critical to improving system performance. We observed, from our initial performance timings, that a single Cell/B.E. processor can recognize speech from thousandsmore » of simultaneous voice channels in real time--a channel density that is orders-of-magnitude greater than the capacity of existing software speech recognizers based on CPUs (central processing units). This result emphasizes the potential for Cell/B.E.-based speech recognition and will likely lead to the future development of production speech systems using Cell/B.E. clusters.« less
Development and functional significance of private speech among attention-deficit hyperactivity disordered and normal boys.

PubMed

Berk, L E; Potts, M K

1991-06-01

We compared the development of spontaneous private speech and its relationship to self-controlled behavior in a sample of 6- to 12-year-olds with attention-deficit hyperactivity disorder (ADHD) and matched normal controls. Thirty-eight boys were observed in their classrooms while engaged in math seatwork. Results revealed that ADHD children were delayed in private speech development in that they engaged in more externalized, self-fuiding and less inaudible, internalized speech than normal youngsters. Several findings suggest that the developmental lag was a consequence of a highly unmanageable attentional system that prevents ADHD children's private speech from gaining efficient mastery over behavior. First, self-guiding speech was associated with greater attentional focus only among the least distractible ADHD boys. Second, the most mature, internalized speech forms were correlated with self-stimulating behavior for ADHD subjects but not for controls. Third, observations of ADHD children both on and off stimulant medication indicated that reducing their symptoms substantially increased the maturity of private speech and its association with motor quiescence and attention to task. Results suggest that the Vygotskian hypothesis of a unidirectional path of influence from private speech to self-controlled behavior should be expanded into a bidirectional model. These findings may also shed light on why treatment programs that train children with attentional deficits in speech-to-self have shown limited efficacy.
Development of a test battery for evaluating speech perception in complex listening environments.

PubMed

Brungart, Douglas S; Sheffield, Benjamin M; Kubli, Lina R

2014-08-01

In the real world, spoken communication occurs in complex environments that involve audiovisual speech cues, spatially separated sound sources, reverberant listening spaces, and other complicating factors that influence speech understanding. However, most clinical tools for assessing speech perception are based on simplified listening environments that do not reflect the complexities of real-world listening. In this study, speech materials from the QuickSIN speech-in-noise test by Killion, Niquette, Gudmundsen, Revit, and Banerjee [J. Acoust. Soc. Am. 116, 2395-2405 (2004)] were modified to simulate eight listening conditions spanning the range of auditory environments listeners encounter in everyday life. The standard QuickSIN test method was used to estimate 50% speech reception thresholds (SRT50) in each condition. A method of adjustment procedure was also used to obtain subjective estimates of the lowest signal-to-noise ratio (SNR) where the listeners were able to understand 100% of the speech (SRT100) and the highest SNR where they could detect the speech but could not understand any of the words (SRT0). The results show that the modified materials maintained most of the efficiency of the QuickSIN test procedure while capturing performance differences across listening conditions comparable to those reported in previous studies that have examined the effects of audiovisual cues, binaural cues, room reverberation, and time compression on the intelligibility of speech.
Advancements in robust algorithm formulation for speaker identification of whispered speech

NASA Astrophysics Data System (ADS)

Fan, Xing

Whispered speech is an alternative speech production mode from neutral speech, which is used by talkers intentionally in natural conversational scenarios to protect privacy and to avoid certain content from being overheard/made public. Due to the profound differences between whispered and neutral speech in production mechanism and the absence of whispered adaptation data, the performance of speaker identification systems trained with neutral speech degrades significantly. This dissertation therefore focuses on developing a robust closed-set speaker recognition system for whispered speech by using no or limited whispered adaptation data from non-target speakers. This dissertation proposes the concept of "High''/"Low'' performance whispered data for the purpose of speaker identification. A variety of acoustic properties are identified that contribute to the quality of whispered data. An acoustic analysis is also conducted to compare the phoneme/speaker dependency of the differences between whispered and neutral data in the feature domain. The observations from those acoustic analysis are new in this area and also serve as a guidance for developing robust speaker identification systems for whispered speech. This dissertation further proposes two systems for speaker identification of whispered speech. One system focuses on front-end processing. A two-dimensional feature space is proposed to search for "Low''-quality performance based whispered utterances and separate feature mapping functions are applied to vowels and consonants respectively in order to retain the speaker's information shared between whispered and neutral speech. The other system focuses on speech-mode-independent model training. The proposed method generates pseudo whispered features from neutral features by using the statistical information contained in a whispered Universal Background model (UBM) trained from extra collected whispered data from non-target speakers. Four modeling methods are proposed for the transformation estimation in order to generate the pseudo whispered features. Both of the above two systems demonstrate a significant improvement over the baseline system on the evaluation data. This dissertation has therefore contributed to providing a scientific understanding of the differences between whispered and neutral speech as well as improved front-end processing and modeling method for speaker identification of whispered speech. Such advancements will ultimately contribute to improve the robustness of speech processing systems.

An ALE meta-analysis on the audiovisual integration of speech signals.

PubMed

Erickson, Laura C; Heeg, Elizabeth; Rauschecker, Josef P; Turkeltaub, Peter E

2014-11-01

The brain improves speech processing through the integration of audiovisual (AV) signals. Situations involving AV speech integration may be crudely dichotomized into those where auditory and visual inputs contain (1) equivalent, complementary signals (validating AV speech) or (2) inconsistent, different signals (conflicting AV speech). This simple framework may allow the systematic examination of broad commonalities and differences between AV neural processes engaged by various experimental paradigms frequently used to study AV speech integration. We conducted an activation likelihood estimation metaanalysis of 22 functional imaging studies comprising 33 experiments, 311 subjects, and 347 foci examining "conflicting" versus "validating" AV speech. Experimental paradigms included content congruency, timing synchrony, and perceptual measures, such as the McGurk effect or synchrony judgments, across AV speech stimulus types (sublexical to sentence). Colocalization of conflicting AV speech experiments revealed consistency across at least two contrast types (e.g., synchrony and congruency) in a network of dorsal stream regions in the frontal, parietal, and temporal lobes. There was consistency across all contrast types (synchrony, congruency, and percept) in the bilateral posterior superior/middle temporal cortex. Although fewer studies were available, validating AV speech experiments were localized to other regions, such as ventral stream visual areas in the occipital and inferior temporal cortex. These results suggest that while equivalent, complementary AV speech signals may evoke activity in regions related to the corroboration of sensory input, conflicting AV speech signals recruit widespread dorsal stream areas likely involved in the resolution of conflicting sensory signals. Copyright © 2014 Wiley Periodicals, Inc.
Tuning Neural Phase Entrainment to Speech.

PubMed

Falk, Simone; Lanzilotti, Cosima; Schön, Daniele

2017-08-01

Musical rhythm positively impacts on subsequent speech processing. However, the neural mechanisms underlying this phenomenon are so far unclear. We investigated whether carryover effects from a preceding musical cue to a speech stimulus result from a continuation of neural phase entrainment to periodicities that are present in both music and speech. Participants listened and memorized French metrical sentences that contained (quasi-)periodic recurrences of accents and syllables. Speech stimuli were preceded by a rhythmically regular or irregular musical cue. Our results show that the presence of a regular cue modulates neural response as estimated by EEG power spectral density, intertrial coherence, and source analyses at critical frequencies during speech processing compared with the irregular condition. Importantly, intertrial coherences for regular cues were indicative of the participants' success in memorizing the subsequent speech stimuli. These findings underscore the highly adaptive nature of neural phase entrainment across fundamentally different auditory stimuli. They also support current models of neural phase entrainment as a tool of predictive timing and attentional selection across cognitive domains.
Effects of social cognitive impairment on speech disorder in schizophrenia.

PubMed

Docherty, Nancy M; McCleery, Amanda; Divilbiss, Marielle; Schumann, Emily B; Moe, Aubrey; Shakeel, Mohammed K

2013-05-01

Disordered speech in schizophrenia impairs social functioning because it impedes communication with others. Treatment approaches targeting this symptom have been limited by an incomplete understanding of its causes. This study examined the process underpinnings of speech disorder, assessed in terms of communication failure. Contributions of impairments in 2 social cognitive abilities, emotion perception and theory of mind (ToM), to speech disorder were assessed in 63 patients with schizophrenia or schizoaffective disorder and 21 nonpsychiatric participants, after controlling for the effects of verbal intelligence and impairments in basic language-related neurocognitive abilities. After removal of the effects of the neurocognitive variables, impairments in emotion perception and ToM each explained additional variance in speech disorder in the patients but not the controls. The neurocognitive and social cognitive variables, taken together, explained 51% of the variance in speech disorder in the patients. Schizophrenic disordered speech may be less a concomitant of "positive" psychotic process than of illness-related limitations in neurocognitive and social cognitive functioning.
Listening to speech recruits specific tongue motor synergies as revealed by transcranial magnetic stimulation and tissue-Doppler ultrasound imaging

PubMed Central

D'Ausilio, A.; Maffongelli, L.; Bartoli, E.; Campanella, M.; Ferrari, E.; Berry, J.; Fadiga, L.

2014-01-01

The activation of listener's motor system during speech processing was first demonstrated by the enhancement of electromyographic tongue potentials as evoked by single-pulse transcranial magnetic stimulation (TMS) over tongue motor cortex. This technique is, however, technically challenging and enables only a rather coarse measurement of this motor mirroring. Here, we applied TMS to listeners’ tongue motor area in association with ultrasound tissue Doppler imaging to describe fine-grained tongue kinematic synergies evoked by passive listening to speech. Subjects listened to syllables requiring different patterns of dorso-ventral and antero-posterior movements (/ki/, /ko/, /ti/, /to/). Results show that passive listening to speech sounds evokes a pattern of motor synergies mirroring those occurring during speech production. Moreover, mirror motor synergies were more evident in those subjects showing good performances in discriminating speech in noise demonstrating a role of the speech-related mirror system in feed-forward processing the speaker's ongoing motor plan. PMID:24778384
Temporal Context in Speech Processing and Attentional Stream Selection: A Behavioral and Neural perspective

PubMed Central

Zion Golumbic, Elana M.; Poeppel, David; Schroeder, Charles E.

2012-01-01

The human capacity for processing speech is remarkable, especially given that information in speech unfolds over multiple time scales concurrently. Similarly notable is our ability to filter out of extraneous sounds and focus our attention on one conversation, epitomized by the ‘Cocktail Party’ effect. Yet, the neural mechanisms underlying on-line speech decoding and attentional stream selection are not well understood. We review findings from behavioral and neurophysiological investigations that underscore the importance of the temporal structure of speech for achieving these perceptual feats. We discuss the hypothesis that entrainment of ambient neuronal oscillations to speech’s temporal structure, across multiple time-scales, serves to facilitate its decoding and underlies the selection of an attended speech stream over other competing input. In this regard, speech decoding and attentional stream selection are examples of ‘active sensing’, emphasizing an interaction between proactive and predictive top-down modulation of neuronal dynamics and bottom-up sensory input. PMID:22285024
Perceiving speech in context: Compensation for contextual variability during acoustic cue encoding and categorization

NASA Astrophysics Data System (ADS)

Toscano, Joseph Christopher

Several fundamental questions about speech perception concern how listeners understand spoken language despite considerable variability in speech sounds across different contexts (the problem of lack of invariance in speech). This contextual variability is caused by several factors, including differences between individual talkers' voices, variation in speaking rate, and effects of coarticulatory context. A number of models have been proposed to describe how the speech system handles differences across contexts. Critically, these models make different predictions about (1) whether contextual variability is handled at the level of acoustic cue encoding or categorization, (2) whether it is driven by feedback from category-level processes or interactions between cues, and (3) whether listeners discard fine-grained acoustic information to compensate for contextual variability. Separating the effects of cue- and category-level processing has been difficult because behavioral measures tap processes that occur well after initial cue encoding and are influenced by task demands and linguistic information. Recently, we have used the event-related brain potential (ERP) technique to examine cue encoding and online categorization. Specifically, we have looked at differences in the auditory N1 as a measure of acoustic cue encoding and the P3 as a measure of categorization. This allows us to examine multiple levels of processing during speech perception and can provide a useful tool for studying effects of contextual variability. Here, I apply this approach to determine the point in processing at which context has an effect on speech perception and to examine whether acoustic cues are encoded continuously. Several types of contextual variability (talker gender, speaking rate, and coarticulation), as well as several acoustic cues (voice onset time, formant frequencies, and bandwidths), are examined in a series of experiments. The results suggest that (1) at early stages of speech processing, listeners encode continuous differences in acoustic cues, independent of phonological categories; (2) at post-perceptual stages, fine-grained acoustic information is preserved; and (3) there is preliminary evidence that listeners encode cues relative to context via feedback from categories. These results are discussed in relation to proposed models of speech perception and sources of contextual variability.
Patient satisfaction and masticatory efficiency of single implant-retained mandibular overdentures using the stud and magnetic attachments.

PubMed

Cheng, Tao; Sun, Guifeng; Huo, Jingyu; He, Xiaoji; Wang, Yining; Ren, Yan-Fang

2012-11-01

To study patient satisfaction and masticatory efficiency of single implant-retained mandibular overdentures using the stud and magnetic attachments in a randomized clinical trial with a crossover design. Patients received a single implant placed in the midline of the mandible and either a stud (Locator) or a magnetic (Magfit) attachment, assigned at random. Patient satisfaction, including patient comfort, speech, chewing ability and retention, and masticatory efficiency measured by chewing peanuts, were assessed before and 3 months after attachment insertion. Patient satisfaction and masticatory efficiency were evaluated again 3 months after insertion of the alternate attachment bodies. The outcomes were compared before and after insertion of the attachments and between the two types of attachments using Wilcoxon signed rank tests. Patient overall satisfaction, comfort, speech, chewing ability, and retention improved significantly after insertion of both types of attachment bodies (p<0.05). Masticatory efficiencies also increased in both the Locator and the Magfit groups (p<0.05). There were no statistically significant differences in patient overall satisfaction, comfort, speech, and retention between the two types of attachments (p>0.05). The Locator attachments performed better in perceived chewing ability than the Magfit (p<0.05), but there was no statistically significant difference in masticatory efficiency between the two attachment types (p>0.05). Clinical outcomes were significantly improved in single implant-retained mandibular overdentures using either the Locator or the Magfit magnetic attachments. There was no difference in masticatory efficiency between the two attachment types. Copyright © 2012 Elsevier Ltd. All rights reserved.
Speech Evoked Auditory Brainstem Response in Stuttering

PubMed Central

Tahaei, Ali Akbar; Ashayeri, Hassan; Pourbakht, Akram; Kamali, Mohammad

2014-01-01

Auditory processing deficits have been hypothesized as an underlying mechanism for stuttering. Previous studies have demonstrated abnormal responses in subjects with persistent developmental stuttering (PDS) at the higher level of the central auditory system using speech stimuli. Recently, the potential usefulness of speech evoked auditory brainstem responses in central auditory processing disorders has been emphasized. The current study used the speech evoked ABR to investigate the hypothesis that subjects with PDS have specific auditory perceptual dysfunction. Objectives. To determine whether brainstem responses to speech stimuli differ between PDS subjects and normal fluent speakers. Methods. Twenty-five subjects with PDS participated in this study. The speech-ABRs were elicited by the 5-formant synthesized syllable/da/, with duration of 40 ms. Results. There were significant group differences for the onset and offset transient peaks. Subjects with PDS had longer latencies for the onset and offset peaks relative to the control group. Conclusions. Subjects with PDS showed a deficient neural timing in the early stages of the auditory pathway consistent with temporal processing deficits and their abnormal timing may underlie to their disfluency. PMID:25215262
Speech motor planning and execution deficits in early childhood stuttering.

PubMed

Walsh, Bridget; Mettel, Kathleen Marie; Smith, Anne

2015-01-01

Five to eight percent of preschool children develop stuttering, a speech disorder with clearly observable, hallmark symptoms: sound repetitions, prolongations, and blocks. While the speech motor processes underlying stuttering have been widely documented in adults, few studies to date have assessed the speech motor dynamics of stuttering near its onset. We assessed fundamental characteristics of speech movements in preschool children who stutter and their fluent peers to determine if atypical speech motor characteristics described for adults are early features of the disorder or arise later in the development of chronic stuttering. Orofacial movement data were recorded from 58 children who stutter and 43 children who do not stutter aged 4;0 to 5;11 (years; months) in a sentence production task. For single speech movements and multiple speech movement sequences, we computed displacement amplitude, velocity, and duration. For the phrase level movement sequence, we computed an index of articulation coordination consistency for repeated productions of the sentence. Boys who stutter, but not girls, produced speech with reduced amplitudes and velocities of articulatory movement. All children produced speech with similar durations. Boys, particularly the boys who stuttered, had more variable patterns of articulatory coordination compared to girls. This study is the first to demonstrate sex-specific differences in speech motor control processes between preschool boys and girls who are stuttering. The sex-specific lag in speech motor development in many boys who stutter likely has significant implications for the dramatically different recovery rates between male and female preschoolers who stutter. Further, our findings document that atypical speech motor development is an early feature of stuttering.
Speech, stone tool-making and the evolution of language.

PubMed

Cataldo, Dana Michelle; Migliano, Andrea Bamberg; Vinicius, Lucio

2018-01-01

The 'technological hypothesis' proposes that gestural language evolved in early hominins to enable the cultural transmission of stone tool-making skills, with speech appearing later in response to the complex lithic industries of more recent hominins. However, no flintknapping study has assessed the efficiency of speech alone (unassisted by gesture) as a tool-making transmission aid. Here we show that subjects instructed by speech alone underperform in stone tool-making experiments in comparison to subjects instructed through either gesture alone or 'full language' (gesture plus speech), and also report lower satisfaction with their received instruction. The results provide evidence that gesture was likely to be selected over speech as a teaching aid in the earliest hominin tool-makers; that speech could not have replaced gesturing as a tool-making teaching aid in later hominins, possibly explaining the functional retention of gesturing in the full language of modern humans; and that speech may have evolved for reasons unrelated to tool-making. We conclude that speech is unlikely to have evolved as tool-making teaching aid superior to gesture, as claimed by the technological hypothesis, and therefore alternative views should be considered. For example, gestural language may have evolved to enable tool-making in earlier hominins, while speech may have later emerged as a response to increased trade and more complex inter- and intra-group interactions in Middle Pleistocene ancestors of Neanderthals and Homo sapiens; or gesture and speech may have evolved in parallel rather than in sequence.
On the relationship between auditory cognition and speech intelligibility in cochlear implant users: An ERP study.

PubMed

Finke, Mareike; Büchner, Andreas; Ruigendijk, Esther; Meyer, Martin; Sandmann, Pascale

2016-07-01

There is a high degree of variability in speech intelligibility outcomes across cochlear-implant (CI) users. To better understand how auditory cognition affects speech intelligibility with the CI, we performed an electroencephalography study in which we examined the relationship between central auditory processing, cognitive abilities, and speech intelligibility. Postlingually deafened CI users (N=13) and matched normal-hearing (NH) listeners (N=13) performed an oddball task with words presented in different background conditions (quiet, stationary noise, modulated noise). Participants had to categorize words as living (targets) or non-living entities (standards). We also assessed participants' working memory (WM) capacity and verbal abilities. For the oddball task, we found lower hit rates and prolonged response times in CI users when compared with NH listeners. Noise-related prolongation of the N1 amplitude was found for all participants. Further, we observed group-specific modulation effects of event-related potentials (ERPs) as a function of background noise. While NH listeners showed stronger noise-related modulation of the N1 latency, CI users revealed enhanced modulation effects of the N2/N4 latency. In general, higher-order processing (N2/N4, P3) was prolonged in CI users in all background conditions when compared with NH listeners. Longer N2/N4 latency in CI users suggests that these individuals have difficulties to map acoustic-phonetic features to lexical representations. These difficulties seem to be increased for speech-in-noise conditions when compared with speech in quiet background. Correlation analyses showed that shorter ERP latencies were related to enhanced speech intelligibility (N1, N2/N4), better lexical fluency (N1), and lower ratings of listening effort (N2/N4) in CI users. In sum, our findings suggest that CI users and NH listeners differ with regards to both the sensory and the higher-order processing of speech in quiet as well as in noisy background conditions. Our results also revealed that verbal abilities are related to speech processing and speech intelligibility in CI users, confirming the view that auditory cognition plays an important role for CI outcome. We conclude that differences in auditory-cognitive processing contribute to the variability in speech performance outcomes observed in CI users. Copyright © 2016 Elsevier Ltd. All rights reserved.
Optoelectronic Reservoir Computing

PubMed Central

Paquot, Y.; Duport, F.; Smerieri, A.; Dambre, J.; Schrauwen, B.; Haelterman, M.; Massar, S.

2012-01-01

Reservoir computing is a recently introduced, highly efficient bio-inspired approach for processing time dependent data. The basic scheme of reservoir computing consists of a non linear recurrent dynamical system coupled to a single input layer and a single output layer. Within these constraints many implementations are possible. Here we report an optoelectronic implementation of reservoir computing based on a recently proposed architecture consisting of a single non linear node and a delay line. Our implementation is sufficiently fast for real time information processing. We illustrate its performance on tasks of practical importance such as nonlinear channel equalization and speech recognition, and obtain results comparable to state of the art digital implementations. PMID:22371825
Perception of Hearing Aid-Processed Speech in Individuals with Late-Onset Auditory Neuropathy Spectrum Disorder.

PubMed

Mathai, Jijo Pottackal; Appu, Sabarish

2015-01-01

Auditory neuropathy spectrum disorder (ANSD) is a form of sensorineural hearing loss, causing severe deficits in speech perception. The perceptual problems of individuals with ANSD were attributed to their temporal processing impairment rather than to reduced audibility. This rendered their rehabilitation difficult using hearing aids. Although hearing aids can restore audibility, compression circuits in a hearing aid might distort the temporal modulations of speech, causing poor aided performance. Therefore, hearing aid settings that preserve the temporal modulations of speech might be an effective way to improve speech perception in ANSD. The purpose of the study was to investigate the perception of hearing aid-processed speech in individuals with late-onset ANSD. A repeated measures design was used to study the effect of various compression time settings on speech perception and perceived quality. Seventeen individuals with late-onset ANSD within the age range of 20-35 yr participated in the study. The word recognition scores (WRSs) and quality judgment of phonemically balanced words, processed using four different compression settings of a hearing aid (slow, medium, fast, and linear), were evaluated. The modulation spectra of hearing aid-processed stimuli were estimated to probe the effect of amplification on the temporal envelope of speech. Repeated measures analysis of variance and post hoc Bonferroni's pairwise comparisons were used to analyze the word recognition performance and quality judgment. The comparison between unprocessed and all four hearing aid-processed stimuli showed significantly higher perception using the former stimuli. Even though perception of words processed using slow compression time settings of the hearing aids were significantly higher than the fast one, their difference was only 4%. In addition, there were no significant differences in perception between any other hearing aid-processed stimuli. Analysis of the temporal envelope of hearing aid-processed stimuli revealed minimal changes in the temporal envelope across the four hearing aid settings. In terms of quality, the highest number of individuals preferred stimuli processed using slow compression time settings. Individuals who preferred medium ones followed this. However, none of the individuals preferred fast compression time settings. Analysis of quality judgment showed that slow, medium, and linear settings presented significantly higher preference scores than the fast compression setting. Individuals with ANSD showed no marked difference in perception of speech that was processed using the four different hearing aid settings. However, significantly higher preference, in terms of quality, was found for stimuli processed using slow, medium, and linear settings over the fast one. Therefore, whenever hearing aids are recommended for ANSD, those having slow compression time settings or linear amplification may be chosen over the fast (syllabic compression) one. In addition, WRSs obtained using hearing aid-processed stimuli were remarkably poorer than unprocessed stimuli. This shows that processing of speech through hearing aids might have caused a large reduction of performance in individuals with ANSD. However, further evaluation is needed using individually programmed hearing aids rather than hearing aid-processed stimuli. American Academy of Audiology.
Relationships Among Peripheral and Central Electrophysiological Measures of Spatial and Spectral Selectivity and Speech Perception in Cochlear Implant Users.

PubMed

Scheperle, Rachel A; Abbas, Paul J

2015-01-01

The ability to perceive speech is related to the listener's ability to differentiate among frequencies (i.e., spectral resolution). Cochlear implant (CI) users exhibit variable speech-perception and spectral-resolution abilities, which can be attributed in part to the extent of electrode interactions at the periphery (i.e., spatial selectivity). However, electrophysiological measures of peripheral spatial selectivity have not been found to correlate with speech perception. The purpose of this study was to evaluate auditory processing at the periphery and cortex using both simple and spectrally complex stimuli to better understand the stages of neural processing underlying speech perception. The hypotheses were that (1) by more completely characterizing peripheral excitation patterns than in previous studies, significant correlations with measures of spectral selectivity and speech perception would be observed, (2) adding information about processing at a level central to the auditory nerve would account for additional variability in speech perception, and (3) responses elicited with spectrally complex stimuli would be more strongly correlated with speech perception than responses elicited with spectrally simple stimuli. Eleven adult CI users participated. Three experimental processor programs (MAPs) were created to vary the likelihood of electrode interactions within each participant. For each MAP, a subset of 7 of 22 intracochlear electrodes was activated: adjacent (MAP 1), every other (MAP 2), or every third (MAP 3). Peripheral spatial selectivity was assessed using the electrically evoked compound action potential (ECAP) to obtain channel-interaction functions for all activated electrodes (13 functions total). Central processing was assessed by eliciting the auditory change complex with both spatial (electrode pairs) and spectral (rippled noise) stimulus changes. Speech-perception measures included vowel discrimination and the Bamford-Kowal-Bench Speech-in-Noise test. Spatial and spectral selectivity and speech perception were expected to be poorest with MAP 1 (closest electrode spacing) and best with MAP 3 (widest electrode spacing). Relationships among the electrophysiological and speech-perception measures were evaluated using mixed-model and simple linear regression analyses. All electrophysiological measures were significantly correlated with each other and with speech scores for the mixed-model analysis, which takes into account multiple measures per person (i.e., experimental MAPs). The ECAP measures were the best predictor. In the simple linear regression analysis on MAP 3 data, only the cortical measures were significantly correlated with speech scores; spectral auditory change complex amplitude was the strongest predictor. The results suggest that both peripheral and central electrophysiological measures of spatial and spectral selectivity provide valuable information about speech perception. Clinically, it is often desirable to optimize performance for individual CI users. These results suggest that ECAP measures may be most useful for within-subject applications when multiple measures are performed to make decisions about processor options. They also suggest that if the goal is to compare performance across individuals based on a single measure, then processing central to the auditory nerve (specifically, cortical measures of discriminability) should be considered.
Speech Comprehension Difficulties in Chronic Tinnitus and Its Relation to Hyperacusis

PubMed Central

Vielsmeier, Veronika; Kreuzer, Peter M.; Haubner, Frank; Steffens, Thomas; Semmler, Philipp R. O.; Kleinjung, Tobias; Schlee, Winfried; Langguth, Berthold; Schecklmann, Martin

2016-01-01

Objective: Many tinnitus patients complain about difficulties regarding speech comprehension. In spite of the high clinical relevance little is known about underlying mechanisms and predisposing factors. Here, we performed an exploratory investigation in a large sample of tinnitus patients to (1) estimate the prevalence of speech comprehension difficulties among tinnitus patients, to (2) compare subjective reports of speech comprehension difficulties with behavioral measurements in a standardized speech comprehension test and to (3) explore underlying mechanisms by analyzing the relationship between speech comprehension difficulties and peripheral hearing function (pure tone audiogram), as well as with co-morbid hyperacusis as a central auditory processing disorder. Subjects and Methods: Speech comprehension was assessed in 361 tinnitus patients presenting between 07/2012 and 08/2014 at the Interdisciplinary Tinnitus Clinic at the University of Regensburg. The assessment included standard audiological assessments (pure tone audiometry, tinnitus pitch, and loudness matching), the Goettingen sentence test (in quiet) for speech audiometric evaluation, two questions about hyperacusis, and two questions about speech comprehension in quiet and noisy environments (“How would you rate your ability to understand speech?”; “How would you rate your ability to follow a conversation when multiple people are speaking simultaneously?”). Results: Subjectively-reported speech comprehension deficits are frequent among tinnitus patients, especially in noisy environments (cocktail party situation). 74.2% of all investigated patients showed disturbed speech comprehension (indicated by values above 21.5 dB SPL in the Goettingen sentence test). Subjective speech comprehension complaints (both for general and in noisy environment) were correlated with hearing level and with audiologically-assessed speech comprehension ability. In contrast, co-morbid hyperacusis was only correlated with speech comprehension difficulties in noisy environments, but not with speech comprehension difficulties in general. Conclusion: Speech comprehension deficits are frequent among tinnitus patients. Whereas speech comprehension deficits in quiet environments are primarily due to peripheral hearing loss, speech comprehension deficits in noisy environments are related to both peripheral hearing loss and dysfunctional central auditory processing. Disturbed speech comprehension in noisy environments might be modulated by a central inhibitory deficit. In addition, attentional and cognitive aspects may play a role. PMID:28018209
Speech Comprehension Difficulties in Chronic Tinnitus and Its Relation to Hyperacusis.

PubMed

Vielsmeier, Veronika; Kreuzer, Peter M; Haubner, Frank; Steffens, Thomas; Semmler, Philipp R O; Kleinjung, Tobias; Schlee, Winfried; Langguth, Berthold; Schecklmann, Martin

2016-01-01

Objective: Many tinnitus patients complain about difficulties regarding speech comprehension. In spite of the high clinical relevance little is known about underlying mechanisms and predisposing factors. Here, we performed an exploratory investigation in a large sample of tinnitus patients to (1) estimate the prevalence of speech comprehension difficulties among tinnitus patients, to (2) compare subjective reports of speech comprehension difficulties with behavioral measurements in a standardized speech comprehension test and to (3) explore underlying mechanisms by analyzing the relationship between speech comprehension difficulties and peripheral hearing function (pure tone audiogram), as well as with co-morbid hyperacusis as a central auditory processing disorder. Subjects and Methods: Speech comprehension was assessed in 361 tinnitus patients presenting between 07/2012 and 08/2014 at the Interdisciplinary Tinnitus Clinic at the University of Regensburg. The assessment included standard audiological assessments (pure tone audiometry, tinnitus pitch, and loudness matching), the Goettingen sentence test (in quiet) for speech audiometric evaluation, two questions about hyperacusis, and two questions about speech comprehension in quiet and noisy environments ("How would you rate your ability to understand speech?"; "How would you rate your ability to follow a conversation when multiple people are speaking simultaneously?"). Results: Subjectively-reported speech comprehension deficits are frequent among tinnitus patients, especially in noisy environments (cocktail party situation). 74.2% of all investigated patients showed disturbed speech comprehension (indicated by values above 21.5 dB SPL in the Goettingen sentence test). Subjective speech comprehension complaints (both for general and in noisy environment) were correlated with hearing level and with audiologically-assessed speech comprehension ability. In contrast, co-morbid hyperacusis was only correlated with speech comprehension difficulties in noisy environments, but not with speech comprehension difficulties in general. Conclusion: Speech comprehension deficits are frequent among tinnitus patients. Whereas speech comprehension deficits in quiet environments are primarily due to peripheral hearing loss, speech comprehension deficits in noisy environments are related to both peripheral hearing loss and dysfunctional central auditory processing. Disturbed speech comprehension in noisy environments might be modulated by a central inhibitory deficit. In addition, attentional and cognitive aspects may play a role.
Binaural noise reduction via cue-preserving MMSE filter and adaptive-blocking-based noise PSD estimation

NASA Astrophysics Data System (ADS)

Azarpour, Masoumeh; Enzner, Gerald

2017-12-01

Binaural noise reduction, with applications for instance in hearing aids, has been a very significant challenge. This task relates to the optimal utilization of the available microphone signals for the estimation of the ambient noise characteristics and for the optimal filtering algorithm to separate the desired speech from the noise. The additional requirements of low computational complexity and low latency further complicate the design. A particular challenge results from the desired reconstruction of binaural speech input with spatial cue preservation. The latter essentially diminishes the utility of multiple-input/single-output filter-and-sum techniques such as beamforming. In this paper, we propose a comprehensive and effective signal processing configuration with which most of the aforementioned criteria can be met suitably. This relates especially to the requirement of efficient online adaptive processing for noise estimation and optimal filtering while preserving the binaural cues. Regarding noise estimation, we consider three different architectures: interaural (ITF), cross-relation (CR), and principal-component (PCA) target blocking. An objective comparison with two other noise PSD estimation algorithms demonstrates the superiority of the blocking-based noise estimators, especially the CR-based and ITF-based blocking architectures. Moreover, we present a new noise reduction filter based on minimum mean-square error (MMSE), which belongs to the class of common gain filters, hence being rigorous in terms of spatial cue preservation but also efficient and competitive for the acoustic noise reduction task. A formal real-time subjective listening test procedure is also developed in this paper. The proposed listening test enables a real-time assessment of the proposed computationally efficient noise reduction algorithms in a realistic acoustic environment, e.g., considering time-varying room impulse responses and the Lombard effect. The listening test outcome reveals that the signals processed by the blocking-based algorithms are significantly preferred over the noisy signal in terms of instantaneous noise attenuation. Furthermore, the listening test data analysis confirms the conclusions drawn based on the objective evaluation.
Neural dynamics of speech act comprehension: an MEG study of naming and requesting.

PubMed

Egorova, Natalia; Pulvermüller, Friedemann; Shtyrov, Yury

2014-05-01

The neurobiological basis and temporal dynamics of communicative language processing pose important yet unresolved questions. It has previously been suggested that comprehension of the communicative function of an utterance, i.e. the so-called speech act, is supported by an ensemble of neural networks, comprising lexico-semantic, action and mirror neuron as well as theory of mind circuits, all activated in concert. It has also been demonstrated that recognition of the speech act type occurs extremely rapidly. These findings however, were obtained in experiments with insufficient spatio-temporal resolution, thus possibly concealing important facets of the neural dynamics of the speech act comprehension process. Here, we used magnetoencephalography to investigate the comprehension of Naming and Request actions performed with utterances controlled for physical features, psycholinguistic properties and the probability of occurrence in variable contexts. The results show that different communicative actions are underpinned by a dynamic neural network, which differentiates between speech act types very early after the speech act onset. Within 50-90 ms, Requests engaged mirror-neuron action-comprehension systems in sensorimotor cortex, possibly for processing action knowledge and intentions. Still, within the first 200 ms of stimulus onset (100-150 ms), Naming activated brain areas involved in referential semantic retrieval. Subsequently (200-300 ms), theory of mind and mentalising circuits were activated in medial prefrontal and temporo-parietal areas, possibly indexing processing of intentions and assumptions of both communication partners. This cascade of stages of processing information about actions and intentions, referential semantics, and theory of mind may underlie dynamic and interactive speech act comprehension.
Reference-free automatic quality assessment of tracheoesophageal speech.

PubMed

Huang, Andy; Falk, Tiago H; Chan, Wai-Yip; Parsa, Vijay; Doyle, Philip

2009-01-01

Evaluation of the quality of tracheoesophageal (TE) speech using machines instead of human experts can enhance the voice rehabilitation process for patients who have undergone total laryngectomy and voice restoration. Towards the goal of devising a reference-free TE speech quality estimation algorithm, we investigate the efficacy of speech signal features that are used in standard telephone-speech quality assessment algorithms, in conjunction with a recently introduced speech modulation spectrum measure. Tests performed on two TE speech databases demonstrate that the modulation spectral measure and a subset of features in the standard ITU-T P.563 algorithm estimate TE speech quality with better correlation (up to 0.9) than previously proposed features.
Cross-modal interactions during perception of audiovisual speech and nonspeech signals: an fMRI study.

PubMed

Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

2011-01-01

During speech communication, visual information may interact with the auditory system at various processing stages. Most noteworthy, recent magnetoencephalography (MEG) data provided first evidence for early and preattentive phonetic/phonological encoding of the visual data stream--prior to its fusion with auditory phonological features [Hertrich, I., Mathiak, K., Lutzenberger, W., & Ackermann, H. Time course of early audiovisual interactions during speech and non-speech central-auditory processing: An MEG study. Journal of Cognitive Neuroscience, 21, 259-274, 2009]. Using functional magnetic resonance imaging, the present follow-up study aims to further elucidate the topographic distribution of visual-phonological operations and audiovisual (AV) interactions during speech perception. Ambiguous acoustic syllables--disambiguated to /pa/ or /ta/ by the visual channel (speaking face)--served as test materials, concomitant with various control conditions (nonspeech AV signals, visual-only and acoustic-only speech, and nonspeech stimuli). (i) Visual speech yielded an AV-subadditive activation of primary auditory cortex and the anterior superior temporal gyrus (STG), whereas the posterior STG responded both to speech and nonspeech motion. (ii) The inferior frontal and the fusiform gyrus of the right hemisphere showed a strong phonetic/phonological impact (differential effects of visual /pa/ vs. /ta/) upon hemodynamic activation during presentation of speaking faces. Taken together with the previous MEG data, these results point at a dual-pathway model of visual speech information processing: On the one hand, access to the auditory system via the anterior supratemporal “what" path may give rise to direct activation of "auditory objects." On the other hand, visual speech information seems to be represented in a right-hemisphere visual working memory, providing a potential basis for later interactions with auditory information such as the McGurk effect.

Statistical Learning, Syllable Processing, and Speech Production in Healthy Hearing and Hearing-Impaired Preschool Children: A Mismatch Negativity Study.

PubMed

Studer-Eichenberger, Esther; Studer-Eichenberger, Felix; Koenig, Thomas

2016-01-01

The objectives of the present study were to investigate temporal/spectral sound-feature processing in preschool children (4 to 7 years old) with peripheral hearing loss compared with age-matched controls. The results verified the presence of statistical learning, which was diminished in children with hearing impairments (HIs), and elucidated possible perceptual mediators of speech production. Perception and production of the syllables /ba/, /da/, /ta/, and /na/ were recorded in 13 children with normal hearing and 13 children with HI. Perception was assessed physiologically through event-related potentials (ERPs) recorded by EEG in a multifeature mismatch negativity paradigm and behaviorally through a discrimination task. Temporal and spectral features of the ERPs during speech perception were analyzed, and speech production was quantitatively evaluated using speech motor maximum performance tasks. Proximal to stimulus onset, children with HI displayed a difference in map topography, indicating diminished statistical learning. In later ERP components, children with HI exhibited reduced amplitudes in the N2 and early parts of the late disciminative negativity components specifically, which are associated with temporal and spectral control mechanisms. Abnormalities of speech perception were only subtly reflected in speech production, as the lone difference found in speech production studies was a mild delay in regulating speech intensity. In addition to previously reported deficits of sound-feature discriminations, the present study results reflect diminished statistical learning in children with HI, which plays an early and important, but so far neglected, role in phonological processing. Furthermore, the lack of corresponding behavioral abnormalities in speech production implies that impaired perceptual capacities do not necessarily translate into productive deficits.
Investigation of potential cognitive tests for use with older adults in audiology clinics.

PubMed

Vaughan, Nancy; Storzbach, Daniel; Furukawa, Izumi

2008-01-01

Cognitive declines in working memory and processing speed are hallmarks of aging. Deficits in speech understanding also are seen in aging individuals. A clinical test to determine whether the cognitive aging changes contribute to aging speech understanding difficulties would be helpful for determining rehabilitation strategies in audiology clinics. To identify a clinical neurocognitive test or battery of tests that could be used in audiology clinics to help explain deficits in speech recognition in some older listeners. A correlational study examining the association between certain cognitive test scores and speech recognition performance. Speeded (time-compressed) speech was used to increase the cognitive processing load. Two hundred twenty-five adults aged 50 through 75 years were participants in this study. Both batteries of tests were administered to all participants in two separate sessions. A selected battery of neurocognitive tests and a time-compressed speech recognition test battery using various rates of speech were administered. Principal component analysis was used to extract the important component factors from each set of tests, and regression models were constructed to examine the association between tests and to identify the neurocognitive test most strongly associated with speech recognition performance. A sequencing working memory test (Letter-Number Sequencing [LNS]) was most strongly associated with rapid speech understanding. The association between the LNS test results and the compressed sentence recognition scores (CSRS) was strong even when age and hearing loss were controlled. The LNS is a sequencing test that provides information about temporal processing at the cognitive level and may prove useful in diagnosis of speech understanding problems, and in the development of aural rehabilitation and training strategies.
Speech motor development: Integrating muscles, movements, and linguistic units.

PubMed

Smith, Anne

2006-01-01

A fundamental problem for those interested in human communication is to determine how ideas and the various units of language structure are communicated through speaking. The physiological concepts involved in the control of muscle contraction and movement are theoretically distant from the processing levels and units postulated to exist in language production models. A review of the literature on adult speakers suggests that they engage complex, parallel processes involving many units, including sentence, phrase, syllable, and phoneme levels. Infants must develop multilayered interactions among language and motor systems. This discussion describes recent studies of speech motor performance relative to varying linguistic goals during the childhood, teenage, and young adult years. Studies of the developing interactions between speech motor and language systems reveal both qualitative and quantitative differences between the developing and the mature systems. These studies provide an experimental basis for a more comprehensive theoretical account of how mappings between units of language and units of action are formed and how they function. Readers will be able to: (1) understand the theoretical differences between models of speech motor control and models of language processing, as well as the nature of the concepts used in the two different kinds of models, (2) explain the concept of coarticulation and state why this phenomenon has confounded attempts to determine the role of linguistic units, such as syllables and phonemes, in speech production, (3) describe the development of speech motor performance skills and specify quantitative and qualitative differences between speech motor performance in children and adults, and (4) describe experimental methods that allow scientists to study speech and limb motor control, as well as compare units of action used to study non-speech and speech movements.
Automated Depression Analysis Using Convolutional Neural Networks from Speech.

PubMed

He, Lang; Cao, Cui

2018-05-28

To help clinicians to efficiently diagnose the severity of a person's depression, the affective computing community and the artificial intelligence field have shown a growing interest in designing automated systems. The speech features have useful information for the diagnosis of depression. However, manually designing and domain knowledge are still important for the selection of the feature, which makes the process labor consuming and subjective. In recent years, deep-learned features based on neural networks have shown superior performance to hand-crafted features in various areas. In this paper, to overcome the difficulties mentioned above, we propose a combination of hand-crafted and deep-learned features which can effectively measure the severity of depression from speech. In the proposed method, Deep Convolutional Neural Networks (DCNN) are firstly built to learn deep-learned features from spectrograms and raw speech waveforms. Then we manually extract the state-of-the-art texture descriptors named median robust extended local binary patterns (MRELBP) from spectrograms. To capture the complementary information within the hand-crafted features and deep-learned features, we propose joint fine-tuning layers to combine the raw and spectrogram DCNN to boost the depression recognition performance. Moreover, to address the problems with small samples, a data augmentation method was proposed. Experiments conducted on AVEC2013 and AVEC2014 depression databases show that our approach is robust and effective for the diagnosis of depression when compared to state-of-the-art audio-based methods. Copyright © 2018. Published by Elsevier Inc.
Speech Act Theory and Business Communication Conventions.

ERIC Educational Resources Information Center

Ewald, Helen Rothschild; Stine, Donna

1983-01-01

Applies speech act theory to business writing to determine why certain letters and memos succeed while others fail. Specifically, shows how speech act theorist H. P. Grice's rules or maxims illuminate the writing process in business communication. (PD)
Functional topography of the cerebellum in verbal working memory.

PubMed

Marvel, Cherie L; Desmond, John E

2010-09-01

Speech-both overt and covert-facilitates working memory by creating and refreshing motor memory traces, allowing new information to be received and processed. Neuroimaging studies suggest a functional topography within the sub-regions of the cerebellum that subserve verbal working memory. Medial regions of the anterior cerebellum support overt speech, consistent with other forms of motor execution such as finger tapping, whereas lateral portions of the superior cerebellum support speech planning and preparation (e.g., covert speech). The inferior cerebellum is active when information is maintained across a delay, but activation appears to be independent of speech, lateralized by modality of stimulus presentation, and possibly related to phonological storage processes. Motor (dorsal) and cognitive (ventral) channels of cerebellar output nuclei can be distinguished in working memory. Clinical investigations suggest that hyper-activity of cerebellum and disrupted control of inner speech may contribute to certain psychiatric symptoms.
[A modified speech enhancement algorithm for electronic cochlear implant and its digital signal processing realization].

PubMed

Wang, Yulin; Tian, Xuelong

2014-08-01

In order to improve the speech quality and auditory perceptiveness of electronic cochlear implant under strong noise background, a speech enhancement system used for electronic cochlear implant front-end was constructed. Taking digital signal processing (DSP) as the core, the system combines its multi-channel buffered serial port (McBSP) data transmission channel with extended audio interface chip TLV320AIC10, so speech signal acquisition and output with high speed are realized. Meanwhile, due to the traditional speech enhancement method which has the problems as bad adaptability, slow convergence speed and big steady-state error, versiera function and de-correlation principle were used to improve the existing adaptive filtering algorithm, which effectively enhanced the quality of voice communications. Test results verified the stability of the system and the de-noising performance of the algorithm, and it also proved that they could provide clearer speech signals for the deaf or tinnitus patients.
Use of amplitude modulation cues recovered from frequency modulation for cochlear implant users when original speech cues are severely degraded.

PubMed

Won, Jong Ho; Shim, Hyun Joon; Lorenzi, Christian; Rubinstein, Jay T

2014-06-01

Won et al. (J Acoust Soc Am 132:1113-1119, 2012) reported that cochlear implant (CI) speech processors generate amplitude-modulation (AM) cues recovered from broadband speech frequency modulation (FM) and that CI users can use these cues for speech identification in quiet. The present study was designed to extend this finding for a wide range of listening conditions, where the original speech cues were severely degraded by manipulating either the acoustic signals or the speech processor. The manipulation of the acoustic signals included the presentation of background noise, simulation of reverberation, and amplitude compression. The manipulation of the speech processor included changing the input dynamic range and the number of channels. For each of these conditions, multiple levels of speech degradation were tested. Speech identification was measured for CI users and compared for stimuli having both AM and FM information (intact condition) or FM information only (FM condition). Each manipulation degraded speech identification performance for both intact and FM conditions. Performance for the intact and FM conditions became similar for stimuli having the most severe degradations. Identification performance generally overlapped for the intact and FM conditions. Moreover, identification performance for the FM condition was better than chance performance even at the maximum level of distortion. Finally, significant correlations were found between speech identification scores for the intact and FM conditions. Altogether, these results suggest that despite poor frequency selectivity, CI users can make efficient use of AM cues recovered from speech FM in difficult listening situations.
Speech Research: A Report on the Status and Progress of Studies on the Nature of Speech, Instrumentation for Its Investigation, and Practical Applications.

ERIC Educational Resources Information Center

Haskins Labs., New Haven, CT.

This collection on speech research presents a number of reports of experiments conducted on neurological, physiological, and phonological questions, using electronic equipment for analysis. The neurological experiments cover auditory and phonetic processes in speech perception, auditory storage, ear asymmetry in dichotic listening, auditory…
Evidence for a Specific Cross-Modal Association Deficit in Dyslexia: An Electrophysiological Study of Letter-Speech Sound Processing

ERIC Educational Resources Information Center

Froyen, Dries; Willems, Gonny; Blomert, Leo

2011-01-01

The phonological deficit theory of dyslexia assumes that degraded speech sound representations might hamper the acquisition of stable letter-speech sound associations necessary for learning to read. However, there is only scarce and mainly indirect evidence for this assumed letter-speech sound association problem. The present study aimed at…
Ingressive Speech Errors: A Service Evaluation of Speech-Sound Therapy in a Child Aged 4;6

ERIC Educational Resources Information Center

Hrastelj, Laura; Knight, Rachael-Anne

2017-01-01

Background: A pattern of ingressive substitutions for word-final sibilants can be identified in a small number of cases in child speech disorder, with growing evidence suggesting it is a phonological difficulty, despite the unusual surface form. Phonological difficulty implies a problem with the cognitive process of organizing speech into sound…
The Role of Categorical Speech Perception and Phonological Processing in Familial Risk Children with and without Dyslexia

ERIC Educational Resources Information Center

Hakvoort, Britt; de Bree, Elise; van der Leij, Aryan; Maassen, Ben; van Setten, Ellie; Maurits, Natasha; van Zuijen, Titia L.

2016-01-01

Purpose: This study assessed whether a categorical speech perception (CP) deficit is associated with dyslexia or familial risk for dyslexia, by exploring a possible cascading relation from speech perception to phonology to reading and by identifying whether speech perception distinguishes familial risk (FR) children with dyslexia (FRD) from those…
Application of Business Process Management to drive the deployment of a speech recognition system in a healthcare organization.

PubMed

González Sánchez, María José; Framiñán Torres, José Manuel; Parra Calderón, Carlos Luis; Del Río Ortega, Juan Antonio; Vigil Martín, Eduardo; Nieto Cervera, Jaime

2008-01-01

We present a methodology based on Business Process Management to guide the development of a speech recognition system in a hospital in Spain. The methodology eases the deployment of the system by 1) involving the clinical staff in the process, 2) providing the IT professionals with a description of the process and its requirements, 3) assessing advantages and disadvantages of the speech recognition system, as well as its impact in the organisation, and 4) help reorganising the healthcare process before implementing the new technology in order to identify how it can better contribute to the overall objective of the organisation.
Speech versus Song: Multiple Pitch-Sensitive Areas Revealed by a Naturally Occurring Musical Illusion

PubMed Central

Dick, Fred; Deutsch, Diana; Sereno, Marty

2013-01-01

It is normally obvious to listeners whether a human vocalization is intended to be heard as speech or song. However, the 2 signals are remarkably similar acoustically. A naturally occurring boundary case between speech and song has been discovered where a spoken phrase sounds as if it were sung when isolated and repeated. In the present study, an extensive search of audiobooks uncovered additional similar examples, which were contrasted with samples from the same corpus that do not sound like song, despite containing clear prosodic pitch contours. Using functional magnetic resonance imaging, we show that hearing these 2 closely matched stimuli is not associated with differences in response of early auditory areas. Rather, we find that a network of 8 regions, including the anterior superior temporal gyrus (STG) just anterior to Heschl's gyrus and the right midposterior STG, respond more strongly to speech perceived as song than to mere speech. This network overlaps a number of areas previously associated with pitch extraction and song production, confirming that phrases originally intended to be heard as speech can, under certain circumstances, be heard as song. Our results suggest that song processing compared with speech processing makes increased demands on pitch processing and auditory–motor integration. PMID:22314043
A music perception disorder (congenital amusia) influences speech comprehension.

PubMed

Liu, Fang; Jiang, Cunmei; Wang, Bei; Xu, Yi; Patel, Aniruddh D

2015-01-01

This study investigated the underlying link between speech and music by examining whether and to what extent congenital amusia, a musical disorder characterized by degraded pitch processing, would impact spoken sentence comprehension for speakers of Mandarin, a tone language. Sixteen Mandarin-speaking amusics and 16 matched controls were tested on the intelligibility of news-like Mandarin sentences with natural and flat fundamental frequency (F0) contours (created via speech resynthesis) under four signal-to-noise (SNR) conditions (no noise, +5, 0, and -5dB SNR). While speech intelligibility in quiet and extremely noisy conditions (SNR=-5dB) was not significantly compromised by flattened F0, both amusic and control groups achieved better performance with natural-F0 sentences than flat-F0 sentences under moderately noisy conditions (SNR=+5 and 0dB). Relative to normal listeners, amusics demonstrated reduced speech intelligibility in both quiet and noise, regardless of whether the F0 contours of the sentences were natural or flattened. This deficit in speech intelligibility was not associated with impaired pitch perception in amusia. These findings provide evidence for impaired speech comprehension in congenital amusia, suggesting that the deficit of amusics extends beyond pitch processing and includes segmental processing. Copyright © 2014 Elsevier Ltd. All rights reserved.
Deficits in Sequential Processing Manifest in Motor and Linguistic Tasks in a Multigenerational Family with Childhood Apraxia of Speech

ERIC Educational Resources Information Center

Peter, Beate; Button, Le; Stoel-Gammon, Carol; Chapman, Kathy; Raskind, Wendy H.

2013-01-01

The purpose of this study was to evaluate a global deficit in sequential processing as candidate endophenotypein a family with familial childhood apraxia of speech (CAS). Of 10 adults and 13 children in a three-generational family with speech sound disorder (SSD) consistent with CAS, 3 adults and 6 children had past or present SSD diagnoses. Two…
Some Behavioral and Neurobiological Constraints on Theories of Audiovisual Speech Integration: A Review and Suggestions for New Directions

PubMed Central

Altieri, Nicholas; Pisoni, David B.; Townsend, James T.

2012-01-01

Summerfield (1987) proposed several accounts of audiovisual speech perception, a field of research that has burgeoned in recent years. The proposed accounts included the integration of discrete phonetic features, vectors describing the values of independent acoustical and optical parameters, the filter function of the vocal tract, and articulatory dynamics of the vocal tract. The latter two accounts assume that the representations of audiovisual speech perception are based on abstract gestures, while the former two assume that the representations consist of symbolic or featural information obtained from visual and auditory modalities. Recent converging evidence from several different disciplines reveals that the general framework of Summerfield’s feature-based theories should be expanded. An updated framework building upon the feature-based theories is presented. We propose a processing model arguing that auditory and visual brain circuits provide facilitatory information when the inputs are correctly timed, and that auditory and visual speech representations do not necessarily undergo translation into a common code during information processing. Future research on multisensory processing in speech perception should investigate the connections between auditory and visual brain regions, and utilize dynamic modeling tools to further understand the timing and information processing mechanisms involved in audiovisual speech integration. PMID:21968081
Spatiotemporal imaging of cortical activation during verb generation and picture naming.

PubMed

Edwards, Erik; Nagarajan, Srikantan S; Dalal, Sarang S; Canolty, Ryan T; Kirsch, Heidi E; Barbaro, Nicholas M; Knight, Robert T

2010-03-01

One hundred and fifty years of neurolinguistic research has identified the key structures in the human brain that support language. However, neither the classic neuropsychological approaches introduced by Broca (1861) and Wernicke (1874), nor modern neuroimaging employing PET and fMRI has been able to delineate the temporal flow of language processing in the human brain. We recorded the electrocorticogram (ECoG) from indwelling electrodes over left hemisphere language cortices during two common language tasks, verb generation and picture naming. We observed that the very high frequencies of the ECoG (high-gamma, 70-160 Hz) track language processing with spatial and temporal precision. Serial progression of activations is seen at a larger timescale, showing distinct stages of perception, semantic association/selection, and speech production. Within the areas supporting each of these larger processing stages, parallel (or "incremental") processing is observed. In addition to the traditional posterior vs. anterior localization for speech perception vs. production, we provide novel evidence for the role of premotor cortex in speech perception and of Wernicke's and surrounding cortex in speech production. The data are discussed with regards to current leading models of speech perception and production, and a "dual ventral stream" hybrid of leading speech perception models is given. Copyright (c) 2009 Elsevier Inc. All rights reserved.
Some behavioral and neurobiological constraints on theories of audiovisual speech integration: a review and suggestions for new directions.

PubMed

Altieri, Nicholas; Pisoni, David B; Townsend, James T

2011-01-01

Summerfield (1987) proposed several accounts of audiovisual speech perception, a field of research that has burgeoned in recent years. The proposed accounts included the integration of discrete phonetic features, vectors describing the values of independent acoustical and optical parameters, the filter function of the vocal tract, and articulatory dynamics of the vocal tract. The latter two accounts assume that the representations of audiovisual speech perception are based on abstract gestures, while the former two assume that the representations consist of symbolic or featural information obtained from visual and auditory modalities. Recent converging evidence from several different disciplines reveals that the general framework of Summerfield's feature-based theories should be expanded. An updated framework building upon the feature-based theories is presented. We propose a processing model arguing that auditory and visual brain circuits provide facilitatory information when the inputs are correctly timed, and that auditory and visual speech representations do not necessarily undergo translation into a common code during information processing. Future research on multisensory processing in speech perception should investigate the connections between auditory and visual brain regions, and utilize dynamic modeling tools to further understand the timing and information processing mechanisms involved in audiovisual speech integration.
Non-Attendance and Utilization of a Speech and Language Therapy Service: A Retrospective Pilot Study of School-Aged Referrals

ERIC Educational Resources Information Center

Curran, Amy; Flynn, Catherine; Antonijevic-Elliott, Stanislava; Lyons, Rena

2015-01-01

Background: Non-attendance and inappropriate referrals affect the effective and efficient running of healthcare services. Non-engagement with speech and language therapy (SLT) services may lead to negative long-term consequences for children in need of SLT intervention. Currently there is a dearth of research on non-attendance and non-engagement…

Meeting the Needs of Children and Young People with Speech, Language and Communication Difficulties

ERIC Educational Resources Information Center

Lindsay, Geoff; Dockrell, Julie; Desforges, Martin; Law, James; Peacey, Nick

2010-01-01

Background: The UK government set up a review of provision for children and young people with the full range of speech, language and communication needs led by a Member of Parliament, John Bercow. A research study was commissioned to provide empirical evidence to inform the Bercow Review. Aims: To examine the efficiency and effectiveness of…
Cortical Responses to Chinese Phonemes in Preschoolers Predict Their Literacy Skills at School Age.

PubMed

Hong, Tian; Shuai, Lan; Frost, Stephen J; Landi, Nicole; Pugh, Kenneth R; Shu, Hua

2018-01-01

We investigated whether preschoolers with poor phonological awareness (PA) skills had impaired cortical basis for detecting speech feature, and whether speech perception influences future literacy outcomes in preschoolers. We recorded ERP responses to speech in 52 Chinese preschoolers. The results showed that the poor PA group processed speech changes differentially compared to control group in mismatch negativity (MMN) and late discriminative negativity (LDN). Furthermore, speech perception in kindergarten could predict literacy outcomes after literacy acquisition. These suggest that impairment in detecting speech features occurs before formal reading instruction, and that speech perception plays an important role in reading development.
Relationship between listeners' nonnative speech recognition and categorization abilities

PubMed Central

Atagi, Eriko; Bent, Tessa

2015-01-01

Enhancement of the perceptual encoding of talker characteristics (indexical information) in speech can facilitate listeners' recognition of linguistic content. The present study explored this indexical-linguistic relationship in nonnative speech processing by examining listeners' performance on two tasks: nonnative accent categorization and nonnative speech-in-noise recognition. Results indicated substantial variability across listeners in their performance on both the accent categorization and nonnative speech recognition tasks. Moreover, listeners' accent categorization performance correlated with their nonnative speech-in-noise recognition performance. These results suggest that having more robust indexical representations for nonnative accents may allow listeners to more accurately recognize the linguistic content of nonnative speech. PMID:25618098
Postcategorical auditory distraction in short-term memory: Insights from increased task load and task type.

PubMed

Marsh, John E; Yang, Jingqi; Qualter, Pamela; Richardson, Cassandra; Perham, Nick; Vachon, François; Hughes, Robert W

2018-06-01

Task-irrelevant speech impairs short-term serial recall appreciably. On the interference-by-process account, the processing of physical (i.e., precategorical) changes in speech yields order cues that conflict with the serial-ordering process deployed to perform the serial recall task. In this view, the postcategorical properties (e.g., phonology, meaning) of speech play no role. The present study reassessed the implications of recent demonstrations of auditory postcategorical distraction in serial recall that have been taken as support for an alternative, attentional-diversion, account of the irrelevant speech effect. Focusing on the disruptive effect of emotionally valent compared with neutral words on serial recall, we show that the distracter-valence effect is eliminated under conditions-high task-encoding load-thought to shield against attentional diversion whereas the general effect of speech (neutral words compared with quiet) remains unaffected (Experiment 1). Furthermore, the distracter-valence effect generalizes to a task that does not require the processing of serial order-the missing-item task-whereas the effect of speech per se is attenuated in this task (Experiment 2). We conclude that postcategorical auditory distraction phenomena in serial short-term memory (STM) are incidental: they are observable in such a setting but, unlike the acoustically driven irrelevant speech effect, are not integral to it. As such, the findings support a duplex-mechanism account over a unitary view of auditory distraction. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Exploring the Roles of Spectral Detail and Intonation Contour in Speech Intelligibility: An fMRI Study

PubMed Central

Kyong, Jeong S.; Scott, Sophie K.; Rosen, Stuart; Howe, Timothy B.; Agnew, Zarinah K.; McGettigan, Carolyn

2014-01-01

The melodic contour of speech forms an important perceptual aspect of tonal and nontonal languages and an important limiting factor on the intelligibility of speech heard through a cochlear implant. Previous work exploring the neural correlates of speech comprehension identified a left-dominant pathway in the temporal lobes supporting the extraction of an intelligible linguistic message, whereas the right anterior temporal lobe showed an overall preference for signals clearly conveying dynamic pitch information. The current study combined modulations of overall intelligibility (through vocoding and spectral inversion) with a manipulation of pitch contour (normal vs. falling) to investigate the processing of spoken sentences in functional MRI. Our overall findings replicate and extend those of Scott et al., whereas greater sentence intelligibility was predominately associated with increased activity in the left STS, the greatest response to normal sentence melody was found right superior temporal gyrus. These data suggest a spatial distinction between brain areas associated with intelligibility and those involved in the processing of dynamic pitch information in speech. By including a set of complexity-matched unintelligible conditions created by spectral inversion, this is additionally the first study reporting a fully factorial exploration of spectrotemporal complexity and spectral inversion as they relate to the neural processing of speech intelligibility. Perhaps surprisingly, there was no evidence for an interaction between the two factors—we discuss the implications for the processing of sound and speech in the dorsolateral temporal lobes. PMID:24568205
Processing Mechanisms in Hearing-Impaired Listeners: Evidence from Reaction Times and Sentence Interpretation.

PubMed

Carroll, Rebecca; Uslar, Verena; Brand, Thomas; Ruigendijk, Esther

The authors aimed to determine whether hearing impairment affects sentence comprehension beyond phoneme or word recognition (i.e., on the sentence level), and to distinguish grammatically induced processing difficulties in structurally complex sentences from perceptual difficulties associated with listening to degraded speech. Effects of hearing impairment or speech in noise were expected to reflect hearer-specific speech recognition difficulties. Any additional processing time caused by the sustained perceptual challenges across the sentence may either be independent of or interact with top-down processing mechanisms associated with grammatical sentence structure. Forty-nine participants listened to canonical subject-initial or noncanonical object-initial sentences that were presented either in quiet or in noise. Twenty-four participants had mild-to-moderate hearing impairment and received hearing-loss-specific amplification. Twenty-five participants were age-matched peers with normal hearing status. Reaction times were measured on-line at syntactically critical processing points as well as two control points to capture differences in processing mechanisms. An off-line comprehension task served as an additional indicator of sentence (mis)interpretation, and enforced syntactic processing. The authors found general effects of hearing impairment and speech in noise that negatively affected perceptual processing, and an effect of word order, where complex grammar locally caused processing difficulties for the noncanonical sentence structure. Listeners with hearing impairment were hardly affected by noise at the beginning of the sentence, but were affected markedly toward the end of the sentence, indicating a sustained perceptual effect of speech recognition. Comprehension of sentences with noncanonical word order was negatively affected by degraded signals even after sentence presentation. Hearing impairment adds perceptual processing load during sentence processing, but affects grammatical processing beyond the word level to the same degree as in normal hearing, with minor differences in processing mechanisms. The data contribute to our understanding of individual differences in speech perception and language understanding. The authors interpret their results within the ease of language understanding model.
Bee Dances, Bird Songs, Monkey Calls, and Cetacean Sonar: Is Speech Unique?

ERIC Educational Resources Information Center

Liska, Jo

1993-01-01

Examines to what extent, and in what ways, speech is unusual and how it compares to other semiotic systems. Discusses language and speech, neurolinguistic processing, comparative vocal/auditory abilities, primate evolution, and semiogenesis. (SR)
Improved Neural Processing Efficiency in a Chronic Aphasia Patient Following Melodic Intonation Therapy: A Neuropsychological and Functional MRI Study

PubMed Central

Tabei, Ken-ichi; Satoh, Masayuki; Nakano, Chizuru; Ito, Ai; Shimoji, Yasuo; Kida, Hirotaka; Sakuma, Hajime; Tomimoto, Hidekazu

2016-01-01

Melodic intonation therapy (MIT) is a treatment program for the rehabilitation of aphasic patients with speech production disorders. We report a case of severe chronic non-fluent aphasia unresponsive to several years of conventional therapy that showed a marked improvement following intensive 9-day training on the Japanese version of MIT (MIT-J). The purpose of this study was to verify the efficacy of MIT-J by functional assessment and examine associated changes in neural processing by functional magnetic resonance imaging. MIT improved language output and auditory comprehension, and decreased the response time for picture naming. Following MIT-J, an area of the right hemisphere was less activated on correct naming trials than compared with before training but similarly activated on incorrect trials. These results suggest that the aphasic symptoms of our patient were improved by increased neural processing efficiency and a concomitant decrease in cognitive load. PMID:27698650
A hardware experimental platform for neural circuits in the auditory cortex

NASA Astrophysics Data System (ADS)

Rodellar-Biarge, Victoria; García-Dominguez, Pablo; Ruiz-Rizaldos, Yago; Gómez-Vilda, Pedro

2011-05-01

Speech processing in the human brain is a very complex process far from being fully understood although much progress has been done recently. Neuromorphic Speech Processing is a new research orientation in bio-inspired systems approach to find solutions to automatic treatment of specific problems (recognition, synthesis, segmentation, diarization, etc) which can not be adequately solved using classical algorithms. In this paper a neuromorphic speech processing architecture is presented. The systematic bottom-up synthesis of layered structures reproduce the dynamic feature detection of speech related to plausible neural circuits which work as interpretation centres located in the Auditory Cortex. The elementary model is based on Hebbian neuron-like units. For the computation of the architecture a flexible framework is proposed in the environment of Matlab®/Simulink®/HDL, which allows building models in different description styles, complexity and implementation levels. It provides a flexible platform for experimenting on the influence of the number of neurons and interconnections, in the precision of the results and in performance evaluation. The experimentation with different architecture configurations may help both in better understanding how neural circuits may work in the brain as well as in how speech processing can benefit from this understanding.
Amusia results in abnormal brain activity following inappropriate intonation during speech comprehension.

PubMed

Jiang, Cunmei; Hamm, Jeff P; Lim, Vanessa K; Kirk, Ian J; Chen, Xuhai; Yang, Yufang

2012-01-01

Pitch processing is a critical ability on which humans' tonal musical experience depends, and which is also of paramount importance for decoding prosody in speech. Congenital amusia refers to deficits in the ability to properly process musical pitch, and recent evidence has suggested that this musical pitch disorder may impact upon the processing of speech sounds. Here we present the first electrophysiological evidence demonstrating that individuals with amusia who speak Mandarin Chinese are impaired in classifying prosody as appropriate or inappropriate during a speech comprehension task. When presented with inappropriate prosody stimuli, control participants elicited a larger P600 and smaller N100 relative to the appropriate condition. In contrast, amusics did not show significant differences between the appropriate and inappropriate conditions in either the N100 or the P600 component. This provides further evidence that the pitch perception deficits associated with amusia may also affect intonation processing during speech comprehension in those who speak a tonal language such as Mandarin, and suggests music and language share some cognitive and neural resources.
Musical melody and speech intonation: singing a different tune.

PubMed

Zatorre, Robert J; Baum, Shari R

2012-01-01

Music and speech are often cited as characteristically human forms of communication. Both share the features of hierarchical structure, complex sound systems, and sensorimotor sequencing demands, and both are used to convey and influence emotions, among other functions [1]. Both music and speech also prominently use acoustical frequency modulations, perceived as variations in pitch, as part of their communicative repertoire. Given these similarities, and the fact that pitch perception and production involve the same peripheral transduction system (cochlea) and the same production mechanism (vocal tract), it might be natural to assume that pitch processing in speech and music would also depend on the same underlying cognitive and neural mechanisms. In this essay we argue that the processing of pitch information differs significantly for speech and music; specifically, we suggest that there are two pitch-related processing systems, one for more coarse-grained, approximate analysis and one for more fine-grained accurate representation, and that the latter is unique to music. More broadly, this dissociation offers clues about the interface between sensory and motor systems, and highlights the idea that multiple processing streams are a ubiquitous feature of neuro-cognitive architectures.
Amusia Results in Abnormal Brain Activity following Inappropriate Intonation during Speech Comprehension

PubMed Central

Jiang, Cunmei; Hamm, Jeff P.; Lim, Vanessa K.; Kirk, Ian J.; Chen, Xuhai; Yang, Yufang

2012-01-01

Pitch processing is a critical ability on which humans’ tonal musical experience depends, and which is also of paramount importance for decoding prosody in speech. Congenital amusia refers to deficits in the ability to properly process musical pitch, and recent evidence has suggested that this musical pitch disorder may impact upon the processing of speech sounds. Here we present the first electrophysiological evidence demonstrating that individuals with amusia who speak Mandarin Chinese are impaired in classifying prosody as appropriate or inappropriate during a speech comprehension task. When presented with inappropriate prosody stimuli, control participants elicited a larger P600 and smaller N100 relative to the appropriate condition. In contrast, amusics did not show significant differences between the appropriate and inappropriate conditions in either the N100 or the P600 component. This provides further evidence that the pitch perception deficits associated with amusia may also affect intonation processing during speech comprehension in those who speak a tonal language such as Mandarin, and suggests music and language share some cognitive and neural resources. PMID:22859982
Neurophysiological aspects of brainstem processing of speech stimuli in audiometric-normal geriatric population.

PubMed

Ansari, M S; Rangasayee, R; Ansari, M A H

2017-03-01

Poor auditory speech perception in geriatrics is attributable to neural de-synchronisation due to structural and degenerative changes of ageing auditory pathways. The speech-evoked auditory brainstem response may be useful for detecting alterations that cause loss of speech discrimination. Therefore, this study aimed to compare the speech-evoked auditory brainstem response in adult and geriatric populations with normal hearing. The auditory brainstem responses to click sounds and to a 40 ms speech sound (the Hindi phoneme |da|) were compared in 25 young adults and 25 geriatric people with normal hearing. The latencies and amplitudes of transient peaks representing neural responses to the onset, offset and sustained portions of the speech stimulus in quiet and noisy conditions were recorded. The older group had significantly smaller amplitudes and longer latencies for the onset and offset responses to |da| in noisy conditions. Stimulus-to-response times were longer and the spectral amplitude of the sustained portion of the stimulus was reduced. The overall stimulus level caused significant shifts in latency across the entire speech-evoked auditory brainstem response in the older group. The reduction in neural speech processing in older adults suggests diminished subcortical responsiveness to acoustically dynamic spectral cues. However, further investigations are needed to encode temporal cues at the brainstem level and determine their relationship to speech perception for developing a routine tool for clinical decision-making.
Neural networks supporting audiovisual integration for speech: A large-scale lesion study.

PubMed

Hickok, Gregory; Rogalsky, Corianne; Matchin, William; Basilakos, Alexandra; Cai, Julia; Pillay, Sara; Ferrill, Michelle; Mickelsen, Soren; Anderson, Steven W; Love, Tracy; Binder, Jeffrey; Fridriksson, Julius

2018-06-01

Auditory and visual speech information are often strongly integrated resulting in perceptual enhancements for audiovisual (AV) speech over audio alone and sometimes yielding compelling illusory fusion percepts when AV cues are mismatched, the McGurk-MacDonald effect. Previous research has identified three candidate regions thought to be critical for AV speech integration: the posterior superior temporal sulcus (STS), early auditory cortex, and the posterior inferior frontal gyrus. We assess the causal involvement of these regions (and others) in the first large-scale (N = 100) lesion-based study of AV speech integration. Two primary findings emerged. First, behavioral performance and lesion maps for AV enhancement and illusory fusion measures indicate that classic metrics of AV speech integration are not necessarily measuring the same process. Second, lesions involving superior temporal auditory, lateral occipital visual, and multisensory zones in the STS are the most disruptive to AV speech integration. Further, when AV speech integration fails, the nature of the failure-auditory vs visual capture-can be predicted from the location of the lesions. These findings show that AV speech processing is supported by unimodal auditory and visual cortices as well as multimodal regions such as the STS at their boundary. Motor related frontal regions do not appear to play a role in AV speech integration. Copyright © 2018 Elsevier Ltd. All rights reserved.
A Cognitive Neuroscience View of Voice-Processing Abnormalities in Schizophrenia: A Window into Auditory Verbal Hallucinations?

PubMed

Conde, Tatiana; Gonçalves, Oscar F; Pinheiro, Ana P

2016-01-01

Auditory verbal hallucinations (AVH) are a core symptom of schizophrenia. Like "real" voices, AVH carry a rich amount of linguistic and paralinguistic cues that convey not only speech, but also affect and identity, information. Disturbed processing of voice identity, affective, and speech information has been reported in patients with schizophrenia. More recent evidence has suggested a link between voice-processing abnormalities and specific clinical symptoms of schizophrenia, especially AVH. It is still not well understood, however, to what extent these dimensions are impaired and how abnormalities in these processes might contribute to AVH. In this review, we consider behavioral, neuroimaging, and electrophysiological data to investigate the speech, identity, and affective dimensions of voice processing in schizophrenia, and we discuss how abnormalities in these processes might help to elucidate the mechanisms underlying specific phenomenological features of AVH. Schizophrenia patients exhibit behavioral and neural disturbances in the three dimensions of voice processing. Evidence suggesting a role of dysfunctional voice processing in AVH seems to be stronger for the identity and speech dimensions than for the affective domain.
Speech perception in older adults: the importance of speech-specific cognitive abilities.

PubMed

Sommers, M S

1997-05-01

To provide a critical evaluation of studies examining the contribution of changes in language-specific cognitive abilities to the speech perception difficulties of older adults. A review of the literature on aging and speech perception. The research considered in the present review suggests that age-related changes in absolute sensitivity is the principal factor affecting older listeners' speech perception in quiet. However, under less favorable listening conditions, changes in a number of speech-specific cognitive abilities can also affect spoken language processing in older people. Clinically, these findings suggest that hearing aids, which have been the traditional treatment for improving speech perception in older adults, are likely to offer considerable benefit in quiet listening situations because the amplification they provide can serve to compensate for age-related hearing losses. However, such devices may be less beneficial in more natural environments, (e.g., noisy backgrounds, multiple talkers, reverberant rooms) because they are less effective for improving speech perception difficulties that result from age-related cognitive declines. It is suggested that an integrative approach to designing test batteries that can assess both sensory and cognitive abilities needed for processing spoken language offers the most promising approach for developing therapeutic interventions to improve speech perception in older adults.
Naturalistic FMRI mapping reveals superior temporal sulcus as the hub for the distributed brain network for social perception.

PubMed

Lahnakoski, Juha M; Glerean, Enrico; Salmi, Juha; Jääskeläinen, Iiro P; Sams, Mikko; Hari, Riitta; Nummenmaa, Lauri

2012-01-01

Despite the abundant data on brain networks processing static social signals, such as pictures of faces, the neural systems supporting social perception in naturalistic conditions are still poorly understood. Here we delineated brain networks subserving social perception under naturalistic conditions in 19 healthy humans who watched, during 3-T functional magnetic resonance imaging (fMRI), a set of 137 short (approximately 16 s each, total 27 min) audiovisual movie clips depicting pre-selected social signals. Two independent raters estimated how well each clip represented eight social features (faces, human bodies, biological motion, goal-oriented actions, emotion, social interaction, pain, and speech) and six filler features (places, objects, rigid motion, people not in social interaction, non-goal-oriented action, and non-human sounds) lacking social content. These ratings were used as predictors in the fMRI analysis. The posterior superior temporal sulcus (STS) responded to all social features but not to any non-social features, and the anterior STS responded to all social features except bodies and biological motion. We also found four partially segregated, extended networks for processing of specific social signals: (1) a fronto-temporal network responding to multiple social categories, (2) a fronto-parietal network preferentially activated to bodies, motion, and pain, (3) a temporo-amygdalar network responding to faces, social interaction, and speech, and (4) a fronto-insular network responding to pain, emotions, social interactions, and speech. Our results highlight the role of the pSTS in processing multiple aspects of social information, as well as the feasibility and efficiency of fMRI mapping under conditions that resemble the complexity of real life.
Swahili speech development: preliminary normative data from typically developing pre-school children in Tanzania.

PubMed

Gangji, Nazneen; Pascoe, Michelle; Smouse, Mantoa

2015-01-01

Swahili is widely spoken in East Africa, but to date there are no culturally and linguistically appropriate materials available for speech-language therapists working in the region. The challenges are further exacerbated by the limited research available on the typical acquisition of Swahili phonology. To describe the speech development of 24 typically developing first language Swahili-speaking children between the ages of 3;0 and 5;11 years in Dar es Salaam, Tanzania. A cross-sectional design was used with six groups of four children in 6-month age bands. Single-word speech samples were obtained from each child using a set of culturally appropriate pictures designed to elicit all consonants and vowels of Swahili. Each child's speech was audio-recorded and phonetically transcribed using International Phonetic Alphabet (IPA) conventions. Children's speech development is described in terms of (1) phonetic inventory, (2) syllable structure inventory, (3) phonological processes and (4) percentage consonants correct (PCC) and percentage vowels correct (PVC). Results suggest a gradual progression in the acquisition of speech sounds and syllables between the ages of 3;0 and 5;11 years. Vowel acquisition was completed and most of the consonants acquired by age 3;0. Fricatives/z, s, h/ were later acquired at 4 years and /θ/and /r/ were the last acquired consonants at age 5;11. Older children were able to produce speech sounds more accurately and had fewer phonological processes in their speech than younger children. Common phonological processes included lateralization and sound preference substitutions. The study contributes a preliminary set of normative data on speech development of Swahili-speaking children. Findings are discussed in relation to theories of phonological development, and may be used as a basis for further normative studies with larger numbers of children and ultimately the development of a contextually relevant assessment of the phonology of Swahili-speaking children. © 2014 Royal College of Speech and Language Therapists.
Listening to an Audio Drama Activates Two Processing Networks, One for All Sounds, Another Exclusively for Speech

PubMed Central

Boldt, Robert; Malinen, Sanna; Seppä, Mika; Tikka, Pia; Savolainen, Petri; Hari, Riitta; Carlson, Synnöve

2013-01-01

Earlier studies have shown considerable intersubject synchronization of brain activity when subjects watch the same movie or listen to the same story. Here we investigated the across-subjects similarity of brain responses to speech and non-speech sounds in a continuous audio drama designed for blind people. Thirteen healthy adults listened for ∼19 min to the audio drama while their brain activity was measured with 3 T functional magnetic resonance imaging (fMRI). An intersubject-correlation (ISC) map, computed across the whole experiment to assess the stimulus-driven extrinsic brain network, indicated statistically significant ISC in temporal, frontal and parietal cortices, cingulate cortex, and amygdala. Group-level independent component (IC) analysis was used to parcel out the brain signals into functionally coupled networks, and the dependence of the ICs on external stimuli was tested by comparing them with the ISC map. This procedure revealed four extrinsic ICs of which two–covering non-overlapping areas of the auditory cortex–were modulated by both speech and non-speech sounds. The two other extrinsic ICs, one left-hemisphere-lateralized and the other right-hemisphere-lateralized, were speech-related and comprised the superior and middle temporal gyri, temporal poles, and the left angular and inferior orbital gyri. In areas of low ISC four ICs that were defined intrinsic fluctuated similarly as the time-courses of either the speech-sound-related or all-sounds-related extrinsic ICs. These ICs included the superior temporal gyrus, the anterior insula, and the frontal, parietal and midline occipital cortices. Taken together, substantial intersubject synchronization of cortical activity was observed in subjects listening to an audio drama, with results suggesting that speech is processed in two separate networks, one dedicated to the processing of speech sounds and the other to both speech and non-speech sounds. PMID:23734202
Listening to an audio drama activates two processing networks, one for all sounds, another exclusively for speech.

PubMed

Boldt, Robert; Malinen, Sanna; Seppä, Mika; Tikka, Pia; Savolainen, Petri; Hari, Riitta; Carlson, Synnöve

2013-01-01

Earlier studies have shown considerable intersubject synchronization of brain activity when subjects watch the same movie or listen to the same story. Here we investigated the across-subjects similarity of brain responses to speech and non-speech sounds in a continuous audio drama designed for blind people. Thirteen healthy adults listened for ∼19 min to the audio drama while their brain activity was measured with 3 T functional magnetic resonance imaging (fMRI). An intersubject-correlation (ISC) map, computed across the whole experiment to assess the stimulus-driven extrinsic brain network, indicated statistically significant ISC in temporal, frontal and parietal cortices, cingulate cortex, and amygdala. Group-level independent component (IC) analysis was used to parcel out the brain signals into functionally coupled networks, and the dependence of the ICs on external stimuli was tested by comparing them with the ISC map. This procedure revealed four extrinsic ICs of which two-covering non-overlapping areas of the auditory cortex-were modulated by both speech and non-speech sounds. The two other extrinsic ICs, one left-hemisphere-lateralized and the other right-hemisphere-lateralized, were speech-related and comprised the superior and middle temporal gyri, temporal poles, and the left angular and inferior orbital gyri. In areas of low ISC four ICs that were defined intrinsic fluctuated similarly as the time-courses of either the speech-sound-related or all-sounds-related extrinsic ICs. These ICs included the superior temporal gyrus, the anterior insula, and the frontal, parietal and midline occipital cortices. Taken together, substantial intersubject synchronization of cortical activity was observed in subjects listening to an audio drama, with results suggesting that speech is processed in two separate networks, one dedicated to the processing of speech sounds and the other to both speech and non-speech sounds.

Neural Oscillations Carry Speech Rhythm through to Comprehension

PubMed Central

Peelle, Jonathan E.; Davis, Matthew H.

2012-01-01

A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners’ processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging – particularly electroencephalography (EEG) and magnetoencephalography (MEG) – point to phase locking by ongoing cortical oscillations to low-frequency information (~4–8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain. PMID:22973251
"Say Just One Word at First": The Emergence of Reliable Speech in a Student Labeled with Autism.

ERIC Educational Resources Information Center

Broderick, Alicia A.; Kasa-Hendrickson, Christi

2001-01-01

A qualitative study documents the emergence, in the context of typed expression, of increasingly useful and reliable speech for an adolescent labeled with autism. The process of speech development is described and the three categories of supports that he and his family experienced that supported his emergent speech are discussed. (Contains…
Automatic Speech Recognition Predicts Speech Intelligibility and Comprehension for Listeners with Simulated Age-Related Hearing Loss

ERIC Educational Resources Information Center

Fontan, Lionel; Ferrané, Isabelle; Farinas, Jérôme; Pinquier, Julien; Tardieu, Julien; Magnen, Cynthia; Gaillard, Pascal; Aumont, Xavier; Füllgrabe, Christian

2017-01-01

Purpose: The purpose of this article is to assess speech processing for listeners with simulated age-related hearing loss (ARHL) and to investigate whether the observed performance can be replicated using an automatic speech recognition (ASR) system. The long-term goal of this research is to develop a system that will assist…
Auditory Speech Perception Tests in Relation to the Coding Strategy in Cochlear Implant.

PubMed

Bazon, Aline Cristine; Mantello, Erika Barioni; Gonçales, Alina Sanches; Isaac, Myriam de Lima; Hyppolito, Miguel Angelo; Reis, Ana Cláudia Mirândola Barbosa

2016-07-01

The objective of the evaluation of auditory perception of cochlear implant users is to determine how the acoustic signal is processed, leading to the recognition and understanding of sound. To investigate the differences in the process of auditory speech perception in individuals with postlingual hearing loss wearing a cochlear implant, using two different speech coding strategies, and to analyze speech perception and handicap perception in relation to the strategy used. This study is prospective cross-sectional cohort study of a descriptive character. We selected ten cochlear implant users that were characterized by hearing threshold by the application of speech perception tests and of the Hearing Handicap Inventory for Adults. There was no significant difference when comparing the variables subject age, age at acquisition of hearing loss, etiology, time of hearing deprivation, time of cochlear implant use and mean hearing threshold with the cochlear implant with the shift in speech coding strategy. There was no relationship between lack of handicap perception and improvement in speech perception in both speech coding strategies used. There was no significant difference between the strategies evaluated and no relation was observed between them and the variables studied.
Effects of speech and print feedback on spelling performance of a child with cerebral palsy using a speech generating device.

PubMed

Raghavendra, Parimala; Oaten, Rebecca

2007-09-01

The aim of the study was to investigate the effectiveness of three feedback conditions, using a speech-generating device, on spelling performance of Tom, an 11-year-old boy with cerebral palsy and complex communication needs. Tom was taught to spell 12 words under three feedback conditions. In the SPEECH condition, he received only speech feedback from the device and in the PRINT condition he received only the orthographic feedback on the display of the device. In the SPEECH-PRINT condition, Tom received both speech output and orthographic feedback. An adapted alternating treatment design was used to investigate the effects of the three-feedback conditions. To strengthen the reliability and increase the internal validity of the findings, an intrasubject direct replication was carried out using the same procedure, but teaching 12 different spelling words to Tom. Tom reached criterion with the PRINT feedback condition first, followed by SPEECH and SPEECH-PRINT conditions simultaneously for the first 12 words, and the same order for the second set of 12 words. Overall, the PRINT condition was most efficient for Tom. The results are discussed in terms of evidence for learning style preferences within spelling instruction for a child with complex communication needs. Furthermore, the implications for targeting intervention to optimise spelling achievement amongst this group are considered.
Noise reduction algorithm with the soft thresholding based on the Shannon entropy and bone-conduction speech cross- correlation bands.

PubMed

Na, Sung Dae; Wei, Qun; Seong, Ki Woong; Cho, Jin Ho; Kim, Myoung Nam

2018-01-01

The conventional methods of speech enhancement, noise reduction, and voice activity detection are based on the suppression of noise or non-speech components of the target air-conduction signals. However, air-conduced speech is hard to differentiate from babble or white noise signals. To overcome this problem, the proposed algorithm uses the bone-conduction speech signals and soft thresholding based on the Shannon entropy principle and cross-correlation of air- and bone-conduction signals. A new algorithm for speech detection and noise reduction is proposed, which makes use of the Shannon entropy principle and cross-correlation with the bone-conduction speech signals to threshold the wavelet packet coefficients of the noisy speech. The proposed method can be get efficient result by objective quality measure that are PESQ, RMSE, Correlation, SNR. Each threshold is generated by the entropy and cross-correlation approaches in the decomposed bands using the wavelet packet decomposition. As a result, the noise is reduced by the proposed method using the MATLAB simulation. To verify the method feasibility, we compared the air- and bone-conduction speech signals and their spectra by the proposed method. As a result, high performance of the proposed method is confirmed, which makes it quite instrumental to future applications in communication devices, noisy environment, construction, and military operations.
Relating dynamic brain states to dynamic machine states: Human and machine solutions to the speech recognition problem

PubMed Central

Liu, Xunying; Zhang, Chao; Woodland, Phil; Fonteneau, Elisabeth

2017-01-01

There is widespread interest in the relationship between the neurobiological systems supporting human cognition and emerging computational systems capable of emulating these capacities. Human speech comprehension, poorly understood as a neurobiological process, is an important case in point. Automatic Speech Recognition (ASR) systems with near-human levels of performance are now available, which provide a computationally explicit solution for the recognition of words in continuous speech. This research aims to bridge the gap between speech recognition processes in humans and machines, using novel multivariate techniques to compare incremental ‘machine states’, generated as the ASR analysis progresses over time, to the incremental ‘brain states’, measured using combined electro- and magneto-encephalography (EMEG), generated as the same inputs are heard by human listeners. This direct comparison of dynamic human and machine internal states, as they respond to the same incrementally delivered sensory input, revealed a significant correspondence between neural response patterns in human superior temporal cortex and the structural properties of ASR-derived phonetic models. Spatially coherent patches in human temporal cortex responded selectively to individual phonetic features defined on the basis of machine-extracted regularities in the speech to lexicon mapping process. These results demonstrate the feasibility of relating human and ASR solutions to the problem of speech recognition, and suggest the potential for further studies relating complex neural computations in human speech comprehension to the rapidly evolving ASR systems that address the same problem domain. PMID:28945744
Buried Messages, Hidden Meanings: Speech Mannerisms Revisited.

ERIC Educational Resources Information Center

Morgan, Lewis B.

1988-01-01

Introduces counselors to 10 commonly used mannerisms of speech and the part that each mannerism plays in the communication process, especially in the counseling context. Offers suggestions on how to respond to these speech mannerisms in a straightforward and effective manner. (Author)
Speech Correction for Children with Cleft Lip and Palate by Networking of Community-Based Care.

PubMed

Hanchanlert, Yotsak; Pramakhatay, Worawat; Pradubwong, Suteera; Prathanee, Benjamas

2015-08-01

Prevalence of cleft lip and palate (CLP) is high in Northeast Thailand. Most children with CLP face many problems, particularly compensatory articulation disorders (CAD) beyond surgery while speech services and the number of speech and language pathologists (SLPs) are limited. To determine the effectiveness of networking of Khon Kaen University (KKU) Community-Based Speech Therapy Model: Kosumphisai Hospital, Kosumphisai District and Maha Sarakham Hospital, Mueang District, Maha Sarakham Province for reduction of the number of articulations errors for children with CLP. Eleven children with CLP were recruited in 3 1-year projects of KKU Community-Based Speech Therapy Model. Articulation tests were formally assessed by qualified language pathologists (SLPs) for baseline and post treatment outcomes. Teachings on services for speech assistants (SAs) were conducted by SLPs. Assigned speech correction (SC) was performed by SAs at home and at local hospitals. Caregivers also gave SC at home 3-4 days a week. Networking of Community-Based Speech Therapy Model signficantly reduced the number of articulation errors for children with CLP in both word and sentence levels (mean difference = 6.91, 95% confidence interval = 4.15-9.67; mean difference = 5.36, 95% confidence interval = 2.99-7.73, respectively). Networking by Kosumphisai and Maha Sarakham of KKU Community-Based Speech Therapy Model was a valid and efficient method for providing speech services for children with cleft palate and could be extended to any area in Thailand and other developing countries, where have similar contexts.
Exploring the roles of spectral detail and intonation contour in speech intelligibility: an FMRI study.

PubMed

Kyong, Jeong S; Scott, Sophie K; Rosen, Stuart; Howe, Timothy B; Agnew, Zarinah K; McGettigan, Carolyn

2014-08-01

The melodic contour of speech forms an important perceptual aspect of tonal and nontonal languages and an important limiting factor on the intelligibility of speech heard through a cochlear implant. Previous work exploring the neural correlates of speech comprehension identified a left-dominant pathway in the temporal lobes supporting the extraction of an intelligible linguistic message, whereas the right anterior temporal lobe showed an overall preference for signals clearly conveying dynamic pitch information [Johnsrude, I. S., Penhune, V. B., & Zatorre, R. J. Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain, 123, 155-163, 2000; Scott, S. K., Blank, C. C., Rosen, S., & Wise, R. J. Identification of a pathway for intelligible speech in the left temporal lobe. Brain, 123, 2400-2406, 2000]. The current study combined modulations of overall intelligibility (through vocoding and spectral inversion) with a manipulation of pitch contour (normal vs. falling) to investigate the processing of spoken sentences in functional MRI. Our overall findings replicate and extend those of Scott et al. [Scott, S. K., Blank, C. C., Rosen, S., & Wise, R. J. Identification of a pathway for intelligible speech in the left temporal lobe. Brain, 123, 2400-2406, 2000], where greater sentence intelligibility was predominately associated with increased activity in the left STS, and the greatest response to normal sentence melody was found in right superior temporal gyrus. These data suggest a spatial distinction between brain areas associated with intelligibility and those involved in the processing of dynamic pitch information in speech. By including a set of complexity-matched unintelligible conditions created by spectral inversion, this is additionally the first study reporting a fully factorial exploration of spectrotemporal complexity and spectral inversion as they relate to the neural processing of speech intelligibility. Perhaps surprisingly, there was little evidence for an interaction between the two factors-we discuss the implications for the processing of sound and speech in the dorsolateral temporal lobes.
Subcortical processing of speech regularities underlies reading and music aptitude in children

PubMed Central

2011-01-01

Background Neural sensitivity to acoustic regularities supports fundamental human behaviors such as hearing in noise and reading. Although the failure to encode acoustic regularities in ongoing speech has been associated with language and literacy deficits, how auditory expertise, such as the expertise that is associated with musical skill, relates to the brainstem processing of speech regularities is unknown. An association between musical skill and neural sensitivity to acoustic regularities would not be surprising given the importance of repetition and regularity in music. Here, we aimed to define relationships between the subcortical processing of speech regularities, music aptitude, and reading abilities in children with and without reading impairment. We hypothesized that, in combination with auditory cognitive abilities, neural sensitivity to regularities in ongoing speech provides a common biological mechanism underlying the development of music and reading abilities. Methods We assessed auditory working memory and attention, music aptitude, reading ability, and neural sensitivity to acoustic regularities in 42 school-aged children with a wide range of reading ability. Neural sensitivity to acoustic regularities was assessed by recording brainstem responses to the same speech sound presented in predictable and variable speech streams. Results Through correlation analyses and structural equation modeling, we reveal that music aptitude and literacy both relate to the extent of subcortical adaptation to regularities in ongoing speech as well as with auditory working memory and attention. Relationships between music and speech processing are specifically driven by performance on a musical rhythm task, underscoring the importance of rhythmic regularity for both language and music. Conclusions These data indicate common brain mechanisms underlying reading and music abilities that relate to how the nervous system responds to regularities in auditory input. Definition of common biological underpinnings for music and reading supports the usefulness of music for promoting child literacy, with the potential to improve reading remediation. PMID:22005291
Identification of speech transients using variable frame rate analysis and wavelet packets.

PubMed

Rasetshwane, Daniel M; Boston, J Robert; Li, Ching-Chung

2006-01-01

Speech transients are important cues for identifying and discriminating speech sounds. Yoo et al. and Tantibundhit et al. were successful in identifying speech transients and, emphasizing them, improving the intelligibility of speech in noise. However, their methods are computationally intensive and unsuitable for real-time applications. This paper presents a method to identify and emphasize speech transients that combines subband decomposition by the wavelet packet transform with variable frame rate (VFR) analysis and unvoiced consonant detection. The VFR analysis is applied to each wavelet packet to define a transitivity function that describes the extent to which the wavelet coefficients of that packet are changing. Unvoiced consonant detection is used to identify unvoiced consonant intervals and the transitivity function is amplified during these intervals. The wavelet coefficients are multiplied by the transitivity function for that packet, amplifying the coefficients localized at times when they are changing and attenuating coefficients at times when they are steady. Inverse transform of the modified wavelet packet coefficients produces a signal corresponding to speech transients similar to the transients identified by Yoo et al. and Tantibundhit et al. A preliminary implementation of the algorithm runs more efficiently.
Speech evaluation after intravelar veloplasty. How to use Borel-Maisonny classification in the international literature?

PubMed

Kadlub, N; Chapuis Vandenbogaerde, C; Joly, A; Neiva, C; Vazquez, M-P; Picard, A

2018-04-01

Comparing functional outcomes after velar repair appeared to be difficult because of the absence of international standardized scale. Moreover most of the studies evaluating speech after cleft surgery present multiple biases. The aim of our study was to assess speech outcomes in a homogeneous group of patients, and to define an equivalence table between different speech scales. Patients with isolated cleft lip and palate (CLP), operated in our unit by the same senior surgeon were included. All patient were operated according to the same protocol (cheilo-rhinoplasty and intravelar veloplasty at 6 months, followed by a direct closure of the hard palate at 15 months). Speech evaluation was performed after 3 year-old and before the alveolar cleft repair. Borel-Maisonny scale and nasometry were used for speech evaluation. Twenty-four patients were included: 17 unilateral CLP and 7 bilateral CLP. According to the Borel-Maisonny classifications, 82.5% were ranged phonation 1, 1-2 or 2b. Nasometry were normal in almost 60% of cases. This study showed the efficiency of our protocol, and intravelar veloplasty. Moreover we proposed an equivalence table for speech evaluation scale. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Multistage audiovisual integration of speech: dissociating identification and detection.

PubMed

Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias S

2011-02-01

Speech perception integrates auditory and visual information. This is evidenced by the McGurk illusion where seeing the talking face influences the auditory phonetic percept and by the audiovisual detection advantage where seeing the talking face influences the detectability of the acoustic speech signal. Here, we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli, sine wave speech (SWS), which is an impoverished speech signal that only observers informed of its speech-like nature recognize as speech. While the McGurk illusion only occurred for informed observers, the audiovisual detection advantage occurred for naïve observers as well. This finding supports a multistage account of audiovisual integration of speech in which the many attributes of the audiovisual speech signal are integrated by separate integration processes.
Reconciling phonological neighborhood effects in speech production through single trial analysis.

PubMed

Sadat, Jasmin; Martin, Clara D; Costa, Albert; Alario, F-Xavier

2014-02-01

A crucial step for understanding how lexical knowledge is represented is to describe the relative similarity of lexical items, and how it influences language processing. Previous studies of the effects of form similarity on word production have reported conflicting results, notably within and across languages. The aim of the present study was to clarify this empirical issue to provide specific constraints for theoretical models of language production. We investigated the role of phonological neighborhood density in a large-scale picture naming experiment using fine-grained statistical models. The results showed that increasing phonological neighborhood density has a detrimental effect on naming latencies, and re-analyses of independently obtained data sets provide supplementary evidence for this effect. Finally, we reviewed a large body of evidence concerning phonological neighborhood density effects in word production, and discussed the occurrence of facilitatory and inhibitory effects in accuracy measures. The overall pattern shows that phonological neighborhood generates two opposite forces, one facilitatory and one inhibitory. In cases where speech production is disrupted (e.g. certain aphasic symptoms), the facilitatory component may emerge, but inhibitory processes dominate in efficient naming by healthy speakers. These findings are difficult to accommodate in terms of monitoring processes, but can be explained within interactive activation accounts combining phonological facilitation and lexical competition. Copyright © 2013 Elsevier Inc. All rights reserved.
Integrated Sensing and Processing (ISP) Phase 2: Demonstration and Evaluation for Distributed Sensor Networks and Missile Seeker Systems

DTIC Science & Technology

2007-05-29

International Conference Acoustics Speech and Signal Processing (ICASSP 2007) conference 15 − 20 April 2007 in Honolulu, Hawaii. 1. E. Near Term...from the sensor measured in feet. The detection performance of the footstep in the presence of interfering speech was characterized in previously...investigation, we developed a simple piecewise linear approximation to the probability of detection curve with no interfering speech . This approximation was
Combinatorial Markov Random Fields and Their Applications to Information Organization

DTIC Science & Technology

2008-02-01

titles, part-of- speech tags; • Image processing: images, colors, texture, blobs, interest points, caption words; • Video processing: video signal, audio...McGurk and MacDonald published their pioneering work [80] that revealed the multi-modal nature of speech perception: sound and moving lips compose one... Speech (POS) n-grams (that correspond to the syntactic structure of text). POS n-grams are extracted from sentences in an incremental manner: the first n
Classifying acoustic signals into phoneme categories: average and dyslexic readers make use of complex dynamical patterns and multifractal scaling properties of the speech signal

PubMed Central

2015-01-01

Several competing aetiologies of developmental dyslexia suggest that the problems with acquiring literacy skills are causally entailed by low-level auditory and/or speech perception processes. The purpose of this study is to evaluate the diverging claims about the specific deficient peceptual processes under conditions of strong inference. Theoretically relevant acoustic features were extracted from a set of artificial speech stimuli that lie on a /bAk/-/dAk/ continuum. The features were tested on their ability to enable a simple classifier (Quadratic Discriminant Analysis) to reproduce the observed classification performance of average and dyslexic readers in a speech perception experiment. The ‘classical’ features examined were based on component process accounts of developmental dyslexia such as the supposed deficit in Envelope Rise Time detection and the deficit in the detection of rapid changes in the distribution of energy in the frequency spectrum (formant transitions). Studies examining these temporal processing deficit hypotheses do not employ measures that quantify the temporal dynamics of stimuli. It is shown that measures based on quantification of the dynamics of complex, interaction-dominant systems (Recurrence Quantification Analysis and the multifractal spectrum) enable QDA to classify the stimuli almost identically as observed in dyslexic and average reading participants. It seems unlikely that participants used any of the features that are traditionally associated with accounts of (impaired) speech perception. The nature of the variables quantifying the temporal dynamics of the speech stimuli imply that the classification of speech stimuli cannot be regarded as a linear aggregate of component processes that each parse the acoustic signal independent of one another, as is assumed by the ‘classical’ aetiologies of developmental dyslexia. It is suggested that the results imply that the differences in speech perception performance between average and dyslexic readers represent a scaled continuum rather than being caused by a specific deficient component. PMID:25834769
Research on oral test modeling based on multi-feature fusion

NASA Astrophysics Data System (ADS)

Shi, Yuliang; Tao, Yiyue; Lei, Jun

2018-04-01

In this paper, the spectrum of speech signal is taken as an input of feature extraction. The advantage of PCNN in image segmentation and other processing is used to process the speech spectrum and extract features. And a new method combining speech signal processing and image processing is explored. At the same time of using the features of the speech map, adding the MFCC to establish the spectral features and integrating them with the features of the spectrogram to further improve the accuracy of the spoken language recognition. Considering that the input features are more complicated and distinguishable, we use Support Vector Machine (SVM) to construct the classifier, and then compare the extracted test voice features with the standard voice features to achieve the spoken standard detection. Experiments show that the method of extracting features from spectrograms using PCNN is feasible, and the fusion of image features and spectral features can improve the detection accuracy.
Impact of speech presentation level on cognitive task performance: implications for auditory display design.

PubMed

Baldwin, Carryl L; Struckman-Johnson, David

2002-01-15

Speech displays and verbal response technologies are increasingly being used in complex, high workload environments that require the simultaneous performance of visual and manual tasks. Examples of such environments include the flight decks of modern aircraft, advanced transport telematics systems providing invehicle route guidance and navigational information and mobile communication equipment in emergency and public safety vehicles. Previous research has established an optimum range for speech intelligibility. However, the potential for variations in presentation levels within this range to affect attentional resources and cognitive processing of speech material has not been examined previously. Results of the current experimental investigation demonstrate that as presentation level increases within this 'optimum' range, participants in high workload situations make fewer sentence-processing errors and generally respond faster. Processing errors were more sensitive to changes in presentation level than were measures of reaction time. Implications of these findings are discussed in terms of their application for the design of speech communications displays in complex multi-task environments.

Automatic initial and final segmentation in cleft palate speech of Mandarin speakers

PubMed Central

Liu, Yin; Yin, Heng; Zhang, Junpeng; Zhang, Jing; Zhang, Jiang

2017-01-01

The speech unit segmentation is an important pre-processing step in the analysis of cleft palate speech. In Mandarin, one syllable is composed of two parts: initial and final. In cleft palate speech, the resonance disorders occur at the finals and the voiced initials, while the articulation disorders occur at the unvoiced initials. Thus, the initials and finals are the minimum speech units, which could reflect the characteristics of cleft palate speech disorders. In this work, an automatic initial/final segmentation method is proposed. It is an important preprocessing step in cleft palate speech signal processing. The tested cleft palate speech utterances are collected from the Cleft Palate Speech Treatment Center in the Hospital of Stomatology, Sichuan University, which has the largest cleft palate patients in China. The cleft palate speech data includes 824 speech segments, and the control samples contain 228 speech segments. The syllables are extracted from the speech utterances firstly. The proposed syllable extraction method avoids the training stage, and achieves a good performance for both voiced and unvoiced speech. Then, the syllables are classified into with “quasi-unvoiced” or with “quasi-voiced” initials. Respective initial/final segmentation methods are proposed to these two types of syllables. Moreover, a two-step segmentation method is proposed. The rough locations of syllable and initial/final boundaries are refined in the second segmentation step, in order to improve the robustness of segmentation accuracy. The experiments show that the initial/final segmentation accuracies for syllables with quasi-unvoiced initials are higher than quasi-voiced initials. For the cleft palate speech, the mean time error is 4.4ms for syllables with quasi-unvoiced initials, and 25.7ms for syllables with quasi-voiced initials, and the correct segmentation accuracy P30 for all the syllables is 91.69%. For the control samples, P30 for all the syllables is 91.24%. PMID:28926572
Automatic initial and final segmentation in cleft palate speech of Mandarin speakers.

PubMed

He, Ling; Liu, Yin; Yin, Heng; Zhang, Junpeng; Zhang, Jing; Zhang, Jiang

2017-01-01

The speech unit segmentation is an important pre-processing step in the analysis of cleft palate speech. In Mandarin, one syllable is composed of two parts: initial and final. In cleft palate speech, the resonance disorders occur at the finals and the voiced initials, while the articulation disorders occur at the unvoiced initials. Thus, the initials and finals are the minimum speech units, which could reflect the characteristics of cleft palate speech disorders. In this work, an automatic initial/final segmentation method is proposed. It is an important preprocessing step in cleft palate speech signal processing. The tested cleft palate speech utterances are collected from the Cleft Palate Speech Treatment Center in the Hospital of Stomatology, Sichuan University, which has the largest cleft palate patients in China. The cleft palate speech data includes 824 speech segments, and the control samples contain 228 speech segments. The syllables are extracted from the speech utterances firstly. The proposed syllable extraction method avoids the training stage, and achieves a good performance for both voiced and unvoiced speech. Then, the syllables are classified into with "quasi-unvoiced" or with "quasi-voiced" initials. Respective initial/final segmentation methods are proposed to these two types of syllables. Moreover, a two-step segmentation method is proposed. The rough locations of syllable and initial/final boundaries are refined in the second segmentation step, in order to improve the robustness of segmentation accuracy. The experiments show that the initial/final segmentation accuracies for syllables with quasi-unvoiced initials are higher than quasi-voiced initials. For the cleft palate speech, the mean time error is 4.4ms for syllables with quasi-unvoiced initials, and 25.7ms for syllables with quasi-voiced initials, and the correct segmentation accuracy P30 for all the syllables is 91.69%. For the control samples, P30 for all the syllables is 91.24%.
Implementation of the Intelligent Voice System for Kazakh

NASA Astrophysics Data System (ADS)

Yessenbayev, Zh; Saparkhojayev, N.; Tibeyev, T.

2014-04-01

Modern speech technologies are highly advanced and widely used in day-to-day applications. However, this is mostly concerned with the languages of well-developed countries such as English, German, Japan, Russian, etc. As for Kazakh, the situation is less prominent and research in this field is only starting to evolve. In this research and application-oriented project, we introduce an intelligent voice system for the fast deployment of call-centers and information desks supporting Kazakh speech. The demand on such a system is obvious if the country's large size and small population is considered. The landline and cell phones become the only means of communication for the distant villages and suburbs. The system features Kazakh speech recognition and synthesis modules as well as a web-GUI for efficient dialog management. For speech recognition we use CMU Sphinx engine and for speech synthesis- MaryTTS. The web-GUI is implemented in Java enabling operators to quickly create and manage the dialogs in user-friendly graphical environment. The call routines are handled by Asterisk PBX and JBoss Application Server. The system supports such technologies and protocols as VoIP, VoiceXML, FastAGI, Java SpeechAPI and J2EE. For the speech recognition experiments we compiled and used the first Kazakh speech corpus with the utterances from 169 native speakers. The performance of the speech recognizer is 4.1% WER on isolated word recognition and 6.9% WER on clean continuous speech recognition tasks. The speech synthesis experiments include the training of male and female voices.
A speech processing study using an acoustic model of a multiple-channel cochlear implant

NASA Astrophysics Data System (ADS)

Xu, Ying

1998-10-01

A cochlear implant is an electronic device designed to provide sound information for adults and children who have bilateral profound hearing loss. The task of representing speech signals as electrical stimuli is central to the design and performance of cochlear implants. Studies have shown that the current speech- processing strategies provide significant benefits to cochlear implant users. However, the evaluation and development of speech-processing strategies have been complicated by hardware limitations and large variability in user performance. To alleviate these problems, an acoustic model of a cochlear implant with the SPEAK strategy is implemented in this study, in which a set of acoustic stimuli whose psychophysical characteristics are as close as possible to those produced by a cochlear implant are presented on normal-hearing subjects. To test the effectiveness and feasibility of this acoustic model, a psychophysical experiment was conducted to match the performance of a normal-hearing listener using model- processed signals to that of a cochlear implant user. Good agreement was found between an implanted patient and an age-matched normal-hearing subject in a dynamic signal discrimination experiment, indicating that this acoustic model is a reasonably good approximation of a cochlear implant with the SPEAK strategy. The acoustic model was then used to examine the potential of the SPEAK strategy in terms of its temporal and frequency encoding of speech. It was hypothesized that better temporal and frequency encoding of speech can be accomplished by higher stimulation rates and a larger number of activated channels. Vowel and consonant recognition tests were conducted on normal-hearing subjects using speech tokens processed by the acoustic model, with different combinations of stimulation rate and number of activated channels. The results showed that vowel recognition was best at 600 pps and 8 activated channels, but further increases in stimulation rate and channel numbers were not beneficial. Manipulations of stimulation rate and number of activated channels did not appreciably affect consonant recognition. These results suggest that overall speech performance may improve by appropriately increasing stimulation rate and number of activated channels. Future revision of this acoustic model is necessary to provide more accurate amplitude representation of speech.
Common and distinct neural substrates for the perception of speech rhythm and intonation.

PubMed

Zhang, Linjun; Shu, Hua; Zhou, Fengying; Wang, Xiaoyi; Li, Ping

2010-07-01

The present study examines the neural substrates for the perception of speech rhythm and intonation. Subjects listened passively to synthesized speech stimuli that contained no semantic and phonological information, in three conditions: (1) continuous speech stimuli with fixed syllable duration and fundamental frequency in the standard condition, (2) stimuli with varying vocalic durations of syllables in the speech rhythm condition, and (3) stimuli with varying fundamental frequency in the intonation condition. Compared to the standard condition, speech rhythm activated the right middle superior temporal gyrus (mSTG), whereas intonation activated the bilateral superior temporal gyrus and sulcus (STG/STS) and the right posterior STS. Conjunction analysis further revealed that rhythm and intonation activated a common area in the right mSTG but compared to speech rhythm, intonation elicited additional activations in the right anterior STS. Findings from the current study reveal that the right mSTG plays an important role in prosodic processing. Implications of our findings are discussed with respect to neurocognitive theories of auditory processing. (c) 2009 Wiley-Liss, Inc.
Differential Gaze Patterns on Eyes and Mouth During Audiovisual Speech Segmentation

PubMed Central

Lusk, Laina G.; Mitchel, Aaron D.

2016-01-01

Speech is inextricably multisensory: both auditory and visual components provide critical information for all aspects of speech processing, including speech segmentation, the visual components of which have been the target of a growing number of studies. In particular, a recent study (Mitchel and Weiss, 2014) established that adults can utilize facial cues (i.e., visual prosody) to identify word boundaries in fluent speech. The current study expanded upon these results, using an eye tracker to identify highly attended facial features of the audiovisual display used in Mitchel and Weiss (2014). Subjects spent the most time watching the eyes and mouth. A significant trend in gaze durations was found with the longest gaze duration on the mouth, followed by the eyes and then the nose. In addition, eye-gaze patterns changed across familiarization as subjects learned the word boundaries, showing decreased attention to the mouth in later blocks while attention on other facial features remained consistent. These findings highlight the importance of the visual component of speech processing and suggest that the mouth may play a critical role in visual speech segmentation. PMID:26869959
The impact of phonetic dissimilarity on the perception of foreign accented speech

NASA Astrophysics Data System (ADS)

Weil, Shawn A.

2003-10-01

Non-normative speech (i.e., synthetic speech, pathological speech, foreign accented speech) is more difficult to process for native listeners than is normative speech. Does perceptual dissimilarity affect only intelligibility, or are there other costs to processing? The current series of experiments investigates both the intelligibility and time course of foreign accented speech (FAS) perception. Native English listeners heard single English words spoken by both native English speakers and non-native speakers (Mandarin or Russian). Words were chosen based on the similarity between the phonetic inventories of the respective languages. Three experimental designs were used: a cross-modal matching task, a word repetition (shadowing) task, and two subjective ratings tasks which measured impressions of accentedness and effortfulness. The results replicate previous investigations that have found that FAS significantly lowers word intelligibility. Furthermore, in FAS as well as perceptual effort, in the word repetition task, correct responses are slower to accented words than to nonaccented words. An analysis indicates that both intelligibility and reaction time are, in part, functions of the similarity between the talker's utterance and the listener's representation of the word.
Neurophysiology of speech differences in childhood apraxia of speech.

PubMed

Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole

2014-01-01

Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes.
Integrating cognitive and peripheral factors in predicting hearing-aid processing effectiveness

PubMed Central

Kates, James M.; Arehart, Kathryn H.; Souza, Pamela E.

2013-01-01

Individual factors beyond the audiogram, such as age and cognitive abilities, can influence speech intelligibility and speech quality judgments. This paper develops a neural network framework for combining multiple subject factors into a single model that predicts speech intelligibility and quality for a nonlinear hearing-aid processing strategy. The nonlinear processing approach used in the paper is frequency compression, which is intended to improve the audibility of high-frequency speech sounds by shifting them to lower frequency regions where listeners with high-frequency loss have better hearing thresholds. An ensemble averaging approach is used for the neural network to avoid the problems associated with overfitting. Models are developed for two subject groups, one having nearly normal hearing and the other mild-to-moderate sloping losses. PMID:25669257
Recommendations for elaboration, transcultural adaptation and validation process of tests in Speech, Hearing and Language Pathology.

PubMed

Pernambuco, Leandro; Espelt, Albert; Magalhães, Hipólito Virgílio; Lima, Kenio Costa de

2017-06-08

to present a guide with recommendations for translation, adaptation, elaboration and process of validation of tests in Speech and Language Pathology. the recommendations were based on international guidelines with a focus on the elaboration, translation, cross-cultural adaptation and validation process of tests. the recommendations were grouped into two Charts, one of them with procedures for translation and transcultural adaptation and the other for obtaining evidence of validity, reliability and measures of accuracy of the tests. a guide with norms for the organization and systematization of the process of elaboration, translation, cross-cultural adaptation and validation process of tests in Speech and Language Pathology was created.
Advances in natural language processing.

PubMed

Hirschberg, Julia; Manning, Christopher D

2015-07-17

Natural language processing employs computational techniques for the purpose of learning, understanding, and producing human language content. Early computational approaches to language research focused on automating the analysis of the linguistic structure of language and developing basic technologies such as machine translation, speech recognition, and speech synthesis. Today's researchers refine and make use of such tools in real-world applications, creating spoken dialogue systems and speech-to-speech translation engines, mining social media for information about health or finance, and identifying sentiment and emotion toward products and services. We describe successes and challenges in this rapidly advancing area. Copyright © 2015, American Association for the Advancement of Science.
Development of an algorithm for improving quality and information processing capacity of MathSpeak synthetic speech renderings.

PubMed

Isaacson, M D; Srinivasan, S; Lloyd, L L

2010-01-01

MathSpeak is a set of rules for non speaking of mathematical expressions. These rules have been incorporated into a computerised module that translates printed mathematics into the non-ambiguous MathSpeak form for synthetic speech rendering. Differences between individual utterances produced with the translator module are difficult to discern because of insufficient pausing between utterances; hence, the purpose of this study was to develop an algorithm for improving the synthetic speech rendering of MathSpeak. To improve synthetic speech renderings, an algorithm for inserting pauses was developed based upon recordings of middle and high school math teachers speaking mathematic expressions. Efficacy testing of this algorithm was conducted with college students without disabilities and high school/college students with visual impairments. Parameters measured included reception accuracy, short-term memory retention, MathSpeak processing capacity and various rankings concerning the quality of synthetic speech renderings. All parameters measured showed statistically significant improvements when the algorithm was used. The algorithm improves the quality and information processing capacity of synthetic speech renderings of MathSpeak. This increases the capacity of individuals with print disabilities to perform mathematical activities and to successfully fulfill science, technology, engineering and mathematics academic and career objectives.
A General Audiovisual Temporal Processing Deficit in Adult Readers With Dyslexia.

PubMed

Francisco, Ana A; Jesse, Alexandra; Groen, Margriet A; McQueen, James M

2017-01-01

Because reading is an audiovisual process, reading impairment may reflect an audiovisual processing deficit. The aim of the present study was to test the existence and scope of such a deficit in adult readers with dyslexia. We tested 39 typical readers and 51 adult readers with dyslexia on their sensitivity to the simultaneity of audiovisual speech and nonspeech stimuli, their time window of audiovisual integration for speech (using incongruent /aCa/ syllables), and their audiovisual perception of phonetic categories. Adult readers with dyslexia showed less sensitivity to audiovisual simultaneity than typical readers for both speech and nonspeech events. We found no differences between readers with dyslexia and typical readers in the temporal window of integration for audiovisual speech or in the audiovisual perception of phonetic categories. The results suggest an audiovisual temporal deficit in dyslexia that is not specific to speech-related events. But the differences found for audiovisual temporal sensitivity did not translate into a deficit in audiovisual speech perception. Hence, there seems to be a hiatus between simultaneity judgment and perception, suggesting a multisensory system that uses different mechanisms across tasks. Alternatively, it is possible that the audiovisual deficit in dyslexia is only observable when explicit judgments about audiovisual simultaneity are required.
Spatial hearing benefits demonstrated with presentation of acoustic temporal fine structure cues in bilateral cochlear implant listeners.

PubMed

Churchill, Tyler H; Kan, Alan; Goupell, Matthew J; Litovsky, Ruth Y

2014-09-01

Most contemporary cochlear implant (CI) processing strategies discard acoustic temporal fine structure (TFS) information, and this may contribute to the observed deficits in bilateral CI listeners' ability to localize sounds when compared to normal hearing listeners. Additionally, for best speech envelope representation, most contemporary speech processing strategies use high-rate carriers (≥900 Hz) that exceed the limit for interaural pulse timing to provide useful binaural information. Many bilateral CI listeners are sensitive to interaural time differences (ITDs) in low-rate (<300 Hz) constant-amplitude pulse trains. This study explored the trade-off between superior speech temporal envelope representation with high-rate carriers and binaural pulse timing sensitivity with low-rate carriers. The effects of carrier pulse rate and pulse timing on ITD discrimination, ITD lateralization, and speech recognition in quiet were examined in eight bilateral CI listeners. Stimuli consisted of speech tokens processed at different electrical stimulation rates, and pulse timings that either preserved or did not preserve acoustic TFS cues. Results showed that CI listeners were able to use low-rate pulse timing cues derived from acoustic TFS when presented redundantly on multiple electrodes for ITD discrimination and lateralization of speech stimuli.
Relationships Among Peripheral and Central Electrophysiological Measures of Spatial and Spectral Selectivity and Speech Perception in Cochlear Implant Users

PubMed Central

Scheperle, Rachel A.; Abbas, Paul J.

2014-01-01

Objectives The ability to perceive speech is related to the listener’s ability to differentiate among frequencies (i.e., spectral resolution). Cochlear implant (CI) users exhibit variable speech-perception and spectral-resolution abilities, which can be attributed in part to the extent of electrode interactions at the periphery (i.e., spatial selectivity). However, electrophysiological measures of peripheral spatial selectivity have not been found to correlate with speech perception. The purpose of this study was to evaluate auditory processing at the periphery and cortex using both simple and spectrally complex stimuli to better understand the stages of neural processing underlying speech perception. The hypotheses were that (1) by more completely characterizing peripheral excitation patterns than in previous studies, significant correlations with measures of spectral selectivity and speech perception would be observed, (2) adding information about processing at a level central to the auditory nerve would account for additional variability in speech perception, and (3) responses elicited with spectrally complex stimuli would be more strongly correlated with speech perception than responses elicited with spectrally simple stimuli. Design Eleven adult CI users participated. Three experimental processor programs (MAPs) were created to vary the likelihood of electrode interactions within each participant. For each MAP, a subset of 7 of 22 intracochlear electrodes was activated: adjacent (MAP 1), every-other (MAP 2), or every third (MAP 3). Peripheral spatial selectivity was assessed using the electrically evoked compound action potential (ECAP) to obtain channel-interaction functions for all activated electrodes (13 functions total). Central processing was assessed by eliciting the auditory change complex (ACC) with both spatial (electrode pairs) and spectral (rippled noise) stimulus changes. Speech-perception measures included vowel-discrimination and the Bamford-Kowal-Bench Sentence-in-Noise (BKB-SIN) test. Spatial and spectral selectivity and speech perception were expected to be poorest with MAP 1 (closest electrode spacing) and best with MAP 3 (widest electrode spacing). Relationships among the electrophysiological and speech-perception measures were evaluated using mixed-model and simple linear regression analyses. Results All electrophysiological measures were significantly correlated with each other and with speech perception for the mixed-model analysis, which takes into account multiple measures per person (i.e. experimental MAPs). The ECAP measures were the best predictor of speech perception. In the simple linear regression analysis on MAP 3 data, only the cortical measures were significantly correlated with speech; spectral ACC amplitude was the strongest predictor. Conclusions The results suggest that both peripheral and central electrophysiological measures of spatial and spectral selectivity provide valuable information about speech perception. Clinically, it is often desirable to optimize performance for individual CI users. These results suggest that ECAP measures may be the most useful for within-subject applications, when multiple measures are performed to make decisions about processor options. They also suggest that if the goal is to compare performance across individuals based on single measure, then processing central to the auditory nerve (specifically, cortical measures of discriminability) should be considered. PMID:25658746
Detection of target phonemes in spontaneous and read speech.

PubMed

Mehta, G; Cutler, A

1988-01-01

Although spontaneous speech occurs more frequently in most listeners' experience than read speech, laboratory studies of human speech recognition typically use carefully controlled materials read from a script. The phonological and prosodic characteristics of spontaneous and read speech differ considerably, however, which suggests that laboratory results may not generalise to the recognition of spontaneous speech. In the present study listeners were presented with both spontaneous and read speech materials, and their response time to detect word-initial target phonemes was measured. Responses were, overall, equally fast in each speech mode. However, analysis of effects previously reported in phoneme detection studies revealed significant differences between speech modes. In read speech but not in spontaneous speech, later targets were detected more rapidly than targets preceded by short words. In contrast, in spontaneous speech but not in read speech, targets were detected more rapidly in accented than in unaccented words and in strong than in weak syllables. An explanation for this pattern is offered in terms of characteristic prosodic differences between spontaneous and read speech. The results support claims from previous work that listeners pay great attention to prosodic information in the process of recognising speech.
A user's guide for the signal processing software for image and speech compression developed in the Communications and Signal Processing Laboratory (CSPL), version 1

NASA Technical Reports Server (NTRS)

Kumar, P.; Lin, F. Y.; Vaishampayan, V.; Farvardin, N.

1986-01-01

A complete documentation of the software developed in the Communication and Signal Processing Laboratory (CSPL) during the period of July 1985 to March 1986 is provided. Utility programs and subroutines that were developed for a user-friendly image and speech processing environment are described. Additional programs for data compression of image and speech type signals are included. Also, programs for the zero-memory and block transform quantization in the presence of channel noise are described. Finally, several routines for simulating the perfromance of image compression algorithms are included.
Asymmetries in the Processing of Vowel Height

ERIC Educational Resources Information Center

Scharinger, Mathias; Monahan, Philip J.; Idsardi, William J.

2012-01-01

Purpose: Speech perception can be described as the transformation of continuous acoustic information into discrete memory representations. Therefore, research on neural representations of speech sounds is particularly important for a better understanding of this transformation. Speech perception models make specific assumptions regarding the…
Visual Phonetic Processing Localized Using Speech and Non-Speech Face Gestures in Video and Point-Light Displays

PubMed Central

Bernstein, Lynne E.; Jiang, Jintao; Pantazis, Dimitrios; Lu, Zhong-Lin; Joshi, Anand

2011-01-01

The talking face affords multiple types of information. To isolate cortical sites with responsibility for integrating linguistically relevant visual speech cues, speech and non-speech face gestures were presented in natural video and point-light displays during fMRI scanning at 3.0T. Participants with normal hearing viewed the stimuli and also viewed localizers for the fusiform face area (FFA), the lateral occipital complex (LOC), and the visual motion (V5/MT) regions of interest (ROIs). The FFA, the LOC, and V5/MT were significantly less activated for speech relative to non-speech and control stimuli. Distinct activation of the posterior superior temporal sulcus and the adjacent middle temporal gyrus to speech, independent of media, was obtained in group analyses. Individual analyses showed that speech and non-speech stimuli were associated with adjacent but different activations, with the speech activations more anterior. We suggest that the speech activation area is the temporal visual speech area (TVSA), and that it can be localized with the combination of stimuli used in this study. PMID:20853377
Slowed Speech Input Has a Differential Impact on On-Line and Off-Line Processing in Children's Comprehension of Pronouns

ERIC Educational Resources Information Center

Love, Tracy; Walenski, Matthew; Swinney, David

2009-01-01

The central question underlying this study revolves around how children process co-reference relationships--such as those evidenced by pronouns ("him") and reflexives ("himself")--and how a slowed rate of speech input may critically affect this process. Previous studies of child language processing have demonstrated that typical language…

A Binaural Grouping Model for Predicting Speech Intelligibility in Multitalker Environments

PubMed Central

Colburn, H. Steven

2016-01-01

Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires little computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model. PMID:27698261
A Binaural Grouping Model for Predicting Speech Intelligibility in Multitalker Environments.

PubMed

Mi, Jing; Colburn, H Steven

2016-10-03

Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires little computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model. © The Author(s) 2016.
Speech rhythm facilitates syntactic ambiguity resolution: ERP evidence.

PubMed

Roncaglia-Denissen, Maria Paula; Schmidt-Kassow, Maren; Kotz, Sonja A

2013-01-01

In the current event-related potential (ERP) study, we investigated how speech rhythm impacts speech segmentation and facilitates the resolution of syntactic ambiguities in auditory sentence processing. Participants listened to syntactically ambiguous German subject- and object-first sentences that were spoken with either regular or irregular speech rhythm. Rhythmicity was established by a constant metric pattern of three unstressed syllables between two stressed ones that created rhythmic groups of constant size. Accuracy rates in a comprehension task revealed that participants understood rhythmically regular sentences better than rhythmically irregular ones. Furthermore, the mean amplitude of the P600 component was reduced in response to object-first sentences only when embedded in rhythmically regular but not rhythmically irregular context. This P600 reduction indicates facilitated processing of sentence structure possibly due to a decrease in processing costs for the less-preferred structure (object-first). Our data suggest an early and continuous use of rhythm by the syntactic parser and support language processing models assuming an interactive and incremental use of linguistic information during language processing.
Speech Rhythm Facilitates Syntactic Ambiguity Resolution: ERP Evidence

PubMed Central

Roncaglia-Denissen, Maria Paula; Schmidt-Kassow, Maren; Kotz, Sonja A.

2013-01-01

In the current event-related potential (ERP) study, we investigated how speech rhythm impacts speech segmentation and facilitates the resolution of syntactic ambiguities in auditory sentence processing. Participants listened to syntactically ambiguous German subject- and object-first sentences that were spoken with either regular or irregular speech rhythm. Rhythmicity was established by a constant metric pattern of three unstressed syllables between two stressed ones that created rhythmic groups of constant size. Accuracy rates in a comprehension task revealed that participants understood rhythmically regular sentences better than rhythmically irregular ones. Furthermore, the mean amplitude of the P600 component was reduced in response to object-first sentences only when embedded in rhythmically regular but not rhythmically irregular context. This P600 reduction indicates facilitated processing of sentence structure possibly due to a decrease in processing costs for the less-preferred structure (object-first). Our data suggest an early and continuous use of rhythm by the syntactic parser and support language processing models assuming an interactive and incremental use of linguistic information during language processing. PMID:23409109
Speech Skill Learning of Persons Who Stutter and Fluent Speakers under Single and Dual Task Conditions

ERIC Educational Resources Information Center

Smits-Bandstra, Sarah; De Nil, Luc

2009-01-01

Two studies compared the accuracy and efficiency of initiating oral reading of nonsense syllables by persons who stutter (PWS) and fluent speakers (PNS) over practise. Findings of Study One, comparing 12 PWS and 12 PNS, replicated previous findings of slow speech sequence initiation over practise by PWS relative to PNS. In Study Two, nine PWS and…
The Wildcat Corpus of Native- and Foreign-Accented English: Communicative Efficiency across Conversational Dyads with Varying Language Alignment Profiles

ERIC Educational Resources Information Center

Van Engen, Kristin J.; Baese-Berk, Melissa; Baker, Rachel E.; Choi, Arim; Kim, Midam; Bradlow, Ann R.

2010-01-01

This paper describes the development of the Wildcat Corpus of native- and foreign-accented English, a corpus containing scripted and spontaneous speech recordings from 24 native speakers of American English and 52 non-native speakers of English. The core element of this corpus is a set of spontaneous speech recordings, for which a new method of…
Solid State Audio/Speech Processor Analysis.

DTIC Science & Technology

1980-03-01

techniques. The techniques were demonstrated to be worthwhile in an efficient realtime AWR system. Finally, microprocessor architectures were designed to...do not include custom chip development, detailed hardware design , construction or testing. ITTDCD is very encouraged by the results obtained in this...California, Berkley, was responsible for furnishing the simulation data of OD speech analysis techniques and for the design and development of the hardware OD
A Hierarchical Generative Framework of Language Processing: Linking Language Perception, Interpretation, and Production Abnormalities in Schizophrenia

PubMed Central

Brown, Meredith; Kuperberg, Gina R.

2015-01-01

Language and thought dysfunction are central to the schizophrenia syndrome. They are evident in the major symptoms of psychosis itself, particularly as disorganized language output (positive thought disorder) and auditory verbal hallucinations (AVHs), and they also manifest as abnormalities in both high-level semantic and contextual processing and low-level perception. However, the literatures characterizing these abnormalities have largely been separate and have sometimes provided mutually exclusive accounts of aberrant language in schizophrenia. In this review, we propose that recent generative probabilistic frameworks of language processing can provide crucial insights that link these four lines of research. We first outline neural and cognitive evidence that real-time language comprehension and production normally involve internal generative circuits that propagate probabilistic predictions to perceptual cortices — predictions that are incrementally updated based on prediction error signals as new inputs are encountered. We then explain how disruptions to these circuits may compromise communicative abilities in schizophrenia by reducing the efficiency and robustness of both high-level language processing and low-level speech perception. We also argue that such disruptions may contribute to the phenomenology of thought-disordered speech and false perceptual inferences in the language system (i.e., AVHs). This perspective suggests a number of productive avenues for future research that may elucidate not only the mechanisms of language abnormalities in schizophrenia, but also promising directions for cognitive rehabilitation. PMID:26640435
The NTID speech recognition test: NSRT(®).

PubMed

Bochner, Joseph H; Garrison, Wayne M; Doherty, Karen A

2015-07-01

The purpose of this study was to collect and analyse data necessary for expansion of the NSRT item pool and to evaluate the NSRT adaptive testing software. Participants were administered pure-tone and speech recognition tests including W-22 and QuickSIN, as well as a set of 323 new NSRT items and NSRT adaptive tests in quiet and background noise. Performance on the adaptive tests was compared to pure-tone thresholds and performance on other speech recognition measures. The 323 new items were subjected to Rasch scaling analysis. Seventy adults with mild to moderately severe hearing loss participated in this study. Their mean age was 62.4 years (sd = 20.8). The 323 new NSRT items fit very well with the original item bank, enabling the item pool to be more than doubled in size. Data indicate high reliability coefficients for the NSRT and moderate correlations with pure-tone thresholds (PTA and HFPTA) and other speech recognition measures (W-22, QuickSIN, and SRT). The adaptive NSRT is an efficient and effective measure of speech recognition, providing valid and reliable information concerning respondents' speech perception abilities.
A Near-Infrared Spectroscopy Study on Cortical Hemodynamic Responses to Normal and Whispered Speech in 3- to 7-Year-Old Children

ERIC Educational Resources Information Center

Remijn, Gerard B.; Kikuchi, Mitsuru; Yoshimura, Yuko; Shitamichi, Kiyomi; Ueno, Sanae; Tsubokawa, Tsunehisa; Kojima, Haruyuki; Higashida, Haruhiro; Minabe, Yoshio

2017-01-01

Purpose: The purpose of this study was to assess cortical hemodynamic response patterns in 3- to 7-year-old children listening to two speech modes: normally vocalized and whispered speech. Understanding whispered speech requires processing of the relatively weak, noisy signal, as well as the cognitive ability to understand the speaker's reason for…
Advocate: A Distributed Architecture for Speech-to-Speech Translation

DTIC Science & Technology

2009-01-01

tecture, are either wrapped natural-language processing ( NLP ) components or objects developed from scratch using the architecture’s API. GATE is...framework, we put together a demonstration Arabic -to- English speech translation system using both internally developed ( Arabic speech recognition and MT...conditions of our Arabic S2S demonstration system described earlier. Once again, the data size was varied and eighty identical requests were
Performance of Automated Speech Scoring on Different Low- to Medium-Entropy Item Types for Low-Proficiency English Learners. Research Report. ETS RR-17-12

ERIC Educational Resources Information Center

Loukina, Anastassia; Zechner, Klaus; Yoon, Su-Youn; Zhang, Mo; Tao, Jidong; Wang, Xinhao; Lee, Chong Min; Mulholland, Matthew

2017-01-01

This report presents an overview of the "SpeechRater"? automated scoring engine model building and evaluation process for several item types with a focus on a low-English-proficiency test-taker population. We discuss each stage of speech scoring, including automatic speech recognition, filtering models for nonscorable responses, and…
Evaluation of NASA speech encoder

NASA Technical Reports Server (NTRS)

1976-01-01

Techniques developed by NASA for spaceflight instrumentation were used in the design of a quantizer for speech-decoding. Computer simulation of the actions of the quantizer was tested with synthesized and real speech signals. Results were evaluated by a phometician. Topics discussed include the relationship between the number of quantizer levels and the required sampling rate; reconstruction of signals; digital filtering; speech recording, sampling, and storage, and processing results.
Atypical central auditory speech-sound discrimination in children who stutter as indexed by the mismatch negativity.

PubMed

Jansson-Verkasalo, Eira; Eggers, Kurt; Järvenpää, Anu; Suominen, Kalervo; Van den Bergh, Bea; De Nil, Luc; Kujala, Teija

2014-09-01

Recent theoretical conceptualizations suggest that disfluencies in stuttering may arise from several factors, one of them being atypical auditory processing. The main purpose of the present study was to investigate whether speech sound encoding and central auditory discrimination, are affected in children who stutter (CWS). Participants were 10 CWS, and 12 typically developing children with fluent speech (TDC). Event-related potentials (ERPs) for syllables and syllable changes [consonant, vowel, vowel-duration, frequency (F0), and intensity changes], critical in speech perception and language development of CWS were compared to those of TDC. There were no significant group differences in the amplitudes or latencies of the P1 or N2 responses elicited by the standard stimuli. However, the Mismatch Negativity (MMN) amplitude was significantly smaller in CWS than in TDC. For TDC all deviants of the linguistic multifeature paradigm elicited significant MMN amplitudes, comparable with the results found earlier with the same paradigm in 6-year-old children. In contrast, only the duration change elicited a significant MMN in CWS. The results showed that central auditory speech-sound processing was typical at the level of sound encoding in CWS. In contrast, central speech-sound discrimination, as indexed by the MMN for multiple sound features (both phonetic and prosodic), was atypical in the group of CWS. Findings were linked to existing conceptualizations on stuttering etiology. The reader will be able (a) to describe recent findings on central auditory speech-sound processing in individuals who stutter, (b) to describe the measurement of auditory reception and central auditory speech-sound discrimination, (c) to describe the findings of central auditory speech-sound discrimination, as indexed by the mismatch negativity (MMN), in children who stutter. Copyright © 2014 Elsevier Inc. All rights reserved.
Automatic analysis of slips of the tongue: Insights into the cognitive architecture of speech production.

PubMed

Goldrick, Matthew; Keshet, Joseph; Gustafson, Erin; Heller, Jordana; Needle, Jeremy

2016-04-01

Traces of the cognitive mechanisms underlying speaking can be found within subtle variations in how we pronounce sounds. While speech errors have traditionally been seen as categorical substitutions of one sound for another, acoustic/articulatory analyses show they partially reflect the intended sound. When "pig" is mispronounced as "big," the resulting /b/ sound differs from correct productions of "big," moving towards intended "pig"-revealing the role of graded sound representations in speech production. Investigating the origins of such phenomena requires detailed estimation of speech sound distributions; this has been hampered by reliance on subjective, labor-intensive manual annotation. Computational methods can address these issues by providing for objective, automatic measurements. We develop a novel high-precision computational approach, based on a set of machine learning algorithms, for measurement of elicited speech. The algorithms are trained on existing manually labeled data to detect and locate linguistically relevant acoustic properties with high accuracy. Our approach is robust, is designed to handle mis-productions, and overall matches the performance of expert coders. It allows us to analyze a very large dataset of speech errors (containing far more errors than the total in the existing literature), illuminating properties of speech sound distributions previously impossible to reliably observe. We argue that this provides novel evidence that two sources both contribute to deviations in speech errors: planning processes specifying the targets of articulation and articulatory processes specifying the motor movements that execute this plan. These findings illustrate how a much richer picture of speech provides an opportunity to gain novel insights into language processing. Copyright © 2016 Elsevier B.V. All rights reserved.
Relationship between channel interaction and spectral-ripple discrimination in cochlear implant usersa

PubMed Central

Jones, Gary L.; Ho Won, Jong; Drennan, Ward R.; Rubinstein, Jay T.

2013-01-01

Cochlear implant (CI) users can achieve remarkable speech understanding, but there is great variability in outcomes that is only partially accounted for by age, residual hearing, and duration of deafness. Results might be improved with the use of psychophysical tests to predict which sound processing strategies offer the best potential outcomes. In particular, the spectral-ripple discrimination test offers a time-efficient, nonlinguistic measure that is correlated with perception of both speech and music by CI users. Features that make this “one-point” test time-efficient, and thus potentially clinically useful, are also connected to controversy within the CI field about what the test measures. The current work examined the relationship between thresholds in the one-point spectral-ripple test, in which stimuli are presented acoustically, and interaction indices measured under the controlled conditions afforded by direct stimulation with a research processor. Results of these studies include the following: (1) within individual subjects there were large variations in the interaction index along the electrode array, (2) interaction indices generally decreased with increasing electrode separation, and (3) spectral-ripple discrimination improved with decreasing mean interaction index at electrode separations of one, three, and five electrodes. These results indicate that spectral-ripple discrimination thresholds can provide a useful metric of the spectral resolution of CI users. PMID:23297914
Relationship between channel interaction and spectral-ripple discrimination in cochlear implant users.

PubMed

Jones, Gary L; Won, Jong Ho; Drennan, Ward R; Rubinstein, Jay T

2013-01-01

Cochlear implant (CI) users can achieve remarkable speech understanding, but there is great variability in outcomes that is only partially accounted for by age, residual hearing, and duration of deafness. Results might be improved with the use of psychophysical tests to predict which sound processing strategies offer the best potential outcomes. In particular, the spectral-ripple discrimination test offers a time-efficient, nonlinguistic measure that is correlated with perception of both speech and music by CI users. Features that make this "one-point" test time-efficient, and thus potentially clinically useful, are also connected to controversy within the CI field about what the test measures. The current work examined the relationship between thresholds in the one-point spectral-ripple test, in which stimuli are presented acoustically, and interaction indices measured under the controlled conditions afforded by direct stimulation with a research processor. Results of these studies include the following: (1) within individual subjects there were large variations in the interaction index along the electrode array, (2) interaction indices generally decreased with increasing electrode separation, and (3) spectral-ripple discrimination improved with decreasing mean interaction index at electrode separations of one, three, and five electrodes. These results indicate that spectral-ripple discrimination thresholds can provide a useful metric of the spectral resolution of CI users.
Analysis and enhancement of country singing

NASA Astrophysics Data System (ADS)

Lee, Matthew; Smith, Mark J. T.

2003-04-01

The study of human singing has focused extensively on the analysis of voice characteristics. At the same time, a substantial body of work has been under study aimed at modeling and synthesizing the human voice. The work on which we report brings together some key analysis and synthesis principles to create a new model for digitally improving the perceived quality of an average singing voice. The model presented employs an analysis-by-synthesis overlap-add (ABS-OLA) sinusoidal model, which in the past has been used for the analysis and synthesis of speech, in combination with a spectral model of the vocal tract. The ABS-OLA sinusoidal model for speech has been shown to be a flexible, accurate, and computationally efficient representation capable of producing a natural-sounding singing voice [E. B. George and M. J. T. Smith, Trans. Speech Audio Processing 5, 389-406 (1997)]. A spectral model infused in the ABS-OLA uses Generalized Gaussian functions to provide a simple framework which enables the precise modification of spectral characteristics while maintaining the quality and naturalness of the original voice. Furthermore, it is shown that the parameters of the new ABS-OLA can accommodate pitch corrections and vocal quality enhancements while preserving naturalness and singer identity. Examples of enhanced country singing will be presented.
Natural user interface as a supplement of the holographic Raman tweezers

NASA Astrophysics Data System (ADS)

Tomori, Zoltan; Kanka, Jan; Kesa, Peter; Jakl, Petr; Sery, Mojmir; Bernatova, Silvie; Antalik, Marian; Zemánek, Pavel

2014-09-01

Holographic Raman tweezers (HRT) manipulates with microobjects by controlling the positions of multiple optical traps via the mouse or joystick. Several attempts have appeared recently to exploit touch tablets, 2D cameras or Kinect game console instead. We proposed a multimodal "Natural User Interface" (NUI) approach integrating hands tracking, gestures recognition, eye tracking and speech recognition. For this purpose we exploited "Leap Motion" and "MyGaze" low-cost sensors and a simple speech recognition program "Tazti". We developed own NUI software which processes signals from the sensors and sends the control commands to HRT which subsequently controls the positions of trapping beams, micropositioning stage and the acquisition system of Raman spectra. System allows various modes of operation proper for specific tasks. Virtual tools (called "pin" and "tweezers") serving for the manipulation with particles are displayed on the transparent "overlay" window above the live camera image. Eye tracker identifies the position of the observed particle and uses it for the autofocus. Laser trap manipulation navigated by the dominant hand can be combined with the gestures recognition of the secondary hand. Speech commands recognition is useful if both hands are busy. Proposed methods make manual control of HRT more efficient and they are also a good platform for its future semi-automated and fully automated work.
Speech processing using conditional observable maximum likelihood continuity mapping

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hogden, John; Nix, David

A computer implemented method enables the recognition of speech and speech characteristics. Parameters are initialized of first probability density functions that map between the symbols in the vocabulary of one or more sequences of speech codes that represent speech sounds and a continuity map. Parameters are also initialized of second probability density functions that map between the elements in the vocabulary of one or more desired sequences of speech transcription symbols and the continuity map. The parameters of the probability density functions are then trained to maximize the probabilities of the desired sequences of speech-transcription symbols. A new sequence ofmore » speech codes is then input to the continuity map having the trained first and second probability function parameters. A smooth path is identified on the continuity map that has the maximum probability for the new sequence of speech codes. The probability of each speech transcription symbol for each input speech code can then be output.« less

Some links on this page may take you to non-federal websites. Their policies may differ from this site.